Tsoding's JSON parser VOD -- This is the video that inspired me to do this, And is where I got most of the code to do this.
Part related to what we're doing but not to the actual code part of it. Here I'll be talking about the syntax I'm aiming to parse, I'll explain a little tad about the approach I'll be using, and other snippets that didn't fit into the main part of this page thingy.
; separated by whitespace
<expression> ::= <factor> "+" <expression>
| <factor> "-" <expression>
| <factor>
<factor> ::= <atom> "*" <factor>
| <atom> "/" <factor>
| <atom>
<atom> ::= <number>
| "(" <expression> ")"
| "-" <atom> ; Whitespace sensitive
; whitespace sensitive
where <number> = /([0-9]+(?:\.[0-9]+)?)/ as Float
fig 1.0
Okay... That's the syntax done, Most of it as BNF but numbers are weird so regex works nicely. I'm using A different approach to BNF, the elements (the parts in <brackets>) all terminate as tokens (the parts in "quotes") or the regex. Each element can be seperated by whitespace, but tokens are not; This is something we'll have to keep in mind. BNF is useful because when we write the parser we know how to structure the code. <expression>
is out `metastatement` we start there. whith <number>, "+", "-", "*", "/", "(", and ")" as tokens, the actual text that we'll be reading. We'll be using recursion on the right hand operand wich means we'll end up with right-associative expressions. So 5 + 3 + 2
will be interpreted as 5 + (3 + 2)
. We also explicitly set the precedence of operators to negate as the highest precedence, then multiply and divide, then add and subtract. Or in more graphic form (-x => x * y x / y => x + y x - y), Or 5 + 2 * 3 == 5 + (2 * 3)
and 5 * 2 + 3 == (5 * 2) + 3
God that sounded boring, and unguided. But there's the syntax ramble, and that's all you need to know about the syntax we'll be parsing. Now onto the approach We'll be using. Since we're programming this in Haskell(A purely Functional programming Language) we'll be using Parser Combinators, Basically we'll structuring our parser in terms of how we transform the input text and then we work in the processing into our parser through similar transformations. We can then take advantage of typeclasses in Haskell to make our code a little more readable and take advantage of the built-in libraries.
In short, I don't actually know. I've used parser combinators before (parsec for Haskell, and Pyparsing for python). As i described above, my understanding of them is as transformations. You have transformation a, then transformation b, you can use a combinator to merge them into a new, more complicated, transformation. Before I go further I have to mention how we'll implement it. Our "parsers" Will be functions with the profile of String -> Maybe (String, a)
, so they'll take a string(text), they'll eat up some of it, then they can either return the part of the string they didn't eat and something else, or they can return Nothing (hence the Maybe
). Maybe
is neat since it's a monad so we can use haskells do-notation, so my code feels more natural to me, plus some other things.
Because the parsers might return a tuple of string and something, We can have a parser that eats up a sequence of numbers and then we can fold in a function that converts the string to a number(Float). Now, we want to be able to work these functions into the parsers themselves. That'll allow us to use them after that parser has finished parsing. Well since each parser is a function, we can meld functions in by composing them together. So in haskell terms (a -> b) -> (String -> Maybe (String, a)) -> (String -> Maybe (String, b))
, Or if we define the parser function as it's own type, (a -> b) -> Parser a -> Parser b
. That pattern is recognised in Haskell, There's a function called fmap
which has that pattern... (a -> b) -> f a -> f b
, A function that takes a and returns b, then a something with a, and that will return something with b. What is that something? A Functor, which is a typeclass. We can prove to the haskell compiler that our Parser type is a functor then the compiler knows how to do the fmap function on our type. Plus, I've never used the instance keyword in haskell before, so that's something new.
In the above paragraph we got really close to writing some code down. So let's start with the piecemeal writing. Firstly defining what is a parser... not to you, I've already rambled past that, But to the compiler. We don't have to do this, But it's an easy way to get the <$>
operator, and if we want to use a function that needs our type to be an instance of the Functor typeclass, we can. In the code snippet below[fig 2.0] We do what this paragraph, and the paragraph above, have discussed so far. Anyway, Since our parser type is basically a structure that holds a single function, If we want to use the parser we need to pull that function out of the structure. Because structure stores a function, We can extract the parser function and give it a String without treating it as two function calls. In haskell we call functions by just putting their arguments after the function. So a function call looks like add 5 10
in haskell, but so is add 5
and (add 5) 10
, but add (5 10)
isn't, okay it is but its calling 5 with 10 and then passing the result to add. An equivalent function call in python would be add(5)(10)
. tangent over, We can merge the functionality into one.
newtype Parser a = Parser (String -> Maybe (String, a))
parse :: Parser a -> String -> Maybe (String, a)
parse (Parser p) = p
instance Functor Parser where
fmap f (Parser p) = Parser (\inp -> do
(inp', x) <- p inp
return (inp', f x)
)
fig 2.0
One fun tidbid about haskell is the fact it's a declarative language, So we've written more functionality than you'd expect from one little snippet of code. But our code is useless at the moment. Let's Make More of our Parser. To start lets make a character parser, It takes the string and consumes one character and checks it, if it matches the character it wants it returns what it consumes,but if it fails then it'll return nothing. But, since a parser is just a function, We'll have to make the function that checks and responds, And we'll have to make a function that does that for each time we want to check a character. But, this is haskell, working with functions is what it does best. We can make a function that creates a parser that parses a specific character. We have another thing to consider, what if a parser is given an Empty string, how should it react? For our character parser we'll just return Nothing.
parseChar :: Char -> Parser Char
parseChar c = Parser fnc
where fnc (x:xs) = if x == c then Just (xs, x) else Nothing
fnc [] = Nothing
fig 2.1
Boom, it's that simple. Okay... We don't have the abilty to chain together multiple parsers easily, and We'll have to make a brand new parser from the ground up for each new thing we want to parse. This isn't very good since it's easier to discribe how to parse something in terms/combination of other parsers. It's also more convenient, you can make a parser for one specific pattern and then reuse it whenever you want that pattern again. So, we have one parser already, but we need more flexibility and applyability. We already have an aspect of that due to the functor we defined earlier. We can transform the data our parsers parse, but we can't really do too much. The functor only gives us the ability to meld functions with a profile of a -> b
, This is cool, but we can do better. With how functions work in Haskell we can make a function with the profile a -> b -> c
and then when we meld it in we get a Parser of type Parser (b -> c)
and then we could write a function that took the two parsers and handled everything needed for applying the function and putting the result back into a parser. This would be useful since we could meld any function into a parser.
This pattern, this "functor extending" as it turns out is quite common. To a point that haskell has a special typeclass for these functors. their called applicative functors And they give us a special operator <*>
which allows us to write out how we want these functions melded into a parser, and allowing us to merge more parsers into our final parser. Plus, there are some really neat(or shortcut enabling) functions. So we need to prove to the compiler that our functor is an applicative one. This is easy, but we have to define two functions. One for taking anything and putting it into a parser, and another which is just the extension of our functor. The first one is simple, the second one isn't as obvious. But the difference between <*>
and <$>
is where the function we're melding in comes from. For <$>
the function is just given to it, where <*>
the function is "stored" by a parser. yeah... not to different, though we should keep in mind that the parsers we're melding this function into ARE sequencial, so we need to ferry along the remainder string from parser to parser.
instance Applicative Parser where
-- This just creates a parser that doesn't consume anything and returns a value
pure x = Parser $ \inp -> Just (inp,x)
(Parser p1) <*> (Parser p2) = Parser $ \inp -> do
(inp', f) <- p1 inp -- getting the function from the first Parser
(inp'', a) <- p2 inp'
Just (inp'', f a)
fig 2.2
Not too different from how we proved our parser was an instance of the Functor typeclass. And now we can do a little more with this. There is a function we can use called traverse, which has the profile of (Applicative f, Traversable t) => (a -> f b) -> t a -> f (t b)
The traversable part doesn't matter, for our case it's going to a list and the applicative is a list. We can rewrite the profile as (a -> Parser b) -> [a] -> Parser [b]
. If you havent caught on yet, we can take a function (like our parseChar one) and give it a list of values (like a String) and then it gives us a something that applies our function to the list of values and re-jiggs it so that the result is in an applicative. More in terms of what we're doing, traverse takes our parser generator function and a list of input that generator takes, then produces a parser. The parser it produces is the list version of the parser the'd generator function would produce.
So, after that long paragraph, lets make make our second parser generator function. Not too much work, Because how haskell works we can define functions without putting explicitly the arguments. due to the traverse function works we can just kinda use the result of giving some of the arguments it wants and then using that as our parseString
function;
parseString :: String -> Parser String
parseString = traverse parseChar
fig 2.3
violá, we a have parser generator that takes a string and returns a parser that consumes the string. Neat, isn't it?
Anyway, We're almost done writing our boilerplate code. Before this I was balancing usable code with boilerplate code. We have a few more things to define before we can hit the ground with parser generator after parser generator. So to get there we need to define One More Thing. So far our parsers can only be sequenced If they're seperate arguments to functions. This is useful but not good. We also have no way of redirecting if one parser fails, So our parsers are quite all or nothing. There is a typeclass though which allows us to use <|>
, *>
, and <*
. In english they're like <|> => do this, but if it fails do that
, *> => do this, and if it succeeds do that
, and <* => do this then that, if either fail then also fail
. This is very useful for arbitrarily stringing together parsers and selecting which ones to capture the output of. So, This typeclass is called Alternative
and annoyingly (or funnily enough) it comes from the Control.Applicative
module... yeah, So we can import that and get to proving.
import Control.Alternative -- we'll put this close to the top of the file for tidyness
instance Alternative Parser where
-- this is necessary for a full definition. It Creates a parser that always fails
-- const is a function that takes two arguments and returns the first, we can curry That
-- for to make a function which becomes (String -> Maybe (String, a)) very easily
empty = Parser $ const Nothing
-- now we define <|> and we'll get *> and <* automatically
-- because the function in our parser return a Maybe and Maybe is Alternative
-- we can just piggyback off that
(Parser a) <|> (Parser b) = Parser $ \inp -> a inp <|> b inp
fig 2.4
Okay that was almost as long(line wise) as the BNF[fig 1.0] I gave you earlier. I used Comments to skip some rambling. So Yeah, We're done with proving and more indirect definitions, time to write some parsers...
This subheading is for just the parser generators which will make a brand new function to put in our parser. We had one earlier, the parseChar
generator. We'll be making more parser generators that are like that. These coming subsections are going to be in the format of code, ramble, code, ramble. Coolio, Let's start with something that is relatively useful. It'll be the only one in this section because the rest are more differing applications or combinations of the ones we've already defined
-- This one is like parseChar except we'll check if our input character is an element
-- of a string, using the elem function
parseCharset :: String -> Parser Char
parseCharset s = Parser fnc
where fnc (x:xs) = if elem x s then Just (xs,x) else Nothing
fnc "" = Nothing
fig 3.0
Yeah, That's the entire function, That function got an entire subsection all to itself. It's a lonely function, Not just here, In the code i write it's like three lines away from it's closest neighbour. But that is due to how i wrote it. I wrote it in above a bunch of other functions after the fact and messed up formatting.
These are functions/operators that are purely to manupulate how parsers are applied. This can repitition to how a parsers output relates to another parser, like the operators we got from the functor, applicative, and alternative typeclasses. Our special ones will just relate to list output.
-- we define parseOneMin first because it's easier to do
parseOneMin :: Parser a -> Parser [a]
parseOneMin (Parser p) = Parser scan
where scan inp = do
(inp', x) <- p inp -- parse once
(inp'', r) <- scan inp' <|> Just (inp',[]) -- recurse or just empty
return (inp'', x:r)
-- we can then define parseMany as parseOneMin or return an empty list
-- I know this should be in the next section but it makes more sense here
parseMany :: Parser a -> Parser [a]
parseMany (Parser p) = Parser scan <|> pure []
fig 3.1
(+>) :: Parser a -> Parser [a] -> Parser [a]
(Parser a) +> (Parser b) = Parser $ \inp -> do
(inp', r) <- a inp
(inp'',rs) <- b inp'
return (inp'', r:rs)
(<+) :: Parser [a] -> Parser a -> Parser [a]
(Parser a) <+ (Parser b) = Parser $ \inp -> do
(inp', rs) <- a inp
(inp'', r) <- b inp'
return (inp'', append rs r)
where append (x:xs) a = x : append xs a
append [] a = [a] -- Bubbling A to the end of the list
(++>) :: Parser [a] -> Parser [a] -> Parser [a]
(Parser a) ++> (Parser b) = Parser $ \inp -> do
(inp', ra) <- a inp
(inp'',rb) <- b inp'
return (inp'', ra ++ rb)
fig 3.2
These are useful for concatinating the output of two parsers. Haskells Type system makes these really easy to define in a generic way. We define the relationships between our types and boom, it knows what we mean. And thanks to Maybe, we don't have to worry about it returning Nothing. Just Walking through The operators, +>
prepends the result of Parser 1 to the Result of Parser 2 where <+
appends. ++>
concatenates the two results. Although these don't mind explicitly what types, They're going to be used with strings and characters.
This is where most of the parsers functionality lays. We'll start by parsing floats then the three levels of expression. I'll wrap those up in an interact thingy and boom we're done. It's almost that simple. Okay. Before this most of the parsers we've seen have either been generic, or String based. This time we'll be working with our preferred processing type, A Floating Point Number. This will be a slight change, but once we get Floats flowing through our functions, Our expression solver will be done.
parseFloat :: Parser Float
parseFloat = read <$> (
parseOneMin (parseCharset "0123456789") ++> (
(parseChar '.' +> parseOneMin (parseCharset "0123456789"))
<|> pure ""
)
)
fig 3.3
Okay, So ignoring read
for the time being, how does our float parser parse? Well, First we parse atleast one digit, then we try to parse the dot and the trailing digits. If that parse fails then just pretend it never happened and say we got an empty string. Whatever result we get from the second one is concatenated onto the first digits we parsed. the read
function we ignored earlier is magical. In haskell read has the profile String -> a
meaning we need a way to tell Haskell what read should return then it'll take a string and parse it as that type. look at the code above[fig 3.3] and look for something that tells you this information... See it... I'll explain quickly
Our Parser is a Functor, with a fmap(<$>) profile of
(a -> b) -> Parser a -> Parser b
Read has the profile(ignoring constraints)
String -> a
we're melding Read into our Parser so our "meld" gives us the profile
(String -> b) -> Parser String -> Parser b
Remember we need the resulting parser to be of type `Parser Float`, So with a little inference
(String -> Float) -> Parser String -> Parser Float
meaning read has to parse a float. It must have the profile
String -> Float
fig 3.4
Haskell did all of that inference for us. Just by us saying what parseFloat
should be type-wise, Haskell knows everything it needs to figure out what version of read it needs to use. Anyway, From this point on we'll mostly be using haskells type checker/inferer for making sure our code makes sense type-wise. "If Haskell compiles it, It's most likely typesafe" is how the saying goes. Now onto the actual meat and skin of our Solver, the part we defined in BNF right at the top of this page[fig 1.0] and let's start by declaring the three steps, then we'll define atom. Once we're done with atom the next two, factor and expression, are very similar, SO onto our coding.
-- declaring the three
atom :: Parser Float
factor :: Parser Float
expression :: Parser Float
-- defining atom
atom = parseFloat
<|> (parseChar '(' *> ws *> expression <* ws <* parseChar ')')
<|> (negate <$> (parseChar '-' *> atom))
-- the magic of functors, we can just negate the result of the previous result from atom
fig 3.5
Now that we're working with floats we can make our intents more clear. The next two will just look like "now meld the add function into this parser" and it makes sense. Defining our parser is alternative means it's really easy to select what a parser returns. For our case with (expression)
part we can say "parse this, then parse an expression, then parse that but return the expression". This makes writing the parser a lot easier and we don't clutter our code with temporary variables and several if-bubbles. Which is nice, yn fy marn i.
factor = ((*) <$> atom <* ws <* parseChar '*' <* ws <*> factor) -- these recurse on the right operand
<|> ((/) <$> atom <* ws <* parseChar '/' <* ws <*> factor)
<|> atom -- fallback
expression = ((+) <$> factor <* ws <* parseChar '+' <* ws <*> expression)
<|> ((-) <$> factor <* ws <* parseChar '-' <* ws <*> expression)
<|> factor -- fallback
fig 3.6
There we go, We have everything we need to finish making our solver complete. Anyway, a little explaining i should've done much earlier. We're doing a little recursion to allow us to string multiple operations of the same precedence without parentheses. You could've just done a + ( b + c )
, but this recursive element does it for us. Both factor
and expression
are very similar because they're basically from one bigger function.But predictability is nice, and from maths lessons we natively expect some level of precedence between the operators. So it's easy to split up the function and nest one part inside the other. Now, with most things if you cut it in half and put one half into the other half, you'd be sent to prison. But functions don't have rights and, as far as we know, they don't have feelings; so it's okay.
Anyway, After about 103-ish lines of text, about 93-ish lines of code. So, we now need code that splits up our input into lines and then parses, solves, each line. Then we need to convert the result from a float to a string then we zip up the results back into a single block of text rather than a list of strings and then display them. That must take up a lot of lines, It might be really bulky... right?..
main = interact $ unlines . map (show . parse expression) . lines
fig 3.7
... ah, One line... A single Line... One wafer thin mint. The function that makes this possible is the interact
function. It basically turns your program into a blackbox between stdin and stdout. As far as we're concerned, `interact` has the profile (String -> String) -> IO ()
which basically is haskell speak for "gimme a function and I'll produce some **effect" so the expression on the other side of the $
has to resolve to some sort of function with the profile String -> String
I want to apply the expression solver on each line of text. I'll go through each step of the thing in a pre block for simplicity
-- we start off by splitting up our text into lines
f :: String -> [String]
f = lines
-- then for each line we solve it (by parsing it)
f :: String -> [Float]
f = map (parse expression) . lines
-- then we convert the floats to strings
f :: String -> [String]
f = map (show . parse expression) . lines
-- then we recombine the lines by putting a newline between them
f :: String -> String
f = unlines . map (show . parse expression) . lines
Once you look at it in steps, programming in Haskell is quite similar to programming in procedural languages, in some ways. But yeah... We're done... you can find the source code here, and uhmm yeah.
This was going to a conclusions section but due to personal events, mainly procrastination, but I've made similar Solver since and I just have notes about this little mini project. So, I'll give them as a list
ParseMany
and parseOneMin
can be replaced with the base many
and some
respectively. I've basically updated them to them in the when i made the gist. Up until
I still have no way to explain my Haskell code properly.
both +>
and ++>
can be done with a single expression if i had remembered that applicatives exist. Below is the re-write I came up with
(+>) :: Parser a -> Parser [a] -> Parser [a]
a +> b = (:) <$> a <*> b
(++>) :: Parser [a] -> Parser [a] -> Parser [a]
a ++> b = (++) <$> a <*> b
which is shorter and a little more terse.
Implementing error reporting is actually quite difficult... So this parser will return Nothing for each line it fails on 'till the end of time