Parsing Techniques: A Practical Guide (Monographs in Computer Science)
M**L
Non-technical details of all the major parsing techniques and their implementations
Often parsing is presented and studied only in the larger context of compiler construction. Usually this exposes one to only a few specialized parsing techniques, such as LALR(1), which are useful for parsing programming languages but are inadequate for handling general context-free languages. As a result, parsing techniques haven't been embraced by the wider circle of programmers outside those working on compilers, and are usually not used for parsing in everyday projects.

This is a pity, since parsing algorithms have much wider applicability than just interpreting source code. When confronted with a parsing problem, one tends to write a specialized ad hoc parser from scratch every time. This is not only time consuming, but also error prone and not easily extendable, and it leads to code that is hard to maintain and debug. If some part of the input changes, one often has to make non-trivial changes to the original code in multiple places.

Instead, one has to realize that there are three major parts to any parsing problem, only two of which are domain specific. Tokenization (or lexing) is the first step, where the input is broken into independent tokens, possibly with attached values. Tokenization helps to abstract your input into units of data that can't be broken down any further. This part is dependent on the problem, and there is no general algorithm that can solve it. The second part is writing the grammar for your input. The grammar defines the structure, or syntax, of the incoming token stream - what combinations of tokens are allowed to appear. It is also domain specific, but no code has to be written in this step. Finally, the third step is to apply a general parsing algorithm to your grammar and token stream.
Instead of implementing the algorithm from scratch, the user just has to define a couple of semantic actions to execute for each production.

When one writes ad hoc parsing code for each problem from scratch, all three steps - tokenization, syntactic analysis, and semantic analysis - are conflated into one large piece of code, usually in an obfuscated way that is hard to debug and maintain. On the other hand, when the three steps are clearly separated, there is less domain-specific code to write. The code becomes clearer and easier to maintain. Furthermore, it becomes much more extendable with just a few tweaks to the grammar or the tokenizer, and is much easier to share between different parser modules. The actual structural analysis of the input, which is usually the trickiest part to write and a source of major bugs, is now offloaded to a well-known algorithm that has been thoroughly tested and is shared between various projects.

Breaking the problem up this way, and choosing a parsing algorithm that is fast but at the same time general enough to handle your grammar, requires knowledge of various parsing techniques that are usually not taught in the undergraduate CS curriculum. This is where this book comes in. It provides an accessible, non-technical introduction to parsing, in sufficient detail for one to understand and use these formal parsing techniques in everyday projects.

In its full generality, parsing can be viewed as a graph search problem, where the nodes are the sentential forms (strings of terminals and non-terminals) and the edges are the applications of productions to the non-terminals of a node. This is a useful abstraction, but it quickly becomes impractical, since in general the search requires exponential time.
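To make the three-part split concrete, here is a minimal sketch of my own (not code from the book): a regex tokenizer, a grammar expressed as plain data, and a generic algorithm driving both. The token names and grammar are invented for the example, and the third stage is a deliberately naive backtracking recognizer, not a production-quality parser.

```python
import re

# Step 1: tokenization -- domain specific; here, simple arithmetic.
TOKEN_SPEC = [("NUM", r"\d+"), ("PLUS", r"\+"), ("TIMES", r"\*"),
              ("LPAR", r"\("), ("RPAR", r"\)"), ("WS", r"\s+")]

def tokenize(text):
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, text[pos:])
            if m:
                if name != "WS":          # skip whitespace tokens
                    yield (name, m.group())
                pos += m.end()
                break
        else:
            raise SyntaxError(f"bad character at position {pos}")

# Step 2: the grammar -- pure data, no code written in this step.
GRAMMAR = {
    "E": [["T", "PLUS", "E"], ["T"]],
    "T": [["F", "TIMES", "T"], ["F"]],
    "F": [["LPAR", "E", "RPAR"], ["NUM"]],
}

# Step 3: a generic algorithm -- naive top-down backtracking.
# It tries productions in order and commits to the first match of
# each symbol, so it is a sketch, not a complete backtracking parser.
def parse(symbol, tokens, pos=0):
    if symbol not in GRAMMAR:             # terminal: match one token
        if pos < len(tokens) and tokens[pos][0] == symbol:
            return pos + 1
        return None
    for production in GRAMMAR[symbol]:
        p = pos
        for sym in production:
            p = parse(sym, tokens, p)
            if p is None:
                break
        else:
            return p                      # whole production matched
    return None

tokens = list(tokenize("2 + 3 * (4 + 5)"))
assert parse("E", tokens) == len(tokens)  # the whole input is an E
```

Note that only steps 1 and 2 mention arithmetic at all; swapping in a different token table and grammar reuses step 3 unchanged, which is exactly the separation the reviewer describes.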
To deal with this, various methods and heuristics have been developed over the past 50 years that speed up the process, but at the same time narrow the set of languages they can accept.

Grune and Jacobs' book starts out with this general view, and then narrows down to more specialized, less general, but faster methods. They divide parsing algorithms into a few general categories: top-down vs. bottom-up, directional (on-line) vs. non-directional (where the whole input must be known ahead of time), and deterministic vs. non-deterministic (where backtracking is necessary). They devote a chapter to each combination of these categories. They start out with the general exponential-time depth-first and breadth-first backtracking parsers, followed by the CYK and Earley parsers, which can parse any context-free grammar in polynomial time, and then go on to the linear-time LL, LR, and LALR methods that are used in compilers. Along the way they cover regular languages and their parsers, which are used to implement regular expressions, as well as generalized LL and LR techniques, which can parse any context-free grammar but still retain the speed of LL and LR parsers on most practical grammars. There are also chapters on more esoteric topics that might be of interest only to a select few: non-canonical parsers, parsing as the intersection of context-free and regular languages, and context-sensitive grammars. On the other hand, there are also chapters on the applied aspects of parsers: error handling, parallel parsing, and practical parser writing.

The language of the book is mostly non-technical. It seems that great effort has been made to keep CS jargon to a minimum and to explain everything in simple terms. Most of the concepts are presented as examples first, followed by a general description. There is almost no pseudo-code; most of the algorithms are described in plain English.
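To give a flavor of the polynomial-time general methods mentioned above, here is a minimal CYK recognizer of my own (not from the book). CYK assumes the grammar is already in Chomsky normal form; the toy grammar below, which I made up for the example, recognizes the language a^n b^n.

```python
# Toy CNF grammar for a^n b^n (n >= 1):
#   S -> A T | A B,   T -> S B,   A -> "a",   B -> "b"
BINARY = {("A", "T"): {"S"}, ("A", "B"): {"S"}, ("S", "B"): {"T"}}
LEXICAL = {"a": {"A"}, "b": {"B"}}

def cyk(word, start="S"):
    """CYK recognition: O(n^3 * |grammar|) time for any CNF grammar."""
    n = len(word)
    if n == 0:
        return False
    # table[length][i] = set of nonterminals deriving word[i:i+length]
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, ch in enumerate(word):                 # length-1 substrings
        table[1][i] = set(LEXICAL.get(ch, ()))
    for length in range(2, n + 1):                # longer substrings
        for i in range(n - length + 1):
            for split in range(1, length):        # every way to cut in two
                for left in table[split][i]:
                    for right in table[length - split][i + split]:
                        table[length][i] |= BINARY.get((left, right), set())
    return start in table[n][0]

assert cyk("aabb") and not cyk("aab")
```

The triply nested loop over lengths, start positions, and split points is where the cubic bound comes from, which matches the book's placement of CYK among the polynomial-time general methods.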
The authors do a surprisingly good job at that, leaving few ambiguities while not bogging down the reader with unnecessary details. This makes the book suitable for gentle nighttime reading, without the need for paper and pencil at hand. The material is presented in sufficient detail for a basic implementation of most of the presented algorithms, and there are plenty of references to consult for extensions.

I really liked the first half of the book. It was clear and easy to understand. However, things do speed up considerably in the second half. Explanations become terser, there are fewer details, and sometimes it takes several re-readings of a section to fully absorb the material. I think the authors wanted to cram in more material towards the end and had to make some sacrifices in presentation.

There are some additional things that I think could have been done better. Sometimes the authors use terms that haven't been fully defined yet and only define them a few sections later. This makes the book a little hard to read, since you're constantly wondering whether you have somehow missed the definition earlier. I also wish that more connections to graph theory were presented. For example, there are lots of algorithms that could be viewed as a graph search. The authors do allude to that but never state it explicitly. I wish they had stated clearly what the nodes and edges of those graphs are.

Some of their examples are too simplistic and may lead the reader to believe that the algorithms are simpler than they really are. For example, the authors spend only two paragraphs on parse tree construction from the Earley parser, and their example makes it look like the algorithm is deterministic. In reality, tree construction for the Earley parser requires backtracking, and I had to consult outside sources to understand that. In addition, I wish the authors had spent more time giving examples of the LALR(1) parser.
This is a very widely used method (it's the one used in Unix's yacc), but its limitations are not well understood by beginners. It's difficult for someone new to it to understand why their grammars produce conflicts and are not parsable with LALR(1), and I wish the authors had spent more time explaining how to detect those conflicts in various grammars.

Overall, I think this is a great overview of the parsing landscape, with enough detail to understand, use, and even implement some of the methods. Every serious programmer should be aware that these techniques and algorithms are available to them, and be ready to at least consider them as a possibility the next time they're thinking of writing a parser by hand. Should the reader be interested in implementing their own parsing algorithm, there is a whole chapter devoted to the bibliography, with a short description of each reference. Equipped with the background provided by this book, the reader can go on to explore the various books and papers on parsing and should have no trouble understanding their content.
R**V
Absolutely recommend it, easier than any other LR/LALR explanation
I will talk only about the coverage of the LR/LALR algorithms - this is my personal focus. I'm the creator of the Irony parser kit, and this book was my guide in implementing efficient LALR automaton construction. The Dragon book is quite dated today; there has been much new research and advancement in parsing algorithms since Dragon times, like the DeRemer/Penello algorithm and its variations, which is much more efficient than the original method described in the Dragon book. I first tried to understand DeRemer/Penello by reading academic papers, but did not quite succeed. Then I discovered this book - it was my savior! The algorithm is carefully explained over several pages, step by step, using clear and helpful diagrams. I carefully reproduced the algorithm in code, and got an almost 10-fold performance increase compared to the old Dragon method.

Note that the book covers many other algorithms and parser types, and I'm sure readers interested in them will find excellent coverage. This is one of the nice things about this book - you can pick up the chapter covering your particular area and read it, without needing to read the entire book from start to end. It works as an excellent detailed reference on a wide variety of parsing techniques.
J**N
The clearest, most comprehensive survey of the field
I have spent the last six months of my life learning as much as I can about parsing. I own half a shelf of compiler books, and I have flipped through the pages of half a shelf more. No other book approaches the clarity and comprehensiveness of this one.

When you try to read most literature about parsing, authors tend to throw around a lot of terms without explaining them. What exactly is a "deterministic" parser, a "canonical" parser, a "directional" parser? Grune and Jacobs explain every one of these distinctions lucidly, and put each known algorithm in the context of how it compares to the rest of the field. How do the algorithms compare in what languages they can parse, how fast they are, and how much of the work can be done ahead of time? The book addresses all of these trade-offs, but doesn't stop at asymptotic complexity: in chapter 17 (the comparative survey), they note that general parsers may be a factor of ten or so slower than deterministic methods, even though both are linear. This high-level overview and comparative survey were something I was desperately seeking, and I've found nothing comparable to them anywhere.

There is also a lot of important background information that other authors tend to assume you know. For example, did you know that when authors say "LL" they almost always mean "strong LL" unless they specifically say "full LL"? Are you totally clear on the difference between strong LL, simple LL, and full LL? If you're not sure, Grune and Jacobs will give you all the explanation you need to fully understand.

This book strikes a perfect balance between breadth and depth. All significant algorithms are covered, most with enough detail to fully understand and implement them, but Grune and Jacobs punt on less practical material like proofs and rigorous formal descriptions.
That information is never more than a citation away, though, thanks to the 417-entry annotated bibliography, which gives you not only references to the source material but also a paragraph or two describing their key results. I couldn't be happier about adding this book to my shelf of compiler books -- it quickly became the book I refer to most often, and I thank Grune and Jacobs for this superb guide to a vast and diverse field of computer science.
B**L
Five Stars
Thanks!
S**G
Great textbook but out of date
After reading this book I assumed that handling left recursion in recursive-descent/PEG parsers is impossible. In fact, many different techniques exist, and many papers have been written on the subject. Lots of papers have been written on PEG parsers too, but this book only briefly mentions them.
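For context, the most common workaround is to eliminate the left recursion from the grammar before writing the recursive-descent parser. A minimal sketch of my own (not from the book), using a made-up one-operator grammar:

```python
# The left-recursive rule  E -> E "+" NUM | NUM  makes a naive
# recursive-descent parser call E() before consuming any input,
# so it recurses forever. Rewriting it to the equivalent
#   E -> NUM ("+" NUM)*
# removes the left recursion; a loop restores left-associativity.

def parse_expr(tokens, pos=0):
    """Parse E -> NUM ("+" NUM)*; return (value, next_pos),
    folding the additions left-to-right as the loop advances."""
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise SyntaxError("expected a number")
    value, pos = int(tokens[pos]), pos + 1
    while pos < len(tokens) and tokens[pos] == "+":
        if pos + 1 >= len(tokens) or not tokens[pos + 1].isdigit():
            raise SyntaxError("expected a number after '+'")
        value, pos = value + int(tokens[pos + 1]), pos + 2
    return value, pos

assert parse_expr("1 + 2 + 3".split()) == (6, 5)
```

The reviewer's point stands, though: beyond this rewriting trick there are techniques that handle left recursion directly in PEG and recursive-descent parsers, which the book does not cover.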
A**O
Very good practical book
This is a practical book. All subjects are well explained. It covers almost everything it is supposed to cover about parsing techniques. This is not for you if you are searching for something theoretical, with formal proofs.