<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@Reading_4_Objectives">
<h2 class="hd hd-2 unit-title">Reading 4 Objectives</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_7f24ae660515">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_7f24ae660515">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h4>Software in 6.005</h4>
<table class="table table-striped no-markdown">
<tbody>
<tr>
<th width="33%">Safe from bugs</th>
<th>Easy to understand</th>
<th>Ready for change</th>
</tr>
<tr>
<td>
Correct today and correct in the unknown future.
</td>
<td>
Communicating clearly with future programmers, including future you.
</td>
<td>
Designed to accommodate change without rewriting.
</td>
</tr>
</tbody>
</table>
<h4>Objectives</h4>
<p>After today's class, you should:</p>
<ul>
<li>Be able to use a grammar in combination with a parsing library to parse a character sequence into a parse tree</li>
<li>Be able to convert a parse tree into a useful data type</li>
</ul>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@vertical-parser-generators">
<h2 class="hd hd-2 unit-title">Parser Generators</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_5fcbddbfb798">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_5fcbddbfb798">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h2 id="parser_generators">Parsers</h2>
<div data-outline="parser_generators">
<p>A <em>parser generator</em> is a good tool that you should make part of your toolbox. A parser generator takes a grammar as input and automatically generates source code that can parse streams of characters using the grammar. </p>
<p>The generated code is a <em>parser</em>, which takes a sequence of characters and tries to match the sequence against the grammar. The parser typically produces a <em>parse tree</em>, which shows how grammar productions are expanded into a sentence that matches the character sequence. The root of the parse tree is the starting nonterminal of the grammar. Each node of the parse tree expands into one production of the grammar. We'll see how a parse tree actually looks in the next section.</p>
<p>The final step of parsing is to do something useful with this parse tree. We're going to translate it into a value of a recursive data type. Recursive abstract data types are often used to represent an expression in a language, like HTML, or Markdown, or Java, or algebraic expressions. A recursive abstract data type that represents a language expression is called an <em>abstract syntax tree</em> (AST).</p>
<p>For this class, we are going to use ParserLib, a parsing library for Java that we have developed specifically for 6.005. The library is similar in spirit to more widely used parser generators like <a href="http://www.antlr.org/">ANTLR</a>, but it has a simpler interface and is generally easier to use.</p>
</div>
<h2 id="an_antlr_grammar">A ParserLib Grammar</h2>
<div>
<p>Here is what our HTML grammar looks like as a ParserLib source file:</p><div class="pull-margin"><pre><code class="language-python hljs">root ::= html;
html ::= ( italic | normal ) *;
italic ::= <span class="hljs-string">'<i>'</span> html <span class="hljs-string">'</i>'</span>;
normal ::= text;
text ::= [^<>]+; /* represents a string of one <span class="hljs-keyword">or</span> more characters that are <span class="hljs-keyword">not</span> < <span class="hljs-keyword">or</span> > */</code></pre></div><p>Let’s break it down. </p><p>Each ParserLib rule consists of a name, followed by a <code>::=</code>, followed by its definition, terminated by a semicolon. The ParserLib grammar can also include Java-style comments, both single line and multiline.</p><p>By convention, we use lower-case for nonterminals: <code>root</code>, <code>html</code>, <code>normal</code>, <code>italic</code>, <code>text</code>.
The ParserLib library is actually case insensitive with respect to nonterminal names; internally, it canonicalizes
names to all-lowercase, so even if you don’t write all your names into lowercase, you will see them as
lowercase when you print your grammar.
Terminals are either quoted strings, like <code>'<i>'</code>, or regular expressions, like <code>[^<>]+</code>.</p><pre><code class="language-python hljs">root ::= html;</code></pre><p><code>root</code> is the entry point of the grammar. This is the nonterminal that the whole input needs to match. We don’t have to call it <code>root</code>. When loading the grammar into our program, we will tell the library which nonterminal to use as the entry point.</p><pre><code class="language-python hljs">html ::= ( normal | italic ) *;</code></pre><p>This rule shows that ParserLib rules can have the alternation operator <code>|</code>, the repetition operators <code>*</code> and <code>+</code>, and parentheses for grouping, in the same way we’ve been using in the <a href="/courses/course-v1:MITx+6.005.2x+1T2017/jump_to_id/vertical-grammars">grammars reading</a>. Optional parts can be marked with <code>?</code>, just like we did earlier, but this particular grammar doesn’t use <code>?</code>.</p><pre><code class="language-python hljs">italic ::= <span class="hljs-string">'<i>'</span> html <span class="hljs-string">'</i>'</span>;
normal ::= text;
text ::= [^<>]+;</code></pre><p>Note that the <code>text</code> production uses the notation <code>[^<>]</code> from the last reading, to represent all characters except <code><</code> and <code>></code>.</p>
<p>In general, terminals do not have to be a fixed string; they can be a regular expression as in the example. For example, here are some other terminal patterns we used in the URL grammar earlier in the reading, now written in ParserLib syntax:</p><pre><code class="language-python hljs">identifier ::= [a-z]+;
integer ::= [<span class="hljs-number">0</span><span class="hljs-number">-9</span>]+;</code></pre>
<h2 id="whitespace">Whitespace</h2>
<div data-outline="whitespace"><p>Consider the grammar shown below.</p><pre><code class="language-python hljs">root ::= sum;
sum ::= primitive (<span class="hljs-string">'+'</span> primitive)*;
primitive ::= number | <span class="hljs-string">'('</span> sum <span class="hljs-string">')'</span>;
number ::= [<span class="hljs-number">0</span><span class="hljs-number">-9</span>]+;</code></pre><p>This grammar will accept an expression like <code>42+2+5</code>, but will reject a similar expression that has any spaces between the numbers and the <code>+</code> signs. We could modify the grammar to allow white space around the plus sign by modifying the production rule for <code>sum</code> like this:</p><pre><code class="language-python hljs">sum ::= primitive (whitespace* <span class="hljs-string">'+'</span> whitespace* primitive)*;
whitespace ::= [ \t\r\n];</code></pre><p>However, this can become cumbersome very quickly once the grammar becomes more complicated. ParserLib allows a shorthand to indicate that certain kinds of characters should be skipped. </p><pre><code class="language-python hljs">//The IntegerExpression grammar
<span class="hljs-meta">@skip whitespace{</span>
root ::= sum;
sum ::= primitive (<span class="hljs-string">'+'</span> primitive)*;
primitive ::= number | <span class="hljs-string">'('</span> sum <span class="hljs-string">')'</span>;
}
whitespace ::= [ \t\r\n];
number ::= [<span class="hljs-number">0</span><span class="hljs-number">-9</span>]+;</code></pre><p>The <code>@skip whitespace</code> notation indicates that any text matching the whitespace nonterminal should be skipped in between the parts that make up the definitions of <code>sum</code> <code>root</code> and <code>primitive</code>. Two things are important to note. First, there is nothing special about <code>whitespace</code>. The <code>@skip</code> directive works with any nonterminal defined in the grammar. Second, note how the definition of <code>number</code> was intentionally left outside the <code>@skip</code> block. This is because we want to accept expressions like <code>42 + 2 + 5</code>, but we want to reject expressions like <code>4 2 + 2 + 5</code>. In the rest of the text, we refer to this grammar as the <em>IntegerExpression</em> grammar.</p></div>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@Generating_the_parser">
<h2 class="hd hd-2 unit-title">Generating the parser</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_a612d6aa44a1">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_a612d6aa44a1">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h2 id="generating_the_parser">Generating the parser</h2>
<div data-outline="generating_the_parser"><p>The rest of this reading will use as a running example the <em>IntegerExpression</em> grammar defined earlier, which we’ll store in a file called <code>IntegerExpression.g</code>. </p>
<p>The <a href="/assets/courseware/v1/1ef434a83e589f315a433298cef627c0/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/parserlib-api.pdf">ParserLib parser generator tool</a> converts a grammar source file like <code>IntegerExpression.g</code> into a parser. In order to do this, you need to follow three steps. First, you need to import the ParserLib library, which resides in a package <code>lib6005.parser</code>:</p><pre><code class="language-java hljs"><span class="hljs-keyword">import</span> lib6005.parser;</code></pre><p>The second step is to define an <code>Enum</code> type that contains all the terminals and non-terminals used by your grammar. This will tell the compiler which definitions to expect in the grammar and will allow it to check for any missing ones.</p><pre><code class="language-java hljs"><span class="hljs-keyword">enum</span> IntegerGrammar {ROOT, SUM, PRIMITIVE, NUMBER, WHITESPACE};</code></pre><p>Note that ParserLib itself is case insensitive, but by convention, the names of <code>enum</code> values are all upper case. </p><p>From within your code, you can create a parser by calling the <code>compile</code> static method in <code>GrammarCompiler</code>.</p><pre><code class="language-java hljs">...
Parser<IntegerGrammar> parser =
GrammarCompiler.compile(<span class="hljs-keyword">new</span> File(<span class="hljs-string">"IntegerExpression.g"</span>), IntegerGrammar.ROOT);</code></pre><p>This code opens the file <code>IntegerExpression.g</code> and compiles it using the <code>GrammarCompiler</code> into a <code>Parser</code> object. The <code>compile</code> method takes as a second argument the name of the nonterminal to use as the entry point of the grammar; <code>root</code> in the case of this example.</p><p>Assuming you don’t have any syntax errors in your grammar file, the result will be a <code>Parser</code> object that can be used to parse text in either a string or a file. Notice that the Parser is a <em>generic</em> type that is parameterized by the <code>enum</code> you defined earlier.</p></div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@vertical_Questions_42478e77160d">
<h2 class="hd hd-2 unit-title">Questions</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@Parsing">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@Parsing">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<script src="/assets/courseware/v1/c9cee8830885f59e75b8782f44026226/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/edx-script-v1.js"></script>
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-parsing-50">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-has-score="True" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="problem" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-parsing-50">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_04-parsing-50" class="problems-wrapper" role="group"
aria-labelledby="04-parsing-50-problem-title"
data-problem-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-parsing-50" data-url="/courses/course-v1:MITx+6.005.2x+1T2017/xblock/block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-parsing-50/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="04-parsing-50-problem-title" aria-describedby="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-parsing-50-problem-progress" tabindex="-1">
Parse trees
</h3>
<div class="problem-progress" id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-parsing-50-problem-progress"></div>
<div class="problem">
<div>
<p>Which of the following statements are true of a parse tree?</p>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_04-parsing-50_2_1">
<fieldset aria-describedby="status_04-parsing-50_2_1">
<div class="field">
<input type="checkbox" name="input_04-parsing-50_2_1[]" id="input_04-parsing-50_2_1_choice_0" class="field-input input-checkbox" value="choice_0"/><label id="04-parsing-50_2_1-choice_0-label" for="input_04-parsing-50_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_04-parsing-50_2_1"> the root node of the tree corresponds to the starting symbol of the grammar
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-parsing-50_2_1[]" id="input_04-parsing-50_2_1_choice_1" class="field-input input-checkbox" value="choice_1"/><label id="04-parsing-50_2_1-choice_1-label" for="input_04-parsing-50_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_04-parsing-50_2_1"> the leaves of the tree correspond to terminals
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-parsing-50_2_1[]" id="input_04-parsing-50_2_1_choice_2" class="field-input input-checkbox" value="choice_2"/><label id="04-parsing-50_2_1-choice_2-label" for="input_04-parsing-50_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_04-parsing-50_2_1"> the internal nodes of the tree correspond to nonterminals
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-parsing-50_2_1[]" id="input_04-parsing-50_2_1_choice_3" class="field-input input-checkbox" value="choice_3"/><label id="04-parsing-50_2_1-choice_3-label" for="input_04-parsing-50_2_1_choice_3" class="response-label field-label label-inline" aria-describedby="status_04-parsing-50_2_1"> only a grammar with recursive productions can generate a parse tree
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-parsing-50_2_1[]" id="input_04-parsing-50_2_1_choice_4" class="field-input input-checkbox" value="choice_4"/><label id="04-parsing-50_2_1-choice_4-label" for="input_04-parsing-50_2_1_choice_4" class="response-label field-label label-inline" aria-describedby="status_04-parsing-50_2_1"> a parse tree is the same as an abstract syntax tree
</label>
</div>
<span id="answer_04-parsing-50_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_04-parsing-50_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
<div class="solution-span">
<span id="solution_04-parsing-50_solution_1"/>
</div></div>
<div class="action">
<input type="hidden" name="problem_id" value="Parse trees" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_04-parsing-50" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_04-parsing-50">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-parsing-50-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-parsing-50-problem-title">None
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-parsing-50-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="True">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@Calling_the_parser">
<h2 class="hd hd-2 unit-title">Calling the parser</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_df8b7d6f7f91">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_df8b7d6f7f91">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h2 id="calling_the_parser">Calling the parser</h2>
<div data-outline="calling_the_parser"><p>Now that you’ve generated the parser object, you are ready to parse your own text. The parser has a method called <code>parse</code> that takes in the text to be parsed (in the form of either a <code>String</code>, an <code>InputStream</code>, a <code>File</code> or a <code>Reader</code>) and returns a <code>ParseTree</code>.
Calling it produces a parse tree:</p><pre><code class="language-java hljs">ParseTree<IntegerGrammar> tree = parser.parse(<span class="hljs-string">"5+2+3+21"</span>);</code></pre><p>Note that the <code>ParseTree</code> is also a generic type that is parameterized by the enum type <code>IntegerGrammar</code>. </p><p>For debugging, we can then print this tree out:</p><pre><code class="language-java hljs">System.out.println(tree.toString());</code></pre><p>You can also try calling the method <code>display()</code> which will attempt to open a browser window that will show you
a visualization of your parse tree. If for any reason it is not able to open the browser window, the method
will print a URL to the terminal which you can copy and paste to your browser to view the visualization.</p></div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@Constructing_an_abstract_syntax_tree">
<h2 class="hd hd-2 unit-title">Constructing an abstract syntax tree</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_e1f62858a40f">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_e1f62858a40f">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h2 id="traversing_the_parse_tree">Traversing the parse tree</h2>
<div data-outline="traversing_the_parse_tree"><p>So we’ve used the parser to turn a stream of characters into a parse tree, which shows how the grammar matches the stream.
Now we need to do something with this parse tree.
We’re going to translate it into a value of a recursive abstract data type.</p><p>The first step is to learn how to traverse the parse tree. The <code>ParseTree</code> object has four methods that you need to be most familiar with. </p><pre><code class="language-java hljs"><span class="hljs-comment handout-javadoc-comment">/**
* Returns the substring of the original string that corresponds to this parse tree.
* <span class="hljs-doctag">@return</span> String containing the contents of this parse tree.
*/</span>
<span class="hljs-function"><span class="hljs-keyword">public</span> String <span class="hljs-title">getContents</span><span class="hljs-params">()</span>
<span class="hljs-comment handout-javadoc-comment">/**
* Ordered list of all the children nodes of this ParseTree node.
* @return a List of all children of this ParseTree node, ordered by position in input
*/</span>
<span class="hljs-keyword">public</span> List<ParseTree<Symbols>> <span class="hljs-title">children</span><span class="hljs-params">()</span>
<span class="hljs-comment handout-javadoc-comment">/**
* Tells you whether a node corresponds to a terminal or a non-terminal.
* If it is terminal, it won't have any children.
* @return true if it is a terminal value.
*/</span>
<span class="hljs-keyword">public</span> <span class="hljs-keyword">boolean</span> <span class="hljs-title">isTerminal</span><span class="hljs-params">()</span>
<span class="hljs-comment handout-javadoc-comment">/**
* Get the symbol for the terminal or non-terminal corresponding to this parse tree.
* @return T will generally be an Enum representing the different symbols
* in the grammar, so the return value will be one of those.
*/</span>
<span class="hljs-keyword">public</span> Symbols <span class="hljs-title">getName</span><span class="hljs-params">()</span>
</span></code></pre><p>Additionally, you can query the ParseTree for all children that match a particular production rule:</p><pre><code class="language-java hljs"><span class="hljs-comment handout-javadoc-comment">/**
* Get all the children of this ParseTree node corresponding to a particular production rule
* <span class="hljs-doctag">@param</span> name
* Name of the non-terminal corresponding to the desired production rule.
* <span class="hljs-doctag">@return</span>
* List of children ParseTree objects that match that name.
*/</span>
<span class="hljs-keyword">public</span> List <ParseTree<Symbols>> childrenByName(Symbols name);</code></pre><p>Note that like the <code>Parser</code> itself, the <code>ParseTree</code> is also parameterized by the type of the <code>Symbols</code>, which is expected to
be an <code>enum</code> type that lists all the symbols in the grammar.</p><p>The <code>ParseTree</code> implements the iterable inerface, so you can iterate over all the children using a <code>for</code> loop.
One way to visit all the nodes in a parse tree is to write a recursive function. For example, the recursive
function below prints all nodes in the parse tree with proper indentation.</p><pre><code class="language-java hljs"><span class="hljs-comment handout-javadoc-comment">/**
* Traverse a parse tree, indenting to make it easier to read.
* <span class="hljs-doctag">@param</span> node
* Parse tree to print.
* <span class="hljs-doctag">@param</span> indent
* Indentation to use.
*/</span>
<span class="hljs-function"><span class="hljs-keyword">void</span> <span class="hljs-title">visitAll</span><span class="hljs-params">(ParseTree<IntegerGrammar> node, String indent)</span></span>{
<span class="hljs-keyword">if</span>(node.isTerminal()){
System.out.println(indent + node.getName() + <span class="hljs-string">":"</span> + node.getContents());
}<span class="hljs-keyword">else</span>{
System.out.println(indent + node.getName());
<span class="hljs-keyword">for</span>(ParseTree<IntegerGrammar> child: node){
visitAll(child, indent + <span class="hljs-string">" "</span>);
}
}
}</code></pre></div>
<h2 id="constructing_an_abstract_syntax_tree">Constructing an abstract syntax tree</h2>
<div data-outline="constructing_an_abstract_syntax_tree"><p>We need to convert the parse tree into a recursive data type.
Here’s the definition of the recursive data type that we’re going to use to represent integer arithmetic expressions:</p><pre><code class="language-python hljs">IntegerExpression = Number(n:int)
+ Plus(left:IntegerExpression, right:IntegerExpression)</code></pre><p>If this syntax is mysterious, review <a href="/courses/course-v1:MITx+6.005.2x+1T2017/jump_to_id/vertical-recursive_datatype_definitions">recursive data type definitions</a>.</p><p>When a recursive data type represents a language this way, it is often called an <em>abstract syntax tree</em>. An <code>IntegerExpression</code> value captures the important features of the expression – its grouping and the integers in it – while omitting unnecessary details of the sequence of characters that created it.</p><p>By contrast, the parse tree that we just generated with the <em>IntegerExpression</em> parser is a <em>concrete syntax tree</em>. It’s called concrete, rather than abstract, because it contains more details about how the expression is represented in actual characters. For example, the strings <code>2+2</code>, <code>((2)+(2))</code>, and <code>0002+0002</code> would each produce a different concrete syntax tree, but these trees would all correspond to the same abstract <code>IntegerExpression</code> value: <code>Plus(Number(2), Number(2))</code>.</p><p>Now, we can create a simple recursive function that walks the <code>ParseTree</code> to produce an <code>IntegerExpression</code> as follows.</p><p>Here’s the code:</p>
<pre><code class="language-java hljs"><span class="hljs-comment handout-javadoc-comment">/**
* Function converts a ParseTree to an IntegerExpression.
* <span class="hljs-doctag">@param</span> p
* ParseTree<IntegerGrammar> that is assumed to have been constructed by the grammar in IntegerExpression.g
* <span class="hljs-doctag">@return</span>
*/</span>
<span class="hljs-function">IntegerExpression <span class="hljs-title">buildAST</span><span class="hljs-params">(ParseTree<IntegerGrammar> p)</span></span>{
<span class="hljs-keyword">switch</span>(p.getName()){
<span class="hljs-comment">/*
* Since p is a ParseTree parameterized by the type IntegerGrammar, p.getName()
* returns an instance of the IntegerGrammar enum. This allows the compiler to check
* that we have covered all the cases.
*/</span>
<span class="hljs-keyword">case</span> NUMBER:
<span class="hljs-comment">/*
* A number will be a terminal containing a number.
*/</span>
<span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> Number(Integer.parseInt(p.getContents()));
<span class="hljs-keyword">case</span> PRIMITIVE:
<span class="hljs-comment">/*
* A primitive will have either a number or a sum as child (in addition to some whitespace)
* By checking which one, we can determine which case we are in.
*/</span>
<span class="hljs-keyword">if</span>(p.childrenByName(IntegerGrammar.number).isEmpty()){
<span class="hljs-keyword">return</span> buildAST(p.childrenByName(IntegerGrammar.sum).get(<span class="hljs-number">0</span>));
}<span class="hljs-keyword">else</span>{
<span class="hljs-keyword">return</span> buildAST(p.childrenByName(IntegerGrammar.number).get(<span class="hljs-number">0</span>));
}
<span class="hljs-keyword">case</span> SUM:
<span class="hljs-comment">/*
* A sum will have one or more children that need to be summed together.
* Note that we only care about the children that are primitive. There may also be
* some whitespace children which we want to ignore.
*/</span>
<span class="hljs-keyword">boolean</span> first = <span class="hljs-keyword">true</span>;
IntegerExpression result = <span class="hljs-keyword">null</span>;
<span class="hljs-keyword">for</span>(ParseTree<IntegerGrammar> child : p.childrenByName(IntegerGrammar.PRIMITIVE)){
<span class="hljs-keyword">if</span>(first){
result = buildAST(child);
first = <span class="hljs-keyword">false</span>;
}<span class="hljs-keyword">else</span>{
result = <span class="hljs-keyword">new</span> Plus(result, buildAST(child));
}
}
<span class="hljs-keyword">if</span> (first) {
<span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(<span class="hljs-string">"sum must have a non whitespace child:"</span> + p);
}
<span class="hljs-keyword">return</span> result;
<span class="hljs-keyword">case</span> ROOT:
<span class="hljs-comment">/*
* The root has a single sum child, in addition to having potentially some whitespace.
*/</span>
<span class="hljs-keyword">return</span> buildAST(p.childrenByName(IntegerGrammar.sum).get(<span class="hljs-number">0</span>));
<span class="hljs-keyword">case</span> WHITESPACE:
<span class="hljs-comment">/*
* Since we are always avoiding calling buildAST with whitespace,
* the code should never make it here.
*/</span>
<span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(<span class="hljs-string">"You should never reach here:"</span> + p);
}
<span class="hljs-comment">/*
* The compiler should be smart enough to tell that this code is unreachable, but it isn't.
*/</span>
<span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> RuntimeException(<span class="hljs-string">"You should never reach here:"</span> + p);
}</code></pre></div><p>The function is quite simple, and very much follows the structure of the grammar. An important thing to note is that there is a very strong assumption that the code will process a <code>ParseTree</code> that
corresponds to the grammar in <code>IntegerExpression.g</code>. If you feed it a different kind of ParseTree, the code will likely fail with a <code>RuntimeException</code>, but it will always terminate and will never return a null reference.</p>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@vertical_Questions_a73bd3d0c93f">
<h2 class="hd hd-2 unit-title">Questions</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@Parsing">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@Parsing">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<script src="/assets/courseware/v1/c9cee8830885f59e75b8782f44026226/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/edx-script-v1.js"></script>
</div>
</div>
<div class="vert vert-1" data-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-1">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-has-score="True" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="problem" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-1">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_04-ast-1" class="problems-wrapper" role="group"
aria-labelledby="04-ast-1-problem-title"
data-problem-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-1" data-url="/courses/course-v1:MITx+6.005.2x+1T2017/xblock/block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-1/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="04-ast-1-problem-title" aria-describedby="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-1-problem-progress" tabindex="-1">
String to AST 1
</h3>
<div class="problem-progress" id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-1-problem-progress"></div>
<div class="problem">
<div>
<p>If the input string is "19+23+18", which abstract syntax tree would be produced by <code>buildAST</code> above?</p>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_04-ast-1_2_1">
<fieldset aria-describedby="status_04-ast-1_2_1">
<div class="field">
<input type="checkbox" name="input_04-ast-1_2_1[]" id="input_04-ast-1_2_1_choice_0" class="field-input input-checkbox" value="choice_0"/><label id="04-ast-1_2_1-choice_0-label" for="input_04-ast-1_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_04-ast-1_2_1"> Plus(Number(19))
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-1_2_1[]" id="input_04-ast-1_2_1_choice_1" class="field-input input-checkbox" value="choice_1"/><label id="04-ast-1_2_1-choice_1-label" for="input_04-ast-1_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_04-ast-1_2_1"> Plus(19, 23, 18)
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-1_2_1[]" id="input_04-ast-1_2_1_choice_2" class="field-input input-checkbox" value="choice_2"/><label id="04-ast-1_2_1-choice_2-label" for="input_04-ast-1_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_04-ast-1_2_1"> Plus(Plus(19, 23), 18)
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-1_2_1[]" id="input_04-ast-1_2_1_choice_3" class="field-input input-checkbox" value="choice_3"/><label id="04-ast-1_2_1-choice_3-label" for="input_04-ast-1_2_1_choice_3" class="response-label field-label label-inline" aria-describedby="status_04-ast-1_2_1"> Plus(Plus(Number(19), Number(23)), Number(18))
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-1_2_1[]" id="input_04-ast-1_2_1_choice_4" class="field-input input-checkbox" value="choice_4"/><label id="04-ast-1_2_1-choice_4-label" for="input_04-ast-1_2_1_choice_4" class="response-label field-label label-inline" aria-describedby="status_04-ast-1_2_1"> Plus(Number(19), Plus(Number(23), Number(18)))
</label>
</div>
<span id="answer_04-ast-1_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_04-ast-1_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
<div class="solution-span">
<span id="solution_04-ast-1_solution_1"/>
</div></div>
<div class="action">
<input type="hidden" name="problem_id" value="String to AST 1" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_04-ast-1" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_04-ast-1">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-ast-1-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-ast-1-problem-title">None
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-ast-1-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="True">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
<div class="vert vert-2" data-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-2">
<div class="xblock xblock-public_view xblock-public_view-problem xmodule_display xmodule_ProblemBlock" data-has-score="True" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="problem" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-2">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "Problem"}
</script>
<div id="problem_04-ast-2" class="problems-wrapper" role="group"
aria-labelledby="04-ast-2-problem-title"
data-problem-id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-2" data-url="/courses/course-v1:MITx+6.005.2x+1T2017/xblock/block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-2/handler/xmodule_handler"
data-problem-score="0"
data-problem-total-possible="1"
data-attempts-used="0"
data-content="
<h3 class="hd hd-3 problem-header" id="04-ast-2-problem-title" aria-describedby="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-2-problem-progress" tabindex="-1">
String to AST 2
</h3>
<div class="problem-progress" id="block-v1:MITx+6.005.2x+1T2017+type@problem+block@04-ast-2-problem-progress"></div>
<div class="problem">
<div>
<p>Which of the following input strings would produce:</p>
<pre>
Plus(Plus(Number(1), Number(2)),
Plus(Number(3), Number(4)))
</pre>
<div class="wrapper-problem-response" tabindex="-1" aria-label="Question 1" role="group"><div class="choicegroup capa_inputtype" id="inputtype_04-ast-2_2_1">
<fieldset aria-describedby="status_04-ast-2_2_1">
<div class="field">
<input type="checkbox" name="input_04-ast-2_2_1[]" id="input_04-ast-2_2_1_choice_0" class="field-input input-checkbox" value="choice_0"/><label id="04-ast-2_2_1-choice_0-label" for="input_04-ast-2_2_1_choice_0" class="response-label field-label label-inline" aria-describedby="status_04-ast-2_2_1"> "(1+2)+(3+4)"
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-2_2_1[]" id="input_04-ast-2_2_1_choice_1" class="field-input input-checkbox" value="choice_1"/><label id="04-ast-2_2_1-choice_1-label" for="input_04-ast-2_2_1_choice_1" class="response-label field-label label-inline" aria-describedby="status_04-ast-2_2_1"> "1+2+3+4"
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-2_2_1[]" id="input_04-ast-2_2_1_choice_2" class="field-input input-checkbox" value="choice_2"/><label id="04-ast-2_2_1-choice_2-label" for="input_04-ast-2_2_1_choice_2" class="response-label field-label label-inline" aria-describedby="status_04-ast-2_2_1"> "(1+2)+3+4"
</label>
</div>
<div class="field">
<input type="checkbox" name="input_04-ast-2_2_1[]" id="input_04-ast-2_2_1_choice_3" class="field-input input-checkbox" value="choice_3"/><label id="04-ast-2_2_1-choice_3-label" for="input_04-ast-2_2_1_choice_3" class="response-label field-label label-inline" aria-describedby="status_04-ast-2_2_1"> "(((1+2)))+(3+4)"
</label>
</div>
<span id="answer_04-ast-2_2_1"/>
</fieldset>
<div class="indicator-container">
<span class="status unanswered" id="status_04-ast-2_2_1" data-tooltip="Not yet answered.">
<span class="sr">unanswered</span><span class="status-icon" aria-hidden="true"/>
</span>
</div>
</div></div>
<div class="solution-span">
<span id="solution_04-ast-2_solution_1"/>
</div></div>
<div class="action">
<input type="hidden" name="problem_id" value="String to AST 2" />
<div class="submit-attempt-container">
<button type="button" class="submit btn-brand" data-submitting="Submitting" data-value="Submit" data-should-enable-submit-button="True" aria-describedby="submission_feedback_04-ast-2" >
<span class="submit-label">Submit</span>
</button>
<div class="submission-feedback" id="submission_feedback_04-ast-2">
<span class="sr">Some problems have options such as save, reset, hints, or show answer. These options follow the Submit button.</span>
</div>
</div>
<div class="problem-action-buttons-wrapper">
</div>
</div>
<div class="notification warning notification-gentle-alert
is-hidden"
tabindex="-1">
<span class="icon fa fa-exclamation-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-ast-2-problem-title">
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification warning notification-save
is-hidden"
tabindex="-1">
<span class="icon fa fa-save" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-ast-2-problem-title">None
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
<div class="notification general notification-show-answer
is-hidden"
tabindex="-1">
<span class="icon fa fa-info-circle" aria-hidden="true"></span>
<span class="notification-message" aria-describedby="04-ast-2-problem-title">Answers are displayed within the problem
</span>
<div class="notification-btn-wrapper">
<button type="button" class="btn btn-default btn-small notification-btn review-btn sr">Review</button>
</div>
</div>
</div>
"
data-graded="True">
<p class="loading-spinner">
<i class="fa fa-spinner fa-pulse fa-2x fa-fw"></i>
<span class="sr">Loading…</span>
</p>
</div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@Handling_Errors">
<h2 class="hd hd-2 unit-title">Handling Errors</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_6bb5b19d6f8d">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_6bb5b19d6f8d">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h2 id="handling_errors">Handling errors</h2>
<div data-outline="handling_errors"><p>Several things can go wrong when parsing a file. </p><ul>
<li>Your grammar file may fail to open.</li>
<li>Your grammar may be syntactically incorrect.</li>
<li>The string you are trying to parse may not be parseable with your given grammar, either because your grammar is incorrect, or because your string is incorrect.</li>
</ul><p>In the first case, the <code>compile</code> method will throw an <code>IOException</code>. In the second case, it will throw an <code>UnableToParseException</code>.
In the third case, the <code>UnableToParseException</code> will be thrown by the <code>parse</code> method. The <code>UnableToParseException</code> exception will contain some information about the possible location of the error, although parse errors are sometimes inherently difficult to localize, since the parser cannot know what string you intended to write, so you may need to search a little to find the true location of the error. </p></div>
<h2 id="left_recursion_and_other_parserlib_limitations">Left recursion and other ParserLib limitations</h2>
<div data-outline="left_recursion_and_other_parserlib_limitations"><p>ParserLib works by generating a top-down <a href="https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-035-computer-language-engineering-spring-2010/lecture-notes/MIT6_035S10_lec04.pdf">Recursive Descent Parser</a>. These kind of parsers have a few limitations in terms of the grammars that they can parse. There are two in particular that are worth pointing out. </p><p><strong>Left recursion.</strong> A recursive descent parser can go into an infinite loop if the grammar involves <a href="https://en.wikipedia.org/wiki/Left_recursion">left recursion</a>. This is a case where a definition for a non-terminal involves that non-terminal as its leftmost symbol. For example, the grammar below includes left recursion because one of the possible definitions of <code>sum</code> is <code>sum '+' number</code> which has <code>sum</code> as its leftmost symbol.</p><pre><code class="language-python hljs">//The IntegerExpression grammar
<span class="hljs-meta">@skip whitespace{</span>
root ::= sum;
sum ::= number | sum <span class="hljs-string">'+'</span> number;
}
whitespace ::= [ \t\r\n];
number ::= [<span class="hljs-number">0</span><span class="hljs-number">-9</span>]+;</code></pre><p>Left recursion can also happen indirectly. For example, changing the grammar above to the one below does not address the problem because the definition of <code>sum</code> still indirectly involves a symbol that has <code>sum</code> as its first symbol. </p><pre><code class="language-python hljs">//The IntegerExpression grammar
<span class="hljs-meta">@skip whitespace{</span>
root ::= sum;
sum ::= number | thing number;
thing ::= sum <span class="hljs-string">'+'</span>;
}
whitespace ::= [ \t\r\n];
number ::= [<span class="hljs-number">0</span><span class="hljs-number">-9</span>]+;</code></pre><p>If you give any of these grammars to ParserLib and then try to use them to parse a symbol, ParserLib will fail with an <code>UnableToParse</code> exception listing the offending non-terminal. </p><p>There are some <a href="https://en.wikipedia.org/wiki/Left_recursion">general techniques</a> to eliminate left recursion; for our purposes, the simplest approach will be to replace left recursion with repetition (<code>*</code>), so the grammar above becomes:</p><pre><code class="language-python hljs">//The IntegerExpression grammar
<span class="hljs-meta">@skip whitespace{</span>
root ::= sum;
sum ::= (number <span class="hljs-string">'+'</span>)* number;
}
whitespace ::= [ \t\r\n];
number ::= [<span class="hljs-number">0</span><span class="hljs-number">-9</span>]+;</code></pre><p><strong>Greediness.</strong> This is not an issue that you will run into in this class, but it is a limitation of ParserLib you should be aware of. The ParserLib parsers are greedy in that at every point they try to match a maximal string for any rule they are currently considering. For example, consider the following grammar.</p><pre><code class="language-python hljs">root ::= ab threeb;
ab ::= <span class="hljs-string">'a'</span>*<span class="hljs-string">'b'</span>*
threeb ::= <span class="hljs-string">'bbb'</span>;</code></pre><p>The string <code>'aaaabbb'</code> is clearly in the grammar, but a greedy parser cannot parse it because it will try to parse a maximal substring that matches the <code>ab</code> symbol, and then it will find that it cannot parse <code>threeb</code> because it has already consumed the entire string. Unlike left recursion, which is easy to fix, this is a more fundamental limitation of the type of parser implemented by ParserLib, but as mentioned before, this is not something you should run into in this class.</p></div>
</div>
</div>
</div>
</div>
<div class="xblock xblock-public_view xblock-public_view-vertical" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="vertical" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="VerticalStudentView" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@vertical+block@Reading_4_Summary">
<h2 class="hd hd-2 unit-title">Reading 4 Summary</h2>
<div class="vert-mod">
<div class="vert vert-0" data-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_9e171e313dba">
<div class="xblock xblock-public_view xblock-public_view-html xmodule_display xmodule_HtmlBlock" data-has-score="False" data-runtime-class="LmsRuntime" data-graded="True" data-request-token="596b9138e99611efbd7112d4917dea95" data-runtime-version="1" data-block-type="html" data-course-id="course-v1:MITx+6.005.2x+1T2017" data-init="XBlockToXModuleShim" data-usage-id="block-v1:MITx+6.005.2x+1T2017+type@html+block@html_9e171e313dba">
<script type="json/xblock-args" class="xblock-json-init-args">
{"xmodule-type": "HTMLModule"}
</script>
<link href="/assets/courseware/v1/b2188f0a95a7a9062748278f7d5a584f/asset-v1:MITx+6.005.2x+1T2017+type@asset+block/syntax-highlighting.css" rel="stylesheet" type="text/css" />
<h2 id="summary">Summary</h2>
<div data-outline="summary">
<p>The topics of today's reading connect to our three properties of good software as follows:</p>
<ul>
<li>
<p><strong>Safe from bugs.</strong> A grammar is a declarative specification for strings and streams, which can be implemented automatically by a parser generator. These specifications are often simpler, more direct, and less likely to be buggy then parsing code written by hand.</p>
</li>
<li>
<p><strong>Easy to understand.</strong> A grammar captures the shape of a sequence in a form that is compact and easier to understand than hand-written parsing code.</p>
</li>
<li>
<p><strong>Ready for change.</strong> A grammar can be easily edited, then run through a parser generator to regenerate the parsing code.</p>
</li>
</ul>
</div>
<div class="license">This reading was collaboratively authored with contributions from: Saman Amarasinghe, Adam Chlipala, Srini Devadas, Michael Ernst, Max Goldman, John Guttag, Daniel Jackson, Rob Miller, Martin Rinard, and Armando Solar-Lezama. This work is licensed under <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>.</div>
</div>
</div>
</div>
</div>