[phc-internals] [phc commit] r1748 - trunk/doc/manual

codesite-noreply at google.com codesite-noreply at google.com
Sun Oct 5 22:04:38 IST 2008


Author: paul.biggar
Date: Sun Oct  5 14:04:07 2008
New Revision: 1748

Modified:
    trunk/doc/manual/treetutorial3.sgml

Log:
convert treetutorial3.


Modified: trunk/doc/manual/treetutorial3.sgml
==============================================================================
--- trunk/doc/manual/treetutorial3.sgml	(original)
+++ trunk/doc/manual/treetutorial3.sgml	Sun Oct  5 14:04:07 2008
@@ -4,16 +4,21 @@
  <section>
  <title></title>

-<para>Now that we have seen in <xref linkend="treetutorial1"> how we can
-traverse the tree, and in <xref linkend="treetutorial2"> how we can modify
-individual nodes in the tree, in this tutorial we will look at modifying the
-structure of the tree itself.</para>	
-
-<para>The transform that we will be considering in this tutorial is one that is
-used in &phc itself. The transform is called <code>Remove_concat_null</code>
-and can be found in <filename>process_ast/Remove_concat_null.h</filename>. The
-purpose of the transform is to remove string concatenation with the empty
-string.  For example, </para>
+<para>
+	Now that we have seen in <xref linkend="treetutorial1"> how we can traverse
+	the tree, and in <xref linkend="treetutorial2"> how we can modify individual
+	nodes in the tree, in this tutorial we will look at modifying the structure
+	of the tree itself.
+</para>	
+
+<para>
+	The transform that we will be considering in this tutorial is one that is
+	used in &phc itself. The transform is called <code>Remove_concat_null</code>
+	and can be found in
+	<filename>src/process_ast/Remove_concat_null.h</filename>. The purpose of
+	the transform is to remove string concatenation with the empty string. For
+	example,
+</para>

  <programlisting>
  &lt;?<reserved>php</reserved>
@@ -21,7 +26,9 @@
  ?&gt;
  </programlisting>

-<para> is translated to </para>
+<para>
+	is translated to
+</para>

  <programlisting>
  &lt;?<reserved>php</reserved>
@@ -29,15 +36,18 @@
  ?&gt;
  </programlisting>

-<para> The reason that this transform is implemented in &phc is due to how the
-&phc parser deals with in-string syntax. For example, if you write </para>
+<para>
+	The reason that this transform is implemented in &phc is due to how the &phc
+	parser deals with in-string syntax. For example, if you write
+</para>

  <programlisting>
  $a = "foo $b bar";
  </programlisting>

-<para> the corresponding tree generated by &phc
-is </para>
+<para>
+	the corresponding tree generated by &phc is
+</para>

  <programlisting>
  $a = "foo " . $b . " bar";
@@ -51,30 +61,38 @@
  $a = "foo $b";
  </programlisting>

-<para> the parser generates </para>
+<para>
+	the parser generates
+</para>

  <programlisting>
  $a = "foo " . $b . "";
  </programlisting>

-<para> Obviously, the second concatenation is unnecessary, and the
-<code>Remove_concat_null</code> transform cleans this up. In this tutorial we
-will explain how this transform can be written. </para>
+<para>
+	Obviously, the second concatenation is unnecessary, and the
+	<code>Remove_concat_null</code> transform cleans this up. In this tutorial
+	we will explain how this transform can be written.
+</para>

  </section>
  <section>

  <title> Introducing the <code>Tree_transform</code> API </title>

-<para> Concatenation is a binary operator, so we are interested in nodes of
-type <code>Bin_op</code>. If you check the grammar or, alternatively,
-<filename>AST.h</filename>, you will find that <code>Bin_op</code> has
-three attributes: a <code>left</code> and a <code>right</code> expression (of
-type <code>Expr</code>) and the operator itself (<code>OP*
-op</code>).  Thus, we are interested in nodes of type <code>Bin_op</code>
-whose <code>op</code> equals the single dot (for string concatenation). </para>
+<para>
+	Concatenation is a binary operator, so we are interested in nodes of type
+	<code>Bin_op</code>. If you check the grammar, or, alternatively,
+	<filename>src/generated/AST.h</filename>, you will find that
+	<code>Bin_op</code> has three attributes: a <code>left</code> and a
+	<code>right</code> expression (of type <code>Expr</code>) and the operator
+	itself (<code>OP* op</code>). Thus, we are interested in nodes of type
+	<code>Bin_op</code> whose <code>op</code> equals the single dot (for string
+	concatenation).
+</para>

-<para> Based on the previous two tutorials, we might try something like this:
+<para>
+	Based on the previous two tutorials, we might try something like this:
  </para>

  <programlisting>
@@ -92,22 +110,25 @@
  }
  </programlisting>

-<para> The problem is, what are we going to do inside the <code>if</code>? Tree
-visitors can only inspect and modify <code>*in</code>; they cannot restructure
-the tree. In particular, we cannot replace <code>*in</code> by a new node. For
-this purpose, &phc offers a separate API, the tree
-<emphasis>transformation</emphasis> API. It looks very similar to the tree
-visitor API, but there are two important differences. First, the
-<code>pre</code> and <code>post</code> methods can modify the structure of the
-tree by returning new nodes. Second, there are no &ldquo;generic&rdquo; methods
-in the tree transform API. So, it is not possible to define a transformation
-that would replace all statements by something else. (It is not clear how that
-would be useful, anyway.) </para>
-
-<para> So, we need to write our transformation using the
-<code>Tree_transform</code> API, defined in
-<filename>Tree_transform.h</filename>. Restructuring the class above
-yields </para>
+<para>
+	The problem is, what are we going to do inside the <code>if</code>? Tree
+	visitors can only inspect and modify <code>*in</code>; they cannot
+	restructure the tree. In particular, we cannot replace <code>*in</code> by a
+	new node. For this purpose, &phc offers a separate API, the tree
+	<emphasis>transformation</emphasis> API. It looks very similar to the tree
+	visitor API, but there are two important differences. First, the
+	<code>pre</code> and <code>post</code> methods can modify the structure of
+	the tree by returning new nodes. Second, there are no &ldquo;generic&rdquo;
+	methods in the tree transform API. So, it is not possible to define a
+	transformation that would replace all statements by something else. (It is
+	not clear how that would be useful, anyway.)
+</para>
+
+<para>
+	So, we need to write our transformation using the
+	<code>Tree_transform</code> API, defined in
+	<filename>AST_transform.h</filename>. Restructuring the class above yields
+</para>

  <programlisting>
  <reserved>class</reserved> Remove_concat_null : <reserved>public</reserved> <boxed>Transform</boxed>
@@ -124,11 +145,13 @@
  }
  </programlisting>
  			
-<para> The differences between the previous version have been highlighted. We
-inherit from a different class, and <code>pre_bin_op</code> now has a return
-value, which is the node that will replace <code>*in</code>. If you check the
-default implementation of <code>pre_bin_op</code> in
-<filename>AST_transform.cpp</filename>, you'll find: </para>
+<para>
+	The differences from the previous version have been highlighted. We
+	inherit from a different class, and <code>pre_bin_op</code> now has a return
+	value, which is the node that will replace <code>*in</code>. If you check
+	the default implementation of <code>pre_bin_op</code> in
+	<filename>AST_transform.cpp</filename>, you'll find:
+</para>

  <programlisting>
  Expr* Transform::pre_bin_op(Bin_op* in)
@@ -137,32 +160,38 @@
  }
  </programlisting>
  			
-<para> The <code>return in;</code> is very important; as we mentioned before,
-the return value of <code>pre_bin_op</code> will replace <code>*in</code> in
-the tree. Therefore, if we don't want to replace <code>*in</code>, or perhaps
-if we want to replace <code>*in</code> only if a particular condition holds, we
-must return <code>in</code>. This will replace <code>*in</code> by
-<code>in</code> itself. </para>
-
-<para> The second thing to note is that the return type of
-<code>pre_bin_op</code> is <code>Expr</code> instead of
-<code>Bin_op</code>. This means that we can replace a binary operator node
-by another other expression node. The <xref linkend="maketeatheory"
-endterm="maketeatheory.title"> explains exactly how the signatures for the
-<code>pre</code> and <code>post</code> methods are derived, but in most cases
-they are what you'd expect.  The easiest way to check is to simply look them up
-in <filename>&lt;AST_transform.h&gt;</filename>. </para>
+<para>
+	The <code>return in;</code> is very important; as we mentioned before, the
+	return value of <code>pre_bin_op</code> will replace <code>*in</code> in the
+	tree. Therefore, if we don't want to replace <code>*in</code>, or perhaps if
+	we want to replace <code>*in</code> only if a particular condition holds, we
+	must return <code>in</code>. This will replace <code>*in</code> by
+	<code>in</code> itself.
+</para>
+
+<para>
+	The second thing to note is that the return type of <code>pre_bin_op</code>
+	is <code>Expr</code> instead of <code>Bin_op</code>. This means that we can
+	replace a binary operator node by any other expression node. The <xref
+	linkend="maketeatheory" endterm="maketeatheory.title"> explains exactly how
+	the signatures for the <code>pre</code> and <code>post</code> methods are
+	derived, but in most cases they are what you'd expect.  The easiest way to
+	check is to simply look them up in
+	<filename>&lt;AST_transform.h&gt;</filename>.
+</para>

  </section>
  <section id="implementation">

  <title>The Implementation</title>

-<para> We wanted to get rid of useless concatenation operators. To be precise,
-if the binary operator is the concatenation operator, and the left operand is
-the empty string, we want to replace the node by the right operand; similarly,
-if the right operand is the empty string, we want to replace the operator by
-its left operand. Here's the full transform: </para>
+<para>
+	We wanted to get rid of useless concatenation operators. To be precise, if
+	the binary operator is the concatenation operator, and the left operand is
+	the empty string, we want to replace the node by the right operand;
+	similarly, if the right operand is the empty string, we want to replace the
+	operator by its left operand. Here's the full transform:
+</para>
  	
  <programlisting>
  <reserved>class</reserved> Remove_concat_null : <reserved>public</reserved> Transform
@@ -170,7 +199,7 @@
  <reserved>public</reserved>:
     Expr* post_bin_op(Bin_op* in)
     {
-      STRING* empty = <reserved>new</reserved> STRING(<reserved>new</reserved> String(""), <reserved>new</reserved> String(""));
+      STRING* empty = <reserved>new</reserved> STRING(<reserved>new</reserved> String(""));
        Wildcard&lt;Expr&gt;* wildcard = <reserved>new</reserved> Wildcard&lt;Expr&gt;;

        <emphasis>// Replace with right operand if left operand is the empty string</emphasis>
@@ -186,45 +215,41 @@
  }
  </programlisting>

-<para> We already explained what <code>match</code> does in <xref
-linkend="treetutorial2">, but we have not yet explained the use of wildcards.
-If you are using a wildcard (<code>WILDCARD</code>) in a pattern passed to
-<code>match</code>, <code>match</code> will not take that subtree into account.
-Thus, </para>
+<para>
+	We already explained what <code>match</code> does in <xref
+	linkend="treetutorial2">, but we have not yet explained the use of
+	wildcards. If you are using a wildcard (<code>WILDCARD</code>) in a pattern
+	passed to <code>match</code>, <code>match</code> will not take that subtree
+	into account. Thus,
+</para>
  	
  <programlisting>
  <reserved>if</reserved>(in-&gt;match(<reserved>new</reserved> Bin_op(empty, WILDCARD, ".")))
  </programlisting>
  			
-<para> can be paraphrased as &ldquo;is <code>in</code> a binary operator with
-the empty string as the left operand and <code>"."</code> as the operator (I
-don't care about the right operand)?&ldquo; If the match succeeded, you can
-find out which expression was matched by the wildcard by accessing
-<code>wildcard->value</code>, although we do not use that particular feature of
-wildcards in this example. </para>
-
-<para> Note that the constructor for <code>STRING</code> has two
-arguments: one corresponds to the value of the string, and one corresponds to
-the representation of the string in the source (see also the explanation of the
-token classes in <xref linkend="treetutorial2">). For most strings, both of
-these values are the same; however, in some cases they are different. For
-example, <code>value</code> might be set to
-<code>&ldquo;/home/joe/myscript.php</code>, while <code>source_rep</code> is
-set to <code>__FILE__</code>. </para>
+<para>
+	can be paraphrased as &ldquo;is <code>in</code> a binary operator with the
+	empty string as the left operand and <code>"."</code> as the operator (I
+	don't care about the right operand)?&rdquo; If the match succeeded, you can
+	find out which expression was matched by the wildcard by accessing
+	<code>wildcard->value</code>.
+</para>

  </section>
  <section>

  <title> Running Transformations </title>

-<para> Recall from the previous two tutorials that visitors are run with a call
-to <code>visit</code>: </para>
+<para>
+	Recall from the previous two tutorials that visitors are run with a call to
+	<code>visit</code>:
+</para>

  <programlisting>
-<reserved>extern</reserved> "C" <reserved>void</reserved> process_ast(PHP_script* php_script)
+<reserved>extern</reserved> "C" <reserved>void</reserved> run_ast (PHP_script* in, Pass_manager* pm, String* option)
  {
      SomeVisitor visitor;
-    php_script-&gt;visit(&amp;visitor);
+    in-&gt;visit(&amp;visitor);
  }
  </programlisting>

@@ -232,15 +257,16 @@
  <code>transform_children</code>: </para>

  <programlisting>
-<reserved>extern</reserved> "C" <reserved>void</reserved> process_ast(PHP_script* php_script)
+<reserved>extern</reserved> "C" <reserved>void</reserved> run_ast (PHP_script* in, Pass_manager* pm, String* option)
  {
      SomeTransform transform;
-    php_script-&gt;transform_children(&amp;transform);
+    in-&gt;transform_children(&amp;transform);
  }
  </programlisting>

-<para> We invoke <code>transform_children</code> because we should not replace
-the top-level node in the AST (the <code>PHP_script</code> node itself).
+<para>
+	We invoke <code>transform_children</code> because we should not replace the
+	top-level node in the AST (the <code>PHP_script</code> node itself).
  </para>

  </section>
@@ -248,47 +274,60 @@

  <title> A Subtlety </title>

-<para> If you don't understand this section right now, don't worry about it;
-you might find it useful to read it again after having gained some experience
-with the transformation API. </para>
-
-<para> We have implemented the transform as a
-<emphasis>post-</emphasis>transform rather than a <emphasis>pre-</emphasis>
-transform. Why? Suppose we implemented the transform as a pre-transform.
-Consider the following PHP expression (bracketed explicitly for emphasis:)
+<para>
+	If you don't understand this section right now, don't worry about it; you
+	might find it useful to read it again after having gained some experience
+	with the transformation API.
+</para>
+
+<para>
+	We have implemented the transform as a <emphasis>post-</emphasis>transform
+	rather than a <emphasis>pre-</emphasis>transform. Why? Suppose we
+	implemented the transform as a pre-transform.  Consider the following PHP
+	expression (bracketed explicitly for emphasis):
  </para>

  <programlisting>
  ("" . $a) . ""
  </programlisting>

-<para> The first binary operator we encounter is the second one (get &phc to
-print the tree if you don't see why.) So, we apply the transform and replace
-the operator by its left operand, which happens to be <code>("" . $a)</code>.
-We then continue <emphasis>and transform the children of the that
-node</emphasis>, because that is how the tree transform API is defined. But the
-<emphasis>children</emphasis> of that node are <code>""</code> and
-<code>$a</code>. So, that means that the other binary operator itself will
-never be processed! </para>
-
-<para> There are two solutions to this problem. The first is the one we used
-above, and use a post-transform instead of a pre-transform. You should try to
-reason out why this works, but a rule of thumb is that unless there is a good
-reason to use a pre-transform, it's safer to use the post-transform, because in
-the post-transform the children of the node have already been transformed, so
-that you are looking at the &ldquo;final&rdquo; version of the node. </para>
-
-<para> The second solution is to use a pre-transform, but explicitly tell &phc;
-to transform the new node in turn.  This is the less elegant solution, but
-sometimes this is the only solution that will work (see for example the
-<code>Token_conversion</code> transform in the &phc source tree). To do this,
-you would replace </para>
+<para>
+	The first binary operator we encounter is the second one (get &phc to print
+	the tree if you don't see why). So, we apply the transform and replace the
+	operator by its left operand, which happens to be <code>("" . $a)</code>.
+	We then continue <emphasis>and transform the children of that
+	node</emphasis>, because that is how the tree transform API is defined. But
+	the <emphasis>children</emphasis> of that node are <code>""</code> and
+	<code>$a</code>. So, that means that the other binary operator itself will
+	never be processed!
+</para>
+
+<para>
+	There are two solutions to this problem. The first is the one we used above:
+	use a post-transform instead of a pre-transform. You should try to
+	reason out why this works, but a rule of thumb is that unless there is a
+	good reason to use a pre-transform, it's safer to use the post-transform,
+	because in the post-transform the children of the node have already been
+	transformed, so that you are looking at the &ldquo;final&rdquo; version of
+	the node.
+</para>
+
+<para>
+	The second solution is to use a pre-transform, but explicitly tell &phc; to
+	transform the new node in turn.  This is the less elegant solution, but
+	sometimes this is the only solution that will work (see for example the
+	<code>Token_conversion</code> transform in
+	<filename>src/process_ast/Token_conversion.cpp</filename>). To do this, you
+	would replace
+</para>

  <programlisting>
  <reserved>return</reserved> in-&gt;right;
  </programlisting>

-<para> by </para>
+<para>
+	by
+</para>

  <programlisting>
  <reserved>return</reserved> in-&gt;right-&gt;pre_transform(this);
@@ -299,9 +338,11 @@

  <title> What's Next? </title>

-<para> The next tutorial in this series, <xref linkend="treetutorial4"
-endterm="treetutorial4.title">, introduces a very important notion in transforms: the
-use of <emphasis>state</emphasis>. </para>
+<para>
+	The next tutorial in this series, <xref linkend="treetutorial4"
+	endterm="treetutorial4.title">, introduces a very important notion in
+	transforms: the use of <emphasis>state</emphasis>.
+</para>

  </section>
  </chapter>
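
Note on the "A Subtlety" section of the tutorial converted above: the pre- versus post-transform behaviour can be reproduced outside phc with a small standalone C++ sketch. Everything in it is an assumption made for illustration. Expr, simplify, post_transform and naive_pre_transform are hypothetical names, not phc's AST or Transform classes, and the "naive" pre-order walk only mimics the behaviour the tutorial describes (rewrite a node, then descend into the children of the replacement without revisiting the replacement itself).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature AST (not phc's classes): a string literal,
// a variable, or a "." concatenation with two children.
struct Expr {
    enum Kind { Str, Var, Concat };
    Kind kind = Str;
    std::string text;                   // literal value or variable name
    std::unique_ptr<Expr> left, right;  // used only by Concat
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr str(std::string s) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Str;
    e->text = std::move(s);
    return e;
}

ExprPtr var(std::string name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Var;
    e->text = std::move(name);
    return e;
}

ExprPtr concat(ExprPtr l, ExprPtr r) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Concat;
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

bool is_empty_string(const Expr* e) {
    return e != nullptr && e->kind == Expr::Str && e->text.empty();
}

// One rewrite step: "" . x becomes x, x . "" becomes x, anything else is kept.
ExprPtr simplify(ExprPtr in) {
    assert(in != nullptr);
    if (in->kind != Expr::Concat) return in;
    if (is_empty_string(in->left.get())) return std::move(in->right);
    if (is_empty_string(in->right.get())) return std::move(in->left);
    return in;
}

// Post-order: children are rewritten first, so the node is seen in its final form.
ExprPtr post_transform(ExprPtr in) {
    if (in->kind == Expr::Concat) {
        in->left = post_transform(std::move(in->left));
        in->right = post_transform(std::move(in->right));
    }
    return simplify(std::move(in));
}

// Naive pre-order: rewrite the node, then only visit the children of the result.
ExprPtr naive_pre_transform(ExprPtr in) {
    ExprPtr out = simplify(std::move(in));
    if (out->kind == Expr::Concat) {
        out->left = naive_pre_transform(std::move(out->left));
        out->right = naive_pre_transform(std::move(out->right));
    }
    return out;
}

// Renders a tree with explicit brackets around every concatenation.
std::string show(const Expr& e) {
    switch (e.kind) {
        case Expr::Str:    return "\"" + e.text + "\"";
        case Expr::Var:    return "$" + e.text;
        case Expr::Concat: return "(" + show(*e.left) + " . " + show(*e.right) + ")";
    }
    return "";
}

// Builds the tutorial's example expression: ("" . $a) . ""
ExprPtr example() {
    return concat(concat(str(""), var("a")), str(""));
}
```

Applied to example(), post_transform reduces the whole expression to the variable alone, while naive_pre_transform leaves the inner concatenation with the empty string in place, which is the problem the tutorial warns about.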

