[phc-internals] [phc commit] r1748 - trunk/doc/manual
codesite-noreply at google.com
Sun Oct 5 22:04:38 IST 2008
Author: paul.biggar
Date: Sun Oct 5 14:04:07 2008
New Revision: 1748
Modified:
trunk/doc/manual/treetutorial3.sgml
Log:
convert treetutorial3.
Modified: trunk/doc/manual/treetutorial3.sgml
==============================================================================
--- trunk/doc/manual/treetutorial3.sgml (original)
+++ trunk/doc/manual/treetutorial3.sgml Sun Oct 5 14:04:07 2008
@@ -4,16 +4,21 @@
<section>
<title></title>
-<para>Now that we have seen in <xref linkend="treetutorial1"> how we can
-traverse the tree, and in <xref linkend="treetutorial2"> how we can modify
-individual nodes in the tree, in this tutorial we will look at modifying the
-structure of the tree itself.</para>
-
-<para>The transform that we will be considering in this tutorial is one that is
-used in &phc itself. The transform is called <code>Remove_concat_null</code>
-and can be found in <filename>process_ast/Remove_concat_null.h</filename>. The
-purpose of the transform is to remove string concatenation with the empty
-string. For example, </para>
+<para>
+ Now that we have seen in <xref linkend="treetutorial1"> how we can traverse
+ the tree, and in <xref linkend="treetutorial2"> how we can modify individual
+ nodes in the tree, in this tutorial we will look at modifying the structure
+ of the tree itself.
+</para>
+
+<para>
+ The transform that we will be considering in this tutorial is one that is
+ used in &phc itself. The transform is called <code>Remove_concat_null</code>
+ and can be found in
+ <filename>src/process_ast/Remove_concat_null.h</filename>. The purpose of
+ the transform is to remove string concatenation with the empty string. For
+ example,
+</para>
<programlisting>
<?<reserved>php</reserved>
@@ -21,7 +26,9 @@
?>
</programlisting>
-<para> is translated to </para>
+<para>
+ is translated to
+</para>
<programlisting>
<?<reserved>php</reserved>
@@ -29,15 +36,18 @@
?>
</programlisting>
-<para> The reason that this transform is implemented in &phc is due to how the
-&phc parser deals with in-string syntax. For example, if you write </para>
+<para>
+ The reason that this transform is implemented in &phc is due to how the &phc
+ parser deals with in-string syntax. For example, if you write
+</para>
<programlisting>
$a = "foo $b bar";
</programlisting>
-<para> the corresponding tree generated by &phc
-is </para>
+<para>
+ the corresponding tree generated by &phc is
+</para>
<programlisting>
$a = "foo " . $b . " bar";
@@ -51,30 +61,38 @@
$a = "foo $b";
</programlisting>
-<para> the parser generates </para>
+<para>
+ the parser generates
+</para>
<programlisting>
$a = "foo " . $b . "";
</programlisting>
-<para> Obviously, the second concatenation is unnecessary, and the
-<code>Remove_concat_null</code> transform cleans this up. In this tutorial we
-will explain how this transform can be written. </para>
+<para>
+ Obviously, the second concatenation is unnecessary, and the
+ <code>Remove_concat_null</code> transform cleans this up. In this tutorial
+ we will explain how this transform can be written.
+</para>
</section>
<section>
<title> Introducing the <code>Tree_transform</code> API </title>
-<para> Concatenation is a binary operator, so we are interested in nodes of
-type <code>Bin_op</code>. If you check the grammar or, alternatively,
-<filename>AST.h</filename>, you will find that <code>Bin_op</code> has
-three attributes: a <code>left</code> and a <code>right</code> expression (of
-type <code>Expr</code>) and the operator itself (<code>OP*
-op</code>). Thus, we are interested in nodes of type <code>Bin_op</code>
-whose <code>op</code> equals the single dot (for string concatenation). </para>
+<para>
+ Concatenation is a binary operator, so we are interested in nodes of type
+ <code>Bin_op</code>. If you check the grammar, or, alternatively,
+ <filename>src/generated/AST.h</filename>, you will find that
+ <code>Bin_op</code> has three attributes: a <code>left</code> and a
+ <code>right</code> expression (of type <code>Expr</code>) and the operator
+ itself (<code>OP* op</code>). Thus, we are interested in nodes of type
+ <code>Bin_op</code> whose <code>op</code> equals the single dot (for string
+ concatenation).
+</para>
-<para> Based on the previous two tutorials, we might try something like this:
+<para>
+ Based on the previous two tutorials, we might try something like this:
</para>
<programlisting>
@@ -92,22 +110,25 @@
}
</programlisting>
-<para> The problem is, what are we going to do inside the <code>if</code>? Tree
-visitors can only inspect and modify <code>*in</code>; they cannot restructure
-the tree. In particular, we cannot replace <code>*in</code> by a new node. For
-this purpose, &phc offers a separate API, the tree
-<emphasis>transformation</emphasis> API. It looks very similar to the tree
-visitor API, but there are two important differences. First, the
-<code>pre</code> and <code>post</code> methods can modify the structure of the
-tree by returning new nodes. Second, there are no “generic” methods
-in the tree transform API. So, it is not possible to define a transformation
-that would replace all statements by something else. (It is not clear how that
-would be useful, anyway.) </para>
-
-<para> So, we need to write our transformation using the
-<code>Tree_transform</code> API, defined in
-<filename>Tree_transform.h</filename>. Restructuring the class above
-yields </para>
+<para>
+ The problem is, what are we going to do inside the <code>if</code>? Tree
+ visitors can only inspect and modify <code>*in</code>; they cannot
+ restructure the tree. In particular, we cannot replace <code>*in</code> by a
+ new node. For this purpose, &phc offers a separate API, the tree
+ <emphasis>transformation</emphasis> API. It looks very similar to the tree
+ visitor API, but there are two important differences. First, the
+ <code>pre</code> and <code>post</code> methods can modify the structure of
+ the tree by returning new nodes. Second, there are no “generic”
+ methods in the tree transform API. So, it is not possible to define a
+ transformation that would replace all statements by something else. (It is
+ not clear how that would be useful, anyway.)
+</para>
+
+<para>
+ So, we need to write our transformation using the
+ <code>Tree_transform</code> API, defined in
+ <filename>AST_transform.h</filename>. Restructuring the class above yields
+</para>
<programlisting>
<reserved>class</reserved> Remove_concat_null : <reserved>public</reserved> <boxed>Transform</boxed>
@@ -124,11 +145,13 @@
}
</programlisting>
-<para> The differences between the previous version have been highlighted. We
-inherit from a different class, and <code>pre_bin_op</code> now has a return
-value, which is the node that will replace <code>*in</code>. If you check the
-default implementation of <code>pre_bin_op</code> in
-<filename>AST_transform.cpp</filename>, you'll find: </para>
+<para>
+ The differences from the previous version have been highlighted. We
+ inherit from a different class, and <code>pre_bin_op</code> now has a return
+ value, which is the node that will replace <code>*in</code>. If you check
+ the default implementation of <code>pre_bin_op</code> in
+ <filename>AST_transform.cpp</filename>, you'll find:
+</para>
<programlisting>
Expr* Transform::pre_bin_op(Bin_op* in)
@@ -137,32 +160,38 @@
}
</programlisting>
-<para> The <code>return in;</code> is very important; as we mentioned before,
-the return value of <code>pre_bin_op</code> will replace <code>*in</code> in
-the tree. Therefore, if we don't want to replace <code>*in</code>, or perhaps
-if we want to replace <code>*in</code> only if a particular condition holds, we
-must return <code>in</code>. This will replace <code>*in</code> by
-<code>in</code> itself. </para>
-
-<para> The second thing to note is that the return type of
-<code>pre_bin_op</code> is <code>Expr</code> instead of
-<code>Bin_op</code>. This means that we can replace a binary operator node
-by another other expression node. The <xref linkend="maketeatheory"
-endterm="maketeatheory.title"> explains exactly how the signatures for the
-<code>pre</code> and <code>post</code> methods are derived, but in most cases
-they are what you'd expect. The easiest way to check is to simply look them up
-in <filename><AST_transform.h></filename>. </para>
+<para>
+ The <code>return in;</code> is very important; as we mentioned before, the
+ return value of <code>pre_bin_op</code> will replace <code>*in</code> in the
+ tree. Therefore, if we don't want to replace <code>*in</code>, or perhaps if
+ we want to replace <code>*in</code> only if a particular condition holds, we
+ must return <code>in</code>. This will replace <code>*in</code> by
+ <code>in</code> itself.
+</para>
+
+<para>
+ The second thing to note is that the return type of <code>pre_bin_op</code>
+ is <code>Expr</code> instead of <code>Bin_op</code>. This means that we can
+ replace a binary operator node by any other expression node. The <xref
+ linkend="maketeatheory" endterm="maketeatheory.title"> explains exactly how
+ the signatures for the <code>pre</code> and <code>post</code> methods are
+ derived, but in most cases they are what you'd expect. The easiest way to
+ check is to simply look them up in
+ <filename><AST_transform.h></filename>.
+</para>
</section>
<section id="implementation">
<title>The Implementation</title>
-<para> We wanted to get rid of useless concatenation operators. To be precise,
-if the binary operator is the concatenation operator, and the left operand is
-the empty string, we want to replace the node by the right operand; similarly,
-if the right operand is the empty string, we want to replace the operator by
-its left operand. Here's the full transform: </para>
+<para>
+ We wanted to get rid of useless concatenation operators. To be precise, if
+ the binary operator is the concatenation operator, and the left operand is
+ the empty string, we want to replace the node by the right operand;
+ similarly, if the right operand is the empty string, we want to replace the
+ operator by its left operand. Here's the full transform:
+</para>
<programlisting>
<reserved>class</reserved> Remove_concat_null : <reserved>public</reserved> Transform
@@ -170,7 +199,7 @@
<reserved>public</reserved>:
Expr* post_bin_op(Bin_op* in)
{
- STRING* empty = <reserved>new</reserved> STRING(<reserved>new</reserved> String(""), <reserved>new</reserved> String(""));
+ STRING* empty = <reserved>new</reserved> STRING(<reserved>new</reserved> String(""));
Wildcard<Expr>* wildcard = <reserved>new</reserved> Wildcard<Expr>;
<emphasis>// Replace with right operand if left operand is the empty string</emphasis>
@@ -186,45 +215,41 @@
}
</programlisting>
-<para> We already explained what <code>match</code> does in <xref
-linkend="treetutorial2">, but we have not yet explained the use of wildcards.
-If you are using a wildcard (<code>WILDCARD</code>) in a pattern passed to
-<code>match</code>, <code>match</code> will not take that subtree into account.
-Thus, </para>
+<para>
+ We already explained what <code>match</code> does in <xref
+ linkend="treetutorial2">, but we have not yet explained the use of
+ wildcards. If you are using a wildcard (<code>WILDCARD</code>) in a pattern
+ passed to <code>match</code>, <code>match</code> will not take that subtree
+ into account. Thus,
+</para>
<programlisting>
<reserved>if</reserved>(in->match(<reserved>new</reserved> Bin_op(empty, WILDCARD, ".")))
</programlisting>
-<para> can be paraphrased as “is <code>in</code> a binary operator with
-the empty string as the left operand and <code>"."</code> as the operator (I
-don't care about the right operand)?“ If the match succeeded, you can
-find out which expression was matched by the wildcard by accessing
-<code>wildcard->value</code>, although we do not use that particular feature of
-wildcards in this example. </para>
-
-<para> Note that the constructor for <code>STRING</code> has two
-arguments: one corresponds to the value of the string, and one corresponds to
-the representation of the string in the source (see also the explanation of the
-token classes in <xref linkend="treetutorial2">). For most strings, both of
-these values are the same; however, in some cases they are different. For
-example, <code>value</code> might be set to
-<code>“/home/joe/myscript.php</code>, while <code>source_rep</code> is
-set to <code>__FILE__</code>. </para>
+<para>
+ can be paraphrased as “is <code>in</code> a binary operator with the
+ empty string as the left operand and <code>"."</code> as the operator (I
+ don't care about the right operand)?” If the match succeeded, you can
+ find out which expression was matched by the wildcard by accessing
+ <code>wildcard->value</code>.
+</para>
</section>
<section>
<title> Running Transformations </title>
-<para> Recall from the previous two tutorials that visitors are run with a call
-to <code>visit</code>: </para>
+<para>
+ Recall from the previous two tutorials that visitors are run with a call to
+ <code>visit</code>:
+</para>
<programlisting>
-<reserved>extern</reserved> "C" <reserved>void</reserved> process_ast(PHP_script* php_script)
+<reserved>extern</reserved> "C" <reserved>void</reserved> run_ast (PHP_script* in, Pass_manager* pm, String* option)
{
SomeVisitor visitor;
- php_script->visit(&visitor);
+ in->visit(&visitor);
}
</programlisting>
@@ -232,15 +257,16 @@
<code>transform_children</code>: </para>
<programlisting>
-<reserved>extern</reserved> "C" <reserved>void</reserved> process_ast(PHP_script* php_script)
+<reserved>extern</reserved> "C" <reserved>void</reserved> run_ast (PHP_script* in, Pass_manager* pm, String* option)
{
SomeTransform transform;
- php_script->transform_children(&transform);
+ in->transform_children(&transform);
}
</programlisting>
-<para> We invoke <code>transform_children</code> because we should not replace
-the top-level node in the AST (the <code>PHP_script</code> node itself).
+<para>
+ We invoke <code>transform_children</code> because we should not replace the
+ top-level node in the AST (the <code>PHP_script</code> node itself).
</para>
</section>
@@ -248,47 +274,60 @@
<title> A Subtlety </title>
-<para> If you don't understand this section right now, don't worry about it;
-you might find it useful to read it again after having gained some experience
-with the transformation API. </para>
-
-<para> We have implemented the transform as a
-<emphasis>post-</emphasis>transform rather than a <emphasis>pre-</emphasis>
-transform. Why? Suppose we implemented the transform as a pre-transform.
-Consider the following PHP expression (bracketed explicitly for emphasis:)
+<para>
+ If you don't understand this section right now, don't worry about it; you
+ might find it useful to read it again after having gained some experience
+ with the transformation API.
+</para>
+
+<para>
+ We have implemented the transform as a <emphasis>post-</emphasis>transform
+ rather than a <emphasis>pre-</emphasis>transform. Why? Suppose we
+ implemented the transform as a pre-transform. Consider the following PHP
+ expression (bracketed explicitly for emphasis):
</para>
<programlisting>
("" . $a) . ""
</programlisting>
-<para> The first binary operator we encounter is the second one (get &phc to
-print the tree if you don't see why.) So, we apply the transform and replace
-the operator by its left operand, which happens to be <code>("" . $a)</code>.
-We then continue <emphasis>and transform the children of the that
-node</emphasis>, because that is how the tree transform API is defined. But the
-<emphasis>children</emphasis> of that node are <code>""</code> and
-<code>$a</code>. So, that means that the other binary operator itself will
-never be processed! </para>
-
-<para> There are two solutions to this problem. The first is the one we used
-above, and use a post-transform instead of a pre-transform. You should try to
-reason out why this works, but a rule of thumb is that unless there is a good
-reason to use a pre-transform, it's safer to use the post-transform, because in
-the post-transform the children of the node have already been transformed, so
-that you are looking at the “final” version of the node. </para>
-
-<para> The second solution is to use a pre-transform, but explicitly tell &phc;
-to transform the new node in turn. This is the less elegant solution, but
-sometimes this is the only solution that will work (see for example the
-<code>Token_conversion</code> transform in the &phc source tree). To do this,
-you would replace </para>
+<para>
+ The first binary operator we encounter is the second one (get &phc to print
+ the tree if you don't see why). So, we apply the transform and replace the
+ operator by its left operand, which happens to be <code>("" . $a)</code>.
+ We then continue <emphasis>and transform the children of that
+ node</emphasis>, because that is how the tree transform API is defined. But
+ the <emphasis>children</emphasis> of that node are <code>""</code> and
+ <code>$a</code>. So, that means that the other binary operator itself will
+ never be processed!
+</para>
+
+<para>
+ There are two solutions to this problem. The first is the one we used above:
+ use a post-transform instead of a pre-transform. You should try to
+ reason out why this works, but a rule of thumb is that unless there is a
+ good reason to use a pre-transform, it's safer to use the post-transform,
+ because in the post-transform the children of the node have already been
+ transformed, so that you are looking at the “final” version of
+ the node.
+</para>
+
+<para>
+ The second solution is to use a pre-transform, but explicitly tell &phc; to
+ transform the new node in turn. This is the less elegant solution, but
+ sometimes this is the only solution that will work (see for example the
+ <code>Token_conversion</code> transform in
+ <filename>src/process_ast/Token_conversion.cpp</filename>). To do this, you
+ would replace
+</para>
<programlisting>
<reserved>return</reserved> in->right;
</programlisting>
-<para> by </para>
+<para>
+ by
+</para>
<programlisting>
<reserved>return</reserved> in->right->pre_transform(this);
@@ -299,9 +338,11 @@
<title> What's Next? </title>
-<para> The next tutorial in this series, <xref linkend="treetutorial4"
-endterm="treetutorial4.title">, introduces a very important notion in transforms: the
-use of <emphasis>state</emphasis>. </para>
+<para>
+ The next tutorial in this series, <xref linkend="treetutorial4"
+ endterm="treetutorial4.title">, introduces a very important notion in
+ transforms: the use of <emphasis>state</emphasis>.
+</para>
</section>
</chapter>
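
The "A Subtlety" section of the tutorial text above argues that a pre-transform misses the inner operator in ("" . $a) . "" while a post-transform does not. The following is a minimal toy model of that argument in C++. It does not use phc's classes: Expr, simplify, post_transform, pre_transform and pre_transform_fixed are all hypothetical names invented for this sketch, and the traversal order is only assumed to mirror the one the tutorial describes (a pre-transform rewrites a node and then descends into the children of the replacement).

```cpp
#include <memory>
#include <string>

// Toy expression tree (hypothetical, NOT phc's AST classes).
struct Expr {
    enum Kind { Str, Var, Concat } kind;
    std::string text;                    // string value or variable name
    std::shared_ptr<Expr> left, right;   // only used when kind == Concat
};
using ExprP = std::shared_ptr<Expr>;

ExprP str(const std::string& s) { return std::make_shared<Expr>(Expr{Expr::Str, s, nullptr, nullptr}); }
ExprP var(const std::string& n) { return std::make_shared<Expr>(Expr{Expr::Var, n, nullptr, nullptr}); }
ExprP cat(ExprP l, ExprP r)     { return std::make_shared<Expr>(Expr{Expr::Concat, "", l, r}); }

bool is_empty_str(const ExprP& e) { return e->kind == Expr::Str && e->text.empty(); }

// The rewrite rule for a single node: drop a concatenation with "".
ExprP simplify(const ExprP& e) {
    if (e->kind != Expr::Concat) return e;
    if (is_empty_str(e->left))  return e->right;
    if (is_empty_str(e->right)) return e->left;
    return e;
}

// Post-transform: children are already final when the node is rewritten.
ExprP post_transform(const ExprP& e) {
    if (e->kind == Expr::Concat) {
        e->left  = post_transform(e->left);
        e->right = post_transform(e->right);
    }
    return simplify(e);
}

// Pre-transform: rewrite the node, then descend into the CHILDREN of the
// replacement. The replacement node itself is never rewritten.
ExprP pre_transform(const ExprP& e) {
    ExprP out = simplify(e);
    if (out->kind == Expr::Concat) {
        out->left  = pre_transform(out->left);
        out->right = pre_transform(out->right);
    }
    return out;
}

// Pre-transform that explicitly re-transforms the replacement node, in the
// spirit of returning in->right->pre_transform(this) in the tutorial.
ExprP pre_transform_fixed(const ExprP& e) {
    ExprP out = simplify(e);
    if (out != e) return pre_transform_fixed(out);
    if (out->kind == Expr::Concat) {
        out->left  = pre_transform_fixed(out->left);
        out->right = pre_transform_fixed(out->right);
    }
    return out;
}

// Render a tree in PHP-like syntax, fully bracketed.
std::string show(const ExprP& e) {
    switch (e->kind) {
        case Expr::Str: return "\"" + e->text + "\"";
        case Expr::Var: return "$" + e->text;
        default:        return "(" + show(e->left) + " . " + show(e->right) + ")";
    }
}

// Builds a fresh copy of ("" . $a) . "" (the transforms mutate their input).
ExprP example() { return cat(cat(str(""), var("a")), str("")); }
```

In this model, show(post_transform(example())) and show(pre_transform_fixed(example())) both give $a, while show(pre_transform(example())) leaves ("" . $a) behind, which is the unprocessed inner operator the tutorial warns about.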