VERSION

This document describes Makefile::DOM 0.006 released on 28 August 2011.

DESCRIPTION

This library can serve as an advanced lexer for (\s-1GNU\s0) makefiles. It parses makefiles as \*(L"documents\*(R" and the parsing is lossless. The results are data structures similar to \s-1DOM\s0 trees. The \s-1DOM\s0 trees hold every single bit of the information in the original input files, including white spaces, blank lines and makefile comments. That means it's possible to reproduce the original makefiles from the \s-1DOM\s0 trees. In addition, each node of the \s-1DOM\s0 trees is modifiable and so is the whole tree, just like the \s-1PPI\s0 module used for Perl source parsing and the HTML::TreeBuilder module used for parsing \s-1HTML\s0 source.

If you're looking for a true \s-1GNU\s0 make parser that generates an \s-1AST\s0, please see Makefile::Parser::GmakeDB instead.

The interface of \*(C`Makefile::DOM\*(C' mimics the \s-1API\s0 design of \s-1PPI\s0. In fact, I've directly stolen the source code and \s-1POD\s0 documentation of PPI::Node, PPI::Element, and PPI::Dumper, with the full permission from the author of \s-1PPI\s0, Adam Kennedy.

\*(C`Makefile::DOM\*(C' tries to be independent of specific makefile's syntax. The same set of \s-1DOM\s0 node types is supposed to get shared by different makefile \s-1DOM\s0 generators. For example, MDOM::Document::Gmake parses \s-1GNU\s0 makefiles and returns an instance of MDOM::Document, i.e., the root of the \s-1DOM\s0 tree while the \s-1NMAKE\s0 makefile lexer in the future, \*(C`MDOM::Document::Nmake\*(C', also returns instances of the MDOM::Document class. Later, I'll also consider adding support for dmake and bsdmake.

Structure of the DOM

Makefile \s-1DOM\s0 (\s-1MDOM\s0) is a structured set of a series of data types. They provide a flexible document model conformed to the makefile syntax. Below is a complete list of the 19 \s-1MDOM\s0 classes in the current implementation where the indentation indicates the class inheritance relationships.

    MDOM::Element
        MDOM::Node
            MDOM::Unknown
            MDOM::Assignment
            MDOM::Command
            MDOM::Directive
            MDOM::Document
                MDOM::Document::Gmake
            MDOM::Rule
                MDOM::Rule::Simple
                MDOM::Rule::StaticPattern
        MDOM::Token
            MDOM::Token::Bare
            MDOM::Token::Comment
            MDOM::Token::Continuation
            MDOM::Token::Interpolation
            MDOM::Token::Modifier
            MDOM::Token::Separator
            MDOM::Token::Whitespace

It's not hard to see that all of the \s-1MDOM\s0 classes inherit from the MDOM::Element class. MDOM::Token and MDOM::Node are its direct children. The former represents a string token which is atomic from the perspective of the lexer while the latter represents a structured node, which usually has one or more children, and serves as the container for other DOM::Element objects.

Next we'll show a few examples to demonstrate how to map \s-1DOM\s0 trees to particular makefiles.

Case 1

Consider the following simple \*(L"hello, world\*(R" makefile: all : ; echo "hello, world" We can use the MDOM::Dumper class provided by Makefile::DOM to dump out the internal structure of its corresponding \s-1MDOM\s0 tree: MDOM::Document::Gmake MDOM::Rule::Simple MDOM::Token::Bare 'all' MDOM::Token::Whitespace ' ' MDOM::Token::Separator ':' MDOM::Token::Whitespace ' ' MDOM::Command MDOM::Token::Separator ';' MDOM::Token::Whitespace ' ' MDOM::Token::Bare 'echo "hello, world"' MDOM::Token::Whitespace '\n' In this example, speparators \*(C`:\*(C' and \*(C`;\*(C' are all instances of the MDOM::Token::Separator class while spaces and new line characters are all represented as MDOM::Token::Whitespace. The other two leaf nodes, \*(C`all\*(C' and \*(C`echo "hello, world"\*(C' both belong to MDOM::Token::Bare. It's worth mentioning that, the space characters in the rule command \*(C`echo "hello, world"\*(C' were not represented as MDOM::Token::Whitespace. That's because in makefiles, the spaces in commands do not make any sense to \*(C`make\*(C' in syntax; those spaces are usually sent to shell programs verbatim. Therefore, the \s-1DOM\s0 parser does not try to recognize those spaces specifially so as to reduce memory use and the number of nodes. However, leading spaces and trailing new lines will still be recognized as MDOM::Token::Whitespace. On a higher level, it's a MDOM::Rule::Simple instance holding several \*(C`Token\*(C' and one MDOM::Command. On the highest level, it's the root node of the whole \s-1DOM\s0 tree, i.e., an instance of MDOM::Document::Gmake.

Case 2

Below is a relatively complex example: a: foo.c bar.h $(baz) # hello! @echo ... It's corresponding \s-1DOM\s0 structure is MDOM::Document::Gmake MDOM::Rule::Simple MDOM::Token::Bare 'a' MDOM::Token::Separator ':' MDOM::Token::Whitespace ' ' MDOM::Token::Bare 'foo.c' MDOM::Token::Whitespace ' ' MDOM::Token::Bare 'bar.h' MDOM::Token::Whitespace '\t' MDOM::Token::Interpolation '$(baz)' MDOM::Token::Whitespace ' ' MDOM::Token::Comment '# hello!' MDOM::Token::Whitespace '\n' MDOM::Command MDOM::Token::Separator '\t' MDOM::Token::Modifier '@' MDOM::Token::Bare 'echo ...' MDOM::Token::Whitespace '\n' Compared to the previous example, here appears several new node types. The variable interpolation \*(C`$(baz)\*(C' on the first line of the original makefile corresponds to a MDOM::Token::Interpolation node in its \s-1MDOM\s0 tree. Similarly, the comment \*(C`# hello\*(C' corresponds to a MDOM::Token::Comment node. On the second line, the rule command indented by a tab character is still represented by a MDOM::Command object. Its first child node (or its first element) is also an MDOM::Token::Seperator instance corresponding to that tab. The command modifier \*(C`@\*(C' follows the \*(C`Separator\*(C' immediately, which is of type MDOM::Token::Modifier.

Case 3

Now let's study a sample makefile with various global structures: a: b foo = bar # hello! Here on the top level, there are three language structures: one rule "\*(C`a: b\*(C'\*(L", one assignment statement \*(R"foo = bar", and one comment \*(C`# hello!\*(C'. Its \s-1MDOM\s0 tree is shown below: MDOM::Document::Gmake MDOM::Rule::Simple MDOM::Token::Bare 'a' MDOM::Token::Separator ':' MDOM::Token::Whitespace ' ' MDOM::Token::Bare 'b' MDOM::Token::Whitespace '\n' MDOM::Assignment MDOM::Token::Bare 'foo' MDOM::Token::Whitespace ' ' MDOM::Token::Separator '=' MDOM::Token::Whitespace ' ' MDOM::Token::Bare 'bar' MDOM::Token::Whitespace '\n' MDOM::Token::Whitespace '\t' MDOM::Token::Comment '# hello!' MDOM::Token::Whitespace '\n' We can see that below the root node MDOM::Document::Gmake, there are MDOM::Rule::Simple, MDOM::Assignment, and MDOM::Comment three elements, as well as two MDOM::Token::Whitespace objects.

It can be observed from the examples above that the \s-1MDOM\s0 representation for the makefile's lexical elements is rather loose. It only provides very limited structural representation instead of making a bad guess.

OPERATIONS FOR MDOM TREES

Generating an \s-1MDOM\s0 tree from a \s-1GNU\s0 makefile only requires two lines of Perl code:

use MDOM::Document::Gmake; my $dom = MDOM::Document::Gmake->new('Makefile');

If the makefile source code being parsed is already stored in a Perl variable, say, $var, then we can construct an \s-1MDOM\s0 via the following code:

my $dom = MDOM::Document::Gmake->new(\$var);

Now $dom becomes the reference to the root of the \s-1MDOM\s0 tree and its type is now MDOM::Document::Gmake, which is also an instance of the MDOM::Node class.

Just as mentioned above, \*(C`MDOM::Node\*(C' is the container for other MDOM::Element instances. So we can retrieve some element node's value via its \*(C`child\*(C' method:

$node = $dom->child(3); # or $node = $dom->elements(0);

And we may also use the \*(C`elements\*(C' method to obtain the values of all the nodes:

@elems = $dom->elements;

For every \s-1MDOM\s0 node, its corresponding makefile source can be generated by invoking its \*(C`content\*(C' method.

BUGS AND TODO

The current implemenation of the MDOM::Document::Gmake lexer is based on a hand-written state machie. Although the efficiency of the engine is not bad, the code is rather complicated and messy, which hurts both extensibility and maintanabilty. So it's expected to rewrite the parser using some grammatical tools like the Perl 6 regex engine Pugs::Compiler::Rule or a yacc-style one like Parse::Yapp.

SOURCE REPOSITORY

You can always get the latest source code of this module from its GitHub repository:

http://github.com/agentzh/makefile-dom-pm <http://github.com/agentzh/makefile-dom-pm>

If you want a commit bit, please let me know.

AUTHOR

Zhang \*(L"agentzh\*(R" Yichun (\s-1XXX\s0) <[email protected]>

COPYRIGHT

Copyright 2006-2011 by Zhang \*(L"agentzh\*(R" Yichun (\s-1XXX\s0).

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

RELATED TO Makefile::DOM…

MDOM::Document, MDOM::Document::Gmake, \s-1PPI\s0, Makefile::Parser::GmakeDB, makesimple.