PerlSAX handlers for building tree structures
    use XML::Handler::Trees;
    use XML::Parser::PerlSAX;

    # Build an XML::Parser "Tree"-style structure
    my $p=XML::Parser::PerlSAX->new();
    my $h=XML::Handler::Tree->new();
    my $tree=$p->parse(Handler=>$h,Source=>{SystemId=>'file.xml'});

    # Build an EasyTree structure, dropping whitespace-only text nodes
    my $p=XML::Parser::PerlSAX->new();
    my $h=XML::Handler::EasyTree->new(Noempty=>1);
    my $easytree=$p->parse(Handler=>$h,Source=>{SystemId=>'file.xml'});

    # Build an XML::Element tree, keeping processing instructions
    my $p=XML::Parser::PerlSAX->new();
    my $h=XML::Handler::TreeBuilder->new();
    $h->store_pis(1);
    my $tree=$p->parse(Handler=>$h,Source=>{SystemId=>'file.xml'});
XML::Handler::Trees provides three PerlSAX handler classes for building tree structures. XML::Handler::Tree builds the same type of tree as the "Tree" style in XML::Parser. XML::Handler::EasyTree builds the same type of tree as the "EasyTree" style added to XML::Parser by XML::Parser::EasyTree. XML::Handler::TreeBuilder builds the same type of tree as Sean M. Burke's XML::TreeBuilder. These classes make it possible to construct these tree structures from sources other than XML::Parser.
All three handlers can be driven by either PerlSAX 1 or PerlSAX 2 drivers. In all cases, the end_document() method returns a reference to the constructed tree, which normally becomes the return value of the PerlSAX driver.
This handler builds the same type of tree structure as the "Tree" style in XML::Parser. Some modules, such as Dan Brian's XML::SimpleObject, work with this type of tree. See the documentation for XML::Parser for details.
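As a rough illustration of that layout, the sketch below hand-builds the structure that the XML::Parser documentation describes for the "Tree" style and walks it. Nothing is parsed here: the sample document and the helper name element_names are made up for illustration, and only core Perl is used.

```perl
use strict;
use warnings;

# Hand-built "Tree"-style structure for the (assumed) sample document:
#   <doc id="1"><title>Hello</title>world</doc>
# Layout per the XML::Parser documentation: [ tag, content ], where content
# is [ \%attributes, tag1, content1, tag2, content2, ... ] and a tag of 0
# marks a text node whose "content" is the string itself.
my $tree = [
    'doc',
    [
        { id => '1' },
        'title', [ {}, 0, 'Hello' ],
        0,       'world',
    ],
];

# Hypothetical helper: collect element names in document order.
sub element_names {
    my ($tag, $content) = @_;
    return () if $tag eq '0';      # text node, so there is no element name
    my @names = ($tag);
    my @kids  = @{$content};
    shift @kids;                   # drop the leading attribute hash
    while (@kids) {
        my ($child_tag, $child_content) = splice(@kids, 0, 2);
        push @names, element_names($child_tag, $child_content);
    }
    return @names;
}

print join(', ', element_names(@{$tree})), "\n";   # prints "doc, title"
```

The pairwise splice reflects the alternating tag/content layout; the same loop shape should apply to a tree returned by the handler, assuming it matches the documented style.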
XML::Handler::Tree->new() creates a handler object.
This handler builds a lightweight tree structure representing the XML document. This structure is, at least in this author's opinion, easier to work with than the "standard" style of tree. It is the same type of structure as built by XML::Parser when using XML::Parser::EasyTree, or by the get_simple_tree method in XML::Records.
The tree is returned as a reference to an array of tree nodes, each of which is a hash reference. All nodes have a 'type' key whose value is the type of the node: 'e' for element nodes, 't' for text nodes, and 'p' for processing instruction nodes. All nodes also have a 'content' key whose value is a reference to an array holding the element's child nodes for element nodes, the string value for text nodes, and the data value for processing instruction nodes. Element nodes also have an 'attrib' key whose value is a reference to a hash of attribute names and values, and a 'name' key whose value is the element's name. Processing instruction nodes also have a 'target' key whose value is the PI's target.
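To make the node layout concrete, here is a small sketch that hand-builds a structure with the keys described above and gathers its text. The sample document and the helper name text_of are assumptions for illustration; no handler is invoked, so the block runs with core Perl alone.

```perl
use strict;
use warnings;

# Hand-built EasyTree-style structure for the (assumed) sample document:
#   <?xml-stylesheet href="s.css"?><doc id="1"><title>Hello</title>world</doc>
my $easytree = [
    { type => 'p', target => 'xml-stylesheet', content => 'href="s.css"' },
    {
        type    => 'e',
        name    => 'doc',
        attrib  => { id => '1' },
        content => [
            {
                type    => 'e',
                name    => 'title',
                attrib  => {},
                content => [ { type => 't', content => 'Hello' } ],
            },
            { type => 't', content => 'world' },
        ],
    },
];

# Hypothetical helper: concatenate all text found beneath a list of nodes.
sub text_of {
    my ($nodes) = @_;
    my $text = '';
    for my $node (@{$nodes}) {
        if    ($node->{type} eq 't') { $text .= $node->{content} }
        elsif ($node->{type} eq 'e') { $text .= text_of($node->{content}) }
        # 'p' nodes carry PI data rather than document text, so skip them
    }
    return $text;
}

print text_of($easytree), "\n";   # prints "Helloworld"
```

Because 'content' means something different for each node type, the helper checks 'type' before deciding whether to recurse or to read a string.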
EasyTree nodes are ordinary Perl hashes and are not objects. Contiguous runs of text are always returned in a single node.
The parser returns an array reference rather than the root element's node because an XML document can legally contain processing instructions outside the root element (the xml-stylesheet PI is commonly used this way).
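Since the top-level array may hold processing instruction nodes alongside the root element, code that wants the root usually has to pick it out. A minimal sketch, using a hand-built array and a hypothetical helper named root_element:

```perl
use strict;
use warnings;

# Hand-built top-level array: a PI node followed by the root element node.
my $easytree = [
    { type => 'p', target => 'xml-stylesheet', content => 'href="s.css"' },
    { type => 'e', name => 'doc', attrib => {}, content => [] },
];

# Hypothetical helper: return the first top-level element node, or undef
# if the array contains none.
sub root_element {
    my ($nodes) = @_;
    my ($root) = grep { $_->{type} eq 'e' } @{$nodes};
    return $root;
}

my $root = root_element($easytree);
print "root element: $root->{name}\n";   # prints "root element: doc"
```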
If namespace information is available (only possible with PerlSAX 2), element and attribute names will be prefixed with their (possibly empty) namespace URI enclosed in curly brackets, and namespace prefixes will be stripped from names.
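Names in this form can be taken apart with a short regular expression. The helper below is a sketch only: split_name is a hypothetical name, the sample URIs are placeholders, and it assumes that a name without a leading curly-bracket part carries no namespace information.

```perl
use strict;
use warnings;

# Hypothetical helper: split a "{uri}local" name into (uri, local).
# A name without a leading "{...}" part yields undef for the URI.
sub split_name {
    my ($name) = @_;
    return ($1, $2) if $name =~ /\A\{([^}]*)\}(.*)\z/s;
    return (undef, $name);
}

for my $name ('{http://www.w3.org/1999/xhtml}p', '{}title', 'plain') {
    my ($uri, $local) = split_name($name);
    printf "%-35s uri=%s local=%s\n",
        $name, (defined $uri ? "'$uri'" : 'undef'), $local;
}
```

Returning an empty string for "{}" and undef for a bare name keeps the "empty namespace" and "no namespace information" cases distinguishable.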
XML::Handler::EasyTree->new() creates a handler object. Options can be provided hash-style:
Noempty
    If this is set to a true value, text nodes consisting entirely of whitespace will not be stored in the tree. The default is false.
Latin
    If this is set to a true value, characters with Unicode values in the Latin-1 range (160-255) will be stored in the tree as Latin-1 rather than UTF-8. The default is false.
Searchable
    If this is set to a true value, the parser will return a tree of XML::Handler::EasyTree::Searchable objects rather than bare array references, providing access to the navigation methods listed below. The top-level node returned will be a dummy element node with a name of "__TOPLEVEL__". The default is false. Setting this option automatically enables the Noempty option.
If the Searchable option is set, all nodes in the tree will be XML::Handler::EasyTree::Searchable objects, which have the same structure as EasyTree nodes but also implement the following methods, similar to those in XML::SimpleObject.

$obj->name
    Returns the name of the node. Ideally, it should return a "fully qualified" name, but it doesn't.

$obj->value
    Returns the text value associated with a node object. Returns undef if the node has no text children or its first child is not a text node.

$obj->child( $name )
    Returns a child (elements only) of the object with the $name. For the case where more than one child matches $name, the array context semantics haven't been completely worked out:
    - in an array context, all children are returned.
    - in scalar context, the first child matching $name is returned.
    In a scalar context, the XML::Parser::SimpleObj class returns an object containing all the children matching $name, unless there is only one child, in which case it returns that child (see commented code). I find that behavior confusing.

$obj->children( $name )
    Returns a list of all children (elements only) of the $obj that match $name, in the order in which they appeared in the original XML text.

$obj->children_names
    Returns a list of all the names of the object's children (elements only) in the order in which they appeared in the original text.

$obj->attribute( $name )
    Returns the string associated with the attribute of the object. If not found, returns a null string.

$obj->attribute_list
    Returns a list (in no particular order) of the attribute names associated with the object.

$obj->dump_tree
    Returns a textual representation (in XML form) of the object's hierarchy. Only elements are processed. The result will be in whatever character encoding the SAX driver delivered (which may not be the same encoding as the original source).

$obj->pretty_dump_tree
    Identical to dump_tree(), except that newline and indentation embellishments are added.
    #! /usr/bin/perl -w

    use XML::Handler::Trees;
    use XML::Parser::PerlSAX;
    use strict;

    my $p=XML::Parser::PerlSAX->new();
    my $h=XML::Handler::EasyTree->new( Searchable=>1 );
    my $easytree=$p->parse( Handler => $h, Source => { SystemId => 'systemB.xml' } );

    my $vme = $easytree->child( "vmesystem" );

    print "\n";
    print "vmesystem config: ", $vme->attribute( "configuration_name" ), "\n";

    print "\n";
    print "vmesystem children: ", join( ', ', $vme->children_names() ), "\n";

    print "\n";
    print "gps model is ", $vme->child( "gps" )->child( "model" )->value(), "\n";
    my $gps = $vme->child( "gps" );
    print "gps slot is ", $gps->child( "slot" )->value(), "\n";

    print "\n";
    print "reconstructed XML: \n";
    print $easytree->dump_tree(), "\n";

    # print "\n";
    # print "reconstructed XML (pretty): \n";
    # print $easytree->pretty_dump_tree(), "\n";

    print "\n";
    exit;
This handler builds XML document trees constructed of XML::Element objects (XML::Element is a subclass of HTML::Element adapted for XML). To use it, XML::TreeBuilder and its prerequisite HTML::Tree need to be installed. See the documentation for those modules for information on how to work with these tree structures.
XML::Handler::TreeBuilder->new()
    Creates a handler which builds a tree rooted in an XML::Element.

$h->store_comments(value)
    This determines whether comments will be stored in the tree (not all SAX drivers generate comment events). Currently, this is off by default.

$h->store_declarations(value)
    This determines whether markup declarations will be stored in the tree. Currently, this is off by default. The present implementation does not store markup declarations in any case; this method is provided for future use.

$h->store_pis(value)
    This determines whether processing instructions will be stored in the tree. Currently, this is off (false) by default.
Eric Bohlman
PerlSAX 2 compatibility added by Matt Sergeant
XML::Handler::EasyTree::Searchable written by Stuart McDow
Copyright (c) 2001 Eric Bohlman.
Portions of this code Copyright (c) 2001 Matt Sergeant.
Portions of this code Copyright (c) 2001 Stuart McDow.
All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl, XML::Parser, XML::SimpleObject, XML::Parser::EasyTree, XML::TreeBuilder, XML::Element, HTML::Element, PerlSAX