Pod::Simple::PullParser: a pull-parser interface to parsing Pod
    my $parser = SomePodProcessor->new;
    $parser->set_source( "whatever.pod" );
    $parser->run;

Or:

    my $parser = SomePodProcessor->new;
    $parser->set_source( $some_filehandle_object );
    $parser->run;

Or:

    my $parser = SomePodProcessor->new;
    $parser->set_source( \$document_source );
    $parser->run;

Or:

    my $parser = SomePodProcessor->new;
    $parser->set_source( \@document_lines );
    $parser->run;
And elsewhere:
    require 5;
    package SomePodProcessor;
    use strict;
    use base qw(Pod::Simple::PullParser);

    sub run {
      my $self = shift;
     Token:
      while(my $token = $self->get_token) {
        # ...process each token...
      }
    }

    1;
This class is for using Pod::Simple to build a Pod processor, but one that uses an interface based on a stream of token objects instead of one based on events.
This is a subclass of Pod::Simple and inherits all its methods.
A subclass of Pod::Simple::PullParser should define a "run" method that calls "$token = $parser->get_token" to pull tokens.
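As a hedged illustration of that pattern, the sketch below fills in the Synopsis skeleton with a toy processor. The class name My::TextCollector and the choice to collect the content of every text token are assumptions made for this example only; they are not part of Pod::Simple.

```perl
use strict;
use warnings;

# Hypothetical subclass name, used only for this sketch.
package My::TextCollector;
use base qw(Pod::Simple::PullParser);

# run() pulls tokens until get_token returns undef (end of document)
# and keeps the content of every text token it sees.
sub run {
    my $self = shift;
    my @texts;
    while (my $token = $self->get_token) {
        push @texts, $token->text if $token->is_text;
    }
    return \@texts;
}

package main;

# A made-up sample document.
my $pod = "=head1 NAME\n\nFoo::Bar - does B<things>\n";

my $parser = My::TextCollector->new;
$parser->set_source( \$pod );
my $texts = $parser->run;
print join( '|', @$texts ), "\n";
```

Returning the collected text from run is only a convenience for this sketch; a real formatter such as Pod::Simple::RTF typically writes its output as it pulls tokens.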
See the source for Pod::Simple::RTF for an example of a formatter that uses Pod::Simple::PullParser.
my $token = $parser->get_token

This returns the next token object (which will be of a subclass of Pod::Simple::PullParserToken), or undef if the parser-stream has hit the end of the document.

$parser->unget_token( $token )
$parser->unget_token( $token1, $token2, ... )

This restores the token object(s) to the front of the parser stream.
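Because unget_token puts tokens back at the front of the stream, code can peek ahead without losing anything. The following is a minimal sketch that uses a plain Pod::Simple::PullParser object, a made-up one-paragraph document, and the token methods "type", "tagname", "is_start", "is_text", and "text":

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;
use Scalar::Util qw(refaddr);

# A made-up sample document.
my $pod = "=head1 NAME\n\nFoo - bar\n";

my $parser = Pod::Simple::PullParser->new;
$parser->set_source( \$pod );

# Peek at the first token, then push it back onto the front of the stream.
my $peeked = $parser->get_token;
$parser->unget_token($peeked);

# The next get_token call hands back that very same token object.
my $again = $parser->get_token;
my $same  = refaddr($peeked) == refaddr($again);

# Drain the stream; get_token returns undef once the document is exhausted.
my $count = 1;    # $again has already been taken off the stream
my @texts;
while (my $token = $parser->get_token) {
    $count++;
    push @texts, $token->text if $token->is_text;
}

printf "first: %s %s; same object: %s; %d tokens; text: %s\n",
    $peeked->type, $peeked->tagname, ( $same ? 'yes' : 'no' ),
    $count, join( '|', @texts );
```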
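The source has to be set before you can parse anything. The lowest-level way is to call "set_source":

    $parser->set_source( $filename )
    $parser->set_source( $filehandle_object )
    $parser->set_source( \$document_source )
    $parser->set_source( \@document_lines )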
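To illustrate that these source forms are interchangeable, here is a hedged sketch that feeds the same made-up document to three fresh parsers (as a string reference, as an array of lines, and as an in-memory filehandle) and reads the title from each. The helper name title_from is hypothetical, and creating a new parser for each source is a deliberate choice of this sketch.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# A made-up sample document.
my $pod = "=head1 NAME\n\nFoo::Bar -- frobnicates things\n";

# Hypothetical helper: one fresh parser per source.
sub title_from {
    my ($source) = @_;
    my $parser = Pod::Simple::PullParser->new;
    $parser->set_source($source);
    return $parser->get_title;
}

# An in-memory filehandle stands in for a real file here.
open my $fh, '<', \$pod or die "Can't open in-memory filehandle: $!";

my %titles = (
    scalar_ref => title_from( \$pod ),
    array_ref  => title_from( [ split /\n/, $pod ] ),
    filehandle => title_from( $fh ),
);
print "$_: $titles{$_}\n" for sort keys %titles;
```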
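Or you can call these methods, which Pod::Simple::PullParser has defined to work just like Pod::Simple's same-named methods:

    $parser->parse_file(...)
    $parser->parse_string_document(...)
    $parser->filter(...)
    $parser->parse_from_file(...)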
For those to work, the Pod-processing subclass of Pod::Simple::PullParser must define a $parser->run method, so it is advised that all Pod::Simple::PullParser subclasses do so. See the Synopsis above, or the source for Pod::Simple::RTF.
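A minimal sketch of that arrangement: the subclass below defines run to tally start tokens by tag name and is then driven through the inherited parse_string_document method instead of an explicit set_source call. The class My::Counter and the hash key it stores its results under are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical subclass that tallies start tokens by tag name.
package My::Counter;
use base qw(Pod::Simple::PullParser);

sub run {
    my $self = shift;
    my %seen;
    while (my $token = $self->get_token) {
        $seen{ $token->tagname }++ if $token->is_start;
    }
    # Hypothetical key name, prefixed to avoid clashing with Pod::Simple's own fields.
    $self->{'My_Counter_seen'} = \%seen;
    return;
}

package main;

# A made-up sample document.
my $pod = "=head1 NAME\n\nFoo - bar\n\n=head1 DESCRIPTION\n\nIt B<counts>.\n";

my $counter = My::Counter->new;
# Inherited convenience method; it only works because run() is defined above.
$counter->parse_string_document($pod);

my %seen = %{ $counter->{'My_Counter_seen'} };
print "$_: $seen{$_}\n" for sort keys %seen;
```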
Authors of formatter subclasses might find these methods useful to call on a parser object that you haven't started pulling tokens from yet:

my $title_string = $parser->get_title

This tries to get the title string out of $parser by getting some tokens, scanning them for the title, and then ungetting them so that you can process the token-stream from the beginning.

For example, suppose you have a document that starts out:

    =head1 NAME

    Hoo::Boy::Wowza -- Stuff B<wow> yeah!

$parser->get_title on that document will return "Hoo::Boy::Wowza -- Stuff wow yeah!". If the document starts with:

    =head1 Name

    Hoo::Boy::W00t -- Stuff B<w00t> yeah!

then you'll need to pass the "nocase" option in order to recognize "Name":

    $parser->get_title(nocase => 1);

In cases where get_title can't find the title, it will return empty-string ("").

my $title_string = $parser->get_short_title

This is just like get_title, except that it returns just the module name, if the title seems to be of the form "SomeModuleName -- description".

For example, suppose you have a document that starts out:

    =head1 NAME

    Hoo::Boy::Wowza -- Stuff B<wow> yeah!

then $parser->get_short_title on that document will return "Hoo::Boy::Wowza". But if the document starts out:

    =head1 NAME

    Hooboy, stuff B<wow> yeah!

then $parser->get_short_title on that document will return "Hooboy, stuff wow yeah!". If the document starts with:

    =head1 Name

    Hoo::Boy::W00t -- Stuff B<w00t> yeah!

then you'll need to pass the "nocase" option in order to recognize "Name":

    $parser->get_short_title(nocase => 1);

If the title can't be found, then get_short_title returns empty-string ("").

my $author_name = $parser->get_author

This works like get_title except that it returns the contents of the "=head1 AUTHOR\n\nParagraph...\n" section, assuming that section isn't terribly long. To recognize a "=head1 Author\n\nParagraph\n" section, pass the "nocase" option:

    $parser->get_author(nocase => 1);

(This method tolerates "AUTHORS" instead of "AUTHOR" too.)

my $description = $parser->get_description

This works like get_title except that it returns the contents of the "=head1 DESCRIPTION\n\nParagraph...\n" section, assuming that section isn't terribly long. To recognize a "=head1 Description\n\nParagraph\n" section, pass the "nocase" option:

    $parser->get_description(nocase => 1);

my $version_block = $parser->get_version

This works like get_title except that it returns the contents of the "=head1 VERSION\n\n[BIG BLOCK]\n" block. Note that this does NOT return the module's $VERSION! To recognize a "=head1 Version\n\n[BIG BLOCK]\n" section, pass the "nocase" option:

    $parser->get_version(nocase => 1);
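A hedged sketch tying these methods together: it reads the title, short title, and description from a made-up document whose NAME heading is written "Name", then checks that the token stream still begins at the start of the document, as the ungetting behavior described above implies. The sample document is invented for this example.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# A made-up document; note the mixed-case "Name" heading.
my $pod = join "\n",
    "=head1 Name",
    "",
    "Hoo::Boy::W00t -- Stuff B<w00t> yeah!",
    "",
    "=head1 DESCRIPTION",
    "",
    "Makes things go w00t.",
    "";

my $parser = Pod::Simple::PullParser->new;
$parser->set_source( \$pod );

# "Name" is not all-caps, so nocase => 1 is needed for it to be recognized.
my $title = $parser->get_title( nocase => 1 );
my $short = $parser->get_short_title( nocase => 1 );
my $desc  = $parser->get_description;

# The tokens scanned above were ungotten, so the stream still
# starts at the very beginning of the document.
my $first = $parser->get_token;

print "title: $title\n";
print "short: $short\n";
print "description: $desc\n";
print "first token: ", $first->type, " ", $first->tagname, "\n";
```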
You don't actually have to define a "run" method. If you're writing a Pod-formatter class, you should define a "run" method just so that users can call "parse_file" etc., but you don't have to.
And if you're not writing a formatter class, but are instead just writing a program that does something simple with a Pod::Simple::PullParser object (and not an object of a subclass), then there's no reason to bother subclassing to add a "run" method.
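For that case, here is a hedged sketch of a small program that pulls tokens straight from a plain Pod::Simple::PullParser object, with no subclass and no run method, to list a document's =head1 headings. The helper name head1_titles and the sample document are made up for illustration.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# Hypothetical helper: returns the text of every =head1 heading.
sub head1_titles {
    my ($pod) = @_;
    my $parser = Pod::Simple::PullParser->new;
    $parser->set_source( \$pod );

    my @titles;
    my $in_head1 = 0;
    while ( my $token = $parser->get_token ) {
        if ( $token->is_start and $token->tagname eq 'head1' ) {
            $in_head1 = 1;
            push @titles, '';
        }
        elsif ( $token->is_end and $token->tagname eq 'head1' ) {
            $in_head1 = 0;
        }
        elsif ( $in_head1 and $token->is_text ) {
            # A heading can span several text tokens (e.g. around B<...>).
            $titles[-1] .= $token->text;
        }
    }
    return @titles;
}

my @titles = head1_titles(
    "=head1 NAME\n\nFoo - bar\n\n=head1 SEE ALSO\n\nL<Pod::Simple>\n"
);
print "$_\n" for @titles;
```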
Pod::Simple
Pod::Simple::PullParserToken, and its subclasses Pod::Simple::PullParserStartToken, Pod::Simple::PullParserTextToken, and Pod::Simple::PullParserEndToken.
HTML::TokeParser, which inspired this.
Questions or discussion about POD and Pod::Simple should be sent to the [email protected] mailing list. Send an empty email to [email protected] to subscribe.
This module is managed in an open GitHub repository, https://github.com/theory/pod-simple/. Feel free to fork and contribute, or to clone git://github.com/theory/pod-simple.git and send patches!
Patches against Pod::Simple are welcome. Please send bug reports to <[email protected]>.
Copyright (c) 2002 Sean M. Burke.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
Pod::Simple was created by Sean M. Burke <[email protected]>. But don't bother him, he's retired.
Pod::Simple is maintained by:
Allison Randal <[email protected]>
Hans Dieter Pearcey <[email protected]>
David E. Wheeler <[email protected]>