SYNOPSIS

 use HTML::Microformats;

 my $doc = HTML::Microformats
             ->new_document($html, $uri)
             ->assume_profile(qw(hCard hCalendar));
 print $doc->json(pretty => 1);

 use RDF::TrineShortcuts qw(rdf_query);
 my $results = rdf_query($sparql, $doc->model);

DESCRIPTION

The HTML::Microformats module is a wrapper for parser and handler modules of various individual microformats (each of those modules has a name like HTML::Microformats::Format::Foo).

The general pattern of usage is to create an HTML::Microformats object (which corresponds to an HTML document) using the "new_document" method; then ask for the data, as a Perl hashref, a JSON string, or an RDF::Trine model.

Constructor

 $doc = HTML::Microformats->new_document($html, $uri, %opts)

Constructs a document object.

$html is the HTML or XHTML source (string) or an XML::LibXML::Document.

$uri is the document URI, important for resolving relative URL references.

%opts are additional parameters; currently only one option is defined: $opts{'type'} is set to 'text/html' or 'application/xhtml+xml', to control how $html is parsed.
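A minimal sketch of the constructor call described above. The HTML snippet and the example.com base URI are hypothetical placeholders; only "new_document" and the 'type' option come from this documentation.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Hypothetical input: a tiny page containing one hCard.
my $html = '<html><head><title>Demo</title></head><body>'
         . '<p class="vcard"><span class="fn">Alice Example</span></p>'
         . '</body></html>';

# Hypothetical base URI; it is used to resolve relative links in the page.
my $uri = 'http://example.com/people/alice';

# 'text/html' asks for HTML parsing; 'application/xhtml+xml' is the
# other documented value, for well-formed XHTML input.
my $doc = HTML::Microformats->new_document($html, $uri, type => 'text/html');

print "Created a ", ref($doc), " object\n";
```

An XML::LibXML::Document can be passed in place of the string if the page has already been parsed elsewhere.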

Profile Management

HTML::Microformats uses HTML profiles (i.e. the profile attribute on the HTML <head> element) to detect which microformats are used on a page. Any microformats which do not have a profile URI declared will not be parsed.

Because many pages fail to properly declare which profiles they use, there are various profile management methods to tell HTML::Microformats to assume the presence of particular profile URIs, even if they're actually missing.

 $doc->profiles

Returns a list of profile URIs declared by the document.

 $doc->has_profile(@profiles)

Returns true if and only if one or more of the profile URIs in @profiles is declared by the document.

 $doc->add_profile(@profiles)

Adds one or more profile URIs, which are then treated as if they were found in the document. For example:

 $doc->add_profile('http://microformats.org/profile/rel-tag')

This is useful for adding profile URIs declared outside the document itself (e.g. in HTTP headers). Returns a reference to the document.

 $doc->assume_profile(@microformats)

Acts similarly to "add_profile" but allows you to use names of microformats rather than URIs. For example:

 $doc->assume_profile(qw(hCard adr geo))

Microformat names are case sensitive, and must match HTML::Microformats::Format::Foo module names. Returns a reference to the document.

 $doc->assume_all_profiles

Equivalent to calling "assume_profile" for all known microformats. Returns a reference to the document.
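The profile methods can be combined as in the following sketch. The page content and base URI are hypothetical. "add_profile" and "assume_profile" are named in this documentation; the query methods are assumed here to be called "profiles" and "has_profile", so check the names against your installed version.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Hypothetical page that uses rel-tag and hCard but declares no profile.
my $html = '<html><head><title>Demo</title></head><body>'
         . '<p class="vcard"><span class="fn">Alice Example</span></p>'
         . '<a rel="tag" href="/tags/perl">perl</a>'
         . '</body></html>';

my $doc = HTML::Microformats->new_document($html, 'http://example.com/');

# Both methods return the document, so the calls can be chained.
$doc->add_profile('http://microformats.org/profile/rel-tag')
    ->assume_profile(qw(hCard adr geo));

# Assumed method names: has_profile and profiles.
my $has_tag = $doc->has_profile('http://microformats.org/profile/rel-tag');
print "rel-tag profile is ", ($has_tag ? "active" : "missing"), "\n";
print "profile: $_\n" for $doc->profiles;
```

Adding a URI explicitly suits profiles announced out of band, while "assume_profile" is the shorter route when you only know the microformat's name.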

Parsing Microformats

Generally speaking, you can skip this. The "data", "json" and "model" methods will automatically do this for you.

 $doc->parse_microformats

Scans through the document, finding microformat objects. On subsequent calls, does nothing (as everything is already parsed). Returns a reference to the document.

 $doc->clear_microformats

Forgets information gleaned by "parse_microformats" and thus allows "parse_microformats" to be run again. This is useful if you've added some profiles between runs of "parse_microformats". Returns a reference to the document.
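As a sketch of when an explicit re-parse matters: the page below (hypothetical) declares no profiles, so a first parse finds nothing, and the parse has to be reset after a profile is assumed. The reset method is assumed here to be called "clear_microformats"; "parse_microformats" and "assume_profile" are named in this documentation.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Hypothetical page with an hCard but no declared profile.
my $html = '<html><head><title>Demo</title></head><body>'
         . '<p class="vcard"><span class="fn">Alice Example</span></p>'
         . '</body></html>';

my $doc = HTML::Microformats->new_document($html, 'http://example.com/');

# First pass: no profiles are declared, so no microformats are parsed.
$doc->parse_microformats;

# Tell the parser to treat the page as if it declared the hCard profile.
$doc->assume_profile('hCard');

# A second parse_microformats alone would do nothing, so forget the
# first pass and scan again (clear_microformats is an assumed name).
$doc->clear_microformats->parse_microformats;
```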

Retrieving Data

These methods allow you to retrieve the document's data, and do things with it.

 $doc->objects($format)

$format is, for example, 'hCard', 'adr' or 'RelTag'. Returns a list of objects of that type. (If called in scalar context, returns an arrayref.)

Each object is, for example, an HTML::Microformats::Format::hCard object, or an HTML::Microformats::Format::RelTag object, etc. See the relevant documentation for details.

 $doc->all_objects

Returns a hashref of data. Each hashref key is the name of a microformat (e.g. 'hCard', 'RelTag', etc), and the values are arrayrefs of objects of the same classes as those returned by "objects".

 $doc->json(%opts)

Returns data roughly equivalent to the "all_objects" method, but as a JSON string.

%opts is a hash of options, suitable for passing to the JSON module's to_json function. The 'convert_blessed' and 'utf8' options are enabled by default, but can be disabled by explicitly setting them to 0, e.g.

 print $doc->json( pretty=>1, canonical=>1, utf8=>0 );

 $doc->model

Returns data as an RDF::Trine::Model, suitable for serialising as RDF or running SPARQL queries.

 $doc->serialise_model(as => $format)

As "model" but returns a string.

 $doc->add_to_model($model)

Adds data to an existing RDF::Trine::Model. Returns a reference to the document.
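The retrieval methods might be used together as follows. This is a sketch under assumptions: the page and base URI are hypothetical, and the per-format accessor is assumed to be called "objects"; "json" and "model" are named in this documentation.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Hypothetical page with one hCard and no declared profile.
my $html = '<html><head><title>Demo</title></head><body>'
         . '<p class="vcard"><span class="fn">Alice Example</span></p>'
         . '</body></html>';

my $doc = HTML::Microformats
            ->new_document($html, 'http://example.com/')
            ->assume_profile('hCard');

# List context yields a list of format objects (objects is an assumed name).
my @cards = $doc->objects('hCard');
printf "Found %d hCard(s)\n", scalar @cards;

# The same data as JSON, with stable key order for easier diffing.
print $doc->json(pretty => 1, canonical => 1), "\n";

# The same data as an RDF::Trine::Model; size() counts its statements.
my $model = $doc->model;
printf "Model holds %d statement(s)\n", $model->size;
```

The JSON form is convenient for handing data to non-Perl consumers, while the model is the better fit when the data will be merged with other RDF sources or queried.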

Utility Functions

 HTML::Microformats->modules

Returns a list of Perl modules, each of which implements a specific microformat.

 HTML::Microformats->formats

As per "modules", but strips 'HTML::Microformats::Format::' off the module name, and sorts alphabetically.
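A short sketch of listing the supported formats. It assumes the short-name method is called "formats" and that both utility methods can be called on the class; "modules" is named in this documentation.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Short names, e.g. 'hCard' (formats is an assumed method name).
my @formats = HTML::Microformats->formats;

# Full module names, e.g. 'HTML::Microformats::Format::hCard'.
my @modules = HTML::Microformats->modules;

printf "%d formats available\n", scalar @formats;
print "  $_\n" for @formats;
```

The short names are the ones "assume_profile" expects, so this list is a quick way to check spelling and case.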

WHY ANOTHER MICROFORMATS MODULE?

There already exist two microformats packages on CPAN (see Text::Microformat and Data::Microformat), so why create another?

Firstly, HTML::Microformats isn't being created from scratch. It's actually a fork/clean-up of a non-CPAN application (Swignition), and in that sense predates Text::Microformat (though not Data::Microformat).

It has a number of other features that distinguish it from the existing packages:

  • It supports more formats. HTML::Microformats supports hCard, hCalendar, rel-tag, geo, adr, rel-enclosure, rel-license, hReview, hResume, hRecipe, xFolk, XFN, hAtom, hNews and more.

  • It supports more patterns. HTML::Microformats supports the include pattern, abbr pattern, table cell header pattern, value excerpting and other intricacies of microformat parsing better than the other modules on CPAN.

  • It offers RDF support. One of the key features of HTML::Microformats is that it makes data available as RDF::Trine models. This allows your application to benefit from a rich, feature-laden Semantic Web toolkit. Data gleaned from microformats can be stored in a triple store; output in RDF/XML or Turtle; queried using the SPARQL or RDQL query languages; and more. If you're not comfortable using RDF, HTML::Microformats also makes all its data available as native Perl objects.

BUGS

Please report any bugs to <http://rt.cpan.org/>.

RELATED TO HTML::Microformats…

HTML::Microformats::Documentation::Notes.

Individual format modules:

  • HTML::Microformats::Format::adr

  • HTML::Microformats::Format::figure

  • HTML::Microformats::Format::geo

  • HTML::Microformats::Format::hAtom

  • HTML::Microformats::Format::hAudio

  • HTML::Microformats::Format::hCalendar

  • HTML::Microformats::Format::hCard

  • HTML::Microformats::Format::hListing

  • HTML::Microformats::Format::hMeasure

  • HTML::Microformats::Format::hNews

  • HTML::Microformats::Format::hProduct

  • HTML::Microformats::Format::hRecipe

  • HTML::Microformats::Format::hResume

  • HTML::Microformats::Format::hReview

  • HTML::Microformats::Format::hReviewAggregate

  • HTML::Microformats::Format::OpenURL_COinS

  • HTML::Microformats::Format::RelEnclosure

  • HTML::Microformats::Format::RelLicense

  • HTML::Microformats::Format::RelTag

  • HTML::Microformats::Format::species

  • HTML::Microformats::Format::VoteLinks

  • HTML::Microformats::Format::XFN

  • HTML::Microformats::Format::XMDP

  • HTML::Microformats::Format::XOXO

Similar modules: RDF::RDFa::Parser, HTML::HTML5::Microdata::Parser, XML::Atom::Microformats, Text::Microformat, Data::Microformat.

Related web sites: <http://microformats.org/>, <http://www.perlrdf.org/>.

AUTHOR

Toby Inkster <[email protected]>.

COPYRIGHT AND LICENCE

Copyright 2008-2012 Toby Inkster

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

DISCLAIMER OF WARRANTIES

THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.