Configuration sets for the RDFa parser
The third argument to the constructor for RDF::RDFa::Parser objects is a configuration set. This module provides such configuration sets.
Configuration sets are needed by the parser so that it knows how to handle certain features which vary between different host languages, or different RDFa versions.
All you need to know about is the constructor:
$config = RDF::RDFa::Parser::Config->new($host, $version, %options);
$host is the host language. Generally you would supply one of the following constants; the default is HOST_XHTML. Internet media types are accepted (e.g. 'text/html' or 'image/svg+xml'), but it's usually better to use a constant, as some media types are shared (e.g. HTML4 and HTML5 both use the same media type).
RDF::RDFa::Parser::Config->HOST_ATOM
RDF::RDFa::Parser::Config->HOST_DATARSS
RDF::RDFa::Parser::Config->HOST_HTML32
RDF::RDFa::Parser::Config->HOST_HTML4
RDF::RDFa::Parser::Config->HOST_HTML5
RDF::RDFa::Parser::Config->HOST_OPENDOCUMENT_XML (Flat XML: "FODT", "FODS", etc)
RDF::RDFa::Parser::Config->HOST_OPENDOCUMENT_ZIP ("ODT", "ODS", etc)
RDF::RDFa::Parser::Config->HOST_SVG
RDF::RDFa::Parser::Config->HOST_XHTML
RDF::RDFa::Parser::Config->HOST_XHTML5
RDF::RDFa::Parser::Config->HOST_XML
$version is the RDFa version. Generally you would supply one of the following constants; the default is RDFA_LATEST.
RDF::RDFa::Parser::Config->RDFA_10
RDF::RDFa::Parser::Config->RDFA_11
RDF::RDFa::Parser::Config->RDFA_GUESS
RDF::RDFa::Parser::Config->RDFA_LATEST
Version guessing: the root element is inspected for an attribute 'version'. If this exists and matches /\bRDFa\s+(\d+\.\d+)\b/i then that is used as the version. Otherwise, the latest version is assumed.
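The guessing rule above can be sketched in plain Perl. This is only an illustration of the described behaviour, not the module's own code: the function name guess_rdfa_version and the 'latest' return value are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RDF::RDFa::Parser::Config: returns the
# version number found in a root element's 'version' attribute, or the
# string 'latest' when the attribute is missing or does not match.
sub guess_rdfa_version {
    my ($version_attr) = @_;
    if (defined $version_attr
        and $version_attr =~ /\bRDFa\s+(\d+\.\d+)\b/i) {
        return $1;
    }
    return 'latest';
}

# Try a few sample attribute values, including a missing attribute.
for my $attr ('XHTML+RDFa 1.0', 'HTML+RDFa 1.1', 'XHTML 1.1', undef) {
    printf "%-18s => %s\n",
        defined $attr ? "'$attr'" : 'undef',
        guess_rdfa_version($attr);
}
```

Because the pattern is case-insensitive and only requires the text "RDFa" followed by whitespace and a dotted number, a value such as 'XHTML 1.1' falls through to the latest version.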
%options is a hash of additional options to use which override the defaults. While many of these are useful, they probably also reduce conformance to the official RDFa specifications. The following options exist; defaults for XHTML+RDFa1.0 and XHTML+RDFa1.1 are shown in brackets.
alt_stylesheet - magic rel="alternate stylesheet". [0]
atom_elements - process <feed> and <entry> specially. [0]
atom_parser - extract Atom 1.0 native semantics. [0]
auto_config - see section "Auto Config" [0]
bookmark_start, bookmark_end, bookmark_name - elements to treat like OpenDocument's <text:bookmark-start> and <text:bookmark-end> elements, and the associated text:name attribute. All three must be set to use this feature. Use Clark Notation to specify namespaces. [all undef]
cite_attr - support @cite [0]
datetime_attr - support @datetime attribute and HTML5 <time> element. [0]
default_profiles - THIS OPTION IS NO LONGER SUPPORTED!
dom_parser - parser to use to turn a markup string into a DOM. 'html', 'opendocument' (i.e. zipped XML) or 'xml'. ['xml']
embedded_rdfxml - find plain RDF/XML chunks within document. 0=no, 1=handle, 2=skip. [0]
full_uris - support full URIs in CURIE-only attributes. [0, 1]
graph - enable support for named graphs. [0]
graph_attr - attribute to use for named graphs. Use Clark Notation to specify a namespace. ['graph']
graph_type - graph attribute behaviour ('id' or 'about'). ['about']
graph_default - default graph name. [undef]
initial_context - space-separated list of URIs, which must be keys in %RDF::RDFa::Parser::InitialContext::Known [?]
inlist_attr - support @inlist. [0, 1]
longdesc_attr - support @longdesc [0]
lwp_ua - an LWP::UserAgent to use for HTTP requests. [undef]
ns - namespace for RDFa attributes. [undef]
prefix_attr - support @prefix rather than just @xmlns:*. [0, 1]
prefix_bare - support CURIEs with no colon+suffix. [0]
prefix_default - URI for default prefix (e.g. rel=":foo"). ['http://www.w3.org/1999/xhtml/vocab#']
prefix_nocase - DEPRECATED - shortcut for prefix_nocase_attr and prefix_nocase_xmlns.
prefix_nocase_attr - ignore case-sensitivity of CURIE prefixes defined via @prefix attribute. [0, 1]
prefix_nocase_xmlns - ignore case-sensitivity of CURIE prefixes defined via xmlns. [0, 1]
profile_attr - THIS OPTION IS NO LONGER SUPPORTED!
profile_pi - THIS OPTION IS NO LONGER SUPPORTED!
property_resources - @property works for resources [0, 1]
role_attr - support for XHTML @role [0]
safe_anywhere - allow Safe CURIEs in @rel/@rev/etc. [0, 1]
safe_optional - allow Unsafe CURIEs in @about/@resource. [0, 1]
skolemize - mint URIs instead of blank node identifiers. [0]
src_sets_object - @src sets object URI (like @href) [0, 1]
tdb_service - use thing-described-by.org to name some bnodes. [0]
typeof_resources - allow @typeof to occasionally apply to objects rather than subjects. [0, 1]
user_agent - a User-Agent header to use for HTTP requests. Ignored if lwp_ua is provided. [undef]
use_rtnlx - use RDF::Trine::Node::Literal::XML. 0=no, 1=if available. [0]
value_attr - support @value attribute (like @content) [0]
vocab_attr - support @vocab from RDFa 1.1. [0, 1]
vocab_default - default vocab URI (e.g. rel="foo"). [undef]
vocab_triple - generate triple from @vocab. [0, 1]
xhtml_base - process <base> element. 0=no, 1=yes, 2=use it for RDF/XML too. [1]
xhtml_elements - process <head> and <body> specially. (Different special handling for XHTML+RDFa 1.0 and 1.1.) [1, 2]
xhtml_lang - support @lang rather than just @xml:lang. [0]
xml_base - support for 'xml:base' attribute. 0=only RDF/XML; 1=except @href/@src; 2=always. [0]
xml_lang - Support for 'xml:lang' attribute. [1]
xmllit_default - Generate XMLLiterals enthusiastically. [1, 0]
xmllit_recurse - Look for RDFa inside XMLLiterals. [0, 1]
xmlns_attr - Support for 'xmlns:foo' to define CURIE prefixes. [1]
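Several of the options above (bookmark_start, bookmark_end, bookmark_name, graph_attr) take names in Clark Notation, which writes a namespaced name as "{namespace-uri}local-name". The following is a minimal sketch of building such values in plain Perl. The clark helper and the example.com namespace are hypothetical and exist only for this illustration; the OpenDocument text namespace is the standard one.

```perl
use strict;
use warnings;

# Hypothetical helper: format a namespace URI and local name in Clark
# Notation. A name with no namespace is returned unchanged.
sub clark {
    my ($ns, $local) = @_;
    return $local unless defined($ns) && length($ns);
    return sprintf('{%s}%s', $ns, $local);
}

# Standard OpenDocument text namespace.
my $text_ns = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

# Options hash in the shape the constructor's %options argument expects.
# The graph_attr namespace is a made-up example value.
my %options = (
    bookmark_start => clark($text_ns, 'bookmark-start'),
    bookmark_end   => clark($text_ns, 'bookmark-end'),
    bookmark_name  => clark($text_ns, 'name'),
    graph_attr     => clark('http://example.com/ns#', 'graph'),
);

# The bookmark feature needs all three options, so check before use.
my @missing = grep { !defined $options{$_} }
    qw(bookmark_start bookmark_end bookmark_name);
die "bookmark options missing: @missing\n" if @missing;

print "$_ => $options{$_}\n" for sort keys %options;
```

A hash built this way could then be passed as the trailing arguments of RDF::RDFa::Parser::Config->new($host, $version, %options).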
An alternative constructor "tagsoup" is provided with a useful set of options for dealing with content "from the wild".
The following full example parses RDFa 1.1 in an Atom document, also using the non-default 'atom_parser' option which parses native Atom elements into the graph too.
use RDF::RDFa::Parser;
$config = RDF::RDFa::Parser::Config->new(
    RDF::RDFa::Parser::Config->HOST_ATOM,
    RDF::RDFa::Parser::Config->RDFA_11,
    atom_parser => 1,
);
$parser = RDF::RDFa::Parser->new_from_url($url, $config);
$data = $parser->graph;
The following configuration set parses XHTML+RDFa 1.1 while also parsing any RDF/XML chunks that are embedded in the document.
use RDF::RDFa::Parser;
use RDF::RDFa::Parser::Config qw(HOST_XHTML RDFA_11);
$config = RDF::RDFa::Parser::Config->new(
    HOST_XHTML, RDFA_11, embedded_rdfxml => 1);
$parser = RDF::RDFa::Parser->new_from_url($url, $config);
$data = $parser->graph;
The following config is good for dealing with (X)HTML content from the wild:
$config = RDF::RDFa::Parser::Config->tagsoup;
$parser = RDF::RDFa::Parser->new_from_url($url, $config);
$data = $parser->graph;
RDF::RDFa::Parser.
Toby Inkster <[email protected]>.
Copyright 2008-2012 Toby Inkster
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.