Remove unwanted XML / XHTML tags and attributes
use MKDoc::XML::Stripper;
my $stripper = MKDoc::XML::Stripper->new;
$stripper->allow (qw /p class id/);
my $ugly = '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>';
my $neat = $stripper->process_data ($ugly);
print $neat;
Should print:
<p class="para">Hello, World!</p>
MKDoc::XML::Stripper is a class which lets you specify a set of tags and attributes which you want to allow, and then cheekily strip any XML of unwanted tags and attributes.
In MKDoc, this is used to make editors write structural XHTML rather than presentational markup: it strips anything which looks like a <font> tag, a 'style' attribute, or other markup that would break the separation of structure from presentation.
This module does low-level XML manipulation. It will somehow parse even broken XML and try to do something with it. Do not use it unless you know what you're doing.
new() instantiates a new MKDoc::XML::Stripper object. load_def() loads a definition located somewhere in @INC under MKDoc/XML/Stripper.
Available definitions are:
You can also load your own definition file, for instance:
$stripper->load_def ('my_def.txt');
Definitions are simple text files as follows:
# allow p with 'class' and id
p class
p id
# allow more stuff
td class
td id
td style
# etc...
$stripper->allow ($tag, @attributes) allows "<$tag>" to appear in the stripped XML. Additionally, it allows @attributes to appear as attributes of <$tag>, so for instance:
$stripper->allow ('p', 'class', 'id');
Will allow the following:
<p>
<p class="foo">
<p id="bar">
<p class="foo" id="bar">
However, any extra attributes will be stripped. For instance,
<p class="foo" id="bar" style="font-color: red">
Will be rewritten as
<p class="foo" id="bar">
$stripper->disallow ($tag) explicitly disallows a tag and all its associated attributes. By default, everything is disallowed.
$stripper->process_data ($some_xml) strips $some_xml according to the rules that were given with the allow() and disallow() methods and returns the result. It does not modify $some_xml in place.
The file-based counterpart of process_data() strips '/an/xml/file.xml' according to the same rules and returns the result. It does not modify '/an/xml/file.xml' in place.
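To make the definition format and the allow() rules concrete, here is a toy, pure-Perl sketch. It is not MKDoc::XML::Stripper's code: parse_def() and filter_tag() are hypothetical helpers that turn definition-file lines into a tag => { attribute => 1 } hash and rebuild a single opening tag with only the allowed attributes. It assumes attribute values are plain strings, and it sorts the kept attributes for deterministic output, which the real module may not do.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MKDoc::XML::Stripper): parses the
# definition format, one "tag attribute" pair per line, '#' starts a comment.
sub parse_def {
    my ($text) = @_;
    my %allow;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                       # drop comments
        my ($tag, @attrs) = split ' ', $line;   # split on whitespace
        next unless defined $tag;               # skip blank/comment-only lines
        $allow{$tag} ||= {};                    # listing a tag allows the tag itself
        $allow{$tag}{$_} = 1 for @attrs;
    }
    return \%allow;
}

# Hypothetical helper: rebuilds an opening tag keeping only allowed
# attributes; returns undef when the tag itself is not allowed.
sub filter_tag {
    my ($allow, $tag, %attrs) = @_;
    return undef unless exists $allow->{$tag};
    my @kept = grep { $allow->{$tag}{$_} } sort keys %attrs;   # sorted for stable output
    return '<' . join(' ', $tag, map { qq{$_="$attrs{$_}"} } @kept) . '>';
}

my $allow = parse_def(<<'DEF');
# allow p with 'class' and id
p class
p id
DEF

print filter_tag($allow, 'p', class => 'foo', id => 'bar', style => 'font-color: red'), "\n";
# prints <p class="foo" id="bar">
```

A tag missing from the hash comes back as undef, which mirrors the "everything is disallowed by default" rule.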
MKDoc::XML::Stripper does not really parse the XML file you give it, nor does it care whether the XML is well-formed. It uses MKDoc::XML::Tokenizer to turn the XML / XHTML file into a series of MKDoc::XML::Token objects and operates strictly on a list of tokens.
For this same reason MKDoc::XML::Stripper does not support namespaces.
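The token-level approach described above can be pictured with a deliberately naive sketch. strip_xml() is a hypothetical function, and its regex split stands in for MKDoc::XML::Tokenizer (assumptions: attributes are double-quoted, and no '>' appears inside comments or attribute values). Like the module, it never checks well-formedness: it walks tag and text tokens, drops disallowed tags while keeping their text, and filters attributes of allowed tags.

```perl
use strict;
use warnings;

# Toy sketch only: a naive regex split stands in for MKDoc::XML::Tokenizer.
# $allow is { tag => { attr => 1 } }; anything not listed is dropped.
sub strip_xml {
    my ($allow, $xml) = @_;
    my $out = '';
    # The capturing group makes split keep the tag tokens between text tokens.
    for my $token (split /(<[^>]*>)/, $xml) {
        if ($token =~ m{^<(/?)([\w:-]+)([^>]*?)(/?)>$}) {
            my ($close, $tag, $attr_text, $empty) = ($1, $2, $3, $4);
            next unless exists $allow->{$tag};   # drop the tag, keep surrounding text
            my @kept;
            while ($attr_text =~ /([\w:-]+)\s*=\s*"([^"]*)"/g) {
                push @kept, qq{$1="$2"} if $allow->{$tag}{$1};
            }
            $out .= '<' . $close . join(' ', $tag, @kept) . ($empty ? ' /' : '') . '>';
        }
        else {
            $out .= $token;   # text, comments, etc. pass through untouched
        }
    }
    return $out;
}

my %allow = (p => { class => 1, id => 1 });
print strip_xml(\%allow, '<p class="para" style="color:red">Hello, <strong>World</strong>!</p>'), "\n";
# prints <p class="para">Hello, World!</p>
```

Because the tag-name pattern accepts ':' as an ordinary character, a prefixed name such as <x:p> is simply a different tag here, which is one way to picture why a purely token-level stripper has no notion of namespaces.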
Copyright 2003 - MKDoc Holdings Ltd.
Author: Jean-Michel Hiver
This module is free software and is distributed under the same license as Perl itself. Use it at your own risk.
See also: MKDoc::XML::Tokenizer, MKDoc::XML::Token