Converts HTML to text with tables intact
Version 0.01
use HTML::FormatText::WithLinks::AndTables;

my $text = HTML::FormatText::WithLinks::AndTables->convert($html);
Or optionally...
my $conf = {
    # same as HTML::FormatText excepting below
    cellpadding   => 2,  # defaults to 1
    no_rowspacing => 1,  # bool, suppress vertical space between table rows
};
my $text = HTML::FormatText::WithLinks::AndTables->convert($html, $conf);
This module was inspired by HTML::FormatText::WithLinks, which has proven to be a useful `lynx -dump` work-alike. However, one frustration was that no other HTML converters I came across could deal effectively with HTML <TABLE>s. This module can do so in a rudimentary sense. The aim was to take a simple HTML-based email template and also convert it to text with the <TABLE> structure intact, for inclusion as "multipart/alternative" content. Further, it preserves the formatting specified by the <TD> tag's "align" attribute, and it preserves multiline text inside a <TD> element provided it is broken using <BR/> tags.
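To make the alignment behaviour concrete, here is a minimal sketch of how one line of cell text could be padded to a fixed column width according to an "align" value. The helper name pad_cell and its arguments are assumptions made for illustration only; this is not the module's actual code and may differ from how it lays out cells internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): pad one line of cell text
# to a fixed column width according to an HTML "align" value.
sub pad_cell {
    my ($text, $width, $align) = @_;
    $align = lc($align // 'left');
    my $gap = $width - length $text;
    return $text if $gap <= 0;                 # text already fills the column
    return (' ' x $gap) . $text if $align eq 'right';
    if ($align eq 'center') {
        my $left = int($gap / 2);              # extra space goes to the right
        return (' ' x $left) . $text . (' ' x ($gap - $left));
    }
    return $text . (' ' x $gap);               # default: left-aligned
}

# A cell broken with <BR/> yields several lines; each is padded on its own.
my @lines = map { pad_cell($_, 10, 'right') } ('Name:', 'Address:');
print "[$_]\n" for @lines;
```

Padding each line separately is what lets multiline cells stay aligned within their column, and it only works when the result is displayed in a fixed-width font.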
None by default.
Given the HTML below ...
<HTML><BODY>
<TABLE>
    <TR>
        <TD ALIGN="right">Name:</TD>
        <TD>Mr. Foo Bar</TD>
    </TR>
    <TR>
        <TD ALIGN="right">Address:</TD>
        <TD>
            #1-276 Quux Lane, <BR/>
            Schenectady, NY, USA, <BR/>
            12345
        </TD>
    </TR>
    <TR>
        <TD ALIGN="right">Email:</TD>
        <TD><a href="mailto:[email protected]">[email protected]</a></TD>
    </TR>
</TABLE>
</BODY></HTML>
... the (default) return value of convert() will be as follows.
       Name:   Mr. Foo Bar
    Address:   #1-276 Quux Lane,
               Schenectady, NY, USA,
               12345
      Email:   [1][email protected]
1. mailto:[email protected]
HTML::FormatText::WithLinks HTML::TreeBuilder
* This does not handle <TH> elements whatsoever!
* It assumes a fixed width font for display of resulting text.
* It doesn't work well on nested <TABLE>s or other nested blocks within <TABLE>s.
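One possible workaround for the <TH> limitation is to rename header cells to ordinary cells before conversion. The sketch below is an assumption, not something the module provides: th_to_td is a hypothetical helper, it has not been checked against the module's output, and a regex substitution like this is only reasonable for simple, trusted templates.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step (not part of the module): rename <TH>
# start and end tags to <TD> so header cells are treated as plain cells.
sub th_to_td {
    my ($html) = @_;
    # \b leaves tags such as <THEAD> untouched; attributes are preserved.
    $html =~ s{<(/?)th\b}{<${1}td}gi;
    return $html;
}

my $html = th_to_td('<TABLE><TR><TH ALIGN="right">Name:</TH><TD>Mr. Foo Bar</TD></TR></TABLE>');
print "$html\n";
# The rewritten markup could then be passed to convert() as shown earlier.
```

Any visual distinction between header and data cells is lost this way; the rewrite only keeps the header text from being dropped from the table layout.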
Shaun Fryer, <pause.cpan.org at sourcery.ca>
Please report any bugs or feature requests to bug-html-formattext-withlinks-andtables at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=HTML-FormatText-WithLinks-AndTables. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc HTML::FormatText::WithLinks::AndTables
You can also look for information at:
RT: CPAN's request tracker http://rt.cpan.org/NoAuth/Bugs.html?Dist=HTML-FormatText-WithLinks-AndTables
AnnoCPAN: Annotated CPAN documentation http://annocpan.org/dist/HTML-FormatText-WithLinks-AndTables
CPAN Ratings http://cpanratings.perl.org/d/HTML-FormatText-WithLinks-AndTables
Search CPAN http://search.cpan.org/dist/HTML-FormatText-WithLinks-AndTables
Everybody. :) <http://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants>
Copyright 2008 Shaun Fryer, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.