HTML::Copy: Copy an HTML file without breaking links.

VERSION

Version 1.31

SYNOPSIS

  use HTML::Copy;

  HTML::Copy->htmlcopy($source_path, $destination_path);

  # or

  $p = HTML::Copy->new($source_path);
  $p->copy_to($destination_path);

  # or

  open my $in, "<", $source_path or die "Cannot open $source_path: $!";
  $p = HTML::Copy->new($in);
  $p->source_path($source_path);    # can be omitted
                                    # when $source_path is in cwd.

  $p->destination_path($destination_path); # can be omitted
                                           # when $destination_path is in cwd.
  open my $out, ">", $destination_path or die "Cannot open $destination_path: $!";
  $p->copy_to($out);

DESCRIPTION

This module copies an HTML file without breaking the links in the file. It is a subclass of HTML::Parser.

REQUIRED MODULES

HTML::Parser

CLASS METHODS

htmlcopy

HTML::Copy->htmlcopy($source_path, $destination_path);

Parse the contents of $source_path, rewrite its links, and write the result to $destination_path.

parse_file

$html_text = HTML::Copy->parse_file($source_path, $destination_path);

Parse the contents of $source_path and rewrite its links as they would need to be for a copy at $destination_path, but do not create $destination_path; just return the modified HTML. The returned strings are converted to utf8.
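As a rough sketch of how parse_file might be used, the example below builds a tiny source tree in a temporary directory and asks for the rewritten HTML without creating the destination file. The directory layout, file names and HTML content are made up for illustration; only the parse_file call itself comes from this documentation.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use HTML::Copy;

# Hypothetical layout in a temporary directory, for illustration only.
my $root    = realpath(tempdir(CLEANUP => 1));
my $src_dir = File::Spec->catdir($root, 'src');
my $dst_dir = File::Spec->catdir($root, 'dst', 'sub');
make_path($src_dir, $dst_dir);

# A file to link to, so that the link target actually exists on disk.
my $other_path = File::Spec->catfile($src_dir, 'other.html');
open my $other, '>', $other_path or die "Cannot write $other_path: $!";
print {$other} "<html><body>other</body></html>\n";
close $other or die "Cannot close $other_path: $!";

# The page to be parsed, with one relative link.
my $source_path = File::Spec->catfile($src_dir, 'index.html');
open my $page, '>', $source_path or die "Cannot write $source_path: $!";
print {$page} <<'HTML';
<html><head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head><body><a href="other.html">other</a></body></html>
HTML
close $page or die "Cannot close $source_path: $!";

my $destination_path = File::Spec->catfile($dst_dir, 'index.html');

# Returns the modified HTML; $destination_path itself is not created.
my $html_text = HTML::Copy->parse_file($source_path, $destination_path);

print $html_text;
print "destination was not created\n" unless -e $destination_path;
```

Printing the result is a convenient way to inspect how the links would be rewritten before committing to a real copy with htmlcopy.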

CONSTRUCTOR METHODS

new

$p = HTML::Copy->new($source);

Create an instance of this module, specifying the source HTML.

The argument $source can be a file path or a file handle. When a file handle is passed, you may need to indicate the path of the file behind the handle with the "source_path" method. If "source_path" is not called, the file behind the handle is assumed to be located in the current working directory.

INSTANCE METHODS

copy_to

$p->copy_to($destination)

Parse the contents of the $source given to the new method, rewrite its links, and write the result to $destination.

The argument $destination can be a file path or a file handle. When $destination is a file handle, you may need to indicate the location of the file behind the handle with the "destination_path" method. "destination_path" must be called before calling "copy_to". If "destination_path" is not called, the file behind the handle is assumed to be located in the current working directory.
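The ordering requirement for file handles can be illustrated with a minimal sketch. The temporary directory layout and the HTML content are hypothetical; the methods used (new, destination_path, copy_to) are the ones described in this documentation.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);
use HTML::Copy;

# Hypothetical layout in a temporary directory, for illustration only.
my $root    = realpath(tempdir(CLEANUP => 1));
my $src_dir = File::Spec->catdir($root, 'src');
my $dst_dir = File::Spec->catdir($root, 'dst');
make_path($src_dir, $dst_dir);

my $source_path      = File::Spec->catfile($src_dir, 'index.html');
my $destination_path = File::Spec->catfile($dst_dir, 'index.html');

# A small page that links to itself, so the link target exists on disk.
open my $page, '>', $source_path or die "Cannot write $source_path: $!";
print {$page} <<'HTML';
<html><head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head><body><a href="index.html">self</a></body></html>
HTML
close $page or die "Cannot close $source_path: $!";

my $p = HTML::Copy->new($source_path);

open my $out, '>', $destination_path
    or die "Cannot write $destination_path: $!";

# The handle alone does not say where its file lives, so set the
# destination location first, then copy.
$p->destination_path($destination_path);
$p->copy_to($out);

close $out or die "Cannot close $destination_path: $!";
```

When a plain path is passed to copy_to instead of a handle, the separate destination_path call is not needed.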

parse_to

$p->parse_to($destination_path)

Parse the contents of the source given to the new method, rewrite its links, and return the HTML contents that would be written to $destination_path. Unlike copy_to, $destination_path is not created; the modified HTML is just returned. The returned strings are converted to utf8.

ACCESSOR METHODS

source_path

$p->source_path;
$p->source_path($path);

Get or set the source location. Usually the source location is specified with the "new" method. When a file handle is passed to "new" and the file behind the handle is not located in the current working directory, you need to use this method.

destination_path

$p->destination_path;
$p->destination_path($path);

Get or set the destination location. Usually the destination location is specified with the "copy_to" method. When a file handle is passed to "copy_to" and the file behind the handle is not located in the current working directory, you need to use this method before calling "copy_to".

encoding

$p->encoding;

Get the encoding of the source HTML.

io_layer

$p->io_layer;
$p->io_layer(':utf8');

Get or set the PerlIO layer used to read the source path and to write the destination path. Usually it is determined automatically from the charset tag in $source_path. If no charset is specified, the Encode::Guess module is used.

encode_suspects

@suspects = $p->encode_suspects;
$p->encode_suspects(qw/shiftjis euc-jp/);

Add suspect encodings to use when guessing the text encoding of the source HTML. If the source HTML has a charset tag, adding suspects is not required.

source_html

$p->source_html;

Obtain the contents of the source HTML.

NOTE

Cleaned-up paths should be given to HTML::Copy and its instances. For example, a verbose path like '/aa/bb/../cc' may cause links to be converted incorrectly. This is a limitation of the URI module's rel method. Cwd::realpath is useful for cleaning up paths.
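A minimal sketch of the cleanup step, using only core modules. The directory names aa, bb and cc mirror the example path above and are created in a temporary directory purely for illustration; Cwd::realpath generally needs the path to exist on disk, so treat this as one possible approach for paths that are already present.

```perl
use strict;
use warnings;
use Cwd qw(realpath);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical directories, created so that the verbose path resolves.
my $root = realpath(tempdir(CLEANUP => 1));
make_path(
    File::Spec->catdir($root, 'aa', 'bb'),
    File::Spec->catdir($root, 'aa', 'cc'),
);

# A verbose path containing '..', as in the note above.
my $verbose_path = File::Spec->catdir($root, 'aa', 'bb', '..', 'cc');

# realpath resolves '..' components (and symbolic links).
my $clean_path = realpath($verbose_path);

print "verbose: $verbose_path\n";
print "clean:   $clean_path\n";

# A path cleaned this way, with a file name appended, is the form to
# pass to HTML::Copy->new, copy_to, htmlcopy and the path accessors.
```

For a destination file that does not exist yet, one option is to clean the directory part and append the file name afterwards.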

AUTHOR

Tetsuro KURITA <[email protected]>

HTML::Copy (3pm)