Extract selected elements from an HTML or XML file
hxextract [ -h | -? ] [ -x ] [ -s text ] [ -e text ] [ -b base ] element-or-class [ -c configfile | file-or-URL ]
hxextract outputs all elements with a certain name and/or class.
Input must be well-formed, since no HTML heuristics are applied.
The following options are supported:
-x
Use XML format conventions.
-s text
Insert text at the start of the output.
-e text
Insert text at the end of the output.
-b base
Use base as the URL base.
-c configfile
Read @chapter lines from configfile and extract elements from each file they name. Each line must be of the form "@chapter filename".
-h, -?
Print command usage.
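The options above might be combined as in the following sketch. The file names (ch1.html, ch2.html, chapters.conf, headings.html, all-headings.html) and their contents are made up for illustration, and the argument order follows the synopsis, with -c placed after the element name. The hxextract calls are guarded so that the script does nothing harmful where the tool is not installed.

```shell
# Sample inputs; file names and contents are made up for this illustration.
cat > ch1.html <<'EOF'
<html><body><h2 class="example">First</h2><p>Text.</p></body></html>
EOF
cat > ch2.html <<'EOF'
<html><body><h2>Second</h2><p>More text.</p></body></html>
EOF

# Config file for -c: one "@chapter filename" line per input file.
printf '@chapter %s\n' ch1.html ch2.html > chapters.conf

if command -v hxextract >/dev/null 2>&1; then
  # Wrap the extracted h2 elements of one file in a <div>
  # (-s inserts text at the start of the output, -e at the end).
  hxextract -s '<div>' -e '</div>' h2 ch1.html > headings.html

  # Extract h2 elements from every file named in chapters.conf.
  hxextract h2 -c chapters.conf > all-headings.html
fi
```

Since -s and -e only insert text around the output, it is up to the caller to make the two strings match (here an opening and a closing tag).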
The following operands are supported:
element-or-class
The name of an element to extract (e.g., "H2"), or the name of a class preceded by "." (e.g., ".example") or a combination of both (e.g., "H2.example").
file-or-URL
A file name or a URL. To read from standard input, use "-".
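As a rough illustration of the operand forms described above, the following sketch runs each kind of selector against a small sample file. The file page.html and its contents are hypothetical, and the hxextract calls are skipped if the tool is not installed.

```shell
# Sample input; the file name and contents are made up for this illustration.
cat > page.html <<'EOF'
<html><body>
<h2 class="example">Tagged heading</h2>
<h2>Plain heading</h2>
<p class="example">Tagged paragraph</p>
</body></html>
EOF

if command -v hxextract >/dev/null 2>&1; then
  hxextract h2 page.html          # element name only
  hxextract .example page.html    # class only, preceded by "."
  hxextract h2.example page.html  # element name and class combined
  hxextract p - < page.html       # "-" reads the document from standard input
fi
```

The sample file is deliberately well-formed, since hxextract applies no HTML heuristics to repair its input.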
To retrieve remote files through a proxy, set the environment variables http_proxy and ftp_proxy, e.g., http_proxy="http://localhost:8080/"
Remote files (specified with a URL) are currently supported only over HTTP. Password-protected files and files that depend on HTTP "cookies" are not handled. (You can use tools such as curl(1) or wget(1) to retrieve such files.)
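One way to apply the workaround with curl(1) is sketched below: curl retrieves the protected file and hxextract reads it from standard input. The URL, user name and password are placeholders, not real values, and the retrieved document must still be well-formed.

```shell
# Placeholder values; replace with a real URL and real credentials.
url="https://example.org/private/report.html"
selector="h2"

if command -v curl >/dev/null 2>&1 && command -v hxextract >/dev/null 2>&1; then
  # curl fetches the page (-s: no progress output, -u: HTTP authentication,
  # --max-time: give up after 10 seconds); hxextract reads it via "-".
  curl -s --max-time 10 -u "user:secret" "$url" | hxextract "$selector" - \
    || echo "retrieval or extraction failed" >&2
fi
```

For files that depend on cookies, curl's -b option (send cookies from a file or string) could be added in the same place as -u.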