SYNOPSIS

(1) code2html [options] [input-file [output-file]]

(2) code2html -p [file [alternate-outfile]]

(3) code2html (as a CGI script; see the section on CGI)

DESCRIPTION

code2html is a Perl script which converts program source code to syntax-highlighted HTML, or to any other format for which rules are defined.

(1) OPTIONS

input-file

Is the file which contains the program source code to be formatted. If not specified or a minus (-) is given, the code will be read from STDIN.

output-file

Is the file to write the formatted code to. If not specified or a minus (-) is given, the code will be written to STDOUT.

-l, --language-mode

Specify the set of regular expressions to use. These have to be defined in a language file (see FILES below). To find out which language modes are defined, issue a code2html --modes.

This input is treated case-insensitively.

If not given, some heuristics will be used to determine the file language.

-v, --verbose

Prints progress information to STDERR.

-n, --linenumbers

Print out the source code with line numbers.

-N, --linknumbers

Print out the source code with line numbers. The line numbers link to themselves, which makes it easy to send links to individual lines.

-P, --prefix

Optional prefix to use for line number anchors.

-t, --replace-tabs[=TABSTOP-WIDTH]

Replace each occurrence of a <TAB> character with the right number of spaces to get to the next tabstop. The default tabstop width is 8 characters.
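The tabstop arithmetic described above can be sketched as follows. This is an illustration in Python, not code2html's own (Perl) implementation; the function name expand_tabs is made up for this example.

```python
def expand_tabs(line: str, tabstop: int = 8) -> str:
    """Replace each TAB with enough spaces to reach the next tabstop."""
    out = []
    column = 0  # current output column, counted from 0
    for ch in line:
        if ch == "\t":
            # Distance from the current column to the next multiple of tabstop.
            pad = tabstop - (column % tabstop)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

Note that a TAB sitting exactly on a tabstop still advances by a full tabstop width, so a TAB never expands to zero spaces.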

-L, --language-file=LANGUAGE-FILE

Specify an alternate file to take the language and output-format definitions from (see the section on FILES below).

-m, --modes

Print all language modes and output-formats currently defined to STDOUT and exit successfully. Also prints modes from a LANGUAGE-FILE given by --language-file, if applicable.

--fallback=LANG

If the language mode given with --language-mode cannot be found then use this mode.

--fallback plain, for instance, is useful when code2html is called from a script to ensure output is created.
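A calling script might assemble the command line along these lines. This is only a sketch: the helper name build_command is hypothetical, and it uses just the option spellings shown in this manual (-l, --fallback=LANG, input-file, output-file).

```python
from typing import List, Optional


def build_command(source: str, target: str,
                  language: Optional[str] = None,
                  fallback: str = "plain") -> List[str]:
    """Build an argument list for code2html with a fallback mode set."""
    cmd = ["code2html"]
    if language:
        # Explicit language mode; omitted when the heuristics should decide.
        cmd += ["-l", language]
    # If the requested mode is unknown, code2html falls back to this one.
    cmd.append("--fallback=" + fallback)
    cmd += [source, target]
    return cmd
```

The resulting list can be handed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with unusual file names.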

-h, --help

Print a short help and exit successfully.

-V, --version

Print the program version and exit successfully.

-c, --content-type

Prints “Content-Type: text/html\n\n” (or whatever the output-format defines as a content-type) prior to the rest of the output. Useful if the script is invoked as a CGI script.

-o, --output-format

Selects the output-format. html is the default. To find out which output-formats are defined, issue a code2html --modes.

-H, --no-header

Do not make use of the template defined by the output-format. For HTML this means that there will be no <html>, <head>, and no <pre> tags. Typical for patch and CGI modes.

--template=FILE

Overrides the default template for the given output format. If --no-header is given too, this has no effect, since the template is ignored anyway.

-T, --title

Set the title of the produced output file. Only works if the template supports setting the title.

-w, --linewidth=LINEWIDTH

Wrap lines after LINEWIDTH characters. Default is to not wrap lines at all.

-b, --linebreakprefix=LINEPREFIX

Use LINEPREFIX at the start of wrapped lines. Default is "» ".
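How --linewidth and --linebreakprefix interact can be pictured with the sketch below. It is an assumption-laden illustration in Python, not the script's own code: the name wrap_line is made up, and whether code2html counts the prefix towards LINEWIDTH is not stated in this manual (here it is not counted).

```python
from typing import List


def wrap_line(line: str, linewidth: int, prefix: str) -> List[str]:
    """Split one line after every `linewidth` characters.

    Continuation lines start with `prefix`; the prefix itself is not
    counted against the width (an assumption made for this sketch).
    """
    if linewidth <= 0 or len(line) <= linewidth:
        return [line]  # nothing to wrap
    chunks = [line[i:i + linewidth] for i in range(0, len(line), linewidth)]
    return [chunks[0]] + [prefix + chunk for chunk in chunks[1:]]
```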

(2) HTML patching

code2html -p [file [alternate-outfile]]

code2html also allows you to have inline source code in an html file. It can then take this html file and insert the syntax highlighted code.

If no file is given, code2html reads from STDIN and writes to STDOUT. If just one file is given it replaces this file with the output. If two files are provided, the first one is read from and the second one written to.

To use this feature, just insert a line like this into your html file:

  • <!-- code2html add [options] <file> -->

The syntax highlighted file will be inserted at this position, enclosed in <pre> tags.

All options that can be given on the command line, like --linenumbers etc., work. --help, --version, etc. work too; however, it is not very intelligent to use them :). Using --output-format to choose a non-HTML output-format is not advisable. --content-type is ignored.

You may also write the program's source code directly in the html file with the following syntax:

  • <!-- code2html add [options]

    <your program source code here>

    -->

It is usually a good idea to at least give the --language-mode option to specify the language.
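To make the two directive forms concrete, here is a rough Python sketch of how such comments could be located in an HTML file. It is not how code2html itself parses them; the regular expression, the function name find_directives and the returned dictionary layout are all assumptions of this example. It treats the first line of the comment as options (plus file name, in the first form) and any following lines as inline source code.

```python
import re
import shlex
from typing import Dict, List

# Matches "<!-- code2html add ... -->", including multi-line comments.
DIRECTIVE = re.compile(r"<!--\s*code2html\s+add\s+(.*?)-->", re.DOTALL)


def find_directives(html: str) -> List[Dict]:
    """Return one dict per code2html directive found in `html`."""
    found = []
    for match in DIRECTIVE.finditer(html):
        first, _, rest = match.group(1).partition("\n")
        args = shlex.split(first)
        if rest.strip():
            # Inline form: options on the first line, source code below.
            found.append({"options": args, "file": None,
                          "code": rest.strip("\n")})
        else:
            # File form: the last word on the line is the file to insert.
            found.append({"options": args[:-1],
                          "file": args[-1] if args else None,
                          "code": None})
    return found
```

A directive without any options on its first line is not handled by this sketch, which is one more reason to always give at least the language mode.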

(3) CGI

If the script is used as a CGI script (GATEWAY_INTERFACE environment variable set and no command line arguments given), code2html reads the arguments either from the query string or from STDIN (methods GET and POST, respectively).

--content-type is switched on automatically and the output always goes to STDOUT.
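The detection rule and the two ways of receiving arguments can be sketched like this. It is a Python illustration, not the script's code; the function names are invented, and only URL-encoded form data is handled (a file upload would arrive as multipart data, which this sketch ignores). QUERY_STRING, REQUEST_METHOD and CONTENT_LENGTH are the standard CGI environment variables.

```python
from typing import Dict, List, Mapping, TextIO
from urllib.parse import parse_qs


def runs_as_cgi(environ: Mapping[str, str], args: List[str]) -> bool:
    """CGI mode: GATEWAY_INTERFACE is set and no arguments were given.

    `args` is the argument list without the program name.
    """
    return "GATEWAY_INTERFACE" in environ and len(args) == 0


def read_cgi_params(environ: Mapping[str, str], stdin: TextIO) -> Dict[str, str]:
    """Read parameters from STDIN (POST) or the query string (GET)."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    # Keep the last value if a parameter is given more than once.
    return {key: values[-1]
            for key, values in parse_qs(raw, keep_blank_values=True).items()}
```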

The following parameters/options are accepted:

language-mode - optional

`c', `cc', `pas', etc.

If not given, some heuristics are used to find out the language.

fallback - optional

`plain', `c', etc. If language-mode cannot be found, use this one.

input-selector - optional

either `file', `cgi-input1', `cgi-input2', or `REDIRECT_URL'

default: file

filename

file to read from if input-selector is `file'

cgi-input1

The source code to syntax highlight. For example from a <textarea> or from an upload. See input-selector.

cgi-input2

The source code to syntax highlight. For example from a <textarea> or from an upload. See input-selector.

line-numbers - optional

`yes', `no' or `link'

default: no

replace-tabs - optional

If 0, tabs are not replaced; otherwise each occurrence of a <TAB> character is replaced with the right number of spaces to get to the next tabstop.

default: 0

title - optional

Sets the title of the file.

no-encoding - optional

By default code2html tries to encode the output as either bz2/gz/Z if the client supports this (HTTP_ACCEPT_ENCODING) and the needed program is available on the server. You may need to modify @CGI_ENCODING in the script to match your program locations.

If no-encoding is defined as “true” code2html does not try to encode the output.

Why two cgi-inputs, you may ask? This is to allow your users to choose via a <form> interface whether they want to insert their file into a <textarea> or use a <browse> button to select their file. See the example on my home page.

Note that if $FILES_DISALLOWED_IN_CGI is 0, it is possible for your users to read all the files the httpd can read (if you don't run a CGI wrapper or something like this). By default this value is set to 1, so file reading via CGI is not allowed. You can allow it by setting $FILES_DISALLOWED_IN_CGI to 0 at the top of the script.

The input selector REDIRECT_URL needs a special explanation. The file name is formed from the two environment variables DOCUMENT_ROOT and REDIRECT_URL.
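As an illustration, the file name could be put together roughly as below. The manual only says that the two variables are combined; joining them with a single slash, and the function name redirect_filename, are assumptions of this Python sketch.

```python
from typing import Mapping


def redirect_filename(environ: Mapping[str, str]) -> str:
    """Combine DOCUMENT_ROOT and REDIRECT_URL into a file name."""
    root = environ["DOCUMENT_ROOT"].rstrip("/")
    url = environ["REDIRECT_URL"].lstrip("/")
    # Exactly one slash between the two parts (assumed joining rule).
    return root + "/" + url
```

With DOCUMENT_ROOT set to /var/www and REDIRECT_URL set to /src/main.c, this yields /var/www/src/main.c.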

If you want Apache to automatically call code2html for all program source code files, you may do this by adding these two lines to your srm.conf:

  • AddHandler text/x-sourcecode .c .cc .cpp .pas .h .p

    Action text/x-sourcecode /cgi-bin/code2html?input-selector=REDIRECT_URL&foo=

or something similar to this. In the AddHandler line you can choose which extensions to pass through code2html.

WARNING: Do not add .pl to this line and name this script “code2html.pl”. This will result in a loop.

Also make sure that you load the Action module (srm.conf).

Replace /cgi-bin/code2html with the virtual location under which the file can be accessed. Note the “foo=” part. Apache appends the URL of the file to display at the end of the action part. We do not need this, since we use the environment variable REDIRECT_URL; however, we do not want the URL added to the input-selector string. Therefore we append the “&foo=” part.

Thanks to Kevin Burton <[email protected]> for the idea. He also states that

> It is more powerful if you use it in an Apache

> <Directory> tag

>

> <Directory /source>

>

> #with your Action tag here... this way you can

> #still have regular .java files on your server.

>

> </Directory>

>

EXAMPLE

Assuming code2html is in the current directory, you may type

code2html -l perl code2html.pl code2html.html

to convert the script into an HTML file.

FILES

Code2html looks for its configuration in several places.

  • the file specified by -L or --language-file if any

  • the files specified in the environment variable CODE2HTML_CONFIG, separated by colons

  • user's $HOME/.code2html.config

  • /etc/code2html.config

  • built in default languages

Entries in a file that is mentioned earlier in this list override rules from later files.
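The search order and the override rule can be summarised in a short sketch. This is Python written for illustration, not part of code2html; the names config_candidates and merge_rules are hypothetical, and each configuration layer is modelled as a plain dictionary from mode name to rule set.

```python
from typing import Dict, List, Mapping, Optional


def config_candidates(language_file: Optional[str],
                      environ: Mapping[str, str]) -> List[str]:
    """List configuration files, highest priority first."""
    paths = []
    if language_file:                      # -L / --language-file
        paths.append(language_file)
    # CODE2HTML_CONFIG holds a colon-separated list of files.
    paths += [p for p in environ.get("CODE2HTML_CONFIG", "").split(":") if p]
    home = environ.get("HOME")
    if home:
        paths.append(home + "/.code2html.config")
    paths.append("/etc/code2html.config")
    return paths                           # built-in defaults come last


def merge_rules(layers: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge layers so that entries from earlier layers win."""
    merged: Dict[str, str] = {}
    for layer in reversed(layers):         # apply lowest priority first
        merged.update(layer)
    return merged
```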

The file structure must be valid perl code.

The global variables %LANGUAGE and %STYLESHEET are already defined, so you should not redeclare them using “my”.

When you are looking for a model configuration to serve as a basis for your own configuration file, it is probably best to start out by checking the built-in definitions at the bottom of code2html.

If your pattern includes back references, as many Perl patterns do, then you have to use \2 instead of \1, \3 instead of \2, and so on. I really don't like this hack, but it is a lot faster.

Example:

<<([^\n]*).*?^\2$

In this example the perl << stuff (a here-document) is matched, i.e. everything from a << until a line that consists of exactly the same string that followed the <<. The \2 references the characters matched in the parentheses.
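One way to see why the numbers shift by one: if the whole rule is wrapped in an enclosing capture group before it is matched, the rule's own first group becomes group 2. Whether code2html does exactly this internally is an assumption of the following Python sketch, as are the flags (the example rule needs "." to match newlines and "^"/"$" to match at line boundaries) and the name match_heredoc.

```python
import re
from typing import Optional, Tuple

# The rule exactly as it would appear in a language file.
RULE = r"<<([^\n]*).*?^\2$"

# Wrapping the rule in an outer group makes the outer group number 1,
# so the rule's own parentheses become group 2.
WRAPPED = re.compile("(" + RULE + ")", re.DOTALL | re.MULTILINE)


def match_heredoc(text: str) -> Optional[Tuple[str, str]]:
    """Return (whole match, terminator string) or None."""
    match = WRAPPED.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

For the input "print <<EOF\nhello\nEOF\nprint 1;\n" the whole match runs from "<<EOF" up to and including the terminating "EOF" line, and the second group holds "EOF".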

If you ever write language specific rule files yourself, I'd be grateful if you could send those to me, so I could make them available (with full credits of course) on my homepage for anyone to grab, whenever some of those files suit someone else's needs. Before you do so you might also have a look at my site to check whether someone has already written a rule file for your favourite language.

NOTES

The language recognition mechanism relies on specific patterns within the file name and the content of the processed file, such as file name extensions and shebangs (#!). This means that if the input is a pipe or a socket, the file name does not follow traditional naming conventions, or the content of the processed file is incomplete, the input language name should be specified using the --language-mode command line parameter.
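The kind of heuristic meant here might look like the sketch below. It is a Python illustration only: the two lookup tables are hypothetical samples (the mode names c, cc, pas and perl are taken from this manual, the mapping is not), and the real script may check name and content in a different order.

```python
import os
import re
from typing import Optional

# Hypothetical sample tables; the real rules live in the language files.
EXTENSIONS = {".c": "c", ".cc": "cc", ".pas": "pas", ".pl": "perl"}
INTERPRETERS = {"perl": "perl"}


def guess_language(filename: str, first_line: str) -> Optional[str]:
    """Guess a language mode from the file name, then from a shebang."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    shebang = re.match(r"#!\s*(\S+)(?:\s+(\S+))?", first_line)
    if shebang:
        program = shebang.group(1).rsplit("/", 1)[-1]
        if program == "env" and shebang.group(2):
            program = shebang.group(2)   # "#!/usr/bin/env perl"
        return INTERPRETERS.get(program)
    return None   # caller should require --language-mode or a fallback
```

When the input comes from a pipe, there is no useful file name and only the shebang check can succeed, which is why giving --language-mode explicitly is the safer choice in that case.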

BUGS

Please report bugs to [email protected]. This program is still a beta release, so you should expect to find some.

Also have a look at my web site; perhaps a new version is already available at http://www.palfrader.org/code2html/.

AUTHOR

Peter Palfrader, <[email protected]>, and a lot of other people. See the contributors in the file itself.

LICENSE

Copyright (c) 1999, 2000 by Peter Palfrader & others.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.