Converts program source code to HTML
(1) code2html [options] [input-file [output-file]]
(2) code2html -p [file [alternate-outfile]]
(3) code2html (as a CGI script; see the section on CGI)
code2html is a Perl script which converts program source code to syntax-highlighted HTML, or to any other format for which rules are defined.
(1) OPTIONS
input-file is the file which contains the program source code to be formatted. If not specified, or if a minus (-) is given, the code is read from STDIN.
output-file is the file to write the formatted code to. If not specified, or if a minus (-) is given, the code is written to STDOUT.
Specify the set of regular expressions to use. These have to be defined in a language file (see FILES below). To find out which language modes are defined, run code2html --modes.
This input is treated case-insensitively.
If not given, some heuristics will be used to determine the file language.
Prints progress information to STDERR.
Print out the source code with line numbers.
Print out the source code with line numbers. The line numbers link to themselves, which makes it easy to send links to individual lines.
Optional prefix to use for line number anchors.
Replace each occurrence of a <TAB> character with the right number of spaces to get to the next tabstop. Default is a tabstop width of 8 characters.
Specify an alternate file to take the language and output-format definitions from (see the section on FILES below).
Print all language modes and output formats currently defined to STDOUT and exit successfully. Also prints modes from a LANGUAGE-FILE given by --language-file if applicable.
If the language mode given with --language-mode cannot be found then use this mode.
--fallback plain, for instance, is useful when code2html is called from a script, to ensure that output is created.
Print a short help and exit successfully.
Print the program version and exit successfully.
Prints “Content-Type: text/html\n\n” (or whatever the output-format defines as a content-type) prior to the rest of the output. Useful if the script is invoked as a CGI script.
Selects the output-format. html is the default. To find out which output formats are defined, run code2html --modes.
Do not make use of the template defined by the output-format. For HTML this means that there will be no <html> and <head> tags, only the <pre> tags; this is typical for patch and CGI modes.
Overrides the default template for the given output format. If --no-header is given too, this has no effect, since the template is ignored anyway.
Set the title of the produced output file. Only works if the template supports setting the title.
Wrap lines after LINEWIDTH characters. Default is to not wrap lines at all.
Use LINEPREFIX at the start of wrapped lines. Default is "» ".
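The tab replacement described in the options above comes down to a small piece of column arithmetic. The following Python sketch only illustrates that arithmetic; code2html itself is a Perl script, and the function name expand_tabs is hypothetical.

```python
def expand_tabs(line, tabstop=8):
    """Replace each TAB with enough spaces to reach the next tabstop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Distance from the current column to the next multiple of tabstop.
            pad = tabstop - (column % tabstop)
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)


if __name__ == "__main__":
    # "ab" occupies columns 0-1, so six spaces are needed to reach column 8.
    print(repr(expand_tabs("ab\tc")))
```

A tab at the very start of a line expands to a full tabstop; a tab in the middle of a line expands only as far as the next tabstop boundary.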
(2) HTML patching
code2html -p [file [alternate-outfile]]
code2html also allows you to have inline source code in an HTML file. It can then take this HTML file and insert the syntax-highlighted code.
If no file is given, code2html reads from STDIN and writes to STDOUT. If just one file is given, it replaces this file with the output. If two files are provided, the first one is read from and the second one is written to.
To use this feature, just insert a line like this into your HTML file:
<!-- code2html add [options] <file> -->
The syntax-highlighted file will be inserted at this position, enclosed in <pre> tags.
All options that can be given on the command line, such as --linenumbers, work here. --help, --version, etc. work too, but it is not very sensible to use them :). Using --output-format to choose a non-HTML output format is not advisable. --content-type is ignored.
You may also write the program's source code directly in the HTML file with the following syntax:
<!-- code2html add [options]
<your program source code here>
-->
It is usually a good idea to at least give the --language-mode option to specify the language.
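To make the two comment forms above concrete, here is a rough Python sketch of how such directives could be located in an HTML file. It is not the script's actual parser: the regular expression and the name find_directives are assumptions made for illustration only.

```python
import re

# Matches "<!-- code2html add ... -->", including comments that span lines.
DIRECTIVE = re.compile(r"<!--\s*code2html\s+add\b(.*?)-->", re.DOTALL)


def find_directives(html):
    """Return the text of every code2html directive, in document order."""
    return [match.group(1).strip() for match in DIRECTIVE.finditer(html)]


if __name__ == "__main__":
    page = (
        "<p>Example</p>\n"
        "<!-- code2html add --linenumbers foo.c -->\n"
        "<!-- code2html add --language-mode c\n"
        "int x;\n"
        "-->\n"
    )
    for directive in find_directives(page):
        print(repr(directive))
```

In the single-line form the captured text holds the options and the file name; in the multi-line form the first line holds the options and the remaining lines hold the source code itself.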
(3) CGI
If the script is used as a CGI script (the GATEWAY_INTERFACE environment variable is set and no command line arguments are given), code2html reads its arguments either from the query string or from STDIN (methods GET and POST, respectively).
--content-type is switched on automatically and the output always goes to STDOUT.
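The CGI detection and argument reading described above can be sketched as follows. This is an illustration in Python rather than the script's own Perl; the function names are hypothetical, and the use of REQUEST_METHOD and QUERY_STRING follows the usual CGI conventions rather than anything stated in this manual.

```python
from urllib.parse import parse_qs


def invoked_as_cgi(environ, argv):
    """True if GATEWAY_INTERFACE is set and no command line arguments were given."""
    # argv[0] is the program name, so "no arguments" means at most one entry.
    return "GATEWAY_INTERFACE" in environ and len(argv) <= 1


def read_cgi_arguments(environ, stdin_text=""):
    """Parse arguments from STDIN for POST, otherwise from the query string."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        raw = stdin_text
    else:
        raw = environ.get("QUERY_STRING", "")
    parsed = parse_qs(raw, keep_blank_values=True)
    # Keep the last value if a parameter is given more than once.
    return {name: values[-1] for name, values in parsed.items()}


if __name__ == "__main__":
    env = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "QUERY_STRING": "language-mode=c&input-selector=file",
    }
    print(invoked_as_cgi(env, ["code2html"]))
    print(read_cgi_arguments(env))
```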
The following parameters/options are accepted:
`c', `cc', `pas', etc.
if not given, some heuristics are used to find out the language.
`plain', `c', etc. if language-mode cannot be found, use this one
either `file', `cgi-input1', `cgi-input2', or `REDIRECT_URL'
default: file
file to read from if input-selector is `file'
The source code to syntax highlight, for example from a <textarea> or from an upload. See input-selector.
The source code to syntax highlight, for example from a <textarea> or from an upload. See input-selector.
`yes', `no' or `link'
default: no
If 0, tabs are not replaced; otherwise each occurrence of a <TAB> character is replaced with the right number of spaces to get to the next tabstop.
default: 0
Sets the title of the file.
By default code2html tries to encode the output as either bz2/gz/Z if the client supports this (HTTP_ACCEPT_ENCODING) and the needed program is available on the server. You may need to modify @CGI_ENCODING in the script to match your program locations.
If no-encoding is defined as “true” code2html does not try to encode the output.
Why two cgi-inputs, you may ask. This is to allow your users to choose via a <form> interface whether they want to paste their file into a <textarea> or use a <browse> button to select their file. See the example on my home page.
Note that if $FILES_DISALLOWED_IN_CGI is 0, it is possible for your users to read all the files the httpd can read (if you don't run a CGI wrapper or something similar). By default this value is set to 1, so file reading via CGI is not allowed. You can allow it by setting $FILES_DISALLOWED_IN_CGI to 0 at the top of the script.
The input selector REDIRECT_URL needs a special explanation. The file name is formed from the two environment variables DOCUMENT_ROOT and REDIRECT_URL.
If you want Apache to automatically call code2html for all program source code files, you may do this by adding these two lines to your srm.conf:
AddHandler text/x-sourcecode .c .cc .cpp .pas .h .p
Action text/x-sourcecode /cgi-bin/code2html?input-selector=REDIRECT_URL&foo=
or something similar to this. In the AddHandler line you can choose which extensions to pass through code2html.
WARNING: Do not add .pl to this line and name this script “code2html.pl”. This will result in a loop.
Also make sure that you load the Action module (srm.conf).
Replace /cgi-bin/code2html with the virtual location under which the script can be accessed. Note the “foo=” part. Apache appends the URL of the file to display to the end of the action string. We do not need this, since we use the environment variable REDIRECT_URL; however, we do not want the URL to be added to the input-selector string. Therefore we append the “&foo=” part.
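The effect of the “&foo=” part can be shown with a short Python sketch. It assumes, as described above, that the requested URL is appended directly to the action string, and it assumes that the file name is a plain concatenation of DOCUMENT_ROOT and REDIRECT_URL; the function names and the example paths are hypothetical.

```python
from urllib.parse import parse_qs

ACTION = "/cgi-bin/code2html?input-selector=REDIRECT_URL"


def query_after_append(action, requested_url):
    """Return the parsed query string once the URL has been appended."""
    query = (action + requested_url).split("?", 1)[1]
    return parse_qs(query, keep_blank_values=True)


def file_to_read(environ):
    """Build the file name from the two environment variables."""
    # Assumption: the two values are simply joined, nothing more.
    return environ["DOCUMENT_ROOT"] + environ["REDIRECT_URL"]


if __name__ == "__main__":
    # Without "&foo=" the URL ends up glued to the input-selector value.
    print(query_after_append(ACTION, "/src/hello.c"))
    # With "&foo=" the URL lands in a dummy parameter that is ignored.
    print(query_after_append(ACTION + "&foo=", "/src/hello.c"))
    print(file_to_read({"DOCUMENT_ROOT": "/var/www", "REDIRECT_URL": "/src/hello.c"}))
```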
Thanks to Kevin Burton <[email protected]> for the idea. He also states that:
> It is more powerful if you use it in an Apache
> <Directory> tag
>
> <Directory /source>
>
> #with your Action tag here... this way you can
> #still have regular .java files on your server.
>
> </Directory>
>
Assuming code2html is in the current directory, you may type
code2html -l perl code2html.pl code2html.html
to convert the script into an HTML file.
Code2html looks for its configuration in several places:
the file specified by -L or --language-file if any
the files specified in the environment variable CODE2HTML_CONFIG, separated by colons
user's $HOME/.code2html.config
/etc/code2html.config
built in default languages
Entries in a file that is mentioned earlier in this list override rules from later files.
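The lookup order and the override rule above can be sketched as follows. This is a Python illustration only, not the script's code; the function names are hypothetical, and the rule sets are reduced to simple dictionaries keyed by language name.

```python
def config_candidates(language_file, environ):
    """List the configuration files in the order they take precedence."""
    paths = []
    if language_file:
        paths.append(language_file)
    # CODE2HTML_CONFIG holds a colon-separated list of files.
    paths.extend(p for p in environ.get("CODE2HTML_CONFIG", "").split(":") if p)
    home = environ.get("HOME")
    if home:
        paths.append(home.rstrip("/") + "/.code2html.config")
    paths.append("/etc/code2html.config")
    return paths


def merge_rules(rule_sets):
    """Merge rule sets so that earlier entries override later ones."""
    merged = {}
    # Apply the lowest-priority set first; higher-priority sets overwrite it.
    for rules in reversed(rule_sets):
        merged.update(rules)
    return merged


if __name__ == "__main__":
    env = {"CODE2HTML_CONFIG": "/a:/b", "HOME": "/home/u"}
    print(config_candidates("my.cfg", env))
    print(merge_rules([{"c": "user"}, {"c": "system", "perl": "system"}]))
```

The built-in default languages would form the last, lowest-priority rule set passed to merge_rules.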
The file structure must be valid Perl code.
The global variables %LANGUAGE and %STYLESHEET are already defined, so you should not redeclare them using “my”.
When you are looking for a model configuration to serve as a basis for your own configuration file, it is probably best to start out by checking the built-in definitions at the bottom of code2html.
If your pattern includes back references, as a lot of patterns do in Perl for example, then you have to use \2 instead of \1, \3 instead of \2, and so on. I really don't like this hack but it is a lot faster.
Example:
<<([^\n]*).*?^\2$
In this example the Perl << construct is matched, i.e. everything from a << up to a line that consists of exactly the same string that followed the <<. The \2 references the characters matched in the parentheses.
If you ever write language-specific rule files yourself, I'd be grateful if you could send them to me, so that I can make them available (with full credits, of course) on my home page for anyone whose needs they suit. Before you do so, you might also have a look at my site to check whether someone has already written a rule file for your favourite language.
The language recognition mechanism relies on specific patterns within the file name and the content of the processed file, such as file name extensions and shebangs (#!). This means that if the input is a pipe or a socket, the file name does not follow traditional naming conventions, or the content of the processed file is incomplete, the input language name should be specified using the --language-mode command line parameter.
Please report bugs to [email protected]. This program is still a beta release, so you should expect to find some.
Also have a look at my web site; perhaps a new version is already available at http://www.palfrader.org/code2html/.
Peter Palfrader, <[email protected]>. A lot of other people; see the contributors in the file itself.
Copyright (c) 1999, 2000 by Peter Palfrader & others.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.