Generate validating XML processor and applications from DTD
flexml [-ASHDvdnLXV] [-sskel] [-ppubid] [-iinit_header] [-uuri] [-rroottags] [-aactions] name[.dtd]
Flexml reads name.dtd, which must be a DTD (Document Type Definition) describing the format of XML (Extensible Markup Language) documents, and produces a "validating" XML processor with an interface to support XML applications. Proper applications can optionally be generated from special "action files", either for linking or for textual combination with the processor.
The generated processor will only validate documents that conform strictly to the DTD, without extending it. More precisely, in practice we restrict XML rule [28] to
[28r] doctypedecl ::= '<!DOCTYPE' S Name S ExternalID S? '>'
where the ExternalID denotes the DTD used. (One might say, in fact, that flexml implements "non-extensible" markup. :)
The generated processor is a flex(1) scanner, by default named name.l, with a corresponding C header file name.h for separate compilation of generated applications. Optionally flexml takes an actions file with per-element actions and produces a C file with element functions for an XML application with entry points called from the XML processor. (It can also fold the XML application into the XML processor to make stand-alone XML applications, but this prevents sharing of the processor between applications.)
In "OPTIONS" we list the possible options, in "ACTION FILE FORMAT" we explain how to write applications, in "COMPILATION" we explain how to compile produced processors and applications into executables, and in "BUGS" we list the current limitations of the system before giving standard references.
Flexml takes the following options.
Generate a stand-alone scanner application. If combined with -aactions then the application will be named as actions with the extension replaced by .l, otherwise it will be in name.l. Conflicts with -S, -H, and -D.
Uses the actions file to produce an XML application in the file with the same name as actions after replacing the extension with .c. If combined with -A, the stand-alone application will instead include the action functions.
Generate a dummy application with just empty functions to be called by the XML processor. If app_name is not specified on the command line, it defaults to name-dummy.c. If combined with -a actions, the application will insert the specified actions and be named as actions with the extension replaced by .c. Conflicts with -A; implied by -a unless one of -S, -H, or -D is specified.
Turns on debug mode in the flex scanner and also prints the details of the DTD analysis performed by flexml.
Generate the header file. If header_name is not specified on the command line, it defaults to name.h. Conflicts with -A; on by default if none of -S, -H, or -D is specified.
Makes the XML processor (as produced by flex(1)) count the lines in the input and keep the count available to XML application actions in the integer yylineno. (This is off by default as the performance overhead is significant.)
Prevents the XML processor (as produced by flex(1)) from reporting the errors it runs into on stderr. Instead, users will have to poll for error messages with the parse_err_msg() function. By default, error messages are written to stderr.
"Dry-run": do not produce any of the output files.
Sets the document type to be PUBLIC with the identifier pubid instead of SYSTEM, the default.
Puts a line containing #include "init_header" in the %{...%} section at the top of the generated .l file. This may be useful for making various flex #defines, for example YY_INPUT or YY_DECL.
Overrides the SYSTEM id of the accepted DTD. Sometimes useful when your DTD is placed in a subdirectory.
Restricts the XML processor to validate only documents with one of the root elements listed in the comma-separated roottags.
Generate the scanner. If scanner_name is not given on the command line, it defaults to name.l. Conflicts with -A; on by default if none of -S, -H, or -D is specified.
Use the skeleton scanner skel instead of the default.
This is an internal option mainly used to test versions of flexml not installed yet.
Sets FLEXML_BUFFERSTACKSIZE to stack_increment (100000 by default). This controls how much the data stack grows in each realloc().
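The effect of a fixed growth step can be pictured with a small C sketch. This is not flexml's actual code: the struct and function names are hypothetical, and only the idea of enlarging a buffer by a constant number of bytes per realloc() call is taken from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical byte stack that grows by a fixed step per realloc(). */
struct byte_stack {
    char  *data;       /* heap buffer, NULL while nothing is allocated */
    size_t used;       /* bytes currently stored */
    size_t capacity;   /* bytes currently allocated */
    size_t increment;  /* growth step, playing the role of stack_increment */
};

/* Append len bytes; returns 0 on success, -1 if realloc() fails. */
int byte_stack_push(struct byte_stack *s, const char *bytes, size_t len)
{
    assert(s != NULL && s->increment > 0);
    if (len == 0)
        return 0;
    while (s->used + len > s->capacity) {
        size_t new_capacity = s->capacity + s->increment;
        char *p = realloc(s->data, new_capacity);
        if (p == NULL)
            return -1;              /* the old buffer is still valid */
        s->data = p;
        s->capacity = new_capacity;
    }
    memcpy(s->data + s->used, bytes, len);
    s->used += len;
    return 0;
}
```

With a step of 8, pushing 5 bytes and then 6 more triggers two realloc() calls and ends with a capacity of 16. A larger step means fewer realloc() calls at the cost of more unused memory.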
Use STRING to differentiate multiple versions of flexml in the same C code, just like the -P flex argument.
Sets the URI of the DTD, used in the DOCTYPE header, to the specified uri (the default is the DTD name).
Be verbose: echo each DTD declaration (after parameter expansion).
Print the version of flexml and exit.
Action files, passed to the -a option, are XML documents conforming to the DTD flexml-act.dtd, which is the following:

    <!ELEMENT actions ((top|start|end)*,main?)>
    <!ENTITY % C-code "(#PCDATA)">
    <!ELEMENT top %C-code;>
    <!ELEMENT start %C-code;>
    <!ATTLIST start tag NMTOKEN #REQUIRED>
    <!ELEMENT end %C-code;>
    <!ATTLIST end tag NMTOKEN #REQUIRED>
    <!ELEMENT main %C-code;>
The elements should be used as follows:

top
    Use for top-level C code such as global declarations, utility functions, etc.

start
    Attaches the code as an action to the element with the name of the required "tag" attribute. The %C-code; component should be C code suitable for inclusion in a C block (i.e., within {...}, so it may contain local variables); furthermore, the following extensions are available:

    {attribute}: Can be used to access the value of the attribute as set with attribute=value in the start tag. In C, {attribute} will be interpreted depending on the declaration of the attribute. If the attribute is declared as an enumerated type like

        <!ATTLIST attrib (alt1 | alt2 |...) ...>

    then the C attribute value is of an enumerated type with the elements written {attribute=alt1}, {attribute=alt2}, etc.; furthermore, an unset attribute has the "value" {!attribute}. If the attribute is not an enumeration then {attribute} is a null-terminated C string (of type char*) and {!attribute} is NULL.

end
    Similarly attaches the code as an action to the end tag with the name of the required "tag" attribute; here, too, the %C-code; component should be C code suitable for inclusion in a C block. In case the element has "Mixed" contents, i.e., was declared to permit #PCDATA, the following variable is available:

    {#PCDATA}: Contains the text (#PCDATA) of the element as a null-terminated C string (of type char*). In case the Mixed contents element actually mixed text and child elements, pcdata contains the plain concatenation of the text fragments as one string.
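As a rough picture of the attribute rules, the C sketch below mimics what action code sees for an element with one enumerated attribute and one string attribute. All names are hypothetical stand-ins chosen for the illustration: in a real action the values would be written {align}, {align=left}, {!align}, {title} and so on, and flexml decides what they expand to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for an enumerated attribute declared as (left | right):
 * one enum value per alternative plus one meaning "unset". */
enum align_value { ALIGN_UNSET, ALIGN_LEFT, ALIGN_RIGHT };

/* Format the attribute values the way a start action might.
 * title models a non-enumerated attribute: a C string, or NULL if unset.
 * Returns the length of the formatted text, as snprintf() does. */
int describe(enum align_value align, const char *title, char *out, size_t size)
{
    const char *side = "default";   /* used when align is unset, i.e. {!align} */

    assert(out != NULL && size > 0);
    if (align == ALIGN_LEFT)        /* would be written {align=left} */
        side = "left";
    else if (align == ALIGN_RIGHT)  /* would be written {align=right} */
        side = "right";

    return snprintf(out, size, "%s:%s", side,
                    title != NULL ? title : "(no title)");
}
```

The point of the sketch is the two kinds of test: enumerated attributes are compared against enum values, including a distinct "unset" value, while string attributes are compared against NULL.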
Finally, an optional "main" element can contain the C main function of the XML application. Normally the main function should include (at least) one call of the XML processor:

    yylex(): Invokes the XML processor produced by flex(1) on the XML document found on the standard input (actually the yyin file handle: see the manual for flex(1) for information on how to change this as well as the name yylex).

If no main action is provided then the following is used:

    int main() { exit(yylex()); }
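A main function may also do more than call yylex() once on the standard input. The sketch below reads the document from a named file by pointing yyin at it before invoking the processor. To keep the sketch compilable on its own, yyin and yylex() are defined here as trivial stand-ins; in a real application they come from the flex-generated scanner and these two definitions must be dropped. The helper name process_file is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles alone; a flex-generated scanner
 * provides the real yyin and yylex(). */
FILE *yyin;
int yylex(void)
{
    int c;
    while ((c = getc(yyin)) != EOF)
        ;                           /* placeholder: just consume the input */
    return 0;
}

/* Run the processor on the named file instead of the standard input.
 * Returns the result of yylex(), or -1 if the file cannot be opened. */
int process_file(const char *path)
{
    int status;
    FILE *in;

    assert(path != NULL);
    in = fopen(path, "r");
    if (in == NULL)
        return -1;
    yyin = in;                      /* must be set before yylex() runs */
    status = yylex();
    fclose(in);
    return status;
}
```

The sketch handles a single document. Running a flex scanner on a second file additionally needs yyrestart(); see flex(1).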
It is advisable to use XML <![CDATA[ ... ]]> sections for the C code to make sure that all characters are properly passed to the output file.
Finally, note that flexml handles empty elements <tag/> as equivalent to <tag></tag>.
The following make(1) file fragment shows how one can compile flexml-generated programs:
    # Programs.
    FLEXML = flexml -v
    FLEX = flex

    # Generate linkable XML processor with header for application.
    %.l %.h: %.dtd
    	$(FLEXML) $<

    # Generate C source from flex scanner.
    %.c: %.l
    	$(FLEX) -Bs -o"$@" "$<"

    # Generate XML application C source to link with processor.
    # Note: The dependency must be of the form "appl.c: appl.act proc.dtd".
    %.c: %.act
    	$(FLEXML) -D -a $^

    # Direct generation of stand-alone XML processor+application.
    # Note: The dependency must be of the form "appl.l: appl.act proc.dtd".
    %.l: %.act
    	$(FLEXML) -A -a $^
The present version of flexml is to be considered in "early beta" state, so bugs should be expected (and the author would like to hear about them). Here are some known restrictions that we hope to overcome in the future:
The character set is merely ASCII (actually flex(1) handles 8 bit characters but only the ASCII character set is common with the XML default UTF-8 encoding).
ID type attributes are not validated for uniqueness; IDREF and IDREFS attributes are not validated for existence.
The ENTITY and ENTITIES attribute types are not supported.
NOTATION declarations are not supported.
The various xml: attributes are treated like any other attributes; in particular, xml:space should be supported.
The DTD parser is presently a perl hack so it may parse some DTDs badly; in particular, the expansion of parameter entities may not conform fully to the XML specification.
A child should be able to "return" a value for the parent (also called a synthesised attribute). Similarly, an element in Mixed contents should be able to inject text into the pcdata of the parent.
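Given the ASCII restriction listed above, it can be useful to check a document before handing it to a generated processor. The following C sketch tests whether a buffer holds only 7-bit characters. It is not part of flexml, and the function name is_ascii is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all len bytes of buf are 7-bit ASCII, 0 otherwise. */
int is_ascii(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] > 0x7F)
            return 0;               /* byte outside the ASCII range */
    return 1;
}
```

Any UTF-8 encoded character outside ASCII contains at least one byte above 0x7F, so a result of 1 means the document stays within the character set that the restriction describes.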
The skeleton scanner with the generic parts of XML scanning.
License, further documentation, and examples.
flex(1), Extensible Markup Language (XML) 1.0 (W3C Recommendation REC-xml-1998-0210).
Flexml was written by Kristoffer Rose.
The program is Copyright (c) 1999 Kristoffer Rose (all rights reserved) and distributed under the GNU General Public License (GPL, also known as "copyleft"), which clarifies that the author provides absolutely no warranty for flexml and ensures that flexml is and will remain available for all uses, even commercial.
I am grateful to NTSys (France) for supporting the development of flexml. Finally, I extend my sincere thanks to Jef Poskanzer, Vern Paxson, and the rest of the flex maintainers and GNU developers for a great tool.