Perl extension for converting CSV files to XML
use XML::CSV;
$csv_obj = XML::CSV->new();
$csv_obj = XML::CSV->new(\%attr);
$status = $csv_obj->parse_doc(file_name);
$status = $csv_obj->parse_doc(file_name, \%attr);
$csv_obj->declare_xml(\%attr);
$csv_obj->declare_doctype(\%attr);
$csv_obj->print_xml(file_name, \%attr);
XML::CSV is a new module and will be upgraded often, as my time permits. For the time being it uses the default values of a CSV_XS object to parse the (*.csv) document and then creates a Perl data structure with XML tag names and data. At this point it does not offer a write-as-you-parse interface, but that is the first upgrade planned for the next release. I will also allow more access to the data structures and add more documentation. I will also put in more support for XML, since currently only a simple XML structure is allowed. Currently you can modify the tag structure to allow for attributes. No DTD support is currently available, but it will be implemented in a coming release.
Since the module will provide both object and event interfaces, which one to use will depend on individual needs, system resources, and required performance. Of course, the DOM implementation takes up more resources and in some instances more time, but it is the easiest to use.
error_out - Turn on the error handling which will die on all errors and assign the error message to $XML::CSV::csvxml_error.
column_headings - Specifies the column headings to use. Passed as an array reference. Can be used as an alternative to using the first row in the file as the XML tag names. Since XML::CSV does not require you to parse the CSV file, you can provide your own data structure to parse.
column_data - Specifies the CSV data in a two-dimensional array. Passed as an array reference.
csv_xs - Specifies the CSV_XS object to use. This is used to create a custom CSV_XS object and override the default one created by XML::CSV.
headings - Specifies the number of rows to use as tag names. Defaults to 0. Ex. {headings => 1} (This will use the first row of data as xml tags)
sub_char - Specifies the character with which illegal tag characters will be replaced. Defaults to undef, meaning no substitution is done. To eliminate the characters use "" (empty string), or to replace them with another character see below. Ex. {sub_char => "_"} or {sub_char => ""}
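To make the sub_char idea concrete, here is a minimal sketch in plain Perl of replacing characters that are not valid in a tag name. It is not XML::CSV's own code: the helper name sanitize_tag and the simplified, ASCII-only notion of an "illegal" character are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::CSV. Assumes a simplified,
# ASCII-only rule: letters, digits, '_', '-' and '.' are kept.
sub sanitize_tag {
    my ($name, $sub_char) = @_;

    # undef means no substitution, mirroring the documented default.
    return $name unless defined $sub_char;

    # Copy the name, then replace every disallowed character.
    (my $clean = $name) =~ s/[^A-Za-z0-9_.\-]/$sub_char/g;
    return $clean;
}

print sanitize_tag('first name', '_'), "\n";   # first_name
print sanitize_tag('first name', ''),  "\n";   # firstname
```

Passing undef as the second argument returns the name unchanged, which matches the documented default of doing no substitution.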
version - Specifies the xml version. Ex. {version => '1.0'}
encoding - Specifies the type of encoding. The XML standard defaults the encoding to 'UTF-8' if it is not specifically set. Ex. {encoding => 'ISO-8859-1'}
standalone - Specifies the document as standalone (yes|no). If the document does not rely on an external DTD, the DTD is internal, or the external DTD does not affect the contents of the document, the standalone attribute should be set to 'yes'; otherwise 'no' should be used. For more info see the XML declaration documentation. Ex. {standalone => 'yes'}
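For orientation, the sketch below shows how these three attributes map onto a standard XML declaration. It is plain Perl and does not call XML::CSV; the helper name xml_declaration and the fallback to version '1.0' are assumptions for illustration, not documented module behavior.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::CSV: builds an XML declaration
# string from the attribute names described above.
sub xml_declaration {
    my ($attr) = @_;

    # Assumption: fall back to version 1.0 when none is given.
    my $decl = '<?xml version="' . ($attr->{version} // '1.0') . '"';

    # encoding and standalone are optional and appear in this order.
    $decl .= ' encoding="' . $attr->{encoding} . '"'
        if defined $attr->{encoding};
    $decl .= ' standalone="' . $attr->{standalone} . '"'
        if defined $attr->{standalone};

    return $decl . '?>';
}

print xml_declaration({
    version    => '1.0',
    encoding   => 'ISO-8859-1',
    standalone => 'yes',
}), "\n";
# <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
```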
source - Specifies the source of the DTD (SYSTEM|PUBLIC). Ex. {source => 'SYSTEM'}
location1 - URI to the DTD file. A public ID may be used if source is PUBLIC. Ex. {location1 => 'http://www.xmlproj.com/dtd/index_dtd.dtd'} or {location1 => '-//Netscape Communications//DTD RSS 0.90//EN'}
location2 - Optional second URI. Usually used if the location1 public ID is not found by the validating parser. Ex. {location2 => 'http://www.xmlproj.com/file.dtd'}
subset - Any other information that follows the DTD declaration. Usually includes the internal DTD, if any. Ex. {subset => 'ELEMENT first_name (#PCDATA)>\n<!ELEMENT last_name (#PCDATA)>'} You can even interpolate the string with $obj->{column_headings} to dynamically build the DTD. Ex. {subset => "ELEMENT $obj->{column_headings}[0] (#PCDATA)>"}
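As a rough guide to how source, location1 and location2 relate, the sketch below assembles a generic document type declaration in plain Perl. It does not reproduce what declare_doctype() emits: the helper name doctype_declaration is hypothetical, the root element name "records" is assumed from the file_tag default, and the subset attribute is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::CSV. Builds a generic
# <!DOCTYPE ...> line from a root element name and the attributes above.
sub doctype_declaration {
    my ($root, $attr) = @_;

    # SYSTEM takes one URI; PUBLIC takes a public ID and usually a URI.
    my $decl = "<!DOCTYPE $root $attr->{source} \"$attr->{location1}\"";
    $decl .= " \"$attr->{location2}\"" if defined $attr->{location2};

    return $decl . '>';
}

print doctype_declaration('records', {
    source    => 'SYSTEM',
    location1 => 'http://www.xmlproj.com/dtd/index_dtd.dtd',
}), "\n";
# <!DOCTYPE records SYSTEM "http://www.xmlproj.com/dtd/index_dtd.dtd">
```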
file_tag - Specifies the file parent tag. Defaults to "records". Ex. {file_tag => "file_data"} (Do not use < and > when specifying)
parent_tag - Specifies the record parent tag. Defaults to "record". Ex. {parent_tag => "record_data"} (Do not use < and > when specifying)
format - Specifies the character to use to indent nodes. Defaults to "\t" (tab). Ex. {format => " "} or {format => "\t\t"}
$csv_obj->{column_headings}
$csv_obj->{column_data}
Example #1:
This is a simple implementation which uses defaults
use XML::CSV;
$csv_obj = XML::CSV->new();
$csv_obj->parse_doc("in_file.csv", {headings => 1});
$csv_obj->print_xml("out.xml");
Example #2:
This example uses a passed headings array reference which is used along with the parsed data.
use XML::CSV;
$csv_obj = XML::CSV->new();
$csv_obj->{column_headings} = \@arr_of_headings;
$csv_obj->parse_doc("in_file.csv");
$csv_obj->print_xml("out.xml", {format => " ", file_tag => "xml_file", parent_tag => "record"});
Example #3:
First it passes a reference to an array of column headings and then a reference to a two-dimensional array of data, where the first index represents the row number and the second the column number. We also pass a custom Text::CSV_XS object to override the default object. This is useful for setting your own CSV_XS object's arguments before using the parse_doc() method. See 'perldoc Text::CSV_XS' for the different new() attributes.
use XML::CSV;
use Text::CSV_XS;
$default_obj_xs = Text::CSV_XS->new({quote_char => '"'});
$csv_obj = XML::CSV->new({csv_xs => $default_obj_xs});
$csv_obj->{column_headings} = \@arr_of_headings;
$csv_obj->{column_data} = \@arr_of_data;
$csv_obj->print_xml("out.xml");
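Example #3 leaves @arr_of_headings and @arr_of_data undefined. The sketch below shows one way they might be populated, in plain Perl with no modules required. The sample names and values are invented for illustration; only the shapes (an array of headings, and an array of rows indexed by row and then column) follow the description above.

```perl
use strict;
use warnings;

# Hypothetical sample data; names and values are illustrative only.
my @arr_of_headings = ('first_name', 'last_name');

# First index is the row number, second index is the column number.
my @arr_of_data = (
    ['Ada',   'Lovelace'],
    ['Grace', 'Hopper'],
);

# Both attributes are passed as array references.
my %attr = (
    column_headings => \@arr_of_headings,
    column_data     => \@arr_of_data,
);

# Row 1, column 0 of the two-dimensional array.
print $attr{column_data}[1][0], "\n";   # Grace
```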
Ilya Sterin, [email protected]
Text::CSV_XS