Information retrieval using a simple relational index
stag-ir.pl -r person -k social_security_no -d Pg:mydb myrecords.xml
stag-ir.pl -d Pg:mydb -q 999-9999-9999 -q 888-8888-8888
Indexes stag nodes (XML elements) in a simple relational db structure - keyed by ID, with an XML blob as the value
Imagine you have a very large file of data in a stag-compatible format such as XML. You want to index all the elements of type person; each person can be uniquely identified by social_security_no, which is a direct subnode of person
The first thing to do is to build the index file, which will be stored in the database mydb
stag-ir.pl -r person -k social_security_no -d Pg:mydb myrecords.xml
You can then use the index "person-idx" to retrieve person nodes by their social security number
stag-ir.pl -d Pg:mydb -q 999-9999-9999 > some-person.xml
You can export using different stag formats
stag-ir.pl -d Pg:mydb -q 999-9999-9999 -w sxpr > some-person.xml
You can retrieve multiple nodes (although these need to be rooted to make a valid file)
stag-ir.pl -d Pg:mydb -q 999-9999-9999 -q 888-8888-8888 -top personset
Or you can use a list of IDs from a file (newline delimited)
stag-ir.pl -d Pg:mydb -qf my_ss_nmbrs.txt -top personset
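The workflow above amounts to a table keyed by ID whose value is the serialized node. The following Python sketch illustrates that idea only; it is not how stag-ir.pl is implemented. It assumes SQLite in place of Pg:mydb and a two-column layout (id, xml), which is a guess at the shape rather than the tool's real schema. The helper names build_index and fetch and the sample records are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample input, shaped like the myrecords.xml described above.
RECORDS = """<person_set>
  <person><social_security_no>999-9999-9999</social_security_no><name>A</name></person>
  <person><social_security_no>888-8888-8888</social_security_no><name>B</name></person>
</person_set>"""


def build_index(conn, xml_text, relation, key):
    """Store every <relation> element as an XML blob keyed by its <key> child."""
    # Assumed layout: one table per relation, ID as primary key, blob as value.
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{relation}" '
        "(id VARCHAR(255) PRIMARY KEY, xml TEXT NOT NULL)"
    )
    root = ET.fromstring(xml_text)
    for node in root.iter(relation):
        node_id = node.findtext(key)
        node.tail = None  # drop whitespace that follows the element in the file
        blob = ET.tostring(node, encoding="unicode")
        # Replace any existing row with the same ID (clobber behaviour).
        conn.execute(
            f'INSERT OR REPLACE INTO "{relation}" (id, xml) VALUES (?, ?)',
            (node_id, blob),
        )
    conn.commit()


def fetch(conn, relation, ids, top=None):
    """Return the stored blobs for the given IDs, optionally under a root node."""
    blobs = []
    for node_id in ids:
        row = conn.execute(
            f'SELECT xml FROM "{relation}" WHERE id = ?', (node_id,)
        ).fetchone()
        if row is not None:
            blobs.append(row[0])
    body = "".join(blobs)
    return f"<{top}>{body}</{top}>" if top else body


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, RECORDS, "person", "social_security_no")
    print(fetch(conn, "person", ["999-9999-9999", "888-8888-8888"], top="personset"))
```

Wrapping the results in a root node, as -top does, is what makes a multi-node result a single well-formed document.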
-d DB_NAME
This database will be used for storing the stag nodes
The name can be a logical name, a DBI locator, or DBStag shorthand - see DBIx::DBStag
The database must already exist
-clear
Deletes all data from the relation type (specified with -r) before loading
-insertonly
Does not check whether the ID in the file exists in the db - will always attempt an INSERT (and will fail if the ID already exists)
This is the fastest way to load data (only one SQL operation per node rather than two) but is only safe if there is no existing data
(Default is clobber mode - existing data with the same ID will be replaced)
-newonly
If there is already data in the specified relation in the db, and the XML being loaded specifies an ID that is already in the db, then this node will be ignored
(Default is clobber mode - existing data with the same ID will be replaced)
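The three load behaviours (default clobber, -insertonly, -newonly) can be sketched as below. This is an illustration under assumptions, not the script's actual code: it uses SQLite, a hypothetical person table with id and xml columns, and a hypothetical store function. It does follow the description in treating clobber and -newonly as an existence check plus a write, and -insertonly as a single INSERT.

```python
import sqlite3


def new_db():
    """Create an in-memory database with an assumed (id, xml) layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id VARCHAR(255) PRIMARY KEY, xml TEXT)")
    return conn


def store(conn, node_id, xml, mode="clobber"):
    """Store one node; mode is 'clobber', 'insertonly' or 'newonly'."""
    if mode == "insertonly":
        # One SQL operation; raises sqlite3.IntegrityError on a duplicate ID.
        conn.execute("INSERT INTO person (id, xml) VALUES (?, ?)", (node_id, xml))
        return "inserted"

    # The other modes first check whether the ID is already present.
    exists = conn.execute(
        "SELECT 1 FROM person WHERE id = ?", (node_id,)
    ).fetchone() is not None

    if not exists:
        conn.execute("INSERT INTO person (id, xml) VALUES (?, ?)", (node_id, xml))
        return "inserted"
    if mode == "newonly":
        return "ignored"  # keep the existing row untouched
    conn.execute("UPDATE person SET xml = ? WHERE id = ?", (xml, node_id))
    return "replaced"  # clobber: overwrite the existing row


if __name__ == "__main__":
    conn = new_db()
    print(store(conn, "1", "<person>a</person>"))
    print(store(conn, "1", "<person>b</person>"))
    print(store(conn, "1", "<person>c</person>", mode="newonly"))
```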
-transaction_size
A commit will be performed every n UPDATEs/INSERTs (and at the end)
Default is autocommit
Note that if you are using -insertonly together with transactions, and the input file contains an ID that is already in the database, then the transaction will fail because this script will try to insert a duplicate ID
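A minimal sketch of committing every n writes, with a final commit for any remainder, is shown below. It assumes SQLite and a hypothetical person table and load_in_batches function; the real script's transaction handling may differ in detail.

```python
import sqlite3


def new_db():
    """Create an in-memory database with an assumed (id, xml) layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id VARCHAR(255) PRIMARY KEY, xml TEXT)")
    return conn


def load_in_batches(conn, rows, transaction_size):
    """Insert (id, xml) rows, committing every transaction_size inserts.

    Returns the number of commits performed. A duplicate ID raises
    sqlite3.IntegrityError, leaving the current batch uncommitted.
    """
    commits = 0
    pending = 0
    for node_id, xml in rows:
        conn.execute("INSERT INTO person (id, xml) VALUES (?, ?)", (node_id, xml))
        pending += 1
        if pending == transaction_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final partial batch
        commits += 1
    return commits


if __name__ == "__main__":
    conn = new_db()
    rows = [(str(i), "<person/>") for i in range(5)]
    print(load_in_batches(conn, rows, 2))
```

In this sketch a duplicate ID partway through a batch leaves the earlier inserts of that batch uncommitted, so a caller that rolls back loses them; that is the failure case the note above warns about.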
-r RELATION-NAME
This is the name of the stag node (XML element) that will be stored in the index; for example, with the XML below you may want to use the node name person and the unique key id
  <person_set>
    <person>
      <id>...</id>
    </person>
    <person>
      <id>...</id>
    </person>
    ...
  </person_set>
This flag should only be used when you want to store data
-k UNIQUE-KEY
This node will be used as the unique/primary key for the data
This node should be nested directly below the node that is being stored in the index - if it is more than one level below, specify a path
This flag should only be used when you want to store data
-u UNIQUE-KEY
Synonym for -k
-create
If specified, this will create a table for the relation name given with -r; you should use this the first time you index a relation
-idtype TYPE
(optional)
This is the SQL datatype for the unique key; it defaults to VARCHAR(255)
If you know that your id is an integer, you can specify INTEGER here
If your id is always an 8-character field, you can do this
-idtype 'CHAR(8)'
This option only makes sense when combined with the -create option
-p PARSER
This can be the name of a stag supported format (xml, sxpr, itext) - XML is assumed by default
It can also be a module name - this module is used to parse the input file into a stag stream; see Data::Stag::BaseGenerator for details on writing your own parsers/event generators
This flag should only be used when you want to store data
-q QUERY-ID
Fetches the relation/node with unique key value equal to query-id
Multiple arguments can be passed by specifying -q multiple times
This flag should only be used when you want to query data
-top NODE-NAME
If this is specified in conjunction with -q or -qf, then all the query result nodes will be nested inside a node with this name (i.e. this provides a root for the resulting document tree)
-qf QUERY-FILE
This is a file of newline-separated IDs; this is useful for querying the index in batch
-keys
This will write a list of all primary keys in the index
Data::Stag
For more complex stag to database mapping, see DBIx::DBStag and the scripts
stag-db.pl uses file DBM indexes
stag-storenode.pl is for storing fully normalised stag trees
selectall_xml