SYNOPSIS

   use Chemistry::Formula qw(parse_formula);
   parse_formula('Pb (H (TiO3)2 )2 U [(H2O)3]2', \%count);

That is obviously not a real compound, but it demonstrates the capabilities of the routine. The call returns 1 and fills %count with

   %count = ( 'O' => 18, 'H' => 14, 'Ti' => 4, 'U' => 1, 'Pb' => 1 );

DESCRIPTION

This module provides a function that parses a string containing a chemical formula and counts the number of atoms of each element in it. It handles nested parentheses and square brackets, and correctly computes the stoichiometry when numbers follow the (possibly nested) parentheses or brackets.

No effort is made to evaluate the chemical plausibility of the formula. The example above parses just fine using this module, even though it is clearly not a viable compound. Charge balancing, bond valence, and so on are beyond the scope of this module.
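The way counts inside nested groups are multiplied through can be illustrated with a small stack-based parser. This is a minimal sketch under stated assumptions, not the algorithm Chemistry::Formula actually uses: the function name count_elements is hypothetical, it dies on error instead of setting $count{error}, it only strips whitespace, and it does not check symbols against a periodic table. Each opening delimiter pushes a fresh hash; each closing delimiter pops it and adds its contents, scaled by the number that follows, into the enclosing group.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stack-based parser, for illustration only. Each open
# group gets its own hash of counts; closing a group multiplies those
# counts by the trailing number (default 1) and merges them outward.
sub count_elements {
    my ($formula) = @_;
    (my $s = $formula) =~ s/\s+//g;          # whitespace is ignored
    my @stack = ({});                         # counts for each open group
    my @open;                                 # opening delimiters seen
    my %match = (')' => '(', ']' => '[');
    # integers, decimals (including a leading '.'), optional exponent
    my $num = qr/(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/;

    while (length $s) {
        if ($s =~ s/^([A-Z][a-z]?)($num)?//) {
            # element symbol with an optional count
            $stack[-1]{$1} += defined $2 ? $2 : 1;
        } elsif ($s =~ s/^([(\[])//) {
            # start a new group
            push @open, $1;
            push @stack, {};
        } elsif ($s =~ s/^([)\]])($num)?//) {
            # close a group and fold it into its parent
            my ($delim, $mult) = ($1, defined $2 ? $2 : 1);
            die "unbalanced '$delim'\n"
                unless @open && pop(@open) eq $match{$delim};
            my $group = pop @stack;
            $stack[-1]{$_} += $group->{$_} * $mult for keys %$group;
        } else {
            die "unexpected character at '$s'\n";
        }
    }
    die "unclosed '$open[-1]'\n" if @open;
    return %{ $stack[0] };
}

my %count = count_elements('Pb (H (TiO3)2 )2 U [(H2O)3]2');
print "$_ => $count{$_}\n" for sort keys %count;
```

For the SYNOPSIS formula this prints H => 14, O => 18, Pb => 1, Ti => 4 and U => 1, matching the documented result.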

Only one function is exported, "parse_formula". It takes a formula string and a hash reference as its arguments and returns 1 on success or 0 on failure.

   $ok = parse_formula('PbTiO3', \%count);

If the formula was parsed without trouble, "parse_formula" returns 1. If there was any problem, it returns 0 and $count{error} is filled with a string describing the problem. Parsing stops at the first error encountered; the rest of the string is not tested.

If the formula was parsed correctly, the %count hash contains element symbols as its keys and the number of each element as its values.
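The return convention can be consumed with a small helper. This is a minimal sketch: summarize is a hypothetical name, and the hashes passed to it are hard-coded stand-ins for what "parse_formula" is documented to produce (the first is the SYNOPSIS result, the second a made-up error message).

```perl
use strict;
use warnings;

# Hypothetical helper following the documented convention: a true
# return value means %$count maps element symbols to amounts; a false
# one means $count->{error} describes the problem.
sub summarize {
    my ($ok, $count) = @_;
    return "parse failed: $count->{error}" unless $ok;
    my @elements = sort keys %$count;
    my $total = 0;
    $total += $count->{$_} for @elements;   # atoms per formula unit
    return join(' ', map { "$_=$count->{$_}" } @elements) . " (total $total)";
}

# Assumed inputs standing in for real parse_formula results.
print summarize(1, { O => 18, H => 14, Ti => 4, U => 1, Pb => 1 }), "\n";
# prints "H=14 O=18 Pb=1 Ti=4 U=1 (total 38)"
print summarize(0, { error => 'hypothetical error message' }), "\n";
```

With the real module, the call would be my $ok = parse_formula($formula, \%count) followed by summarize($ok, \%count).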

Here is an example of a program that reads a formula from the command line and prints the weight of that formula unit in amu and its absorption cross section in barns at 9 keV.

   use strict;
   use warnings;
   use Data::Dumper;
   use Xray::Absorption;
   use Chemistry::Formula qw(parse_formula);

   my %count;
   parse_formula($ARGV[0], \%count) or die "$count{error}\n";

   print Data::Dumper->Dump([\%count], [qw(*count)]);
   my ($weight, $barns) = (0, 0);
   foreach my $k (keys %count) {
      $weight += Xray::Absorption->get_atomic_weight($k) * $count{$k};
      $barns  += Xray::Absorption->cross_section($k, 9000) * $count{$k};
   }
   printf "This weighs %.3f amu and absorbs %.3f barns at 9 keV.\n",
      $weight, $barns;

Pretty simple.
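The weight calculation is just a sum of atomic weight times count over the keys of %count. The sketch below works it through for PbTiO3 without needing Xray::Absorption; the small table of standard atomic weights is an assumption standing in for get_atomic_weight, and the counts are what parse_formula('PbTiO3', \%count) is expected to produce.

```perl
use strict;
use warnings;

# Assumed standard atomic weights in amu, replacing the
# Xray::Absorption lookup used in the program above.
my %atomic_weight = ( Pb => 207.2, Ti => 47.867, O => 15.999 );

# Counts expected from parse_formula('PbTiO3', \%count).
my %count = ( Pb => 1, Ti => 1, O => 3 );

my $weight = 0;
$weight += $atomic_weight{$_} * $count{$_} for keys %count;
printf "PbTiO3 weighs %.3f amu\n", $weight;   # 207.2 + 47.867 + 3*15.999
# prints "PbTiO3 weighs 303.064 amu"
```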

The parser is not brilliant. Here are the ground rules:

1. Element symbols must begin with a capital letter.

2. Whitespace is unimportant: it is removed from the string. So are dollar signs, underscores, and curly braces (in an attempt to handle TeX). A sequence like '/sub 3/' is converted to '3' (in an attempt to handle INSPEC).

3. Numbers can be integers or floating point numbers. Values such as 5, 0.5, 12.87, and .5 are all acceptable, as is exponential notation like 1e-2. Exponential notation must have a leading number to avoid confusion with element symbols: 1e-2 is ok, but e-2 is not.

4. Uncapitalized or unrecognized element symbols flag an error.

5. An error is flagged if the number of opening parentheses differs from the number of closing parentheses.

6. An error is flagged if any unexpected characters are found in the string.
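Rules 2, 3 and 5 can be sketched as three small checks. This is an illustration under assumptions, not the module's own code: clean_formula, $number and parens_balanced are hypothetical names, and treating square brackets like parentheses in the balance count is an assumption.

```perl
use strict;
use warnings;

# Rule 2 (sketch): turn '/sub 3/' into '3' first, while its spaces are
# still present, then drop whitespace, '$', '_', '{' and '}'.
sub clean_formula {
    my ($s) = @_;
    $s =~ s{/sub\s*([^/]*?)\s*/}{$1}g;
    $s =~ s/[\s\$_{}]//g;
    return $s;
}

# Rule 3 (sketch): integers, decimals with or without a leading digit,
# and exponential notation that starts with a number. Anchored so it
# tests a whole string.
my $number = qr/^(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?$/;

# Rule 5 (sketch): compare counts of opening and closing delimiters,
# assuming brackets are counted along with parentheses.
sub parens_balanced {
    my ($s) = @_;
    my $open  = () = $s =~ /[(\[]/g;
    my $close = () = $s =~ /[)\]]/g;
    return $open == $close;
}

print clean_formula('Pb Ti O/sub 3/'), "\n";   # prints "PbTiO3"
print clean_formula('$PbTiO_{3}$'), "\n";      # prints "PbTiO3"
for my $n ('5', '0.5', '12.87', '.5', '1e-2', 'e-2') {
    printf "%-5s %s\n", $n, ($n =~ $number ? 'ok' : 'rejected');
}
```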

ACKNOWLEDGMENTS

This was written at the suggestion of Matt Newville, who tested early versions.

The routine "matchingbrace" was swiped from the C::Scan module, which can be found on CPAN. C::Scan is maintained by Hugo van der Sanden.

AUTHOR

Bruce Ravel <bravel AT bnl DOT gov>

http://cars9.uchicago.edu/~ravel/software/

SVN repository: http://cars9.uchicago.edu/svn/libperlxray/