Perl extension for formatting numbers
    use Number::Format;
    my $x = new Number::Format %args;
    $formatted = $x->round($number, $precision);
    $formatted = $x->format_number($number, $precision, $trailing_zeroes);
    $formatted = $x->format_negative($number, $picture);
    $formatted = $x->format_picture($number, $picture);
    $formatted = $x->format_price($number, $precision, $symbol);
    $formatted = $x->format_bytes($number, $precision);
    $number    = $x->unformat_number($formatted);

    use Number::Format qw(:subs);
    $formatted = round($number, $precision);
    $formatted = format_number($number, $precision, $trailing_zeroes);
    $formatted = format_negative($number, $picture);
    $formatted = format_picture($number, $picture);
    $formatted = format_price($number, $precision, $symbol);
    $formatted = format_bytes($number, $precision);
    $number    = unformat_number($formatted);
Perl, version 5.8 or higher.
POSIX.pm to determine locale settings.
Carp.pm is used for some error reporting.
These functions provide an easy means of formatting numbers in a manner suitable for displaying to the user.
There are two ways to use this package. One is to declare an object of type Number::Format, which you can think of as a formatting engine. The various functions defined here are provided as object methods. The constructor new() can be used to set the parameters of the formatting engine. Valid parameters are:

    THOUSANDS_SEP     - character inserted between groups of 3 digits
    DECIMAL_POINT     - character separating integer and fractional parts
    MON_THOUSANDS_SEP - like THOUSANDS_SEP, but used for format_price
    MON_DECIMAL_POINT - like DECIMAL_POINT, but used for format_price
    INT_CURR_SYMBOL   - character(s) denoting currency (see format_price())
    DECIMAL_DIGITS    - number of digits to the right of dec point (def 2)
    DECIMAL_FILL      - boolean; whether to add zeroes to fill out decimal
    NEG_FORMAT        - format to display negative numbers (def "-x")
    KILO_SUFFIX       - suffix to add when format_bytes formats kilobytes (trad)
    MEGA_SUFFIX       - suffix to add when format_bytes formats megabytes (trad)
    GIGA_SUFFIX       - suffix to add when format_bytes formats gigabytes (trad)
    KIBI_SUFFIX       - suffix to add when format_bytes formats kibibytes (iec)
    MEBI_SUFFIX       - suffix to add when format_bytes formats mebibytes (iec)
    GIBI_SUFFIX       - suffix to add when format_bytes formats gibibytes (iec)
They may be specified in upper or lower case, with or without a leading hyphen ( - ).
If THOUSANDS_SEP is set to the empty string, format_number will not insert any separators.
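As a minimal sketch of this behaviour, assuming the module is installed: the object name $plain is illustrative only, and both separators are passed explicitly so that the result does not depend on the current locale.

```perl
use strict;
use warnings;
use Number::Format;

# Illustrative object; an empty THOUSANDS_SEP disables digit grouping.
my $plain = Number::Format->new(
    -thousands_sep => '',
    -decimal_point => '.',
);

# The integer part is left ungrouped; rounding to 2 places still applies.
my $ungrouped = $plain->format_number(1234567.891, 2);
print "$ungrouped\n";
```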
The defaults for THOUSANDS_SEP, DECIMAL_POINT, MON_THOUSANDS_SEP, MON_DECIMAL_POINT, and INT_CURR_SYMBOL come from the POSIX locale information (see perllocale). If your POSIX locale does not provide MON_THOUSANDS_SEP and/or MON_DECIMAL_POINT fields, then the THOUSANDS_SEP and/or DECIMAL_POINT values are used for those parameters. Formerly, POSIX was optional but this caused problems in some cases, so it is now required. If this causes you hardship, please contact the author of this package at <[email protected]> (remove "SPAM" to get correct email address) for help.

If any of the above parameters are not specified when you invoke new(), then the values are taken from package global variables of the same name (e.g. $DECIMAL_POINT is the default for the DECIMAL_POINT parameter). If you use the :vars keyword on your "use Number::Format" line (see non-object-oriented example below) you will import those variables into your namespace and can assign values as if they were your own local variables. The default values for all the parameters are:

    THOUSANDS_SEP     = ','
    DECIMAL_POINT     = '.'
    MON_THOUSANDS_SEP = ','
    MON_DECIMAL_POINT = '.'
    INT_CURR_SYMBOL   = 'USD'
    DECIMAL_DIGITS    = 2
    DECIMAL_FILL      = 0
    NEG_FORMAT        = '-x'
    KILO_SUFFIX       = 'K'
    MEGA_SUFFIX       = 'M'
    GIGA_SUFFIX       = 'G'
    KIBI_SUFFIX       = 'KiB'
    MEBI_SUFFIX       = 'MiB'
    GIBI_SUFFIX       = 'GiB'
Note however that once you have called any function in this module through the non-object-oriented interface, further changes to those global variables have no effect on later non-OO calls. The object-oriented interface is recommended instead, for fewer headaches and a cleaner design.
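A short sketch of the recommended object-oriented style, assuming the module is installed: two independent formatting engines coexist without touching any global variable. The names $us and $de are illustrative, and the separators are given explicitly rather than taken from the locale.

```perl
use strict;
use warnings;
use Number::Format;

# Each object carries its own settings, so neither affects the other.
my $us = Number::Format->new(-thousands_sep => ',', -decimal_point => '.');
my $de = Number::Format->new(-thousands_sep => '.', -decimal_point => ',');

my $us_text = $us->format_number(1234567.89);
my $de_text = $de->format_number(1234567.89);

print "$us_text\n";    # grouped with commas, period as decimal point
print "$de_text\n";    # grouped with periods, comma as decimal point
```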
The DECIMAL_FILL and DECIMAL_DIGITS values are not set by the Locale system, but are definable by the user. They affect the output of format_number(). Setting DECIMAL_DIGITS is like giving that value as the $precision argument to that function. Setting DECIMAL_FILL to a true value causes format_number() to append zeroes to the right of the decimal digits until the length is the specified number of digits.
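The effect of DECIMAL_FILL can be sketched as follows, assuming the module is installed. The object names $bare and $filled are illustrative; the separators are set explicitly to keep the output independent of the locale.

```perl
use strict;
use warnings;
use Number::Format;

# Same settings except for DECIMAL_FILL.
my $bare = Number::Format->new(
    -thousands_sep  => ',',
    -decimal_point  => '.',
    -decimal_digits => 2,
);
my $filled = Number::Format->new(
    -thousands_sep  => ',',
    -decimal_point  => '.',
    -decimal_digits => 2,
    -decimal_fill   => 1,
);

# No $precision or $trailing_zeroes argument: the object settings apply.
my $short  = $bare->format_number(1234567.8);
my $padded = $filled->format_number(1234567.8);

print "$short\n";     # no zero padding
print "$padded\n";    # padded to DECIMAL_DIGITS places
```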
NEG_FORMAT is only used by format_negative() and is a string containing the letter 'x', where that letter will be replaced by a positive representation of the number being passed to that function. format_number() and format_price() utilize this feature by calling format_negative() if the number was less than 0.
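For instance, an accounting-style NEG_FORMAT might be set up like this. It is only a sketch, assuming the module is installed; $acct is an illustrative name and the separators are passed explicitly.

```perl
use strict;
use warnings;
use Number::Format;

# Accounting style: negative numbers are wrapped in parentheses.
my $acct = Number::Format->new(
    -thousands_sep => ',',
    -decimal_point => '.',
    -neg_format    => '(x)',
);

# format_number() hands negative values to format_negative().
my $loss = $acct->format_number(-1234.5);

# format_negative() applies the picture whether or not the value is negative.
my $forced = $acct->format_negative(42);

print "$loss\n";
print "$forced\n";
```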
KILO_SUFFIX, MEGA_SUFFIX, and GIGA_SUFFIX are used by format_bytes() when the value is at least 1024, 1024*1024, or 1024*1024*1024, respectively. The default values are "K", "M", and "G". These apply in the default "traditional" mode only. Note: TERA or higher are not implemented because of integer overflows on 32-bit systems.

KIBI_SUFFIX, MEBI_SUFFIX, and GIBI_SUFFIX are used by format_bytes() when the value is at least 1024, 1024*1024, or 1024*1024*1024, respectively. The default values are "KiB", "MiB", and "GiB". These apply in the "iec60027" mode only. Note: TEBI or higher are not implemented because of integer overflows on 32-bit systems.

The only restrictions on DECIMAL_POINT and THOUSANDS_SEP are that they must not be digits, must not be identical, and must each be one character. There are no restrictions on INT_CURR_SYMBOL.
For example, a German user might include this in their code:
    use Number::Format;
    my $de = new Number::Format(-thousands_sep   => '.',
                                -decimal_point   => ',',
                                -int_curr_symbol => 'DEM');
    my $formatted = $de->format_number($number);
Or, if you prefer not to use the object-oriented interface, you can do this instead:

    use Number::Format qw(:subs :vars);
    $THOUSANDS_SEP   = '.';
    $DECIMAL_POINT   = ',';
    $INT_CURR_SYMBOL = 'DEM';
    my $formatted = format_number($number);
Nothing is exported by default. To export the functions or the global variables defined herein, specify the function name(s) on the import list of the "use Number::Format" statement. To export all functions defined herein, use the special tag :subs. To export the variables, use the special tag :vars; to export both subs and vars you can use the tag :all.
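A small sketch of importing a single function by name, assuming the module is installed. Only round() is imported here because its result does not depend on locale settings; the variable names are illustrative.

```perl
use strict;
use warnings;

# Import just one function instead of the whole :subs tag.
use Number::Format qw(round);

my $four_places = round(3.14159, 4);   # round to 4 decimal places
my $hundreds    = round(1234, -2);     # negative precision rounds to hundreds

print "$four_places\n";
print "$hundreds\n";
```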
new( %args )

Creates a new Number::Format object. Valid keys for %args are any of the parameters described above. Keys may be in all uppercase or all lowercase, and may optionally be preceded by a hyphen (-) character. Example:

    my $de = new Number::Format(-thousands_sep   => '.',
                                -decimal_point   => ',',
                                -int_curr_symbol => 'DEM');

round($number, $precision)

Rounds the number to the specified precision. If $precision is omitted, the value of the DECIMAL_DIGITS parameter is used (default value 2). Both input and output are numeric (the function uses math operators rather than string manipulation to do its job). The value of $precision may be any integer, positive or negative. Examples:

    round(3.14159)       yields    3.14
    round(3.14159, 4)    yields    3.1416
    round(42.00, 4)      yields    42
    round(1234, -2)      yields    1200

Since this is a mathematical rather than string oriented function, there will be no trailing zeroes to the right of the decimal point, and the DECIMAL_POINT and THOUSANDS_SEP variables are ignored. To format your number using the DECIMAL_POINT and THOUSANDS_SEP variables, use format_number() instead.

format_number($number, $precision, $trailing_zeroes)

Formats a number by adding THOUSANDS_SEP between each set of 3 digits to the left of the decimal point, substituting DECIMAL_POINT for the decimal point, and rounding to the specified precision using round(). Note that $precision is a maximum precision specifier; trailing zeroes will only appear in the output if $trailing_zeroes is provided, or the parameter DECIMAL_FILL is set, with a value that is true (not zero, undef, or the empty string). If $precision is omitted, the value of the DECIMAL_DIGITS parameter (default value of 2) is used.

If the value is too large or too small to be represented as a regular number and must instead be shown in scientific notation, it is returned in scientific notation without further formatting.
Examples:

    format_number(12345.6789)             yields   '12,345.68'
    format_number(123456.789, 2)          yields   '123,456.79'
    format_number(1234567.89, 2)          yields   '1,234,567.89'
    format_number(1234567.8, 2)           yields   '1,234,567.8'
    format_number(1234567.8, 2, 1)        yields   '1,234,567.80'
    format_number(1.23456789, 6)          yields   '1.234568'
    format_number("0.000020000E+00", 7)   yields   '2e-05'

Of course the output would have your values of THOUSANDS_SEP and DECIMAL_POINT instead of ',' and '.' respectively.

format_negative($number, $picture)

Formats a negative number. Picture should be a string that contains the letter x where the number should be inserted. For example, for standard negative numbers you might use "-x", while for accounting purposes you might use "(x)". If the specified number begins with a "-" character, that will be removed before formatting, but formatting will occur whether or not the number is negative.

format_picture($number, $picture)

Returns a string based on $picture with the # characters replaced by digits from $number. If the length of the integer part of $number is too large to fit, the # characters are replaced with asterisks (*) instead. Examples:

    format_picture(100.023, 'USD ##,###.##')   yields   'USD 100.02'
    format_picture(1000.23, 'USD ##,###.##')   yields   'USD 1,000.23'
    format_picture(10002.3, 'USD ##,###.##')   yields   'USD 10,002.30'
    format_picture(100023,  'USD ##,###.##')   yields   'USD **,***.**'
    format_picture(1.00023, 'USD #.###,###')   yields   'USD 1.002,300'

The comma (,) and period (.) you see in the picture examples should match the values of THOUSANDS_SEP and DECIMAL_POINT, respectively, for proper operation. However, the THOUSANDS_SEP characters in $picture need not occur every three digits; the only use of that variable by this function is to remove leading commas (see the first example above). There may not be more than one instance of DECIMAL_POINT in $picture.
The value of NEG_FORMAT is used to determine how negative numbers are displayed. As a result, the output of this function may have unexpected spaces before and/or after the number. This is necessary so that positive and negative numbers are formatted into a space of the same size. If you are only using positive numbers and want to avoid this problem, set NEG_FORMAT to "x".

format_price($number, $precision, $symbol)

Returns a string containing $number formatted similarly to format_number(), except that the decimal portion may have trailing zeroes added to make it exactly $precision characters long, and the currency string will be prefixed.

The $symbol attribute may be one of "INT_CURR_SYMBOL" or "CURRENCY_SYMBOL" (case insensitive) to use the value of that attribute of the object, or a string containing the symbol to be used. The default is "INT_CURR_SYMBOL" if this argument is undefined or not given; if set to the empty string, or if set to undef and the INT_CURR_SYMBOL attribute of the object is the empty string, no currency will be added.

If $precision is not provided, the default of 2 will be used. Examples:

    format_price(12.95)   yields   'USD 12.95'
    format_price(12)      yields   'USD 12.00'
    format_price(12, 3)   yields   '12.000'

The third example assumes that INT_CURR_SYMBOL is the empty string.

format_bytes($number, %options)
format_bytes($number, $precision)  # deprecated

Returns a string containing $number formatted similarly to format_number(), except that large numbers may be abbreviated by adding a suffix to indicate 1024, 1,048,576, or 1,073,741,824 bytes. Suffix may be the traditional K, M, or G (default); or the IEC standard 60027 "KiB," "MiB," or "GiB" depending on the "mode" option.

Negative values will result in an error.

The second parameter can be either a hash that sets options, or a number. Using a number here is deprecated and will generate a warning; early versions of Number::Format only allowed a numeric value.
A future release of Number::Format will change this warning to an error. New code should use a hash instead to set options. If it is a number this sets the value of the "precision" option. Valid options are:

precision

Set the precision for displaying numbers. If not provided, a default of 2 will be used. Examples:

    format_bytes(12.95)                     yields   '12.95'
    format_bytes(12.95, precision => 0)     yields   '13'
    format_bytes(2048)                      yields   '2K'
    format_bytes(2048, mode => "iec")       yields   '2KiB'
    format_bytes(9999999)                   yields   '9.54M'
    format_bytes(9999999, precision => 1)   yields   '9.5M'

unit

Sets the default units used for the results. The default is to determine this automatically in order to minimize the length of the string. In other words, numbers greater than or equal to 1024 (or other number given by the 'base' option, q.v.) will be divided by 1024 and $KILO_SUFFIX or $KIBI_SUFFIX added; if greater than or equal to 1048576 (1024*1024), it will be divided by 1048576 and $MEGA_SUFFIX or $MEBI_SUFFIX appended to the end; etc.

However if a value is given for unit it will use that value instead. The first letter (case-insensitive) of the value given indicates the threshold for conversion; acceptable values are G (for giga/gibi), M (for mega/mebi), K (for kilo/kibi), or A (for automatic, the default). For example:

    format_bytes(1048576, unit => 'K')   yields   '1,024K'
                                         instead of '1M'

Note that the valid values to this option do not vary even when the suffix configuration variables have been changed.

base

Sets the number at which the $KILO_SUFFIX is added. Default is 1024. Set to any value; the only other useful value is probably 1000, as hard disk manufacturers use that number to make their disks sound bigger than they really are.

If the mode (see below) is set to "iec" or "iec60027" then setting the base option results in an error.

mode

Traditionally, bytes have been given in SI (metric) units such as "kilo" and "mega" even though they represent powers of 2 (1024, etc.) rather than powers of 10 (1000, etc.). This "binary prefix" causes much confusion in consumer products where "MB" may mean either 1,048,576 or 1,000,000 bytes, for example. The International Electrotechnical Commission has created standard IEC 60027 to introduce prefixes Ki, Mi, Gi, etc. ("kibibytes," "mebibytes," "gibibytes," etc.) to remove this confusion. Specify a mode option with either "traditional" or "iec60027" (or abbreviate as "trad" or "iec") to indicate which type of binary prefix you want format_bytes to use. For backward compatibility, "traditional" is the default. See http://en.wikipedia.org/wiki/Binary_prefix for more information.
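The format_bytes() options can be combined as in the following sketch, assuming the module is installed. The object name $fmt and the array @sizes are illustrative, and the separators are passed explicitly so that the grouping does not depend on the locale.

```perl
use strict;
use warnings;
use Number::Format;

my $fmt = Number::Format->new(-thousands_sep => ',', -decimal_point => '.');

my @sizes = (
    $fmt->format_bytes(2048),                      # traditional suffix
    $fmt->format_bytes(2048, mode => 'iec'),       # IEC suffix
    $fmt->format_bytes(1048576, unit => 'K'),      # force kilobytes
    $fmt->format_bytes(9999999, precision => 1),   # one decimal place
    $fmt->format_bytes(2000, base => 1000),        # decimal kilobytes
);

print "$_\n" for @sizes;
```

Note that base and mode => 'iec' are not combined here, since the text above states that doing so is an error.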
unformat_number($formatted)

Converts a string as returned by format_number(), format_price(), or format_picture(), and returns the corresponding value as a numeric scalar. Returns undef if the number does not contain any digits. Examples:

    unformat_number('USD 12.95')          yields   12.95
    unformat_number('USD 12.00')          yields   12
    unformat_number('foobar')             yields   undef
    unformat_number('[email protected]')   yields   1234567.8

The value of DECIMAL_POINT is used to determine where to separate the integer and decimal portions of the input. All other non-digit characters, including but not limited to INT_CURR_SYMBOL and THOUSANDS_SEP, are removed.

If the number matches the pattern of NEG_FORMAT or there is a "-" character before any of the digits, then a negative number is returned.

If the number ends with the KILO_SUFFIX, KIBI_SUFFIX, MEGA_SUFFIX, MEBI_SUFFIX, GIGA_SUFFIX, or GIBI_SUFFIX characters, then the number returned will be multiplied by the appropriate multiple of 1024 (or, if the base option is given, by the corresponding multiple of that value). Examples:

    unformat_number("4K", base => 1024)     yields   4096
    unformat_number("4K", base => 1000)     yields   4000
    unformat_number("4KiB", base => 1024)   yields   4096
    unformat_number("4G")                   yields   4294967296
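A round trip through format_number() and unformat_number() might look like this. It is a sketch only, assuming the module is installed; the variable names are illustrative and the separators are set explicitly.

```perl
use strict;
use warnings;
use Number::Format;

my $fmt = Number::Format->new(-thousands_sep => ',', -decimal_point => '.');

# Format a number, then recover the numeric value from the string.
my $text   = $fmt->format_number(1234567.8);
my $number = $fmt->unformat_number($text);

# A suffix is scaled by the given base; input without digits gives undef.
my $kilo = $fmt->unformat_number('4K', base => 1000);
my $none = $fmt->unformat_number('foobar');

print "$text => $number\n";
print "4K with base 1000 => $kilo\n";
print "no digits => ", (defined $none ? $none : 'undef'), "\n";
```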
Some systems, notably OpenBSD, may have incomplete locale support. Using this module together with setlocale(3) in OpenBSD may therefore not produce the intended results.
No known bugs at this time. Report bugs using the CPAN request tracker at <https://rt.cpan.org/NoAuth/Bugs.html?Dist=Number-Format> or by email to the author.
William R. Ward, [email protected] (remove "SPAM" before sending email, leaving only my initials)
perl(1).