NAME
       Data::TextFields - routines for parsing text into hash structure

SYNOPSIS
         use Data::TextFields;
         my $filename = "/bob/weblog/2002-12-15.txt";
         my $ref = TxtIn($filename);
         my $result = TxtOut($ref,$filename);

       Where /bob/weblog/2002-12-15.txt contains:

         TITLE sample article
         DESCRIPTION
         Article found
         while Googling.
         .
         AUTHOR Bob Root, Doug Core, and Susan Finger
         LINK http://example.com/articles/sample
         CATEGORY test
         CATEGORY sample

       and $ref contains:

         {
           title => 'sample article',
           description => "Article found\nwhile Googling.",
           author => 'Bob Root, Doug Core, and Susan Finger',
           link => 'http://example.com/articles/sample',
           category => [ 'test', 'sample' ],
         }

       Fields are defined by a hash table, which defaults to a subset of
       RSS 2.0 item fields. Setting it to:

         {
           type => 1,
           object => [ qw/color type/ ],
         }

       will parse this:

         type cars
         object red;chevy
         object blue;ford

       as this:

         {
           type => 'cars',
           object => [
             { color => 'red', type => 'chevy' },
             { color => 'blue', type => 'ford' },
           ],
         }

DESCRIPTION
       This module allows you to store structured data in a format that
       can be easily created in any decent text editor. The specific
       applications I had in mind when I wrote it were weblog entries,
       recipes, and photo-gallery metadata.

       Here's a full-featured example of the data format:

         # comments are ignored
         # blank lines outside of a field are also ignored
         Field1 either put text after the field name
         Field2
         or put as many lines of text as you want
         on separate lines, and end with a line
         containing only a "."
         .
         #multiple copies of a field become an array
         Field3 foo
         Field3 bar
         #if a field's definition contains an array of
         #strings, the value is split on ";" and
         #a hash is created
         Field4 foo;bar;baz

       Given the following field definitions:

         {
           field1 => 1,
           field2 => 1,
           field3 => 1,
           field4 => [ qw/name rank serial/ ]
         }

       The following hash will be returned by TxtIn():

         {
           field1 => 'either put text after...',
           field2 => 'or put as many lines...',
           field3 => [ 'foo', 'bar' ],
           field4 => { name => 'foo', rank => 'bar', serial => 'baz' },
         }

FUNCTIONS
       TxtIn($file,[$fieldref])
           Parse a file of text into a hash table, based on the
           definitions passed in $fieldref or the default definitions.
           Returns either a hashref (if one set of data was found in the
           file) or a reference to an array of hashrefs (if the file
           contained multiple records, separated by '###cut###').

       TxtOut($hash,$file,[$fieldref],[$orderref])
           Note: this function hasn't been written yet.

           Write the data structure in $hash out to $file, based on the
           definitions passed in $fieldref or the default definitions,
           in the order specified in $orderref or alphabetical by
           default.

PARSING
   Data Files
       The format of the data files is intended to be simple and robust,
       with a good chance of surviving email and other hazards. Comments
       starting with "#" are ignored (with one exception, described
       below), as are blank lines outside of fields. Case is not
       significant in field names, and whitespace is whitespace; you can
       freely mix any number of spaces and tabs, and they will be
       treated as a single space while parsing, and preserved in values.

       When the parser reads a line, the first word is interpreted as
       the field name (which must exist in the field definition table).
       If there are any other non-whitespace characters on the line,
       they are interpreted as the field's value. Otherwise, everything
       up until the next line containing only a "." is the value.

       If a field appears multiple times, the values are collected into
       an array.
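       The line-level rules above can be sketched in a few lines of
       Perl. This is only an illustration, not the module's actual
       code: parse_record_sketch() is a hypothetical helper, it assumes
       the field definition table uses lowercase keys, and it does not
       handle multiple records per file or escaped semicolons.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::TextFields): parse a single
# record from a string, following the rules described in the text.
sub parse_record_sketch {
    my ($text, $fields) = @_;
    my %record;
    my @lines = split /\n/, $text;

    while (@lines) {
        my $line = shift @lines;
        next if $line =~ /^\s*#/;    # comment outside a field
        next if $line =~ /^\s*$/;    # blank line outside a field

        # The first word is the field name; the rest is the value.
        my ($name, $value) = $line =~ /^\s*(\S+)\s*(.*?)\s*$/;
        my $key = lc $name;          # assumption: table keys are lowercase
        die "unknown field '$name'\n" unless exists $fields->{$key};

        if ($value eq '') {
            # No inline value: collect lines up to a lone ".".
            my @body;
            while (@lines) {
                my $next = shift @lines;
                last if $next =~ /^\s*\.\s*$/;
                push @body, $next;
            }
            $value = join "\n", @body;
        }

        # An arrayref definition: split on ";" and build a hash.
        if (ref($fields->{$key}) eq 'ARRAY') {
            my %h;
            @h{ @{ $fields->{$key} } } = split /\s*;\s*/, $value;
            $value = \%h;
        }

        # Repeated fields are collected into an array.
        if (!exists $record{$key}) {
            $record{$key} = $value;
        }
        elsif (ref($record{$key}) eq 'ARRAY') {
            push @{ $record{$key} }, $value;
        }
        else {
            $record{$key} = [ $record{$key}, $value ];
        }
    }
    return \%record;
}

# Usage with definitions shaped like the ones in this document.
my $rec = parse_record_sketch(
    "type cars\nobject red;chevy\nobject blue;ford\n",
    { type => 1, object => [ qw/color type/ ] },
);
```

       Dying on an unknown field name is one possible reading of "must
       exist in the field definition table"; the module itself may
       handle that case differently.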

       A file can contain more than one set of data. The special comment
       "###cut###" appearing by itself on a line terminates a record and
       starts a new one.

   Field Definition Table
       A hash of valid field names and their definitions, capitalized as
       they should appear in the data structure read in by TxtIn() or
       the file written out by TxtOut().

       A value of 1 in a field definition indicates that the data should
       be returned as a single string.

       If the value is an array of strings, the data will be returned as
       a hash. The keys are the strings in the array, and the values are
       the ';'-separated parts of the field's text, assigned in order. A
       literal semicolon in this kind of field value is represented by
       '\;'. Whitespace surrounding a separator will be stripped.

   $Data::TextFields::XMLSimple
       By default, XMLout() in the XML::Simple package will fold simple
       fields into attributes. This is never what I want to do. I could
       have switched XML libraries, but instead I added this flag.
       Setting it to "1" modifies the data structure created by TxtIn()
       so that it can be written out by XMLout(). Basically, all
       top-level scalar values become one-element arrays. Cheesy, but it
       worked.

BUGS
       TxtOut() is just a call to Data::Dumper() right now.

AUTHOR
       J Greely,