Crosstabulate Data in a Text File
xtab.py

xtab.py is a Python module and command-line program that rearranges data from a normalized format to a crosstabulated format. It takes data in this form:

StationDateValue
WQ-012006-05-234.5
WQ-022006-05-233.7
WQ-032006-05-236.8
WQ-012006-06-159.7
WQ-022006-05-155.1
WQ-032006-06-157.2
WQ-012006-07-1910
WQ-022006-07-196.1
WQ-032006-07-198.8

and rearranges it into this form:

Station2006-05-232006-06-152006-07-19
WQ-014.53.76.8
WQ-029.75.17.2
WQ-03106.18.8

Input and output are both text (CSV) files.

Capabilities

You can use the xtab program to:

Required and Optional Arguments

Required Arguments

-i <filename> The name of the input file from which to read data. This must be a text file, with data in a normalized format. The first line of the file must contain column names. -o <filename> The name of the output file to create. The output file will be created as a .csv file. Required argument. -r <column_name1> [column_name2 [...]] One or more column names to use as row headers (space delimited). Unique values of these columns will appear at the beginning of every output line. -c <column_name1> [column_name2 [...]] One or more column names to use as column headers in the output (space delimited). A crosstab column (or columns) will be created for every unique combination of values of these fields in the input. -v <column_name1> [column_name2 [...]] One or more column names with values to be used to fill the cells of the cross-table. If n columns names are specified, then there will be n columns in the output table for each of the column headers corresponding to values of the -c argument. The column names specified with the -v argument will be appended to the output column headers created from values of the -c argument. There should be only one value of the -v column(s) for each combination of the -r and -c columns; if there is more than one, a warning will be printed and only the first value will appear in the output. (That is, values are not combined in any way when there are multiple values for each output cell.)

Optional Arguments

-d Prints output column headers in two rows. The first row contains values of the columns specified by the -c argument, and the second row contains the column names specified by the -v argument. If this is not specified, output column headers are printed in one row, with elements joined by underscores to facilitate parsing by other programs. -f Use a temporary (sqlite) file instead of memory for intermediate storage. -k Keep (i.e., do not delete) the sqlite file. Only useful with the "-f" option. Unless the "-t" option is also used, the table name will be "src". -t <tablename> Name to use for the table in the intermediate sqlite database. Only useful with the "-f" and "-k" options. -e [filename] Log all error messages, to a file if the filename is specified or to the console if the filename is not specified. -q <filename> Log the sequence of SQL commands used to extract data from the input file to write the output file, including the result of each command. -h Print this help and exit.

Notes

Copyright and License

Copyright (c) 2008, R.Dreas Nielsen

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.