## Friday, April 9, 2010

### Tabrot

Here's another one that stems from the practical problems of my everyday life.

So there I was, writing an article in LaTeX using tabular to create this huge table. If you don't know tabular, it supports this very bare-bones way of creating tables, where the ampersand separates the columns and the double backslash separates the rows, e.g.:

\begin{tabular}{c c c}
(1,1) & (1,2) & (1,3) \\
(2,1) & (2,2) & (2,3) \\
\end{tabular}

This particular table was something like a million rows and a billion columns... or at least it felt that way when I was typing it in. The thing was, though, that right after I finished typesetting and compiling it I noticed two things: (a) the content would make more sense if I swapped the rows with the columns, and (b) the damn table would also actually fit on the page if I did that.

Damn and blast. I'm not going to copy-paste it into submission and neither am I going to retype it, no matter how efficiently I can do it in vim... it's just wrong to do it manually. Just wrong!

So I wrote a quick script that could do it with just a little math and some basic parsing. And it worked.
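To give a rough idea of the "little math and basic parsing" involved, here is a minimal sketch (not the script itself) that rotates the body of the tabular example clockwise. It assumes a rectangular table with no escaped ampersands, and the names `rotate_clockwise`, `body`, and `rows` are made up for this illustration. The only math is the index mapping: the cell at row i, column j of an n-row table lands at row j, column n - 1 - i.

```python
# Minimal sketch: rotate the body of a LaTeX tabular clockwise.
# Assumes a rectangular table; all names here are illustrative only.

def rotate_clockwise(rows):
    n = len(rows)        # number of rows in the old table
    m = len(rows[0])     # number of columns in the old table
    # The rotated table has m rows and n columns.
    new = [[None] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            # Cell (i, j) ends up in row j, column n - 1 - i.
            new[j][n - 1 - i] = rows[i][j]
    return new

body = r"""(1,1) & (1,2) & (1,3) \\
(2,1) & (2,2) & (2,3) \\"""

# Basic parsing: a double backslash ends a row, "&" separates cells.
rows = [[cell.strip() for cell in line.split("&")]
        for line in body.split(r"\\") if line.strip()]

rotated = rotate_clockwise(rows)

# Join everything back into tabular syntax.
latex = "\n".join(" & ".join(row) + r" \\" for row in rotated)
print(latex)
```

The first output row is `(2,1) & (1,1) \\`: the old first column, read bottom to top, becomes the new first row.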

And then I figured I could very quickly add the option to go counter-clockwise with just a bit of effort. And then I thought that if I made this and that more in accordance with the functional wossname, I could knock up a couple more transitions very easily. And then I decided that I could make those other functions more function-oriented too, if I used those newfangled map, reduce, and filter functions I keep hearing so much about. And then I figured I could play with optparse for a change (I usually do my option parsing on a slightly lower level with getopt). And then all that was left was to put comments on the thing and maybe clean up the code a bit, and here it is.

Honestly, this script practically built itself...

Anyway, here's how to use it. Suppose you have a CSV file that you want rotated clockwise relative to its current layout:

cat file.csv | ./tabrot.py -c > new_file.csv

How about if the file used semicolons instead of commas for delimiters?

cat file.csv | ./tabrot.py -c -d';' > new_file.csv

And so on. It's pretty basic.
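For a rectangular table, the semicolon example can be approximated in a few lines of Python. This is only a sketch: the function `rotate_csv_clockwise` and its parameters are hypothetical, and the reverse-then-zip idiom is a stand-in for the script's own coordinate translation. The output delimiter defaults to a comma here, mirroring the `O_COL_DELIM` default in the listing.

```python
# Sketch of what `tabrot.py -c -d';'` does, for rectangular input only.
# The zip idiom is a swapped-in technique, not the script's own method.

def rotate_csv_clockwise(text, in_delim=";", out_delim=","):
    # Split the text into rows, and each row into fields.
    rows = [line.split(in_delim) for line in text.split("\n")]
    # Reversing the row order and then transposing is a clockwise rotation.
    rotated = [list(col) for col in zip(*rows[::-1])]
    # Join the fields and rows back into one string.
    return "\n".join(out_delim.join(row) for row in rotated)

print(rotate_csv_clockwise("a;b;c\n1;2;3"))
```

Unlike the script, this shortcut silently truncates ragged rows, because `zip` stops at the shortest one.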

Update: I added some stuff later on, like output delimiters (April 17th 2010).

And here's the code. Is it not pretty? Well, I think it's very pretty.
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    #
    # Tabrot
    #
    # The script reads the array from standard input, applies the specified
    # transformations and prints the new array to standard output. The
    # transformations include rotating the array (90°, 180°, 270°, and 0°) or
    # flipping it (horizontally, vertically, or both).
    #
    # The input array is expected to be in a text format. The particular row and
    # field delimiters can be set up with the appropriate options (see below).
    #
    # The functions used in the script can also be used via Python instead of a
    # commandline tool, if your fancy takes you that way.
    #
    # Usage:
    #   tabrot.py [OPTIONS]
    #
    # Options:
    #   Rotating:
    #       -c, --clockwise     rotate the array clockwise (right).
    #       -C, --counter-clockwise
    #                           rotate the array counter-clockwise (left).
    #       -r, --rotate-180    rotate the array 180° - make it upside-down.
    #       -m, --meh, --do-nothing
    #                           rotate the array 0° - not at all.
    #   Flipping (mirroring):
    #       -f, --horizontal-flip
    #                           flip the array horizontally.
    #       -v, --vertical-flip
    #                           flip the array vertically.
    #       -b, --horizontal-and-vertical-flip
    #                           flip the array both vertically and horizontally.
    #   Delimiters:
    #       -d DELIM, --field-delimiter=DELIM
    #                           specify a field (column) delimiter instead of ",".
    #       -D DELIM, --row-delimiter=DELIM
    #                           specify a row delimiter instead of "\n".
    #       -o DELIM, --output-field-delimiter=DELIM
    #                           specify an output field delimiter instead of ",".
    #       -O DELIM, --output-row-delimiter=DELIM
    #                           specify an output row delimiter instead of "\n".
    #       -S, --output-delimiters-same
    #                           specify output delimiters as the same as input
    #                           delimiters.
    #   Printing:
    #       -s TEMPLATE, --print-template=TEMPLATE
    #                           set a sprintf-like template for cells, default: "%s"
    #       -e STRING, --empty-field-symbol=STRING
    #                           set a string to use for empty cells instead of " ".
    #       -n, --no-newline    do not append new line at the end of output.
    #   -h, --help            show this help message and exit
    #
    # Examples:
    #   echo -en "a,b,c\n1,2,3" | ./tabrot.py -c            # result: 1,a\n2,b\n3,c
    #   echo 'a&b&c|1&2&3' | ./tabrot.py -d'&' -D'|' -f     # result: c&b&a|3&2&1
    #
    # Author:
    #   Konrad Siek <konrad.siek@gmail.com>
    #
    # License:
    #   Copyright 2010 Konrad Siek
    #
    #   This program is free software: you can redistribute it and/or modify it
    #   under the terms of the GNU General Public License as published by the Free
    #   Software Foundation, either version 3 of the License, or (at your option)
    #   any later version. See  <http://www.gnu.org/licenses/> for details.

    # Default constants that get used by all the functions unless other values are
    # specified via arguments.
    COL_DELIM = ','
    ROW_DELIM = '\n'
    O_COL_DELIM = ','
    O_ROW_DELIM = '\n'
    EMPTY_SYM = ' '
    PRINT_TPL = '%s'

    def rotate_clockwise(array, empty=EMPTY_SYM):
        """ Rotate a 2D array clockwise (rightward) returning a copy of the array.
        By a 2D array I mean a list containing lists containing strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _cw_coord_trans, empty=empty)

    def rotate_counterclockwise(array, empty=EMPTY_SYM):
        """ Rotate a 2D array counter-clockwise (leftward) returning a copy of the
        array. By a 2D array I mean a list containing lists containing strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _ccw_coord_trans, empty=empty)

    def rotate_180_degrees(array, empty=EMPTY_SYM):
        """ Rotate a 2D array upside-down (180° as the name implies) returning a
        copy of the array. By a 2D array I mean a list containing lists containing
        strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _180_coord_trans, swap=False, empty=empty)

    def flip_horizontal(array, empty=EMPTY_SYM):
        """ Flip a 2D array horizontally (left-to-right) returning a copy of the
        array. By a 2D array I mean a list containing lists containing strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _flip_horz_trans, swap=False, empty=empty)

    def flip_vertical(array, empty=EMPTY_SYM):
        """ Flip a 2D array vertically (upside-down) returning a copy of the
        array. By a 2D array I mean a list containing lists containing strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _flip_vert_trans, swap=False, empty=empty)

    def flip_horizontal_and_vertical(array, empty=EMPTY_SYM):
        """ Flip a 2D array vertically and horizontally (both upside-down and
        left-to-right) returning a copy of the array. By a 2D array I mean a list
        containing lists containing strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _flip_both_trans, swap=False, empty=empty)

    def rotate_not_at_all(array, empty=EMPTY_SYM):
        """ Rotate a 2D array 0° returning a copy of the array. That is to say,
        do not rotate the array at all, but apply all the delimiter splitting and
        conversions that would've been applied had the array been rotated.

        By a 2D array I mean a list containing lists containing strings.

        If an empty element occurs (a string of length 0), then it gets replaced
        with blanks - whatever is provided through the parameter 'empty'.

        If the array is ragged (contains rows of different lengths), then it will
        be filled out with blanks during the transformation.

        Note that the original array is left intact."""

        return transpose(array, _meh_coord_trans, swap=False, empty=empty)

    def transpose(arr, coord_trans, swap=True, empty=EMPTY_SYM):
        """ Perform a specific transformation on the given array returning a new
        version of the array.

        The general functioning can be roughly (and cryptically) described as:
            (i', j') ← coord_trans(i, j, n, m)       i = 1...n, j = 1...m
            ∀i,∀j new_array[i'][j'] ← old_array[i][j]

        That is, the coordinates of each element in the old array are translated
        into the coordinates the element should be in in the new array, and when
        those are ready, the element is pasted there.

        The parameter coord_trans specifies a function that translates indexes of
        the old array to the indexes of the new array, and it is of type:
            int->int->int->int->(int, int)

        This means it takes 4 integers as arguments and returns a two-integer
        tuple. The parameters are:
            * i - a row index of the old array (1st dimension)
            * j - a column index of the old array (2nd dimension)
            * n - the number of rows
            * m - the number of columns (the maximum number of columns, if ragged)

        An example of a coord_trans function (that rotates the array clockwise) is:
            coord_trans = lambda i, j, n, m: (j, n - 1 - i)

        The parameter swap indicates whether the array will be put on one of its
        sides or if it will be put on its top or bottom. True indicates that the
        array will be on the side after the transformation (rows become columns and
        vice-versa; in other words, i' will be a function of j and j' a function
        of i). False indicates that the array will be either as it was or
        upside-down or similar (columns remain columns, etc; in other words, i'
        will be a function of i, and j' a function of j).

        The parameter empty is a string which will be used if the cell with a given
        index does not exist in the original array (e.g. because it was ragged) or
        instead of any cell that holds an empty string (string of length 0)."""

        isize = len(arr)
        jsize = reduce(max, map(len, arr))
        trans_f = lambda n, m: [[empty for c in range(m) ] for r in range(n)]
        trans = trans_f(jsize, isize) if swap else trans_f(isize, jsize)

        for i in range(isize):
            for j in range(len(arr[i])):
                ni, nj = coord_trans(i, j, isize, jsize)
                #print "%s, %s\t%s, %s\t'%s'" % (i,j, ni, nj, arr[i][j])
                trans[ni][nj] = arr[i][j] if len(arr[i][j]) else empty

        return trans

    _cw_coord_trans  = lambda i, j, n, m: (j, n-1-i)
    _ccw_coord_trans = lambda i, j, n, m: (m-1-j, i)
    _180_coord_trans = lambda i, j, n, m: (n-1-i, m-1-j)
    _meh_coord_trans = lambda i, j, n, m: (i, j)
    _flip_horz_trans = lambda i, j, n, m: (i, m-1-j)
    _flip_vert_trans = lambda i, j, n, m: (n-1-i, j)
    _flip_both_trans = lambda i, j, n, m: (n-1-i, m-1-j)

    def parse(text, col_delim=COL_DELIM, row_delim=ROW_DELIM):
        """ Read a block of text and divide it according to the specified row and
        column delimiters to form a 2D array (a list of lists of strings)."""

        return [[c for c in r.split(col_delim)] for r in text.split(row_delim)]

    def tostr(arr, col_delim=COL_DELIM, row_delim=ROW_DELIM, tp = PRINT_TPL):
        """ Create a string representation of a given 2D array using the specified
        delimiters to separate rows and columns.

        A template may be specified according to python string formatting utilities
        for all the cells to use. For instance, using tp='"%20s"' will create an
        output string where each cell uses a minimum of 20 characters and is
        surrounded by double quotes."""

        return row_delim.join(map(lambda c: col_delim.join([tp%f for f in c]), arr))

    _C_CW, _C_CCW, _C_180, _C_VERT, _C_HORZ, _C_BOTH, _C_MEH = range(6) + [None]

    _OPERATIONS = {
        _C_CW: rotate_clockwise,
        _C_CCW: rotate_counterclockwise,
        _C_180: rotate_180_degrees,
        _C_HORZ: flip_horizontal,
        _C_VERT: flip_vertical,
        _C_BOTH: flip_horizontal_and_vertical,
        _C_MEH: rotate_not_at_all,
    }

    if __name__ == '__main__':
        from optparse import OptionParser, OptionGroup
        from sys import argv
        from os.path import basename

        # Prepare the parser.
        usage = '%s [OPTIONS]' % basename(argv[0])
        parser = OptionParser(usage=usage)

        # Prepare all the parse options that have to do with rotating the array.
        rotate = OptionGroup(parser, "Rotating")
        rotate.add_option('-c', '--clockwise', \
            action="append_const", dest="operations", const=_C_CW, \
            help='rotate the array clockwise (right).')
        rotate.add_option('-C', '--counter-clockwise', \
            action="append_const", dest="operations", const=_C_CCW, \
            help='rotate the array counter-clockwise (left).')
        rotate.add_option('-r', '--rotate-180', \
            action="append_const", dest="operations", const=_C_180, \
            help=u'rotate the array 180° - make it upside-down.')
        rotate.add_option('-m', '--meh', '--do-nothing', \
            action="append_const", dest="operations", const=_C_MEH, \
            help=u'rotate the array 0° - not at all.')
        parser.add_option_group(rotate)

        # Prepare all the parse options that have to do with flipping the array.
        flip = OptionGroup(parser, "Flipping (mirroring)")
        flip.add_option('-f', '--horizontal-flip', \
            action="append_const", dest="operations", const=_C_HORZ, \
            help='flip the array horizontally.')
        flip.add_option('-v', '--vertical-flip', \
            action="append_const", dest="operations", const=_C_VERT, \
            help='flip the array vertically.')
        flip.add_option('-b', '--horizontal-and-vertical-flip', \
            action="append_const", dest="operations", const=_C_BOTH, \
            help='flip the array both vertically and horizontally.')
        parser.add_option_group(flip)

        # Prepare all the parse options that have to do with delimiters used for
        # splitting the array on input and joining it back together on output.
        delims = OptionGroup(parser, "Delimiters")
        delims.add_option('-d', '--field-delimiter', \
            metavar='DELIM', dest="col_delim", default=COL_DELIM, \
            help='specify a field (column) delimiter instead of "%s".' % COL_DELIM)
        delims.add_option('-D', '--row-delimiter', \
            metavar='DELIM', dest="row_delim", default=ROW_DELIM, \
            help='specify a row delimiter instead of "%s".' % ROW_DELIM)
        delims.add_option('-o', '--output-field-delimiter', \
            metavar='DELIM', dest="out_col_delim", default=O_COL_DELIM, \
            help='specify an output field delimiter instead of "%s".' % O_COL_DELIM)
        delims.add_option('-O', '--output-row-delimiter', \
            metavar='DELIM', dest="out_row_delim", default=O_ROW_DELIM, \
            help='specify an output row delimiter instead of "%s".' % O_ROW_DELIM)
        delims.add_option('-S', '--output-delimiters-same', \
            dest="same_delims", default=False, action="store_true", \
            help='specify output delimiters as the same as input delimiters.')
        parser.add_option_group(delims)

        # Assorted options, that have something vaguely to do with printing.
        printing = OptionGroup(parser, "Printing")
        printing.add_option('-s', '--print-template', \
            metavar='TEMPLATE', dest="print_tpl", default=PRINT_TPL, \
            help='set a sprintf-like template for cells, default: "%s"' % PRINT_TPL)
        printing.add_option('-e', '--empty-field-symbol', \
            metavar='STRING', dest="empty_sym", default=EMPTY_SYM, \
            help='set a string to use for empty cells instead of "%s".' % EMPTY_SYM)
        printing.add_option('-n', '--no-newline', \
            action="store_false", dest="newline", default=True, \
            help="do not append new line at the end of output.")
        parser.add_option_group(printing)

        opts, args = parser.parse_args()

        from sys import stdin, stdout

        # Read in an array from standard input in string form and normalize new
        # lines.
        input = '\n'.join(map(lambda s: s.strip('\n'), stdin))

        # Convert the string into an array, apply a transformation and then convert
        # the new array back into a string.
        arr = parse(input, opts.col_delim, opts.row_delim)
        if opts.same_delims:
            opts.out_row_delim = opts.row_delim
            opts.out_col_delim = opts.col_delim
        if not opts.operations:
            opts.operations = [None]
        for operation in opts.operations:
            arr = _OPERATIONS[operation](arr, opts.empty_sym)
        str = tostr(arr, opts.out_col_delim, opts.out_row_delim, opts.print_tpl)

        # Write out the new array to standard output, possibly adding an extra line
        # at the end so it doesn't get glued to $PS1 (I hate when that happens).
        stdout.write(str + '\n' if opts.newline else str)
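The docstrings promise that ragged rows and empty cells get padded with the `empty` symbol, and the main loop applies operations one after another. The condensed sketch below illustrates both points. It is a simplified rewrite in Python 3 syntax rather than the script's code: `transform`, `cw`, and `half_turn` are names chosen for this example, though the two coordinate lambdas carry the same formulas as `_cw_coord_trans` and `_180_coord_trans`.

```python
# Condensed sketch of the coordinate-translation idea (illustrative names).
EMPTY = " "

def transform(arr, coord_trans, swap=True, empty=EMPTY):
    n = len(arr)                       # number of rows
    m = max(len(row) for row in arr)   # widest row, in case arr is ragged
    # Rows and columns trade places only when the array ends up on its side.
    rows, cols = (m, n) if swap else (n, m)
    new = [[empty] * cols for _ in range(rows)]
    for i, row in enumerate(arr):
        for j, cell in enumerate(row):
            ni, nj = coord_trans(i, j, n, m)
            # Empty strings are replaced with the filler symbol.
            new[ni][nj] = cell if cell else empty
    return new

cw = lambda i, j, n, m: (j, n - 1 - i)
half_turn = lambda i, j, n, m: (n - 1 - i, m - 1 - j)

# Second row is short and holds an empty string.
ragged = [["a", "b", "c"], ["1", ""]]

once = transform(ragged, cw)
twice = transform(once, cw)

print(once)    # [['1', 'a'], [' ', 'b'], [' ', 'c']]
print(twice == transform(ragged, half_turn, swap=False))   # True
```

On the command line, the second check corresponds to `-c -c` giving the same result as `-r` for this input.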

The code is also available at GitHub as python/tabrot.py.