Friday, April 9, 2010

Tabrot

Here's another one that stems from the practical problems of my everyday life.

So there I was, writing an article in LaTeX using tabular to create this huge table. If you don't know tabular, it supports this very bare-bones way of creating tables, where the ampersand separates the columns and the double backslash separates the rows, e.g.:

\begin{tabular}{c c c}
(1,1) & (1,2) & (1,3) \\
(2,1) & (2,2) & (2,3) \\
\end{tabular}
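
To make the format concrete, here is a minimal Python sketch that splits the body of such a table into rows and cells. The function name `parse_tabular_body` is made up for this illustration, and the splitting is deliberately naive: it assumes no escaped ampersands (`\&`) and no `\hline`-style commands inside the body.

```python
def parse_tabular_body(body):
    """Split the body of a tabular environment into a list of rows of cells."""
    rows = []
    # Rows are terminated by a double backslash; chunks that hold nothing
    # but whitespace (such as the one after the last terminator) are skipped.
    for chunk in body.split("\\\\"):
        if chunk.strip():
            # Columns are separated by ampersands; surrounding spaces are
            # only there for readability, so they are stripped.
            rows.append([cell.strip() for cell in chunk.split("&")])
    return rows


body = r"""
(1,1) & (1,2) & (1,3) \\
(2,1) & (2,2) & (2,3) \\
"""
table = parse_tabular_body(body)
print(table)
```

The result is a plain list of lists of strings, which is the shape the rest of this post works with.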

This particular table was something like a million rows and a billion columns... or at least it felt that way when I was typing it in. The thing was, though, that right after I finished typesetting and compiling it I noticed two things: (a) the content would make more sense if I swapped the rows with the columns, and (b) the damn table would also actually fit on the page if I did that.

Damn and blast. I'm not going to copy-paste it into submission and neither am I going to retype it, no matter how efficiently I can do it in vim... it's just wrong to do it manually. Just wrong!

So I wrote a quick script that could do it with just a little math and some basic parsing. And it worked.
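
The "little math" boils down to an index mapping: in a table with n rows, the cell at row i, column j ends up at row j, column n-1-i after a clockwise turn. Below is a minimal standalone sketch of that idea. The name `rotate_cw` is hypothetical, and the sketch assumes a rectangular table (every row has the same length).

```python
def rotate_cw(rows):
    """Return a copy of a rectangular list of lists turned 90 degrees clockwise."""
    n = len(rows)        # number of rows in the original
    m = len(rows[0])     # number of columns in the original
    # The rotated table has m rows and n columns.
    rotated = [[None] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            # Cell (i, j) moves to row j; original row 0 becomes the last column.
            rotated[j][n - 1 - i] = rows[i][j]
    return rotated


print(rotate_cw([["a", "b", "c"], ["1", "2", "3"]]))
# prints [['1', 'a'], ['2', 'b'], ['3', 'c']]
```

The input list is not modified; the function builds a new table and fills it in cell by cell.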

And then I figured I could very quickly add the option to go counter-clockwise with just a bit of effort. And then I thought that if I made this and that more in accordance with the functional wossname, I could knock up a couple more transitions very easily. And then I decided that I could make those other functions more function-oriented too, if I used those newfangled map, reduce, and filter functions I keep hearing so much about. And then I figured I could play with optparse for a change (I usually do my option parsing on a slightly lower level with getopt). And then all that was left was to put comments on the thing and maybe clean up the code a bit, and here it is.

Honestly, this script practically built itself...

Anyway, here's how to use it. Suppose you have a CSV file that you want rotated clockwise relative to its current layout:

cat file.csv | ./tabrot.py -c > new_file.csv

How about if the file used semicolons instead of commas for delimiters?

cat file.csv | ./tabrot.py -c -d';' > new_file.csv

And so on. It's pretty basic.
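
For comparison, the same parse, rotate, and join pipeline can be sketched in a few lines of plain Python. This is not how the script does it: the sketch swaps in the `zip` idiom instead of a coordinate mapping, the function name `rotate_text_cw` is invented for the example, and it assumes every row has the same number of fields (`zip` silently truncates ragged rows).

```python
def rotate_text_cw(text, in_delim=",", out_delim=","):
    """Parse delimited text, rotate it clockwise, and join it back together."""
    rows = [line.split(in_delim) for line in text.split("\n")]
    # Reversing the row order and zipping the rows together yields the
    # columns of the original, bottom row first: a clockwise turn.
    rotated = zip(*reversed(rows))
    return "\n".join(out_delim.join(cells) for cells in rotated)


print(rotate_text_cw("a;b;c\n1;2;3", in_delim=";", out_delim=";"))
# prints:
# 1;a
# 2;b
# 3;c
```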

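Update: I added some stuff later on, like output delimiters (April 17th 2010). With those in place, the output is written with commas and newlines by default; pass -S to reuse the input delimiters instead.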
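
Separate input and output delimiters make it possible to convert between formats as well, for instance to read CSV and write rows for a LaTeX tabular. The following is a rough sketch of that idea, not the script's own code; `convert_delimiters` and its parameter names are assumptions made for the example.

```python
def convert_delimiters(text, col_in=",", row_in="\n",
                       col_out=" & ", row_out=" \\\\\n", template="%s"):
    """Split text on the input delimiters and re-join it with the output ones."""
    rows = [row.split(col_in) for row in text.split(row_in)]
    # Each cell goes through a printf-style template before being joined.
    return row_out.join(
        col_out.join(template % cell for cell in row) for row in rows
    )


print(convert_delimiters("a,b,c\n1,2,3"))
# prints:
# a & b & c \\
# 1 & 2 & 3
```

Because the rows are joined rather than terminated, the last row carries no trailing double backslash.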

And here's the code. Is it not pretty? Well, I think it's very pretty.
 
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Tabrot
#
# The script reads the array from standard input, applies the specified
# transformations and prints the new array to standard output. The
# transformations include rotating the array (90°, 180°, 270°, and 0°) or
# flipping it (horizontally, vertically, or both).
#
# The input array is expected to be in a text format. The particular row and
# field delimiters can be set up with the appropriate options (see below).
#
# The functions used in the script can also be used via Python instead of a
# commandline tool, if your fancy takes you that way.
#
# Usage:
#   tabrot.py [OPTIONS]
#
# Options:
#   Rotating:
#     -c, --clockwise     rotate the array clockwise (right).
#     -C, --counter-clockwise
#                         rotate the array counter-clockwise (left).
#     -r, --rotate-180    rotate the array 180° - make it upside-down.
#     -m, --meh, --do-nothing
#                         rotate the array 0° - not at all.
#   Flipping (mirroring):
#     -f, --horizontal-flip
#                         flip the array horizontally.
#     -v, --vertical-flip
#                         flip the array vertically.
#     -b, --horizontal-and-vertical-flip
#                         flip the array both vertically and horizontally.
#   Delimiters:
#     -d DELIM, --field-delimiter=DELIM
#                         specify a field (column) delimiter instead of ",".
#     -D DELIM, --row-delimiter=DELIM
#                         specify a row delimiter instead of "\n".
#     -o DELIM, --output-field-delimiter=DELIM
#                         specify an output field delimiter instead of ",".
#     -O DELIM, --output-row-delimiter=DELIM
#                         specify an output row delimiter instead of "\n".
#     -S, --output-delimiters-same
#                         specify output delimiters as the same as input
#                         delimiters.
#   Printing:
#     -s TEMPLATE, --print-template=TEMPLATE
#                         set a sprintf-like template for cells, default: "%s"
#     -e STRING, --empty-field-symbol=STRING
#                         set a string to use for empty cells instead of " ".
#     -n, --no-newline    do not append new line at the end of output.
#     -h, --help          show this help message and exit
#
# Examples:
#   echo -en "a,b,c\n1,2,3" | ./tabrot.py -c  # result: 1,a\n2,b\n3,c
#   echo 'a&b&c|1&2&3' | ./tabrot.py -d'&' -D'|' -S -f  # result: c&b&a|3&2&1
#
# Author:
#   Konrad Siek <konrad.siek@gmail.com>
#
# License:
#   Copyright 2010 Konrad Siek
#
#   This program is free software: you can redistribute it and/or modify it
#   under the terms of the GNU General Public License as published by the Free
#   Software Foundation, either version 3 of the License, or (at your option)
#   any later version. See <http://www.gnu.org/licenses/> for details.

# reduce is a builtin in Python 2 but lives in functools in Python 3; this
# import works in both.
from functools import reduce

# Default constants that get used by all the functions unless other values are
# specified via arguments.
COL_DELIM = ','
ROW_DELIM = '\n'
O_COL_DELIM = ','
O_ROW_DELIM = '\n'
EMPTY_SYM = ' '
PRINT_TPL = '%s'

def rotate_clockwise(array, empty=EMPTY_SYM):
    """ Rotate a 2D array clockwise (rightward) returning a copy of the array.
    By a 2D array I mean a list containing lists containing strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _cw_coord_trans, empty=empty)

def rotate_counterclockwise(array, empty=EMPTY_SYM):
    """ Rotate a 2D array counter-clockwise (leftward) returning a copy of the
    array. By a 2D array I mean a list containing lists containing strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _ccw_coord_trans, empty=empty)

def rotate_180_degrees(array, empty=EMPTY_SYM):
    """ Rotate a 2D array upside-down (180° as the name implies) returning a
    copy of the array. By a 2D array I mean a list containing lists containing
    strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _180_coord_trans, swap=False, empty=empty)

def flip_horizontal(array, empty=EMPTY_SYM):
    """ Flip a 2D array horizontally (left-to-right) returning a copy of the
    array. By a 2D array I mean a list containing lists containing strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _flip_horz_trans, swap=False, empty=empty)

def flip_vertical(array, empty=EMPTY_SYM):
    """ Flip a 2D array vertically (upside-down) returning a copy of the
    array. By a 2D array I mean a list containing lists containing strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _flip_vert_trans, swap=False, empty=empty)

def flip_horizontal_and_vertical(array, empty=EMPTY_SYM):
    """ Flip a 2D array vertically and horizontally (both upside-down and
    left-to-right) returning a copy of the array. By a 2D array I mean a list
    containing lists containing strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _flip_both_trans, swap=False, empty=empty)

def rotate_not_at_all(array, empty=EMPTY_SYM):
    """ Rotate a 2D array 0° returning a copy of the array. That is to say,
    do not rotate the array at all, but apply all the delimiter splitting and
    conversions that would've been applied had the array been rotated.

    By a 2D array I mean a list containing lists containing strings.

    If an empty element occurs (a string of length 0), then it gets replaced
    with blanks - whatever is provided through the parameter 'empty'.

    If the array is ragged (contains rows of different lengths), then it will
    be filled out with blanks during the transformation.

    Note that the original array is left intact."""

    return transpose(array, _meh_coord_trans, swap=False, empty=empty)

def transpose(arr, coord_trans, swap=True, empty=EMPTY_SYM):
    """ Perform a specific transformation on the given array returning a new
    version of the array.

    The general functioning can be roughly (and cryptically) described as:
        (i', j') ← coord_trans(i, j, n, m)    i = 0...n-1, j = 0...m-1
        ∀i,∀j new_array[i'][j'] ← old_array[i][j]

    That is, the coordinates of each element in the old array are translated
    into the coordinates the element should have in the new array, and when
    those are ready, the element is pasted there.

    The parameter coord_trans specifies a function that translates indexes of
    the old array to the indexes of the new array, and it is of type:
        int->int->int->int->(int, int)

    This means it takes 4 integers as arguments and returns a two-integer
    tuple. The parameters are:
        * i - a row index of the old array (1st dimension)
        * j - a column index of the old array (2nd dimension)
        * n - the number of rows
        * m - the number of columns (the maximum number of columns, if ragged)

    An example of a coord_trans function (that rotates the array clockwise) is:
        coord_trans = lambda i, j, n, m: (j, n - 1 - i)

    The parameter swap indicates whether the array will be put on one of its
    sides or if it will be put on its top or bottom. True indicates that the
    array will be on the side after the transformation (rows become columns and
    vice-versa; in other words, i' will be a function of j and j' a function
    of i). False indicates that the array will be either as it was or
    upside-down or similar (columns remain columns, etc; in other words, i'
    will be a function of i, and j' a function of j).

    The parameter empty is a string which will be used if the cell with a given
    index does not exist in the original array (e.g. because it was ragged) or
    instead of any cell that holds an empty string (string of length 0)."""

    isize = len(arr)
    jsize = reduce(max, map(len, arr))
    trans_f = lambda n, m: [[empty for c in range(m)] for r in range(n)]
    trans = trans_f(jsize, isize) if swap else trans_f(isize, jsize)

    for i in range(isize):
        for j in range(len(arr[i])):
            ni, nj = coord_trans(i, j, isize, jsize)
            #print "%s, %s\t%s, %s\t'%s'" % (i,j, ni, nj, arr[i][j])
            trans[ni][nj] = arr[i][j] if len(arr[i][j]) else empty

    return trans

_cw_coord_trans = lambda i, j, n, m: (j, n-1-i)
_ccw_coord_trans = lambda i, j, n, m: (m-1-j, i)
_180_coord_trans = lambda i, j, n, m: (n-1-i, m-1-j)
_meh_coord_trans = lambda i, j, n, m: (i, j)
_flip_horz_trans = lambda i, j, n, m: (i, m-1-j)
_flip_vert_trans = lambda i, j, n, m: (n-1-i, j)
_flip_both_trans = lambda i, j, n, m: (n-1-i, m-1-j)

def parse(text, col_delim=COL_DELIM, row_delim=ROW_DELIM):
    """ Read a block of text and divide it according to the specified row and
    column delimiters to form a 2D array (a list of lists of strings)."""

    return [[c for c in r.split(col_delim)] for r in text.split(row_delim)]

def tostr(arr, col_delim=COL_DELIM, row_delim=ROW_DELIM, tp=PRINT_TPL):
    """ Create a string representation of a given 2D array using the specified
    delimiters to separate rows and columns.

    A template may be specified according to Python string formatting
    utilities for all the cells to use. For instance, using tp='"%20s"' will
    create an output string where each cell uses a minimum of 20 characters
    and is surrounded by double quotes."""

    return row_delim.join(map(lambda c: col_delim.join([tp%f for f in c]), arr))

# Operation codes; list(...) keeps the concatenation valid on Python 3 too.
_C_CW, _C_CCW, _C_180, _C_VERT, _C_HORZ, _C_BOTH, _C_MEH = \
    list(range(6)) + [None]

_OPERATIONS = {
    _C_CW: rotate_clockwise,
    _C_CCW: rotate_counterclockwise,
    _C_180: rotate_180_degrees,
    _C_HORZ: flip_horizontal,
    _C_VERT: flip_vertical,
    _C_BOTH: flip_horizontal_and_vertical,
    _C_MEH: rotate_not_at_all,
}

if __name__ == '__main__':
    from optparse import OptionParser, OptionGroup
    from sys import argv
    from os.path import basename

    # Prepare the parser.
    usage = '%s [OPTIONS]' % basename(argv[0])
    parser = OptionParser(usage=usage)

    # Prepare all the parse options that have to do with rotating the array.
    rotate = OptionGroup(parser, "Rotating")
    rotate.add_option('-c', '--clockwise',
        action="append_const", dest="operations", const=_C_CW,
        help='rotate the array clockwise (right).')
    rotate.add_option('-C', '--counter-clockwise',
        action="append_const", dest="operations", const=_C_CCW,
        help='rotate the array counter-clockwise (left).')
    rotate.add_option('-r', '--rotate-180',
        action="append_const", dest="operations", const=_C_180,
        help=u'rotate the array 180° - make it upside-down.')
    rotate.add_option('-m', '--meh', '--do-nothing',
        action="append_const", dest="operations", const=_C_MEH,
        help=u'rotate the array 0° - not at all.')
    parser.add_option_group(rotate)

    # Prepare all the parse options that have to do with flipping the array.
    flip = OptionGroup(parser, "Flipping (mirroring)")
    flip.add_option('-f', '--horizontal-flip',
        action="append_const", dest="operations", const=_C_HORZ,
        help='flip the array horizontally.')
    flip.add_option('-v', '--vertical-flip',
        action="append_const", dest="operations", const=_C_VERT,
        help='flip the array vertically.')
    flip.add_option('-b', '--horizontal-and-vertical-flip',
        action="append_const", dest="operations", const=_C_BOTH,
        help='flip the array both vertically and horizontally.')
    parser.add_option_group(flip)

    # Prepare all the parse options that have to do with delimiters used for
    # splitting the array on input and joining it back together on output.
    delims = OptionGroup(parser, "Delimiters")
    delims.add_option('-d', '--field-delimiter',
        metavar='DELIM', dest="col_delim", default=COL_DELIM,
        help='specify a field (column) delimiter instead of "%s".' % COL_DELIM)
    delims.add_option('-D', '--row-delimiter',
        metavar='DELIM', dest="row_delim", default=ROW_DELIM,
        help='specify a row delimiter instead of "%s".' % ROW_DELIM)
    delims.add_option('-o', '--output-field-delimiter',
        metavar='DELIM', dest="out_col_delim", default=O_COL_DELIM,
        help='specify an output field delimiter instead of "%s".' % O_COL_DELIM)
    delims.add_option('-O', '--output-row-delimiter',
        metavar='DELIM', dest="out_row_delim", default=O_ROW_DELIM,
        help='specify an output row delimiter instead of "%s".' % O_ROW_DELIM)
    delims.add_option('-S', '--output-delimiters-same',
        dest="same_delims", default=False, action="store_true",
        help='specify output delimiters as the same as input delimiters.')
    parser.add_option_group(delims)

    # Assorted options, that have something vaguely to do with printing.
    printing = OptionGroup(parser, "Printing")
    printing.add_option('-s', '--print-template',
        metavar='TEMPLATE', dest="print_tpl", default=PRINT_TPL,
        help='set a sprintf-like template for cells, default: "%s"' % PRINT_TPL)
    printing.add_option('-e', '--empty-field-symbol',
        metavar='STRING', dest="empty_sym", default=EMPTY_SYM,
        help='set a string to use for empty cells instead of "%s".' % EMPTY_SYM)
    printing.add_option('-n', '--no-newline',
        action="store_false", dest="newline", default=True,
        help="do not append new line at the end of output.")
    parser.add_option_group(printing)

    opts, args = parser.parse_args()

    from sys import stdin, stdout

    # Read in an array from standard input in string form and normalize new
    # lines.
    text = '\n'.join(map(lambda s: s.strip('\n'), stdin))

    # Convert the string into an array, apply the transformations in the order
    # they were given and then convert the new array back into a string.
    arr = parse(text, opts.col_delim, opts.row_delim)
    if opts.same_delims:
        opts.out_row_delim = opts.row_delim
        opts.out_col_delim = opts.col_delim
    if not opts.operations:
        opts.operations = [_C_MEH]
    for operation in opts.operations:
        arr = _OPERATIONS[operation](arr, opts.empty_sym)
    output = tostr(arr, opts.out_col_delim, opts.out_row_delim, opts.print_tpl)

    # Write out the new array to standard output, possibly adding an extra line
    # at the end so it doesn't get glued to $PS1 (I hate when that happens).
    stdout.write(output + '\n' if opts.newline else output)


The code is also available at GitHub as python/tabrot.py.