Monday, April 13, 2009

Convert file encodings

Here's to a new bit of code!

I once wrote a program called Tester, which my dad uses in his day-to-day language teaching.

I wrote the thing quite a long time ago, so it's a bit buggy here and there, and one of its problems is that it doesn't handle encodings well. Since dad is thinking about switching to Ubuntu, this matters: his files have so far been encoded as WINDOWS-1250, but with his current settings he can only use UTF-8 on Ubuntu.
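A quick way to see the problem is to ask whether a given file is valid UTF-8 at all. Below is a minimal sketch, assuming GNU iconv, which exits with a non-zero status when it meets an illegal input sequence. The `is_utf8` helper and the sample word are hypothetical illustrations, not part of Tester.

```bash
#!/bin/bash

# Succeeds if the file decodes cleanly as UTF-8, fails otherwise.
# Assumes iconv exits non-zero on an illegal input sequence (GNU iconv does).
is_utf8() {
    iconv --from-code=UTF-8 --to-code=UTF-8 "$1" > /dev/null 2>&1
}

# Sample: the Polish word "żółw" as WINDOWS-1250 bytes (BF F3 B3 77),
# written as octal escapes so printf handles them in any shell.
sample=$(mktemp)
printf '\277\363\263w' > "$sample"

if is_utf8 "$sample"
then
    echo "UTF-8"
else
    echo "not UTF-8"    # prints: not UTF-8
fi

rm -f "$sample"
```

A file saved as WINDOWS-1250 fails the check as soon as it contains a Polish letter, while plain ASCII text passes, since ASCII is valid in both encodings.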

Hence this program, which converts files between the two encodings. Since the files come in bulk, I figured converting entire directories at once would be a good idea.
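The conversion itself comes down to one iconv call per file. Here is a small sketch of what that call does to the bytes, again using "żółw" as sample text; the `sample_cp1250` helper is hypothetical and only there for illustration.

```bash
#!/bin/bash

# Hypothetical sample: "żółw" as WINDOWS-1250 bytes (BF F3 B3 77),
# given as octal escapes so any printf understands them.
sample_cp1250() {
    printf '\277\363\263w'
}

# WINDOWS-1250 -> UTF-8: each Polish letter becomes two bytes.
sample_cp1250 \
    | iconv --from-code=WINDOWS-1250 --to-code=UTF-8 \
    | od -An -tx1
# shows the bytes: c5 bc c3 b3 c5 82 77

# ...and UTF-8 -> WINDOWS-1250 brings back the original four bytes.
sample_cp1250 \
    | iconv --from-code=WINDOWS-1250 --to-code=UTF-8 \
    | iconv --from-code=UTF-8 --to-code=WINDOWS-1250 \
    | od -An -tx1
# shows the bytes: bf f3 b3 77
```

Going from UTF-8 back to WINDOWS-1250 can fail for characters the older code page has no slot for (Cyrillic letters, for instance); iconv then stops with an error and a non-zero exit status.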

Also, I wrote it in about an hour and a bit, so it might have a few rough edges of its own...

Here's the code:
 
#!/bin/bash

# Convert directory
#
# Convert text files in a directory (and its subdirectories) from one
# encoding to another, all in a simple GUI. Converts between UTF-8 and
# WINDOWS-1250. The originals are left untouched; each converted copy
# is written next to its original as NAME.<encoding>.EXT.
#
# Requires:
#   zenity
#   iconv
# Author:
#   Konrad Siek

# Get directory to convert files in.
directory=$(zenity \
    --file-selection \
    --directory \
    --title="Select a directory to convert")

# If none selected, quit.
if [ $? -ne 0 ] || [ -z "$directory" ]
then
    exit 1
fi

# Select source encoding from a list.
source_encoding=$(zenity --list \
    --column="Encoding" \
    --title="Source encoding" \
    --text="The files are currently encoded as... " \
    WINDOWS-1250 UTF-8)

# If none selected, quit.
if [ $? -ne 0 ] || [ -z "$source_encoding" ]
then
    exit 1
fi

# Select destination encoding from a list.
destination_encoding=$(zenity --list \
    --column="Encoding" \
    --title="Destination encoding" \
    --text="And you want these files encoded as... " \
    UTF-8 WINDOWS-1250)

# If none selected, quit.
if [ $? -ne 0 ] || [ -z "$destination_encoding" ]
then
    exit 1
fi

# Suffix for the new files: UTF-8 -> utf8, WINDOWS-1250 -> windows1250.
addition=$(printf '%s' "${destination_encoding//-/}" | tr '[:upper:]' '[:lower:]')

# Collect the file list up front, so the files created below are not
# picked up by find and converted a second time.
files=()
while IFS= read -r -d '' f
do
    files+=("$f")
done < <(find "$directory" -type f -print0)

# For all files in the selected directory...
for f in "${files[@]}"
do
    # Split the file name (not the whole path, which may contain dots)
    # into a base name and an extension.
    dir=$(dirname "$f")
    name=$(basename "$f")
    if [[ $name == ?*.* ]]
    then
        base=${name%.*}
        extension=".${name##*.}"
    else
        base=$name
        extension=""
    fi
    output="$dir/$base.$addition$extension"

    # Convert encoding; drop any partial output if iconv fails.
    if iconv \
        --from-code="$source_encoding" \
        --to-code="$destination_encoding" \
        --output="$output" \
        "$f"
    then
        echo "Created $output"
    else
        rm -f "$output"
        echo "Failed to convert $f" >&2
    fi
done

# Notify on finish.
zenity --info --text="Operation complete." --title="Complete"
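If the dialogs get in the way (say, when working over SSH), the same loop can be wrapped in a function that takes the directory and both encodings as arguments. This is a sketch rather than part of the published script: `convert_dir` is a hypothetical name, and it assumes bash and GNU iconv's `--output` option, as the script above does.

```bash
#!/bin/bash

# Hypothetical non-interactive variant:
#   convert_dir DIRECTORY FROM_ENCODING TO_ENCODING
# Writes NAME.<suffix>.EXT next to each NAME.EXT, e.g. lesson.utf8.txt.
convert_dir() {
    local directory=$1 from=$2 to=$3
    local suffix f dir name base extension output
    local files=()

    # UTF-8 -> utf8, WINDOWS-1250 -> windows1250.
    suffix=$(printf '%s' "${to//-/}" | tr '[:upper:]' '[:lower:]')

    # Collect the list first so that new output files are not re-converted.
    while IFS= read -r -d '' f
    do
        files+=("$f")
    done < <(find "$directory" -type f -print0)

    for f in "${files[@]}"
    do
        # Split only the file name, so dots in directory names are ignored.
        dir=$(dirname "$f")
        name=$(basename "$f")
        if [[ $name == ?*.* ]]
        then
            base=${name%.*}
            extension=".${name##*.}"
        else
            base=$name
            extension=""
        fi
        output="$dir/$base.$suffix$extension"

        # Convert; remove any partial output if iconv fails.
        if iconv --from-code="$from" --to-code="$to" --output="$output" "$f"
        then
            echo "Created $output"
        else
            rm -f "$output"
            echo "Failed to convert $f" >&2
        fi
    done
}
```

For example, `convert_dir ~/lessons WINDOWS-1250 UTF-8` turns `~/lessons/words.txt` into `~/lessons/words.utf8.txt`, leaving the original untouched.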


The code is also available on GitHub as bash/convert_directory.
