Thursday, February 26, 2009

Flatten

Update: Applied changes in accordance with a comment. [Fri Aug 28, 8:46 AM CEST]

And then there are these times when you have a music album or something that's split into 2 CDs, or a bunch of PDFs in very deep directories.

Sometimes you'd like to move them to one directory, for instance, because your crappy car audio player can only play music files from its root directory (because its designer had a problem with their brains being missing).

And that's exactly what this script does! Hurray!

I actually wrote about writing this one in chapter 10 of my fabulous book (NaNoWriMo)! I was struggling with the problem then, and then finally did it all in a very roundabout way. I wasn't exactly happy with the solution... and today I did it in 1 line because of inspiration (and probably sugar). So I split it up into more lines for readability and here it is!

The script itself has one brilliant bit, which I didn't invent. I got it from the Internet somewhere, probably from this guy, but I'm not certain... Anywho, the thing you see at line 42 is pure brilliance and deals neatly with the problem of spaces in filenames.
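To see why piping find into while read matters, here is a small sketch that compares it with the naive for loop over $(find ...). The helper names and the temporary directory are made up for the demonstration. I've also used IFS= read -r, a slightly more defensive form of read that leaves backslashes and leading spaces alone.

```bash
#!/bin/bash
# Demo only: list_with_for and list_with_read are hypothetical helpers.

# Naive version: the shell splits find's output on whitespace,
# so a name like "my song.mp3" falls apart into two items.
list_with_for() {
    for i in $(find "$1" -type f)
    do
        echo "[$i]"
    done
}

# The approach used in flatten: read takes one whole line at a time,
# so spaces inside a name survive.
list_with_read() {
    find "$1" -type f | while IFS= read -r i
    do
        echo "[$i]"
    done
}

demo=$(mktemp -d)
touch "$demo/my song.mp3" "$demo/other.mp3"
echo "for loop:"
list_with_for "$demo"
echo "while read:"
list_with_read "$demo"
rm -r "$demo"
```

Assuming the temporary directory's own path has no spaces in it, the for loop prints three fragments and the while read loop prints the two actual files. Filenames containing newlines would still trip this up, but I've yet to meet one in the wild.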

There are a lot of things that can be tweaked in this script, like filtering out filenames (add a rule at line 44) or asking whether files should be overwritten (add a -i flag at line 45). However, I'm not going to do any of those things. I like this script the way it is, and if you don't... well, it's CC, innit?
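For the curious, here's roughly what those two tweaks could look like. This is a sketch and not part of the script: flatten_filtered is a made-up name and the *.mp3 pattern is just an example filter. Instead of cp -i, it checks for an existing file itself, because inside a piped while read loop cp -i would read its answer from the pipe (eating the next filename) rather than from the keyboard.

```bash
#!/bin/bash
# Hypothetical variant of the copy loop: copies only *.mp3 files
# and never overwrites a file that is already in the target.
flatten_filtered() {
    local source="${1:-.}"
    local target="${2:-.}"

    find -L "$source" -not -type d | while IFS= read -r i
    do
        # Example filter rule: skip everything that isn't an mp3.
        case "$i" in
            *.mp3) ;;
            *) continue ;;
        esac

        e=${i#./}
        dest="$target/${e//\//_}"

        # Skip instead of asking, because stdin is the pipe from find here.
        if [ -e "$dest" ]
        then
            echo "skipping $i: $dest already exists" >&2
            continue
        fi
        cp "$i" "$dest"
    done
}
```

Called as flatten_filtered . out, it should behave like the original for mp3 files and leave everything else (and anything already sitting in out) alone.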

Here's the code:
 
1 #!/bin/bash
2 #
3 # Flatten
4 #
5 # Copies all the files from a given directory tree to one directory.
6 # For instance, if you've got a directory tree like this one:
7 # moo/cow.txt
8 # other/cats/cheezburger.txt
9 # other/cats/evil.txt
10 # other/ferrets/evil.txt
11 # The result of running `flatten . out` will be:
12 # out/other_cats_cheezburger.txt
13 # out/other_cats_evil.txt
14 # out/other_ferrets_evil.txt
15 # out/moo_cow.txt
16 # Parameters:
17 # source - which tree to copy, default: here,
18 # target - where to dump the files, default: here.
19 # Warning:
20 # Some names might overlap! Check if everything copied properly
21 # before removing anything.
22 # Author:
23 # Konrad Siek
24
25 # Set source.
26 if [ "$1" == "" ]
27 then
28     source="."
29 else
30     source="$1"
31 fi
32
33 # Set target.
34 if [ "$2" == "" ]
35 then
36     target="."
37 else
38     target="$2"
39 fi
40
41 # Copy all files.
42 find -L "$source" -not -type d | while IFS= read -r i
43 do
44     e=${i#./}
45     cp "$i" "$target/${e//\//_}"
46 done


The code is also available at GitHub as bash/flatten.

2 comments:

jay@thecapacity said...

I think you could optimize it by adding -not -type d to your find command so you can skip the directory test.

Still damn useful and I'm still scratching my head over line 47! :)

Kondziu said...

Thanks, that's a very good point! Using find would definitely be more elegant and efficient. I'll update the script sometime soon.

As for line 47 (line 45 in the updated script), I'm guessing this may be a bit cryptic: ${e//\//_}. This is string substitution, which takes the variable called e and replaces all occurrences of / with _.

If it were written as: ${e/\//_} it would possibly look more familiar, but it would substitute only the first match, and not all matches.
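To make the difference concrete, here's a tiny demo you can paste into a bash prompt. The path is just a made-up example in the style of the ones from the script's header.

```bash
#!/bin/bash
# Demo only: the path below is an invented example.
e="./other/cats/evil.txt"

e=${e#./}            # strip a leading "./" if there is one
echo "$e"            # other/cats/evil.txt

echo "${e//\//_}"    # double slash: replace every / -> other_cats_evil.txt
echo "${e/\//_}"     # single slash: replace only the first / -> other_cats/evil.txt
```

The \/ in the middle is just an escaped slash, so that bash doesn't mistake it for the separator between the pattern and the replacement.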

If you're interested, you can find more info about substring replacement in bash, as well as other string manipulations here: http://tldp.org/LDP/abs/html/string-manipulation.html.