
Whitespaces in filenames and how to get rid of them

Sunday, September 26th, 2010

Although it has been more than four years since I switched from Windows to GNU/Linux, I still stumble upon files, either restored from backups or downloaded from the net, whose names contain spaces and need to be handled.

Since I got the hang of shell scripting I have stopped dreading having to rename these files manually (which was my previous m.o. for that scenario).

Imagine a file named “My super cool science report.pdf”. Now, for a single file it might be ok to just manually rename the sucker, either via your file manager of choice, or through a quick (tab-completion supported) mv. Fair enough, but what if you have ten files?

Before this task is turned into a shell script, it can be broken into smaller steps.

Step 1 is that we need some way of listing the files we wish to operate on. If they are all stored in a directory separate from other files, and there are no sub-directories in that directory etc, one can simply use ls -1 (i.e. ell ess dash one).

Otherwise, find is a VERY useful tool.

$ find /path/to/document/directory -maxdepth 1 -type f -name '* *'

This simply says “in the specified path, look only in that directory itself (i.e. don’t traverse downwards) for regular files with a name matching whatever followed by a space followed by whatever”.
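To see what that pattern does and does not match, here is a small sketch that runs the same find against a throwaway directory. The file and directory names are made up for the demonstration, and it assumes mktemp -d is available on your system:

```shell
# Build a scratch directory so nothing real is touched (names are hypothetical).
demo_dir=$(mktemp -d)
touch "$demo_dir/My super cool science report.pdf" "$demo_dir/notes.txt"
mkdir "$demo_dir/sub dir"
touch "$demo_dir/sub dir/deep file.txt"

# Only regular files directly inside demo_dir whose names contain a space.
# "sub dir" is skipped by -type f, "deep file.txt" by -maxdepth 1.
matches=$(find "$demo_dir" -maxdepth 1 -type f -name '* *')
printf '%s\n' "$matches"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Only the report should be listed: notes.txt has no space in its name, and the other two entries are filtered out by the type and depth tests.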

Now that we have a list of files to work with, comes step 2: iterating over the files.

This is what has tripped me up in the past. I’ve always tried constructs along the lines of for filename in `expression`, where expression is some lame attempt to list the files I want to work with. The trouble is that the shell splits the output of expression on every space, so a filename with spaces arrives in pieces. I could probably have gotten it to work, but it requires more patience than I was willing to muster ;)

Besides, while read filename; do command(s); done works immediately.
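The difference between the two loops is easy to count. The sketch below feeds the same two-line list (hypothetical filenames) to both constructs. It uses IFS= read -r rather than a plain read, which is my own addition to keep backslashes and leading whitespace intact, and it feeds the while loop from a here-document only so that the counter survives the loop (a loop at the end of a pipe usually runs in a subshell):

```shell
# A hypothetical two-line file list, as find or ls -1 would print it.
list='My fancy report 1.pdf
My fancy report 2.pdf'

# for splits the unquoted expansion on every space and newline.
for_count=0
for word in $list; do
    for_count=$((for_count + 1))
done

# while read consumes one whole line per iteration.
read_count=0
while IFS= read -r filename; do
    read_count=$((read_count + 1))
done <<EOF
$list
EOF

echo "for saw $for_count items, while read saw $read_count"
```

With these two names the for loop sees eight separate words, while the while loop sees the two lines we actually care about.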

To transfer the list of files from find / ls we simply pipe it to the while loop:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do ...; done

Had this been put in a script, instead of written on the command line, we would now have something looking a lot like this:

find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    ...
done

Step 3 then, is obviously about actually transforming the filename.

For simple substitutions like this, tr is a great tool, e.g.

$ echo "This is a test" | tr ' ' '_'

This simply takes stuff from stdin, replaces all spaces with underscores, and pushes it to stdout.

tr also has great functionality for altogether removing specified characters from the given string, e.g.

$ echo 'What?!' | tr -d '!'

Finally, tr is a pretty cool guy, converts between cases and doesn’t afraid of anything:

$ echo "Soon this will be shouted" | tr 'a-z' 'A-Z'
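For reference, here are the three tr invocations again with their results captured in variables, so the output of each can be inspected. The variable names are only there for the demonstration:

```shell
# Replace every space with an underscore.
spaces=$(echo "This is a test" | tr ' ' '_')

# Delete every exclamation mark.
deleted=$(echo 'What?!' | tr -d '!')

# Map lowercase letters to uppercase.
upper=$(echo "Soon this will be shouted" | tr 'a-z' 'A-Z')

printf '%s\n' "$spaces" "$deleted" "$upper"
# prints:
# This_is_a_test
# What?
# SOON THIS WILL BE SHOUTED
```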

Ok, enough about tr, but it is pretty cool, and quite enough for this task. So now we know how to list the files, iterate over them, and transform the filename from the original one, to a new, better one. Now what?

Now we need to save the transformed name into a temporary variable (since mv requires both a source path and a destination path) which is done with:

newfilename=$(echo "$filename" | tr ' ' '_')

One could also use backticks:

newfilename=`echo "$filename" | tr ' ' '_'`

But I am always wary of using them online, as they tend to look a little bit too much like single quotes.

Now, since we are not stupid, we will of course test this script before unleashing it on our poor unsuspecting files. This is step 4, and it is the most important step!

So in our loop we do:

echo mv "$filename" "$newfilename"

Notice the echo. It is there for a reason. This script, when run, will only print a lot of text to stdout. This is the time the scripter would do well to pay attention. Do the resulting lines, such as “mv My fancy report 1.pdf My_fancy_report_1.pdf”, look correct?

If it doesn’t, go back and tweak the line setting the newfilename variable until it looks correct.
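One thing that can make the dry run harder to read is that echo drops the quotes, so it is not obvious where the old name ends and the new one begins. A possible tweak, sketched here with a made-up filename, is to print the quotes yourself with printf:

```shell
# Hypothetical filename, as find ./ would hand it to the loop.
filename='./My fancy report 1.pdf'
newfilename=$(printf '%s\n' "$filename" | tr ' ' '_')

# Show the two arguments inside literal quotes so the boundary is visible.
preview=$(printf 'mv "%s" "%s"' "$filename" "$newfilename")
echo "$preview"
# prints: mv "./My fancy report 1.pdf" "./My_fancy_report_1.pdf"
```

This is purely cosmetic: the preview is meant for human eyes, not for pasting back into a shell.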

Test script:

find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    echo mv "$filename" "$newfilename"
done

or, as a one-liner:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do newfilename=$(echo "$filename" | tr ' ' '_'); echo mv "$filename" "$newfilename"; done

Otherwise, proceed to step 5: removal of echo.

Yeah, that’s really all. That little echo in front of mv "$filename" "$newfilename"… remove that, and the script will be unleashed on the listed files.

And the final script:

find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    mv "$filename" "$newfilename"
done

or, for the one-liner type of guy:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do newfilename=$(echo "$filename" | tr ' ' '_'); mv "$filename" "$newfilename"; done
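One case the script does not consider is a target name that already exists: “a b.txt” would be moved on top of an existing “a_b.txt”. Below is a slightly more defensive sketch, not a replacement for the script above. It runs in a scratch directory with made-up filenames, hands the names to a small inline sh script via find's -exec instead of a pipe, and skips any file whose new name is already taken:

```shell
# Scratch directory with one name collision (all names are hypothetical).
demo_dir=$(mktemp -d)
touch "$demo_dir/a b.txt" "$demo_dir/a_b.txt" "$demo_dir/c d.txt"

(
    cd "$demo_dir" || exit 1
    # find passes the matching names as arguments to the inline sh script.
    find . -maxdepth 1 -type f -name '* *' -exec sh -c '
        for filename in "$@"; do
            newfilename=$(printf "%s\n" "$filename" | tr " " "_")
            if [ -e "$newfilename" ]; then
                # Refuse to overwrite an existing file; report it instead.
                echo "skipping $filename: $newfilename already exists" >&2
            else
                mv "$filename" "$newfilename"
            fi
        done
    ' sh {} +
)

# Record what is left, then clean up.
result=$(ls -1 "$demo_dir")
printf '%s\n' "$result"
rm -rf "$demo_dir"
```

After the run, “c d.txt” has become “c_d.txt”, while “a b.txt” is left alone because “a_b.txt” was already there.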

Finally, if you want moar power you could either pipe together several tr after one another, or try other stuff, like sed…
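As a rough idea of what such a chain could look like, here is a sketch of a helper that strings several tr calls and one sed together. The function name sanitize and the exact set of transformations are my own picks for illustration, so adjust them to taste:

```shell
# Hypothetical helper: spaces to underscores, lowercase everything,
# drop exclamation marks, then collapse runs of underscores into one.
sanitize() {
    printf '%s\n' "$1" \
        | tr ' ' '_' \
        | tr 'A-Z' 'a-z' \
        | tr -d '!' \
        | sed 's/__*/_/g'
}

sanitize 'My  SUPER cool Report!.pdf'
# prints: my_super_cool_report.pdf
```

In the rename loop, the newfilename line would then become newfilename=$(sanitize "$filename"), and the echo-first dry run works exactly as before.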

Your imagination, understanding of pipes, and knowledge of regular expressions are the limit ;)