Whitespaces in filenames and how to get rid of them

Although it has been more than four years since I switched from Windows to GNU/Linux, I still manage to stumble upon files, either being brought back from backups, or downloaded from the net, that contain spaces, and need to be handled.

Since I got the hang of shell scripting I have stopped dreading having to rename these files manually (which was my previous m.o. for that scenario).

Imagine a file named “My super cool science report.pdf”. Now, for a single file it might be ok to just manually rename the sucker, either via your file manager of choice, or through a quick (tab-complete supported) mv. Fair enough, but what if you have ten files?

This task, when being converted into a shell script, can first be broken into smaller tasks.

Step 1 is that we need some way of listing the files we wish to operate on. If they are all stored in a directory separate from other files, and there are no sub-directories in that directory etc., one can simply use ls -1 (i.e. ell ess dash one).

Otherwise, find is a VERY useful tool.

$ find /path/to/document/directory -maxdepth 1 -type f -name '* *'

This simply says “in the specified path, look only in that directory itself (i.e. don’t traverse downwards) for regular files with a name matching whatever followed by a space followed by whatever”.
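To see what that pattern does and doesn't pick up, here is a small sketch run against a throwaway directory. The file and directory names are made up for the demonstration, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/bin/bash
# Scratch directory with made-up names, purely for illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/My fancy report 1.pdf" "$demo_dir/notes.txt"
mkdir "$demo_dir/sub dir"
touch "$demo_dir/sub dir/deep file.txt"

# Only regular files directly inside demo_dir whose name contains a space.
# "sub dir" is skipped (-type f) and so is "deep file.txt" (-maxdepth 1).
matches=$(find "$demo_dir" -maxdepth 1 -type f -name '* *')
echo "$matches"

# Clean up the scratch directory again.
rm -r "$demo_dir"
```

Only the PDF should be listed; notes.txt has no space, and the other two are filtered out by type and depth.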

Now that we have a list of files to work with, comes step 2: iterating over the files.

This is what has tripped me up in the past. I’ve always tried constructs along the lines of for filename in `expression`, where expression is some lame attempt to list the files I want to work with. The catch is that the shell splits the output of expression on whitespace, so a name with spaces arrives in pieces. I could probably have gotten it to work, but it requires more patience than I was willing to muster ;)

Besides, while read filename; do command(s); done works immediately.
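The difference between the two loop styles is easy to see with a counter. This is only a sketch: the list is a hard-coded string standing in for the output of find or ls, and the loop is fed from a here-document rather than a pipe so that the counters are still visible after the loop (a pipe would run the loop in a subshell).

```shell
#!/bin/bash
# A one-line "file list" standing in for the output of find or ls.
list='My fancy report 1.pdf'

# for + command substitution: the shell splits the output on whitespace.
for_count=0
for word in $(echo "$list"); do
    for_count=$((for_count + 1))
done

# while read: one iteration per line, spaces inside the line are kept.
read_count=0
while read filename; do
    read_count=$((read_count + 1))
done <<EOF
$list
EOF

echo "for saw $for_count items, while read saw $read_count"
```

The for loop counts four separate words, while read sees a single filename.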

To transfer the list of files from find / ls we simply pipe it to the while loop:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do ...; done

Had this been put in a script, instead of written on the command line, we would now have something looking a lot like this:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    ...
done
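The skeleton above copes fine with ordinary spaces, but plain read strips trailing whitespace and treats backslashes as escapes. A more defensive sketch, assuming bash and a find that supports -print0, passes the names NUL-separated instead. The two odd file names below are invented just to provoke the problem.

```shell
#!/bin/bash
# Scratch directory with two deliberately awkward (made-up) names.
demo_dir=$(mktemp -d)
touch "$demo_dir/trailing space " "$demo_dir/back\\slash name"

count=0
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# -d '' makes read stop at the NUL bytes that -print0 emits.
while IFS= read -r -d '' filename; do
    # Count the name only if it still points at an existing file.
    [ -e "$filename" ] && count=$((count + 1))
done < <(find "$demo_dir" -maxdepth 1 -type f -name '* *' -print0)

echo "$count files survived the round trip"
rm -r "$demo_dir"
```

Feeding the loop through < <(...) instead of a pipe is a bash-only convenience; it keeps count alive after the loop ends.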

Step 3 then, is obviously about actually transforming the filename.

For simple substitutions like this, tr is a great tool, e.g.

$ echo "This is a test" | tr ' ' '_'
This_is_a_test

This simply takes stuff from stdin, replaces all spaces with underscores, and pushes it to stdout.

tr also has great functionality for altogether removing specified characters from the given string, e.g.

$ echo 'What?!' | tr -d '!'
What?

Finally, tr is a pretty cool guy, converts between cases and doesn’t afraid of anything:

$ echo "Soon this will be shouted" | tr 'a-z' 'A-Z'
SOON THIS WILL BE SHOUTED
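Two more tr tricks that may come in handy here, shown as a small sketch with made-up input strings: -s squeezes repeated characters, and the POSIX character classes are an alternative to spelling out 'a-z' and 'A-Z'.

```shell
#!/bin/bash
# -s squeezes runs of the replacement character into a single one,
# so several spaces in a row end up as just one underscore.
squeezed=$(echo "Too   many    spaces" | tr -s ' ' '_')
echo "$squeezed"

# Character classes instead of explicit ranges for the case conversion.
shouted=$(echo "Soon this will be shouted" | tr '[:lower:]' '[:upper:]')
echo "$shouted"
```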

Ok, enough about tr, but it is pretty cool, and quite enough for this task. So now we know how to list the files, iterate over them, and transform the filename from the original one, to a new, better one. Now what?

Now we need to save the transformed name into a temporary variable (since mv requires both a source path and a destination path) which is done with:

newfilename=$(echo "$filename" | tr ' ' '_')

One could also use backticks:

newfilename=`echo "$filename" | tr ' ' '_'`

But I am always wary of using these online, as they tend to look a little bit too much like single quotes.
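If the script is going to run under bash anyway (which the scripts here are), a third option is to skip echo and tr altogether and let the shell do the substitution. This is a sketch of that alternative, not something the approach above depends on; the filename is just the example from earlier.

```shell
#!/bin/bash
filename='My fancy report 1.pdf'

# ${var//pattern/replacement} replaces every match of the pattern.
# This is a bash feature; plain sh may not understand it.
newfilename=${filename// /_}
echo "$newfilename"
```

It saves starting two extra processes per file, which hardly matters for ten files but adds up for thousands.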

Now, since we are not stupid, we will of course test this script before unleashing it on our poor unsuspecting files. This is step 4, and it is the most important step!

So in our loop we do:

echo mv "$filename" "$newfilename"

Notice the echo. It is there for a reason. This script, when run, will only produce a lot of text, printed to stdout. This is the time the scripter would do well to pay attention. Do the resulting lines, along the lines of “mv My fancy report 1.pdf My_fancy_report_1.pdf”, look correct?

If it doesn’t, go back and tweak the line setting the newfilename variable until it looks correct.

Test script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    echo mv "$filename" "$newfilename"
done

or

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do newfilename=$(echo "$filename" | tr ' ' '_'); echo mv "$filename" "$newfilename"; done

Otherwise, proceed to step 5: removal of echo.

Yeah, that’s really all. That little echo in front of mv "$filename" "$newfilename"… remove that, and the script will be unleashed on the listed files.

And the final script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    mv "$filename" "$newfilename"
done

or, for the one-liner type of guy:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do newfilename=$(echo "$filename" | tr ' ' '_'); mv "$filename" "$newfilename"; done
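Two things the script above does not guard against: if the path leading up to the file contains spaces, tr will rewrite those too, and mv will happily overwrite an existing file that already has the new name. A possible sketch of a more careful rename follows. The function name rename_spaces and the demo paths are hypothetical, invented for this example.

```shell
#!/bin/bash
# Hypothetical helper: rename one file, touching only its basename.
rename_spaces() {
    local dir base target
    dir=$(dirname "$1")
    base=$(basename "$1")
    target="$dir/$(echo "$base" | tr ' ' '_')"
    # Refuse to clobber a file that already carries the new name.
    if [ -e "$target" ]; then
        echo "skipping '$1': '$target' already exists" >&2
        return 1
    fi
    mv "$1" "$target"
}

# Demo in a scratch directory whose path itself contains a space.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my docs"
touch "$demo_dir/my docs/final report.pdf"
rename_spaces "$demo_dir/my docs/final report.pdf"
result=$(ls "$demo_dir/my docs")
echo "$result"
rm -r "$demo_dir"
```

Inside the loop one would then call rename_spaces "$filename" instead of the tr and mv lines.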

Finally, if you want moar power you could either pipe together several tr after one another, or try other stuff, like sed…
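As a rough idea of what that could look like, here is a sketch with an invented filename: first a chain of three tr calls, then a sed substitution using a regular expression.

```shell
#!/bin/bash
name='My  Fancy Report (final).pdf'

# Chain: spaces to underscores (squeezed), lower-case, then drop parentheses.
clean=$(echo "$name" | tr -s ' ' '_' | tr '[:upper:]' '[:lower:]' | tr -d '()')
echo "$clean"

# sed with a regular expression: one or more spaces become one underscore.
with_sed=$(echo "$name" | sed 's/  */_/g')
echo "$with_sed"
```

Either result can be dropped into the newfilename=$(...) line of the script, and the echo mv dry run will show whether it does what you hoped.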

Your imagination, understanding of pipes, and knowledge of regular expressions are the limit ;)


2 Responses to “Whitespaces in filenames and how to get rid of them”

  1. Ewe says:

    Why???

    Shouldn’t the filename represent the contents as it should be, with spaces.
    I have not yet seen a report spelled out with _ instead of spaces.

    Sure there are some usability problems when using the command line and writing scripts, but that’s where the problem lies then, rather than in naming the files.

    Still your article could easily be transformed into how to remove those annoying _ in filenames and replace them with their correct representation, the space.

  2. Patrik says:

    I obviously disagree with your point of view ;)

    The contents of the file should obviously not have spaces replaced by underscores, but for filenames I really don’t see the problem. Whatever makes it more comfortable for me to do my work, the better, right?

    There could perhaps exist readability issues for dyslexic people, and for that reason one might wish to show some respect by making it easier for them, but I don’t have any information that either would be more or less readable. (The thought never occurred to me until answering this comment) ;)

    Anyway, had these usability problems been only with the command line… well I’d probably still have done it. But the way I see it, it isn’t only an issue there.

    http://example.org/my_fancy_report.pdf is superior (from my p.o.v) to http://example.org/my%20fancy%20report.pdf

    I don’t think we’ll see eye to eye on this, but that matters little, you could use the script to undo my silliness and I’ll use it to undo yours ;D
    Thank you for the comment nonetheless :)