Posts Tagged ‘find’

Whitespaces in filenames and how to get rid of them

Sunday, September 26th, 2010

Although it has been more than four years since I switched from Windows to GNU/Linux, I still manage to stumble upon files, restored from backups or downloaded from the net, whose names contain spaces and need to be dealt with.

Since I got the hang of shell scripting I have stopped dreading having to rename these files manually (which was my previous m.o. for that scenario).

Imagine a file named “My super cool science report.pdf”. Now, for a single file it might be ok to just manually rename the sucker, either via your file manager of choice, or through a quick (tab-complete supported) mv. Fair enough, but what if you have ten files?

Converted into a shell script, this task can first be broken down into smaller subtasks.

Step 1 is that we need some way of listing the files we wish to operate on. If they are all stored in a directory separate from other files, and there are no sub-directories in that directory, one can simply use ls -1 (i.e. ell ess dash one).

Otherwise, find is a VERY useful tool.

$ find /path/to/document/directory -maxdepth 1 -type f -name '* *'

This simply says “in the specified path, look only in the given directory (i.e. don’t traverse downwards) for files with a name matching whatever, followed by a space, followed by whatever”.

Now that we have a list of files to work with, comes step 2: iterating over the files.

This is what has tripped me up in the past. I’ve always tried constructs along the lines of for filename in `expression`, where expression is some lame attempt to list the files I want to work with. I could probably have gotten it to work, but it requires more patience than I was willing to muster ;)

Besides, while IFS= read -r filename; do command(s); done works immediately (the IFS= and -r keep read from trimming leading whitespace or mangling backslashes in the names it reads).

To transfer the list of files from find / ls we simply pipe it to the while loop:

$ find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename; do ...; done

Had this been put in a script, instead of written on the command line, we would now have something looking a lot like this:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename;
do
    ...
done

Step 3 then, is obviously about actually transforming the filename.

For simple substitutions like this, tr is a great tool, e.g.

$ echo "This is a test" | tr ' ' '_'
This_is_a_test

This simply takes stuff from stdin, replaces all spaces with underscores, and pushes it to stdout.

tr also has great functionality for altogether removing specified characters from the given string, e.g.

$ echo 'What?!' | tr -d '!'
What?

Finally, tr is a pretty cool guy, converts between cases and doesn’t afraid of anything:

$ echo "Soon this will be shouted" | tr 'a-z' 'A-Z'
SOON THIS WILL BE SHOUTED

Ok, enough about tr; it is pretty cool, and quite enough for this task. So now we know how to list the files, iterate over them, and transform each filename from the original one to a new, better one. Now what?

Now we need to save the transformed name into a temporary variable (since mv requires both a source path and a destination path), which is done with:

newfilename=$(echo "$filename" | tr ' ' '_')

One could also use backticks:

newfilename=`echo "$filename" | tr ' ' '_'`

But I am always wary of using those, as backticks tend to look a little bit too much like single quotes.

Now, since we are not stupid, we will of course test this script before unleashing it on our poor unsuspecting files. This is step 4, and it is the most important step!

So in our loop we do:

echo mv "$filename" "$newfilename"

Notice the echo. It is there for a reason. This script, when run, will only produce a lot of text, printed to stdout. This is when the scripter would do well to pay attention. Do the resulting lines, e.g. “mv My fancy report 1.pdf My_fancy_report_1.pdf”, look correct?

If it doesn’t, go back and tweak the line setting the newfilename variable until it looks correct.

Test script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    echo mv "$filename" "$newfilename"
done

or

$ find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename; do newfilename=$(echo "$filename" | tr ' ' '_'); echo mv "$filename" "$newfilename"; done

Otherwise, proceed to step 5: removal of echo.

Yeah, that’s really all. That little echo in front of mv “$filename” “$newfilename”… remove that, and the script will be unleashed on the listed files.

And the final script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    mv "$filename" "$newfilename"
done

or, for the one-liner type of guy:

$ find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename; do newfilename=$(echo "$filename" | tr ' ' '_'); mv "$filename" "$newfilename"; done
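One caveat worth guarding against: if both “a b.pdf” and “a_b.pdf” already exist, mv will silently overwrite the latter. A sketch of the same loop with a guard against clobbering, demonstrated in a throwaway directory with made-up filenames:

```shell
# Same rename loop, but refusing to overwrite an existing target.
# Demonstrated in a temporary directory with made-up example files.
tmp=$(mktemp -d) && cd "$tmp"
touch "a b.pdf" "a_b.pdf" "c d.txt"

find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    if [ -e "$newfilename" ]; then
        # target exists: complain on stderr instead of clobbering
        echo "skipping $filename: $newfilename already exists" >&2
    else
        mv "$filename" "$newfilename"
    fi
done
```

Here “a b.pdf” is left alone (its target “a_b.pdf” exists), while “c d.txt” is renamed as before.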

Finally, if you want moar power, you could pipe together several tr invocations one after another, or reach for other tools, like sed…

Your imagination, understanding of pipes, and knowledge of regular expressions is the limit ;)
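As a sketch of such a pipeline (the exact expressions are just examples, not prescriptions): squeeze runs of spaces, replace them with underscores, strip anything that isn’t a letter, digit, dot, underscore or hyphen, and lowercase the result:

```shell
# Hypothetical helper chaining tr and sed for a more aggressive cleanup.
clean_name() {
    echo "$1" \
        | tr -s ' ' \
        | tr ' ' '_' \
        | sed 's/[^A-Za-z0-9._-]//g' \
        | tr 'A-Z' 'a-z'
}

clean_name "My  Super Cool  Report (final)!.pdf"
# -> my_super_cool_report_final.pdf
```

Drop it into the loop in place of the single tr, and the same echo-first testing routine applies.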

Easy permission sanitizing using chmod

Thursday, June 3rd, 2010

Let’s say you have a web app, such as WordPress, and you have installed it on your own server. You are of course security conscious, so you wish to have the permissions set up correctly, no exceptions. This usually means 755 (rwxr-xr-x) for directories and 644 (rw-r--r--) for files.

The way I used to solve this, on every server I worked on, was to set up a small shell script (sanitize-perms.sh) along the lines of:

#!/bin/sh
TARGET=$1
find $TARGET -type f | xargs chmod 0644
find $TARGET -type d | xargs chmod 0755

This worked well, with one huge caveat: What if you, somewhere in that directory structure had a file which needed to be executable?

I don’t know if such a case exists in WordPress; I’ve used that script on a couple of WP installations without any noticeable side effects. Still, it’s obviously a flawed approach.
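For completeness: the script also has the whitespace problem discussed in the post above, since unquoted $TARGET and plain xargs both split on spaces. A sketch of a whitespace-safe variant, using find’s -print0 together with xargs -0 (the -r, GNU xargs’ no-run-if-empty flag, avoids running chmod with no operands):

```shell
# Sketch of a whitespace-safe sanitize-perms: find emits NUL-delimited
# names (-print0) and xargs reads them back (-0), so filenames
# containing spaces survive intact.
sanitize_perms() {
    find "$1" -type f -print0 | xargs -0 -r chmod 0644
    find "$1" -type d -print0 | xargs -0 -r chmod 0755
}
```

It still shares the executable-file caveat, of course.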

I’ll side-track this post a bit, since it is relevant to the overall topic: through identi.ca I stumbled upon this blog post (which is awesome by the way, go read it!) about why LaTeX is so cool, and why it can be useful for writing your résumé.

Just by chance I continued into Dan’s code section, and long story short, I found some cool stuff in his .bashrc file. Most notably this little beauty:

# sanitize - set file/directory owner and permissions to normal values (644/755)
# Usage: sanitize <file>
sanitize() {
	chmod -R u=rwX,go=rX "$@"
	chown -R ${USER}.users "$@"
}

I personally, for some reason, have always tended more to the octal representation than the [ugo][+-=][rwx] syntax, but that single chmod line is so outstandingly brilliant that I am almost forced to switch.

In one fell swoop, Dan’s command does what I need two commands to accomplish (really, with the find and xargs pipelines spawning extra chmod processes for the found files and directories, my script needs quite a few more processes than that).

The magic happens in that capital X, which is defined in the chmod man page as: “execute/search only if the file is a directory or already has execute permission for some user”.

Directories automatically receive the executable flag, and any file which already has it keeps it. Bloody brilliant!
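A quick way to see the X in action (a throwaway sketch; stat -c is GNU coreutils):

```shell
# Create a directory, a plain file and an executable file,
# then apply Dan's chmod line to the lot.
tmp=$(mktemp -d)
mkdir "$tmp/docs"
touch "$tmp/report.txt"
touch "$tmp/run.sh" && chmod 0700 "$tmp/run.sh"

chmod -R u=rwX,go=rX "$tmp"

stat -c '%a %n' "$tmp/docs" "$tmp/report.txt" "$tmp/run.sh"
# docs gets 755, report.txt gets 644, and run.sh keeps its
# execute bit, ending up as 755
```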

Many thanks to Dan for sharing his configuration files, one of these days I’ll have to follow his good example.