Posts Tagged ‘tr’

2012w25

Sunday, June 24th, 2012

It has been quite a while since I wrote a post. I’ve not been sick or anything, but there has been a lot of work, and outside work I prioritized sleeping over writing. But now I’m back for the moment, so let’s get down to business :)

Since last time I’ve come up with new ways of abusing awk, such as having it find the highest value from a command outputting in the following syntax:

\t<characters, integers, fullstop>: <integer>\n

To make things a little more interesting, the command also spits out a header, as well as an additional newline after the end of output.

I just now, while writing this, came up with a different solution, which doesn’t use awk:

theCommand | grep '^[[:space:]]' | tr -d ' ' | cut -d':' -f2 | sort -rn | head -n 1

but what I ended up using was:

theCommand | awk 'BEGIN { highest = 0 } $0 ~ /^[ \t]/ { if ( $2 > highest ) { highest = $2 } } END { print highest }'

In this case, from what I can gather, awk is the more efficient solution. One process versus five.

Update: As Werner points out, the if statement isn’t really necessary, and since awk treats an unset variable as 0 in a numeric comparison, the BEGIN block can be cut as well:

theCommand | awk '/^[ \t]/ && $2 > highest { highest = $2 } END { printf "%d\n", highest }'
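To try the awk approach without the real command, here is a minimal sketch. The function fake_command is a hypothetical stand-in for theCommand, and the header text, names and values are invented purely for illustration.

```shell
#!/bin/bash
# fake_command is a hypothetical stand-in for theCommand: a header line,
# tab-indented "name: value" lines, and a trailing empty line.
fake_command() {
    printf 'Some header\n'
    printf '\tdisk0.a: 17\n'
    printf '\tdisk1.b: 103\n'
    printf '\tdisk2.c: 42\n'
    printf '\n'
}

# Only lines starting with a space or tab are considered; $2 is the integer.
highest=$(fake_command | awk '/^[ \t]/ && $2 > highest { highest = $2 } END { printf "%d\n", highest }')
echo "$highest"
```

With these made-up values the header and the empty line are skipped, and the largest of the three integers is printed.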

Utilities

  • ditaa (a.k.a. DIagrams Through Ascii Art) makes it easy to generate nice-looking diagram images from… rather nice-looking ASCII diagrams
  • docopt, a command-line interface description language, which also seems to support generating the parser for the CLI being described
  • Peity, for generating different types of charts using jQuery and <canvas>
  • Ghost.py, for interacting with web pages programmatically

As of late I have been thinking a great deal about backups and the project which seems the most interesting to me is Duplicity.

Random tech stuff

Other random not-so-techy stuff

What I pass for humour

:wq

Whitespaces in filenames and how to get rid of them

Sunday, September 26th, 2010

Although it has been more than four years since I switched from Windows to GNU/Linux, I still manage to stumble upon files, either brought back from backups or downloaded from the net, whose names contain spaces and need to be handled.

Since I got the hang of shell scripting I have stopped dreading having to rename these files manually (which was my previous m.o. for that scenario).

Imagine a file named “My super cool science report.pdf”. Now, for a single file it might be ok to just manually rename the sucker, either via your file manager of choice, or through a quick (tab-complete supported) mv. Fair enough, but what if you have ten files?

This task, when being converted into a shell script, can first be broken into smaller tasks.

Step 1 is that we need some way of listing the files we wish to operate on. If they are all stored in a directory separate from other files, and there are no sub-directories in that directory etc., one can simply use ls -1 (i.e. ell ess dash one).

Otherwise, find is a VERY useful tool.

$ find /path/to/document/directory -maxdepth 1 -type f -name '* *'

This simply says “in the specified path, look only in the current directory (i.e. don’t traverse downwards) for files with a name matching whatever followed by a space followed by whatever”.
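A small sketch of what that find invocation picks up, using a throwaway directory. Every file and directory name here is made up for the demo.

```shell
#!/bin/bash
# Build a throwaway directory (all names below are made up for the demo).
demo=$(mktemp -d)
touch "$demo/My super cool science report.pdf" "$demo/notes.txt"
mkdir "$demo/sub dir"
touch "$demo/sub dir/deep file.txt"

# -maxdepth 1: stay in $demo; -type f: regular files only;
# -name '* *': names containing at least one space.
matches=$(find "$demo" -maxdepth 1 -type f -name '* *')
echo "$matches"

rm -r "$demo"
```

Only the PDF should be listed: notes.txt has no space, "sub dir" is a directory rather than a file, and "deep file.txt" sits one level too deep.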

Now that we have a list of files to work with, comes step 2: iterating over the files.

This is what has tripped me up in the past. I’ve always tried constructs along the lines of for filename in `expression`, where expression is some lame attempt to list the files I want to work with. I could probably have gotten it to work, but it requires more patience than I was willing to muster ;)

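Besides, while read filename; do command(s); done works immediately. Writing it as while IFS= read -r filename is slightly safer, since that keeps read from trimming whitespace or interpreting backslashes in a name.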
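To see why the for construct is so fiddly, here is a minimal sketch comparing the two loops on a single made-up filename. The here-document is only used so that the counters survive the loop for the final echo; it is not part of the approach described in this post.

```shell
#!/bin/bash
# A made-up list with a single filename containing spaces.
list='My fancy report 1.pdf'

# Unquoted command substitution is split on whitespace:
# one filename becomes four separate words.
for_count=0
for word in $(echo "$list"); do
    for_count=$((for_count + 1))
done

# read consumes one whole line per iteration, so the name stays intact.
read_count=0
while IFS= read -r filename; do
    read_count=$((read_count + 1))
done <<EOF
$list
EOF

echo "for saw $for_count items, while read saw $read_count"
```

The for loop treats "My", "fancy", "report" and "1.pdf" as four different files, while the while loop sees exactly one.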

To transfer the list of files from find / ls we simply pipe it to the while loop:

$ find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename; do ...; done

Had this been put in a script, instead of written on the command line, we would now have something looking a lot like this:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename;
do
    ...
done

Step 3 then, is obviously about actually transforming the filename.

For simple substitutions like this, tr is a great tool, e.g.

$ echo "This is a test" | tr ' ' '_'
This_is_a_test

This simply takes stuff from stdin, replaces all spaces with underscores, and pushes it to stdout.

tr also has great functionality for altogether removing specified characters from the given string, e.g.

$ echo 'What?!' | tr -d '!'
What?

Finally, tr is a pretty cool guy, converts between cases and doesn’t afraid of anything:

$ echo "Soon this will be shouted" | tr 'a-z' 'A-Z'
SOON THIS WILL BE SHOUTED

Ok, enough about tr, but it is pretty cool, and quite enough for this task. So now we know how to list the files, iterate over them, and transform the filename from the original one, to a new, better one. Now what?

Now we need to save the transformed name into a temporary variable (since mv requires both a source path and a destination path) which is done with:

newfilename=$(echo "$filename" | tr ' ' '_')

One could also use backticks:

newfilename=`echo "$filename" | tr ' ' '_'`

But I am always wary of using these online, as they tend to look a little bit too much like single quotes.

Now, since we are not stupid, we will of course test this script before unleashing it on our poor unsuspecting files. This is step 4, and it is the most important step!

So in our loop we do:

echo mv "$filename" "$newfilename"

Notice the echo. It is there for a reason. This script, when run, will only produce a lot of text, printed to stdout. This is the time the scripter would do well to pay attention. Do the resulting lines, such as “mv ./My fancy report 1.pdf ./My_fancy_report_1.pdf”, look correct?

If it doesn’t, go back and tweak the line setting the newfilename variable until it looks correct.

Test script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    echo mv "$filename" "$newfilename"
done

or

$ find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename; do newfilename=$(echo "$filename" | tr ' ' '_'); echo mv "$filename" "$newfilename"; done

Otherwise, proceed to step 5: removal of echo.

Yeah, that’s really all. That little echo in front of mv "$filename" "$newfilename"… remove that, and the script will be unleashed on the listed files.

And the final script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    mv "$filename" "$newfilename"
done

or, for the one-liner type of guy:

$ find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename; do newfilename=$(echo "$filename" | tr ' ' '_'); mv "$filename" "$newfilename"; done
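If you would rather watch the renaming happen somewhere harmless first, one option is a scratch directory. This is only a sketch: the filenames are invented, and the IFS= and -r on read are a defensive choice rather than something the task strictly requires.

```shell
#!/bin/bash
# Scratch directory with made-up filenames, so no real files are at risk.
sandbox=$(mktemp -d)
touch "$sandbox/My fancy report 1.pdf" "$sandbox/My fancy report 2.pdf" "$sandbox/untouched.txt"

# The renaming loop, run from inside the scratch directory (in a subshell,
# so the working directory of the calling shell is left alone).
(
    cd "$sandbox" || exit 1
    find ./ -maxdepth 1 -type f -name '* *' | while IFS= read -r filename
    do
        newfilename=$(echo "$filename" | tr ' ' '_')
        mv "$filename" "$newfilename"
    done
)

# List the result; the names with spaces should now use underscores.
result=$(ls -1 "$sandbox")
echo "$result"
rm -r "$sandbox"
```

The listing should show the two reports with underscores in place of spaces, and untouched.txt exactly as it was.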

Finally, if you want moar power you could either pipe together several tr after one another, or try other tools, like sed…
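A rough sketch of both ideas, with made-up filenames. The particular clean-up rules (lower-casing, dropping parentheses, collapsing runs of spaces) are just examples of what one might want, not a recommendation.

```shell
#!/bin/bash
# A made-up filename for illustration.
filename='My Super Cool (final) Report.pdf'

# Several tr invocations chained: spaces -> underscores,
# upper -> lower case, then parentheses deleted.
with_tr=$(echo "$filename" | tr ' ' '_' | tr 'A-Z' 'a-z' | tr -d '()')
echo "$with_tr"

# sed can replace a whole run of spaces with a single underscore.
with_sed=$(echo 'Too   many    spaces.txt' | sed 's/  */_/g')
echo "$with_sed"
```

Either pipeline can be dropped into the newfilename=$(...) line of the script in place of the single tr.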

Your imagination, understanding of pipes, and knowledge of regular expressions are the limit ;)