Posts Tagged ‘awk’

2012w01

Sunday, January 8th, 2012

column

The other day I wanted some prettier (tabularized) output and of course someone has already wanted this and of course there are tools for that :)

bash_completion

This is so frakking cool! I’ve built this little shellscript “vault.sh” which is a simple wrapper script for mounting and unmounting encfs mounts.

It takes two parameters: operation and target, where operation can be one of “lock” and “unlock”, and target—at present—resolves to “thunderbird” (signifying my .thunderbird directory).

Since I intend to expand this with more encrypted directories as I see fit, I don’t want to hard-code that.

What I did want, however, was to be able to auto complete operation and target. So I looked around, and found this post, and although I couldn’t derive enough knowledge from it to solve my particular problem, having multiple levels of completion, the author was gracious enough to provide references to where s/he had found the knowledge (here, here and here). That second link was what did it for me.

My /etc/bash_completion.d/vault.sh now looks like this:

_vault()
{
    local cur prev opts
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"
    prev="${COMP_WORDS[COMP_CWORD-1]}"
    first="lock unlock"
    second="thunderbird"

    if [[ ${cur} == * && ${COMP_CWORD} -eq 2 ]] ; then
        COMPREPLY=( $(compgen -W "${second}" -- ${cur}) )
        return 0
    fi

    if [[ ${cur} == * && ${COMP_CWORD} -eq 1 ]] ; then
        COMPREPLY=( $(compgen -W "${first}" -- ${cur}) )
        return 0
    fi
}
complete -F _vault vault.sh

And all the magic is happening in the two if-statements. Essentially: if current word (presently half typed and tabbed) is whatever, and this is the second argument to the command, respond with suggestions taken from the variable $second.

Otherwise, if current word is whatever, and this is the first parameter, take suggestions from the variable $first.

Awsum!

awk for great good

Another great use for awk: viewing selected portions of source code. For instance, in Perl, if you just want to view a specific subroutine, without getting distracted by all the other crud, you could do: $ awk '/sub SomeSubName/,/}/' somePerlModule.pm

Links

If PHP were British, perhaps it’s just me, but I find it hilarious.

PayPal just keeps working their charm…

Belarus just… wait what?

Why we need version control

Preserving space, neat!

Fuzzy string matching in Python

If you aren’t embarrassed by v1.0 you didn’t release it early enough

The makers schedule, oldie but goldie

CSS Media Queries are pretty cool

Static site generator using the shell and awk

A netstat companion

Reducing code nesting

Comparing images using perceptual hashes

Microsofts GPS “avoid ghetto” routing algorithm patent…

My Software Stack 2011 edition

Saturday, December 31st, 2011

I realize that I haven’t written my customary “software stack” post for this year yet. But hey, from where I’m sitting, I still have … 36 minutes to spare ;)

I’ll be using the same categories as last year; system, communications, web, development, office suite, server, organization, and entertainment.

System

The OS of choice is still Archlinux, my window manager is still wmii, my terminal emulator is rxvt-unicode, upgraded by also installing urxvt-tabbedex.

My shell is still bash, my cron daemon is still fcron, and my network manager is wicd.

To this configuration I’ve added the terminal multiplexer tmux, and have lately found out just how useful mc can be. Oh, and qmv from the renameutils package is now a given part of the stack.

Communications

Not much change here, Thunderbird for email, Pidgin for instant messaging, irssi for IRC.

Heybuddy has been replaced by identicurse as my micro-blogging (identi.ca) client. Heybuddy is very nice, but I can use identicurse from the commandline, and it has vim-like bindings.

For Pidgin I use OTR to encrypt conversations. For Thunderbird I use the enigmail addon along with GnuPG.

This means that Thunderbird still hasn’t been replaced by the “mutt-stack” (mutt, msmtp, offlineimap and mairix) and this is mostly due to me not having the energy to learn how to configure mutt.

I also considered trying to replace Pidgin with irssi and bitlbee but Pidgin + OTR works so well, and I have no idea about how well OTR works with bitlbee/irssi (well, actually, I’ve found irssi + OTR to be flaky at best.

Web

Not much changed here either, Firefox dominates, and I haven’t looked further into uzbl although that is still on the TODO list, for some day.

I do some times also use w3m, elinks, wget, curl and perl-libwww.

My Firefox is customized with NoScript, RequestPolicy, some other stuff, and Pentadactyl.

Privoxy is nowadays also part of the loadout, to filter out ads and other undesirable web “resources”.

Development

In this category there has actually been some changes:

  • gvim has been completely dropped
  • eclipse has been dropped, using vim instead
  • mercurial has been replaced by git

Thanks in no small part to my job, I have gotten more intimate knowledge of awk and expect, as well as beginning to learn Perl.

I still do some Python hacking, a whole lot of shell scripting, and for many of these hacks, SQLite is a faithful companion.

Doh! I completely forgot that I’ve been dabbling around with Erlang as well, and that mscgen has been immensely helpful in helping me visualize communication paths between various modules.

“Office suite”

I still use LaTeX for PDF creation (sorry hook, still haven’t gotten around to checking out ConTeXt), I haven’t really used sc at all, it was just too hard to learn the controls, and I had too few spreadsheets in need of creating. I use qalculate almost on a weekly basis, but for shell scripts I’ve started using bc instead.

A potential replacement for sc could be teapot, but again, I usually don’t create spreadsheets…

Server

Since I’ve dropped mercurial, and since the mercurial-server package suddenly stopped working after a system update, I couldn’t be bothered to fix it, and it is now dropped.

screen and irssi is of course always a winning combination.

nginx and uwsgi has not been used to any extent, I haven’t tried setting up a VPN service, but I have a couple of ideas for the coming year (mumble, some VPN service, some nginx + Python/Perl thingies, bitlbee) and maybe replace the Ubuntu installation with Debian.

Organization

I still use both vimwiki and vim outliner, and my Important Dates Notifier script.

Still no TaskJuggler, and I haven’t gotten much use out of abook.

remind has completely replaced when, while I haven’t gotten any use what so ever out of wyrd.

Entertainment

For consuming stuff I use evince (PDF), mplayer (video), while for music, moc has had to step down from the throne, to leave place for mpd and ncmpcpp.

eog along with gthumb (replacing geeqie) handles viewing images.

For manipulation/creation needs I use LaTeX, or possibly Scribus, ffmpeg, audacity, imagemagick, inkscape, and gimp.

Bonus: Security

I thought I’d add another category, security, since I finally have something worthwhile to report here.

I’ve begun encrypting selected parts of my hard drive (mostly my email directory) using EncFS, and I use my passtore script for password management.

And sometimes (this was mostly relevant for when debugging passtore after having begun actively using it) when I have a sensitive file which I for a session need to store on the hard drive, in clear text, I use quixand to create an encrypted directory with a session key only stored in RAM. So once the session has ended, there is little chance of retrieving the key and decrypting the encrypted directory.

Ending notes

That’s about it. Some new stuff, mostly old stuff, only a few things getting kicked off the list. My stack is pretty stable for now. I wonder what cool stuff I will find in 2012 :D

:wq

awk, filtering and counting

Monday, December 5th, 2011

Suppose that you have a file containing some structured data, something perhaps along the lines of this, highly fictive but yet remarkably common, syntax:

<id><separator><somestring><separator><integer>

Now, let’s say that there were 99999 lines of this to go through, and the file is unsorted, and you wanted to find all the lines where SOMESTRING is foo, and then sum up the INTEGER field of those lines.

I almost had this problem at work, except my file probably didn’t contain more than a hundred or so lines.

For this I wrote a Perl script, which worked well, with the small inconvenience that I’d have to move that script onto each system where I’d want to use it.

Pontus, never the one to berate anyones efforts, but still finding room for improvements, both in the fact that my approach, the script, carried that inconvenience, and that is was very verbose when compared to the solution he ultimately suggested, he showed me a better way, the awk way.

$ awk -F<separatorGoesHere> 'BEGIN { SUM = 0 } /<someStringGoesHERE/ { SUM += $3 } END { print SUM }' <fileToBeParsedGoesHere>

I said before that my real file, at work, was small, so awk crunched through it at lightning speed. I also suggested a file containing 99.999 lines, and I did that to prove a point, namely:

Using this script:

#!/usr/bin/env python2

import random

filename = "awk.example.txt"
index = 0
iterations = 100000
choices = ['foo', 'bar', 'baz']
fh = open(filename, 'w')

for index in range(1, iterations):
    fh.write("%d, %s, %d\n" % (index,
                               random.choice(choices),
                               random.randint(0, 100)))
fh.close()

I generated a file (~1.5Mb) with a couple of lines ;) and let awk loose on it:

$ time awk -F, 'BEGIN { SUM = 0 } /foo/ { SUM += $3 } END { print SUM }' awk.example.txt

Which on my netbook took 0.241 seconds to complete.

real	0m0.241s
user	0m0.237s
sys	0m0.000s

Or in other words: awk if pretty frakking fast!

Now, let’s break it down:

awk

obviously, is the command, and it rocks, ‘nuf said.

-F,

means “change the field separator (from whitespace) to commas”

And then it gets tricky, but not as tricky as at least I was lead to believe.

There are two single-quotes, and between these we place all the things we want awk to do for us.

One good thing to note is that the syntax for awk is quite simple, something I didn’t grasp at first. It goes like this:

<somePattern> { <someAction> }

And that’s it. You can chain several <pattern>{<actions>} after each other.

In my, well Pontus’, command above, there are three such pairs:

BEGIN { SUM = 0 }

which is just another way of saying “before we start executing, create a variable SUM and set its value to 0″

/foo/ { SUM += $3 }

If you’re familiar with regular expressions you might have stumbled upon the pattern in which you enclose an expression between two slashes, and that pattern is used to search (or match) contents of lines or files. That’s what we’re doing here. So we’re basically saying “find lines containing foo, and from these lines extract column number three ($3), and increment the variable SUM by the value stored in column three.”

If instead, you’d wanted to count all the lines containing foo, SUM += 1 would have done that job.

Finally:

END { print SUM }

which should be pretty obvious: “When all is said and done, print whatever is stored in the variable SUM”

And last but not least, outside the single-quotes, we give awk the name of the file we wish it to process.

This is just a fantastic tool which I regret not having taking the time to learn the basics of earlier. Thank you Pontus for making me see the light (again) ;)

:wq

2011w48

Sunday, December 4th, 2011

Where the frakk did this week go?!?!

Work has been progressing, I can’t say that I am good at it yet, but I am better than I was just last week, which is thoroughly encouraging :)

Pontus made me realize that knowing sed is not enough, for some things you really need awk. Another thing to push to the toLearn stack…

I’ve been doing some more Perl hackery, but nothing worth showing, but I did however come across a site which I believe to be rather good and helpful regarding learning basic things about Perl.

Something which passed me by completely was that this Monday saw the release of YaCy 1.0 (in German), but as you can see on Alessandro’s blog I might have been just about the only one who didn’t get that particular news item. Congratulations on making version 1.0 guys!

I was also toying with the idea the other day of making quarterly summaries as well. One blog post a week is great as it forces me to write, thus improving my writing, but it doesn’t really do anything for discerning trends, or changes in the way I work. This could be interesting :)

Finally, I should really start planning for writing my yearly “technology stack” post by diffing what I used back then and what I’m using now.

I am already certain that I’ve disappointed myself in some aspects, and surprised myself in others…

:wq

Awk hackery

Sunday, May 30th, 2010

I’ve always leaned more towards sed than awk, since I’ve always gotten the impression that they have more or less the same capabilities, just with different syntaxes.

But the more command line parsing I do, the more I’ve begun to realize that there are certain things I find easier to do in awk, while some things are easier with sed.

One of these things, that I find awk a better tool for, is getting specific columns of data from structured files (most often, but not limited to, logs).

I have for some time known about

cat somefile | awk '{ print $1 }'

Which will output the first column of every line from somefile. A couple of weeks ago, I needed to fetch two columns from a file (I can’t remember now what the file or task was, I’ll substitute with a poor example instead)

ls -l | awk '{ print $1, $8 }'

This will give you the permissions and names of all directories and files in pwd. One could of course switch places of $1 and $8 (i.e. print $8, $1) to get names first and then the permissions.

Recently I found myself needing to find all the commands executed from a crontab (part of a script, to create another script which was to verify that a migration had gone right, by allowing me to execute those commands whenever I wanted, and not just whenever the crontab specified)

Luckily for me, and this blogpost ;) , none of those commands were executed with parameters, and since I am too lazy to actually count how many fields there are in a crontab file, I got to use:

crontab -l | grep '^*0-9' | awk '{ print $(NF) }'

Which lists the content of the present users crontab, finds all lines which either begin with a number or an asterisk, and then prints the last column of that line. Magic!