Posts Tagged ‘tail’

AWK hackery++

Thursday, September 13th, 2012

While I cannot take credit for the solution which follows (the real star is Pontus), what I can do is document it for posterity:

The scenario is as follows: You have a couple of log files, foo.X.log for X in [1,2,..,N-1,N] and you want to watch all of these for a certain pattern, P.

All of these matching results you want written to a condensed log, in the format <originatingLogFile>: <lineThatMatchedPattern>

You have figured out that tail -f foo.*.log will yield you all the lines, separating output from the various files using ==> foo.X.log <== markers.

You might have tried grep to get the correct lines, and perhaps even got it to write to the file using --line-buffered, but you are not out of the woods: that pesky filename at the beginning of the line is still nowhere to be seen.
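For reference, that dead end looks something like this (a sketch, with the pattern hardcoded for now):

$ tail -f foo.*.log | grep --line-buffered "PATTERN" > condensedLog.log

The matching lines do arrive in condensedLog.log as they happen, but the ==> foo.X.log <== markers do not match the pattern and get filtered out, taking the filename information with them.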

grep? where we’re going, we don’t need grep

Every time tail outputs a line from a different file than the one last indicated by ==> ... <==, it prints that marker again on the line before the output.

E.g.:

$ tail -f foo.*.log
==> foo.1.log <==

==> foo.2.log <==

==> foo.3.log <==

Now, from another terminal, while the tail is still following all three of those logs, let us insert some content:

$ echo "PATTERN, some more stuff" >> foo.2.log

Which results in the first terminal now displaying:

$ tail -f foo.*.log
==> foo.1.log <==

==> foo.2.log <== 

==> foo.3.log <==

==> foo.2.log <==
PATTERN, some more stuff

Ok, so whenever new content is pushed onto any one of the tailed files, we will get notified of it, along with what file it was pushed to (note though, that if we were to push more content into the same file as before (foo.2.log), it would NOT generate another ==> foo.2.log <== line; it would simply append to the ongoing tail):

$ echo "PATTERN, even more stuff" >> foo.2.log

$ tail -f foo.*.log
==> foo.1.log <==

==> foo.2.log <==

==> foo.3.log <==

==> foo.2.log <==
PATTERN, some more stuff
PATTERN, even more stuff

This doesn’t matter much, since our solution will adapt to it, but it is a gotcha worth noting (although it should be instantly obvious to anyone reproducing this on their own anyway).

Anyway, with that out of the way, let’s get down to business: AWK hackery!

First order of business: we have a pattern we wish to match. This may or may not change at a later date, and sure, we could practice YAGNI (You Ain’t Gonna Need It), but let us prepare for it anyway by assigning the pattern to be matched in the invocation of AWK (this assumes that we have stored the pattern to be matched in the variable “$_P”):

awk -v pattern="$_P"

Since AWK operates on lines (a simplification, since one could redefine what constitutes a line) and won’t remember what was in the line it processed in the previous iteration (the previous line), we need to store the latest file(name) into which a line was added.

To achieve this we need to detect whenever new content has been added to a new file (i.e. whenever a new line consisting of ==> foo.*.log <== appears).

BEGIN { filename = "" }
/==> .* <==/ { filename = $2; next }

Since BEGIN triggers before AWK starts any processing, it is the perfect place to define our placeholder variable filename, in which we are going to temporarily store the name of the file into which data is presently being entered.

The next line matches any line containing ==> followed by anything really, followed by <==. Upon encountering one of those lines, we store the stuff between the arrows in the filename variable.

$0 signifies the entire line, and every whitespace-separated word inside that line is assigned a numbered variable, $1 .. $x, for x the number of whitespace-separated words in the line. (Again, we could reconfigure AWK to split on something other than whitespace, but in this case there would be no sense in doing that.)
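To make that concrete, here is a quick sketch of how AWK splits one of those marker lines into fields:

$ echo "==> foo.2.log <==" | awk '{ print $1; print $2; print $3 }'
==>
foo.2.log
<==

Which is exactly why $2 holds the filename we are after.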

And just to ensure that we won’t ever match the ==> .* <== line with whatever we put in $_P we tell AWK that once we have stored the filename, we should move on to the next line.

Next on the agenda is to match the pattern:

$0 ~ pattern { printf "%s: %s\n", filename, $0 }

Essentially this tells AWK to look inside $0 (the whole, current, line) for the stuff we have stored in the variable pattern (i.e. $_P) and, if there is a match, print it out: first the value stored in the filename variable, then a colon and a space, followed by the entire line that matched ($0), and finally a newline.

Rinse, repeat, until we abort the command with Ctrl+C.

Now, this is not the end of the story though. We wanted to write this to a file. I originally claimed that AWK does not write to file until it has processed all of the input, which would never happen since tail is following, but pesa points out that I got this wrong: the real culprit is buffering. When its output is redirected to a file, AWK buffers what it prints and only writes it out in blocks, so matches can sit in the buffer for quite a while, and whatever is still buffered when we kill the pipeline is lost.

Doh!

What we need is a way to tell AWK to flush its output to the file directly after it has found a match and printed it. fflush() is the function that makes this happen for us:

$0 ~ pattern { printf "%s: %s\n", filename, $0; fflush() }

Now all we have to do is wrap the whole thing up, redirect to the filename of the condensed log we want to establish:

tail -f foo.*.log | awk -v pattern="$_P" '
    BEGIN { filename = "" }
    /==> .* <==/ { filename = $2; next }
    $0 ~ pattern { printf "%s: %s\n", filename, $0; fflush() }
' > condensedLog.log
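With the test files from earlier, and assuming $_P is set to something matching PATTERN, the condensed log ends up looking like this:

$ cat condensedLog.log
foo.2.log: PATTERN, some more stuff
foo.2.log: PATTERN, even more stuff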

There is one big gotcha here though:

If you are starting a tail on log files which are already populated, tail will first output everything from the first log (which file gets read first, I suppose, depends on some system settings; locales come to mind) and then move on to the next file, etc. So regardless of the chronological appearance of matching lines in the different files, the first file’s matching lines will always be presented first.

All matching lines entered AFTER this one-liner has started running, will appear in the correct chronological order.

This simply means that you will have to be prepared for it, and possibly do some post-processing on the condensed log file as well (hopefully the logs you are tailing all have timestamps in them, so that sort has something easy to operate on once the lines have been entered into the condensed log).
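As a sketch of such post-processing (assuming, hypothetically, that each tailed log line starts with a sortable timestamp, which after our prefixing makes it the second field of every condensed line):

$ sort -k2 condensedLog.log > condensedLog.sorted.log

With ISO-style timestamps (e.g. 2012-09-13 06:30:00) a plain lexical sort on that field is enough to restore chronological order.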

:wq

Strange things you find out about your system half past six on a Thursday morning

Thursday, May 28th, 2009

Woke up somewhere around 0500 hours, heartburn… couldn’t go back to sleep so landed in front of the computer. Read an article (in Swedish) at idg.se about EU and the Telecoms-package nonsense. Apparently cookies are still unsafe… uh-huh.

There was a comment to that article about Local Shared Objects which caught my eye, and after having examined my ~/.macromedia directory I could conclude that Flash stores its “cookies” there. To my surprise they took up quite some space, so I removed the domain directories which lay inside the “random-id” directory.

For some reason, while Googling in order to ascertain whether it would be safe to remove the directories (I found nothing indicating it would be safe, nor that it wouldn’t be), I found a post about an Ubuntu user who needed help cleaning up his “filled-to-the-brim” partition and was asking what he could remove.

Some responses told him to set his eye on /var/log among other places, and realizing that it was quite some time since I did that myself, I too headed for /var/log

And I started chopping away at the gzipped archive files there (to be honest, it was one fell “sudo rm *.gz” swoop, but who is counting?)

du -sh . indicated there was still some 309 Mb of “stuff” in /var/log (down from 312 Mb or something) so I was not impressed. What was taking up all that space?

Digging a little further I finally zeroed in on the guilty party: /var/log/acpid, occupying 297 Mb of my harddrive. Running tail on that file a couple of times made me realize that it made entries into that log more than once every second…

So, just to ensure that this wasn’t all some spur-of-activity logging caused by stupid me poking around the system, I told grep to find all lines containing the string “May 27” and counted the lines of that output: grep 'May 27' acpid | wc -l, which returned around 1.2 million hits. (In retrospect that would match previous years’ May 27 as well, which means I could have been grepping lines as far back as May 2007. This is a Feisty box, although I am pretty sure that it took me a while after Feisty was released to give up Edgy; all in all, I don’t think I had Feisty installed by May 27th 2007, so: two years’ worth of logs.)

I assume an equal distribution of entries per year, so 600,000 entries made yesterday. 600000 (log entries) / 86400 (seconds in a day) is almost 7 writes a second!
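A quick shell sanity check of that arithmetic (bc is just one way to get at the decimals):

$ echo "600000 / 86400" | bc -l
6.94444444444444444444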

This was clearly not acceptable. I hit Google again, what would be the best way to kill all acpi logging? The launchpad bug report I found indicates that the bug is closed, having been fixed, which is good, once I upgrade when my harddrive goes to… whatever place harddrives go when they have served their time, this will not come back to haunt me.

But Feisty isn’t being bug fixed anymore, so how would I do it?

By adding the arguments “-l /dev/null” to whatever script starts the acpi daemon (acpid), i.e. /etc/init.d/acpid

Again, solutions offered in the forums seemed to target a different version (probably older) than Feisty, as I could not find a line containing $ACPID_BIN = /sbin/acpid

I did however find out that my version used start-stop-daemon to umm… start the daemon, which takes the flag --exec [arg] followed by a -- separator and [args] (arg being the path to the daemon to start, and args being the arguments to pass on to the daemon).

Very nice!

start-stop-daemon --start --quiet --exec /usr/sbin/acpid -- -c /etc/acpi/events $OPTIONS

becomes

start-stop-daemon --start --quiet --exec /usr/sbin/acpid -- -c /etc/acpi/events $OPTIONS -l /dev/null

I stopped and restarted acpid (since the restart sequence looked a little different and I didn’t want to muck with that; I know my own illiteracy and incompetence ;)), killed off the acpid log, and my /var/log is now down to 12 Mb in size all in all.
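For the record, the whole dance boiled down to something like this (a sketch; paths as found on my Feisty box, the stop/start commands may differ on other setups):

$ sudo /etc/init.d/acpid stop
$ sudo rm /var/log/acpid
$ sudo /etc/init.d/acpid start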

Reading further in the bug report, it would seem that this little acpid “I’m gonna log the shit out of you” behaviour is, to some extent, connected to the laptop harddrive-killing bug. Thankfully my harddrive seems to have survived that bug quite well (probably due to my early hacking of /etc/hdparm.conf as per this page).

Easy identification of USB storage disks

Sunday, March 1st, 2009

I have during the last couple of weeks begun to ponder the hard drives in my computers, specifically how much longer they are going to live. Today I rectified the problem by purchasing a 500Gb USB disk, formatting it to ext3, and beginning the process of backing up all the stuff I wanted to survive in the event of a hard drive failure. I figured the fastest way to get the data onto the disk was to mount it in each separate system (since the computers are located in the direct vicinity of each other).

One of them runs as a server: no X, no desktop environment, no automounting. Automounting is a nice feature, and having grown accustomed to having it around, I had to look up how to identify the disk in /dev/.

Enter this post, which neatly solved my problem. I modified it a bit, however, issuing:

sudo tail -f /var/log/messages

before I plugged in/powered up the disk. From there on I could easily identify the disk as /dev/sdb, and I proceeded to mount /dev/sdb1 to a mount point (/media/disk):

sudo mount /dev/sdb1 /media/disk
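One small caveat if you are reproducing this: the mount point has to exist before you can mount onto it, so if /media/disk is not already there, create it first:

$ sudo mkdir -p /media/disk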

Piece of cake, when you know what to mount ;D