Posts Tagged ‘grep’

2011w46

Sunday, November 20th, 2011

First of all: this is really disturbing.


Commands and flags

I think I’ve already mentioned watch, and how that could be useful at times (e.g. $ watch -n 10 -d 'ls -l')

I just found out about a value which can optionally be appended to the -d flag: -d=cumulative

It has its own flag as well --cumulative, and quoting the man-page it makes highlighting “sticky”, presenting a running display of all positions that have ever changed.

Also, this week I learnt about sdiff, which seems neat if you’re on a system which doesn’t have vim (and thus vimdiff) installed.

Another nice flag I just found for grep is -m <int>, which tells grep to stop looking after the first <int> matching lines.
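To make the -m flag a bit more concrete, here is a minimal sketch. The sample input and the helper name first_matches are made up for illustration; only the -m flag itself comes from grep.

```shell
#!/bin/sh
# first_matches N PATTERN: print at most N matching lines from stdin.
# (Hypothetical helper name; -m is the grep flag described above.)
first_matches() {
    grep -m "$1" -- "$2"
}

# Four lines match 'error', but grep stops reading after the second match.
printf 'error 1\nok\nerror 2\nerror 3\nerror 4\n' | first_matches 2 'error'
```

This is mostly handy on large files where only the first few hits matter, since grep does not have to read the rest of the input.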

Scripting Vim

Ok, so I’ve been running into this problem where I am using my own .vimrc configuration in other places, in systems where the vim version isn’t the same as the one I use myself.

This has proven problematic as some of the configuration options I use (most notably set cul, which gives me a better indication of which line the cursor is on) don’t exist in … say a vim version less than 7.

Which meant that if I loaded the same .vimrc config on a system running a vim version earlier than 7, I’d get a warning at startup, which I’d have to press enter to pass by. Irritating.

As luck would have it, it isn’t all that difficult to make a little conditional to check which version is currently loading the config and just ignore the settings which won’t work for that version, such as:

if v:version >= 700
    set cul
endif

Links

Finally, at this year’s FSCONS I was introduced to the site renderfarm.fi, where people can go to either contribute CPU cycles, or get CPU cycles (or both), to help speed up rendering.

:wq

2011w42

Sunday, October 23rd, 2011

Perl

Progress! This week I wrote my first perl script, to parse some data on one of my colleague’s nodes. In doing so I also, inadvertently, made another one of my colleagues express something along the lines of “very nice, now we have another scripting guy on our team.” ;D

grep count occurrences on single line

Say you have a line (or multiple, that you are iterating through one at a time) of data structured in some way representable and matchable by a regular expression, and that you feel an overwhelming need to count the number of occurrences in each line.

Did you ever imagine that grep and a couple of pipes were all you’d ever need to realize this wish?

$ echo "foo foo foo" | grep -o 'foo' | grep -c 'foo'
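For the “multiple lines, one at a time” case, the same idea can be wrapped in a small loop. This is only a sketch: the helper name count_per_line and the sample data are my own, and wc -l stands in for the second grep.

```shell
#!/bin/sh
# count_per_line PATTERN: for each line on stdin, print how many times
# PATTERN occurs on that line. (Hypothetical helper name.)
count_per_line() {
    while IFS= read -r line; do
        # grep -o prints every match on its own line, wc -l counts them;
        # tr strips the padding some wc implementations add.
        printf '%s\n' "$line" | grep -o -- "$1" | wc -l | tr -d ' '
    done
}

printf 'foo foo foo\nbar\nfoo bar foo\n' | count_per_line 'foo'
```

A line without any match simply yields 0, since grep -o prints nothing for it.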

Important Dates Notifier

Saturday, January 29th, 2011

I have never been especially good at remembering dates or appointments. My memory just doesn’t seem constructed to handle things like “on YYYY-MM-DD do X” (peculiarly enough, “on Friday next week, have X done” works much better).

I guess things relative to “now” work better for me (at least in the short term) than some abstract date some time into the future.

Luckily enough, I don’t seem to be the only one suffering from having an “appointment-impaired memory” so others have created calendar applications and what not.

These work great; the one I presently use, remind, is awesome. But sometimes it isn’t “intrusive” enough.

There are some appointments/dates that are important enough that I would like for remind to hunt me down an alley and smack me over the head with the notification, as opposed to presenting it to me when I query it for today’s appointments.

So I put together a little shell script which pushes notifications (the ones I feel are the most important) to me via jabber.

The solution involves cron, remind, grep and a nifty little program called sendxmpp.

This script I have placed on my “laptop-made-server” which coincidentally also happens to be the central mercurial server, through which I synchronize the repositories on my desktop and netbook.

Which means that if I just take care to clone the specific repository containing my remind files to some place local on the server, I could have a cronjob pull and update that repository and it would thus always (as long as I have pushed changes made from the desktop/netbook) have the most up to date files available.

If setting up a repository server seems too big of a hassle, one could of course (at least with remind) have a master ~/.reminders file, which then invokes an INCLUDE expression.

This makes remind look in the specified directory for other files, and in that directory have one file for each computer (along the lines of .rem) and have each individual machine scp (and overwrite) their individual file every now and then.

In any case, once the server has fresh and updated sources of data, all it needs to do is execute the idn.sh script once every day (preferably early; this works pretty well, as the jabber-server I use caches all messages sent to me while I was offline, so once I log in again, I’ll get the notifications then).

As sendxmpp takes the message from STDIN, recipient-addresses as parameters, and parses a configuration file to divine what account should be used to send the message (I set up a “bot” account to send from, and then just authorized that bot in my primary jabber-account), I see no reason why someone couldn’t modify the script to instead (or in addition) use something like msmtp to send out an email instead.

The script itself, in its current form, is rather straightforward, although I’m sure there is still room for optimizations.

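#!/bin/sh
# Tags marking the reminders that should be pushed out via jabber.
TAGS="#Birthday #ATTENTION"

# Word splitting of $TAGS is intentional: one iteration per tag.
for t in $TAGS
do
    # Send every reminder line carrying the tag as its own message.
    rem | grep -i -- "$t" | while IFS= read -r line
    do
        echo "$line" | sendxmpp recipient@jabber.example.org
    done
done
exit 0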

Relevant parts of my crontab:

0   7   *   *   *       cd /home/patrik/remind-repo; /usr/bin/hg pull -u 2>&1
5   7   *   *   *       cd /home/patrik/idn-repo; /usr/bin/hg pull -u 2>&1
10  7   *   *   *       /bin/bash /home/patrik/bin/idn.sh 2>&1

In /home/patrik I have created symlinks /home/patrik/.remind -> /home/patrik/remind-repo/.remind and /home/patrik/.reminders -> /home/patrik/remind-repo/.reminders

And in /home/patrik/bin/ I have a symlink (idn.sh) to /home/patrik/idn-repo/idn.sh. So in case I change the script, like adding a tag to look for or something (ok, that should be moved out to a configuration file, that will be part of the next version), that will be picked up as well, before the notifications go out.
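The configuration-file idea could look something like the following. This is only a sketch under assumptions: the file name ~/.idn-tags, the IDN_TAGFILE variable and the helper name important_lines are all made up for illustration, and the file is assumed to hold one tag per line without blank lines.

```shell
#!/bin/sh
# Hypothetical location of the tag list: one tag per line, e.g. "#Birthday".
TAGFILE="${IDN_TAGFILE:-$HOME/.idn-tags}"

# important_lines FILE: keep only the stdin lines that contain any tag
# listed in FILE (-F: fixed strings, -i: ignore case, -f: patterns from file).
important_lines() {
    grep -i -F -f "$1"
}

# Intended use in idn.sh (commented out, since it needs remind and sendxmpp):
# rem | important_lines "$TAGFILE" | while IFS= read -r line; do
#     printf '%s\n' "$line" | sendxmpp recipient@jabber.example.org
# done
```

A side effect of matching all tags in one grep call is that a reminder carrying two tags is sent once instead of once per tag. Note that a blank line in the tag file would match every reminder.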

And that’s about it. Risk of forgetting something important: mitigated.

:wq

Awk hackery

Sunday, May 30th, 2010

I’ve always leaned more towards sed than awk, since I’ve always gotten the impression that they have more or less the same capabilities, just with different syntaxes.

But the more command line parsing I do, the more I’ve begun to realize that there are certain things I find easier to do in awk, while some things are easier with sed.

One of these things, that I find awk a better tool for, is getting specific columns of data from structured files (most often, but not limited to, logs).

I have for some time known about

cat somefile | awk '{ print $1 }'

Which will output the first column of every line from somefile. A couple of weeks ago, I needed to fetch two columns from a file (I can’t remember now what the file or task was, I’ll substitute with a poor example instead)

ls -l | awk '{ print $1, $8 }'

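This will give you the permissions and names of all directories and files in pwd. (Which column holds the name depends on how your ls prints the date; on many systems it is $9 rather than $8.) One could of course switch places of $1 and $8 (i.e. print $8, $1) to get names first and then the permissions.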
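Since ls output differs between systems, here is the same column picking on canned data instead. A minimal sketch; the sample lines and the helper name swap_columns are invented for the example.

```shell
#!/bin/sh
# swap_columns: print the third and then the first whitespace-separated
# field of every line on stdin. (Hypothetical helper name.)
swap_columns() {
    awk '{ print $3, $1 }'
}

printf 'alice 42 admin\nbob 17 guest\n' | swap_columns

# For data that is not whitespace-separated, -F sets the field separator:
printf 'root:x:0\n' | awk -F: '{ print $3, $1 }'
```

The comma in print puts awk’s output separator (a single space by default) between the fields.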

Recently I found myself needing to find all the commands executed from a crontab (part of a script, to create another script which was to verify that a migration had gone right, by allowing me to execute those commands whenever I wanted, and not just whenever the crontab specified)

Luckily for me, and this blogpost ;) , none of those commands were executed with parameters, and since I am too lazy to actually count how many fields there are in a crontab file, I got to use:

crontab -l | grep '^[*0-9]' | awk '{ print $(NF) }'

Which lists the content of the present user’s crontab, finds all lines which either begin with a number or an asterisk, and then prints the last column of that line. Magic!
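Printing only the last field works because none of the commands took parameters. If they had, one possible variation is to skip the five schedule fields and print the rest. A sketch only; the helper name cron_commands and the sample crontab lines are assumptions, and runs of whitespace inside a command get squeezed into single spaces.

```shell
#!/bin/sh
# cron_commands: read crontab text on stdin and print the full command of
# every job line, i.e. everything after the five schedule fields.
cron_commands() {
    grep '^[*0-9]' | awk '{
        cmd = $6
        for (i = 7; i <= NF; i++) cmd = cmd " " $i
        print cmd
    }'
}

# Sample data; in real use: crontab -l | cron_commands
printf '# comment\n0 7 * * * /usr/bin/hg pull -u\n*/5 * * * * /bin/true\n' | cron_commands
```

Entries using shortcuts such as @reboot do not start with a digit or an asterisk, so this filter ignores them.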

Netbooks, bash-scripting and rmmod

Saturday, September 12th, 2009

I recently bought a netbook (Acer Aspire One A531H) which I promptly installed Ubuntu Netbook Remix on. This has worked very well so far, and except for an early problem with wlan (which was fixed after a couple of minutes worth of searching and reading) the only real problem I have had with this little guy is something I experience with all laptops.

The Problem

The sensitive touch-pad of doom. Perhaps I am doing something wrong, I don’t know, but touch-pads ALWAYS give me trouble (mostly by “conveniently” moving the cursor to another part of the text while I am writing something).

I tried a workaround, using syndaemon with the flag -d to have the touch-pad temporarily disabled while using the keyboard, moving that into a script and configuring that script to run at startup. It is a nice idea, but it re-enables the touch-pad too quickly, so while it cut the number of incidents by more than half, I still wasn’t satisfied.

On my regular laptop, which I always connect an external trackball to, I have permanently disabled the touch-pad (sudo rmmod psmouse at upstart), but permanently disabling it on the netbook wouldn’t work either, since some tasks (like web-surfing, no I haven’t gotten around to learning the vimperator add-on just yet) are quite a lot easier with a mouse than without it.

The Solution

So what I really wanted was a convenient way of quickly enabling and disabling the touch-pad, when I needed to.

Reusing old knowledge about how to manually add shortcuts to metacity, all I had to do was to create a script which ascertained the status of the psmouse module (loaded or not) and upon that, either removed or loaded it.

To get the state of a module Foo, one can use lsmod | grep Foo, which in this case leads to lsmod | grep psmouse. This will either yield nothing (module not loaded) or a line (module loaded).

We can improve on this a bit, making sure we always get some kind of value returned, something like lsmod | grep Foo | wc -l. Since the last command in the chain now counts the number of lines that were returned from grep, we now either get 0 or 1 returned.
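One caveat with a plain grep is that it matches the module name anywhere on the line, including the “Used by” column of other modules, so the count can end up higher than 1. A possible alternative, sketched here with a made-up helper name (module_loaded) and canned sample data, is to compare the first column only and rely on the exit status instead of a count.

```shell
#!/bin/sh
# module_loaded NAME: read lsmod-style output on stdin and succeed only if
# a module called exactly NAME is listed in the first column.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Real use would be: if lsmod | module_loaded psmouse; then ... fi
# Here a canned sample stands in for lsmod output.
sample='Module                  Size  Used by
psmouse               100000  0
mousedev               20000  0'

if printf '%s\n' "$sample" | module_loaded psmouse; then
    echo "psmouse is loaded"
else
    echo "psmouse is not loaded"
fi
```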

So there I was, thinking I am done, having entered the gconf-editor, pointed the script to command_2 (apps > metacity > keybinding_commands) and assigned a key-binding (<Control><Alt>t) to run_command_2 (apps > metacity > global_keybindings). Life was playing, all was well. Except for the fact that hitting that key combination did absolutely nothing to shut down the touch-pad.

Which was odd, since running the script worked. The individual commands to disable and enable the touch-pad (sudo rmmod psmouse and sudo modprobe psmouse, respectively) worked flawlessly… so why didn’t this work?

Then it hit me. Running either of those commands from the command-line, would result in it prompting me for my password, something a poor script without any ability to accept input from stdin can’t do. It couldn’t even tell me about it since there was no stdout for it to use either.

gksudo to the rescue. Since gksudo pushes up a graphical password prompt, the script could once more ask me for a password, and I could again supply it. And now it works nicely :D

In closing, the script:

#!/bin/sh

if [ $(lsmod | grep psmouse | wc -l) -eq 0 ]
then
    gksudo modprobe psmouse
else
    gksudo rmmod psmouse
fi
exit 0

Strange things you find out about your system half past six on a Thursday morning

Thursday, May 28th, 2009

Woke up somewhere around 0500 hours, heartburn… couldn’t go back to sleep so landed in front of the computer. Read an article (in Swedish) at idg.se about EU and the Telecoms-package nonsense. Apparently cookies are still unsafe… uh-huh.

There was a comment to that article about Local_Shared_Objects which caught my eye, and after having examined my ~/.macromedia-directory I could conclude that Flash stores its “cookies” there. To my surprise they took up quite some space, so I removed those domain-directories which lay inside the “random-id” directory.

For some reason, while Googling in order to ascertain whether it would be safe to remove the directories (I found nothing that indicated it would be safe, nor that it wouldn’t be safe), I found a post about an Ubuntu user who needed help cleaning up his “filled-to-the-brim” partition, and asking what he could remove.

Some responses told him to set his eye on /var/log among other places, and realizing that it was quite some time since I did that myself, I too headed for /var/log

And I started chopping away at the gzipped archive files there (to be honest, it was one fell “sudo rm *.gz” swoop, but who is counting?)

du -sh . indicated there was still some 309 Mb of “stuff” in /var/log (down from 312 Mb or something) so I was not impressed. What was taking up all that space?

Digging a little further I finally zeroed in on the guilty party. /var/log/acpid occupying 297 Mb of my harddrive. Running tail on that file a couple of times made me realize that it made entries into that log more than once every second…

So, just to make sure that this wasn’t merely a spur of logging activity caused by me poking around the system, I told grep to find all lines containing the string "May 27" and counted the lines of that output with grep 'May 27' ./acpid | wc -l, which returned around 1.2 million hits. (In retrospect that string would match previous years’ May 27 as well, so I could have been grepping lines as far back as May 2007. This is a Feisty box, although I am pretty sure it took me a while after Feisty was released to give up Edgy. All in all, I don’t think I had Feisty installed by May 27th 2007, so that makes two years’ worth of logs.)

I assume an equal distribution of entries per year, so 600,000 entries made yesterday. 600,000 (log entries) / 86,400 (seconds in a day) is almost 7 writes a second!
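The back-of-the-envelope division can be checked from the shell as well. A small sketch; the helper name writes_per_second is made up, and the input is the estimated half of the roughly 1.2 million matches.

```shell
#!/bin/sh
# writes_per_second COUNT: average number of log entries per second if
# COUNT entries were written over one full day (86400 seconds).
writes_per_second() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.2f\n", n / 86400 }'
}

writes_per_second 600000   # prints 6.94
```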

This was clearly not acceptable. I hit Google again, what would be the best way to kill all acpi logging? The launchpad bug report I found indicates that the bug is closed, having been fixed, which is good, once I upgrade when my harddrive goes to… whatever place harddrives go when they have served their time, this will not come back to haunt me.

But Feisty isn’t being bug fixed anymore, so how would I do it?

By adding the arguments “-l /dev/null” to whatever script starts the acpi daemon (acpid). I.e. /etc/init.d/acpid

Again, solutions offered in the forums seemed to target a different version (probably older) than Feisty, as I could not find a line containing $ACPID_BIN = /sbin/acpid

I did however find out that my version used start-stop-daemon to umm… start the daemon. It takes the flag --exec [arg], and everything after a bare -- is handed over as [args] (arg being the path to the daemon to start, and args being the arguments to pass to the daemon)

Very nice!

start-stop-daemon --start --quiet --exec /usr/sbin/acpid -- -c /etc/acpi/events $OPTIONS

becomes

start-stop-daemon --start --quiet --exec /usr/sbin/acpid -- -c /etc/acpi/events $OPTIONS -l /dev/null

I stopped and restarted the acpid (since the restart sequence looked a little different and I didn’t want to muck with that, I know my own illiteracy and incompetence ;) ), killed off the acpid log, and my /var/log is now down to 12 Mb in size all in all.

Reading further in the bug report it would seem that this little acpid “I’m gonna log the shit out of you” behaviour is, to some extent, connected to the laptop harddrive-killing bug. Thankfully my harddrive seems to have survived that bug quite well (probably due to my early hacking of /etc/hdparm.conf as per this page).

LaTeX, ligatures and grep

Thursday, April 16th, 2009

Having finally finished a long overdue paper, I thought I’d share a little knowledge, well, semi-knowledge/-ugly hack actually, that I have found useful while working on this paper.

I like justified text, I think it makes the content look sharp. LaTeX seems to agree with me on that point, at least in the style I used (report). Justified text in LaTeX has one drawback however. Sometimes the spacing between certain letters becomes too small, resulting in what I surmise typographers call “broken ligatures”. The term “ligature” seems to simply refer to a specific part of a letter. A broken ligature, then, would happen when the ligature in a preceding letter “floats into” the next one.

Justified text is sharp, justified text with broken ligatures… not so much. And LaTeX doesn’t seem to be fully able to handle this on its own, so manual intervention seems necessary. (It could of course just be that the version I use (texlive) is silly, but I recall having similar problems back in Uni while I used tetex)

In any case, ugly-hacking tiem!

SEARCH

First priority: find all occurrences of potential broken ligatures.

One could visually (using the ole trusty eyeball mk.1) scan the generated document for imperfections. That takes a lot of time and there is a large risk that some occurrences “slip through”. Also, in some places the ligatures won’t be broken, because the text has a good fit on the row at present time. But then someone adds a word, a sentence, or just fixes a grammatical bug, whatever, and the fit is not so good anymore.

Of course, it is wholly unnecessary to run this procedure until the document is “frozen” and won’t accept any more additions in terms of text. I ran it three times, one time before each “beta”/“release candidate” which I sent to some friends for critique/proof-reading/sanity checking, and then once more after having incorporated the input from my friends.

To identify potential trouble, grep is called in to find every instance of the character combinations which can break. In my experience, these combinations are “ff”, “fi” and “fl”.

$ grep -rn 'f[fil]' chapters/*.tex

Only lower-case letters seem to cause trouble, but that is an assumption I make. I could well see problems stemming from having an initial lower-case f, followed by an upper-case letter. I have never encountered this, so I don’t search for it, but as usual, ymmv.

Now I have a nifty little list with all occurrences of the letter sequences “ff”, “fi” and “fl”, nice! Now what?

DESTROY

The solution should, preferably, be applied to nearly all instances of these sequences, so that a present “good fit” line, if modified, would just automagically work later on as well. This means that the solution should not screw up the formatting of the “good fit” cases, while kicking into action iff the good fit turns bad.

The solution I use is “\hbox{}”. This is inserted between the characters (f\hbox{}f, f\hbox{}i, f\hbox{}l) What makes this ugly is of course that your LaTeX code is now littered with this… well umm… shit. This method will of course give your spell checker a nervous breakdown.

Now you are probably thinking that this is a non-issue, just create a small shell-script to use sed, and produce new files with the modified content, copy these files into a build directory and have the make script invoke that shell-script before invoking the build command.

There is a potential pitfall in that solution. My paper linked to a couple of websites, as in clickable hyperlinks inside the pdf. Imagine the fun that would be derived when sed would hit upon \url{http://www.openoffice.org/} and transform that into \url{http://www.openof\hbox{}f\hbox{}ice.org/}.

Making sed aware of the \url{} tag, and verbatim quotes (probably all of the quoting systems), and making it leave the content inside well enough alone is probably doable, but having my favorite text-editor do an interactive search/replace was the method I opted for.
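For the curious, a crude version of the sed route might look like this. It is a sketch under a big simplifying assumption: any line containing \url{ is skipped entirely, so ligatures in ordinary text sharing a line with a link are missed, and verbatim environments are not handled at all. The helper name protect_ligatures is invented.

```shell
#!/bin/sh
# protect_ligatures: insert \hbox{} between "ff", "fi" and "fl" on stdin,
# leaving every line that contains \url{ completely untouched.
protect_ligatures() {
    sed -e '/\\url{/b' \
        -e ':again' \
        -e 's/f\([fil]\)/f\\hbox{}\1/' \
        -e 't again'
}

printf '%s\n' 'An office file' '\url{http://www.openoffice.org/}' | protect_ligatures
```

The loop (substitute, then branch back while something changed) is there so that overlapping pairs such as the “ffi” in “office” get both boxes. Running it a second time changes nothing, since f\hbox{}f no longer contains an adjacent pair.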

Grep

Monday, August 25th, 2008

Grep is one of those tools that every GNU/Linux user should have at least a rudimentary understanding of. You will get by without it of course, but it can speed up things quite a bit.

Just today a friend and former classmate had a problem: In a large C++ code base, find the one file printing a specific error message. Opening every file and manually checking them: not feasible and surely not cost-effective.

He asked me for any insight in searching, and from the ole’ toolbox I brought grep. Now I will readily admit, I am no superuser, or guru or anything of the sort. My grep skills are not what they probably should be, so my first attempts were rather unsuccessful.

Framing the problem even more, the .cpp files were spread over a number of directories, and in the project root directory there were no files, only directories.

Since I mostly program in Python those were the files I had available to test my grep commands on:

$ grep 'import' *.py

I was greeted with an error, *.py no such file or directory. But the syntax was right, right? Went into a sub directory containing python files, ran the same command again, and was rewarded with a list of files.

Ok, so the problem wasn’t the syntax, it was targeting. What about

$ grep -R 'import' *.py

Again with the error message… ok, quick check in the man-page, yes, -R, -r or --recursive all work, great, next try:

$ grep -r 'import' ./*.py

That error message is getting tedious… what about

$ grep -r 'import' ./

Now we are rolling, but it is chewing on things I have no interest in listing… like Vim’s .swp files etc. How do we fix that? Enter the man-page again, aha: --include

$ grep -r --include '*.py' 'import' ./

Very nice, recursive search throughout all sub directories for files ending with .py containing the string ‘import’. Now to help him out a little more, let’s add -n also, so that he will see on what line the error message is printed.

$ grep -r -n --include '*.py' 'import' ./
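The earlier attempts failed because the shell, not grep, expands an unquoted *.py, and only against the current directory; with no match there, grep is handed the literal name *.py. The sketch below reproduces the situation in a throwaway directory. The helper name search_sources and the file names are made up for the demonstration.

```shell
#!/bin/sh
# search_sources PATTERN GLOB DIR: recursively search DIR for PATTERN,
# looking only at files whose names match GLOB, and print file name and
# line number for every hit. (Hypothetical helper name.)
search_sources() {
    grep -r -n --include="$2" -- "$1" "$3"
}

# Throwaway tree mimicking the situation above: nothing matching in the
# top directory, a source file and an editor swap file further down.
demo=$(mktemp -d)
mkdir -p "$demo/pkg/sub"
printf 'import os\n' > "$demo/pkg/sub/tool.py"
printf 'import junk\n' > "$demo/pkg/sub/.tool.py.swp"

search_sources 'import' '*.py' "$demo"
rm -rf "$demo"
```

Quoting the glob matters here: it has to reach grep untouched so that grep can apply it to every file name it finds while descending.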

And there you have it. Just one of the various uses of grep.