Posts Tagged ‘sed’

2011w48

Sunday, December 4th, 2011

Where the frakk did this week go?!?!

Work has been progressing; I can’t say that I am good at it yet, but I am better than I was just last week, which is thoroughly encouraging :)

Pontus made me realize that knowing sed is not enough; for some things you really need awk. Another thing to push onto the toLearn stack…

I’ve been doing some more Perl hackery, nothing worth showing, but I did come across a site which I believe to be rather good and helpful for learning the basics of Perl.

Something which passed me by completely was that this Monday saw the release of YaCy 1.0 (in German), but as you can see on Alessandro’s blog I might have been just about the only one who didn’t get that particular news item. Congratulations on making version 1.0 guys!

The other day I was also toying with the idea of making quarterly summaries. One blog post a week is great, as it forces me to write, thus improving my writing, but it doesn’t really do anything for discerning trends, or changes in the way I work. This could be interesting :)

Finally, I should really start planning for writing my yearly “technology stack” post by diffing what I used back then and what I’m using now.

I am already certain that I’ve disappointed myself in some aspects, and surprised myself in others…

:wq

2011w25

Sunday, June 26th, 2011

Last week was rather eventful, the largest thing being the one thing I naturally forgot to write about (go figure…), my appointment as deputy coordinator of FSFE Sweden. This is nice :)

That has however meant that this week hasn’t seemed as eventful, and for some reason I got off to a really slow start of the week; the only worthwhile things to write about started happening this Thursday.

nginx and password-protected directories

My father asked me for help getting a bunch of files in his possession over to some friends of his (who are, to the best of my knowledge, as computer illiterate as he is).

This meant that my first idea, to just set up an FTP-account on my server and have them log into that and download the files, wouldn’t work. I would need something simpler, but still with restricted access.

Preferably they’d just surf to some place, enter a password, and download a zip archive (since all Windows versions since XP handle zip archives like compressed folders, this should fall within the realm of what a computer user should be able to handle).

Something like Apache’s htpasswd stuff. And I wanted to do it with nginx, because I really want to get better at using and working with it.

The first task, obviously, was to check if nginx had that capability at all (it has), and if so, how it works.

This post showed me that it was possible, and how to do it.

A note here though: I first tried to set a password containing Swedish characters (åäö) and this didn’t work at all.
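For reference, the nginx side of it boils down to two directives. A minimal sketch, where the location, the realm text and the paths are my own placeholders rather than what I actually used:

location /files/ {
    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
}

The password file itself can be generated with htpasswd from the Apache tools or, if you’d rather not install Apache bits, with plain openssl (it will prompt for the password):

$ printf 'friend:%s\n' "$(openssl passwd -apr1)" > /etc/nginx/.htpasswd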

ticket

I have been wrestling with the question of how I would manage to create a database which individual users can read from and write to, but which they shouldn’t be able to remove from the filesystem (I know, a DROP or DELETE command can be just as devastating, so I must continue thinking about this).

alamar at StackOverflow solved this for me. The solution is to make the file readable and writable, but leave the parent directory non-writable.

This however makes it impossible to add new files to the directory. But since I am working with the idea that there should be a “ticket” user with a corresponding “ticket” group, and that every individual who should have access to the tracker will be in that ticket-group, the directory could disallow writing for group and other, leaving the ticket-user free to create more databases…
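A rough sketch of that layout (the user, group and paths are made up by me):

$ chown ticket:ticket /srv/tickets
$ chmod 755 /srv/tickets              # only the ticket user may create or remove files here
$ chown ticket:ticket /srv/tickets/project.db
$ chmod 664 /srv/tickets/project.db   # group members may read and write the database itself

Since removing or renaming a file is a write operation on the directory rather than on the file, anyone in the ticket group can update project.db but not delete it.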

Although I now realize that this would make it easy for anyone in the ticket-group to screw around with any ticket database (insert, update, delete).

This clearly needs more design thought put behind it.

ArchLinux and MySQL client binaries

I needed to interact with a MySQL database on another server, but MySQL (the server) wasn’t installed on my desktop, and I didn’t really want to install the entire server just to get hold of the mysql client binary.

Turns out that in ArchLinux the mysql binaries are split into a client package and a server package, which is perfect for when you wish to interact with MySQL databases without having the entire frakking server installed on your machine.
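In practice something like this should suffice (the package name is as I remember it at the time of writing and may well have changed since; the host and user are obviously placeholders):

$ pacman -S mysql-clients
$ mysql -h db.example.com -u someuser -p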

Accessibility, HTML and myConf

Since FSCONS is striving to be accessible, and the little “myConf” technology demonstrator I wrote the other week was intended for FSCONS, I have been trying to figure out how to make it as accessible as it can be (first of all, I have no idea whatsoever if a screen reader even parses javascript, and as the myConf demonstrator is mostly implemented in jQuery, that might prove a showstopper).

But given the assumption that a screen reader can parse javascript, and will output that big ol’ table which is created, how do I make an html table accessible? Since a screen reader makes use of the html code, and even a sighted person could get tripped up trying to parse the markup of a table, this looks like a worthwhile venture.

Sadly, like all documents from w3.org, they just leave me more confused than when I began, without answering any of my questions. Luckily, there seem to be other resources, more knowledgeable and with more understandable wording and examples, although I haven’t had the time to read through them all yet (I’m mostly just dumping them here so that I’ll be able to find the pages again once I have the time to look into them).

Ed Weissman has taken his most insightful comments from Hacker News and compiled them into a book, which he has graciously made available for free.

Now, I have to admit, until this week, I’d never heard of Ed, and I have rarely read stuff on Hacker News, but from what I’ve read so far of his book, I might have to change this.

Optimizing Vim usage

A fellow… hmmm, a fellow Fellow made me realize just how long it is going to take me to fully grok Vim. I have been using ggVGd to:

  1. Go to the first line (“gg”)
  2. Enter visual mode (spanning entire lines) (“V”)
  3. Go to the last line (“G”)
  4. And finally delete the selection (the entire contents of the file) (“d”)

Or one could just do :%d as the fellow showed me… And I have been using the pattern :%s/foo/bar/ for quite some time, understanding perfectly that “%” in this context means “for every line do…”

I just never made the connection that it could be applied to something simpler than a sed substitution.
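A couple more examples of % as a range, all standard ex commands (the annotations are my own):

:%d   deletes every line (what ggVGd did)
:%y   yanks the entire file
:%j   joins all lines into one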

Links

Lack of (American) geeks is a national security risk according to the DoD. Funny; anyone else think that if they just stopped prosecuting every kid who plays around with security systems, downloads music, or builds (more or less dangerous) stuff from schematics found online, this problem might just go away on its own?

Whitespaces in filenames and how to get rid of them

Sunday, September 26th, 2010

Although it has been more than four years since I switched from Windows to GNU/Linux, I still manage to stumble upon files, either brought back from backups or downloaded from the net, that contain spaces and need to be handled.

Since I got the hang of shell scripting I have stopped dreading having to rename these files manually (which was my previous m.o. for that scenario).

Imagine a file named “My super cool science report.pdf”. Now, for a single file it might be ok to just manually rename the sucker, either via your file manager of choice or through a quick (tab-completion supported) mv. Fair enough, but what if you have ten files?

This task, when being converted into a shell script, can first be broken into smaller tasks.

Step 1 is that we need some way of listing the files we wish to operate on. If they are all stored in a directory separate from other files, and there are no sub-directories in that directory, one can simply use ls -1 (i.e. ell ess dash one).

Otherwise, find is a VERY useful tool.

$ find /path/to/document/directory -maxdepth 1 -type f -name '* *'

This simply says “in the specified path, look only in the current directory (i.e. don’t traverse downwards) for files with a name matching whatever, followed by a space, followed by whatever”.

Now that we have a list of files to work with, comes step 2: iterating over the files.

This is what has tripped me up in the past. I’ve always tried constructs along the lines of for filename in `expression`, where expression is some lame attempt to list the files I want to work with. The catch is that the for loop splits its input on whitespace, so a filename containing spaces gets chopped into several pieces. I could probably have gotten it to work, but it requires more patience than I was willing to muster ;)

Besides, while read filename; do command(s); done works immediately, since read consumes a whole line at a time.

To transfer the list of files from find / ls we simply pipe it to the while loop:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do ...; done

Had this been put in a script, instead of written on the command line, we would now have something looking a lot like this:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    ...
done

Step 3 then, is obviously about actually transforming the filename.

For simple substitutions like this, tr is a great tool, e.g.

$ echo "This is a test" | tr ' ' '_'
This_is_a_test

This simply takes stuff from stdin, replaces all spaces with underscores, and pushes it to stdout.

tr also has great functionality for altogether removing specified characters from the given string, e.g.

$ echo 'What?!' | tr -d '!'
What?

Finally, tr is a pretty cool guy, converts between cases and doesn’t afraid of anything:

$ echo "Soon this will be shouted" | tr 'a-z' 'A-Z'
SOON THIS WILL BE SHOUTED

Ok, enough about tr, but it is pretty cool, and quite enough for this task. So now we know how to list the files, iterate over them, and transform each filename from the original one to a new, better one. Now what?

Now we need to save the transformed name into a temporary variable (since mv requires both a source path and a destination path) which is done with:

newfilename=$(echo "$filename" | tr ' ' '_')

One could also use backticks:

newfilename=`echo "$filename" | tr ' ' '_'`

But I am always wary of using those, as they tend to look a little bit too much like single quotes.

Now, since we are not stupid, we will of course test this script before unleashing it on our poor unsuspecting files. This is step 4, and it is the most important step!

So in our loop we do:

echo mv "$filename" "$newfilename"

Notice the echo. It is there for a reason. This script, when run, will only produce a lot of text printed to stdout. This is when the scripter would do well to pay attention: do the resulting lines, like “mv My fancy report 1.pdf My_fancy_report_1.pdf”, look correct?

If it doesn’t, go back and tweak the line setting the newfilename variable until it looks correct.

Test script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    echo mv "$filename" "$newfilename"
done

or

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do newfilename=$(echo "$filename" | tr ' ' '_'); echo mv "$filename" "$newfilename"; done

Otherwise, proceed to step 5: removal of echo.

Yeah, that’s really all. That little echo in front of mv "$filename" "$newfilename"… remove that, and the script will be unleashed on the listed files.

And the final script:

#!/bin/bash
find ./ -maxdepth 1 -type f -name '* *' | while read filename;
do
    newfilename=$(echo "$filename" | tr ' ' '_')
    mv "$filename" "$newfilename"
done

or, for the one-liner type of guy:

$ find ./ -maxdepth 1 -type f -name '* *' | while read filename; do newfilename=$(echo "$filename" | tr ' ' '_'); mv "$filename" "$newfilename"; done

Finally, if you want moar power you could either pipe several tr invocations after one another, or pipe on to other tools, like sed…

Your imagination, understanding of pipes, and knowledge of regular expressions are the limit ;)
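For example, a sketch (the filename is made up) that squeezes runs of spaces into single underscores with tr -s, then lets sed eat the parentheses and commas:

$ echo "My  report (v2, final).pdf" | tr -s ' ' '_' | sed 's/[(),]//g'
My_report_v2_final.pdf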

Awk hackery

Sunday, May 30th, 2010

I’ve always leaned more towards sed than awk, since I’ve always gotten the impression that they have more or less the same capabilities, just with different syntaxes.

But the more command line parsing I do, the more I’ve begun to realize that there are certain things I find easier to do in awk, while some things are easier with sed.

One of the things I find awk a better tool for is extracting specific columns of data from structured files (most often, but not limited to, logs).

I have for some time known about

cat somefile | awk '{ print $1 }'

Which will output the first column of every line from somefile. A couple of weeks ago, I needed to fetch two columns from a file (I can’t remember now what the file or task was, so I’ll substitute a poor example instead):

ls -l | awk '{ print $1, $8 }'

This will give you the permissions and names of all directories and files in pwd. One could of course switch places of $1 and $8 (i.e. print $8, $1) to get names first and then the permissions.

Recently I found myself needing to find all the commands executed from a crontab (as part of a script creating another script, which was to verify that a migration had gone right by allowing me to execute those commands whenever I wanted, and not just whenever the crontab specified).

Luckily for me, and this blogpost ;), none of those commands were executed with parameters, and since I am too lazy to actually count how many fields there are in a crontab file, I got to use:

crontab -l | grep '^[0-9*]' | awk '{ print $NF }'

Which lists the contents of the present user’s crontab, finds all lines beginning with either a number or an asterisk, and then prints the last column of each such line. Magic!
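To illustrate, given a completely made-up crontab along the lines of:

0 4 * * * /usr/local/bin/backup.sh
*/5 * * * * /usr/local/bin/poll-queue

the pipeline prints just the commands:

$ crontab -l | grep '^[0-9*]' | awk '{ print $NF }'
/usr/local/bin/backup.sh
/usr/local/bin/poll-queue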

CLI “magic”

Wednesday, February 4th, 2009

Another day, another question. A friend of mine is working on his thesis, and wanted to replace all instances of a term, throughout a range of files. The problem could be formulated:

For all files of type X, search through them, replacing every instance of foo with bar.

In this particular case, the search term needed to be capitalized. So “foo” needed to be “FOO”. Why? Not my place to speculate, and not important to the problem, or the solution.

Building upon previous experiences, sed was called back into service.

$ sed -i 's/foo/FOO/gi' FILE

does the job. But on one file only. Time to widen the comfort zone a bit. I normally don’t use loops in the shell, mostly because I haven’t taken the time to learn the syntax, but also out of a good portion of respect for them. Whatever command is executed gets magnified by the use of a loop. They should always be handled with a great deal of respect.

Personally I can live with some manual labor (i.e. executing the same command over and over feeding a new parameter every time) as long as I know that I can count on the command. It endows me with a sense of control. But my friend chose to believe in his version control system, and that his disks wouldn’t fail, that his backups wouldn’t be magnetically erased, that the big GNU in the sky (or whatever $DEITY he believes in) would have his back, and that I am competent enough to write a bash-script which would work according to specification.

Ballsy, stupid but ballsy ;)

So off I went to the Internet, searching for the dark incantation I would need to have the command executed repeatedly over all his designated files.

The answer came in the form

$ for i in `command`
> do
> command $i
> done

After quick experimentation I concluded that “ls *.txt” would indeed only display the files ending with “.txt” in the given directory. Neat! All the pieces are in place; now to put it all together:

$ for f in `ls *.txt`
> do
> sed -i 's/foo/FOO/gi' "$f"
> done

which, when collapsed into a single line, amounts to:

$ for f in `ls *.txt`; do sed -i 's/foo/FOO/gi' "$f"; done

Or, you could just manually open up all the files in a text editor and, for each file, hit search and replace… The only thing I feel right now is that there probably exists an option built into sed for modifying case, which would make it a bit more flexible when searching for variable terms that share a common root (as an example, what if you wanted to capitalize all occurrences of president, presidents and presidential? There simply must be such a command in sed, so once I find it I will update this post)

UPDATE:

The solution did indeed exist, and was of course, simple.

$ sed -i 's/\(foo\)/\U\1/gi' FILE

In order to do post-processing on the output, the replacement can no longer be a static string (indeed, that would not work, since the whole point was to be able to match words with a common root, i.e. several different but similar words), so it needs to be a back-reference to whatever was matched. Which means we now have to group the term we are searching for.
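The president example from above could then be handled with something along these lines (my own sketch, not from the update; the i flag makes the whole match case-insensitive):

$ sed -i 's/\(president[a-z]*\)/\U\1/gi' FILE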

So the final incantation would look like this:

$ for f in `ls *.txt`; do sed -i 's/\(foo\)/\U\1/gi' "$f"; done

sed

Friday, September 26th, 2008

I recently wrote a post about a friend of mine, and how he needed some help locating files containing certain phrases. He returned today, seeking advice on how to remove information from the beginning of each line of a file.

Depending on the structure of this data (or junk as he referred to it ;)) this may or may not prove a hard nut to crack.

He wanted to remove the junk data so that he could run a diff between another trace file and this trace file, which is really clever, and I was happy that I could be of assistance.

My first instinctive question to him was whether the (junk) data was uniform in any way. It contained timestamps in the format HH:MM:SS, so the structure was identical, but each line would look a little different.

New angle of attack: is the junk data separated from the… well, non-junk data in any way?

Yes, between the junk and the valuable data one can always find the string “TRACE: ”. Bingo!

I asked for a couple of lines of output, so that I could get to work crafting a sed command which would do his bidding.

An example of the lines I received is:

18340:3063011104] 10:30:40 TRACE: Moving absolute 0 from -1

Basically: numbers, colon, numbers, closing square bracket, space, numbers, colon, numbers, colon, numbers, space, TRACE, colon, space.

My first thought about what tool to use was: “sed”. I’ve used it before, though not much as of late, so I had to experiment a bit before being able to hand over a working solution. One of the things that clung to my mind and wouldn’t go away was that I recalled needing the flag -i if I wanted to work on a file. Several attempts, and as many “sed: no input files” error messages later, I finally gave up on -i and tried without it.

UPDATE: As pesa pointed out, I was almost correct; had I tried it a bit further and dropped the “<” from the command, it would have worked. As he also suggested, the command I ended up with, cat:ing the file and piping the output to sed, is probably more portable, as some installations of sed on various systems might not support the notion of an input file. Anyway, my memory wasn’t wrong, I was just in too great a rush to actually figure it out while trying to help out.

The first attempt, which FAILED miserably:

$ sed -i -e 's/^[0-9:\] ]* TRACE: //' < sed.txt > sed.1

I am a big fan of leaving the source untouched, and redirecting any changes into a new file instead, so the idea was simply to execute the substitution on the data taken from sed.txt, and put it in sed.1. Don’t ask me about the filename, I couldn’t tell you. “sed.txt.bak” was already taken by a backup of sed.txt (in case I would screw the original up I wanted to be able to do more tests).

I won’t bore you with all the attempts, but the final, WORKING command was:

$ cat sed.txt | sed -e 's/^[0-9]*:[0-9]*\] [0-9]*:[0-9]*:[0-9]* TRACE: //' > sed.1

There is probably room for a lot of improvement on the regexp, but it works, my friend is happier and more productive than ever (his own words) and all is well.
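For instance, something slightly stricter could look like this (an untested sketch of mine, using GNU sed’s extended regexps and giving sed the file directly, which at least GNU sed is perfectly happy with):

$ sed -E 's/^[0-9]+:[0-9]+\] [0-9]{2}(:[0-9]{2}){2} TRACE: //' sed.txt > sed.1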

Mission accomplished.