Archive for the ‘School’ Category

2012w06

Sunday, February 12th, 2012

Update: Ooops, I guess we gone incremented the year again… and no one thought to tell me :(

ACTA

It’s comforting to know that the people we elect to rule us at least know what they’re doing… Oh… wait…

git and branches

Last week, for the first time, I think I grokked branches. The headline mentions git branches, and if they are different from other VCSs’ branches, then last week I think I grokked git branches :P

I’ve known about branching for quite a while, but never gotten past a rudimentary understanding of it.

I think I understood how mercurial does it: simply clone the repository, name the root directory of that clone whatever you want to call the branch, and presto. (And yes, I am aware that mercurial has a branch command as well, so my understanding on that point is probably incorrect.)

Either way, what finally gave me an “aha”-moment was this blogpost.
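
For reference, the basic branch workflow that finally clicked for me looks roughly like this (the branch name is made up):

$ git branch feature-x      # create a branch (really just a pointer to the current commit)
$ git checkout feature-x    # switch the working tree over to it
# hack, commit, hack, commit...
$ git checkout master       # back to the main line
$ git merge feature-x       # bring the work in
$ git branch -d feature-x   # delete the now-merged pointer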

And while on the subject: Other uses of git. I am going to take a closer look especially at Prophet.

Links

AUTOMATE ALL THE THINGS

No but seriously, frakking do it. Automation ftw.

Adventures in Javascript land

Tuesday, May 5th, 2009

Last night an old classmate of mine approached me with a programming problem, this time in Javascript. This is not a language I have spent much time with for various reasons and my previous attempts at grokking it have all been unsuccessful.

But times seem to have changed. Either I am a better programmer now or Javascript has matured (I haven’t been keeping tabs on the Javascript camp, so I am still a bit fuzzy on the whole DOM thingy, and on whether you can nowadays write a script that both Internet Explorer and all the other browsers will understand without jumping through at least a couple of hoops), but in any case I was able to help him.

The problem he was having revolved around string manipulation. Given a pipe-separated string of “key=value” pairs, extract the values and store each of them in a variable of its own.

I am guessing he was going to extract all the values, but the example code he sent to me just included the extraction of a single value. His attempt was not bad, combining indexOf() and substring(), and I would like to think he had a pretty smart idea going, but Javascript was not intelligent enough for his idea.

The “mistake” he made was thinking that indexOf() would continue and find the next occurrence once it had already been called once, saving its “state” from the previous search, much like a file or array iterator would behave.

He also had some rather “funny” constraints, or rather, the reasoning behind them was funny. There could be no magic numbers in the code (which is of course a good thing in and of itself) because the given string could be subject to change in the future.

I have no objection to this, except for the fact that he used hard-coded strings as arguments to indexOf(). But sure, having to update one place instead of two is better, because otherwise you’ll just miss the other place when you update the first. (The magic numbers constraint surfaced when I suggested adding an offset to the index returned by indexOf() so that he would get the appropriate part of the string.)

The initial code (obscured for his benefit) looked something like this:

var arg = "key1=value1|key2=value2|key3=value3|key4=value4";
var index = arg.indexOf("key2");
var key2 = arg.substring(index, indexOf("|"));

There were some initial mistakes in the code (the second indexOf() called on nothing) as well as the assumption described earlier, that indexOf() kept a state. This code, after modification, would of course have tried to return a negative-length substring. Now I haven’t checked, but somehow I doubt that Javascript would actually return a reversed substring of the indicated length and position. (As it turns out, substring() simply swaps its arguments when the start is greater than the end, so you would get a substring, just not the one you were after.)

My suggestion was along the lines of:

var arg = "key1=value1|key2=value2|key3=value3|key4=value4";
var index = arg.indexOf("key2") + 5; // 5 == length of "key2=", so index now points at the value
var key2 = arg.substring(index, arg.indexOf("|", index));

This brought up the magic numbers constraint, so I then suggested replacing line 2 with:

var index = arg.indexOf("key2") + "key2=".length; // no magic number, and skips past the "="

But that was not satisfactory either. I believe he ended up with a solution of his own: calculating a second index by finding the first “=” after “key2” (using index as the offset) and then starting the substring one character after that:

var arg = "key1=value1|key2=value2|key3=value3|key4=value4";
var index = arg.indexOf("key2");
var index2 = arg.indexOf("=", index);
var key2 = arg.substring(index2 + 1, arg.indexOf("|", index));

He was happy with that solution, but I was less than enthusiastic about any of that code, so I started digging on my own. One solution could be regular expressions (I don’t know if it is a good or a bad thing that nowadays I almost always think “regexp” when I have to find and/or break data out of a string…), and I found a pretty neat solution thanks to the fact that the Javascript string type has a built-in method match() which takes a regexp pattern as its argument.

var arg = "key1=value1|key2=value2|key3=value3|key4=value4";
var key2_pattern = /key2=([^|]+)/; // capture everything up to the next "|"
var key2 = arg.match(key2_pattern)[1];
// [0] == entire matched string, [1] == matched group (i.e. [^|]+)

Still, I was not entirely satisfied. Regular expressions can be great if kept uncomplicated, but really, how often do things stay uncomplicated? I was also wondering whether it would be possible to just store it all, dynamically, in a dictionary type of structure. That’s when I realized that Javascript has associative arrays (well, objects which you can use as such). :D

var arg = "key1=value1|key2=value2|key3=value3|key4=value4";
var tmp_array = arg.split("|"); // array with "key=value" elements
var dictionary = {}; // a plain object serves as the associative array
var e, tmp;
while (e = tmp_array.pop())
{
    tmp = e.split("="); // ["key", "value"]
    dictionary[tmp[0]] = tmp[1]; // dictionary["key"] = "value";
}

I haven’t fully tried out this last suggestion. It works, but I haven’t yet tested how Javascript deals with “2” as opposed to 2, and how that might screw things up. I am thinking that it probably will. On the other hand, so should substring(), since it by definition also returns a string.

None of this really matters; my friend came to a working solution and I got to play with Javascript. Win-win :)

LaTeX, ligatures and grep

Thursday, April 16th, 2009

Having finally finished a long overdue paper, I thought I’d share a little knowledge (well, semi-knowledge/ugly hack, actually) that I have found useful while working on this paper.

I like justified text; I think it makes the content look sharp. LaTeX seems to agree with me on that point, at least in the document class I used (report). Justified text in LaTeX has one drawback, however. Sometimes the spacing between certain letters becomes too small, resulting in what I surmise typographers would call “broken ligatures”. The term “ligature” seems to refer to two or more letters being joined into a single glyph (think “fi” or “fl”). A broken ligature, then, would be when a preceding letter “floats into” the next one instead of forming a clean joint.

Justified text is sharp; justified text with broken ligatures… not so much. And LaTeX doesn’t seem to be fully able to handle this on its own, so manual intervention seems necessary. (It could of course just be that the distribution I use (texlive) is silly, but I recall having similar problems back in Uni when I used tetex.)

In any case, ugly-hacking tiem!

SEARCH

First priority: find all occurrences of potential broken ligatures.

One could visually (using the ole trusty eyeball mk.1) scan the generated document for imperfections. That takes a lot of time, and there is a large risk that some occurrences “slip through”. Also, in some places the ligatures won’t be broken, because the text happens to fit well on its line at present. But then someone adds a word, a sentence, or just fixes a grammatical bug, whatever, and the fit is not so good anymore.

Of course, it is wholly unnecessary to run this procedure until the document is “frozen” and won’t accept any more additions in terms of text. I ran it three times: once before each “beta”/”release candidate” which I sent to some friends for critique/proof-reading/sanity checking, and then once more after having incorporated their input.

To identify potential trouble, grep is called in to find every instance of the character combinations which can break. In my experience, these combinations are “ff”, “fi” and “fl”.

$ grep -rn "f[fil]" chapters/*.tex

Only lower-case letters seem to cause trouble, but that is an assumption on my part. I could well see problems stemming from an initial lower-case f followed by an upper-case letter. I have never encountered this, so I don’t search for it, but as usual, ymmv.

Now I have a nifty little list with all occurrences of the letter sequences “ff”, “fi” and “fl”, nice! Now what?

DESTROY

The solution should, preferably, be applied to nearly all instances of these sequences, so that a present “good fit” line, if modified, would just automagically keep working later on as well. This means that the solution should not screw up the formatting of the “good fit” cases, while still kicking into action iff the good fit turns bad.

The solution I use is “\hbox{}”. This is inserted between the characters (f\hbox{}f, f\hbox{}i, f\hbox{}l). What makes this ugly is of course that your LaTeX code is now littered with this… well umm… shit. This method will also give your spell checker a nervous breakdown.

Now you are probably thinking that this is a non-issue: just create a small shell script that uses sed to produce new files with the modified content, copy these files into a build directory, and have the make script invoke that shell script before invoking the build command.
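
A naive sketch of what that sed pass might look like (the directory layout here is made up):

# copy the chapters into a build directory, inserting \hbox{} into the risky pairs
mkdir -p build
for f in chapters/*.tex; do
    sed -e 's/ff/f\\hbox{}f/g' \
        -e 's/fi/f\\hbox{}i/g' \
        -e 's/fl/f\\hbox{}l/g' "$f" > "build/$(basename "$f")"
done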

There is a potential pitfall in that solution. My paper linked to a couple of websites, as in clickable hyperlinks inside the pdf. Imagine the fun that would ensue when sed hits upon \url{http://www.openoffice.org/} and transforms it into \url{http://www.openof\hbox{}f\hbox{}ice.org/}.

Making sed aware of the \url{} tag, and of verbatim quotes (probably all of the quoting systems), and making it leave the content inside well enough alone, is probably doable, but having my favorite text editor do an interactive search/replace was the method I opted for.

Vim indoctrination

Thursday, April 9th, 2009

Having used Vim (mostly gVim) exclusively for the last year or so, my muscle memory has thoroughly set, which I was reminded of yesterday when a classmate from Uni asked me if I could lighten his load a bit by quickly adding a piece of functionality to some code he was working on.

“Sure,” I thought, “I can do this.” So I launched Eclipse to carry out a small-scale, controlled-environment kind of test. The task was to take an existing for-loop with its own functionality and add the statements required to have the loop also build a comma-separated string of the values it retrieved.

That little adventure made me discover two things:

  1. “:w” won’t save the file in Eclipse… it will, however, insert those very characters into the code, breaking it. The same goes for “V”, “d” and “Y”. Also, “,cc” won’t comment out a line… that readjustment from Vim to Eclipse took way more time than hacking the actual Java did.
  2. Python has spoiled me

But all in all, it worked out pretty well, and I got to use StringBuilder for the first time ever. The resulting code looked something like this:

StringBuilder sb = new StringBuilder();
for-loop here {
    pre-existing code here...
    if (sb.length() > 0) {
        sb.append(",");
    }
    sb.append(obj.toString());
}
String s = sb.toString();

I’m sure there are better ways to accomplish this, like just tacking a comma on after each append and then removing the final trailing comma from the resulting string with some string-manipulation magic, but for some inexplicable reason this just “felt” like a better solution.

It might just be that now most of the actual code is grouped together, so in the event of a refactoring, there is less of a risk that the string manipulation code is forgotten.

Anyway, it was almost fun to hack Java again… almost… ;)

Fun with LaTeX

Thursday, March 26th, 2009

So I have finally gotten my shit together and seriously started putting my ideas for the FS/OS course into writing. $DEITY knows cultivating those ideas has taken long enough…

I started out, as I usually do, with my trusty LaTeX template:

\documentclass[english,a4paper,utf8]{report}
\usepackage[utf8]{inputenc}
\usepackage{verbatim}
\usepackage[dvips,bookmarks=false]{hyperref}
\hypersetup{
    colorlinks=true,
    citecolor=black,
    filecolor=black,
    linkcolor=black,
    urlcolor=blue
}
\author{}
\title{}

\begin{document}
    \maketitle
    \tableofcontents
    \input{./00_chapters}
    \bibliographystyle{unsrt}
    \bibliography{./bibtex/ref}
\end{document}

I then proceeded to copy the old build-system which mra rigged for us while we were doing our bachelor thesis, and all seemed well and good, until I realized that the hyperrefs (i.e. supposedly clickable URLs) weren’t all that clickable. I was baffled. What had gone wrong?

Had I forgotten to install a required package? Why then had rubber (which the build-system uses) not died with an error? No, the packages seemed fine.

Had I found a feature which Adobe Acrobat Reader possessed but Evince didn’t? Nope, opening the pdf-file in acroread didn’t yield a better result (only a slower one… jeebuz, acroread is bloated…).

I knew that I had gotten clickable links to work in LaTeX-generated pdfs before, so what was different? Ah! It might be that back then I had used mra’s old build-script, the one he wrote before learning about rubber. Ok, $ less bin/makedoci.sh told me all I needed to know. The relevant procedure in that file was:

  1. call latex
  2. call bibtex
  3. call latex
  4. call latex
  5. call dvips
  6. call ps2pdf
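
Spelled out as commands, that sequence is roughly the classic latex/bibtex/dvips dance (the file name is made up):

$ latex paper.tex
$ bibtex paper
$ latex paper.tex
$ latex paper.tex
$ dvips paper.dvi -o paper.ps
$ ps2pdf paper.ps paper.pdf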

As it turns out, the new “rubberized” build-system called rubber with the flags -d and -f (i.e. produce pdf output, and force compilation). At the same time I was following up another lead, trying to make sense of the documentation for the hyperref package on CTAN. I may have spent too little time reading the actual content in there, but when I came across a list of drivers and \special commands, I started seeing some patterns.

rubber -d calls pdftex, and it might have been easier to just switch out “dvips” in the hyperref configuration in the template, but then I’d have had to check, and possibly dig even deeper, to find out what the actual string to put in the configuration should be.

This was less attractive since I knew that the current template had worked before (using dvips). Keeping it would instead involve finding out whether rubber could go via DVI to PS and then to PDF. Coincidentally, that is just what rubber -p does.

Which sort of creates a really cute little circumstance: to create a pdf-file, you call rubber with the flags -p, -d and -f.
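
In other words, the invocation ends up being something like this (the file name is made up):

$ rubber -p -d -f thesis.tex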

PDF, rubber -p -d -f, get it? XD

Putting technologies to use in peculiar ways

Wednesday, March 4th, 2009

I just read a Daily WTF and, I can’t be sure why, possibly because they were generating invoices (an activity my mind for some reason has linked to PDFs), I had a flashback to term 5 at ITU, where our project group collected a bunch of data through a web-based questionnaire and stored it in a database.

Then there was the question of retrieving the information and presenting it in our document (a PDF, generated by LaTeX), which, if I remember correctly, was solved by me ugly-hacking together a PHP-script which, depending on which script you called on the webserver, presented you with either a csv file or a LaTeX-formatted file. To be completely honest, I guess “stream” would be the better description, which the browser then interpreted as a file and rendered.

In any case, I have a little suspicion that this wasn’t one of the intended domains for PHP, but it did the job well nonetheless.

A piece of Python

Tuesday, January 13th, 2009

The other day, a friend of mine from the university IMed me, asking for reading material on Python. My first instinct was to refer her to the Python documentation. Then my curiosity grabbed hold of me. It turned out she had an assignment which boiled down to reading the first two lines of a sequence of text files.

She had crafted a solution which almost worked, but the inner loop (the one outputting lines 1 and 2 from each file) was giving her a headache, so she asked me for input.

My first approach was this:

c = 0
for line in file:
    if c >= 2:
        break
    # do stuff with "line"
    c += 1

It was eerily similar to her attempt, but I felt uncomfortable with the code… it just didn’t “feel” right. It felt as if there had to be a less stupid way to go about this problem (not that I’m calling her solution stupid; this was her first attempt at coding Python. I, on the other hand, should be ashamed).

Thinking about it, one really didn’t need a loop at all for this assignment. In order to read the first two lines in a file, you’d need a handle to the file, and then call the function readline() on that handle, twice.

file.readline()
file.readline()

This is, of course, not the least bit maintainable. For instance, what would the solution be if the assignment were changed to read the first 5, 10 or 15 lines of text? No, that would just be plain ugly. So we bring the loop back in from the cold, to manage the number of lines to extract:

for i in range(NUMBER_OF_LINES):  # NB: range(1, NUMBER_OF_LINES) would loop one time too few
    file.readline()

The assignment didn’t say anything about line numbering, and although we could use “i” for that purpose, it seems wasteful to assign it a value and then end up never using it. But seeing as we would otherwise have to create a counter anyway, to manually manage when to break out of the loop, I believe this solution to be the cleanest, simplest and most readable, and thus the most beautiful. At least until someone comes up with something even better.

Writing this post, I am currently pondering about a while-loop approach, something along the lines of:

lines_left = NUMBER_OF_LINES
while lines_left > 0:
    file.readline()
    lines_left -= 1

However, now seeing it in writing and not just in my mind, I realize that it is double the size of the previous solution, and for what? Trying to get away from a counter variable, which I end up using anyway, only backwards… no, the for-loop won this round.

Django, command_extensions and pygraphviz

Wednesday, November 26th, 2008

Trying to find a way to comply with last week’s assignment (profiling your software), I today found out that the command_extensions for Django could provide some help (runprofileserver). However, that is not why I am currently writing.

The reason for this post is another command, graph_models, which can be used as such:

wildcard /home/wildcard/voxsite
$ python manage.py graph_models -a -g -o my_project_visualized.png

This, however, requires a few things to work, namely python-pygraphviz and graphviz-dev (if you’re an Ubuntu user, at least). But this is pretty cool: now I have automatically generated class diagrams of my project.
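
On Ubuntu, something along these lines should pull those dependencies in (exact package names may vary between releases):

$ sudo apt-get install graphviz-dev python-pygraphviz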

\o/

Lessons learnt: Python and importing

Friday, November 21st, 2008

This will probably not be something you will do every day, but some day you might need to import a module from an altogether different directory not on the python path. Let’s for instance say that you have a script in your home folder:

~/some_script.py

This script needs to import another module, and, as in my case, you are only given the file system path to the directory in which you can find said module. What to do?

/opt/some_module.py

The solution is rather simple. some_script.py will need to import sys, in order to get a hold of the sys.path variable, to which we can append the path.

import sys
sys.path.append('/opt/')
import some_module

Tah-dah. Once the script has executed and dies, the change to sys.path dies with the process, so no extra fiddling is needed. The one gotcha I encountered, which made this problem take way longer than it should have:

I was wrapping this code up in a function, which made the import local to that function and not visible in the rest of the script. So binding the functions/variables you need from the imported module to local dittos, and passing those around instead, is advised.
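
A minimal sketch of the gotcha and the workaround (some_module lives in /opt/ as above; its settings attribute is made up for the example):

import sys

def import_settings(path):
    # appending inside a function is fine: sys.path is global, shared state
    sys.path.append(path)
    import some_module           # ...but this name is only bound inside the function
    return some_module.settings  # so hand back the things the caller actually needs

settings = import_settings('/opt/')
# "some_module" is not defined out here, but "settings" is ours to keep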

What kind of ugly beast of a script did I need something as convoluted as this for? A script which tries to verify that a piece of software has been installed correctly, and in the correct place with respect to other software whose location I cannot know from the onset. (This is for Vox Anonymus, and I simply needed to check that the Django site-specific settings file had been correctly updated and could find Vox Anon.)

Documentation, best practices?

Friday, November 14th, 2008

I am, as part of the AFST course, working on a free software project, Vox Anonymus. One of the requirements for the software is that it should come complete with documentation on how to install it (not at all unreasonable by any measure).

But I find myself asking how to handle this install information. It needs to be included in the INSTALL file, as well as on the website. At the same time, I feel the urge not to repeat myself. DRY (Don’t Repeat Yourself).

I’ve read up on some techniques (reStructuredText, python-docutils, etc.) but I have been unable to find a suitable solution which would convert some simple text format into both (x)html and a reasonable plain-text representation for the INSTALL file.

The simplest solution would probably be to use some mark-up language, and a formatting system, and then let the source file be the INSTALL file, from which the html file can be generated. This would leave some “mark-up artifacts” for the prospective users of the application.

Second easiest solution: Have the html file be the source file, and generate the INSTALL file by stripping the tags out of the html file. While this would be acceptable, two things bother me:

  • It could potentially take some work in order to make the stripping / reformatting perform properly (with regards to newlines, indentation, etc)
  • Going against YAGNI (You Ain’t Gonna Need It), what if there is a future format I would wish to support?

(I will have to admit though, whilst browsing through “Beginning Python” by Magnus Lie Hetland (Apress) I discovered a chapter (20) outlining a simple system for doing just this, and it sparked my curiosity, so I might have been more than a little “influenced” to reject all other ideas ;))

The third option, then, the path I have (at least for the moment) settled on, is to create a miniature mark-up syntax with accompanying formatter scripts, allowing for generation of both plain text and html, and for extensibility in the future.

The final tipping point is that I can have more automation this way. With the first approach, any automation would have had to be a tacked-on ugly hack. With the second approach, a couple of simple sed commands in a shell script would have done the trick, but I would still have had to handle the reformatting as a separate, manual step. With the third option, it can be brought into the core functionality:

The tarball I generate is given a filename consisting of the name of the project as well as its version, as found in setup.py. Hence, the web-page download link needs to be updated every now and then. If I am generating the html anyway, it would make sense to have Python also generate an up-to-date link to the tarball.
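
Just to sketch what I mean (the name, version and exact link format below are all made up):

# hypothetical: the project name and version, as also used by setup.py
NAME = 'voxanonymus'
VERSION = '0.3'

def download_link():
    """Return an up-to-date html link to the current release tarball."""
    tarball = '%s-%s.tar.gz' % (NAME, VERSION)
    return '<a href="%s">Download %s</a>' % (tarball, tarball)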

Overall, this seems to have the makings of a good solution (all things considered), as well as being a good learning experience. Win-win.

But this was actually not what my post was supposed to be about. In the title, notice the question mark? With it I was not implying that I might be on to the best practice; rather, it is a question aimed at you, the readers. How would you have done it? Because there are bound to be better ways, with better motivations, than what I have cobbled together. There just have to be, since people have been putting together INSTALL instructions and other application documentation for free software for at least a good… what? thirty – forty? years. And there are bound to be those who don’t like repeating themselves.