Posts Tagged ‘Python’

pyflakes-vim

Saturday, July 18th, 2009

It all started with this notice on identi.ca:

@jamesw thanks! pyflakes-vim is a welcome addition to my !vim !python setup

Being the curious person I am, I sought and found what it is and does. It adds wavy red underlines in Vim to Python code which is determined incorrect. I tried it out just for a second, seeing if it could keep all it promised, and it indeed seem like it. It identifies unnecessary (unused) imports, it identified when I tried to append stuff to a variable I hadn’t declared as a list yet. All in all pretty nice.

I would have to say that I didn’t care much for the packaging of the vim-scripts, when I extracted them they spread over my ~/src directory instead of confining themselves to a pyflakes-vim/ directory inside the ~/src directory, but it was only three items, a readme, a script and a directory. Easily moved. Installation from that point was even simpler.

$ mkdir -p ~/ .vim/ftplugins/python
$ cp -R pyflakes* ~/.vim/ftplugins/python

And ensure that your ~/.vimrc has the following settings:

filetype on
filetype plugin on

With that done, every time you leave insert mode, pyflakes will scan the code and tell you about any* errors in the code

*small notice here: with “any errors” I mean any errors which pyflakes can reasonably find without actually executing the code.

I do believe I have been bitten by Python

Saturday, May 30th, 2009

Lately I have found myself writing short texts in Swedish, destined to end up at a friends computer. A Windows-using friend, with all the UTF-8 / ISO-8859-1 hassles this entails. For the first file, I simply copied it onto a memory stick and rebooted into the Windows partition, and search/replaced all the offending characters (å, ä, ö and the odd é). Then rebooted again (since I don’t have my emails set up in Windows) and fired off the mail.

I simply figured that this file would be kindof a one-shot deal and nothing more. About two weeks later, I wrote a second file, and re-did the entire reboot-procedure. I found myself writing a third file yesterday… I can’t for the life of me remember the saying, or where I read it, but it was something along the lines of if you do the same thing more than twice, automate the shit out of it.

An audience with the great oracle lead me to this blog post and after trying it out manually (which required me to reboot one more time just to verify that the converted file had in fact been converted) I was all set to write a little shell script. I came so far as to write the first lines of error handling in the script (make sure that the script had recieved a filename) before I realized that I really didn’t want to write a shell script. Not when I could piece together a Python script in half that time, which would have better error checking. And yes, that time estimate included researching how to have Python execute a system call. (subprocess.call() is what I settled on, as per advise from StackOverflow. It took me a minute or so of reading the manual to figure out how to redirect the output from that command (the full text, in ISO-8859-1 encoding) to a new file (getting a file pointer to the new file, and redirecting stdout from the subprocess.call() to that file pointer)

Something along these lines:

fp = open('myfile.iso.txt', 'w')
args = ['iconv', '--from-code=UTF-8', '--to-code=ISO-8859-1', 'myfile.txt']
subprocess.call(args, stdout=fp)
fp.close()

No more silly rebooting to convert plaintext files for me :D

My software stack

Friday, May 29th, 2009

A week or so ago I stumbled across this blog, which went almost instantly into my RSS feed, due not only to the name of a post which cracks me up (yes, I know my humor is off ;P) but also to the posts I found really interesting.

And then I came along this post which got me thinking about what software I ended up using towards the end of my bachelors. Or the software I have learned of since, but wish I’d known about earlier. I began to write a comment to her post, but realized that it would be too long, so I write here instead. All credit to Hazel though, since without her post I wouldn’t have been inspired to write this one.

My list, as compared to Hazels, will not be as well-rounded, it won’t necessarily fit every student the way her list do. Also, the software I list will only be guaranteed to work in GNU/Linux, as that is what I used in the final semesters, and have continued to use since.

First of all, a text editor. It doesn’t really matter which, just evaluate a bunch until you find one you feel comfortable with. Once you have found “the one” become intimate with it. Become a frakking Jedi-master at wielding it. I’m still a padawan-level user of Vim, but I’m getting there.

I say the same about web browsers, mail clients and instant messaging clients. Find a good one, learn as much as you can about it, and use it effectively. Firefox, Thunderbird and Pidgin are my preferred tools.

A bug-tracker, although often web based creating a need for a web server, can often provide more “good stuff” than just tracking bugs. Stuff like statistics, or, if you think outside the box you’d be able to track things other than bugs, which I guess it was issue-trackers does. Some of these also include a wiki-system, which makes establishing a project-specific knowledge-base kindof easy. In the one university project where we used such a system (and where I realized its potential) we used Trac.

A blogging-system with an RSS-feed capable of being filtered on tags or categories could be used to distribute status updates to other members of a group. That I’m using WordPress should be fairly obvious to all.

Use a version control system wherever and whenever possible. With the next two suggestions on the list, “wherever” will be a lot more commonplace than one might first believe, even for non-programmers. At the university we had access to SVN-servers, and also tried Mercurial, a distributed vcs. Mercurial stuck with me ever since.

From generic suggestions, let’s go specific.

I could encourage you to check out markup languages such as reStructuredText or Markdown, to find one which suits you best and to run with it. And since I’ve now written the terms you’d need to Google, you could do that, but I’ll simply recommend LaTeX. The reason for markup languages in general, and LaTeX specifically is that you can then store your information in one plaintext format (which makes it easy to manage in version control) and can then transform it to a slew of other formats as needed.

Most of the time we needed to hand in PDFs. LaTeX excels in that and manages all the typesetting stuff and (obvious) formatting. Which leaves you with more time to focus on the content. One could also either extend LaTeX with Beamer, to create presentations, or simply generate a PDF and run Impress!ve.

For diagrams, graphs and flowcharts or representations of state-machines, Graphviz would be my recommended way to go. Again using plaintext to control the content, again with the benefits of version control. Inkscape saves files in the SVG format (again, plaintext) which might be usable (especially since it can also save files as both PS and PDF)

If you need graphical representations of statistical data or other plots, matplotlib could be the way to go.

I personally don’t like managing things, or management-related stuff, but lately I have been haunted by the feeling that if I used management tools, even if I would only be managing myself and my pet projects, I could be more organized and efficient. So I have started looking at TaskJuggler. It is similar to Microsoft Project, with the largest difference being that… you guessed it, you code the project plan ;D. Plaintext yet again. And then you compile the plan and TaskJuggler attempts to verify that no resources have been double-booked.

Considering each piece in this list on their own, it might seem like a waste of time to exchange one software with another. I do find each of these softwares impressive in and on their own, but it is when they are put together, when all their strengths are combined, that you tend to get the most out of it.

The all plaintext approach I have tried, both in groupwork at the university, and later on my own, work rather well. That so many of the softwares on the list can be used to communicate and transfer information between parties is also intentional as without communication the chance of a successful project outcome diminish rapidly.

The last (bonus?) item on the list would be to recommend learning, at least superficially, a programming language which you could hack together small scripts with. Something which you could use to “glue” together the other parts. I adore Python, and many of the softwares listed above have python-bindings ready to use. Perl, Ruby and others, which elude me right now, would undoubtedly work equally well or better, but as with the text editor, pick a language you feel comfortable with, and rock on.

Thoughts? Questions?

Update: Fixed broken link

Vim indoctrination

Thursday, April 9th, 2009

Having used Vim (mostly gVim) exclusively for the last year or so my muscle-memory has thoroughly set. Which I was reminded of yesterday when a classmate from Uni asked me if I could lighten his load a bit by quickly adding a piece of functionality to some code he was working on.

“Sure” I thought, I can do this. So I launched Eclipse to carry out a small scale controlled environment kind of test. The task was to, from an existing for-loop with its own functionality, add in the required statements to have the loop also build a comma-separated string of the values retrieved by the loop.

That little adventure made  me discover of two things:

  1. “:w” won’t save the file in Eclipse… it will however insert those very characters into the code, breaking it. The same goes for “V”, “d” and “Y”. Also, “,cc” won’t comment out a line… that readjustment from Vim to Eclipse took way more time than hacking the actual Java
  2. Python has spoiled me

But all in all, it worked out pretty good, and I got to use StringBuilder for the first time ever. The resulting code looked something like this:

StringBuilder sb = new StringBuilder();
for-loop here {
    pre-existing code here...
    if (sb.length() > 0) {
        sb.append(",");
    }
    sb.append(obj.toString());
}
String s = sb.toString();

I’m sure there are better ways to accomplish this, like just tacking on a comma after each append and then on the resulting string working some string-manipulation magic to remove the final trailing comma, but for some inexplicable reason this just “felt” like a better solution.

It might just be that now most of the actual code is grouped together, so in the event of a refactoring, there is less of a risk that the string manipulation code is forgotten.

Anyway, it was almost fun to hack Java again… almost… ;)

Ideas

Wednesday, March 25th, 2009

This post I suspect will either have me written off as a complete idiot, or a genius. Some time ago my father had a problem at work. He needed to reformat an .xls-file. He was doing this manually, and sure, if you are masochistic enough (I’m not) you could probably have either written a VBScript or created a macro (same thing?) to do in a couple of minutes, what took him the idle-time of a couple of days.

I don’t know really how I got to thinking about it but I started wondering if there was a way to extract the data stored in the spreadsheet through SQL syntax. That made perfect sense to me. Database table, spreadsheet, same difference, right? Not so, apparantly.

Either my google-fu is failing me, or I am the only stupid bastard on the face of the planet to have ever thought this and found the idea to be any good. No matter what I search for I simply cannot get Google to fetch me a result of relevance.

The closest I get is a guy with a tutorial on how to use an .xls-file as a database to drive an ASP-powered website. I am guessing this would be a fitting alternative, but what I am searching for is either the built-in, or plugged-in functionality to have an input-field in the spreadsheet application, into which one would enter standard SQL syntax, and have it return a new spreadsheet with the filtered/updated table.

Why would you want this? Well frankly, as a programmer I find myself better versed in SQL than whatever filtering-functionality the developers of spreadsheet-app-XYZ have implemented.

A quick peek in the repository (apt-cache search python | grep spreadsheet) yields two results on my Gutsy install, one of which seem to be interesting (python-excelerator). Not that it matters anymore. He did eventually complete the manual labor, but I feel kindof bad for not being able to help him out :/

Pidgin buddy pounces

Tuesday, March 17th, 2009

I’m in an ambivalent state about Pidgin, I really have no love for the “expanding input field” “feature” the developers introduced a couple of versions back. Frankly, I think it sucks and I am not alone (search for “input” on that page), which is why, when I update to a newer version, download the source, modify it to always show, and ONLY show, four lines of text. No expanding, manual or automatic. Hard coded. I find that to be better than the alternative. My opinion is simply that the developers could have handled that issue much better.

But, Pidgin is not all suck. Far from it actually. Take the buddy pounce system for instance. That is a work of pure genius. There are so many options available! Your imagination is the limiting factor here.

Pidgin buddy pounce window

Pidgin buddy pounce window (click for full size)

Other than the devilishly annoying game you can play with other people by having  Pidgin send them a message as they start typing to you (preempting them), or return a message when you status is away and they have just sent a message.  But all this fades in comparison once you look at what options I have checked in the screenshot. When a user comes online, execute a command. I tested this, my Python script ran perfectly.

So what then could you do with this? You could turn Pidgin into a computer remote control (although I would probably advise against it) by doing something along these lines:

  1. Register a new account, and set it up to always be online on the computer to be remotely controlled
  2. Add your normal user to the contact list of the new user (disallow anyone else, for security reasons)
  3. Set up a buddy pounce for this new account, so that anytime a message is sent to it a script is executed, which reads the last line in the latest log-file for that account
  4. Do a little parsing perhaps, and then execute whatever command was sent in the message

I don’t think that I would ever use it for that, I have SSH for such things. But another thing one might do, is to create a second account, add it to the contact-list of the primary account, and have the primary account pounce when the new account sign on. And the executed command could be something as simple as opening a port in the firewall on the home computer. There would of course need to be a pounce for when the user sign off as well, to close the hole.

One could probably do something more advanced as well since, through the application, you can access databases, and thus can synchronize data over several users (what I’m thinking is that executing the same pounce for several users will mean that a command (inside the script being run for each of the users) can be executed when all of the users are online, offline or whatever, but otherwise do nothing. As I said, my/your imagination is the limiting factor here.

I’m not sure how “robust” (or fragile) the command execution is, if you have to return a value to Pidgin at end of execution, but I had my Python script terminate with a sys.exit(0) just to be on the safe side.

Midnight hacking

Thursday, February 12th, 2009

Last Saturday… sorry, early Sunday, way past any reasonable bedtime, I was twisting and turning, finding it impossible to fall asleep. Reading a magazine didn’t work, in fact it might just have had the opposite effect. It got my brain working, and all of the sudden an idea entered my mind.

I can’t understand it myself, so don’t bother asking, there will be no coherent or reasonable answer, but I got the idea to pull my Pidgin log-files, all 2900 of them, dating back from 2008.01.01, and have a program go through all of them, cataloging and counting the outgoing words.

Maybe it was some urge to code, maybe my subconscious has a plan for the code which it has yet to reveal to me, I couldn’t tell you, but the more I thought about it, the more the idea appealed to me. Within half an hour I knew roughly how I wanted to do it.

The premise was: 2900 HTML-formatted log-files describing interactions between me and one or more external parties. Pidgin stores each sent message on a separate line, so except for some meta-data about when the conversation took place, located at the top of the file, there was one line per message.

I wanted the code to be, I hesitate to call the resulting code “modular”, but “dynamic” might be better. So no hard coded values about what alias to look for. This worked out fine, as I soon realized I needed a file name for the SQLite database which would store the data.

The script is called with two parameters, an alias and a path to the directory in which the logs can be found. This is also where I cheated. I should have made the script recursively walk into any sub directory in that path, looking for HTML-files, but I opted instead to move all the files from their separate sub directories into one large directory. Nautilus gives me an angry stare every time I even hint at wanting to open that directory, but handling sub directories will come in a later revision.

So, given an alias (the unique identifier which every line that shall be consumed should have) and a path, list all HTML files found in that path. Once this list has been compiled, begin working through it, opening one file at a time, and for each line in that file, determine it the line should be consumed, or discarded.

Since the line contains HTML-formatting, as well as the alias and a timestamp, this would be prudent to scrape away, regular expressions to the rescue. Notice the trailing “s”, simple code is better than complex code, and a couple of fairly readable regular expressions is better than one monster of an expression. So away goes HTML-formatting, smileys, timestamps and the alias. What should now, theoretically, be left, is a string of words.

So that string is split up into words and fed into the SQLite database.

I was happy, this was my first attempt at working with SQLite, and thus my first attempt at getting Python to work with SQLite. It worked like a charm. Three separate queries where used, one, trying to select the word being stored. If the select returned a result, the returned value was incremented by one, and updated. Of no result was returned, a simple insert was called.

This is of course the naive and sub-optimal way to do it, but right then I was just so thrilled about coding something that I didn’t want to risk leaving “the zone”. Needless to say, doing two queries per word, means hitting the hard drive two times per word, EVERY word, for EVERY matching line, for EVERY one of the 2900 files. Yeah, I think there is room for some improvement here.

But I have to admit, I am impressed, in roughly four hours, give or take half an hour, I managed to put together a program which worksed surprisingly well. The one regret I have right now is that I didn’t write anything in the way of tests. No unit tests, no performance tests, no nothing. Of course, had I done that, I probably would have gotten bored half way through, and fallen asleep. And tests can be written after the fact.

Well, I wrote one simple progress measurement. The loop going through the files is called through the function enumerate, so I got hold of an index indicating what file was being processed, and for each file having been closed (processed and done) I printed the message “File %d done!”. From this I was able to clock the script at finishing roughly 20 files a minute (the measurements was taken at ten minute intervals) but this is rather inprecise as no file equals another in line or word length.

It was truly inspiring to realize how much can be done, in so little time. The next steps, besides the obvious room for improvement and optimization, is to use this little project as a real-life excercise to test how much I have learned by reading Martin Fowler’s Refactoring – Improving the Design of Exising Code.

Adding the ability to walk down into sub directories should of course be added, but the most interesting thing at the moment is going to be finding a way to detect typoes. The regexp rule for how to detect and split up words is a little… “stupid” at the moment.

Initially (after having slept through the day following that session) I thought about the typoes, and how to detect them, and how one might be able to use something like levenshtein, but again, this would entail IO heavy operations, and also start impacting the processor. There is probably some Python binding for Aspell one could use, I will have to look into that.

So, finally, why? What’s the reason? The motivation?

Well, somewhere in the back of my mind I remember having read an article somewhere which discussed the ability to use writing to identify people. So if I publish enough text on this blog, and then, on another, more anonymous blog, I publish something else, the words I use, or the frequency with which I use them, should give me away. In order to prove or disprove that hypothesis a signature would need to be identified, in the form of a database containing words and their frequency (in such a case it might even be beneficial to NOT attempt to correct spelling errors as they might indeed also be a “tell”) and then write a program which attempts to determine whether or not it is probable that this text was written by me.

While talking about the idea with a friend, she asked me about privacy concerns (I can only assume that she didn’t feel entirely satisfied with the thought of me sitting on a database with her words and their frequencies) and that is a valid concern. Besides the ethical ramifications of deriving data from and about people I call my friends, there is a potential flaw in trying to generate signatures for my friends from the partial data feed I am privy to. I base this potential flaw on the fact that I know that my relationship, experience and history with my various friends make for a diversified use of my vocabulary. In short, what language I use is determined by the person I speak with.

And now I have spewn out too many words again… *doh*

Infinite lists

Saturday, January 17th, 2009

So, I woke up at 3 a.m. this morning (which, horribly enough, is an improvement over yesterday when I woke up at 1 a.m…) and just couldn’t go back to sleep. So I sat down in front of the computer, and started reading some blogs. Slowly my mind drifted towards Project Euler, and when I found this post about infinite loops in Haskell (not really sure how applicable it may be on Project Euler) it triggered my “must code” urge.

The idea of making a list cyclic, instead of appending itself upon itself over and over is of course, when you think if it, obvious so no wonder I haven’t thought about it in that way before ;)

I have been reading Beginning Python – From Novice to Professional by Magnus Lie Hetland (Apress) in which chapter 9 has a subsection “The Basic Sequence and Mapping Protocol” that gives a little insight into how to make your own code behave like the built-in constructs (such as list, which was what I was looking for). That chapter coupled with this post supplied me with the knowledge needed to implement a quick and dirty version of a cyclic, and thus infinite, list.

A piece of Python

Tuesday, January 13th, 2009

The other day, a friend of mine from the university IM:ed me, asking for reading material on Python. My first instinct was to refer her to the Python documentation. Then my curiosity grabbed hold of me. It turns out she had an assignment, which boiled down to reading the first two lines in a sequence of text files.

She had crafted a solution which almost worked, but the inner loop (the one outputting lines 1 and 2 from each file) was giving her a head-ache, so she asked me for input.

My first approach was this:

c = 0
for line in file:
    if c >= 2:
        break
    # do stuff with "line"
    c += 1

It was eerily similar to her attempt, but I felt uncomfortable with the code… it just didn’t “feel” right. It felt as if there had to be a less stupid way to go about this problem (not calling her solution stupid, this was her first attempt at coding Python, I on the other hand, should be ashamed).

Thinking about it, one really didn’t need a loop at all for this assignment. In order to read the first two lines in a file, you’d need a handle to the file, and then call the function readline() on that handle, twice.

file.readline()
file.readline()

This is, of course, not the least bit maintainable, for instance, what would the solution be if the assignment was changed to read the 5, 10 or 15 first lines of text? No, that would just be plain ugly. So we bring the loop back in from the cold, to manage the number of lines to extract:

for i in range(1, NUMBER_OF_LINES):
    file.readline()

The assignment didn’t say anything about line numbering, and although we could use “i” for this purpose it seems wasteful to assign a value to it and then ending up never using it, but seeing as we would otherwise have to create a counter anyway, to manually manage when to break out of the loop, I believe this solution to be the cleanest, simplest, most readable, and thus, the most beautiful. At least until someone comes up with something even better.

Writing this post, I am currently pondering about a while-loop approach, something along the lines of:

lines_left = NUMBER_OF_LINES
while lines_left > 0:
    file.readline()
    lines_left -= 1

However, now seeing it in writing, and not just in my mind I realize that it is double the size of the previous solution, and for what? Trying to get away from  a counter variable, which I end up using anyway, only backwards… no, the for-loop won this round.

Django, command_extensions and pygraphviz

Wednesday, November 26th, 2008

Trying to find a way to comply with the last week’s assignment (profiling your software) I today found out that the command_extensions for Django could provide some help (runprofileserver). However, that is not why I am currently writing.

The reason for this post is another command, graph_models, which can be used as such:

wildcard /home/wildcard/voxsite
$ python manage.py graph_models -a -g -o my_project_visualized.png

This however requires a few things to work, namely python-pygraphviz and graphviz-dev (if you’re a Ubuntu user at least). But this is pretty cool, now I have automatically generated class-diagrams of my project.

\o/