I do believe I have been bitten by Python

Saturday, May 30th, 2009

Lately I have found myself writing short texts in Swedish, destined to end up on a friend’s computer. A Windows-using friend, with all the UTF-8 / ISO-8859-1 hassles this entails. For the first file, I simply copied it onto a memory stick, rebooted into the Windows partition, and search/replaced all the offending characters (å, ä, ö and the odd é). Then I rebooted again (since I don’t have my email set up in Windows) and fired off the mail.

I simply figured that this file would be kind of a one-shot deal and nothing more. About two weeks later, I wrote a second file, and re-did the entire reboot procedure. I found myself writing a third file yesterday… I can’t for the life of me remember the saying, or where I read it, but it was something along the lines of “if you do the same thing more than twice, automate the shit out of it.”

An audience with the great oracle led me to this blog post, and after trying it out manually (which required me to reboot one more time just to verify that the converted file had in fact been converted) I was all set to write a little shell script. I got as far as writing the first lines of error handling in the script (making sure that the script had received a filename) before I realized that I really didn’t want to write a shell script. Not when I could piece together a Python script in half that time, which would have better error checking. And yes, that time estimate included researching how to have Python execute a system call. (The subprocess module is what I settled on, as per advice from StackOverflow. It took me a minute or so of reading the manual to figure out how to redirect the output from that command (the full text, in ISO-8859-1 encoding) to a new file: getting a file pointer to the new file, and redirecting stdout from the call to that file pointer.)

Something along these lines:

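import subprocess
fp = open('myfile.iso.txt', 'w')
args = ['iconv', '--from-code=UTF-8', '--to-code=ISO-8859-1', 'myfile.txt']
subprocess.call(args, stdout=fp)  # assumption: any subprocess call that accepts stdout= would do here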
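
For the curious, here is a rough sketch of how the whole script might be put together, filename check included. It is not the actual script, just one way to do it under a few assumptions: the helper names (output_name, build_command, convert, main) and the "myfile.txt becomes myfile.iso.txt" naming rule are made up for illustration, subprocess.call is simply one of the subprocess functions that accepts stdout=, and iconv is assumed to be on the PATH.

```python
import os
import subprocess
import sys


def output_name(path):
    """Hypothetical naming rule: myfile.txt -> myfile.iso.txt."""
    root, ext = os.path.splitext(path)
    return root + '.iso' + ext


def build_command(path):
    """Build the iconv argument list for a UTF-8 to ISO-8859-1 conversion."""
    return ['iconv', '--from-code=UTF-8', '--to-code=ISO-8859-1', path]


def convert(path):
    """Run iconv on path, send its stdout to the new file, return the exit code."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    # iconv prints the converted text on stdout, so stdout is pointed at the
    # output file; binary mode keeps Python from touching the bytes.
    with open(output_name(path), 'wb') as fp:
        return subprocess.call(build_command(path), stdout=fp)


def main(argv):
    """Check that a filename was given, then convert it."""
    if len(argv) != 2:
        print('usage: %s FILE' % argv[0], file=sys.stderr)
        return 2
    try:
        return convert(argv[1])
    except FileNotFoundError as err:
        # Raised for a missing input file, or by subprocess if iconv is absent.
        print('not found: %s' % err, file=sys.stderr)
        return 1
```

Adding sys.exit(main(sys.argv)) at the bottom of the file would turn this into a command line script. One thing the sketch does not handle: if iconv fails halfway through (for instance on a character that has no ISO-8859-1 equivalent), the partially written output file is left behind.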

No more silly rebooting to convert plaintext files for me :D