
2011w31

Sunday, August 7th, 2011

(python) mechanize and BeautifulSoup

I’m slowly making preparations to replace WordPress with, as it currently stands, fugitive.

But I have a couple of posts in WordPress, 150—counting this one—to be more precise… which I don’t particularly feel like just abandoning, well… at least not all of them ;)

This means that they need to be preserved, and that’s where two excellent Python modules, mechanize and BeautifulSoup, come into play.

Yes, I know relying on Beautiful Soup is discouraged nowadays, but as long as it works, I’ll be using it.

There is this great post about the basics of mechanize, and since I only wanted to scrape a couple of posts, that was all I needed.

I needed Beautiful Soup to soup.find('link', {'rel': 'next'}) (i.e. get the URL for the “next” post) and then there was the little problem of retrieving the href-attribute from the found link-element.

StackOverflow to the rescue!
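
For the record, here is a rough sketch of how that lookup and the "follow the next link" loop could fit together. It is not the script I actually ran: it assumes the newer `bs4` package rather than the old BeautifulSoup module, and `next_post_url`, `walk_posts`, `fetch` and the example.com URLs are all made-up names for illustration. With mechanize, `fetch` would presumably be a small wrapper around the browser's `open(url).read()`.

```python
from bs4 import BeautifulSoup


def next_post_url(html):
    """Return the href of the page's <link rel="next">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("link", {"rel": "next"})
    # Tag.get() returns None instead of raising when the attribute is missing.
    return link.get("href") if link is not None else None


def walk_posts(start_url, fetch, limit=200):
    """Yield (url, html) pairs, following rel="next" links until they run out."""
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < limit:
        seen.add(url)
        html = fetch(url)  # fetch is any callable mapping a URL to its HTML
        yield url, html
        url = next_post_url(html)


# Made-up two-post "site" standing in for the real blog.
FAKE_SITE = {
    "http://example.com/post-1/": (
        '<html><head><link rel="next" href="http://example.com/post-2/" />'
        "</head></html>"
    ),
    "http://example.com/post-2/": "<html><head></head></html>",
}

for url, html in walk_posts("http://example.com/post-1/", FAKE_SITE.__getitem__):
    print(url, len(html))
```

Passing the fetcher in as an argument keeps the link-following logic testable without touching the network, and the `seen` set plus `limit` guard against a blog whose "next" links loop back on themselves.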

rsync

I found myself needing to synchronize a folder between two systems, i.e. new files added on the “source” system needed to be added to the “destination”, which is a simple
$ rsync -av /path/to/source/directory/ user@remote:/path/to/destination/directory/
(please note the trailing slashes in BOTH paths; the one on the source path is what tells rsync to copy the files INSIDE the directory, not the directory itself).

However, for the first time I also needed files removed from the source to be removed at the destination as well. I found this blogpost which gave me the information I needed.

This is… not trickier, but… not something you’d want to frakk up.

So, the flag which deletes the removed files at the destination is simply --delete, i.e.:

$ rsync -av --delete /path/to/source/directory/ user@remote:/path/to/destination/directory/

BEFORE you do this you REALLY should check that the operation will do what you expect by also attaching the flag --dry-run (which will simulate the real deal, without making any changes on the remote end). Very nice.

$ rsync --dry-run -av --delete /path/to/source/directory/ user@remote:/path/to/destination/directory/
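
If this ever ends up in a script, one way to make the rehearsal hard to forget is to have dry-run be the default. The following is only a sketch under that assumption; `rsync_command` and `sync` are hypothetical helper names, and the only rsync flags used are the ones shown above (-a, -v, --delete, --dry-run).

```python
import shlex
import subprocess


def rsync_command(source, destination, delete=False, dry_run=True):
    """Build the rsync argument list; dry_run defaults to True on purpose."""
    cmd = ["rsync", "-av"]
    if dry_run:
        cmd.append("--dry-run")
    if delete:
        cmd.append("--delete")
    # Normalise to exactly one trailing slash, so the source directory's
    # contents are copied rather than the directory itself.
    cmd += [source.rstrip("/") + "/", destination.rstrip("/") + "/"]
    return cmd


def sync(source, destination, delete=False, dry_run=True):
    """Run rsync; raises CalledProcessError if rsync exits non-zero."""
    return subprocess.run(
        rsync_command(source, destination, delete, dry_run), check=True
    )


# Only prints the rehearsal command; nothing is executed here.
print(shlex.join(rsync_command(
    "/path/to/source/directory",
    "user@remote:/path/to/destination/directory",
    delete=True,
)))
```

Calling `sync(..., delete=True)` would then only simulate, and the real run has to be asked for explicitly with `dry_run=False`.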

Musings

@shiny posted a notice about “open surface” being the perfect term for software that should be avoided.

@webmink, in what feels like true hacker fashion, cleverly played around with the words of the term and came up with “superficially open”.

Links

The Longest joke in the world

:wq