Sun, 11 Feb 2007
This is simple with twill
I don't feel like I need to "defend" twill -- it's successful beyond both my expectations and my cognizance (I have no idea who's actually using it, but it's apparently a lot of people!), but I may need to promote it better. I ran across this post earlier today. It shows how you can use mechanize to do some simple "screen scraping," and it spurred me to check the following scripts into a new "advocacy" section of the twill archive.
Here's how you can use a twill script to do what Greg did with mechanize:
add_extra_header User-Agent "Mozilla/5.0 (compatible; MyProgram/0.1)"
go http://www.python.org/
show
You can also use straight Python, if that's your poison:
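from twill.commands import *
import twill
# Send a custom User-Agent header with every subsequent request.
add_extra_header('User-Agent', 'Mozilla/5.0 (compatible; MyProgram/0.1)')
go("http://python.org/")
# Grab the HTML of the page the browser just loaded.
html = twill.get_browser().get_html()
print(html)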
Note that twill is based on mechanize, and so promoting twill doesn't mean pushing mechanize down. mechanize is amazingly powerful -- but sometimes you want to just go grab some HTML, and twill (tries to) make that easy.
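For comparison, here is roughly what "set a User-Agent and grab some HTML" amounts to with only the Python standard library. This is a sketch in Python 3's urllib.request, not what twill or mechanize do internally; build_request and fetch_html are made-up helper names for illustration, and nothing touches the network until fetch_html is called.

```python
from urllib.request import Request, urlopen

# Same User-Agent string as in the twill examples above.
USER_AGENT = "Mozilla/5.0 (compatible; MyProgram/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request carrying a custom User-Agent (hypothetical helper)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=USER_AGENT):
    """Fetch a page and return its body as text (this performs a network call)."""
    with urlopen(build_request(url, user_agent)) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://www.python.org/") and printing the result would be the rough equivalent of the three-line twill script, minus everything else twill gives you (forms, cookies, assertions on page content).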
One unexpected result of all this -- I discovered that the function get_browser() wasn't actually exported from twill.commands by default, which is silly. It's the first function I wanted to call in order to do something minimally complex. So now the API is one call bigger ;). One more corner rubbed off...
--titus
posted at: 23:01 | path: /feb-07 | 2 comments
Comments:
Posted by Diane at Tue Feb 13 14:30:34 2007:
$ easy_install twill
...
$ ipython
In: from twill.commands import *
In: import twill
In: go("http://google.com")
In: html = twill.get_browser().get_html()
In: html
Out: ''
In: twill.__version__
Out: '0.8.5'
Somehow I think there's supposed to be some data in that html variable?
Also the version of twill obtained via easy_install doesn't have 'add_extra_header'.
Finally, why is there both twill.commands.get_browser() and twill.get_browser()?
Posted by Titus Brown at Wed Feb 14 00:34:33 2007:
Now that's just mean, pointing out that I haven't made a release in a while ;)
twill.commands.get_browser implements twill.get_browser.
The other functions are all available in the latest version of twill.
--titus