This is simple with twill

I don't feel like I need to "defend" twill -- it's successful beyond both my expectations and my cognizance (I have no idea who's actually using it, but it's apparently a lot of people!). I may need to promote it better, though. I ran across this post earlier today. It shows how you can use mechanize to do some simple "screen scraping," and it spurred me to check the following scripts into a new "advocacy" section of the twill archive.

Here's how you can use a twill script to do what Greg did with mechanize:

add_extra_header User-Agent "Mozilla/5.0 (compatible; MyProgram/0.1)"
go http://www.python.org/
show

You can also use straight Python, if that's your poison:

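import twill
from twill.commands import *

# Send a custom User-Agent header with every request.
add_extra_header('User-Agent', 'Mozilla/5.0 (compatible; MyProgram/0.1)')
go("http://python.org/")

# Grab the HTML of the current page from the browser object.
html = twill.get_browser().get_html()
print(html)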
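To make the correspondence between the two forms concrete, here is a rough sketch that turns each line of the script above into the text of the matching Python call. It is only an illustration: the helper name script_line_to_call is made up for this example, it uses the standard library's shlex rather than twill's own parser, and it ignores anything fancier than a command followed by plain or quoted arguments.

```python
import shlex


def script_line_to_call(line):
    """Render one twill-style script line as the text of a Python call.

    Hypothetical helper for illustration; this is not how twill itself
    parses scripts.
    """
    parts = shlex.split(line)          # honours the double-quoted argument
    command, args = parts[0], parts[1:]
    return "%s(%s)" % (command, ", ".join(repr(arg) for arg in args))


script = '''add_extra_header User-Agent "Mozilla/5.0 (compatible; MyProgram/0.1)"
go http://www.python.org/
show'''

calls = [script_line_to_call(line) for line in script.splitlines()]
for call in calls:
    print(call)
```

Each script line is just a command name followed by string arguments, which is why the Python version reads almost the same once the quotes and parentheses are added.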

Note that twill is based on mechanize, and so promoting twill doesn't mean pushing mechanize down. mechanize is amazingly powerful -- but sometimes you want to just go grab some HTML, and twill (tries to) make that easy.

One unexpected result of all this -- I discovered that the function get_browser() wasn't actually exported from twill.commands by default, which is silly. It's the first function I wanted to call in order to do something minimally complex. So now the API is one call bigger ;). One more corner rubbed off...
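For anyone wondering what "exported by default" means here, the sketch below shows how a module-level __all__ list decides which names a star import picks up. It assumes twill.commands controls its exports that way, which this post doesn't spell out; demo_commands is a hypothetical stand-in module built on the fly, not twill's source.

```python
import sys
import types

# Hypothetical stand-in for a commands module; not twill's actual code.
SOURCE = '''
def go(url):
    return 'visited ' + url

def get_browser():
    return 'browser object'
'''


def star_import_names(all_list):
    """Return the names that `from demo_commands import *` binds."""
    module = types.ModuleType("demo_commands")
    exec(SOURCE, module.__dict__)
    module.__all__ = all_list              # the export list under test
    sys.modules["demo_commands"] = module  # make the module importable
    namespace = {}
    exec("from demo_commands import *", namespace)
    # Drop __builtins__, which exec adds to the namespace on its own.
    return sorted(name for name in namespace if not name.startswith("__"))


before = star_import_names(['go'])
after = star_import_names(['go', 'get_browser'])
print(before)
print(after)
```

The function exists in the module either way; it only becomes visible to a star import once its name is added to the list.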

--titus


Legacy Comments

Posted by Diane on 2007-02-13 at 17:30.

$ easy_install twill
...
$ ipython
In: from twill.commands import *
In: import twill
In: go("http://google.com")
In: html = twill.get_browser().get_html()
In: html
Out: ''
In: twill.__version__
Out: '0.8.5'

Somehow I think there's supposed to be some data in that html variable?

Also the version of twill obtained via easy_install doesn't have 'add_extra_header'. Finally why is there twill.commands.get_browser() and twill.get_browser()

Posted by Titus Brown on 2007-02-14 at 03:34.

Now that's just mean, pointing out that I haven't made a release in a while ;)

twill.commands.get_browser implements twill.get_browser. The other functions are all available in the latest version of twill.

--titus
