Thu, 29 Nov 2007

Naive Lazyweb Question: Programmatic Form Editing


Dear lazyweb,

I would like to be able to write Python code like the following:

page_obj = parse(html_code)

form_obj = page_obj.forms[0]

form_obj.my_name = 'bob'
form_obj.my_select.choose('option2')

new_html_code = page_obj.emit_html()

That is, we have an existing HTML form, and I would like to be able to parse it, programmatically update the form settings, and then emit it again as HTML with the same style attributes, etc., but with updated form data.

(I'm horrendously out of date with HTML libraries in Python, so I thought I'd just ask... ;)

thanks,

--titus

posted at: 14:58 | path: /nov-07 | 7 comments

Tags:


Comments:

Posted by Jacob Kaplan-Moss at Thu Nov 29 15:51:38 2007:
I haven't looked too deeply into it, but it looks like htmlfill (http://formencode.org/htmlfill.html) will do what you want.

Posted by A Different Jacob at Thu Nov 29 20:00:55 2007:
htmlfill addresses this specific problem very directly, but I believe it uses the standard library's HTMLParser module, which is not up to the task of dealing with much real-world html.  I suggest lxml.etree.HTML, optionally in conjunction with lxml.objectify.

Posted by Ian Bicking at Thu Nov 29 21:23:03 2007:
Yes, htmlfill does exactly that.  I also implemented that in lxml.html: http://codespeak.net/svn/lxml/trunk/doc/lxmlhtml.txt (unfortunately the generated docs don't seem to be up to date, and so don't include information on forms).  With that you can do page_obj.forms[0].fields = request_dict

Posted by Max Ischenko at Fri Nov 30 00:41:26 2007:
I think Beautiful Soup does this, see http://www.crummy.com/software/BeautifulSoup/documentation.html.

Posted by Martijn Faassen at Fri Nov 30 08:56:13 2007:
Mechanize can do this. zope.testbrowser builds on top of this to integrate it into doctests for testing purposes.

http://pypi.python.org/pypi/mechanize

http://pypi.python.org/pypi/zope.testbrowser

Posted by Titus Brown at Fri Nov 30 11:12:47 2007:
Martijn, I'm pretty sure mechanize can't do this; ClientForm doesn't re-emit the HTML AFAIK.

BeautifulSoup, hmm, for some reason didn't think it could either.  Will have to check.

htmlfill looks perfect.

thanks!

--titus

Posted by Dan at Fri Nov 30 12:35:52 2007:
Beautiful Soup will certainly do it.e.g:

from BeautifulSoup import BeautifulSoup
soup=BeautifulSoup(originalhtml)
soup.find('input',attrs={'name':'somename')['value']="New Value"
newhtml=soup.renderContents()

Post a new comment:

Name:


E-mail:


URL:


Comment:


Note that comments must be manually approved; e-mail titus@idyll.org if your comment doesn't show up quickly.