Posts

Showing posts from August, 2014

Python Mechanize Cheat Sheet

Mechanize

Create a browser object

Create a browser object and give it some optional settings.

    import mechanize
    br = mechanize.Browser()
    # br.set_all_readonly(False)  # allow everything to be written to; only works once a form is selected
    br.set_handle_robots(False)   # ignore robots.txt
    br.set_handle_refresh(False)  # can sometimes hang without this
    br.addheaders = [('User-agent', 'Firefox')]

Open a webpage

Open a webpage and inspect its contents.

    response = br.open(url)
    print(response.read())      # the text of the page
    response1 = br.response()   # get the response again
    print(response1.read())     # can apply lxml.html.fromstring()

Using forms

List the forms that are in the page.

    for form in br.forms():
        print("Form name:", form.name)
        print(form)

To go on, the mechanize browser object must have a form selected.

    br.select_form("form1")         # works when form has a name
    br.form = list(br.forms())[0]   # use when form is unnamed

Using Contro
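As a minimal sketch of the form-selection step above, the helper below picks a form by name when one is given and otherwise by position. The name `pick_form` is hypothetical and not part of mechanize; it assumes `br` behaves like a `mechanize.Browser`, whose `select_form` accepts `name=` and `nr=` keyword arguments and stores the result in `br.form`.

```python
def pick_form(br, name=None, index=0):
    """Select a form on a mechanize-style browser and return it.

    pick_form is a hypothetical helper for illustration, not a mechanize API.
    """
    if name is not None:
        br.select_form(name=name)   # the form has a name attribute
    else:
        br.select_form(nr=index)    # unnamed form: select by zero-based position
    return br.form
```

With a real browser this would be called as `pick_form(br, name="form1")` or, for an unnamed form, simply `pick_form(br)`.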

Python mechanize For Browsing

How to install Mechanize

We can install mechanize in two ways.

Using pip:

    pip install mechanize

Or download the mechanize distribution, unpack it, and run:

    python setup.py install

Browsing with Mechanize

Here is an example of how to browse a webpage from a Python program:

    import mechanize
    br = mechanize.Browser()
    br.open("http://www.example.com/")
    # follow the second link whose text matches the regular expression
    response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
    assert br.viewing_html()
    print(br.title())
    print(response1.geturl())
    print(response1.info())  # headers
    print(response1.read())  # body

To get the response code from a website, you can use response.code:

    from mechanize import Browser
    browser = Browser()
    response = browser.open('http://www.google.com')
    print(response.code)

Get all forms from a website:

    import mechanize
    br = mechanize.Browser()
    br.open("http://www.google.com/")
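To see which links the text_regex pattern in the follow_link example would pick up, here is a small sketch that uses only the standard re module. It assumes the pattern is applied to each link's text as a regular-expression search, and the sample link texts are made up for illustration.

```python
import re

# Same pattern as in the follow_link example above.
pattern = re.compile(r"cheese\s*shop")

# Hypothetical link texts, in the order they might appear on a page.
link_texts = [
    "cheese shop",
    "cheeseshop",
    "Cheese Shop",              # no match: the pattern is case-sensitive
    "the cheese   shop annex",
    "cheese-shop",              # no match: "-" is not whitespace
]

# Keep only the texts where the pattern is found somewhere in the string.
matches = [text for text in link_texts if pattern.search(text)]

# nr is zero-based, so nr=1 corresponds to the second matching link.
second_match = matches[1]
```

Note that nr counts only the links that match, not all links on the page.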