How to parse the response URL without actually opening the webpage in Python?


I am now working on a Sina Weibo crawler using its API. In order to use the API, I have to access the OAuth2 authorization page to retrieve the code from the URL.

This is exactly what I do:

  1. Use my app_key and app_secret (both known).

  2. Get the URL of the OAuth2 authorization page.

  3. Copy and paste the code from the response URL manually.

This is my code:

#call official SDK (sinaweibopy)
from weibo import APIClient
import webbrowser

client = APIClient(app_key=APP_KEY, app_secret=APP_SECRET, redirect_uri=CALLBACK_URL)

#get the URL of the authorization page
url = client.get_authorize_url()
print url

#open the authorization page in the browser
webbrowser.open_new(url)

#after the page redirects, parse the code part of the URL manually
print "paste the string after 'code=' in the URL:"
code = raw_input()

My question is: how exactly do I get rid of the manual parsing part?

Reference: http://blog.csdn.net/liuxuejiang158blog/article/details/30042493


There is 1 answer


To get the contents of a page using requests, you can do it like this:

import requests

url = "http://example.com"

#fetch the page and print its body
r = requests.get(url)
print r.text

You can see the details of the requests library here. You can use pip to install it into your virtualenv / Python distribution.
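If what you actually need is the code parameter from the redirect URL rather than the page body, the standard-library urlparse module (Python 2) can split the query string for you. A minimal sketch, assuming you have the full callback URL as a string (the example URL below is hypothetical):

from urlparse import urlparse, parse_qs

#hypothetical redirect URL returned after authorization
redirect_url = "http://your-callback.example.com/?code=abc123"

#split the URL and pull the 'code' query parameter out of it
query = urlparse(redirect_url).query
code = parse_qs(query)["code"][0]
print code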

For writing a crawler, you can also use Scrapy.

And finally, I did not understand one thing: if you have an official client, why do you need to parse the contents of a URL to get data? Doesn't the client give you the data through some nice, easy-to-use functions?
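For what it's worth, with the sinaweibopy client from the question, the usual pattern (as far as I know, so treat this as a sketch rather than the definitive API) is to exchange the pasted code for an access token and then call the API through the client:

from weibo import APIClient

#same client object as in the question; APP_KEY, APP_SECRET, CALLBACK_URL as defined there
client = APIClient(app_key=APP_KEY, app_secret=APP_SECRET, redirect_uri=CALLBACK_URL)

#exchange the authorization code for an access token (sketch; check the SDK docs)
r = client.request_access_token(code)
client.set_access_token(r.access_token, r.expires_in)

#after that the client can fetch data directly; the exact call syntax may differ between SDK versions
statuses = client.statuses.user_timeline.get(count=5)
print statuses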