I want to get the raw data from a password-protected (locked) Pastebin link with Python, but I can't figure out what to do.
Is it impossible to get Pastebin raw data using Python's requests module and the POST method? I tried the code below, but it returns an error.
import requests

url = "https://pastebin.com/URL"
headers = {"User-Agent": "Mozilla/5.0"}  # headers was undefined; any browser-like User-Agent
pass_data = {'PostPasswordVerificationForm[password]': 'password'}
res = requests.post(url, headers=headers, data=pass_data)
text = res.text
print(text)
It returns the error below:
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='pastebin.com', port=443):
Max retries exceeded with url: /URL (Caused by SSLError(SSLCertVerificationError
(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:
self signed certificate in certificate chain (_ssl.c:1123)')))
Can someone please tell me which approach I can use?
Note: Consider using Pastebin's API and Pastebin's scraping API.
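As a hedged sketch of what using the official API looks like, the snippet below creates a paste. The endpoint and parameter names are taken from Pastebin's public API documentation as I remember it, so double-check them against the docs; API_DEV_KEY is a placeholder you must replace with your own key.

```python
import requests

# Endpoint and parameter names per Pastebin's public API docs (verify them);
# API_DEV_KEY is a placeholder -- use your own developer key.
API_URL = "https://pastebin.com/api/api_post.php"
API_DEV_KEY = "YOUR_DEV_KEY"

def create_paste(text, private=2):
    """Create a paste; private: 0 = public, 1 = unlisted, 2 = private."""
    payload = {
        "api_dev_key": API_DEV_KEY,
        "api_option": "paste",           # "paste" = create a new paste
        "api_paste_code": text,          # the paste body
        "api_paste_private": str(private),
    }
    # On success the response body is the URL of the new paste.
    return requests.post(API_URL, data=payload)
```

Reading a private paste back requires a user key obtained by logging in through the API first; again, see the official docs for the exact flow.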
Your certificate verification failed (a proxy/Tor/VPN, a server without a proper cert, or a misconfigured server?). If you still want to proceed, simply pass verify=False as an argument to requests.post().

If you are using a VPN, perhaps you've been provided with a root certificate for your machine; you can point requests at it with verify="path/to/ca-bundle.pem" (note that cert=("path to cert", "path to key") is for client certificates, not CA roots).

If you are using Tor, it's better to skip that circuit and create a new one.

For a proxy it's more complicated: it can be either a cert issue or the proxy being plainly misconfigured/broken. You can verify that no proxy is in use by checking your network settings (OS-specific) and the environment variables the requests package honors: http_proxy, HTTP_PROXY, https_proxy, HTTPS_PROXY, curl_ca_bundle.

Edit: I've just re-checked Pastebin; the RAW text option is only available for unprotected pastes. However, you can get to the HTML version by inspecting the traffic and then re-assembling it in code: simply keep the session, and check the cookies and headers in the network tab. You should get something like this:
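Here is a hedged sketch of that session-based approach. The URL, form field name, and User-Agent are assumptions copied from the question; inspect the real requests in your browser's network tab and substitute the actual values.

```python
import requests

# Placeholder paste URL -- replace with your real locked-paste URL.
PASTE_URL = "https://pastebin.com/URL"

def fetch_protected_paste(url, password, ca_bundle=None):
    """Submit the password form within one session and return the page HTML.

    verify=False silences CERTIFICATE_VERIFY_FAILED but disables TLS checking
    entirely; prefer passing a trusted CA bundle path via ca_bundle instead.
    """
    with requests.Session() as s:
        s.headers["User-Agent"] = "Mozilla/5.0"  # look like a browser
        form = {"PostPasswordVerificationForm[password]": password}
        resp = s.post(url, data=form, verify=ca_bundle if ca_bundle else False)
        resp.raise_for_status()
        return resp.text  # HTML of the unlocked paste page
```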
Afterwards, just check for the tag with RAW in it and then parse it, either with some quick regex (obligatory "it's a stupid idea" disclaimer) or with a less error-prone solution such as BeautifulSoup.

Nevertheless, captchas, IP blacklisting, "clever" CSRF handling, and similar measures will eventually prevent you from scraping like this. Even if they don't, it's just too easy to build an application that dynamically changes its class names, tag names, etc. (say, in Angular) purely to mess with your scraping for the lulz (Google Docs loves this stuff, personal experience). So if you intend to do something serious with it, just use the API.
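For the parsing step, a minimal BeautifulSoup sketch might look like this. The HTML fragment is made up for illustration; the real page's markup may differ, so inspect it in your browser first.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for the unlocked paste page (real markup may differ).
sample_html = """
<div class="paste">
  <a href="/raw/abc123">RAW Paste Data</a>
  <textarea>hello world</textarea>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
# Find the link whose text mentions RAW, instead of regexing the raw HTML.
raw_link = soup.find("a", string=lambda s: s and "RAW" in s)
print(raw_link["href"])  # → /raw/abc123
```

Using find() with a string predicate is what makes this less brittle than a regex over the raw HTML: it survives attribute reordering and whitespace changes.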
Edit2: Minor FAQ for scraping / why to use the API