I'm crawling a website with requests, and although response.status_code returns 200, there's no content in response.text or response.content.
The same code works fine on another site, and it also works in my local Jupyter environment, but for some reason I can't get past the firewall for the URL below in Colab.
Could you give me some advice?
problem url: https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1
import requests
from bs4 import BeautifulSoup as bs

url = 'https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1'
headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Whale/3.25.232.19 Safari/537.36'}
# note: data= sends a request body, which is unusual for a GET request
response = requests.get(url, headers=headers, data={'buscar':100000})
soup = bs(response.content, "html.parser")
soup
<br/>
<br/>
<center>
<h2>
The request / response that are contrary to the Web firewall security policies have been blocked.
</h2>
<table>
<tr>
<td>Detect time</td>
<td>2024-03-12 21:52:05</td>
</tr>
<tr>
<td>Detect client IP</td>
<td>35.236.245.49</td>
</tr>
<tr>
<td>Detect URL</td>
<td>https://gall.dcinside.com/board/view/</td>
</tr>
</table>
</center>
<br/>
I tried changing the User-Agent, switching https to http, and the other suggestions from similar questions, but nothing works.
If you're facing issues making HTTP requests with the requests module in Google Colab, there could be a few reasons for this behavior:
1. Firewall or network restrictions: Sometimes network or firewall restrictions prevent the notebook from reaching external resources. If you are behind a proxy or firewall, you may need to configure proxy settings in your notebook.
Use the following snippet to set proxy settings in your notebook:
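requests honors the standard HTTP_PROXY/HTTPS_PROXY environment variables, so setting them is enough; the proxy address below is a placeholder, not a real server:

```python
import os

# Placeholder proxy address -- replace with your actual proxy host and port
proxy = "http://proxy.example.com:8080"

# requests picks these environment variables up automatically on every call
os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy
```

You can also pass `proxies={"http": proxy, "https": proxy}` directly to `requests.get` if you prefer not to touch the environment.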
2. Blocked sites: If the website you are trying to access blocks the Colab environment, you won't be able to make requests to it. Note that the "Detect client IP" in your firewall page (35.236.245.49) is a Google Cloud address; many sites block datacenter IP ranges on the server side, in which case no client-side change will help.
Also, try adding all the headers a real browser sends, to reduce the chance of being blocked. Here is a revised version of the code:
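A sketch of the revised request with a fuller set of browser-like headers; the Accept, Accept-Language, and Referer values are examples, and this may still not satisfy the site's firewall if it blocks Colab's IP range:

```python
import requests
from bs4 import BeautifulSoup as bs

url = 'https://gall.dcinside.com/board/view/?id=piano&no=1&exception_mode=notice&page=1'

# Browser-like headers; values are illustrative, not guaranteed to bypass the WAF
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
    'Referer': 'https://gall.dcinside.com/board/lists/?id=piano',
    'Connection': 'keep-alive',
}

def fetch(target_url):
    """GET the page with browser-like headers and return the parsed soup."""
    response = requests.get(target_url, headers=headers, timeout=10)
    response.raise_for_status()  # fail loudly instead of silently parsing a block page
    return bs(response.content, 'html.parser')
```

Call `fetch(url)` and inspect the returned soup as before; note that the plain GET no longer sends a `data=` body, since a GET request with a body is unusual and can itself trip a web application firewall.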