I'm trying to get a Twitter profile name from a profile URL using BeautifulSoup in Python, but whatever HTML tags I use, I'm not able to get the name. What HTML tags can I use to get the profile name from a Twitter user page?
import requests
from bs4 import BeautifulSoup

url = 'https://twitter.com/twitterID'
html = requests.get(url).text
soup = BeautifulSoup(html, 'html.parser')

# Find the display name
name_element = soup.find('span', {'class': 'css-901oao css-16my406 r-poiln3 r-bcqeeo r-qvutc0'})
if name_element is not None:
    display_name = name_element.text
else:
    display_name = "error"
Twitter profile pages cannot be scraped simply through requests like this, since the contents of the profile pages are loaded with JavaScript [via the API], as you might notice if you previewed the source HTML in your browser's network logs or checked the fetched HTML.

Even after fetching the right HTML, calling .find like that will result in display_name containing 'To view keyboard shortcuts, press question mark' or 'Don’t miss what’s happening', because there are 67 span tags with that class. Calling .find_all(....)[6] might work, but it's definitely not a reliable approach. You should instead consider using .select with CSS selectors to target the name.

There is a .find equivalent, but I find .select much more convenient.

Selenium Example
This uses two functions I often use for scraping: linkToSoup_selenium (which takes a URL and returns a BeautifulSoup object after using selenium and bs4 to load and parse the HTML), and selectForList (which extracts details from bs4 Tags based on selectors [like in the selectors dictionary below]).

Setup:
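The definitions of those two helpers are not reproduced here, so this is only a minimal sketch of what functions like them might look like, not the originals. The names link_to_soup_selenium, select_for_list and wait_for are my own, the sketch has no returnErr option, and the data-testid selectors are an assumption about Twitter's markup that you should check against the live page. The sample HTML is a made-up stand-in so the parsing part runs offline.

```python
from bs4 import BeautifulSoup


def link_to_soup_selenium(url, wait_for, timeout=25):
    """Load url in a real browser and return the rendered page as soup.

    Hypothetical stand-in for the helper named in the text.
    """
    # Imported lazily so the parsing helper below works without selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver
    try:
        driver.get(url)
        for css in wait_for:
            # Block until the JavaScript-rendered element exists.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, css))
            )
        return BeautifulSoup(driver.page_source, "html.parser")
    finally:
        driver.quit()


def select_for_list(soup, selectors):
    """Return {key: text of the first match, or None} per CSS selector."""
    results = {}
    for key, css in selectors.items():
        tag = soup.select_one(css)
        results[key] = tag.get_text(strip=True) if tag else None
    return results


# Assumed selectors; Twitter's markup changes often, so verify them first.
selectors = {
    "name": 'div[data-testid="UserName"] span',
    "bio": 'div[data-testid="UserDescription"]',
}

# Static stand-in for a rendered profile page, so this runs offline.
sample_html = """
<span class="css-901oao">To view keyboard shortcuts, press question mark</span>
<div data-testid="UserName"><span class="css-901oao">Example User</span></div>
<div data-testid="UserDescription">Just an example bio.</div>
"""
details = select_for_list(BeautifulSoup(sample_html, "html.parser"), selectors)
print(details)
```

For a live page you would call something like link_to_soup_selenium(url, wait_for=['div[data-testid="UserName"]']) and pass the result to select_for_list. Note how the attribute selector skips the unrelated span that shares the same class, which is the problem a class-based .find runs into.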
Setting returnErr=True returns the error message (a string instead of the BeautifulSoup object) if anything goes wrong. ecx should be set based on which part/s you want to load (it's a list so it can have multiple selectors). tmout doesn't have to be passed (default is 25 sec), but if it is, it should be adjusted for the other arguments and your own device and browser speeds; on my browser, tmo=0.01 is enough to load user details, but loading the first tweets takes at least tmo=2.

I wrote scrapeTwitterProfile mostly so that I could get tuDets [below] in one line. The for-loop after that is just for printing the results.

snscrape Example
snscrape has a module for Twitter that can be used to access Twitter data without having registered for the API yourself. The example below prints output similar to the previous example, but is faster.
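As a rough sketch of the snscrape approach (not the original example): fetch_profile and format_profile are names I made up, and the attribute names username, displayname and followersCount are what I believe snscrape's user object exposes, so treat them as assumptions and check them against your installed version. The live call is left commented out because it needs network access and a working snscrape install; the stand-in object with made-up values only shows the output format.

```python
from types import SimpleNamespace


def fetch_profile(username):
    """Return the profile object snscrape exposes for a Twitter username."""
    # Imported lazily: snscrape is a third-party package (pip install snscrape).
    import snscrape.modules.twitter as sntwitter

    return sntwitter.TwitterUserScraper(username).entity


def format_profile(entity, fields=("username", "displayname", "followersCount")):
    """Render the chosen attributes as 'name: value' lines."""
    # getattr with a default keeps this working if a field is missing.
    return "\n".join(f"{f}: {getattr(entity, f, None)}" for f in fields)


# Live usage (needs network access and a working snscrape install):
# print(format_profile(fetch_profile("twitterID")))

# Offline stand-in with made-up values, just to show the output format.
fake = SimpleNamespace(username="twitterID", displayname="Example User",
                       followersCount=42)
print(format_profile(fake))
```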
You can get most of the attributes in .entity with .__dict__, which also makes it easy to print them all.

See this example from this tutorial if you are interested in scraping tweets as well.
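To illustrate the .__dict__ point above, here is a small sketch that lists every instance attribute of an object. FakeUser and its values are made up so the snippet runs without snscrape; with the real library you would pass the object returned by .entity instead, assuming it keeps its fields in an instance __dict__.

```python
from dataclasses import dataclass


@dataclass
class FakeUser:
    """Stand-in for the object returned by .entity (fields are made up)."""
    username: str
    displayname: str
    followersCount: int


def dump_attributes(obj):
    """Return 'name: value' lines for every instance attribute of obj."""
    # vars(obj) returns obj.__dict__, a mapping of attribute names to values.
    return [f"{name}: {value!r}" for name, value in vars(obj).items()]


user = FakeUser("twitterID", "Example User", 42)
for line in dump_attributes(user):
    print(line)
```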