Crawl Quora Q&As using BeautifulSoup

Question

388 views Asked by z3y50n At 04 August 2020 at 16:04

My code for crawling a Quora question is the following:

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://www.quora.com/What-is-the-best-workout-1"

page = requests.get(URL)

soup = BeautifulSoup(page.text, "html.parser")

print(soup.find_all("span", {"class": "q-box qu-userSelect--text"}))

The outcome is an empty list.

The problem is that page.text doesn't contain the same source code as the one I see when I use Inspect Element on Quora.

Instead, it contains the following text, which doesn't include any <span> elements:

Here is the code I get when using Inspect Element:

There is 1 answer

**UWTD TV** · Answer 1 · 2020-08-04T18:25:21+00:00

Try:

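from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service
import time

# The geckodriver path is passed through a Service object (Selenium 4)
driver = webdriver.Firefox(service=Service(executable_path='c:/program/geckodriver.exe'))

URL = "https://www.quora.com/What-is-the-best-workout-1"
driver.get(URL)

PAUSE_TIME = 2  # seconds to wait for new content after each scroll

# Keep scrolling to the bottom until the page height stops growing
lh = driver.execute_script("return document.body.scrollHeight")

while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(PAUSE_TIME)
    nh = driver.execute_script("return document.body.scrollHeight")
    if nh == lh:
        break
    lh = nh

# Select the spans that carry both classes (Selenium 4 locator API)
spans = driver.find_elements(By.CSS_SELECTOR, 'span.q-box.qu-userSelect--text')
for span in spans:
    print(span.text)
    print('-' * 80)

driver.quit()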

prints:

What is the best workout?
--------------------------------------------------------------------------------
The best workout is the one you don't skip.
Look, you can discuss sets and reps, crossfit and powerlifting, diet and supplements endlessly. And there is some value in it, if only just for entertainment sometimes (especially on the internet). But let's just get one thing straight here - if you are doing any kind of workout then it's going to have a greater impact than if you weren't. Simple as that.
Of course there are caveats. You don't want to get hurt, so they can pretty much all be summed up into one commandment: Thou shalt not be an idiot. Getting under a bar loaded with 495 lbs and squattin
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
What are some at-home workouts?
--------------------------------------------------------------------------------
Gyms are closed here because of the Coronavirus. What are your top 3 bodyweight exercises for building muscle?
--------------------------------------------------------------------------------
What is the best body weight workout routine?
--------------------------------------------------------------------------------

and so on...

I'm not sure that q-box qu-userSelect--text is what you actually want, but it is what you asked for.

Note on Selenium: you need both the selenium package and geckodriver installed. In this code, geckodriver is loaded from c:/program/geckodriver.exe.
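
If you want to check whether the empty list comes from the selector or from the page not being rendered yet, here is a rough sketch that uses only the standard library. It collects the text of every <span> that carries both classes from whatever HTML string you give it. The names SpanTextCollector and span_texts are made up for this example, and the sample HTML strings in the test data are invented, not real Quora markup.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text of <span> elements that carry all required classes."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.depth = 0      # span nesting depth inside a matching span
        self.buffer = []    # text pieces of the span being collected
        self.texts = []     # finished span texts

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # Nested span inside a match: its text is merged into the outer one
            self.depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def span_texts(html, required_classes=("q-box", "qu-userSelect--text")):
    """Return the text of each span that has every class in required_classes."""
    parser = SpanTextCollector(required_classes)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Calling span_texts(page.text) on the requests response from the question should presumably give an empty list, while span_texts(driver.page_source) in the Selenium session above should return roughly the same texts that the loop prints. If you prefer to stay with BeautifulSoup, passing driver.page_source to BeautifulSoup(..., "html.parser") instead of page.text is the same idea.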
