I'm new to Scrapy, but I've been able to create a few spiders so far. I would like to write a spider that crawls Yellow Pages looking for websites that return a 404 response. The spider itself works, but the pagination does not. Any help would be much appreciated. Thanks in advance.
# -*- coding: utf-8 -*-
import scrapy


class SpiderSpider(scrapy.Spider):
    name = 'spider'
    # allowed_domains = ['www.yellowpages.com']
    start_urls = ['https://www.yellowpages.com/search?search_terms=handyman&geo_location_terms=Miami%2C+FL']

    def parse(self, response):
        for listing in response.css('div.search-results.organic div.srp-listing'):
            url = listing.css('a.track-visit-website::attr(href)').extract_first()
            yield scrapy.Request(url=url, callback=self.parse_details)

        # follow pagination links
        next_page_url = response.css('a.next.ajax-page::attr(href)').extract_first()
        next_page_url = response.urljoin(next_page_url)
        if next_page_url:
            yield scrapy.Request(url=next_page_url, callback=self.parse)

    def parse_details(self, response):
        yield {'Response': response}
I ran your code and found some errors. In the first loop, you don't check the value of url, and sometimes it is None. Passing None to scrapy.Request raises an exception that stops the crawl before the pagination request is ever yielded; that's why you thought the pagination didn't work. Here is a working version: