Why is the Scrapy image pipeline not downloading images?

Asked by Raisul Islam on 06 September 2022 at 18:31 (156 views)

I am trying to download all the images from a product gallery. I have tried the script below, but the images are not downloaded. I managed to download the main image, which has an id attribute, but the other gallery images have no id and I could not download them.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BasicSpider(CrawlSpider):
    name = 'basic'
    allowed_domains = ['www.leebmann24.de']
    start_urls = ['https://www.leebmann24.de/bmw.html']

    rules = (
        Rule(LinkExtractor(restrict_xpaths="//div[@class='category-products']/ul/li/h2/a"), callback='parse_item'),
        Rule(LinkExtractor(restrict_xpaths="//li[@class='next']/a"), callback='parse_item', follow=True),
    )

    def parse_item(self, response):

        yield {
            'URL': response.url,
            'Price': response.xpath("normalize-space(//span[@class='price']/text())").get(),
            'image_urls': response.xpath("//div[@class='item']/a/img/@src").getall()
        }

There are 2 answers

gangabass On 06 September 2022 at 22:26

This XPath expression selects all product images except the main one (which you said you already have):

'//div[@id="itemslider-zoom"]//a/@href'

Md. Fazlul Hoque (accepted answer) On 06 September 2022 at 20:10

@Raisul Islam, the XPath '//*[@id="image-main"]/@src' returns the image URL without any issues. Please check the output below to see whether it is what you expected.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BasicSpider(CrawlSpider):
    name = 'basic'
    allowed_domains = ['www.leebmann24.de']
    start_urls = ['https://www.leebmann24.de/bmw.html']

    rules = (
        Rule(LinkExtractor(restrict_xpaths="//div[@class='category-products']/ul/li/h2/a"), callback='parse_item'),
        Rule(LinkExtractor(restrict_xpaths="//li[@class='next']/a"), callback='parse_item', follow=True),
    )

    def parse_item(self, response):

        yield {
            'URL': response.url,
            'Price': response.xpath("normalize-space(//span[@class='price']/text())").get(),
            'image_urls': response.xpath('//*[@id="image-main"]/@src').get()
        }

Output:

{'URL': 'https://www.leebmann24.de/aruma-antirutschmatte-3er-f30-f31.html', 'Price': '57,29\xa0€', 'image_urls': 'https://www.leebmann24.de/media/catalog/product/cache/1/image/363x/040ec09b1e35df139433887a97daa66f/a/r/aruma-antirutschmatte-94452302924-1.jpg'}
2022-09-07 02:35:56 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.leebmann24.de/bmw-erste-hilfe-set-klarsichtbeutel-51477158344.html> (referer: https://www.leebmann24.de/bmw.html?p=2)
2022-09-07 02:35:56 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.leebmann24.de/bmw-erste-hilfe-set-klarsichtbeutel-51477158344.html>
{'URL': 'https://www.leebmann24.de/bmw-erste-hilfe-set-klarsichtbeutel-51477158344.html', 'Price': '15,64\xa0€', 'image_urls': 'https://www.leebmann24.de/media/catalog/product/cache/1/image/363x/040ec09b1e35df139433887a97daa66f/b/m/bmw-erste-hilfe-klarsichtbeutel-51477158433.jpg'}
2022-09-07 02:35:56 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.leebmann24.de/erste-hilfe-set.html> (failed 1 times): 503 Service Unavailable
2022-09-07 02:35:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.leebmann24.de/aruma-antirutschmatte-x5-f15.html> (referer: https://www.leebmann24.de/bmw.html)
2022-09-07 02:35:57 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.leebmann24.de/aruma-antirutschmatte-x5-f15.html>
{'URL': 'https://www.leebmann24.de/aruma-antirutschmatte-x5-f15.html', 'Price': '71,66\xa0€', 'image_urls': 'https://www.leebmann24.de/media/catalog/product/cache/1/image/363x/040ec09b1e35df139433887a97daa66f/a/r/aruma-antirutschmatte-94452347734-1.jpg'}
