API POST Request works with Postman but not with Scrapy


I am making a POST request to an API. It works fine in Postman but not in Scrapy, where the response comes back with a 400 status. This is what Scrapy logs:

2024-03-19 03:02:08 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <400 https://somesite.com/api/search> Set-Cookie: sUniqueID=20240319100208-50.106.12.146-dj56d4dide; expires=Sun, 19-Mar-2034 10:02:08 GMT; path=/; secure; HttpOnly

2024-03-19 03:02:08 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://somesite.com/api/search> (referer: None)
2024-03-19 03:02:08 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://somesite.com/api/search>: HTTP status code is not handled or not allowed
2024-03-19 03:02:08 [scrapy.core.engine] INFO: Closing spider (finished)
2024-03-19 03:02:08 [scrapy.statscollectors]

This is my Scrapy code:

from scrapy.loader import ItemLoader
from scrapy.http import FormRequest
import scrapy
from some_scraper.items import SomeItem
from scrapy_playwright.page import PageMethod

class SomeScraperSpider(scrapy.Spider):
    name = 'someScraper'

    def start_requests(self):
        url = 'https://somesite.com/api/search'

        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)',
            'Content-Type': 'application/json',
            'Host': 'somesite.com',
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Connection': 'keep-alive'
        }

        frmdata = {
            "token": "eg7t4q6p6pdv59m1cn22e58vmphiv2",
            "cols": ["colmX", "colmY"],
            "max": "80"
        }

        yield scrapy.FormRequest(url=url,
                                 callback=self.parse_categories,
                                 headers=headers,
                                 formdata=frmdata,
                                 meta={'playwright': True})
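
One thing that may be worth ruling out in the spider above: Scrapy's FormRequest encodes formdata as application/x-www-form-urlencoded, so the body will not be JSON even though the Content-Type header says application/json. The sketch below uses only the standard library to show how the two encodings of the same payload differ; it is an illustration of the mismatch, not a confirmed diagnosis of the 400. The variable names are made up for the example.

```python
import json
from urllib.parse import urlencode

# Same payload as in the spider and in the Postman body.
payload = {
    "token": "eg7t4q6p6pdv59m1cn22e58vmphiv2",
    "cols": ["colmX", "colmY"],
    "max": "80",
}

# Form encoding: key=value pairs joined by "&"; doseq=True repeats the
# key once per list element, which is how form-style bodies carry lists.
form_body = urlencode(payload, doseq=True)

# JSON encoding: what Postman sends when the body is raw JSON.
json_body = json.dumps(payload)

print("form:", form_body)
print("json:", json_body)
```

If the API only accepts JSON, a form-encoded body could plausibly be rejected with a 400. Within Scrapy, scrapy.http.JsonRequest (with its data argument) or a plain scrapy.Request with method='POST' and body=json.dumps(...) would send the JSON form instead.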

settings.py:

BOT_NAME = "some_scraper"
USER_AGENT= 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'

SPIDER_MODULES = ["some_scraper.spiders"]
NEWSPIDER_MODULE = "some_scraper.spiders"


REQUEST_FINGERPRINTER_IMPLEMENTATION = "2.7"
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
FEED_EXPORT_ENCODING = "utf-8"

ROBOTSTXT_OBEY = False
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"
}
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'

main.py:

from some_scraper.spiders.src import SomeScraperSpider
from scrapy.crawler import CrawlerProcess

def main():
    process = CrawlerProcess(settings={
        'COOKIES_DEBUG': 'True',
        'COOKIES_ENABLED': 'True',
        'PLAYWRIGHT_LAUNCH_OPTIONS':{
            'headless':'False',
            'timeout':2000*1000
        },
        'PLAYWRIGHT_BROWSER_TYPE':'firefox',
        'PLAYWRIGHT_SETTINGS':{
            'acceptsCookies': 'True',
        }
    })
    process.crawl(SomeScraperSpider)
    process.start()

if __name__ == '__main__':
    main()

In Postman I make this POST request:

POST https://somesite.com/api/search

In the Postman body, I put in:

{"token": "eg7t4q6p6pdv59m1cn22e58vmphiv2",
 "cols": ["colmX", "colmY"],
 "max": "80"}

The Headers in Postman show:

Authorization: Bearer eg7t4q6p6pdv59m1cn22e58vmphiv2
Cookie: __RequestVerificationToken=UzaS9tX-IM3KTLlNTsQw9nklSnxplY4ehkAKnIjZw5aJ2BjEZ8oGB7bi6IKQOjdxf-izYKUq2_-g-JxK1_QjuvhOFDuBELzSwlyfol_UcBg1; fb_SessionId=66cuv1enskencmluva229q5ohnpj43; sUniqueID=20240208013723-50.106.12.146-m5mf01hk0k
Postman-Token: <calculated when request is sent>
Content-Type: application/json
Content-Length: <calculated when request is sent>
Host: <calculated when request is sent>
User-Agent: PostmanRuntime/7.36.3
Accept: */*
Accept-Encoding: gzip, deflate, br
Connection: keep-alive

In both Postman and Scrapy the token is added to the body before the request is sent, not to the headers. But even when I add the token and the cookie to the Scrapy headers, I receive the same result.
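
Since the Postman header list above does include an Authorization: Bearer line, it may help to reproduce the request outside Scrapy and toggle that header on and off. The following is a minimal sketch using only the standard library; build_request and with_auth_header are hypothetical names for this example, and the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

url = "https://somesite.com/api/search"
token = "eg7t4q6p6pdv59m1cn22e58vmphiv2"
payload = {"token": token, "cols": ["colmX", "colmY"], "max": "80"}


def build_request(with_auth_header: bool) -> Request:
    """Build a POST with a JSON body, optionally adding the Bearer header."""
    headers = {"Content-Type": "application/json", "Accept": "*/*"}
    if with_auth_header:
        # Mirrors the Authorization line shown in the Postman header list.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


req = build_request(with_auth_header=True)
print(req.get_method(), req.full_url)
print(req.header_items())
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send it. Comparing the responses with and without the header (and with and without the cookies) should show which parts of the Postman request the server actually requires.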
