Webscraping multiple sources for similar products; standardization


I'd like to gather some knowledge about webscraping. I am currently building a hobby project: something like "pricerunner", which is a price comparison website. My approach would be to scrape similar products from different sources.

For simplicity, let's say that I want to compare iPhone prices:

Webshop X has the following product: Title: iPhone 14 Pro 128GB Red Price: $1299,99

Webshop Y has the following product: Title: Green 128GB iPhone 14 Pro Price: $1249,99

Webshop Z has the following product: Title: Blue iPhone 14 Pro 128GB Price: $1199,99

After scraping this data, I'd need some way to standardize the above three products into one product:

{
  "title": "iPhone 14 Pro",
  "storage": 128,
  "vendors": {
    "webshop-x": {"link": "webshop-x.com/iphone", "price": 1299},
    "webshop-y": {"link": "webshop-y.com/iphone", "price": 1249},
    "webshop-z": {"link": "webshop-z.com/iphone", "price": 1199}
  }
}

Or something along the lines of the above object. I hope it makes sense.
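To make the idea concrete, here is a rough Python sketch of how I imagine the standardization step. Everything in it is an assumption on my part: the `SCRAPED` records simply mirror the three example listings, the `COLORS` set is a hand-maintained list, and `parse_price`, `normalize` and `group` are hypothetical helper names. It also assumes the comma in a price is the decimal separator.

```python
import re
from collections import defaultdict

# Hypothetical scraped records mirroring the three example listings.
SCRAPED = [
    {"shop": "webshop-x", "title": "iPhone 14 Pro 128GB Red",
     "price": "$1299,99", "link": "webshop-x.com/iphone"},
    {"shop": "webshop-y", "title": "Green 128GB iPhone 14 Pro",
     "price": "$1249,99", "link": "webshop-y.com/iphone"},
    {"shop": "webshop-z", "title": "Blue iPhone 14 Pro 128GB",
     "price": "$1199,99", "link": "webshop-z.com/iphone"},
]

# Assumption: a small, hand-maintained list of colour words to ignore.
COLORS = {"red", "green", "blue"}


def parse_price(text):
    """Convert a string such as '$1299,99' to a float.

    Assumes the comma is the decimal separator; everything except
    digits and commas is dropped first.
    """
    return float(re.sub(r"[^\d,]", "", text).replace(",", "."))


def normalize(title):
    """Return (model name, storage in GB), ignoring colour and word order."""
    storage = None
    words = []
    for token in title.split():
        match = re.fullmatch(r"(\d+)GB", token, flags=re.IGNORECASE)
        if match:
            storage = int(match.group(1))
        elif token.lower() not in COLORS:
            words.append(token)
    return " ".join(words), storage


def group(records):
    """Merge records that normalize to the same (model, storage) key."""
    products = defaultdict(dict)
    for record in records:
        key = normalize(record["title"])
        products[key][record["shop"]] = {
            "link": record["link"],
            "price": parse_price(record["price"]),
        }
    return [
        {"title": title, "storage": storage, "vendors": vendors}
        for (title, storage), vendors in products.items()
    ]


if __name__ == "__main__":
    for product in group(SCRAPED):
        print(product)
```

With these three listings, `group(SCRAPED)` yields a single product with three vendors. The obvious weakness is that every attribute needs its own rule (colours, storage sizes, and so on), which is the part I suspect will not scale to more complex products.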

In short, the objective is to gather data from different sources for similar products and standardize them, so that when a user searches for iPhone 14 Pro on my site, all three are returned.

I reckon that with this exact product, it would be quite easy to just return every product whose title contains the phrase "iPhone 14 Pro 128", in no particular order, but more complex products would exist.
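One caveat I noticed while sketching this in Python: a plain substring check depends on word order, so it would miss the Webshop Y title, where "128GB" comes before "iPhone". Comparing sets of tokens avoids that. The function names `tokens`, `substring_search` and `token_search` below are just illustrative, and `TITLES` holds only the three example titles.

```python
import re

# The three example titles from above.
TITLES = [
    "iPhone 14 Pro 128GB Red",
    "Green 128GB iPhone 14 Pro",
    "Blue iPhone 14 Pro 128GB",
]


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric tokens.

    A storage size such as '128GB' is split into '128' and 'gb' so that
    a query containing just '128' can still match it.
    """
    text = re.sub(r"(\d+)\s*gb\b", r"\1 gb", text.lower())
    return set(re.findall(r"[a-z0-9]+", text))


def substring_search(query, titles):
    """Naive approach: the query must appear verbatim in the title."""
    return [t for t in titles if query.lower() in t.lower()]


def token_search(query, titles):
    """Order-independent approach: every query token must be in the title."""
    wanted = tokens(query)
    return [t for t in titles if wanted <= tokens(t)]


if __name__ == "__main__":
    print(substring_search("iPhone 14 Pro 128", TITLES))  # misses Webshop Y
    print(token_search("iPhone 14 Pro 128", TITLES))      # finds all three
```

The token approach has its own problem: a title like "iPhone 14 Pro Max 128GB" would also match the query, since it contains all four tokens. So some notion of a canonical product still seems necessary.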

What is your take on this? Am I missing something?

Have a nice day!

Best regards
