I am trying to write a scraper with Scrapy for Python. At this point, I am trying to get the title of the webpage and all the outbound links within the page. The output should be a dictionary like this:
        {'link': [u'Link1'], 'title': [u'Page title']}
I have created this code:
```python
from scrapy.spider import Spider
from scrapy import Selector
from socialmedia.items import SocialMediaItem

class MySpider(Spider):
    name = 'smm'
    allowed_domains = ['*']
    start_urls = ['http://en.wikipedia.org/wiki/Social_media']

    def parse(self, response):
        items = []
        for link in response.xpath("//a"):
            item = SocialMediaItem()
            item['title'] = link.xpath('text()').extract()
            item['link'] = link.xpath('@href').extract()
            items.append(item)
            yield items
```
Could anyone help me get this result? I adapted the code from this page, http://mherman.org/blog/2012/11/05/scraping-web-pages-with-scrapy/, and updated it to avoid the deprecated functions. Thank you so much!
Dani
                        
If I understand correctly, you want to iterate over all of the links and extract each link's URL and text.
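Get all `a` tags via the `//a` XPath, extract `text()` and `@href`, and yield each item as soon as it is built instead of yielding the whole `items` list on every iteration. This produces one item per link, each shaped like the dictionary above.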
Also, note that Scrapy has built-in Link Extractors: objects whose only purpose is to extract links from web pages (`scrapy.http.Response` objects).