How to handle a privacy information popup? I am using Python and the pyppeteer library for web scraping


I have a working scraper, but I have trouble closing the popup. The popup only appears in certain cases, so I need to handle it when it does.

I have tried finding the button by its attributes and clicking "Accept All".

The commented-out lines marked with ** in the code (bold in the original post) are what I have tried.

import asyncio
from pyppeteer import launch
import time
from datetime import datetime, timedelta
import pandas as pd


async def filter_by_url(url):

    browser = await launch(
        {
            "headless": False,
            'args':['--start-maximized'],
            # 'executablePath':'/usr/bin/google-chrome'
        }
    )
    # url = "https://www.justwatch.com/us/provider/netflix?sort_by=trending_7_day"
    page = await browser.newPage()
    await page.setViewport({'width': 1920, 'height': 1080})
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3')
    await page.goto(url)   
    ## Scroll To Bottom
    # ** begin: what I have tried **
    # time.sleep(5)

    # await page.waitFor('footer span[data-icon="Accept all"]')
    # await page.click('button:has-text("Accept all")')
    # await Page.locator('uc-accept-all-button').first().click()
    # ** end: what I have tried **
    while True:
        
        await page.evaluate("""{window.scrollBy(0, document.body.scrollHeight);}""")
        await asyncio.sleep(2)  # non-blocking pause; time.sleep would block the event loop
        end_point = await page.querySelector(".timeline__end-of-timeline")
        if end_point:
            print("reached to end points")
            break      

    

# Run the function
urls = [
    'https://www.justwatch.com/ca/provider/netflix?sort_by=trending_7_day'
]
for url in urls:
    asyncio.get_event_loop().run_until_complete(filter_by_url(url))

There is 1 answer

Yaroslavm (best answer)

Your button is inside a shadow root. To reach the shadow root's internal structure, get its host element first and then read the host's shadowRoot property.

The shadow host has the selector #usercentrics-root. Wait for the host's content to load, then click the button inside it. If the content has not rendered yet, retry after a timeout.

After that, it is good practice to wait for the host to be hidden.

More about Shadow DOM

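    # Poll inside the page until the consent button exists in the shadow root, then click it.
    await page.evaluate("""function acceptConsent() {
        const host = document.querySelector('#usercentrics-root');
        const accept = host && host.shadowRoot
            && host.shadowRoot.querySelector('[data-testid=uc-accept-all-button]');
        if (accept) {
            accept.click();
            return;
        }
        // Not rendered yet: try again in 500 ms.
        setTimeout(acceptConsent, 500);
    }""")
    # Wait until the shadow host is hidden ('hidden': True); 'visible': False only waits for it to exist.
    await page.waitForSelector('#usercentrics-root', options={'hidden': True})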
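
Since the popup only shows up in some cases, it may help to do the retrying on the Python side with a deadline, so the scraper simply moves on when no banner appears. The following is a minimal sketch of that idea, not a tested drop-in: the helper name accept_consent_if_present and its timeout and poll_interval parameters are hypothetical, and the selectors are the ones given above, which may change if the site updates its consent banner.

```python
import asyncio

# JavaScript evaluated in the page. It returns true only if the
# "Accept All" button was found inside the shadow root and clicked.
CLICK_ACCEPT_JS = """() => {
    const host = document.querySelector('#usercentrics-root');
    const root = host && host.shadowRoot;
    const button = root && root.querySelector('[data-testid=uc-accept-all-button]');
    if (!button) { return false; }
    button.click();
    return true;
}"""


async def accept_consent_if_present(page, timeout=10.0, poll_interval=0.5):
    """Try to click "Accept All"; return False if no banner shows up in time.

    `page` is assumed to be a pyppeteer Page (anything with an async
    `evaluate` method works). The name and parameters are illustrative.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await page.evaluate(CLICK_ACCEPT_JS):
            return True  # button clicked
        if loop.time() >= deadline:
            return False  # banner never appeared; carry on scraping
        await asyncio.sleep(poll_interval)
```

In the question's code this would be called right after `await page.goto(url)`, for example `await accept_consent_if_present(page)`, before the scroll loop starts.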