Validate urls with anchor hashbangs in python

291 views Asked by At

I'd like to check if a url with section anchors using the # operator (example: http://url_link.html#section) is valid in Python.

I'm using urlopen to check as below:

from urllib.request import urlopen

def is_valid_url(url):
    try:
        r = urlopen(url)
        return r.status == 200
    catch Exception e:
        return False

But calling is_valid_url with invalid section anchors doesn't return False

is_valid_url("http://valid_url_link.html#valid_section") # True
is_valid_url("http://valid_url_link.html#invalid_section") # Also True!

Is there a way to detect that http://valid_url_link.html#invalid_section is not a valid url in Python?

1

There are 1 answers

4
user459872 On

You can use urlparse function to check if the url has fragment attribute equal to 'invalid_section'.

>>> import urllib.parse
>>>
>>> result = urllib.parse.urlparse("http://valid_url_link.html#invalid_section")
>>> result.fragment == "invalid_section"
True
>>> result = urllib.parse.urlparse("http://valid_url_link.html")
>>> result.fragment == "invalid_section"
False

So you can modify your function to something like this.

from urllib.request import urlopen
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    try:
        return (urlparse(url).fragment != "invalid_section") and (
            urlopen(url).status == 200
        )
    except Exception:
        return False