urlparse doesn't return params for custom scheme

Asked by Konrad Sikorski on 18 October 2022 at 12:10

I am trying to use Python's urlparse (from urllib.parse) to parse some custom URIs.

I noticed that for some well-known schemes params are parsed correctly:

>>> from urllib.parse import urlparse
>>> urlparse("http://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='http', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
>>> urlparse("ftp://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='ftp', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')

...but for custom schemes they are not. The params field remains empty, and the parameters are treated as part of the path:

>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint;param1=value1;param2=othervalue2', params='', query='query1=val1&query2=val2', fragment='fragment')

Why does parsing differ depending on the scheme? How can I get urlparse to parse params for a custom scheme?

There are 2 answers

Uriel Alves On 18 October 2022 at 12:29

Can you remove the custom scheme from the URL? With the scheme removed, urlparse will always parse the params:

>>> urlparse("//some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
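
If the scheme still matters afterwards, one way to apply this idea is to strip it, parse the rest as a scheme-relative reference, and put the scheme back on the result. The following is only a minimal sketch: `parse_without_scheme` is a hypothetical helper name, and it assumes the URL has the form `scheme://...` (it does not handle URLs without `//` after the scheme).

```python
from urllib.parse import urlparse


def parse_without_scheme(url):
    """Hypothetical helper: parse `url` with its scheme removed, then restore it."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        # No "scheme://" prefix found; fall back to a plain urlparse call.
        return urlparse(url)
    # An empty scheme is in uses_params, so params are split off the path.
    parsed = urlparse("//" + rest)
    # ParseResult is a namedtuple, so _replace returns an updated copy.
    return parsed._replace(scheme=scheme.lower())


result = parse_without_scheme(
    "scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2"
    "?query1=val1&query2=val2#fragment"
)
print(result.scheme, result.path, result.params)
```

This leaves urllib.parse itself untouched, at the cost of handling the scheme by hand.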

**user459872** · Accepted Answer · 2022-10-18T12:34:55+00:00

This is because urlparse assumes that only a fixed set of schemes uses parameters in their URL format. You can see that check in the source code:

if scheme in uses_params and ';' in url:
    url, params = _splitparams(url)
else:
    params = ''

This means urlparse attempts to parse parameters only if the scheme is in uses_params, which is a list of known schemes:

uses_params = ['', 'ftp', 'hdl', 'prospero', 'http', 'imap',
               'https', 'shttp', 'rtsp', 'rtspu', 'sip', 'sips',
               'mms', 'sftp', 'tel']
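
As a quick check, you can test whether a given scheme is in that list before parsing. This is just an illustrative snippet; the variable names are made up for the example, and 'scheme' stands in for whatever custom scheme you use.

```python
from urllib.parse import uses_params

# 'http' is one of the schemes listed in uses_params.
known = "http" in uses_params
# A custom scheme is not listed unless something has added it.
custom = "scheme" in uses_params

print(known, custom)
```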

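So to get the expected output, you can append your custom scheme to the uses_params list and call urlparse again. Note that uses_params is a module-level list, so this change affects every later urlparse call in the same process.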

>>> from urllib.parse import uses_params, urlparse
>>>
>>> uses_params.append('scheme')
>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
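
If you would rather not leave the custom scheme registered permanently, one option is to add it only for the duration of a block and remove it afterwards. This is a sketch, not part of urllib.parse: `params_enabled` is a hypothetical name, and because it still mutates the shared uses_params list it is not safe to use from several threads at once.

```python
from contextlib import contextmanager
from urllib.parse import urlparse, uses_params


@contextmanager
def params_enabled(scheme):
    """Hypothetical helper: temporarily register `scheme` in uses_params."""
    added = scheme not in uses_params
    if added:
        uses_params.append(scheme)
    try:
        yield
    finally:
        # Only undo the registration if this call was the one that made it.
        if added:
            uses_params.remove(scheme)


url = (
    "scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2"
    "?query1=val1&query2=val2#fragment"
)

with params_enabled("scheme"):
    inside = urlparse(url)   # params are split off the path here
outside = urlparse(url)      # the list is restored, so params stay in the path

print(inside.params)
print(outside.path)
```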
