I am trying to use Python's urlparse (from urllib.parse) to parse some custom URIs.
I noticed that for some well-known schemes params are parsed correctly:
>>> from urllib.parse import urlparse
>>> urlparse("http://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='http', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
>>> urlparse("ftp://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='ftp', netloc='some.domain', path='/some/nested/endpoint', params='param1=value1;param2=othervalue2', query='query1=val1&query2=val2', fragment='fragment')
...but for custom schemes they are not. The params field remains empty, and the params are treated as part of the path instead:
>>> urlparse("scheme://some.domain/some/nested/endpoint;param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment")
ParseResult(scheme='scheme', netloc='some.domain', path='/some/nested/endpoint;param1=value1;param2=othervalue2', params='', query='query1=val1&query2=val2', fragment='fragment')
Why does parsing differ depending on the scheme? How can I get urlparse to parse params for a custom scheme?
This is because urlparse assumes that only a fixed set of schemes use parameters in their URL format. You can see that check in the source code.
Which means urlparse will attempt to parse parameters only if the scheme is in uses_params (a module-level list of known schemes).
So to get the expected output, you can append your custom scheme to the uses_params list and perform the urlparse call again.
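A minimal sketch of that approach, using the URL from the question. Note that uses_params does not appear in the official documentation as far as I know, so treat relying on it as an assumption about the current CPython implementation. Appending to it also changes parsing for the whole process, not just this one call.

```python
from urllib.parse import urlparse, uses_params

# Register the custom scheme from the question ("scheme") as one that
# carries params. uses_params is a plain module-level list, so this
# mutation is global to the running interpreter.
if "scheme" not in uses_params:
    uses_params.append("scheme")

result = urlparse(
    "scheme://some.domain/some/nested/endpoint"
    ";param1=value1;param2=othervalue2?query1=val1&query2=val2#fragment"
)

print(result.path)    # /some/nested/endpoint
print(result.params)  # param1=value1;param2=othervalue2
```

The membership check only keeps the list from growing if this code runs more than once.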