Building URLs in Python

Janne Kemppainen

Building URLs is a common task in applications and APIs because most applications tend to be pretty interconnected. But how should we do it in Python? Here's my take on the subject.

Different codebases might have different requirements such as:

  • no unnecessary dependencies
  • clean code
  • quick and dirty
  • etc.

Let's see how the different options compare.

The standard way

Python has a built-in module made specifically for parsing URLs, called urllib.parse.

You can use the urllib.parse.urlsplit function to break a URL string into a five-item named tuple. The items are parsed from a URL with the following general structure:

scheme://netloc/path?query#fragment

The opposite of breaking a URL into parts is building one from its parts, which is what the urllib.parse.urlunsplit function does.
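To make the round trip concrete, here is a small sketch. The URL is only an illustration, and the "#details" fragment is something I made up so that all five fields have a value.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative URL; the "#details" fragment is invented for this example.
url = "https://example.com/api/v1/book/12?format=mp3#details"

parts = urlsplit(url)
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /api/v1/book/12
print(parts.query)     # format=mp3
print(parts.fragment)  # details

# Feeding the five parts back to urlunsplit rebuilds the original string.
rebuilt = urlunsplit(parts)

# The result is a named tuple, so _replace() returns a modified copy.
http_url = urlunsplit(parts._replace(scheme="http"))
print(http_url)  # http://example.com/api/v1/book/12?format=mp3#details
```

Because the result is a plain named tuple you can also index it or unpack it like any other tuple.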

If you check the library documentation you'll notice that there is also a urlparse function. The difference between it and the urlsplit function is an additional item in the parse result for path parameters.

https://www.example.com/some/path;parameter=12?q=query

Path parameters are separated with a semicolon from the path and located before the query arguments that start with a question mark. Most of the time you don't need them but it is good to know that they exist.
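A quick sketch of the difference, using the example URL above. Both functions are from the standard library; only the variable names are mine.

```python
from urllib.parse import urlparse, urlsplit

url = "https://www.example.com/some/path;parameter=12?q=query"

# urlparse splits the path parameters into their own "params" field.
parsed = urlparse(url)
print(parsed.path)    # /some/path
print(parsed.params)  # parameter=12
print(parsed.query)   # q=query

# urlsplit has no such field, so the parameters stay in the path.
split = urlsplit(url)
print(split.path)     # /some/path;parameter=12
print(split.query)    # q=query
```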

So how would you then build a URL with urllib.parse?

Let's assume that you want to call some API and need a function for building the API URL. The required URL could be for example:

https://example.com/api/v1/book/12?format=mp3&token=abbadabba

Here is how we could build the URL:

import os
from urllib.parse import urlunsplit, urlencode

SCHEME = os.environ.get("API_SCHEME", "https")
NETLOC = os.environ.get("API_NETLOC", "example.com")

def build_api_url(book_id, format, token):
    path = f"/api/v1/book/{book_id}"
    query = urlencode(dict(format=format, token=token))
    return urlunsplit((SCHEME, NETLOC, path, query, ""))

Calling the function works as expected:

>>> build_api_url(12, "mp3", "abbadabba")
'https://example.com/api/v1/book/12?format=mp3&token=abbadabba'

I used environment variables for the scheme and netloc because typically your program is calling a specific API endpoint that you might want to configure via the environment.

I also introduced the urlencode function which transforms a dictionary to a series of key=value pairs separated with & characters. This can be handy if you have lots of query arguments as a dictionary of values can be easier to manipulate.
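Here is a short sketch of what urlencode does with values that are not URL-safe. The parameter names and values are invented for illustration.

```python
from urllib.parse import urlencode

# Spaces become "+" and reserved characters are percent-encoded.
simple = urlencode({"q": "war and peace", "lang": "en"})
print(simple)  # q=war+and+peace&lang=en

# A value containing "&" or "=" cannot break out of its own key=value pair.
tricky = urlencode({"token": "a&b=c"})
print(tricky)  # token=a%26b%3Dc

# With doseq=True a list value is expanded into repeated keys.
repeated = urlencode({"tag": ["python", "urls"]}, doseq=True)
print(repeated)  # tag=python&tag=urls
```

This escaping is a good reason to prefer urlencode over gluing the query string together by hand.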

The urllib.parse library also contains urljoin, which is loosely similar to os.path.join. It builds a URL by resolving a path (or any other relative reference) against a base URL, following the same rules a browser uses for relative links. Let's modify the example code a bit.

import os
from urllib.parse import urljoin, urlencode

BASE_URL = os.environ.get("BASE_URL", "https://example.com/")

def build_api_url(book_id, format, token):
    path = f"/api/v1/book/{book_id}"
    query = "?" + urlencode(dict(format=format, token=token))
    return urljoin(BASE_URL, path + query)

This time the whole base URL comes from the environment. The path and query are combined with the base URL using the urljoin function. Notice that this time the question mark at the beginning of the query needs to be set manually.
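One thing to keep in mind is that urljoin resolves references the way a browser resolves relative links, so slashes matter. The sketch below uses made-up base URLs with an /api prefix to show the cases that tend to surprise people.

```python
from urllib.parse import urljoin

# Base ends with a slash: the relative path is appended under /api/.
appended = urljoin("https://example.com/api/", "v1/book/12")
print(appended)  # https://example.com/api/v1/book/12

# No trailing slash: the last segment ("api") is replaced.
replaced = urljoin("https://example.com/api", "v1/book/12")
print(replaced)  # https://example.com/v1/book/12

# A leading slash makes the path absolute, so the /api/ prefix is dropped.
absolute = urljoin("https://example.com/api/", "/v1/book/12")
print(absolute)  # https://example.com/v1/book/12

# A full URL as the second argument overrides the base completely.
overridden = urljoin("https://example.com/api/", "https://other.example/x")
print(overridden)  # https://other.example/x
```

In the function above the path starts with a slash, so it works as long as BASE_URL doesn't carry a path prefix of its own.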

The manual way

Libraries can be nice but sometimes you just want to get things done without thinking that much. Here's a straightforward way to build a URL manually.

import os

BASE_URL = os.environ.get("BASE_URL", "https://example.com").rstrip("/")

def build_api_url(book_id, format, token):
    return f"{BASE_URL}/api/v1/book/{book_id}?format={format}&token={token}"

The f-strings in Python make this quite clean, especially with URLs that always have the same structure and not that many parameters. The BASE_URL initialization strips the trailing forward slash from the environment variable. This way the user doesn't have to remember if it should be included or not.

Note that I haven't added any validation for the input parameters in these examples, so you may need to take that into consideration.
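As one possible way to handle that, the sketch below percent-encodes each value with urllib.parse.quote before it goes into the f-string. The function name build_api_url_quoted is hypothetical, and BASE_URL is hard-coded here only to keep the example self-contained.

```python
from urllib.parse import quote

# Hard-coded for the example; in practice this could come from the environment.
BASE_URL = "https://example.com"

def build_api_url_quoted(book_id, format, token):
    # safe="" also encodes "/", so a value cannot add extra path segments.
    book = quote(str(book_id), safe="")
    fmt = quote(str(format), safe="")
    tok = quote(str(token), safe="")
    return f"{BASE_URL}/api/v1/book/{book}?format={fmt}&token={tok}"

print(build_api_url_quoted(12, "mp3", "abba dabba&x=1"))
# https://example.com/api/v1/book/12?format=mp3&token=abba%20dabba%26x%3D1
```

This only takes care of escaping. Checking that book_id is actually a number, for example, would still be up to you.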

The Furl way

Then there is a library called furl which aims to make URL parsing and manipulation easy. It can be installed with pip:

$ python3 -m pip install furl

Let's see it in action.

import os
from furl import furl

BASE_URL = os.environ.get("BASE_URL", "https://example.com")

def build_api_url(book_id, format, token):
    f = furl(BASE_URL)
    f /= f"/api/v1/book/{book_id}"
    f.args["format"] = format
    f.args["token"] = token
    return f.url

There are a few more lines here compared to the previous example. First we need to initialize a furl object from the base URL. The path can be appended using the /= operator, which is custom-defined by the library.

The query arguments can be set with the args property dictionary. Finally, the complete URL can be built by accessing the url property.

Here's an alternative implementation using the set() method to change the path and query arguments of an existing URL.

def build_api_url(book_id, format, token):
    return (
        furl(BASE_URL)
        .set(path=f"/api/v1/book/{book_id}", args={"format": format, "token": token})
        .url
    )

In addition to building URLs, furl lets you modify existing URLs and parse parts of them. You can find many more examples in the API documentation.

Conclusion

These are just some examples of how to create URLs. Which one do you prefer?
