typing.Annotated, the new Python cool

  • Python
  • Backend development
·Thu Jan 25 2024

If, like me, you're a huge fan of what Pydantic and FastAPI folks are doing, you can't have missed the new Annotated construct, which was introduced in Python 3.9. In a few months, we've seen lot of code examples changed from this:

from uuid import uuid4

from pydantic import BaseModel, Field


class User(BaseModel):
    id: str = Field(default_factory=lambda: uuid4().hex)

to this:

from uuid import uuid4

from typing_extensions import Annotated

from pydantic import BaseModel, Field


class User(BaseModel):
    id: Annotated[str, Field(default_factory=lambda: uuid4().hex)]

This is taken directly from Pydantic documentation.

At first glance, it doesn't seem really useful or practical. But trust me, Annotated can grant super-powers. Pydantic and friends quickly understood that and they're already rely on it a lot. By the end of this post, you'll probably understand why 😉

👋 If you don't really know about type hints in Python, be sure to check my previous post before!

Motivation 🥱

So, as you know, type hints gives us direct feedback about the validity of our types while coding. It also helps IDE give us smart auto-completions, which is very handy to list methods and their arguments. So, if we consider again the Pydantic's User class:

class User(BaseModel):
    id: str = Field(default_factory=lambda: uuid4().hex)

u = User()
print(u.id)

mypy and friends know that u.id exists and that it's a string.

Great! But as you noticed, type annotations are not always enough on their own. It's frequent that we need runtime information to finely tune the behavior of the library we're using. Here, we used the Field class to tell Pydantic that, by default, we want it to generate an UUID4 for this id.

Can you see the problem it causes? We tell the type checker id is a string, but we actually assign it to a Field object 😱 We're such scoundrels!

That's because we use this attribute for two different things actually: tell the type it'll have on an instance and tell how we want the class to behave for this field. To make the type checker happy, Pydantic sets the Field return type to Any.

That's actually a very common pattern in Python libraries dealing with data modeling. For example, this is how SQLAlchemy ORM solves this problem:

from sqlalchemy.orm import Mapped, mapped_column

class User(Base):
    __tablename__ = "users"

    id: Mapped[str] = mapped_column(String, primary_key=True)

They define a special type construct, Mapped which expects a generic type and plays well with the mapped_column return type. It's more correct semantically, but also a bit more involved for the user. Besides, SQLAlchemy team implemented a complete mypy plugin to handle this properly.

Another example of this problem is the dependency injection mechanism of FastAPI:

from fastapi import Depends, FastAPI

app = FastAPI()


async def common_parameters(q: str | None = None, skip: int = 0, limit: int = 100):
    return {"q": q, "skip": skip, "limit": limit}


@app.get("/items/")
async def read_items(commons: dict = Depends(common_parameters)):
    return commons

This is taken directly from FastAPI documentation.

The function common_parameters is responsible for retrieving pagination parameters in the URL parameters, and return them as a dictionary. To use it in our API endpoint, we use the Depends construct of FastAPI, so it knows it'll have to call this function when we receive a request.

Again, you see here the inconsistency between commons being a dict, while we're assigning it to a Depends object.

Annotated to the rescue 🛟

Annotated was purposefully implemented to solve those problems: keep consistent and smart type checking while being able to add contextual information for the runtime.

Basically, it works like this:

from typing import Annotated, get_type_hints

class User:
    id: Annotated[str, "This is a metadata"]

It's a generic type: the first argument should be the type you expect for your argument (here, an int). Then, you can have any number of arguments of whatever nature; here, the string "This is a metadata".

Type checkers, like mypy, will only consider the first argument: the type. The rest can be used (or not) by the actual Python program's logic.

How? Well, like any other type annotations, it can be retrieved using the get_type_hints function:

type_hints = get_type_hints(func, include_extras=True)
print(type_hints)  # {'id': typing.Annotated[str, 'This is a metadata']}
print(type_hints["id"].__metadata__)  # ('This is a metadata',)

Note that we need to set the include_extras argument to True to get metadata associated with Annotated. Then, once we have an Annotated object, we can retrieve this metadata as a tuple through the __metadata__ attribute.

Now, things can go really wild!

Working example ⚙️

Let's see how we can implement a class to define a data model with default values for attributes, using Annotated. The result we want to achieve will look like this:

class User(DefaultModel):
    id: Annotated[str, Default(factory=lambda: uuid4().hex)]
    name: Annotated[str, Default(value="Anonymous")]

We'll be able to create data models and declare attributes with their types and default values if they're not provided by the user.

Since Annotated metadata can be anything, a clever and convenient way is to define specific classes that'll allow us to attach some data and behavior. Let's first define a Default class:

class Default:
    """
    Class defining a behavior for default value,
    either with a static value or a function factory.
    """

    def __init__(
        self, *, value: Any | None = None, factory: Callable[[], Any] | None = None
    ) -> None:
        self.value = value
        self.factory = factory

    def get_default(self) -> Any:
        if self.value is not None:
            return self.value

        if self.factory is not None:
            return self.factory()

        raise RuntimeError()

It's a simple class that accepts either a static value or a factory function, allowing us to generate default values dynamically, like UUID or timestamps.

Now, we'll define a base class for our models, DefaultModel, that will define the logic to handle those annotations and fill the defaults.

class DefaultModel:
    """
    Class able to parse type annotations using the `Default` class
    and fill the attributes accordingly if not provided.
    """

    def __init__(self, **kwargs) -> None:
        type_hints = get_type_hints(self, include_extras=True)
        # Iterate through each type-hinted attributes
        for attribute, type_hint in type_hints.items():
            try:
                # The value has been provided by the user
                value = kwargs[attribute]
            except KeyError:
                # Otherwise, look for `Annotated` type hint
                if hasattr(type_hint, "__metadata__"):
                    # Look for the `Default` class in metadata
                    for metadata in type_hint.__metadata__:
                        if isinstance(metadata, Default):
                            # Generate the default value
                            value = metadata.get_default()
            finally:
                # Assign the value!
                setattr(self, attribute, value)

The logic is quite straightforward: we iterate through each type-hinted attributes and check if the user has provided a value for it in kwargs. If not, we'll explore the Annotated type hint and look for Default classes. It's important to check the nature of the type hint and metadata, because you can't be sure how the user will actually annotate the final class.

Aaaand... That's about it 🙃 We now have reusable logic to assign default values to class attributes. We can check it works:

class User(DefaultModel):
    id: Annotated[str, Default(factory=lambda: uuid4().hex)]
    name: Annotated[str, Default(value="Anonymous")]


u = User(id="1", name="Arthur")
print(u.id)  # 1
print(u.name)  # Arthur

u_default = User()
print(u_default.id)  # 2d10cd0e9a574ecf817ee02e20a0cf45
print(u_default.name)  # Anonymous

Neat, isn't it?

Admittedly, unless you're building Python libraries which requires such nice DX, you'll probably rarely have to implement interpretation logic for Annotated. But it's more or less how it works under the hood in Pydantic and FastAPI!

Usage in the wild 🌍

As we said, this pattern has already made its way into major Python libraries.

In Pydantic

If we take again our Pydantic example, the Field construct can now be added as a metadata in Annotated:

class User(BaseModel):
    id: Annotated[str, Field(default_factory=lambda: uuid4().hex)]

Another advantage of this technique is that we can assign it to a type and reuse it; which was not possible before:

HexUUID = Annotated[str, Field(default_factory=lambda: uuid4().hex)]

class User(BaseModel):
    id: HexUUID

class Post(BaseModel):
    id: HexUUID

The type checker still knows it should type it as a string, but from our point-of-view, it's like using a class type!

And it can go even further: Pydantic allows us to define completely custom types with various constraints and behavior:

TruncatedFloat = Annotated[
    float,
    AfterValidator(lambda x: round(x, 1)),
    PlainSerializer(lambda x: f'{x:.1e}', return_type=str),
    WithJsonSchema({'type': 'string'}, mode='serialization'),
]

This is taken directly from Pydantic documentation.

In FastAPI

Looking again at the common_parameters dependency example, Annotated will turn it into this:

@app.get("/items/")
async def read_items(commons: Annotated[dict, Depends(common_parameters)]):
    return commons

Depends is now used as an Annotated metadata. Of course, the strength of this approach is that we can assign this to a type and reuse it. Much shorter than before!

CommonParameters = Annotated[dict, Depends(common_parameters)]

@app.get("/items/")
async def read_items(commons: CommonParameters):
    return commons

@app.get("/orders/")
async def read_orders(commons: CommonParameters):
    return commons

In SQLAlchemy

Did you ever repeated a column definition; like a datetime column or a primary key, with all its constraints and stuff, in SQLAlchemy? I'm sure you did! No more with Annotated:

intpk = Annotated[int, mapped_column(primary_key=True)]
timestamp = Annotated[
    datetime.datetime,
    mapped_column(nullable=False, server_default=func.CURRENT_TIMESTAMP()),
]
required_name = Annotated[str, mapped_column(String(30), nullable=False)]
class Base(DeclarativeBase):
    pass


class SomeClass(Base):
    __tablename__ = "some_table"

    id: Mapped[intpk]
    name: Mapped[required_name]
    created_at: Mapped[timestamp]

This is taken directly from SQLAlchemy documentation.

Awesome! We have a short, easy-to-use and type-checking compliant way to declare column definitions and reuse it in dozens of models.


That's a wrap! I hope this article helped to demystify Annotated, which can be surprising at first glance. This new construct opens a whole new world of possibilities in Python, rooted in the type-annotated era. We're only at the beginning: I bet lot of new libraries will emerge with those ideas in mind.

François Voron Logo
Full-stack web developer and data scientist, I've a proven track record working in SaaS industry, with a special focus on Python backends and REST API.

Legal

Made with ❤️ from 🇫🇷 - François Voron - SIRET 52101474600023