Python for the JVM Engineer

Dataclasses, attrs, Pydantic

Ravinder · 5 min read
Tags: Python, JVM, Java, dataclasses, attrs, Pydantic, data modelling, validation

Java developers are spoiled for choice when modelling data: plain JavaBeans with Lombok, immutable value objects with records (Java 16+), or validation-heavy DTOs with Bean Validation (@NotNull, @Size). Python's space is just as crowded, but its three dominant options correspond closely to those three Java archetypes.

The Landscape

flowchart TD
    Q1{"Do you need\nruntime validation?"}
    Q1 -- Yes --> Pydantic
    Q1 -- No --> Q2{"Do you need\nadvanced features:\nslots, converters,\nvalidators, ordering?"}
    Q2 -- Yes --> attrs
    Q2 -- No --> DC["dataclasses\n(stdlib, zero deps)"]
    Pydantic["Pydantic v2\n(validation, serialization,\nJSON schema)"]
    attrs["attrs\n(full-featured,\npre-dates dataclasses)"]
    DC

Think of it this way:

  • dataclasses → Java record (simple, stdlib; mutable by default, immutable with frozen=True)
  • attrs → Lombok's @Data / @Value with more control
  • Pydantic → Java Bean Validation + Jackson + record, all in one

dataclasses — The Stdlib Choice

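dataclasses (PEP 557, Python 3.7+) generates __init__, __repr__, and __eq__ from field declarations, much as Lombok's @Data generates a constructor, toString(), and equals(). There are no getters or setters to generate, because Python attributes are accessed directly.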

from dataclasses import dataclass, field
from typing import ClassVar
 
@dataclass
class User:
    id: int
    name: str
    email: str
    tags: list[str] = field(default_factory=list)
    MAX_TAGS: ClassVar[int] = 10  # class variable, not a field
 
u = User(id=1, name="Alice", email="alice@example.com")
print(u)  # User(id=1, name='Alice', email='alice@example.com', tags=[])

A roughly equivalent Java record (immutable, unlike the dataclass above):

record User(int id, String name, String email, List<String> tags) {
    User {
        tags = tags == null ? List.of() : List.copyOf(tags);
    }
}
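
To make the generated methods concrete, here is a small sketch that reuses the User dataclass from above and exercises the generated __eq__, the per-instance default_factory, and one place where a default dataclass differs from a record: it is not hashable.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    name: str
    email: str
    tags: list[str] = field(default_factory=list)


a = User(id=1, name="Alice", email="alice@example.com")
b = User(id=1, name="Alice", email="alice@example.com")

# The generated __eq__ compares field by field, like a record's equals()
print(a == b)   # True
print(a is b)   # False

# default_factory builds a fresh list per instance, so appending to one
# user's tags does not leak into another's
a.tags.append("admin")
print(b.tags)   # []

# Unlike a record, a non-frozen dataclass with the default eq=True is unhashable
try:
    hash(a)
except TypeError as exc:
    print(type(exc).__name__)   # TypeError
```

If you need instances as dict keys or set members, frozen=True is the usual route.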

For immutability, use frozen=True:

@dataclass(frozen=True)
class Point:
    x: float
    y: float
 
p = Point(1.0, 2.0)
p.x = 3.0   # raises FrozenInstanceError
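
Since a frozen instance cannot be changed in place, the usual way to "modify" one is to build a copy. The sketch below uses dataclasses.replace from the standard library for that, and also shows that frozen instances are hashable.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)

# Assignment is rejected at runtime
try:
    p.x = 3.0
except FrozenInstanceError:
    print("cannot assign to a frozen instance")

# replace() builds a new instance with selected fields changed
moved = replace(p, x=3.0)
print(moved)   # Point(x=3.0, y=2.0)
print(p)       # Point(x=1.0, y=2.0)

# frozen=True together with the default eq=True also generates __hash__,
# so instances work as dict keys and set members
print(len({p, Point(1.0, 2.0), moved}))   # 2
```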

For memory efficiency in large collections, use slots=True (Python 3.10+):

@dataclass(slots=True)
class Event:
    timestamp: float
    name: str
    value: float

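slots=True stores attributes in fixed per-instance slots instead of a per-instance __dict__. This can cut memory per instance substantially; treat the ~50% figure as a rough guide, since the actual saving depends on the Python version and the number of fields. It is the closest Python analogue to a Java record's compact layout.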

attrs — When You Need More

attrs predates dataclasses and offers converters, validators, and finer-grained control over comparison and ordering than dataclasses provides:

import attrs
 
@attrs.define
class Order:
    id: int
    amount: float = attrs.field(validator=attrs.validators.gt(0))
    currency: str = attrs.field(
        default="USD",
        validator=attrs.validators.in_(["USD", "EUR", "GBP"])
    )
    items: list[str] = attrs.Factory(list)
 
Order(id=1, amount=-5.0, currency="USD")
# raises ValueError: 'amount' must be > 0: -5.0

attrs.define generates __slots__ by default, so it is already memory-efficient without extra flags.

Converters let you coerce incoming values — like a setter with a transformation:

@attrs.define
class Config:
    port: int = attrs.field(converter=int)   # coerces str "8080" → 8080
    host: str = attrs.field(converter=str.lower)

Pydantic v2 — Validation, Serialization, JSON Schema

Pydantic is the workhorse for API request/response models, configuration, and any boundary where data arrives as untyped JSON or environment variables.

from datetime import datetime

# EmailStr requires the optional email-validator package
from pydantic import BaseModel, EmailStr, Field
 
class User(BaseModel):
    id: int
    name: str = Field(min_length=1, max_length=100)
    email: EmailStr
    created_at: datetime
    tags: list[str] = []
 
# Constructing from keyword arguments; for a dict payload use User.model_validate(payload)
user = User(
    id=1,
    name="Alice",
    email="alice@example.com",
    created_at="2024-01-15T10:30:00Z",   # string → datetime auto-coerced
)
 
print(user.model_dump())           # → dict
print(user.model_dump_json())      # → JSON string
print(User.model_json_schema())    # → JSON Schema for OpenAPI

Pydantic raises ValidationError with structured error details — similar to Bean Validation's ConstraintViolation set:

from pydantic import ValidationError
 
try:
    User(id="not-an-int", name="", email="bad-email", created_at="not-a-date")
except ValidationError as e:
    print(e.error_count())   # 4
    for err in e.errors():
        print(err["loc"], err["msg"])

Custom validators:

from pydantic import BaseModel, field_validator, model_validator
 
class Order(BaseModel):
    amount: float
    currency: str
 
    @field_validator("currency")
    @classmethod
    def currency_upper(cls, v: str) -> str:
        return v.upper()
 
    @model_validator(mode="after")
    def amount_positive(self) -> "Order":
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        return self
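
To see both validators in action, the sketch below repeats the Order model from above (so it runs on its own, assuming Pydantic v2 is installed) and feeds it one valid and one invalid input.

```python
from pydantic import BaseModel, ValidationError, field_validator, model_validator


class Order(BaseModel):
    amount: float
    currency: str

    @field_validator("currency")
    @classmethod
    def currency_upper(cls, v: str) -> str:
        return v.upper()

    @model_validator(mode="after")
    def amount_positive(self) -> "Order":
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        return self


# The field validator normalises the currency; the numeric string is coerced to float
order = Order(amount="19.99", currency="usd")
print(order.amount, order.currency)   # 19.99 USD

# The model validator runs after all fields are parsed and rejects the whole object
try:
    Order(amount=-5, currency="eur")
except ValidationError as e:
    err = e.errors()[0]
    print(err["type"])   # value_error
    print(err["msg"])
```

A ValueError raised inside a validator is wrapped into the same ValidationError structure as the built-in checks, so callers handle one exception type.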

Choosing Between Them

| Need | Tool |
| --- | --- |
| Simple value object, stdlib | dataclasses |
| Immutable record | @dataclass(frozen=True) |
| Memory-critical (millions) | @dataclass(slots=True) or attrs.define |
| Input validation + coercion | attrs (internal models) |
| API boundary / JSON parsing | Pydantic |
| JSON Schema / OpenAPI | Pydantic |
| Config from env vars | Pydantic Settings |

flowchart LR
    A["Incoming HTTP\nJSON payload"] --> Pydantic
    Pydantic -->|"validated model"| B["Business logic\n(dataclasses / attrs)"]
    B --> Pydantic
    Pydantic -->|"serialised JSON"| C["HTTP response"]

A common pattern: use Pydantic at the boundary (parse and validate) and plain dataclasses inside your domain logic (no validation overhead, full IDE support).
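
One way this boundary pattern might look in code. This is a sketch only: CreateUserRequest, NewUser, and to_domain are hypothetical names invented for the example, and it assumes Pydantic v2.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field


class CreateUserRequest(BaseModel):   # hypothetical boundary model
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0)


@dataclass(frozen=True)
class NewUser:   # hypothetical domain object, no Pydantic dependency
    name: str
    age: int


def to_domain(req: CreateUserRequest) -> NewUser:
    # Validation has already happened; this is a plain field-for-field copy
    return NewUser(**req.model_dump())


payload = '{"name": "Alice", "age": 42}'
req = CreateUserRequest.model_validate_json(payload)   # parse and validate once
user = to_domain(req)
print(user)   # NewUser(name='Alice', age=42)
```

The explicit to_domain step costs a few lines, but it keeps the domain layer importable and testable without the validation library.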

Key Takeaways

  • dataclasses is the stdlib Java-record analogue: zero deps, generates __init__/__repr__/__eq__, optional frozen and slots.
  • attrs adds validators, converters, and slots by default — closer to Lombok @Value with inline validation logic.
  • Pydantic is the right choice for any data that crosses an external boundary (HTTP, env, files) — it combines Jackson, Bean Validation, and JSON Schema generation.
  • Use Pydantic at API boundaries, plain dataclasses inside domain logic — separating validation from internal representation.
  • Pydantic v2 is backed by a Rust core (pydantic-core); its maintainers report it as roughly 5–50x faster than v1 at validation, depending on the workload.
  • Avoid mixing all three in the same module — pick a convention per layer and stick to it.