Dataclasses, attrs, Pydantic
Java developers are spoiled for choice when modelling data: plain JavaBeans with Lombok, immutable value objects with records (Java 16+), or validation-heavy DTOs with Bean Validation (@NotNull, @Size). Python's space is equally crowded, but its three dominant options map fairly cleanly onto those three archetypes.
The Landscape
Think of it this way:
- dataclasses → Java record (simple, immutable-ish, stdlib)
- attrs → Lombok's @Data/@Value with more control
- Pydantic → Java Bean Validation + Jackson + record, all in one
dataclasses — The Stdlib Choice
dataclasses (PEP 557, Python 3.7+) generates __init__, __repr__, and __eq__ from field declarations, much as a Java record or Lombok's @Data derives its constructor, toString(), and equals() from the declared fields.
```python
from dataclasses import dataclass, field
from typing import ClassVar


@dataclass
class User:
    id: int
    name: str
    email: str
    tags: list[str] = field(default_factory=list)  # fresh list per instance
    MAX_TAGS: ClassVar[int] = 10  # class variable, not a field


u = User(id=1, name="Alice", email="alice@example.com")
print(u)  # User(id=1, name='Alice', email='alice@example.com', tags=[])
```

Equivalent Java record:
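```java
import java.util.List;

record User(int id, String name, String email, List<String> tags) {
    // Compact constructor: default null to an empty list, otherwise copy defensively
    User {
        tags = tags == null ? List.of() : List.copyOf(tags);
    }
}
```

For immutability, use frozen=True: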
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)
p.x = 3.0  # raises FrozenInstanceError
```

For memory efficiency in large collections, use slots=True (Python 3.10+):
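```python
from dataclasses import dataclass


@dataclass(slots=True)
class Event:
    timestamp: float
    name: str
    value: float
```

slots=True stores attributes in fixed slots instead of a per-instance __dict__, reducing memory by roughly 50% per instance for small classes (the exact saving depends on the field count and the Python version). It also means instances cannot gain attributes that were not declared. The closest Java analogy is the fixed field layout of a record or any other class.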
attrs — When You Need More
attrs predates dataclasses and offers converters, validators, and fine-grained ordering control that dataclasses lack:
```python
import attrs


@attrs.define
class Order:
    id: int
    amount: float = attrs.field(validator=attrs.validators.gt(0))
    currency: str = attrs.field(
        default="USD",
        validator=attrs.validators.in_(["USD", "EUR", "GBP"]),
    )
    items: list[str] = attrs.Factory(list)


Order(id=1, amount=-5.0, currency="USD")
# raises ValueError: 'amount' must be > 0: -5.0
```

attrs.define generates __slots__ by default, so it is already memory-efficient without extra flags.
Converters let you coerce incoming values — like a setter with a transformation:
```python
import attrs


@attrs.define
class Config:
    port: int = attrs.field(converter=int)  # coerces str "8080" → 8080
    host: str = attrs.field(converter=str.lower)  # normalises to lower case
```

Pydantic v2 — Validation, Serialization, JSON Schema
Pydantic is the workhorse for API request/response models, configuration, and any boundary where data arrives as untyped JSON or environment variables.
```python
from datetime import datetime

from pydantic import BaseModel, EmailStr, Field


class User(BaseModel):
    id: int
    name: str = Field(min_length=1, max_length=100)
    email: EmailStr  # requires the optional email-validator package
    created_at: datetime
    tags: list[str] = []


# Keyword arguments are validated the same way as a parsed JSON payload;
# User.model_validate(payload) does the same for a dict.
user = User(
    id=1,
    name="Alice",
    email="alice@example.com",
    created_at="2024-01-15T10:30:00Z",  # string → datetime auto-coerced
)
print(user.model_dump())         # → dict
print(user.model_dump_json())    # → JSON string
print(User.model_json_schema())  # → JSON Schema for OpenAPI
```

Pydantic raises ValidationError with structured error details, similar to the set of ConstraintViolation objects returned by Bean Validation:
```python
from pydantic import ValidationError

try:
    User(id="not-an-int", name="", email="bad-email", created_at="not-a-date")
except ValidationError as e:
    print(e.error_count())  # 4
    for err in e.errors():
        print(err["loc"], err["msg"])
```

Custom validators:
```python
from pydantic import BaseModel, field_validator, model_validator


class Order(BaseModel):
    amount: float
    currency: str

    # Runs on the single field after type validation; returns the normalised value
    @field_validator("currency")
    @classmethod
    def currency_upper(cls, v: str) -> str:
        return v.upper()

    # Runs on the fully constructed model, so it can inspect any field
    @model_validator(mode="after")
    def amount_positive(self) -> "Order":
        if self.amount <= 0:
            raise ValueError("amount must be positive")
        return self
```

Choosing Between Them
| Need | Tool |
|---|---|
| Simple value object, stdlib | dataclasses |
| Immutable record | @dataclass(frozen=True) |
| Memory-critical (millions of instances) | @dataclass(slots=True) or attrs.define |
| Input validation + coercion | attrs (internal models) |
| API boundary / JSON parsing | Pydantic |
| JSON Schema / OpenAPI | Pydantic |
| Config from env vars | Pydantic Settings |
A common pattern: use Pydantic at the boundary (parse and validate) and plain dataclasses inside your domain logic (no validation overhead at construction time, full IDE support).
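The boundary pattern can be sketched roughly as follows, assuming Pydantic v2 is installed. OrderIn, Order, and parse_order are hypothetical names chosen for this illustration, not part of either library.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field, ValidationError


class OrderIn(BaseModel):
    """Boundary model (hypothetical): validates and coerces untrusted input."""

    id: int
    amount: float = Field(gt=0)
    currency: str = "USD"


@dataclass(frozen=True)
class Order:
    """Domain object (hypothetical): plain data, no validation on construction."""

    id: int
    amount: float
    currency: str


def parse_order(payload: dict) -> Order:
    """Validate once at the edge, then hand a plain dataclass to the domain."""
    validated = OrderIn.model_validate(payload)
    return Order(**validated.model_dump())


# Strings from a JSON-like payload are coerced to the declared types.
order = parse_order({"id": "7", "amount": "19.99"})

# Invalid input never reaches the domain layer.
try:
    parse_order({"id": 8, "amount": -1})
    error_count = 0
except ValidationError as exc:
    error_count = exc.error_count()
```

The domain layer then depends only on the standard library, and the validation rules live in one place at the edge.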
Key Takeaways
- dataclasses is the stdlib Java-record analogue: zero deps, generates __init__/__repr__/__eq__, optional frozen and slots.
- attrs adds validators, converters, and slots by default, which puts it closer to Lombok's @Data/@Value with inline validation logic.
- Pydantic is the right choice for any data that crosses an external boundary (HTTP, env, files): it combines the roles of Jackson, Bean Validation, and JSON Schema generation.
- Use Pydantic at API boundaries and plain dataclasses inside domain logic, separating validation from internal representation.
- Pydantic v2 is backed by a Rust core (pydantic-core); its maintainers report it to be 5–50x faster than v1 at parsing.
- Avoid mixing all three in the same module — pick a convention per layer and stick to it.