Software Engineering for Data Scientists, Part 1: Pydantic Is All You Need for Poor Performance Spaghetti Code

I love Pydantic. I have also witnessed some of the worst code written with it: pure spaghetti that performs poorly.

There are two major anti-patterns in abusing Pydantic for maximum spaghetti. The first is serdes debt: instead of being used only at service boundaries for validation, Pydantic is used everywhere, incurring heavy serialization, deserialization, and memory allocation costs. The second is inheritance over composition: Pydantic models are commonly built on heavy layers of inheritance, which breaks the basic SOLID principles of object-oriented design.

In this post, we discuss the serdes debt anti-pattern.

Anti-pattern: SerDes Debt

Pydantic is primarily used for data validation, with support for data schema and data serialization and deserialization (serdes).

Serialization is when we need to take an object, in this case a Pydantic object, and convert it into a JSON string. Deserialization is when we need to take a JSON string and deserialize it into an object.

In Python terms, deserialization takes a string and converts it into a nested dictionary of mixed types (yay dynamic typing), from which the object is then built. When doing these conversions, we sometimes need to validate that the data is as expected.
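To make the two directions concrete, here is a minimal sketch using only the standard library. The `Point` class is a hypothetical example type, not something from the benchmark, and no validation happens at any step.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Point:
    # Hypothetical example type, used only for illustration.
    x: int
    y: int


# Serialization: object -> dict -> JSON string
point = Point(x=1, y=2)
payload = json.dumps(asdict(point))

# Deserialization: JSON string -> dict -> object.
# Nothing here checks that x and y are really integers.
restored = Point(**json.loads(payload))
```

The round trip gives back an equal object, but only because the input was already well-formed. Deciding where that guarantee comes from is the question the rest of this post is about.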

If it’s just for pure serdes, there are far faster and more efficient packages such as msgspec and orjson, or attrs paired with cattrs. In fact, Pydantic v1 could be configured to use orjson for its JSON encoding and decoding.

The core use case for Pydantic is data validation. Outside of custom data validation, the best practice is to avoid Pydantic.

Here’s a simple benchmark to demonstrate why.

Performance Benchmark

We based our benchmark on a simple two-class data structure. The Python dataclass implementation is shown below, and we benchmark it against the equivalent Pydantic models. We did not add any custom data validation to either.

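from dataclasses import dataclass
from typing import List

# Nested value object referenced by User.
@dataclass
class Address:
    street: str
    city: str
    country: str
    postal_code: str

# Top-level record used in every benchmark.
@dataclass
class User:
    id: int
    name: str
    email: str
    age: int
    is_active: bool
    address: Address
    tags: List[str]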

Based on this simple data model, we observe that Python dataclasses clearly outperform Pydantic in both run time and memory use.
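The benchmark script itself is not reproduced in this post, so the following is only a rough sketch of how the dataclass side of such timings could be collected with `timeit`. The names `SAMPLE`, `user_from_dict`, and `bench` are hypothetical, and the sample values are made up. The Pydantic side would be timed by passing the equivalent model calls (for example `model_validate` and `model_dump_json` in Pydantic v2) to the same `bench` helper.

```python
import json
import statistics
import timeit
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Address:
    street: str
    city: str
    country: str
    postal_code: str


@dataclass
class User:
    id: int
    name: str
    email: str
    age: int
    is_active: bool
    address: Address
    tags: List[str]


# Hypothetical sample record; the values are invented for illustration.
SAMPLE = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
    "age": 36,
    "is_active": True,
    "address": {
        "street": "1 Main St",
        "city": "London",
        "country": "UK",
        "postal_code": "N1",
    },
    "tags": ["admin", "beta"],
}
SAMPLE_JSON = json.dumps(SAMPLE)


def user_from_dict(data: dict) -> User:
    """Build the nested dataclass by hand; no validation is performed."""
    fields = dict(data)
    fields["address"] = Address(**data["address"])
    return User(**fields)


def bench(fn, iterations: int = 10_000, repeats: int = 5) -> dict:
    """Return mean, median, and min per-call time in milliseconds."""
    totals = timeit.repeat(fn, number=iterations, repeat=repeats)
    per_call_ms = [total / iterations * 1000 for total in totals]
    return {
        "mean": statistics.mean(per_call_ms),
        "median": statistics.median(per_call_ms),
        "min": min(per_call_ms),
    }


user = user_from_dict(SAMPLE)

# Fewer iterations than the default so the sketch runs quickly.
results = {
    "create": bench(lambda: user_from_dict(SAMPLE), iterations=1_000, repeats=3),
    "to_json": bench(lambda: json.dumps(asdict(user)), iterations=1_000, repeats=3),
    "from_json": bench(
        lambda: user_from_dict(json.loads(SAMPLE_JSON)), iterations=1_000, repeats=3
    ),
}
```

Absolute numbers from a harness like this depend heavily on the machine and Python version, so only the ratios between the two implementations are worth comparing.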

Dataclasses vs Pydantic Time complexity

Dataclasses vs Pydantic Space complexity

Creation Performance:
- Dataclasses are ~6.5x faster for creating instances from dictionaries
- Bulk creation (100 items): the performance gap remains consistent at scale (~6.5x)
JSON Operations Performance:
- Serialization: Dataclasses ~1.5x faster
- Deserialization: Dataclasses ~1.5x faster
- Full round-trip: Dataclasses ~1.5x faster overall
Field Access Performance: Nearly identical performance between Dataclasses and Pydantic
Memory Consumption: Dataclasses consume ~2.6x less memory
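The memory figures in the appendix are reported as "deep size". One common way to approximate that is a recursive walk over `sys.getsizeof`, sketched below. This is an assumption about how such a number could be measured, not the script used for the results above. `deep_sizeof` and `Tagged` are hypothetical names, and the helper ignores `__slots__` and other containers it does not know about.

```python
import sys
from dataclasses import dataclass
from typing import List


def deep_sizeof(obj, seen=None) -> int:
    """Approximate total bytes reachable from obj, counting each object once."""
    seen = set() if seen is None else seen
    if id(obj) in seen:
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_sizeof(k, seen) + deep_sizeof(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_sizeof(item, seen) for item in obj)
    if hasattr(obj, "__dict__"):
        # Instance attributes live in the instance __dict__.
        size += deep_sizeof(vars(obj), seen)
    return size


@dataclass
class Tagged:
    # Hypothetical example type, used only for illustration.
    name: str
    tags: List[str]


item = Tagged(name="example", tags=["a", "b"])
shallow = sys.getsizeof(item)  # the instance header only
deep = deep_sizeof(item)       # instance + __dict__ + field values
```

Applied to a Pydantic model, a walk like this also picks up the extra bookkeeping attributes listed in the appendix, which is where much of the per-instance overhead comes from.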

Tips on Fixing Pydantic Anti-patterns

Only use Pydantic at service boundaries, e.g., API request and response validation. Do not use Pydantic within a service itself.

Here’s the Pydantic team themselves:
Use Pydantic at Service Boundary
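As a rough sketch of what that boundary looks like in code: untrusted input is validated once on the way in, converted to a plain object, and everything inside the service works on that plain object. To keep the example dependency-free, the hand-written `parse_order` function stands in for the validating layer; in a real service that is the one place a Pydantic model would live. `Order`, `parse_order`, and `total_units` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Order:
    """Plain internal type; treated as trusted once constructed."""
    order_id: int
    quantity: int


def parse_order(payload: dict) -> Order:
    """Boundary: validate untrusted input once, then return a plain object.

    Stand-in for the validation layer (e.g. a Pydantic request model).
    """
    order_id = payload.get("order_id")
    quantity = payload.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(order_id, int) or isinstance(order_id, bool):
        raise ValueError("order_id must be an integer")
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    return Order(order_id=order_id, quantity=quantity)


def total_units(orders: List[Order]) -> int:
    """Internal logic: no re-validation and no serdes on trusted objects."""
    return sum(order.quantity for order in orders)


orders = [
    parse_order({"order_id": 1, "quantity": 2}),
    parse_order({"order_id": 2, "quantity": 3}),
]
units = total_units(orders)
```

The design choice is that validation cost is paid once per request rather than once per function call or per object copy inside the service.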

Use static type checking with mypy and avoid dynamic (runtime) type checking. If runtime type checking is really needed, rewrite that part in Rust.
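A small sketch of what that means in practice: the annotations below cost nothing at runtime, and a static checker such as mypy reads them ahead of time. `Reading` and `average` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    sensor: str
    value: float


def average(readings: List[Reading]) -> float:
    """Annotations are checked statically; nothing is verified at call time."""
    return sum(r.value for r in readings) / len(readings)


mean_value = average([Reading("a", 1.0), Reading("b", 3.0)])

# A call such as average(["1.0", "3.0"]) would be reported by a static
# checker like mypy as an incompatible argument type, before the code runs.
```

The type errors are caught during development or in CI, so the running service does not pay for the checks on every call.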

Composition over Inheritance. Object inheritance creates additional layers of abstraction. Duplication is far cheaper than having more abstraction. Don’t Repeat Yourself (DRY) should be used sparingly.

DRY is not good. Worse is better
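As an illustration of the composition side of that advice, here is a minimal sketch with plain dataclasses. Instead of a `Document` that inherits from a chain of hypothetical base classes (say a timestamped base and an owned base), it simply holds the small parts it needs. All class names here are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Timestamps:
    created_at: str
    updated_at: str


@dataclass
class Owner:
    user_id: int


@dataclass
class Document:
    """Composed from small parts rather than inheriting from several bases."""
    title: str
    timestamps: Timestamps
    owner: Owner


doc = Document(
    title="Q3 report",
    timestamps=Timestamps(created_at="2025-01-01", updated_at="2025-01-02"),
    owner=Owner(user_id=7),
)
```

Each part can be read, tested, and replaced on its own, and `Document` has no parent class whose behavior you need to know in order to understand it.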

Appendix

Time complexity

Performance Comparison: Pydantic vs Dataclasses
============================================================
Test data structure: Nested user profile with address
Iterations per test: 10,000
Python dataclasses: Built-in
Pydantic version: 2.5.3

Warming up...

Running benchmarks...

Benchmarking: Instance Creation from Dict

Instance Creation from Dict
============================================================
Metric               Pydantic             Dataclasses         
------------------------------------------------------------
Mean                 0.0543 ms            0.0084 ms           
Median               0.0531 ms            0.0082 ms           
Min                  0.0497 ms            0.0076 ms           
Max                  0.0892 ms            0.0156 ms           
Stdev                0.0041 ms            0.0008 ms           

Dataclasses is 6.46x faster

Benchmarking: Convert to Dictionary

Convert to Dictionary
============================================================
Metric               Pydantic             Dataclasses         
------------------------------------------------------------
Mean                 0.0287 ms            0.0153 ms           
Median               0.0282 ms            0.0151 ms           
Min                  0.0265 ms            0.0142 ms           
Max                  0.0421 ms            0.0234 ms           
Stdev                0.0023 ms            0.0011 ms           

Dataclasses is 1.88x faster

Benchmarking: Serialize to JSON String

Serialize to JSON String
============================================================
Metric               Pydantic             Dataclasses         
------------------------------------------------------------
Mean                 0.0361 ms            0.0247 ms           
Median               0.0355 ms            0.0243 ms           
Min                  0.0334 ms            0.0228 ms           
Max                  0.0512 ms            0.0387 ms           
Stdev                0.0029 ms            0.0019 ms           

Dataclasses is 1.46x faster

Benchmarking: Deserialize from JSON String

Deserialize from JSON String
============================================================
Metric               Pydantic             Dataclasses         
------------------------------------------------------------
Mean                 0.0678 ms            0.0463 ms           
Median               0.0669 ms            0.0457 ms           
Min                  0.0632 ms            0.0431 ms           
Max                  0.0943 ms            0.0612 ms           
Stdev                0.0048 ms            0.0027 ms           

Dataclasses is 1.46x faster

Benchmarking: Field Access

Field Access
============================================================
Metric               Pydantic             Dataclasses         
------------------------------------------------------------
Mean                 0.0013 ms            0.0012 ms           
Median               0.0013 ms            0.0011 ms           
Min                  0.0011 ms            0.0010 ms           
Max                  0.0019 ms            0.0018 ms           
Stdev                0.0001 ms            0.0001 ms           

Dataclasses is 1.08x faster

Benchmarking: Bulk Creation (100 items)

Bulk Creation (100 items)
============================================================
Metric               Pydantic             Dataclasses         
------------------------------------------------------------
Mean                 54.312 ms            8.427 ms            
Median               53.867 ms            8.356 ms            
Min                  52.145 ms            8.123 ms            
Max                  58.923 ms            9.234 ms            
Stdev                1.234 ms            0.187 ms            

Dataclasses is 6.45x faster

SUMMARY
============================================================

Performance Summary:
- Instance Creation from Dict: Dataclasses is 6.46x faster
- Convert to Dictionary: Dataclasses is 1.88x faster
- Serialize to JSON String: Dataclasses is 1.46x faster
- Deserialize from JSON String: Dataclasses is 1.46x faster
- Field Access: Dataclasses is 1.08x faster
- Bulk Creation (100 items): Dataclasses is 6.45x faster

DETAILED JSON OPERATIONS COMPARISON
============================================================

Round-trip JSON test (dict -> object -> JSON -> object -> dict):
  Pydantic: 10.42 ms
  Dataclasses: 7.15 ms
  Ratio: 0.69x

Tested with Pydantic v2.5.3

Space Complexity

MEMORY USAGE COMPARISON
============================================================

1. Single Instance Memory Usage:
  Address object (deep size):
    Pydantic:    1,776 bytes
    Dataclass:   568 bytes
    Difference:  1,208 bytes (212.7% more)

  User object (deep size):
    Pydantic:    3,424 bytes
    Dataclass:   1,312 bytes
    Difference:  2,112 bytes (161.0% more)

2. Bulk Creation Memory Usage (1000 instances):
  Pydantic:    3,287.45 KB (3,287.45 bytes per instance)
  Dataclasses: 1,245.78 KB (1,245.78 bytes per instance)
  Difference:  2,041.67 KB (163.9% more)

3. JSON Operations Memory Usage:
  Per JSON deserialization:
    Pydantic:    4,256 bytes
    Dataclasses: 2,184 bytes
    Difference:  2,072 bytes

4. Attribute Storage Analysis:
  Pydantic User attributes:   8 stored attributes
  Dataclass User attributes:  7 stored attributes

  Pydantic internals:
    __dict__: 296 bytes (dict)
    __pydantic_fields_set__: 216 bytes (set)
    __pydantic_extra__: 0 bytes (NoneType)
    __pydantic_private__: 0 bytes (NoneType)
    address: 72 bytes (PydanticAddress)
    age: 28 bytes (int)
    email: 74 bytes (str)
    id: 28 bytes (int)
    is_active: 28 bytes (bool)
    name: 57 bytes (str)
    tags: 88 bytes (list)

  Dataclass internals:
    address: 72 bytes (DataclassAddress)
    age: 28 bytes (int)
    email: 74 bytes (str)
    id: 28 bytes (int)
    is_active: 28 bytes (bool)
    name: 57 bytes (str)
    tags: 88 bytes (list)

5. Memory Efficiency Summary:
  - Dataclasses use ~40-50% less memory per instance
  - Pydantic stores additional metadata for validation
  - The memory gap increases with more complex models
  - Consider memory usage for large-scale applications

6. Visual Memory Comparison (per 1000 instances):
  Pydantic:    [████████████████████████████████████████] 3,287.5 KB
  Dataclasses: [███████████████                         ] 1,245.8 KB

@article{
    leehanchung,
    author = {Lee, Hanchung},
    title = {No Code, Low Code, Full Code},
    year = {2025},
    month = {07},
    howpublished = {\url{https://leehanchung.github.io}},
    url = {https://leehanchung.github.io/blogs/2025/06/26/no-code-low-code-full-code/}
}