Pydantic uses abc, which can cause memory and performance issues #11605

@peterchenadded

Initial Checks

  • I confirm that I'm using Pydantic V2

Description

This is a follow-up to a comment in the discussion at python/cpython#92810.

The issue is that Pydantic uses abc (https://docs.python.org/3/library/abc.html): BaseModel's metaclass derives from abc.ABCMeta, which caches the results of isinstance checks. Depending on the number of distinct Pydantic BaseModel classes, this caching can lead to severe memory and performance issues.
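A minimal sketch of the underlying mechanism, using plain `abc` rather than Pydantic and assuming a CPython version affected by python/cpython#92810: when an `ABCMeta` subclass check fails, CPython falls back to calling `issubclass()` against every class in `cls.__subclasses__()`, and each of those misses is recorded in that subclass's negative cache. `CountingMeta`, `Base`, `Other` and `calls_for_one_negative_check` are hypothetical names for illustration only; the counting metaclass just wraps `ABCMeta.__subclasscheck__`. On affected versions, one failing `isinstance()` against a base with N subclasses should trigger N + 1 subclass checks.

```python
from abc import ABCMeta


class CountingMeta(ABCMeta):
    """Hypothetical ABCMeta subclass that counts __subclasscheck__ calls."""

    calls = 0

    def __subclasscheck__(cls, subclass):
        CountingMeta.calls += 1
        return super().__subclasscheck__(subclass)


def calls_for_one_negative_check(num_subclasses):
    """Return (isinstance result, number of subclass checks) for one failing check."""

    class Base(metaclass=CountingMeta):
        pass

    class Other(metaclass=CountingMeta):
        pass

    # Keep strong references so Base.__subclasses__() still lists every subclass.
    subclasses = [CountingMeta(f"Sub{i}", (Base,), {}) for i in range(num_subclasses)]

    CountingMeta.calls = 0
    # Other() is not a Base instance, so ABCMeta also asks each Base subclass,
    # adding Other to every one of their negative caches along the way.
    found = isinstance(Other(), Base)
    return found, CountingMeta.calls


print(calls_for_one_negative_check(1000))
```

The count grows linearly with the number of subclasses, so in the reproducer every failed check against a base with thousands of throwaway subclasses performs thousands of nested checks and cache insertions.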

Reproducer:

https://github.com/peterchenadded/pydantic-abc-performance

It took 45 seconds to run isinstance on 7000 objects of distinct classes with Pydantic 2.11.0b2 (run with --profile).

With my custom changes in test_performance_fix.py, checking the same 7000 objects took 1 second (also run with --profile).

With 10_000 distinct classes, the process gets killed on my machine.

For the 7000-class example, memray showed memory usage above 4 GB; with the fix, it stayed below 100 MB.

Note that alternating the base class, as in the example code below, is important: with a single base class I couldn't reproduce the issue.
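Why alternation matters can be sketched with plain `abc` too, again assuming a CPython version affected by python/cpython#92810. The sketch is a hypothetical stand-in for the reproducer's loop (`CountingMeta`, `Base1`, `Base2` and `count_calls` are illustrative names, not Pydantic APIs). With a single base class, every positive check is answered by the MRO and every negative check runs against a base with no subclasses, so the work stays linear. When the bases alternate, each negative check walks the roughly n/2 subclasses of the other base, so the total number of checks (and negative-cache entries) grows quadratically, which is consistent with the time and memory figures reported above.

```python
from abc import ABCMeta


class CountingMeta(ABCMeta):
    """Hypothetical ABCMeta subclass that counts every __subclasscheck__ call."""

    calls = 0

    def __subclasscheck__(cls, subclass):
        CountingMeta.calls += 1
        return super().__subclasscheck__(subclass)


def count_calls(n, alternate):
    """Mimic the reproducer: n throwaway subclasses, two isinstance() checks each."""

    class Base1(metaclass=CountingMeta):
        pass

    class Base2(metaclass=CountingMeta):
        pass

    objects = []
    for i in range(n):
        # With alternate=False every subclass derives from Base1.
        base = Base2 if alternate and i % 2 else Base1
        objects.append(CountingMeta(f"Sub{i}", (base,), {})())

    CountingMeta.calls = 0
    for obj in objects:
        isinstance(obj, Base1)
        isinstance(obj, Base2)
    return CountingMeta.calls


# Doubling n roughly doubles the single-base count but quadruples the alternating one.
for n in (500, 1000):
    print(n, count_calls(n, alternate=False), count_calls(n, alternate=True))
```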

I leave it up to the Pydantic maintainers to decide whether to fix this by applying the patch, by removing abc from the code base, or by pushing a fix back to abc.py.

Example Code

```python
# https://github.com/peterchenadded/pydantic-abc-performance

import logging
import time

from pydantic import BaseModel


class MyEntity1Pattern(BaseModel):
    pass


class MyEntity2Pattern(BaseModel):
    pass


def _get_my_entity1_pattern():
    # Each call defines a brand-new subclass of MyEntity1Pattern.
    class MyTest1(MyEntity1Pattern):
        pass

    return MyTest1


def _get_my_entity2_pattern():
    # Each call defines a brand-new subclass of MyEntity2Pattern.
    class MyTest2(MyEntity2Pattern):
        pass

    return MyTest2


class Measurement(BaseModel):
    # Accumulated seconds spent in isinstance() against each base class.
    my_entity_1_pattern_check: float = 0
    my_entity_2_pattern_check: float = 0
    success: int = 0
    failed: int = 0

    def count(self, result: bool) -> None:
        if result:
            self.success += 1
        else:
            self.failed += 1


def test_performance():
    my_objects = []
    logging.info("Started getting objects to check")
    for i in range(7000):
        # Alternating the base class is what triggers the slowdown.
        if i % 2 == 0:
            my_objects.append(_get_my_entity1_pattern()())
        else:
            my_objects.append(_get_my_entity2_pattern()())

    logging.info("Completed len(my_objects) = %d", len(my_objects))
    logging.info("Started checking objects")

    measurement = Measurement()
    for m in my_objects:
        # perf_counter() is a monotonic, high-resolution clock for timing.
        start = time.perf_counter()
        measurement.count(isinstance(m, MyEntity1Pattern))
        measurement.my_entity_1_pattern_check += time.perf_counter() - start

        start = time.perf_counter()
        measurement.count(isinstance(m, MyEntity2Pattern))
        measurement.my_entity_2_pattern_check += time.perf_counter() - start

    logging.info("Completed checking classes")
    logging.info("measures:\n%s", measurement.model_dump_json(indent=4))
```

Python, Pydantic & OS Version

Pydantic 2.11.0b2 (the issue also occurs on at least 2.7)

OS: Apple M2 Mac

Labels: bug V2 (Bug related to Pydantic V2)
