Skip to content

Use freelist for range object, iterator objects and other often used objects #126703

@eendebakpt

Description

@eendebakpt

Feature or enhancement

Proposal:

We can add freelists for the range object and various iter objects to improve performance. Using the new methods from #121934 the amount of code needed for adding a freelist is quite small. A prototype implementation is here:

main...eendebakpt:cpython:iter_freelists
main...eendebakpt:cpython:range_freelist

@markshannon In faster-cpython/ideas#132 (comment) you noted that freelists should be used for all types. Is adding freelists in a similar style to the already existing freelists a good approach? We could also try to make a more generic construction (for example inside PyObject_New/PyObject_Free), but that would have a much larger impact and I cannot oversee all consequences.

The freelists from the prototype improve performance of range in microbenchmarks:

range(1): Mean +- std dev: [rp_main] 50.0 ns +- 0.4 ns -> [rp_pr] 40.6 ns +- 0.5 ns: 1.23x faster
iter(range(1)): Mean +- std dev: [rp_main] 91.7 ns +- 2.4 ns -> [rp_pr] 68.5 ns +- 0.7 ns: 1.34x faster
list(range(1)): Mean +- std dev: [rp_main] 165 ns +- 1 ns -> [rp_pr] 144 ns +- 1 ns: 1.15x faster
list(iter(range(1))): Mean +- std dev: [rp_main] 240 ns +- 17 ns -> [rp_pr] 229 ns +- 13 ns: 1.05x faster
range(2, 1): Mean +- std dev: [rp_main] 61.6 ns +- 0.6 ns -> [rp_pr] 46.7 ns +- 0.4 ns: 1.32x faster
iter(range(2, 1)): Mean +- std dev: [rp_main] 100 ns +- 4 ns -> [rp_pr] 79.6 ns +- 4.0 ns: 1.26x faster
list(iter(range(2, 1))): Mean +- std dev: [rp_main] 227 ns +- 3 ns -> [rp_pr] 209 ns +- 4 ns: 1.08x faster
for loop length 1: Mean +- std dev: [rp_main] 146 ns +- 4 ns -> [rp_pr] 126 ns +- 6 ns: 1.16x faster
range(10): Mean +- std dev: [rp_main] 51.7 ns +- 2.4 ns -> [rp_pr] 41.0 ns +- 0.9 ns: 1.26x faster
iter(range(10)): Mean +- std dev: [rp_main] 92.7 ns +- 3.5 ns -> [rp_pr] 68.3 ns +- 2.2 ns: 1.36x faster
list(range(10)): Mean +- std dev: [rp_main] 205 ns +- 4 ns -> [rp_pr] 184 ns +- 9 ns: 1.11x faster
list(iter(range(10))): Mean +- std dev: [rp_main] 279 ns +- 7 ns -> [rp_pr] 261 ns +- 5 ns: 1.07x faster
range(2, 10): Mean +- std dev: [rp_main] 57.7 ns +- 1.5 ns -> [rp_pr] 48.5 ns +- 2.2 ns: 1.19x faster
iter(range(2, 10)): Mean +- std dev: [rp_main] 101 ns +- 3 ns -> [rp_pr] 78.1 ns +- 1.1 ns: 1.29x faster
list(iter(range(2, 10))): Mean +- std dev: [rp_main] 284 ns +- 17 ns -> [rp_pr] 262 ns +- 10 ns: 1.08x faster
for loop length 10: Mean +- std dev: [rp_main] 329 ns +- 7 ns -> [rp_pr] 295 ns +- 5 ns: 1.12x faster
range(100): Mean +- std dev: [rp_main] 52.3 ns +- 2.2 ns -> [rp_pr] 41.1 ns +- 1.3 ns: 1.27x faster
iter(range(100)): Mean +- std dev: [rp_main] 91.7 ns +- 3.5 ns -> [rp_pr] 69.6 ns +- 2.1 ns: 1.32x faster
list(range(100)): Mean +- std dev: [rp_main] 653 ns +- 27 ns -> [rp_pr] 603 ns +- 29 ns: 1.08x faster
list(iter(range(100))): Mean +- std dev: [rp_main] 701 ns +- 17 ns -> [rp_pr] 671 ns +- 17 ns: 1.04x faster
range(2, 100): Mean +- std dev: [rp_main] 58.5 ns +- 2.8 ns -> [rp_pr] 48.1 ns +- 2.3 ns: 1.22x faster
iter(range(2, 100)): Mean +- std dev: [rp_main] 99.6 ns +- 1.5 ns -> [rp_pr] 78.4 ns +- 0.8 ns: 1.27x faster
list(iter(range(2, 100))): Mean +- std dev: [rp_main] 710 ns +- 28 ns -> [rp_pr] 674 ns +- 16 ns: 1.05x faster
for loop length 100: Mean +- std dev: [rp_main] 2.06 us +- 0.07 us -> [rp_pr] 1.98 us +- 0.11 us: 1.04x faster
range(400): Mean +- std dev: [rp_main] 71.3 ns +- 2.2 ns -> [rp_pr] 61.7 ns +- 5.1 ns: 1.16x faster
iter(range(400)): Mean +- std dev: [rp_main] 112 ns +- 5 ns -> [rp_pr] 89.5 ns +- 2.8 ns: 1.25x faster
list(iter(range(400))): Mean +- std dev: [rp_main] 3.88 us +- 0.18 us -> [rp_pr] 3.77 us +- 0.08 us: 1.03x faster
range(2, 400): Mean +- std dev: [rp_main] 79.3 ns +- 1.7 ns -> [rp_pr] 66.0 ns +- 3.7 ns: 1.20x faster
iter(range(2, 400)): Mean +- std dev: [rp_main] 119 ns +- 2 ns -> [rp_pr] 99.5 ns +- 3.1 ns: 1.19x faster
for loop length 400: Mean +- std dev: [rp_main] 12.1 us +- 0.7 us -> [rp_pr] 10.9 us +- 0.4 us: 1.11x faster

Benchmark hidden because not significant (2): list(range(400)), list(iter(range(2, 400)))

Geometric mean: 1.16x faster
Benchmark script
import pyperf
runner = pyperf.Runner()

loop = """
def g(n):
    x=0
    for ii in range(n):
        x += 1
"""
        
for s in [1, 10, 100, 400]:
	time = runner.timeit(name=f'range({s})', stmt=f"range({s})")
	time = runner.timeit(name=f'iter(range({s}))', stmt=f"iter(range({s}))")
	time = runner.timeit(name=f'list(range({s}))', stmt=f"list(range({s}))")
	time = runner.timeit(name=f'list(iter(range({s})))', stmt=f"list(iter(range({s})))")

	time = runner.timeit(name=f'range(2, {s})', stmt=f"range(2, {s})")
	time = runner.timeit(name=f'iter(range(2, {s}))', stmt=f"iter(range(2, {s}))")
	time = runner.timeit(name=f'list(iter(range(2, {s})))', stmt=f"list(iter(range(2, {s})))")

	time = runner.timeit(name=f'for loop length {s}', stmt=f"g({s})", setup=loop)

``
</details>

On the pyperformance test suite (actually, a subset of the suite, not all benchmarks run on my system) shows the percentage of successfull freelist allocations increases about 1%.

Main:

Allocations from freelist 2,004,971,371 39.8%
Frees to freelist 2,005,350,418
Allocations 3,034,877,938 60.2%
Allocations to 512 bytes 3,008,791,812 59.7%
Allocations to 4 kbytes 18,648,072 0.4%
Allocations over 4 kbytes 7,438,054 0.1%
Frees 3,142,033,922

PR

Allocations from freelist 2,064,126,652 40.8%
Frees to freelist 2,064,499,538
Allocations 2,989,239,063 59.2%
Allocations to 512 bytes 2,963,207,194 58.6%
Allocations to 4 kbytes 18,596,157 0.4%
Allocations over 4 kbytes 7,435,712 0.1%
Frees 3,096,349,170

(Some quick testing shows that most of the allocations not via freelists are due to python int objects, so that is another candidate to use freelists for)



### Has this already been discussed elsewhere?

No response given

### Links to previous discussion of this feature:

Some references to discussions on freelists:

* https://github.com/python/cpython/pull/121934
* https://github.com/faster-cpython/ideas/discussions/132
* https://github.com/python/cpython/issues/91912
* https://github.com/python/cpython/pull/101453

<!-- gh-linked-prs -->
### Linked PRs
* gh-128368
* gh-128592
* gh-128594
* gh-128619
* gh-128692
* gh-129638
* gh-129921
* gh-132319
* gh-135233
<!-- /gh-linked-prs -->

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usagetype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      pFad - Phonifier reborn

      Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

      Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


      Alternative Proxies:

      Alternative Proxy

      pFad Proxy

      pFad v3 Proxy

      pFad v4 Proxy