qstr: Separate hash and len from string data. #7209

Closed: tyomitch wants to merge 2 commits

tyomitch
Contributor

@tyomitch tyomitch commented May 3, 2021

This allows the compiler to merge strings that share a suffix: e.g. "update",
"difference_update" and "symmetric_difference_update"
will all point to the same memory.

No functional change.

   bare-arm:   -12 -0.021%
minimal x86:   +38 +0.026% [incl +40(data)]
   unix x64:  -576 -0.114% [incl +32(data)]
unix nanbox:  -696 -0.158%
      stm32: -1336 -0.344% PYBV10
     cc3200:  -440 -0.240%
    esp8266: -1092 -0.159% GENERIC
      esp32: -1332 -0.093% GENERIC [incl -1408(data)]
        nrf:  -404 -0.273% pca10040
        rp2: -1024 -0.213% PICO
       samd:  -196 -0.190% ADAFRUIT_ITSYBITSY_M4_EXPRESS
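
To illustrate the merging described above, here is a minimal Python sketch, not the actual MicroPython or toolchain code. It packs NUL-terminated strings into one blob and lets a string that is the tail of a longer one point into it. It then repeats the exercise with a header in front of each string to show why the old layout could not share bytes. The helper names (`qstr_hash16`, `tail_merge`, `with_header`) are hypothetical, the hash is a djb2-style stand-in, and the header widths (2 hash bytes, 1 length byte) are assumptions.

```python
def qstr_hash16(data: bytes) -> int:
    """Stand-in 16-bit hash (djb2-style); only used to fill the old-style header."""
    h = 5381
    for b in data:
        h = ((h * 33) ^ b) & 0xFFFFFFFF
    return (h & 0xFFFF) or 1


def tail_merge(records):
    """Pack byte records into one blob, reusing bytes that are already present.

    Longest records go first, so a record that is the tail of a longer one
    resolves to an offset inside it instead of being stored again.
    """
    blob = bytearray()
    offsets = {}
    for rec in sorted(set(records), key=len, reverse=True):
        pos = blob.find(rec)
        if pos < 0:
            pos = len(blob)
            blob += rec
        offsets[rec] = pos
    return bytes(blob), offsets


names = ["update", "difference_update", "symmetric_difference_update"]

# New layout: the string data is just the NUL-terminated text.
plain = [n.encode() + b"\0" for n in names]


def with_header(n):
    """Old layout (assumed): 2 hash bytes + 1 length byte in front of the text."""
    data = n.encode()
    return qstr_hash16(data).to_bytes(2, "little") + bytes([len(data)]) + data + b"\0"


headered = [with_header(n) for n in names]

plain_blob, plain_offsets = tail_merge(plain)
headered_blob, _ = tail_merge(headered)

print("bytes with header:", len(headered_blob), "bytes without:", len(plain_blob))
for n, rec in zip(names, plain):
    print(n, "-> offset", plain_offsets[rec])
```

In this sketch the three names occupy 28 bytes of string data once the header is gone, because the two shorter names resolve to offsets inside the longest one. With the header in place nothing can be shared, since the bytes in front of "update" inside the longer string are text rather than the hash and length of "update".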

@dpgeorge
Member

dpgeorge commented May 3, 2021

Wow, great find!

Need to run the performance benchmark to see if it has any impact there. Also see what the change is for RAM usage.

dpgeorge added the py-core label (Relates to py/ directory in source) May 3, 2021
tyomitch added 2 commits May 6, 2021 01:31
Originally at adafruit#4583

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
Originally at adafruit#4707

Signed-off-by: Artyom Skrobov <tyomitch@gmail.com>
@dpgeorge
Member

I ran the performance benchmark on this PR, against master at 70f50c4. This is the result (perf0 is master, perf1 is this PR):

diff of scores (higher is better)
N=100 M=100                   perf0 ->      perf1         diff      diff% (error%)
bm_chaos.py                  308.47 ->     303.92 :      -4.55 =  -1.475% (+/-0.01%)
bm_fannkuch.py                77.06 ->      76.96 :      -0.10 =  -0.130% (+/-0.00%)
bm_fft.py                   2453.01 ->    2463.74 :     +10.73 =  +0.437% (+/-0.00%)
bm_float.py                 4909.52 ->    4826.24 :     -83.28 =  -1.696% (+/-0.00%)
bm_hexiom.py                  35.16 ->      35.47 :      +0.31 =  +0.882% (+/-0.00%)
bm_nqueens.py               4195.59 ->    4190.63 :      -4.96 =  -0.118% (+/-0.00%)
bm_pidigits.py               648.14 ->     647.14 :      -1.00 =  -0.154% (+/-0.29%)
misc_aes.py                  365.74 ->     367.25 :      +1.51 =  +0.413% (+/-0.00%)
misc_mandel.py              3010.53 ->    3019.65 :      +9.12 =  +0.303% (+/-0.00%)
misc_pystone.py             1938.93 ->    1943.80 :      +4.87 =  +0.251% (+/-0.01%)
misc_raytrace.py             309.49 ->     304.47 :      -5.02 =  -1.622% (+/-0.00%)
viper_call0.py               584.31 ->     584.83 :      +0.52 =  +0.089% (+/-0.11%)
viper_call1a.py              556.62 ->     557.64 :      +1.02 =  +0.183% (+/-0.13%)
viper_call1b.py              442.75 ->     442.90 :      +0.15 =  +0.034% (+/-0.02%)
viper_call1c.py              447.41 ->     447.64 :      +0.23 =  +0.051% (+/-0.03%)
viper_call2a.py              542.77 ->     543.22 :      +0.45 =  +0.083% (+/-0.09%)
viper_call2b.py              385.41 ->     385.91 :      +0.50 =  +0.130% (+/-0.10%)

That's more or less no change: every benchmark is within about ±1.7%.

@tyomitch
Contributor Author

Ping?

@dpgeorge
Member

Against latest master, the size diff is now:

   bare-arm:    -4 -0.007% 
minimal x86:  +150 +0.092% [incl +48(data)]
   unix x64:  -608 -0.118% 
unix nanbox:  -572 -0.126% [incl +32(data)]
      stm32: -1392 -0.352% PYBV10
     cc3200:  -448 -0.244% 
    esp8266: -1208 -0.173% GENERIC
      esp32: -1028 -0.068% GENERIC [incl -1020(data)]
        nrf:  -440 -0.252% pca10040
        rp2: -1072 -0.217% PICO
       samd:  -368 -0.264% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Performance change on PYBv1.1 is:

diff of scores (higher is better)
N=100 M=100                pybv10-perf0 -> pybv10-perf1         diff      diff% (error%)
bm_chaos.py                    365.21 ->     371.06 :      +5.85 =  +1.602% (+/-0.01%)
bm_fannkuch.py                  77.51 ->      78.72 :      +1.21 =  +1.561% (+/-0.00%)
bm_fft.py                     2559.69 ->    2591.71 :     +32.02 =  +1.251% (+/-0.00%)
bm_float.py                   5993.07 ->    6034.82 :     +41.75 =  +0.697% (+/-0.02%)
bm_hexiom.py                    48.01 ->      48.96 :      +0.95 =  +1.979% (+/-0.00%)
bm_nqueens.py                 4475.93 ->    4510.62 :     +34.69 =  +0.775% (+/-0.00%)
bm_pidigits.py                 647.87 ->     651.87 :      +4.00 =  +0.617% (+/-0.22%)
core_import_mpy_multi.py       516.38 ->     564.77 :     +48.39 =  +9.371% (+/-0.01%)
core_import_mpy_single.py       59.33 ->      68.67 :      +9.34 = +15.742% (+/-0.01%)
core_qstr.py                    49.99 ->      64.16 :     +14.17 = +28.346% (+/-0.00%)
core_yield_from.py             363.99 ->     362.58 :      -1.41 =  -0.387% (+/-0.00%)
misc_aes.py                    425.57 ->     429.69 :      +4.12 =  +0.968% (+/-0.01%)
misc_mandel.py                3462.57 ->    3485.14 :     +22.57 =  +0.652% (+/-0.00%)
misc_pystone.py               2465.46 ->    2496.52 :     +31.06 =  +1.260% (+/-0.01%)
misc_raytrace.py               380.58 ->     381.47 :      +0.89 =  +0.234% (+/-0.01%)
viper_call0.py                 576.49 ->     576.73 :      +0.24 =  +0.042% (+/-0.03%)
viper_call1a.py                550.03 ->     550.37 :      +0.34 =  +0.062% (+/-0.03%)
viper_call1b.py                435.40 ->     438.23 :      +2.83 =  +0.650% (+/-0.22%)
viper_call1c.py                441.73 ->     442.84 :      +1.11 =  +0.251% (+/-0.12%)
viper_call2a.py                535.37 ->     536.31 :      +0.94 =  +0.176% (+/-0.14%)
viper_call2b.py                380.97 ->     382.34 :      +1.37 =  +0.360% (+/-0.19%)

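The new benchmarks that measure qstr interning now show quite good improvements: core_import_mpy_multi.py, core_import_mpy_single.py and core_qstr.py gain roughly 9%, 16% and 28% respectively. The other benchmarks are pretty much unchanged (all within about 2%). That is good.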
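
For reading these tables: the `diff` and `diff%` columns appear to be the score change relative to the baseline (`perf0`). The small Python sketch below recomputes one row under that assumption. `score_diff` is a hypothetical helper, not part of the benchmark tooling.

```python
def score_diff(perf0: float, perf1: float):
    """Return (diff, diff%) of a benchmark score, relative to the baseline perf0."""
    diff = perf1 - perf0
    return diff, 100.0 * diff / perf0


# core_qstr.py row from the PYBv1.1 table: 49.99 -> 64.16
diff, pct = score_diff(49.99, 64.16)
print(f"{diff:+.2f} = {pct:+.3f}%")
```

With the core_qstr.py scores this gives +14.17 and +28.346%, matching the table, so a higher score on the right-hand side reads directly as a relative speed-up over master.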

The esp32 port (GENERIC on TinyPICO) shows a similar pattern to PYBv1.1, with much larger gains on the qstr-related benchmarks:

diff of scores (higher is better)
N=100 M=100                esp32-perf0 -> esp32-perf1         diff      diff% (error%)
bm_chaos.py                    301.74 ->     292.94 :      -8.80 =  -2.916% (+/-0.01%)
bm_fannkuch.py                  77.43 ->      77.45 :      +0.02 =  +0.026% (+/-0.01%)
bm_fft.py                     2340.59 ->    2381.94 :     +41.35 =  +1.767% (+/-0.00%)
bm_float.py                   5025.37 ->    4486.82 :    -538.55 = -10.717% (+/-0.18%)
bm_hexiom.py                    31.95 ->      35.90 :      +3.95 = +12.363% (+/-0.02%)
bm_nqueens.py                 2291.45 ->    2313.04 :     +21.59 =  +0.942% (+/-0.03%)
bm_pidigits.py                 662.89 ->     665.97 :      +3.08 =  +0.465% (+/-0.18%)
core_import_mpy_multi.py        98.04 ->     272.48 :    +174.44 = +177.927% (+/-0.48%)
core_import_mpy_single.py       13.00 ->      49.13 :     +36.13 = +277.923% (+/-0.21%)
core_qstr.py                    14.95 ->      76.87 :     +61.92 = +414.181% (+/-0.11%)
core_yield_from.py             109.32 ->     109.34 :      +0.02 =  +0.018% (+/-0.00%)
misc_aes.py                    345.72 ->     345.55 :      -0.17 =  -0.049% (+/-0.01%)
misc_mandel.py                2047.04 ->    2205.99 :    +158.95 =  +7.765% (+/-0.01%)
misc_pystone.py               1751.29 ->    1797.49 :     +46.20 =  +2.638% (+/-0.02%)
misc_raytrace.py               315.02 ->     319.06 :      +4.04 =  +1.282% (+/-0.01%)
viper_call0.py                 273.15 ->     273.16 :      +0.01 =  +0.004% (+/-0.00%)
viper_call1a.py                269.46 ->     269.47 :      +0.01 =  +0.004% (+/-0.00%)
viper_call1b.py                228.22 ->     228.23 :      +0.01 =  +0.004% (+/-0.00%)
viper_call1c.py                228.87 ->     228.88 :      +0.01 =  +0.004% (+/-0.00%)
viper_call2a.py                267.21 ->     267.22 :      +0.01 =  +0.004% (+/-0.00%)
viper_call2b.py                209.39 ->     209.40 :      +0.01 =  +0.005% (+/-0.00%)

The large improvements are probably because the qstr hashes are now stored together, separate from the string data, so they are much more localised in flash and have better cache behaviour.
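
The locality argument can be sketched with a rough Python model. It counts how many cache lines a lookup touches when it reads every stored hash, first with each hash sitting in front of its string, then with all hashes in one contiguous array. The cache line size, the field widths and the generated names are all assumptions for illustration. The model ignores everything except the hash reads, so it shows the direction of the effect rather than predicting the measured numbers.

```python
CACHE_LINE = 32  # assumed cache line size in bytes (illustrative only)
HASH_BYTES = 2   # assumed width of a stored hash
LEN_BYTES = 1    # assumed width of a stored length


def lines_touched(addresses, width):
    """Count distinct cache lines covered by reads of `width` bytes at each address."""
    lines = set()
    for a in addresses:
        lines.add(a // CACHE_LINE)
        lines.add((a + width - 1) // CACHE_LINE)
    return len(lines)


def interleaved_hash_addresses(names):
    """Old-style layout: [hash][len][text][NUL] records placed back to back."""
    addrs, pos = [], 0
    for n in names:
        addrs.append(pos)
        pos += HASH_BYTES + LEN_BYTES + len(n.encode()) + 1
    return addrs


def separate_hash_addresses(names):
    """New-style layout: hashes live in their own contiguous array."""
    return [i * HASH_BYTES for i in range(len(names))]


# 64 made-up names of 8 characters each.
names = ["name_%03d" % i for i in range(64)]

old_lines = lines_touched(interleaved_hash_addresses(names), HASH_BYTES)
new_lines = lines_touched(separate_hash_addresses(names), HASH_BYTES)
print("cache lines touched, interleaved:", old_lines, "separate:", new_lines)
```

Under these assumptions, scanning the hashes walks across the whole string pool in the interleaved layout but stays within a handful of lines in the separate layout, which is consistent with the esp32 results improving most on the benchmarks that intern many qstrs.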

@dpgeorge
Member

Rebased and merged in 18b1ba0 and f46a714 (without any changes).

This improvement will also help with #8191, because the qstr pools will be able to reference string data directly in a memory-mapped .mpy file.

Thanks @tyomitch for the contribution, and for the clean commits!

@dpgeorge dpgeorge closed this Feb 11, 2022
tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Nov 28, 2022
Save code space by packing rgbw values into C union