-
Another approach to passing 32-bit (rather than 30-bit) ints to a Viper or asm function is to populate a 32-bit integer array with the values and pass the array as an arg.
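A minimal sketch of the array approach, runnable in plain Python. The two values are hypothetical examples chosen because they exceed the small-int range; the viper side is not shown and is assumed to receive the array as a ptr32 argument and read the words by index.

```python
from array import array

# Hypothetical values for illustration: both need the full 32 bits.
values = array("I", [0xD0000000, 0xFFFFFFFF])

# Each element is stored as a raw machine word, so no integer object
# has to be unpacked when the words are read back through a pointer.
# Assumption: "I" is 4 bytes wide, as it is on common platforms.
total_bytes = values.itemsize * len(values)
print(values.itemsize, total_bytes, hex(values[0]))
```

On the MicroPython side the assumption is that a viper function declared with a ptr32 parameter can be handed this array directly, so the large constants never appear as literals inside the viper code.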
-
The one I had high hopes might work was

    SIO = const(0xd0000000)
    @micropython.viper
    def callback(_):
        sio = ptr32(SIO)

but even that involves a 700ns lookup. I assume the const substitution happens before the viper decorator even gets a chance to see it?

On the subject of array() and bytearray(), if

For me, this 100x difference really hammered home the importance of taking local pointers once at the start of a viper function, rather than doing any namespace lookups inside tight loops, as the documentation already cautions.

I see that

    x[0] = 324
    x[1] = 324
    x[2] = 324

takes 40ns whereas

    for i in range(3):
        x[i] = 324

takes 560ns and even

    i = int(0)
    while i < 3:
        x[i] = 324
        i = i + 1

takes 510ns, although luckily viper still doesn't need to allocate for the range(3). (Within viper, I don't think I need the int(0) there - it would automatically be int32 when assigned as 0?)

It'd be interesting to build a debug version of the code emitters and see what they're actually doing here. I'll need to work out how, though...
-
Yes, I found the documentation and started to dig through the emitnative.c code generator a bit too, although I haven't yet figured out how to build a debug version of mpy-cross that logs the generated assembler from viper so I can experiment with its behaviour more directly. (I rather lazily started disassembling the relevant bytes of the .mpy file, but that's a horrific way to do it!)

My measurement harness is as simple as

    from machine import Pin, Timer
    import micropython
    Pin(0, Pin.OUT)
    @micropython.viper
    def callback(_):
        sio = ptr32(uint(0xd0) << 24)
        sio[6] = 1
        # Insert something here
        sio[8] = 1
    timer = Timer(freq = 1000, mode = Timer.PERIODIC,
                  callback = callback, hard = True)

where the block to test is pasted inline. (Obviously it can't use the predefined sio as that would be cheating.) Without any additions, this produces a 14ns pulse: essentially the two cycles to write sio[8]. I can offset the trigger to measure just the extra time added by the code spliced in. Inserting
I can't find any way to write the fast version that has the literal 0xd0000000 in it, nor any way to express addresses between 0x40000000 and 0xc0000000 faster or more clearly than a shifted smaller constant. If Python had macros or define-time/inline expanded functions, I could write a helper that takes a define-time constant and emits something that viper will optimise well, but sadly Python is not Scheme.

Yes, I expected the while loop to be super-fast too, and I don't really understand why it's not. With no other code spliced in than:

    i = 0
    while i < 4:
        i = i + 1

the loop over four values of i doing nothing still costs 390ns or about 59 cycles.

    i = 4
    while i:
        i = i - 1

is a little bit better at 240ns but still not brilliant: I guess about 34 cycles?
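For anyone wanting to check the ns-to-cycles figures, here is a small sketch of the arithmetic in plain Python. It assumes the default 150MHz clock quoted in the original post; the function name is just for illustration.

```python
CLOCK_HZ = 150_000_000  # assumption: default rp2350 clock from the original post


def ns_to_cycles(ns, clock_hz=CLOCK_HZ):
    """Convert a duration in nanoseconds to CPU cycles at clock_hz."""
    return ns * clock_hz / 1_000_000_000


# Durations quoted in this thread: the bare pulse and the two empty loops.
for ns in (14, 240, 390):
    print(f"{ns} ns is about {ns_to_cycles(ns):.1f} cycles")
```

By this arithmetic 390ns is 58.5 cycles and 240ns is 36 cycles, which is in line with the rough figures quoted in the comment.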
-
Nice, there's basically zero jitter in the overhead on your test harness. Saves firing up a scope! I added two extra columns, one for
PS For me the RVR register comes up as zero so I added
-
Here's a slightly boiled down version which disables interrupts during the test:

    import machine
    import micropython
    @micropython.viper
    def test() -> uint:
        ppb = ptr32(-0x20000000)  # PPB = 0xe0000000
        ppb[0x3804] = 0b101       # CSR = CLKSOURCE | ENABLE
        ppb[0x3805] = -1          # RVR = maximum
        state = machine.disable_irq()
        t0 = ppb[0x3806]          # t0 = CVR
        t1 = ppb[0x3806]          # t1 = CVR
        x = ptr32(-0x30000000)    # Line to benchmark
        t2 = ppb[0x3806]          # t2 = CVR
        machine.enable_irq(state)
        # Difference between t1 - t2 and t0 - t1 masked with RVR:
        return uint(t1 - t2 - t0 + t1) & ppb[0x3805]
    print(*(test() for _ in range(20)))

and some corresponding results for variants you and I have measured above:

When I look at the initially puzzling difference between
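The arithmetic in that harness can be checked off-target. This is a plain-Python model of it, not viper code: the helper names are hypothetical, and the sample counter readings are made up for illustration. It shows how the negative base maps to 0xe0000000, how the word indices map to the SysTick register addresses, and why masking with the 24-bit reload value keeps the result correct when the down-counter wraps.

```python
RVR_MASK = 0x00FFFFFF  # SysTick is a 24-bit down-counter


def as_unsigned32(value):
    """View a (possibly negative) Python int as an unsigned 32-bit word."""
    return value & 0xFFFFFFFF


def register_address(base, word_index):
    """Byte address reached by indexing a 32-bit pointer at base."""
    return as_unsigned32(base) + 4 * word_index


def net_cycles(t0, t1, t2, mask=RVR_MASK):
    """(t1 - t2) is the benchmarked line plus one counter read;
    (t0 - t1) is one counter read alone. The counter counts down,
    so earlier readings are larger unless it wrapped."""
    return ((t1 - t2) - (t0 - t1)) & mask


print(hex(register_address(-0x20000000, 0x3804)))  # CSR
print(hex(register_address(-0x20000000, 0x3806)))  # CVR
print(net_cycles(100, 98, 88))       # made-up readings, no wrap
print(net_cycles(5, 3, 0xFFFFF9))    # made-up readings, counter wrapped
```

Both sets of sample readings describe the same situation (2 cycles of read overhead, 10 cycles between t1 and t2), so both should give the same net figure regardless of the wrap.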
-
Another curious factor with gpio timing is that:

    p = Pin(10)
    p(1)  # set pin high

is actually noticeably faster than:

    p = Pin(10)
    p.value(1)  # set pin high

This is because in the second case there's a dictionary lookup internally to find the
-
Interesting, and 420ish cycles vs 700ish cycles (a bit jittery): almost twice as fast as you say. I didn't know you could call pin objects directly like that! I wondered if I'd overlooked it when reading the documentation. The machine.Pin reference does mention

Maybe I should cook up a docs PR?

Another fun one I stumbled across: MicroPython interns strings, so comparing two strings in viper is cheap and constant time, as is comparing two pin objects (say). There isn't an

[Edit after reading the code: no, comparing string reprs wouldn't be cheap like comparing objects is, because
-
While measuring jitter on hard vs soft IRQs on rp2350 with a scope, I got distracted into benchmarking different ways to do fast GPIO from a viper hard IRQ handler. I found the numbers interesting, so thought I'd post them in case anyone else is interested too.

As a baseline, with the default 150MHz clock frequency, if pin = machine.Pin(0, machine.Pin.OUT), calling pin.value(1) from a viper function takes around 4us with roughly 700ns jitter.

A more direct mem32[0xd0000018] = 1 takes about 2us with c. 500ns jitter.

Of course, viper can write memory directly and this is much faster. If sio is a ptr32 to 0xd0000000, the directly equivalent sio[6] = 1 takes only 14ns.

But there's a little trap here: initialising sio = ptr32(0xd0000000) takes 700ns! The same goes for sio = ptr32(13 << 28) or sio = ptr32(int(0xd0000000)). However, sio = ptr32(int(13) << 28) or sio = ptr32(int(0xd0) << 24) are fine and take just 26ns.

Even though we're using viper, the argument to ptr32()/int() is a python integer and if that's more than 30 bits we end up dereferencing an object. I understand why this happens but still managed to forget and end up surprised by it!

Similarly, something like ptr8(0)[0xd0000018] won't work at all because we're trying to index zero with an object not a machine int. However, ptr32(0)[0x34000006] = 1 works fine and is fast at 40ns (= 14ns + 26ns).

Final measurement: if we define

then calling wibble() from another viper function costs 1.5us, so it can be quite costly to break up a viper callback in the absence of any way to create a define-time macro.
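The address arithmetic behind those workarounds can be sanity-checked in plain Python. This is only a sketch: the helper names are hypothetical, and the small-int range is an assumption (30 bits of magnitude plus sign on 32-bit ports, matching the "more than 30 bits" observation above). It does not model viper itself, where the shift also has to be applied to a viper-typed operand such as int(13) for the fast path to apply.

```python
# Assumption: small ints on 32-bit ports span -2**30 .. 2**30 - 1.
SMALL_INT_MIN = -(1 << 30)
SMALL_INT_MAX = (1 << 30) - 1


def fits_small_int(value):
    """True if value can be held without allocating an integer object."""
    return SMALL_INT_MIN <= value <= SMALL_INT_MAX


def split_for_shift(address):
    """Return (base, shift) such that address == base << shift,
    with base made as small as possible by stripping trailing zero bits."""
    base, shift = address, 0
    while base and base % 2 == 0:
        base >>= 1
        shift += 1
    return base, shift


print(fits_small_int(0xD0000000))        # the literal that was slow
print(split_for_shift(0xD0000000))       # small base plus shift
print(hex(0xD0000018 >> 2))              # word index usable with ptr32(0)
print(fits_small_int(0xD0000018 >> 2))   # and it fits a small int
```

The split recovers the 13 << 28 form used in the post, and the word index 0x34000006 is the byte address 0xd0000018 divided by four, which is why it stays under the small-int limit while the byte address does not.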