Skip to content

Fix PIO I2C race condition #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

dteal
Copy link

@dteal dteal commented Feb 21, 2025

This is a (hacky) fix for a race condition bug in the PIO I2C code that breaks I2C reads.

In many cases when reading data over the I2C bus via PIO, the correct data is received over the wire (as seen on a logic analyzer) but is incorrectly reported by the PIO state machine, with the result that I2C reads don't work. This problem has already been observed in the following threads and issue:

[1] https://forums.raspberrypi.com/viewtopic.php?t=356883
[2] https://forums.raspberrypi.com/viewtopic.php?t=340111
[3] #168

Testing suggests the (incorrect) bytes reported by the PIO are the expected I2C byte stream but shifted by 7 (!) bits. For example, the thread [2] reported the following examples of incorrectly read byte sequences. The I2C bus sees an 8 bytes for the address + read bit, then 32 bytes of data:

Data on I2C Bus        Data in receive buffer
0xED 0xFF 0x8B 0x00    0x77 0xDB 0xFF 0x16
0xD5 0xFF 0x8B 0x00    0x77 0xAB 0xFF 0x16
0xE1 0xFF 0x8B 0x00    0x77 0xC3 0xFF 0x16
0xD5 0xFF 0x7B 0x00    0x77 0xAB 0xFE 0xF6

This makes the following sequences in binary form, where we can see the PIO data is offset 7 bits from what is visible on the I2C bus:

bus 1:        11101101111111111000101100000000
pio 1: 01110111110110111111111100010110
bus 2:        11010101111111111000101100000000
pio 2: 01110111101010111111111100010110
bus 3:        11100001111111111000101100000000
pio 3: 01110111110000111111111100010110
bus 4:        11010101111111110111101100000000
pio 4: 01110111101010111111111011110110

I have replicated all this with a BH1750 light sensor and FDC2112 capacitance sensor on both a RP2040 and RP2350, and I can further confirm the mysterious first 7 bits reported by the PIO appear to be the last 7 bits of the (address + read bit) byte sent over the I2C bus prior to the expected data. The PIO is correctly reading the bus but has somehow shifted the result by 7 bits.

My best explanation for this is as follows:

First, I suspect the PIO program itself works as expected. Every time it reads or writes a byte on the bus, it puts 8 bits in the ISR (note this happens for both reads and writes, which the PIO treats identically). Autopush should be enabled, moving 8 bits at a time into the RX FIFO. To write a byte over I2C, the main core should put a byte into the state machine TX FIFO, wait for the state machine, then read and discard a byte from the RX FIFO. To read a byte, the main core should but 0xff into the TX FIFO, then read the result from the RX FIFO.

However, the C program running on the main core messes with this. In pio_i2c.c, the function pio_i2c_write_blocking() first calls pio_i2c_rx_enable(), which disables the PIO state machine autopush via direct register manipulation. This is done so that the PIO doesn't put anything on the RX FIFO, so bytes don't have to be read then discarded. The function pio_i2c_write_blocking() re-enables autopush.

Crucially, enabling/disabling autopush is not synced to the PIO state machine clock.

I haven't been able to figure out exactly when things break (I don't know why the shift seems is reliably 7 bits instead of a random number depending on the setup), but the following is a possibility:

  • Autopush is disabled.
  • 1 byte is placed in the TX FIFO, then put onto the bus. The state machine ISR gets 8 bits and the input shift counter is increased to 8. Nothing else happens because autopush is disabled.
  • 1 more byte is placed in the TX FIFO. However, after the first bit is put on the bus (making the shift counter 9), autopush is re-enabled and immediately PUSHes the ISR to the RX FIFO and resets the input shift counter. But then the remaining 7 bits are sent and put in the ISR.
  • Every subsequent byte in the PIO state machine has its first bit autopushed with the last 7 bits of the previous byte.

Even though the exact race condition hasn't been pinned down, there's enough evidence to create solutions that seem to work:

  • A. Remove pio_i2c_rx_enable() and have pio_i2c_write_blocking() read and discard the resulting RX FIFO dummy bytes. I confirmed this seemed to work in at least one case. This is similar to the solution noted in [2].
  • B. Ensure the PIO state machine bytes are always aligned by continually resetting the ISR. This has the advantage that any other race condition bugs in the code will at most make a single byte on the I2C bus incorrect; the next bytes will be automatically re-aligned. This is the solution proposed in [1] and [3].
  • C. More thoroughly rewrite pio_i2c.c with more caution for potential race conditions.

I chose to implement solution B by adding a mov isr, null instruction to the PIO assembly since this protects against all race conditions. Since this is a patch rather than fixing the ultimate root issue, it is possible there exist some setups where there is still edge case broken behavior, but I was not able to find any. Further implementing solution A might be wise in the future because it probably fixes whatever the root issue is. Solution C might be helpful in the long term as there are some other suspicious bits of code (e.g., emptying the RX FIFO buffer via while (!pio_sm_is_rx_fifo_empty(pio, sm)){(void)pio_i2c_get(pio, sm);} at the start of pio_i2c_read_blocking() seems like it should be unnecessary).

@peterharperuk peterharperuk added this to the 2.1.2 milestone Feb 21, 2025
@peterharperuk
Copy link
Contributor

Thanks for the analysis. I'll see if I can reproduce the problem with a BH1750.

@kilograham kilograham requested a review from Wren6991 March 24, 2025 23:05
@kilograham
Copy link
Contributor

cc @Wren6991 for thoughts on the underlying issue

@Wren6991
Copy link
Contributor

Wren6991 commented Jul 21, 2025

Just thinking through this part on the potential race condition:

I haven't been able to figure out exactly when things break (I don't know why the shift seems is reliably 7 bits instead of a random number depending on the setup), but the following is a possibility:

  • Autopush is disabled.
  • 1 byte is placed in the TX FIFO, then put onto the bus. The state machine ISR gets 8 bits and the input shift counter is increased to 8. Nothing else happens because autopush is disabled.
  • 1 more byte is placed in the TX FIFO. However, after the first bit is put on the bus (making the shift counter 9), autopush is re-enabled and immediately PUSHes the ISR to the RX FIFO and resets the input shift counter. But then the remaining 7 bits are sent and put in the ISR.
  • Every subsequent byte in the PIO state machine has its first bit autopushed with the last 7 bits of the previous byte.

It's my belief that if you write to SHIFTCTRL.AUTOPUSH, and then write to the TX FIFO, all the instructions which observe the TX FIFO write will also observe the SHIFTCTRL write.

It's more likely that the input shift counter is nonzero at the point autopush is enabled. The only way I could see that happening is if this function were somehow called at the wrong time:

void pio_i2c_resume_after_error(PIO pio, uint sm) {
    pio_sm_drain_tx_fifo(pio, sm);
    pio_sm_exec(pio, sm, (pio->sm[sm].execctrl & PIO_SM0_EXECCTRL_WRAP_BOTTOM_BITS) >> PIO_SM0_EXECCTRL_WRAP_BOTTOM_LSB);
    pio_interrupt_clear(pio, sm);
}

...since this is the only point that we directly touch the SM's program counter.

The workaround given here makes sense under that theory (I would still like to understand how the shift counter ends up nonzero!) and the only change I would make would be to move the counter clear into the pio_i2c_start()/pio_i2c_repstart() sequences to save on PIO instruction RAM space:

diff --git a/pio/i2c/pio_i2c.c b/pio/i2c/pio_i2c.c
index a3a19f5..7028fce 100644
--- a/pio/i2c/pio_i2c.c
+++ b/pio/i2c/pio_i2c.c
@@ -67,9 +67,10 @@ uint8_t pio_i2c_get(PIO pio, uint sm) {
 }
 
 void pio_i2c_start(PIO pio, uint sm) {
-    pio_i2c_put_or_err(pio, sm, 1u << PIO_I2C_ICOUNT_LSB); // Escape code for 2 instruction sequence
+    pio_i2c_put_or_err(pio, sm, 2u << PIO_I2C_ICOUNT_LSB);                         // Escape code for 3 instruction sequence
     pio_i2c_put_or_err(pio, sm, set_scl_sda_program_instructions[I2C_SC1_SD0]);    // We are already in idle state, just pull SDA low
     pio_i2c_put_or_err(pio, sm, set_scl_sda_program_instructions[I2C_SC0_SD0]);    // Also pull clock low so we can present data
+    pio_i2c_put_or_err(pio, sm, pio_encode_mov(pio_isr, pio_null));                // Ensure ISR counter is clear
 }
 
 void pio_i2c_stop(PIO pio, uint sm) {
@@ -80,11 +81,12 @@ void pio_i2c_stop(PIO pio, uint sm) {
 };
 
 void pio_i2c_repstart(PIO pio, uint sm) {
-    pio_i2c_put_or_err(pio, sm, 3u << PIO_I2C_ICOUNT_LSB);
+    pio_i2c_put_or_err(pio, sm, 4u << PIO_I2C_ICOUNT_LSB);
     pio_i2c_put_or_err(pio, sm, set_scl_sda_program_instructions[I2C_SC0_SD1]);
     pio_i2c_put_or_err(pio, sm, set_scl_sda_program_instructions[I2C_SC1_SD1]);
     pio_i2c_put_or_err(pio, sm, set_scl_sda_program_instructions[I2C_SC1_SD0]);
     pio_i2c_put_or_err(pio, sm, set_scl_sda_program_instructions[I2C_SC0_SD0]);
+    pio_i2c_put_or_err(pio, sm, pio_encode_mov(pio_isr, pio_null));
 }
 
 static void pio_i2c_wait_idle(PIO pio, uint sm) {

@Wren6991
Copy link
Contributor

It's more likely that the input shift counter is nonzero at the point autopush is enabled. The only way I could see that happening is if this function were somehow called at the wrong time:

Ah, no, it's just because the ISR counter increments during the write (even though autopush is disabled) and is never cleared. The next in after autopush is enabled shifts 1 more bit through the ISR and then pushes. There's no race, it's just a bug, and clearing the ISR counter between the last shift of a write and the first shift of a read is the correct fix.

Wren6991 added a commit that referenced this pull request Jul 21, 2025
This avoids a spurious shift on a write followed by a read. See discussion on #617
Wren6991 added a commit that referenced this pull request Jul 21, 2025
This avoids a spurious shift on a write followed by a read. See discussion on #617
@Wren6991
Copy link
Contributor

Closing in favour of #676 which is the same thing but without using a PIO instruction memory slot

@Wren6991 Wren6991 closed this Jul 21, 2025
kilograham pushed a commit that referenced this pull request Jul 23, 2025
This avoids a spurious shift on a write followed by a read. See discussion on #617
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy