Description
Caveat
This is on custom hardware, running mainline LittleFS code with the addition of #5166.
Setup
STM32F765 with a 32MB QSPI flash chip.
The flash is manually partitioned in boot.py, with a FAT partition in the lower half and a LittleFS v1 partition in the upper half.
import pyb
import uos

if is_unpartitioned:  # flag determined earlier in boot.py
    flash_size = 32 * 1024 * 1024
    # Starting the second partition 512KB (1 erase page) before the halfway mark means older
    # bootloaders without the qspi addressing fix can still wipe the second partition's fs table
    part_1_len = (flash_size // 2) - (512 * 1024)
    fl1 = pyb.Flash(start=0, len=part_1_len)  # 15.5MiB partition
    fl2 = pyb.Flash(start=part_1_len)         # remaining 16.5MiB
    uos.VfsFat.mkfs(fl1)      # format FAT on partition 1
    uos.mount(fl1, '/flash')
    uos.VfsLfs1.mkfs(fl2)     # format littlefs v1 on partition 2
    uos.mount(fl2, '/lfs')
Issue
A few times in the last couple of days I've seen corruption resulting in a deadlock in the QSPI chip.
I started with a system running fine, with files on both filesystems, but then ran into a MemoryError (heap fragmentation).
Upon restarting, the FatFS filesystem appeared to be wiped. I put some files back on it and restarted.
The code then starts to write a file on the LittleFS partition and deadlocks.
Debugging in C shows I'm in:
lfs1_file_write()
-> lfs1_ctz_extend()
-> lfs1_alloc()
-> lfs1_cache_read()
-> mp_vfs_blockdev_read_ext()
-> spi_bdev_readblocks_raw()
Here it's trying to read num_bytes: 32 from block_num: 34869653. My chip only has 8192 blocks (8192 × 4KiB = 32MiB), so that block number is far outside the device.
This eventually gets down to qspi_read_cmd_qaddr_qdata(), where the deadlock occurs in:
while (!(QUADSPI->SR & QUADSPI_SR_FTF)) {
}
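This wait has no timeout, so a bad transfer just spins forever. As a rough sketch only (not the actual driver change; it assumes mp_hal_ticks_ms() is usable at this point, and the abort/error handling is illustrative since the real function currently returns void), the loop could be bounded like this:

uint32_t start = mp_hal_ticks_ms();
while (!(QUADSPI->SR & QUADSPI_SR_FTF)) {
    // Hypothetical guard: give up after 100ms instead of spinning forever.
    if (mp_hal_ticks_ms() - start > 100) {
        QUADSPI->CR |= QUADSPI_CR_ABORT;  // abort the transfer (illustrative)
        return -MP_ETIMEDOUT;             // placeholder error code
    }
}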
Status
At this stage I don't know whether the root cause is related to my partition configuration, the above PR, the mainline VFS stack, or some knock-on effect from hitting the MemoryError.
Irrespective of the cause of the corruption, though, it would be worth adding some protection to the QSPI flash driver to prevent attempts to read outside the chip's address range, as such reads will always deadlock here.
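As a hedged sketch of what that protection could look like (the function name, signature, and constant below are illustrative, not the existing driver API), a bounds check at the block-device read layer would catch a request like block 34869653 before it ever reaches the QSPI peripheral:

#define FLASH_NUM_BLOCKS (8192)  // 32MiB chip / 4KiB blocks

// Hypothetical guard: reject out-of-range reads instead of issuing a QSPI
// transfer that can never complete.
static int flash_read_blocks_guarded(uint8_t *dest, uint32_t block_num, uint32_t num_blocks) {
    if (block_num >= FLASH_NUM_BLOCKS || num_blocks > FLASH_NUM_BLOCKS - block_num) {
        return -1;  // placeholder error code
    }
    // ... forward to the real low-level QSPI read here ...
    return 0;
}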
I'm continuing to investigate the issue, but wanted to document my running findings as I go.