Replies: 10 comments 27 replies
-
Can add more detail on
-
Small correction: my previous MicroPython version was 1.19, not 1.9.
-
What MQTT client code are you using?
-
umqtt. I've used the same library across a dozen assorted ESP/MicroPython
projects without issue...
--Alexia
On Sat, 14 Jun 2025, 20:06 Peter Hinch wrote:
What MQTT client code are you using?
-
Hmm. I checked the code for the corruption check. It basically just reads the first bytes of the first sector of the bdev and makes sure they read FF; if not, the FS is declared corrupt. Is that check safe to do at any time, not just at boot?
Could I do the same check in my main loop to catch this as soon as it happens?
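A minimal sketch of the kind of read described above, not the firmware's actual check. It assumes a block device object that exposes `readblocks(block_num, buf)` as in the MicroPython block device protocol, and a 4096-byte sector; `SECTOR_SIZE`, `read_first_sector`, `is_erased` and `FakeBdev` are hypothetical names used only for illustration.

```python
SECTOR_SIZE = 4096  # assumption: flash erase-sector size on the board


def read_first_sector(bdev, size=SECTOR_SIZE):
    """Read `size` bytes from block 0 of a block device (read-only access)."""
    buf = bytearray(size)
    bdev.readblocks(0, buf)
    return buf


def is_erased(buf):
    """Return True if every byte is 0xFF, i.e. the region looks like erased flash."""
    for b in buf:
        if b != 0xFF:
            return False
    return True


class FakeBdev:
    """Hypothetical in-memory stand-in so the sketch can run off-device."""

    def __init__(self, data):
        self.data = bytes(data)

    def readblocks(self, block_num, buf):
        # Copy one block's worth of bytes into the caller's buffer.
        start = block_num * len(buf)
        buf[:] = self.data[start:start + len(buf)]
```

On a device, the real flash block device would be passed in instead of `FakeBdev`. Note that a healthy, mounted filesystem normally keeps its own metadata at the start of the device, so an all-0xFF test is expected to fail there; in a main loop the bytes are probably more useful for logging or inspection than as a pass/fail test.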
-
Indeed; however, I have code around it to recover the connection, and none
of it accesses the FS for writing, only reading. More importantly, despite
the FS corruption, the connection seems to be up and even operating (I'm
using the RabbitMQ MQTT plugin and can keep an eye on it from the inside)
until I restart the controller.
--Alexia
On Mon, 16 Jun 2025, 14:03 Peter Hinch wrote:
I may be barking up the wrong tree but my query arose from these
observations:
- Having two devices simultaneously fail suggests a common cause.
- The official MQTT libraries do not cope well with WiFi disruption.
-
This happened again, this time after 3 days, so it is not the overflow. It only looked synchronized because it took a network fluctuation for me to notice. I have found the probable trigger, but not the cause. The main loop has a spot where, when a setting is changed, it is supposed to save it once and then not again until it changes again. I found that I wasn't assigning the current value back properly, so if a setting was changed during runtime, it kept overwriting the config file on every loop. That would eventually eat a hole in my flash where that file was, but... it is also, for some reason, crapping up the FS from the start, and the extent of the crap in the first sector seems... incremental, as in the point where it writes crap keeps shifting forward. The snippet that does the writing looks like this:
Note the w+ write mode. There is no good reason for it being there, just copy-paste from years ago... My next step is to change it to plain w. I also managed to pull hex dumps of the first sector. The crap written there is not mine... and it's very similar on both, but not the same. And it has a pattern... So maybe there is a firmware issue with 'w+' mode? I will attach the dumps in case they end up being useful to someone.
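To illustrate the save-once logic described above, here is a hypothetical sketch; it is not the poster's code, and `ConfigSaver`, `save_if_changed` and the JSON file format are assumptions made for the example. The key step is recording the value that was just written, so that later loop iterations see no change and skip the write. It opens the file in plain `w` mode, the mode the post proposes switching to.

```python
import json


class ConfigSaver:
    """Write settings to a file only when they differ from the last saved copy."""

    def __init__(self, path, initial):
        self.path = path
        self._last_saved = dict(initial)
        self.writes = 0  # counts real file writes, for diagnostics

    def save_if_changed(self, current):
        if current == self._last_saved:
            return False  # nothing changed: do not touch the flash
        with open(self.path, "w") as f:
            json.dump(current, f)
        # Remember what was written. Omitting this assignment is the kind of
        # slip that makes every later loop iteration rewrite the file.
        self._last_saved = dict(current)
        self.writes += 1
        return True
```

Calling `save_if_changed(settings)` on every pass of the main loop should then write once per real change, and the `writes` counter gives a cheap way to confirm that on a running controller.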
-
That's how I pulled the hex dumps. They were perfectly functional and accessible via WebREPL.
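For anyone wanting to produce comparable dumps, a minimal formatting sketch follows. The `hexdump` helper is a hypothetical name, and how the bytes are obtained from the device is left out; it only turns a bytes-like object into offset-prefixed hex lines that can be printed over a REPL session.

```python
def hexdump(data, width=16):
    """Format bytes as lines of 'offset  hex bytes', `width` bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        lines.append("{:08x}  {}".format(offset, hex_part))
    return lines
```

Printing the result line by line (`for line in hexdump(buf): print(line)`) keeps each line short enough to copy out of a terminal and diff between two boards.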
-
This looks more and more like some sort of issue in littlefs wear leveling when it's pushed hard, leading to a corrupt FS. I did not know the default had changed to littlefs, so none of mine were VFAT even when I tried to switch. I will probably format the ones I want to keep working (the ones actually installed in the controllers) as VFAT and leave the rest as littlefs, so I can fish for more information on the dev pieces, but debugging littlefs is a bit out of my scope. I did look at the code and could not parse it well enough to figure out how it knows where the superblock was moved to so it can chase it around. However, at some point, simply from use and long before the actual flash wears out, littlefs will become corrupt and fail to mount. But that isn't when the superblock is first moved...
-
A bit of stage setting... I have 4 identical controllers running exactly the same MicroPython code. The ESPs are from different batches, even slightly different models: one of the 4 is already a USB-C model. All report to my HA; they operate as homebrew kiln controllers, set up as very-high-max-temperature HVACs through MQTT. They have roughly the same uptime, because I flashed them and installed them in the controllers as one batch.
Out of the blue, all controllers (first all 4, then the 2 I reflashed and set up again) disappeared from HA at roughly the same time, and when reset reported:
Reflashing fixes it for now, but this has now happened twice, to multiple controllers at the same time, AND it seems to be related to uptime: it happened after at least a week without a reset. After the first incident, I reformatted one of them to littlefs2... but both controllers failed the same way at roughly the same time. All of my code uses the with open() syntax to work with files, so there should be no leaked file handles or files left open, and I am a bit confused, because I could believe ONE failing like this is bad hardware, but all four doing synchronized FS corruption... Um, no. So, ideas?
I did not have these issues with 1.9, but I did have issues with getting the network up and with asyncio, so I upgraded the firmware. The network problems went away, but now I have this... Any ideas how to debug/investigate it? As it is, I have 2 ESPs with this corruption sitting on my desk that I haven't reflashed...