I’ve found a lot of value in the new(ish) telemetry for Maslow4 (for example, I just posted Arm diagnostics gathered via telemetry). However, I haven’t been able to collect any telemetry during a normal job/G-code execution. The symptom: telemetry starts fine and the job starts fine, but the Maslow halts within ~2 to 5 seconds of starting the job (I see it stop moving and immediately lose my connection to the Maslow AP, with no log info). This only happens with telemetry enabled (I haven’t had any failures in ~30 G-code jobs without telemetry).
Connecting via USB shows the following (consistent across two attempts):
[MSG:INFO: Telemetry: enabled]
[MSG:INFO: M4 telmetry set to on]
ok
E (943348) vfs_fat: open: no free file descriptors
abort() was called at PC 0x42109163 on core 0
Backtrace: 0x40379a1e:0x3fcd7de0 0x4037ebf9:0x3fcd7e00 0x40385bf1:0x3fcd7e20 0x42109163:0x3fcd7ea0 0x421091aa:0x3fcd7ec0 0x42109dbb:0x3fcd7ee0 0x4200b3cf:0x3fcd7f00 0x4200b469:0x3fcd7f30 0x4201dbde:0x30
ELF file SHA256: 4e4aa657ffc586cb
E (15539) esp_core_dump_flash: Core dump flash config is corrupted! CRC=0x7bd5c66f instead of 0x0
Rebooting...
I’m open to digging into this more myself, but first wanted to check: is this a known issue with a workaround?
I have a guess about what might be happening, but I’m not sure of it. The code that records to the telemetry file opens the file on each write (and then closes it). Maybe the OS is not keeping up when it’s also reading G-code.
Thanks for the note! I had a chance to do some initial digging in the last week. With the disclaimer that my understanding is still evolving: the Maslow reboot occurs immediately after the "vfs_fat: open: no free file descriptors" error, so I suspect we’re hitting the ESP-IDF file descriptor limit (FD_SETSIZE in sys/select.h; see the ESP32 SDK docs for VFS file descriptors). That would be consistent with your points about (1) needing at least one FD to read the G-code and (2) periodically needing another FD to persist the telemetry buffer’s data.
In theory, we’re free to redefine FD_SETSIZE (a comment in sys/select.h says “FD_SETSIZE may be defined by the user”). I may try rebuilding with a small increase over the current value of 64, but I’d like to do a bit more digging first. (I’ve only ever dealt with OS FD limits in Linux and have never had to deal with them in an RTOS, so I have little intuition for potential side effects or touchiness of the change.)
I’m unsure if this is aligned with your thought about the OS maybe “not keeping up”; let me know if you think it’s worth spending time investigating beyond FD limits.
Yeah, that might work. Or we could hold onto the file handle and not open and close it as often (maybe only every “n” writes, for whatever “n” we choose). I did it this way so that nothing that hasn’t been written yet can be lost, but in hindsight, just flushing would probably work there, keeping the file handle open until the command comes to turn telemetry on or off.