M4: Cut fail. not much info, but potential Z axis bug after stop

gazinux · May 16, 2024, 1:38am

Does the browser call the update function and does this imply link lost between browser and M4?

ronlawrence3 · May 16, 2024, 12:04pm

This is called from the protocol main loop (and other places) in the FluidNC code, not something we rely on the browser for.

dlang · May 16, 2024, 12:45pm

I know that with the esp8266 the Arduino network library had a bug where a
network hiccup could stall all processing. The esp32 is dual core, but I don’t
know how it is managed or if the network lib bug has been fixed.

David Lang

gazinux · May 16, 2024, 1:36pm

Is there a simple way we can track, from the browser side, the network connections and reconnections that the M4 experiences?

ronlawrence3 · May 16, 2024, 2:09pm

I wish! I’m looking for ways to trap errors / gather info on the ESP side and either store them off somewhere we can get after power off or try to send them to the browser. I suspect whatever this is is too late to send anything (i.e. the system is not in a good state, and maybe even is no longer talking to the browser / websocket). And to be clear, I am learning this environment and by NO means good at it yet.
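One common way to gather this kind of information is a small fixed-size "breadcrumb" ring that keeps the last few events in memory, so they can be dumped after a fault. The following is only a sketch under assumptions: `Breadcrumb` and `BreadcrumbRing` are hypothetical names, not FluidNC code, and where the bytes would be kept so they survive a reset or power off (flash, SD card, or similar) is deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record (not FluidNC code). Plain data, so it
// could be copied to persistent storage as raw bytes.
struct Breadcrumb {
    std::uint32_t millis;  // timestamp supplied by the caller
    std::uint16_t code;    // caller-defined event code
    std::int32_t  detail;  // caller-defined value
};

template <std::size_t N>
class BreadcrumbRing {
public:
    // Stores one entry, overwriting the oldest once the ring is full.
    void record(std::uint32_t millis, std::uint16_t code, std::int32_t detail) {
        slots_[next_] = Breadcrumb{millis, code, detail};
        next_ = (next_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // Number of entries currently held (at most N).
    std::size_t size() const { return count_; }

    // Index 0 is the oldest surviving entry. The caller must keep i < size().
    const Breadcrumb& at(std::size_t i) const {
        std::size_t start = (next_ + N - count_) % N;
        return slots_[(start + i) % N];
    }

private:
    std::array<Breadcrumb, N> slots_{};
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

Because the ring never allocates, recording an entry is cheap enough to do from a main loop. Reading it back after a crash still depends on the storage question above.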

ronlawrence3 · May 16, 2024, 2:11pm

Certainly something is happening in the main loop that is taking longer than it should if I’m getting that error. OR… there is an uncaught fault that causes it to not call update again, and this message is just a symptom of that.
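The two possibilities can in principle be told apart by measuring the gap between successive update calls. The following is a minimal sketch, assuming a hypothetical `UpdateGapMonitor` helper (not part of FluidNC) whose `tick()` is called at the top of the update function. The limit value is also an assumption.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of FluidNC): tracks the time between
// successive calls to a periodic update function.
class UpdateGapMonitor {
public:
    using Clock = std::chrono::steady_clock;

    explicit UpdateGapMonitor(std::chrono::milliseconds limit) : limit_(limit) {}

    // Call once per update. Returns true when the gap since the previous
    // call exceeded the limit. The first call never reports a late gap.
    bool tick(Clock::time_point now = Clock::now()) {
        bool late = false;
        if (has_last_) {
            auto gap = std::chrono::duration_cast<std::chrono::milliseconds>(now - last_);
            if (gap > worst_) worst_ = gap;
            if (gap > limit_) {
                late = true;
                ++late_count_;
            }
        }
        last_ = now;
        has_last_ = true;
        return late;
    }

    std::chrono::milliseconds worstGap() const { return worst_; }
    std::uint32_t lateCount() const { return late_count_; }

private:
    std::chrono::milliseconds limit_;
    std::chrono::milliseconds worst_{0};
    Clock::time_point last_{};
    bool has_last_ = false;
    std::uint32_t late_count_ = 0;
};
```

If `tick()` reports a late gap and then keeps being called, the loop stalled and recovered. If it is never called again, the loop stopped for good, which this monitor cannot report from inside the same loop. That second case would need a check running somewhere else.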

gazinux · May 16, 2024, 3:34pm

I am no coder, but I see others trying to diagnose FluidNC crashes on Reddit, and this thread mentions that FluidNC supports the telnet protocol, which could be a way to get telemetry out:

https://www.reddit.com/r/hobbycnc/s/tFd8yktSBJ

dlang · May 16, 2024, 5:30pm

Ron Lawrence wrote:

certainly something is happening in the main loop that is taking longer than
it should if I’m getting that error. OR… there is an un-caught fault that
causes it to not call update again, and this message is just a symptom of
that.

an almost exactly 1 second delay like this makes me think of a timeout.

Is this happening with the maslow as the AP, or with it connected to your home
network?

David Lang

ronlawrence3 · May 16, 2024, 5:32pm

If I understand the code right, yes, the FluidNC code has a “LogStream” that can connect to multiple streams (websocket, telnet, “other”) to support communications (two-way) with the system via grbl/Gcode. There are a few other “endpoints” on the FluidNC webserver that allow commands to be sent also.

In ESP32-UI they use a websocket connection and command endpoints for this communication to talk to the UI. It is certainly possible to communicate with telnet or USB also and I have experimented with telnet and USB on my system.
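The general idea of one log feeding several channels can be sketched as below. This is an illustration only, not FluidNC's actual `LogStream`: `Channel` and `FanOutLog` are hypothetical names, and the write callbacks stand in for whatever really sends bytes over the websocket, telnet or USB.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an output channel (websocket, telnet, USB).
struct Channel {
    std::string name;
    // Returns false if the channel could not accept the line.
    std::function<bool(const std::string&)> write;
};

// Hypothetical fan-out logger (not FluidNC's LogStream).
class FanOutLog {
public:
    void attach(Channel ch) { channels_.push_back(std::move(ch)); }

    // Offers the line to every attached channel and returns how many
    // accepted it. A channel that reports failure is skipped and the
    // remaining channels still receive the line.
    std::size_t log(const std::string& line) {
        std::size_t delivered = 0;
        for (auto& ch : channels_) {
            if (ch.write && ch.write(line)) ++delivered;
        }
        return delivered;
    }

private:
    std::vector<Channel> channels_;
};
```

With this shape, a dropped websocket only lowers the delivered count, and a telnet or USB listener attached at the same time would still see the message.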

Explaining this has made me think that my next step might be to reset my device fully to a default config and make sure it’s not the telnet support and whatever other config I turned on that is causing issues for me.

ronlawrence3 · May 16, 2024, 5:34pm

It’s on my home network. Although it could be related to that (I can try as an AP instead) I’m thinking it should not matter at all when the maslow is running a gcode job whether a ui is connected or not…

dlang · May 16, 2024, 6:12pm

Ron Lawrence wrote:

It’s on my home network. Although it could be related to that (I can try as an
AP instead) I’m thinking it should not matter at all when the maslow is
running a gcode job whether a ui is connected or not…

In theory you are correct, but in practice…

the bug I know of with the network library was that if it lost the connection
out to the AP, it stalled all processing for several seconds (at the time) to
where it wouldn’t even respond to local button presses. This was for the tasmota
home automation firmware on the esp8266.

The built-in antenna on the esp32 module used on the maslow is weak and in close
proximity to significant RF noise generators

David Lang

ronlawrence3 · May 16, 2024, 7:38pm

Fair point, and I’ll try with and without home wifi. First I’ll do a full config reset with fresh, clean firmware that I haven’t touched and try that, then I’ll try the AP approach.

If I understand the FluidNC code, they do most time-critical things on cpu0, where they disabled the watchdog timer, and some on cpu1. The webserver and “polling loops” seem to be on cpu1, but if anything that has to be on cpu0 is misbehaving, it could impact things.

md8n · May 16, 2024, 11:23pm

Among my crazier suggestions was to switch all of the ‘event handling’ code over to using RX (Reactive Extensions). This shifts everything to using an observer/observable pattern, and from experience it is substantially better than polling loops for doing event handling. But it completely inverts the way one function relates to another. In general, you do not have to rewrite the body of any function, but you do have to rewire all of the functionality.

ronlawrence3 · May 16, 2024, 11:34pm

I use reactive streams for javascript (rxjs) in my day job in frontend applications (angular, typescript), and have a love-hate relationship with them. I don’t think re-vamping fluid to use rxcpp would be a job I’d want to do…

md8n · May 16, 2024, 11:53pm

Having done a few too many event loops I’m reasonably certain that rx would be the way to go. With the qualification that I haven’t really dug into the c++ code yet

dlang · May 17, 2024, 12:21am

Lee H wrote:

Having done a few too many event loops I’m reasonably certain that rx would be
the way to go. With the qualification that I haven’t really dug into the c++
code yet

the big problem would be how to get such a major re-write upstream to people who
don’t know or care what maslow is.

David Lang

md8n · May 17, 2024, 12:26am

There’s one advantage here in that it is very easy to convert a stream of events into an observable stream. So if you’ve got an event loop and can ‘forward’ the events out of that to your observable stream, then you don’t need any other handling done in the ‘other side’ of the codebase.
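A rough sketch of that forwarding idea, written as a plain hand-rolled observer pattern rather than RxCpp: `Event`, `EventStream` and `pollAndForward` are hypothetical names for illustration and do not come from the FluidNC codebase.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical event type. A real system would carry more detail.
struct Event {
    int kind;
    int value;
};

// Minimal subject: observers subscribe once and are then called for
// every emitted event, in subscription order.
class EventStream {
public:
    using Observer = std::function<void(const Event&)>;

    void subscribe(Observer obs) { observers_.push_back(std::move(obs)); }

    void emit(const Event& e) {
        for (auto& obs : observers_) obs(e);
    }

private:
    std::vector<Observer> observers_;
};

// One pass of an existing-style polling loop: drains the pending events
// and forwards each to the stream instead of handling it inline.
// Returns the number of events forwarded.
inline std::size_t pollAndForward(std::queue<Event>& pending, EventStream& stream) {
    std::size_t forwarded = 0;
    while (!pending.empty()) {
        stream.emit(pending.front());
        pending.pop();
        ++forwarded;
    }
    return forwarded;
}
```

The polling loop stays where it is and only gains the forwarding call, while new handlers are added by subscribing to the stream rather than by editing the loop.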

jwolter · May 20, 2024, 12:50pm

@ronlawrence3 did grounding your pins result in any improvement/change in behavior?

ronlawrence3 · May 20, 2024, 5:21pm

Negative. I did at one point get through my “laundry” cut, but tried another and still got the “panic” / disconnection / whatever issue again. I didn’t have enough time this weekend to try with the maslow network, but I’ll try that this week. I really have a feeling it is not network related as once a gcode program is running I don’t think network disconnects should have an impact on the cut, unless I’m missing something, which is totally possible

Josh_Monroe · May 28, 2024, 12:13am

I see that it has been 7 days since the last discussion on this issue, but I think I ran into this bug twice today.

The first time, I was cutting a part and it just started moving straight up and did not stop. The bit was not lowered at that point, so nothing was cut and I only had one 1/16 pass left, so I just finished the cut by hand and used a router with a flush trim bit to finish it up.

The second time, it was near the end of the first 1/16 in pass and started moving on an angle and did not stop until I pulled power (it did not respond to the pause or stop buttons).
Here is a photo of the results.

I also attached the g-code if anyone wants to take a look. It was generated using Kiri:Moto in Onshape
SinkLeftV1.nc (231.3 KB)

Other potentially helpful info…
Firmware V0.74
Maslow was running in AP mode.
The computer seemed to have gone to sleep and disconnected around the time this happened.
Config: maslow.yaml (3.1 KB)
Logs after turning the machine back on:
Maslow-serial.log (1.6 KB)

Given that this was 30 minutes into the cut (and the potential loss of materials), I would also be interested in any ideas on how to resume the cut. The area where it wandered off is not in the finished part. Is my best bet to try to position the Maslow home as close to the original location as possible and run the code again?
