M4: Cut fail. not much info, but potential Z axis bug after stop

Thanks for the specifics; that does not look too hard!

I will take this apart and try this after I try one more firmware tweak to allow for a few “long” update loop invocations without panic, just to see if I can get through a cut with that workaround safely.

Except one thing… I went to disassemble the arms and none of them can be fully disassembled, because at least 1-2 of the bolts on each (all of them on one arm) won’t come undone – they are stripped / “cammed out”. I’ve tried the rubber-band trick… does anyone have another trick to try, other than drilling out the nuts? I think these bolts are made of something pretty soft… it should not be this easy to cam them out :frowning:

Found an allen wrench that was just slightly bigger and it got them… whew.

I used a combo of cutting down my allen keys so they had “fresh” sharp edges to bite into the bolt, jamming TORX bits into the hex head, and, for a few, Dremeling a slit into the head so I could use a flat-head screwdriver. I’ve replaced them with locally sourced M3 12mm hex heads; hopefully they’re a bit tougher.

I did have to Dremel one of the bolts on the motor mount to get it off, but no other casualties. I also didn’t have a small cutting wheel, so I ended up nicking the side of the bearing, but it is just cosmetic.

Although… looks a lot bigger in the image :laughing: . I think I got it. It definitely does not win any prizes for soldering skill, but I’m waiting for my new belt guards to print and my bolts to arrive from Amazon to get this back together.

So my bolts got here this morning and I reassembled everything along with the new belt guards too.

I first did “retract all” and on two of the arms nothing happened. I was thinking “oh great, what did I mess up?” But before disassembling, I did retract all again with 2000 and one arm started retracting. I bumped it up again and the other did too. I reduced it to 1300 again and did extend / retract a few times at the “normal” 1300 level, and all is well at that level now (whew!). Also, yay, I didn’t mess up the encoder boards!

So I went to hang it, got it all booted up and hung, and it had forgotten my home, so I re-homed it: XY, then Z. I started my cut again with the router off, watched it get to about where it last died, and flipped on the router/vacuum. It went along and cut things great, started on the final layer with the tabs and got through a few letters, then stalled; the router started heading up and right, belts started unspooling, etc. I hit power on the power strip and captured my log. Nothing in the log tells me what happened. I’m assuming some fault or system-level event, given that the pause and stop buttons didn’t work and I got the reconnect error, indicating the Maslow stopped talking to the UI.

Log with my debug level on (as on the git branch above). Nothing seemed wrong in here - certainly none of the panic messages showed up.

Maslow-serial-05-14.log (69.7 KB)

I’m seriously in need of pointers on what to do to determine what is happening. I can start littering the code with logs, but I’m not sure that is a good idea. I’m so out of my debugging comfort zone on this.

Would you be willing to post the gcode file? I’m 99% sure that’s not the issue, but I can try running it anyway just to be sure.

Some non-destructive tests to consider: (a) remove the bit and re-run the job with the router turned OFF to see if it completes the expected path, and (b) set Z zero an inch or so above your workpiece and re-run the job with the router turned ON.

Sure. It was produced by using an Inkscape SVG in Krabzcam and I manually added tabs to it.

SVG:
l2

Just outside profile cut:
l2-outside.nc (185.6 KB)

Full file including inside profiles first, then outside:
l2.nc (283.4 KB)

Both are for a 1/4" bit.

Which .nc job were you running at the time it went off course?

I’ve had issues on both files. If you see the “story” photo above, the inner profiles went through on 2 tries, the first of which was a fail with the big gash on the a. I then made the outer-profile .nc file to skip the inner profiles on the next attempts. Only the last attempt was with the modified encoder boards, and it failed after I watched the machine get through the first two layers with the router off and then started the router on the third.

I agree with this, but I’m not sure what it could be. This might sound crazy, but I wonder if maybe recording the log is actually causing the issue? I have had issues in the past where trying to read from the encoders and talk to the SD card at the same time would cause the processor to crash. It took me a long time to get the encoder system and the SD card to play nice together and I would see behavior somewhat like what you are describing when it did crash.

Looking through the log I have a couple questions.

Do you know why we are seeing this so often?

[GC:G3 G54 G17 G21 G90 G94 M3 M9 T0 F1800 S10000]

Seeing these back to back makes me think that something funky is going on with the encoders at this moment in time.

[MSG:WARN: Encoder read failure on Top Left]
[MSG:ERR: Position error on Bottom Right axis exceeded 15mm while running. Rereading... Error is -24.058mm]

Generally there are a lot of encoder read failures which seems concerning.
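
To put a number on “a lot”, the saved serial log can be tallied offline. This is only a rough sketch: it assumes every line of interest has the same shape as the three lines quoted above, and the function name `summarize` and the sample list are made up for illustration (the sample is just those quoted lines, not an excerpt from the attached log).

```python
import re
from collections import Counter

# Patterns modeled on the log lines quoted in this thread (assumed to be representative).
ENCODER_FAIL = re.compile(r"\[MSG:WARN: Encoder read failure on ([^\]]+)\]")
POSITION_ERR = re.compile(
    r"\[MSG:ERR: Position error on (.+?) axis exceeded .*?Error is (-?\d+(?:\.\d+)?)mm\]"
)


def summarize(lines):
    """Return (read failures per arm, largest position error per arm, [GC: report count)."""
    failures = Counter()
    worst = {}
    gc_reports = 0
    for line in lines:
        line = line.strip()
        if line.startswith("[GC:"):
            gc_reports += 1
            continue
        m = ENCODER_FAIL.search(line)
        if m:
            failures[m.group(1)] += 1
            continue
        m = POSITION_ERR.search(line)
        if m:
            arm, err = m.group(1), float(m.group(2))
            # Keep the error with the largest magnitude seen for this arm.
            if arm not in worst or abs(err) > abs(worst[arm]):
                worst[arm] = err
    return failures, worst, gc_reports


# Hypothetical sample built from the lines quoted above.
sample = [
    "[GC:G3 G54 G17 G21 G90 G94 M3 M9 T0 F1800 S10000]",
    "[MSG:WARN: Encoder read failure on Top Left]",
    "[MSG:ERR: Position error on Bottom Right axis exceeded 15mm while running. Rereading... Error is -24.058mm]",
]
failures, worst, gc_reports = summarize(sample)
print(failures, worst, gc_reports)
```

To run it on a real capture, pass an open file instead of the sample list, e.g. `summarize(open("Maslow-serial-05-14.log"))`. If one arm dominates the failure count, that would point at that arm's encoder or wiring rather than a system-wide problem.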

I think that, as weird as it sounds, the next thing I would try is running it with the logging turned off, to make sure that we’re not interrupting the encoder reads with log writes somehow.

Does not sound weird. I’ll see if I can sneak out at lunch and try with current stock firmware and logging turned down to default. I don’t see a setting to turn off the GC: logs though; they use log_to, so they don’t check the log level.

Maybe I’m getting confused here. Are the logs being written to the SD card or is that happening in the browser?

These are serial logs (browser/websocket). The button I have that PR out for just grabs the serial out text area content, which is what my files are on this topic.

I have not got any logging to SD tasks started yet :slight_smile: My only changes to firmware are all in the git compare above, and are just in the update function and do more logging and retries, etc. I’ll eliminate that for this trial though and turn system log level to error.

The [GC: logs are from Report.cpp and use log_to, which I don’t think respects the log level setting (I could be wrong there).

Gotcha, then I’m pretty sure I’m wrong :confused:

Hmmmm…I agree that something system level made it crash, and I have seen bugs do things like that before.

Yea, it seems like we ought to be able to trap system errors and try to log something, but I’m out of my depth on that. If I could find a consistent way to reproduce this it might help narrow it down.

100%, that’s usually 99% of the battle.

Especially if we can replicate it in a way that doesn’t require waiting 20 minutes for it to happen :melting_face:

Stock 0.72 firmware, my new UI (so I can capture logs), logging level set to error: another fail…

[MSG:ERR: Emergency stop. Update function not being called enough.1002ms since last call]

Motion did not stop, BTW. I’m not sure why, as it was a panic, which should have stopped motion, right?

Also, this time through I turned on the router and vacuum at the start, just to rule that in or out… it still went through to the layer where the tabs start, got further than it has before, then this happened.

Maslow-serial-20240515.log (35.6 KB)
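
For digging through a capture like this one, it may help to pull out each emergency-stop line together with whatever was logged just before it. A minimal sketch, assuming the message always has the exact wording quoted above; `estop_events` and the sample are hypothetical, and the sample stitches together two lines quoted at different points in this thread rather than reproducing the real log.

```python
import re
from collections import deque

# Wording copied from the emergency-stop line quoted in this thread.
ESTOP = re.compile(
    r"\[MSG:ERR: Emergency stop\. Update function not being called enough\."
    r"\s*(\d+)ms since last call\]"
)


def estop_events(lines, context=5):
    """Yield (gap_ms, preceding_lines) for each emergency-stop message found."""
    recent = deque(maxlen=context)  # rolling window of the lines seen just before
    for line in lines:
        line = line.rstrip("\n")
        m = ESTOP.search(line)
        if m:
            yield int(m.group(1)), list(recent)
        recent.append(line)


# Hypothetical sample, not an excerpt from the attached log.
sample = [
    "[MSG:WARN: Encoder read failure on Top Left]",
    "[MSG:ERR: Emergency stop. Update function not being called enough.1002ms since last call]",
]
for gap_ms, before in estop_events(sample, context=3):
    print(gap_ms, before)
```

The reported gap and the few lines leading up to it are the closest thing to a stack trace the serial log offers, so comparing them across failed runs might show whether the stall always follows the same kind of message.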

Tried my code (pushed to my fork on GitHub) to allow a few “too long” calls to update.
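
For anyone following along, the idea is roughly the following. This is a Python sketch of the logic only, not the actual change in the fork (the firmware is C++): the class name, the 1000 ms threshold, the allowed count, and the choice to count consecutive late calls are all assumptions for illustration.

```python
class UpdateWatchdog:
    """Tolerate a limited number of consecutive late update() calls before panicking."""

    def __init__(self, max_gap_ms=1000, allowed_late_calls=3):
        self.max_gap_ms = max_gap_ms                # assumed threshold, not taken from the firmware
        self.allowed_late_calls = allowed_late_calls
        self.last_call_ms = None
        self.late_streak = 0

    def on_update(self, now_ms):
        """Record one update() call; return True if an emergency stop should be raised."""
        if self.last_call_ms is not None:
            gap = now_ms - self.last_call_ms
            if gap > self.max_gap_ms:
                self.late_streak += 1               # another late call in a row
            else:
                self.late_streak = 0                # an on-time call clears the streak
        self.last_call_ms = now_ms
        return self.late_streak > self.allowed_late_calls


# One isolated late call (a 1002 ms gap) is forgiven; three in a row are not.
wd = UpdateWatchdog(max_gap_ms=1000, allowed_late_calls=2)
timestamps = [0, 10, 1012, 1022, 2100, 3200, 4300]
results = [wd.on_update(t) for t in timestamps]
print(results)
```

Resetting the streak on every on-time call means a single slow loop is ignored while a sustained stall still trips the stop; counting total late calls over a time window would be a stricter alternative.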

Another “butt pucker”: it got even further on the cutting, then decided to just start wandering up and right. I was trying to video a little, got flustered, and didn’t hit off on the router / hit power very fast. Fun times…

I don’t see anything in the log below that shows a panic, so I’m not sure what failed. I’m going to look for a way to catch exceptions / errors and try to log them.

Maslow-serial-2024051502.log (45.7 KB)