2 bugs, or not 2 bugs, that is the question (Arduino 'insane')

In my home-office fake-server Maslow setup, I managed to make the Arduino ‘insane’ a couple of times.
Only unplugging the Arduino while GC is closed cures this state.
What makes it hard to track down is that it’s not consistently repeatable. It took me 2 hours today (FW/GC 1.26) to make it happen.

All I could find out for now is how to identify the insanity:
The only report after starting GC is the connected port. The FW and GC version are missing as well as the loaded position.

In this test all arrows were frozen. A click on ‘Stop’ loaded the position, and then the arrows worked as they should. On 1 occasion the stop button ‘healed’ the Arduino, and restarting GC then showed the expected feedback, but after making it insane again, 6-7 attempts to repeat the miraculous healing failed.

Log.txt at start with healthy mega:

Connected on port /dev/ttyACM0
PCB v1.2 Detected
Grbl v1.00
ready
ok
Sent: $$
[Forward Calculating Position]
position loaded at:
-0.02
-0.11

insane mega:

Connected on port /dev/ttyACM0

The $$ is not sent, or not answered.
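
The difference between the two logs above can be checked mechanically. A rough Python sketch, assuming the startup log is available as text; the function name `startup_looks_healthy` is made up for illustration, and treating the two marker strings (copied from the healthy log) as required is an assumption, not something GC does:

```python
# Markers copied from the healthy log above. Treating them as the
# signature of a sane startup is an assumption made for this sketch.
EXPECTED_MARKERS = ("Grbl v", "position loaded at:")

def startup_looks_healthy(log_text):
    """Return True if every expected startup marker appears in the log."""
    return all(marker in log_text for marker in EXPECTED_MARKERS)

healthy = """Connected on port /dev/ttyACM0
PCB v1.2 Detected
Grbl v1.00
ready
ok
Sent: $$
[Forward Calculating Position]
position loaded at:
-0.02
-0.11
"""
insane = "Connected on port /dev/ttyACM0\n"

print(startup_looks_healthy(healthy))  # True
print(startup_looks_healthy(insane))   # False
```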

For lack of other suspects, I interrogated the ‘strange’ numbers reported by a $$, such as:

$1=1219.19995117 (machine height, mm)
$12=8113.72998046 (main steps per revolution)
$13=63.29999923 (distance / rotation, mm)
$18=12.60000038 (max z axis RPM)
$19=3.17000007 (z axis distance / rotation)
$27=0.27999999 (main Kd Velocity)
$35=0.27999999 (z axis Kd Velocity)

I had to let them go, as they were seen on the healthy mega as well, but I will keep a good eye on those guys. Might bring them back in for some other bug’lery. They do look suspicious.
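
A likely innocent explanation for these numbers is that the settings are held as 32-bit floats (on AVR-based Arduinos such as the Mega, `float` and `double` are both 4 bytes), which cannot represent values like 1219.2 exactly. A minimal Python sketch of that rounding; the nominal values in the loop are my assumption of what the settings were meant to be, and the Arduino's own float printing may differ in the last digit:

```python
import struct

def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Assumed intended values for $1, $12, $13, $18 and $19.
for nominal in (1219.2, 8113.73, 63.3, 12.6, 3.17):
    print(nominal, "->", "%.8f" % as_float32(nominal))
```

If this is what is happening, the trailing digits are representation noise rather than corrupted settings, which fits with them showing up on the healthy mega too.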

The second suspect questioned was ‘time’.
How long does it take to write the information that we write to the eeprom?
While restarting GC right after closing it keeps the mega in the insane state, a few walks to check on the food in the oven seem to cure the issue, and suddenly GC starts normally with all reports.
And then again, waiting several minutes directly after changing the feedrate in GC, then closing it and waiting 5 minutes, does not. This makes no sense.

Totally lost here.


I don’t have much to add right now but wanted to say thank you for doing this.

I feel like this is valid. I’ve seen instances where a simple move command does not execute for a while, and I have given up and moved on to doing something different…usually looking through the settings pages.

I am not sure where to post this, but I was able to spend a few hours this weekend, trying to do developments related to Holey Calibration.

There have been several posts by @jimr about difficulty initializing the machine following a calibration. So, I had it in my mind that I would try to address this if I encountered it. I didn’t actually make any progress on Holey Calibration because I did run into an issue similar to @jimr’s. Further, in my digging, I found quite a bit of stuff that was questionable. Rather than pursue a quick fix, I tried to dig in and understand the root cause. The issues I encountered are new to the v1.26 updates, and were not present in the GC and Firmware Update versions.

  1. There was the issue of initializing the machine after calibration. The issue I ran into was: the Distance Between Motors was different from the default (12 ft vs. 10 ft), and the initial chain-lengths resulted in an x,y position that was off the board. The only way I was able to recover from this (without resetting the chain-length) was to comment out Kinematics::_verifyValidTarget on line 73.
  2. After this, when the machine started, it would not respond to any g-code commands. If I executed a command like “B10L”, it did not respond in any way. The only way to make this work was to hit the stop (red stop sign button in the GUI), which re-initialized the Firmware, and it was then functional, and would respond to “B10L”.
  3. Despite (2), even though the Firmware would receive the command and respond, it did not display values in the GUI via the (I think) runPeriodically function, line 326
  4. After digging in, I found some stuff that appeared off.
    a) The function runPeriodically, line 326, parses the text in the line to determine how to handle it, and something is going wrong.
    b) The function, getLeftChainLength, line 173, looks like it sends a B10R (Requests the right chain-length). This makes no sense.
  5. I still wasn’t able to get the Firmware to respond to many commands; the [Home] button, for example.

Following this work, I would like to identify a strategy moving forward. There are quite a few bugs in the code, and improvement opportunities. I am not sure where or how these issues came about, or if they are just things nobody has seen because they were moving too fast to stop and notice. I don’t have any answers, unfortunately.


Thank you @Joshua for troubleshooting this! Just to confirm (I’m easily confused)…

Are you saying you are getting these errors while using the “official” v1.26 FW and GC?


You mean you have not had the above issues in your forked updates to v1.25?

I think I can shed a little light on this. I believe that on Windows (and possibly Mac), when the connection to the Arduino is interrupted, it triggers a reset of the Arduino, causing it to do things like print the version numbers and load the position. I remember reading at one point that on Linux that is not the case, that a re-connection will not trigger a reset, although I don’t have hands-on experience to back that up.

I think this could be considered a bug for sure, especially since more and more folks are using Linux on the Pi and this issue could be seen there also.

I use Ubuntu 16.04

Against that, I’d argue that with the Arduino in a normal state I can open, close and disconnect as much as I want. OK, the port might change, but GC will always report the shield and the FW/GC version.
My biggest success in replicating the issue is changing the feed-rate in GC and then attempt something.

Observation:
If you change the motor distance, the machine recalculates its position. Changing the feed-rate does not do that, and perhaps it’s not necessary. However, it seems as if “the eeprom is opened for writing and not closed”, leaving it unreadable for the next and further starts of GC until the Arduino is unplugged. Sorry for the noob language, but programming is 30 years back and burnout is current, so no chance for me to look at what happens on the ‘value change’ or ‘closing the settings’ event.

Last night I was able to do some work which may contribute to this topic.

In GC, there is a thread which attempts to monitor the progress of the machine. GC issues gcode commands, and the machine executes them. GC can’t issue more commands than the machine can execute. This is done by tracking a “bufferSize” and a “bufferSpace” property in the code, both integers: bufferSize is the absolute buffer space on the machine and bufferSpace is the space currently free. GC doesn’t get to write a command until all the previously issued commands are complete. When a command is issued, the length of the command is subtracted from bufferSpace. When a command is completed, the length of the command is added back on. So, when there is an incomplete command, bufferSpace is numerically smaller than bufferSize, and GC doesn’t write a new command while this is the case.
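
The bookkeeping described above can be sketched in a few lines of Python. This is a toy model of the description, not Ground Control code: the class, its method names, the default size of 127 and the example command are all hypothetical.

```python
class BufferTracker:
    """Toy model of the bufferSize/bufferSpace accounting described above."""

    def __init__(self, buffer_size=127):
        self.buffer_size = buffer_size    # absolute space on the machine
        self.buffer_space = buffer_size   # space currently free

    def can_send(self):
        # Nothing new is written while any previously issued
        # command is still outstanding.
        return self.buffer_space == self.buffer_size

    def on_sent(self, command):
        # Issuing a command uses up its length in characters.
        self.buffer_space -= len(command)

    def on_completed(self, command):
        # Completing a command hands that length back.
        self.buffer_space += len(command)


tracker = BufferTracker()
cmd = "G0 X10 Y10\n"          # hypothetical example command
tracker.on_sent(cmd)
print(tracker.can_send())     # False: command still outstanding
tracker.on_completed(cmd)
print(tracker.can_send())     # True: space has been handed back
```

In this model, a command whose completion is never reported leaves `buffer_space` below `buffer_size` forever, so `can_send()` never becomes true again.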

On my machine, a command is issued, and there is an error. It is shown as “Invalid Syntax”. Because of this error, the command is never completed. So, GC doesn’t issue any other commands. It is waiting for this task to complete.

In total, this is at least two bugs: one being the invalid syntax, and the other is GC never writing a command afterward.


So, I spent a brief time looking at this again. Based on the firmware code, the part where it says “PCB v1.2 Detected, GRBL v1.00, ready, ok”, should only happen once per power-cycle. It shouldn’t happen when you disconnect/reconnect. I don’t think this has changed recently.

Based on what I can see, this only happens when you restart GC, which seems to be what you did (with no $$ sent).

One thing I have noticed is that GC fires off a new thread every time you start it. This thread doesn’t necessarily stop when you close GC. Further, this thread attempts to connect to the Arduino every 5 seconds. When it does, it begins communicating with the Arduino. So, there is a possibility that this thread is still active when you attempt to start the new GC. In that case, the Arduino is communicating with the old thread that is still running. I don’t know what that would look like from a user’s perspective.
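
One common way to keep a retry thread like this from outliving the program is to give it a stop flag that is set on shutdown. A minimal Python sketch, assuming nothing about how GC actually structures its thread; `ConnectionWorker` and all of its members are hypothetical names, and the "connection attempt" is only a counter:

```python
import threading
import time

class ConnectionWorker:
    """Hypothetical stand-in for a background thread that retries a connection."""

    def __init__(self, retry_interval=5.0):
        self.retry_interval = retry_interval
        self.attempts = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)

    def _run(self):
        while not self._stop.is_set():
            self.attempts += 1   # placeholder for "try to open the port"
            # wait() returns early as soon as stop() sets the event
            self._stop.wait(self.retry_interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def is_alive(self):
        return self._thread.is_alive()


worker = ConnectionWorker(retry_interval=0.01)
worker.start()
time.sleep(0.05)
worker.stop()
print(worker.is_alive())   # False: the thread ended with stop()
```

Without something like the `stop()` call at shutdown, a non-daemon thread of this kind keeps running, and keeps retrying, after the window has gone.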
