Is that the version of the MemoryFree library that allows walking the free list? If so, you don’t need to change libraries … That would give a feeling for fragmentation… Perhaps implement a temporary Bxx gcode to trigger a printout of the free list that could be slipped into a gcode file to debug?
How many prints are relevant for normal, i.e. non developer, users? Can you just ifdef them out in released code?
@blurfl I love the idea of slipping in a temp Bxx code to let us interrogate the state of things and give us a sense of how fragmented things are.
The version of MemoryFree I was using just prints the total memory that is free, but if there is a version which would let us walk the free list that could tell us more. The tool to inspect the heap looks really promising to me.
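To make the difference between "total free" and "walking the free list" concrete, here is a rough host-side sketch. It is not the MemoryFree or Avrheap API; `HeapSummary`, `summarizeFreeList`, and `canAllocate` are made-up names, the block sizes are supplied by hand, and allocator overhead is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of a free list; not taken from any Arduino library.
struct HeapSummary {
    std::size_t totalFree;     // sum of all free block sizes
    std::size_t largestBlock;  // biggest single free block
    std::size_t blockCount;    // number of free blocks found
};

// Walk a list of free block sizes (in bytes) and summarize it.
HeapSummary summarizeFreeList(const std::vector<std::size_t> &freeBlocks) {
    HeapSummary s{0, 0, freeBlocks.size()};
    for (std::size_t size : freeBlocks) {
        assert(size > 0);  // a zero-sized free block would suggest a corrupt list
        s.totalFree += size;
        if (size > s.largestBlock) {
            s.largestBlock = size;
        }
    }
    return s;
}

// A single request can only be served from one contiguous free block,
// so the largest block matters, not the total.
bool canAllocate(const HeapSummary &s, std::size_t request) {
    return request <= s.largestBlock;
}
```

With free blocks of 16, 24, 8 and 32 bytes the total is 80 bytes, but a 64-byte request would still fail because no single block is that large. That gap is what a total-only number hides.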
If you make progress on this front, let me know. I’d love to be able to build off of your work in the morning. If not this sounds like exactly the place I want to start so thanks for the advice!
@mooselake Unfortunately, they’re all relevant for normal users. The ones you don’t recognize are when the machine prints its position and positional error. Those are masked by Ground Control and used to move the cross hairs on the screen. If we can prove that heap fragmentation is the issue, and that it’s caused by printing, we can almost certainly find a way to achieve the same functionality without causing fragmentation.
Unless Arduino does things very differently, the only thing in this firmware that could be causing heap fragmentation is strings. Otherwise I don’t find any other use of dynamic memory allocation, i.e. there are no uses of “new” or “malloc” that I can find. So almost all of the elements allocated are on the stack.
I don’t think this is a heap problem. You preallocate the two main strings that are dynamically changed (readycommandstring and gcodeline), so unless they are exceeding the reserved string length (could they be?), the two strings of size 128 aren’t likely to be an issue.
Incidentally there is an unused String variable named “temps” in the ringbuffer class that can be deleted. I don’t have git loaded on the machine I’m on, so I can’t make the change just now.
From my experience, I think chasing the “heap” isn’t going to yield much, but I’ve been wrong before. Will keep looking.
One way to be sure is to replace the use of the String class with good old fashioned C-style strings. No dynamically allocated strings means no heap problems. See here for info if you’re not familiar.
Here is another very good tutorial for those who missed the glory days of ‘C’ programming.
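As a rough illustration of the C-style approach, here is a minimal sketch that formats a message into a fixed-size, caller-owned buffer instead of building a String. The helper name `formatPosition` and the "pos:x,y" message format are made up for the example and are not what the firmware actually prints; integers are used to keep the formatting simple.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes "pos:<x>,<y>" into a buffer the caller owns.
// Nothing is allocated on the heap; the buffer can live on the stack or be static.
// Returns true if the whole message fit, false if snprintf had to truncate it.
bool formatPosition(char *buf, std::size_t cap, long x, long y) {
    assert(buf != nullptr && cap > 0);
    int n = std::snprintf(buf, cap, "pos:%ld,%ld", x, y);
    // snprintf returns the length it wanted to write, excluding the terminator.
    return n >= 0 && static_cast<std::size_t>(n) < cap;
}
```

The trade-off is that the buffer size has to be chosen up front, and the return value should be checked so a truncated message doesn’t go out silently.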
I’ll give it a look tonight
@mooselake, your suggestion is interesting - we could test that by temporarily disabling the print and foregoing the cursor move to see if that affects this issue.
PR#272 in Firmware adds two Bxx codes to put heap info into the log:
- B12 executes Avrheap.freeListWalk
- B13 executes Avrheap.dumpHeap
I didn’t do anything with the strings.
Running the “sled abbreviated” file a few times left me wondering whether it might be an issue in the PID routines. PID winding out beyond limits? Swamp rats? Is there an easy way to switch it on and off? Do we need PID on the z-axis with the current motor/gearbox?
Bar wrote "1. Pressing the “Stop” button seems to fix the problem"
Could it be a serial communications buffer problem, since pressing the “stop” button would stop and reset the serial communications?
Actually, the serial stream never stops. Movement stops, though, so proportional control variables that adapt to the present movement might change, maybe?
Back in the dark ages when C++ was the new kid on the block the common opinion was it should be avoided for real-time code due to all the constructing/destructing going on. At the time the 2560 would have been a high powered big memory system.
FWIW Linus Torvalds has (like most topics) a very strong negative opinion on the subject.
I wonder if having G codes for displaying the heap would put them where they wouldn’t show this problem, since presumably all pending commands would have finished before the B12/B13 were processed and the fragmenting space released? Any chance using an ICSP debugger would help sort it out? Last time I used an ICE was in the Z80 days, not up on the current stuff.
@mooselake Good thinking but there is very little constructing\deconstructing going on in this code. Almost all objects are created at startup and used throughout. See my earlier related post in this topic (though that post is more about dynamic memory use).
I do have a debugger, but can’t simulate what is being experienced as I am one of those without hardware yet. I have the arduino and can run the code but without encoder feedback, things get stopped up quickly. Anybody have any spare motors\encoders they want to loan me?
Not to mince words or cause a stir, but to be clear, just because it is firmware does not make it “real-time” code. Real-time is a term applied to code that is most often characterized by being deterministic (or very nearly so)… that is, the time required to execute a task will meet its deadline each and every time. Real-time code has very little variability in execution timing. Very often real-time is achieved through the use of a real-time OS that is specialized for switching task contexts using priority scheduling. The Maslow firmware does not utilize tasks nor scheduling, but is rather AFAICR (“As Fast As I Can Run”) code. One very useful thing about an RTOS is the protected access of shared resources (e.g. motors, memory, queues) such that they are accessed atomically, that is, a task owns the resource until it is completely through with it. Resource access is protected by using the semaphore and mutex constructs, which ensure multiple tasks cannot preempt each other until the shared resource is released by one task or the other.
Consider two timer ISRs that each fire a task, very closely one after another, that both want to access the same motor. Task 1 begins its movement of the motor to position X. The command to do so, may appear to be one pseudo-line-of-code: “Move motor 1 to Position X.” Beneath the covers of the high-level language though, that one line might be 50 lines of assembler code. The execution of the statement begins, but can only get through say 25 of its assembler lines of execution, before it is interrupted by Task 2 that wants to execute “Move motor 1 to Position Y”. What about the 25 lines of as of yet unexecuted code and the state of all of the registers when Task 2 begins? Chaos and unpredictable behavior ensue. This is why I advocate for anything that either task-switches or is interrupt-driven and uses shared resources, to be coded using an RTOS that can protect those resources and guarantee the atomic execution of the high-level code statements.
But I digress without adding any more value, so I’ll stop here.
I think I’ve tracked down the issue.
Huge thanks to @blurfl for finding the tool to inspect the heap. After taking a look at it, it was pretty clear that fragmentation wasn’t the issue. It looked exactly like we would expect it to (yay!)
The only other thing that I could think of that could hold onto and build error over the course of a run was the integral term of the PID controller, and that seems to have been the issue. The integrator term was winding up and saturating, making the machine move in a jerky way.
I’ve reduced the value of _Ki and now it seems to be happy. The drawback is that when the machine first powers up it takes a second or two longer to compensate for the weight of the sled.
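For anyone who wants to see what integral windup looks like in code, here is a minimal sketch. It is not the firmware’s PID routine, and the fix described above was simply lowering _Ki. The sketch instead shows a different, commonly used mitigation, clamping the accumulated integral; `PidSketch` and `integralLimit` are made-up names for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical PID controller with a clamped integral term (anti-windup).
struct PidSketch {
    double kp, ki, kd;
    double integralLimit;  // assumed bound on the accumulated error
    double integral;
    double prevError;

    PidSketch(double p, double i, double d, double limit)
        : kp(p), ki(i), kd(d), integralLimit(limit), integral(0.0), prevError(0.0) {}

    double update(double setpoint, double measured, double dt) {
        assert(dt > 0.0);
        double error = setpoint - measured;

        // Without the clamp, a persistent error keeps growing this sum
        // for as long as the error lasts: that growth is the windup.
        integral += error * dt;
        integral = std::max(-integralLimit, std::min(integral, integralLimit));

        double derivative = (error - prevError) / dt;
        prevError = error;
        return kp * error + ki * integral + kd * derivative;
    }
};
```

Lowering Ki shrinks the contribution of the same accumulated sum, while the clamp bounds the sum itself. Either way the controller still needs an integral term to hold the weight of the sled, so the limit (or Ki) can’t just be set to zero.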
Bonus points to @seware for calling that it wasn’t a heap issue!
I’m going to add an option to change the PID tuning values from the Advanced Settings in Ground Control so that anyone who wants to can easily play with them, and maybe we’ll find a better set.
_Ki was going to be my second guess. (actually I just thought it referred to a fraternity)
Nice work bar. You have a knack for troubleshooting.
And to everyone else who is regularly contributing. Some very cool people here.
I noticed the right motor (the end where my laptop is) would buzz anytime the z was tweaked, for example when zeroing the bit. Does this eliminate that?
Thanks @seware! I get to listen to everyone’s suggestions and then I look smart
@mooselake That hum is normal, it’s just the sound of the PWM signal which controls the motor. The left motor sounds different from the right motor because they are controlled using different PWM sources: the pin which controls the left motor is also used by the timer which triggers the PID loops to recompute. We’ll probably make them both sound the same eventually just for everyone’s sanity, but they should function the same.
Sweet. Thank you so much.
I was close in my guess that:
But honestly, I am well over my head here so that is nothing more than dumb luck.
Still a pretty spot on guess!
I couldn’t tell if the left end buzzed; it was over 10 feet away and I had the laptop on the end of a lumber pile. To be more specific, I was only moving the Z (tweaking the bit position a, well, bit at a time to zero it), not X/Y, and was a bit surprised that there was any activity on the other axes every time the Z moved.
It looked like the sled didn’t move.
Is there any way you could implement an autotune feature in the future? Every sled will be different. BTW, I’m an ignorant PIDder (hopefully not PIDdler), but do know an expert on another forum
The results of the change to Ki are night and day. Everything runs and sounds so much smoother now. When performing a drilling step, the XY positioning remains extremely constant and the motors don’t sound like they are struggling.