Calibration code / understanding

I’m having a dig around in the calibration code (I like a good algorithm, and geometry). Can I check my high-level understanding is right before I ask some code questions?

In essence calibration does:

  • Unwind X length of belt for all belts (measured via belt teeth, which nominally map to an exact length, barring the current talk on stretching).
  • Tighten the belts at an arbitrary point, giving us an initial length for each belt: measurements[0].
  • Slacken a couple of belts, move the Maslow a little, tighten again, giving us measurements[1].
    • Repeat this a few times, giving us an initial array of measurements[].
  • With this initial measurements[] array, try an initial fit of possible co-ordinates for the anchor points.
  • If the fitness is not bad, we can keep the belts tight and try some variation of a 3x3, 5x5, or 7x7 grid to populate our measurements[] array further.
    • Spiralling out from the initial location, within the calibration area the user specified.
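
For my own notes, here’s roughly how I picture that loop as code - a minimal sketch under the assumptions above, where takeMeasurement, moveBy, moveTo, and gridPoints are hypothetical names rather than the firmware’s actual API:

```javascript
// Minimal sketch of the measurement-collection flow as I understand it.
// takeMeasurement, moveBy, moveTo, and gridPoints are hypothetical names,
// not the firmware's actual API.

function collectMeasurements(takeMeasurement, moveBy, moveTo, gridPoints) {
  const measurements = [];

  // Tighten and measure at an arbitrary starting point.
  measurements.push(takeMeasurement()); // measurements[0]: four belt lengths

  // A few small relative moves near the start to seed the initial fit.
  for (const delta of [{ x: 50, y: 0 }, { x: 0, y: 50 }, { x: -50, y: 0 }]) {
    moveBy(delta); // slacken a couple of belts, move a little, re-tighten
    measurements.push(takeMeasurement());
  }

  // If the initial fit looks plausible, walk the NxN grid, spiralling out.
  for (const point of gridPoints) {
    moveTo(point);
    measurements.push(takeMeasurement());
  }

  return measurements; // nothing thrown away, per the replies below
}
```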

And the fitting is run in the browser of the device controlling the Maslow.

I guess my initial question (other than do I understand right) is whether we keep all measurements done in every stage in measurements[], or if we chuck any initial ones / only fit for an NxN set of results? I’m sure the answer is in the firmware code but I haven’t got round to digging into that yet!

Dave wrote:

I guess my initial question (other than do I understand right) is whether we
keep all measurements done in every stage in measurements[], or if we chuck
any initial ones / only fit for an NxN set of results? I’m sure the answer is
in the firmware code but I haven’t got round to digging into that yet!

I believe that it keeps all measurements as it goes.

David Lang

@dlang is right, we are keeping all of the measurements.

There has been some playing around with cherry-picking some measurements (say, throwing out the worst 10%), but experimentally that doesn’t seem to give better results (although a lot more testing is needed).

Bar wrote:

There has been some playing around with cherry-picking some measurements (say,
throwing out the worst 10%), but experimentally that doesn’t seem to give
better results (although a lot more testing is needed)

If you are getting good tension and no arm/frame collisions, there’s no need
to throw any of them out.

detecting that some are way off from others is an indication that something is
wrong, and you need to go back, re-measure, and figure out why those points are
bad.

David Lang

That’s a good point - any valid data point is good! Though more points means more processing time if we’re limited to browser calculation.

From reading around, it seems like the main way we’re avoiding bad data points is avoiding overly large calibration grids, where the corners are likely to have arm collisions.

It’s not clear from the code that we’re doing any sanity checking of the data points, but maybe I’m missing something?

I feel like if we have at least some idea of the size of the frame (from user measurements of their frame for example) we can calculate whether a data point is ‘risky’ (might have arm collisions) and potentially reject it.
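
A minimal sketch of what that ‘risky point’ test could look like, assuming a rough user-measured frame; the belt-angle heuristic and the 15-degree threshold are my invention, purely for illustration:

```javascript
// Sketch of a 'risky point' test given a rough user-measured frame.
// The heuristic and the 15-degree threshold are invented for illustration.
const MIN_BELT_ANGLE_DEG = 15;

function isRiskyPoint(point, anchors) {
  // A belt running nearly parallel to the top/bottom frame members suggests
  // the arm may be pressed against the frame (collision risk near corners).
  return anchors.some((anchor) => {
    const angle = Math.abs(
      (Math.atan2(point.y - anchor.y, point.x - anchor.x) * 180) / Math.PI
    );
    return angle < MIN_BELT_ANGLE_DEG || angle > 180 - MIN_BELT_ANGLE_DEG;
  });
}

// Example with a hypothetical 3000x2000mm frame:
const anchors = [
  { x: 0, y: 0 }, { x: 3000, y: 0 },
  { x: 0, y: 2000 }, { x: 3000, y: 2000 },
];
console.log(isRiskyPoint({ x: 2900, y: 1900 }, anchors)); // true: near a corner
console.log(isRiskyPoint({ x: 1500, y: 1000 }, anchors)); // false: centre
```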

I wonder if that can be taken even further. In my head, for horizontal, it seems like if we could get a user to plonk the Maslow top left and bottom right of a potential calibration area, and take a quick belt measurement from each, then we could run the calibration in an area defined by that. I think those two measurements would give you enough to run with tight belts, but I’m not certain. :thinking: But I also need to get a better grasp of the code for moving the Maslow / controlling the belts to get an idea of what gotchas there are.

I also wonder, in a much more general sense, if a step back to what problem we’re trying to solve might help. I haven’t thought this through :laughing: but part of me wonders if we’re looking at this as ‘measure the frame size, calculate how many belt teeth to move for a given distance, calculate the number of teeth if I need to move X centimetres for a cut’
BUT
given the belt-stretch questions (and more generally), I wonder if you can get a better result with some form of non-linear control, where we sample and build up a map of ‘this point requires this many teeth on each belt to reach’ and only do small local interpolations between the points in the map. It might imply some level of user measuring in the calibration - I guess it might have come up before and been rejected for that reason?
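
To make the map idea concrete, here’s a sketch of ‘sample on a grid, interpolate locally’ using plain bilinear interpolation - the grid values, spacing, and data shape are all invented for illustration:

```javascript
// Sketch of the 'map + local interpolation' idea: belt teeth are sampled on
// a grid, and positions in between are bilinearly interpolated. The grid
// spacing and data shape are illustrative assumptions, not firmware types.

function makeTeethLookup(grid, spacing) {
  // grid[row][col] = teeth needed on one belt to reach that grid point
  return function teethAt(x, y) {
    const col = Math.min(Math.floor(x / spacing), grid[0].length - 2);
    const row = Math.min(Math.floor(y / spacing), grid.length - 2);
    const fx = x / spacing - col; // fractional position within the cell
    const fy = y / spacing - row;
    const top = grid[row][col] * (1 - fx) + grid[row][col + 1] * fx;
    const bottom = grid[row + 1][col] * (1 - fx) + grid[row + 1][col + 1] * fx;
    return top * (1 - fy) + bottom * fy;
  };
}

// 3x3 map measured at 500mm spacing (numbers invented for the example):
const teeth = makeTeethLookup(
  [[100, 140, 190], [130, 160, 205], [170, 190, 225]],
  500
);
console.log(teeth(250, 250)); // interpolates between the four nearest samples
```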

Dave wrote:

That’s a good point - any valid data point is good! Though more points means more processing time if we’re limited to browser calculation.

we are using the browser because even people’s phones are so much more powerful
that they can do the math faster in JavaScript than the ESP32 can do in C

From reading around, it seems like the main way we’re avoiding bad data points
is avoiding overly large calibration grids, where the corners are likely to
have arm collisions.

correct

It’s not clear from the code that we’re doing any sanity checking of the data points, but maybe I’m missing something?

we are not

I feel like if we have at least some idea of the size of the frame (from user
measurements of their frame for example) we can calculate whether a data point
is ‘risky’ (might have arm collisions) and potentially reject it.

frame collisions are only one cause of bad data points.

belts snagging on things (wasteboard, hoses, etc.) and frame flexing are also
significant risks.

I wonder if that can be taken even further. In my head, for horizontal, it
seems like if we could get a user to plonk the Maslow top left and bottom
right of a potential calibration area, and take a quick belt measurement from
each, then we could run the calibration in an area defined by that. I think
those two measurements would give you enough to run with tight belts, but I’m
not certain. :thinking: But I also need to get a better grasp of the code for
moving the Maslow / controlling the belts to get an idea of what gotchas there
are.

it’s worth a try. There are codes that you can send to control the Maslow belts
directly, and I believe you can also query the belt lengths. One issue you will
run into is figuring out when to stop pulling on a belt.

Part of the goal of the current calibration process is to require as little
human input and measurement as possible.

the earliest calibrations tried moving large distances between the measurements,
and that caused problems with belts getting caught up in the gears. the new
shields were created, but the calibration was also changed to not move as far
before pulling the belts tight again, so that there is never too much slack.

I also wonder, in a much more general sense, if a step back to what problem
we’re trying to solve might help. I haven’t thought this through :laughing:
but part of me wonders if we’re looking at this as ‘measure the frame size,
calculate how many belt teeth to move for a given distance, calculate the
number of teeth if I need to move X centimetres for a cut’. BUT, given the
belt-stretch questions (and more generally), I wonder if you can get a better
result with some form of non-linear control, where we sample and build up a
map of ‘this point requires this many teeth on each belt to reach’ and only do
small local interpolations between the points in the map. It might imply some
level of user measuring in the calibration - I guess it might have come up
before and been rejected for that reason?

The problem is: how do you define ‘this point’, especially with high accuracy?
the webcontrol project (running the original Maslow) tried an ‘optical
calibration’ setup where they put a camera in the router to spot the crossing
points of a grid - exactly your approach. They ran into the problem that there
is no good way to get accurate grids to calibrate on at our scale. They tried
getting posters printed, and (at least at the time) found that such prints by
commercial printers were not reliably accurate along the roll.

David Lang

Yeah, I used to work on phone chips :slightly_smiling_face: In the future it would be interesting to look at whether the fitting could be offloaded to the GPU using WebGL; I don’t think there’s any reason it shouldn’t be possible :thinking:

True, true!
I wonder if there’s scope for trying things like sub-sampling from the set of points into 2/3/4 subsets, fitting each one, taking the best, and testing it against the rest (a sketch of this follows below).

And/or returning to previously tried points that are suspect but seem like they should give valid results.
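
A quick sketch of that sub-sampling idea, where fitAnchors and fitnessOn stand in for the existing solver and fitness function (both names are hypothetical):

```javascript
// Sketch of the subsample idea: split the measurements into k sets, fit each,
// and score every candidate against the points it didn't see. fitAnchors()
// and fitnessOn() stand in for the real solver and fitness function.

function bestOfSubsampleFits(measurements, k, fitAnchors, fitnessOn) {
  // Deal the measurements round-robin into k subsets.
  const subsets = Array.from({ length: k }, () => []);
  measurements.forEach((m, i) => subsets[i % k].push(m));

  let best = null;
  for (let i = 0; i < k; i++) {
    const heldOut = measurements.filter((m) => !subsets[i].includes(m));
    const candidate = fitAnchors(subsets[i]);    // fit on one subset
    const score = fitnessOn(candidate, heldOut); // test on the rest
    if (!best || score > best.score) best = { candidate, score };
  }
  return best;
}
```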

Yeah, I understand the drive to minimise user interaction; in my experience it can be a double-edged sword - it can force you into assumptions that you then never test against the user’s assumptions, and that ends up tripping you up.

With what I was thinking, the aim was to be more interactive but lessen the amount of belt spooling done automatically before we’re in tight-belt calibration (I think horizontal gives opportunities because the device can just sit there without support) - I’ll think on it a bit more and maybe try a few ideas over the weekend as to how it could work.

Yeah, I see that - it’s not a trivial problem and what you can get the user to do is limited. Off the top of my head I wondered: if you get the user to plonk a full 8x4 as the spoil board, move the Maslow around till it’s near a corner, you can get the user to measure the distance sled-edge to board-edge and repeat a few times (partly because the sled is round, which helps). It’s not super accurate with a tape measure, but laser measures are not exactly expensive anymore :thinking:

Dave wrote:

With what I was thinking, the aim was to be more interactive but lessen the
amount of belt spooling done automatically before we’re in tight-belt
calibration (I think horizontal gives opportunities because the device can
just sit there without support)

that’s what we are trying to do now: we get an initial guess with everything
tight, roughly in the center (belts close to the same length).

then we do a few short moves to get a few more points, which don’t require
letting out much belt, then make some calculations on those points. That usually
gets fairly close - close enough that we should not have a lot of loose belt
after that.

then we do the next ring, all requiring small amounts of belt movement, and then
refine the calculations.

one idea from earlier: instead of defining the grid in terms of size and number
of points, define it in terms of the distance between points, possibly making
the distance smaller initially when we are less confident (so that less belt
needs to be fed out)
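
A sketch of that ‘define the grid by distance between points’ idea - the square-ring generator below is illustrative, not what the firmware does:

```javascript
// Sketch of 'define the grid by distance between points': generate square
// rings spiralling out from a centre, with per-ring spacing that can start
// small while confidence is low. All numbers are illustrative.

function ringPoints(centre, ringIndex, spacing) {
  const r = ringIndex * spacing; // half-width of this square ring
  const points = [];
  for (let x = -r; x <= r; x += spacing) {
    for (let y = -r; y <= r; y += spacing) {
      // Keep only the perimeter of the square (the new ring).
      if (Math.abs(x) === r || Math.abs(y) === r) {
        points.push({ x: centre.x + x, y: centre.y + y });
      }
    }
  }
  return points;
}

// First ring close in (less belt to feed out), later rings wider apart:
console.log(ringPoints({ x: 0, y: 0 }, 1, 150)); // 8 points, 150mm out
console.log(ringPoints({ x: 0, y: 0 }, 2, 250)); // 16 points, 500mm out
```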

Yeah, I see that - it’s not a trivial problem and what you can get the user to
do is limited. Off the top of my head I wondered: if you get the user to plonk
a full 8x4 as the spoil board, move the Maslow around till it’s near a corner,
you can get the user to measure the distance sled-edge to board-edge and
repeat a few times (partly because the sled is round, which helps). It’s not
super accurate with a tape measure, but laser measures are not exactly
expensive anymore :thinking:

it’s worth experimenting with.

David Lang

Managed to get up and running and ran a test. Just a 3x3 calibration; interestingly, it was pretty close on X but not great on Y (quite like this thread: Y Axis accuracy issue - #3 by Alex1), but I knew the calibration wasn’t great and wanted to try some actual cuts so I can work on all the bits in parallel.

I definitely want to have a bit of a poke at the calibration code itself - am I right that all I need to do is build this project:

which will generate me a new index.html.gz with my changes, and then upload that to the board?

Yup, that is exactly correct!

We’re here to answer any questions that might come up.

Another option if you just want to mess with the calibration code in a stand-alone form would be this project:

Unfortunately they are slightly out of sync so it’s not exactly the same, but it’s 95% the same.

Awesome, will give one / both of them a go :grin:

(going to rewrite all that; on re-reading, it was too stream-of-consciousness to convey my points well)

Dave wrote:

  • Do a bit of modelling - specifically around a quick tweak of the code to output the estimated points as CSV, so I can easily slap them in a spreadsheet and do quick modelling / visualisation to show if my sets of points are all clustered in the way I expect them to be.

currently it outputs the belt measurements for each point to the browser as a
json (or at least json-like) array. Look through the log of a calibration and
you will see that.
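
for the spreadsheet step, something like this can turn that logged array into
CSV - note the field names are assumed, adjust them to whatever the log
actually contains:

```javascript
// Quick sketch for turning the logged (JSON-ish) measurement array into CSV.
// The four belt-length field names here are assumptions, not the log's
// actual schema.

function measurementsToCsv(measurements) {
  const header = 'tl,tr,bl,br';
  const rows = measurements.map((m) => [m.tl, m.tr, m.bl, m.br].join(','));
  return [header, ...rows].join('\n');
}

// Example: paste the array from your calibration log in place of this.
const logged = [
  { tl: 1795.2, tr: 2802.9, bl: 1884.1, br: 2874.3 },
  { tl: 1901.7, tr: 2688.4, bl: 1985.5, br: 2755.0 },
];
console.log(measurementsToCsv(logged));
```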

  • Take a fork of the code and refactor it to do pure sampling of possible points.
  • Take a fork of the code and make it do the combined sampling approach above.
  • Look a bit more at how we might factor in points&motor-current together.

There are people who have made web pages and spreadsheets that you can paste
this json array into, and they try to visualize it and run different calibration
routines against it.

this has included showing points that seem ‘off’ compared to the others; the
idea was to make a version that could throw out some number of ‘bad’ data
points.

@bar do you have a quick way to find some of those discussions (or at least
remember the 4 character tag, starts with a C ends with an R IIRC)

so you don’t need to work on modifying/forking the code, just on capturing the
data when it’s sent to the browser (or from the logs) to paste into your
attempt.

David Lang

Yep yep - I’ve been working on data pulled from my existing calibrations; that kind of ambiguity is why I’m going to rewrite everything I wrote at ~1am last night :joy: (things are slammed and I’ve been meaning to write this up for a while).

Yeah, any previous discussions I’m keen to have a look at :slight_smile:

Let’s try this again; it will probably need a few different replies.

Ok, what have I been doing? First, I grabbed a few data sets - sets of calibration-point belt lengths (CPBLs?) taken from my calibration logs. I had:

  • My 3x3 @ 900 force
  • My 7x7 @ 900 force
  • My 7x7 @ 1500 force (more on why later)
  • Another set from another user who had scanned their frame and put their logs in the thread.

I had a look at

and

to get some idea of the difference - for my purposes, the simulation is fine as I’m messing with the algorithm. Although I did just pull it locally and mess with the code like that, one nice thing I remembered is that you can run webpages from GitHub using the html-preview project, like this:

https://html-preview.github.io/?url=https://github.com/BarbourSmith/Calibration-Simulation/blob/main/index.html

I had a few hunches about sampling and starting from approximations, but mostly I was putting the data through a few times in different ways to see what it showed me and what speed-ups I might be able to get.

What I tried:

  • I started using ‘rough’ estimates of the frame size for the initial guess - both exact measured ones, and ones within 50mm for each frame corner (I figure anyone building a frame can tape-measure it to within 10-20mm reliably, and even if they misunderstand measuring bolt-centre to bolt-centre, they’ll get within 50mm).
  • I was using manual measurements, both assuming it was a rectangle, and using:
    http://lang.hm/maslow/maslow4_manual_calibration_simple.html
    to estimate the offsets.

My data was a small sample, but I was seeing that:

  • When I played with how close or far away you start, it didn’t make much difference beyond taking longer to find a set with high fitness - and increasing the chance you’ll hit a poor estimate with lower fitness.
  • I played with the initial point randomisation/mutation/sampling, and toned it down to ± 20mm - again, having it much more than that just made it more likely to iterate down to a low fitness answer.

And testing with initialGuess measured to within 50mm of actual, and point randomisation ± 20mm:

  • I would get to a converged or barely-changing estimate of a set of points in 100-1000 steps (i.e. very quickly).
  • And if I manually re-ran it multiple times, I would get similar-but-slightly-different estimated points with similar plausibility (and some with less).

And with an output set of points from that, I am getting good results (though I need more testing):
https://forums.maslowcnc.com/t/hexagons-are-the-bestagons/24123/6
And better estimates than I saw in the logs from the actual calibrations.

Simple steps:

I’m not saying it definitely makes sense, but I could certainly imagine a user flow of:

  • Design frame, use something like http://lang.hm/maslow/maslow4_frame.html to get a feel for what a reasonable work area vs frame size is, but also to work out a reasonable area / size for calibration grid.
  • Build frame, measure the Z Offsets, measure the corners with a tape measure.
  • Put those corner values through http://lang.hm/maslow/maslow4_manual_calibration_simple.html to give a frameGuess
  • Run a calibration step using the grid size worked out earlier, potentially with no calibration calculation, just measuring and logging the CPBLs.
  • Put that frameGuess in as the initialGuess in the calibration simulator, along with the CPBLs and Z Offsets.
  • Have the user run the simulator a few times until they get a feel for what fitness they’re getting (do they need to work on the frame to stiffen it, etc.), and then have them pick a sensible set of calibration output to put in the maslow.yaml to use.

Which would largely just require a PR to pull the initialGuess / ZOffsets / CPBLs to the top of the calibrator file and label them so it’s easy for the user to find and modify them.
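
As a sketch of what that top-of-file block might look like (every name and number below is a placeholder, to be replaced with your own measurements and logged values):

```javascript
// Sketch of pulling the user-editable values to the top of the calibrator
// file. All names and numbers are placeholders, not the project's actual
// variables - substitute your own frame measurements and logged CPBLs.

// Rough frame guess from tape-measured corners (mm), e.g. via the manual
// calibration page.
const initialGuess = {
  tlX: 0,    tlY: 2200,
  trX: 3050, trY: 2210,
  blX: 5,    blY: 0,
  brX: 3045, brY: 10,
};

// Z offsets of each anchor arm, measured on the frame (mm).
const zOffsets = { tl: 116, tr: 69, bl: 47, br: 89 };

// Calibration-point belt lengths, pasted straight from the calibration log.
const cpbls = [
  [1795.2, 2802.9, 1884.1, 2874.3],
  // ...one row of four belt lengths per calibration point
];
```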

Obviously a lot of that can potentially be automated (and/or input fields in a webpage / generating a maslow.yaml from the calibration page, etc.) - that’s just a quick bootstrapping way if people wanted to try it.

Now, that’s a simple step and possibly already been talked about, but I was looking at it a bit more and it gets more speculative…

Interesting stuff to investigate more - calibration force

On a hunch, I tested 7x7 with force 900 and 7x7 with force 1500. I need to do some more testing and modelling, but it definitely seemed like:

  • The frame estimation at force 900 comes in as slightly smaller than my measured frame (~5-10mm). The frame estimation at force 1500 comes in slightly smaller again (~5-10mm smaller again).

Now, that’s not actually surprising, I don’t think - stretch / flex would show up like that. On the one hand, that could be from belt stretch, or frame flex, or unit flex, and the internal friction of each arm could differ, so being sure which could be tricky. On the other hand, as long as it’s measurable and contributes to accuracy, the actual cause doesn’t matter (I’d argue).

While motor-current / calibration force isn’t seen as a good proxy for tension, I also think it’s not to be ignored.

I think that it would be interesting to investigate both:

  • Running calibration with different tensions and noting the tension associated with each CPBL set.
  • Possibly running the machine up and down tl<->br and bl<->tr with differing tensions (so only one pair of belts is tense) - I’m not as certain about this being useful, but it feels interesting enough to investigate more and / or model a bit to see what it might tell us.
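
To make the force hunch concrete, here’s a toy stretch model showing why a higher calibration force should read back as a smaller frame - the tooth pitch and stiffness constant are invented numbers, not measured ones:

```javascript
// Toy model of why higher calibration force should read as a smaller frame.
// If a belt under tension T stretches by a factor (1 + T/k), the same
// physical distance pays out fewer teeth, so a solver assuming nominal
// pitch infers shorter spans. Both constants are invented for illustration.

const TOOTH_PITCH_MM = 2;   // assumed nominal pitch
const K_STIFFNESS = 200000; // invented per-belt stiffness constant

function teethPaidOut(trueDistanceMm, tensionForce) {
  const stretchedPitch = TOOTH_PITCH_MM * (1 + tensionForce / K_STIFFNESS);
  return trueDistanceMm / stretchedPitch;
}

// The same 2000mm span, measured at two calibration forces:
console.log(teethPaidOut(2000, 900) * TOOTH_PITCH_MM);  // ~1991mm: reads small
console.log(teethPaidOut(2000, 1500) * TOOTH_PITCH_MM); // ~1985mm: smaller yet
```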

Interesting stuff to investigate more - sampling

Ok, so from running all the above, I have thoughts on the calibration algorithm - and this is heavily influenced by what I’ve done in the past with ray-tracing and machine-learning stuff.

I think the method in the code is good (it’s described as a genetic algorithm, but I think it’s really a variant of gradient descent). But my take is that it’s based on an assumption that there’s a single right / perfect answer - that the frame/belts/unit are all perfectly stiff.

The measure of the currently-estimated set of frame points is termed fitness, and that does make sense, but I think it’s also useful to think of it as a measure of plausibility - how plausible is it that the estimated set of points matches the frame?
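
For reference, my mental model of the fitness measure is something like the sketch below - a simplification that assumes the point positions have already been estimated, whereas the real code solves for them jointly and also handles arm geometry and Z offsets:

```javascript
// Simplified stand-in for the fitness / plausibility measure: how well do
// the measured belt lengths agree with distances computed from a candidate
// set of anchor points? Each measurement here carries an estimated point
// position; the real code solves for those jointly.

function fitness(anchors, measurements) {
  let sumSq = 0;
  for (const m of measurements) {
    for (const corner of ['tl', 'tr', 'bl', 'br']) {
      const dx = m.point.x - anchors[corner].x;
      const dy = m.point.y - anchors[corner].y;
      const predicted = Math.hypot(dx, dy); // belt length the model expects
      sumSq += (m.belts[corner] - predicted) ** 2;
    }
  }
  return 1 / (1 + sumSq); // higher = more plausible
}
```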

However, I would suggest that, in the real world, the frame / belts / unit all flex. And that should mean you can get multiple sets of points that are all plausible - there could be lots of theoretical sets of output points that all have similar plausibility / fitness. Or, even where the plausibility isn’t the same, the gradient descent will, if it gets close to a reasonably high plausibility, settle on a set of points quite far from reality.

And I believe that matches the user experiences we’re seeing.

Conjecture time.

My gut feeling is we should be treating this as a weighted-sampling problem instead.

If we take a step back from plausibility-maximisation and, instead of treating a single set of output points as the result we use, treat it as a single sample amongst many, we could also weight each sample by its plausibility/fitness. I suspect we can get a much better estimate of the actual points that way.

So, one could:

  • Take a rough-measured estimate of the frame and use that as a starting point for our sampling.
  • For 10/100/1000 times:
    • Do a single sample ±20mm from the rough measure, like we currently do.
    • Run the gradient descent, probably with a cut-off of 1000 iterations, but possibly less.
    • Use the resulting set of points as a single sample, weighted by plausibility in some fashion.

You could even do a continuously accumulated estimate that you keep sampling until it converges. :thinking:

The experiences I noted in the previous post (that, assuming you start with measured values, the iterations are quick) mean this shouldn’t be too time-consuming - comparable to some current experiences, but with a better outcome.
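
A sketch of the weighted-sampling loop, where runDescent and fitness stand in for the existing calibration routines (all names hypothetical):

```javascript
// Sketch of the weighted-sampling idea: perturb the measured starting guess,
// run the existing descent from each perturbation, then average the resulting
// anchor estimates weighted by fitness. runDescent() and fitness() stand in
// for the existing calibration routines.

function jitter(guess, mm) {
  const out = {};
  for (const key of Object.keys(guess)) {
    out[key] = guess[key] + (Math.random() * 2 - 1) * mm; // +/- mm
  }
  return out;
}

function weightedSampleFit(roughGuess, measurements, runDescent, fitness, n) {
  let totalWeight = 0;
  const accum = Object.fromEntries(Object.keys(roughGuess).map((k) => [k, 0]));

  for (let i = 0; i < n; i++) {
    const start = jitter(roughGuess, 20); // +/-20mm, as in my experiments
    const result = runDescent(start, measurements); // cap at ~1000 iterations
    const w = fitness(result, measurements);        // plausibility as weight
    for (const key of Object.keys(accum)) accum[key] += result[key] * w;
    totalWeight += w;
  }
  for (const key of Object.keys(accum)) accum[key] /= totalWeight;
  return accum; // fitness-weighted average of all sampled solutions
}
```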

The global context for all this

I’ve talked about the calibration force and sampling together because I think they’re interrelated. I want to restate what I see as the problem we’re trying to solve:

Calibration - We want to come up with a model (specific to a specific frame) that gives us a number of belt-teeth to pull in on each belt, at a certain motor-current, to move the centre of the router to every XY position possible in our cutting area.

So - I’m very specifically calling out that what we care about is the number of belt teeth to pull in and at a specific motor-current.

We appear to see that calibration generates a frame-size estimation that is slightly too small, and smaller still the greater the force used (tbc). But the reality is that this doesn’t actually matter if using that frame-size estimation effectively includes / offsets the belt stretch, etc. - we just need to calibrate such that, for a specific frame + Maslow, we pull in each belt to a specific length at a specific motor-force.

Now, we might have to include motor-force as a proxy and/or sample at different motor-forces during calibration to achieve that - and that’s the kind of thing I want to look into at some point.
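
Restated as code, the calibration output we actually need is a function from (position, force) to teeth per belt - the stretch correction below is a placeholder whose true form is exactly what calibrating at multiple forces would pin down:

```javascript
// Restating the goal as a function signature: for a given frame model,
// target position, and motor force, how many teeth should each belt pull
// in? The stretch factor is an invented placeholder, purely illustrative.

function beltTeethFor(anchors, target, force, toothPitchMm = 2) {
  const teeth = {};
  for (const corner of ['tl', 'tr', 'bl', 'br']) {
    const span = Math.hypot(
      target.x - anchors[corner].x,
      target.y - anchors[corner].y
    );
    const stretchFactor = 1 + force / 200000; // invented force model
    teeth[corner] = span / (toothPitchMm * stretchFactor);
  }
  return teeth;
}
```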

What I want to do next:

  • Do a bit of modelling - I want to capture a bunch of outputs from what I’ve been trying (as CSV, for example) so I can easily slap them in a spreadsheet and do quick modelling / visualisation to show if my sets of output points are all clustered in the way I expect them to be (I expect they’ll cluster around an accurate estimate of the frame that’s scaled slightly smaller).
  • Take a fork of the code and refactor it to do pure sampling of possible points.
  • Take a fork of the code and make it do the combined sampling approach above.
  • Look a bit more at how we might factor in points&motor-current together.

I did have a quick try at running multiple samples, but the entire code seems set up specifically to iterate on a single estimate, so I need a bit more time to unpick it / make it do what I want.

This is an awesome post :+1: :+1: :+1: :+1: :+1: :+1: :+1: :+1: :heart: :heart: :heart:

I am 100% of the opinion that the way we are doing things is the best way we know how to do it now, but not the right way, and if the current approach hasn’t improved dramatically in the future I’ll be disappointed. I think that we can and should do better.

I agree with this, but my very hand-wavy understanding is that as the number of sample points goes up, the space becomes “smoother”. In theory we only need three data points to start the process, but I found that with just three data points there are way too many fake valleys to fall into.

If we imagine the space as a bumpy landscape full of local minima (except of course we’re in a 5-dimensional space), the more data points (measurements) we add, the more the system flattens out, because there are fewer and fewer “plausible” wrong solutions.

I want to push back on this a bit. This is actually how the system worked up until about two months ago, and I think that we need to avoid user input because it creates a lot of opportunity for something to go wrong. Us humans aren’t great at reading and carefully following instructions, and folks input all sorts of crazy things instead of the right measurements. Even the selection of “horizontal vs vertical” is a problem: almost every week someone posts that they aren’t able to calibrate and are frustrated, and that turns out to be the problem. I can’t criticize, because I myself have gotten that one wrong at least four or five times, and sometimes even taken hours to figure out what I did wrong :joy:

I don’t think that this is a big issue though, because we should be able to make something which converges reasonably fast even without a good first guess.

I’m not totally sure that I understand, but I think I like what I’m hearing. The basic idea is that we run the full process to find “anchor point locations” for a whole bunch of random starting points and then we either pick the best ones and average them, or we pick the most commonly found values, or something like that?

Let me know if you want any help unpacking specific bits of the code. It’s grown and evolved over time and is probably due for a rewrite.

@bar - will do a proper reply soon - I’m off to a board game convention in the morning and everything’s a bit manic :laughing: - but one little point on:

I both agree and disagree (I used to do engineer-to-engineer support for very complex hardware, so I do have some experience) - there’s currently a whole heap of assumptions the user can break inadvertently.

And here’s quite a subtle example of being careful about how you do this: currently, Horizontal vs Vertical has a default, which means a user can ignore it - it’s an assumption we don’t force the user to check. If it defaulted to undefined and you forced the user to actively pick each time (with a stopping error if they don’t), you might see less user error?
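
A tiny sketch of the ‘no default’ idea - names are illustrative, not the actual firmware’s:

```javascript
// Sketch of the 'no default' idea: orientation starts undefined, and
// anything that needs it must go through a check that stops with an error
// instead of silently assuming one. Names are illustrative.

let orientation; // deliberately undefined until the user picks

function setOrientation(value) {
  if (value !== 'horizontal' && value !== 'vertical') {
    throw new Error(`Invalid orientation: ${value}`);
  }
  orientation = value;
}

function requireOrientation() {
  if (orientation === undefined) {
    throw new Error('Select horizontal or vertical before calibrating.');
  }
  return orientation;
}
```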

This is brilliant! There should absolutely not be a default. Phenomenal suggestion!
