Maslow 4 is quite a bit more complicated than Maslow 1 was, and as a result we’re seeing quite a bit less community engagement in working on the firmware. That makes a lot of sense: the code is big and complicated, and even figuring out how to get it to compile can take hours of work, whereas the Maslow 1 firmware could be uploaded using just the Arduino IDE.
Now that we’ve entered the age of vibe coding, I’ve noticed a pattern emerging: our typical dev process goes like this:
Someone finds a bug and posts about it in the forums → I create an issue on GitHub for the bug → I assign the issue to Copilot (AI) → AI creates a fix → I compile the fix into a firmware .bin → I test the fix → If it looks good I upload the firmware.bin to the forums for testing.
I’m not sure how much actual value I’m providing in this chain; I’m mostly just a middleman for information. So I was thinking about how I could make the system better and give the community more direct control to make and test changes to the firmware.
The solution that I came up with is an AI bot called @MaslowBot that does a few things for us.
You can create an issue on GitHub here describing a problem, and the AI will automatically try to solve it for you and generate a set of proposed firmware changes that fix your issue. All you need to do is clearly describe the behavior you see and what you would like to see instead. No programming of any kind is needed.
Once the AI has created a solution you are ready to test, you can comment “@MaslowBot Please Build”, which will cause the bot to automatically generate a new firmware.bin file with the fix for your issue and attach it for you to download and test.
If you don’t like the changes (for example, if your issue is not fixed) you can comment and say “@MaslowBot, this doesn’t fix my issue, I am still seeing the top left belt extend more than the bottom right one” or whatever you are seeing, and the AI will work to make its fix better.
All proposed changes will still need to be reviewed by me before they can be merged into the firmware, but this creates a system which allows any member of the community to basically become a programmer who can propose changes to the firmware without needing to code.
This is very much an experiment, I’m not sure that I’ve seen anyone else try to do anything like this for an open source project so I would love to hear any feedback you have on how the system can be improved.
Pro tip - The system can actually improve itself: if you create a GitHub issue describing how you would like the bot to function, the bot can add that behavior to itself.
@bar this sounds pretty darn cool. We should name it HAL 9000.
Serious question though: since this may result in unintended consequences, will Maslow Inc replace any parts/belts damaged by AI-produced firmware, or is this a use-at-your-own-risk type of thing?
This seems like a terrible idea to me. Not to rain on the parade here, but if user 1 finds a bug and creates a fix using the bot, users 2 through 2000 still have a system with that same bug. And patches build on themselves: once a patch for bug 1024 is built, your next patch for bug 1025 is built on top of it, so the new patch has fixes for both bugs. If you let people start creating their own patches as they find bugs, user 1 comes across bug 1024 and user 2 comes across bug 1025 and they each build a patch for THAT bug in the moment, but neither of them has both patches and the rest of us don’t have either patch.
Plus, not all of us are techie enough to feel comfortable with this, so if you remove yourself from patch building, my device is useless… well, it already is, because it broke itself before it ever made an actual cut. But I figured hey, no worries, I’m going to let it sit on the shelf and collect a little dust while the people who know more than me work out the kinks, then revisit this and repair it once there is a stable bin and I can try again. It’s starting to sound like this was a $700 waste, as I am not technical enough to bug-fix everything.
Ryan, I could be wrong, but I think the idea is that multiple people can contribute to the project, but nothing is put into the general distribution until it has been vetted. So the idea is anyone can make a change, test it on their own machine, and then submit it for evaluation. If it is then assessed as worthwhile, it may be incorporated into the general release.
I’m not totally sure that I understand, but I think there might be some misunderstanding. The idea here is just to make it easier for more folks to get involved in the software process, not to replace me.
There will still always be regular and vetted updates released which incorporate all of the latest changes (like version 1.12 which came out yesterday).
This is just a system that lets folks who want to get more involved get involved. It’s 100% optional and just for fun.
If you find a bug and submit an issue, you can use the Copilot AI to create a fix for the bug.
You need to test it to see if Copilot did its job correctly (it usually doesn’t the first time), and when you get it working, you then submit the fix for @bar to review/improve. Other people can also look at the fix and suggest improvements (see the branch copilot/fix-310 for an example of this), and then after review/improvements, Bar will merge it into a release and the fix will be there for everyone.
So this isn’t everyone creating their own fix again and again; it’s getting more people involved in creating fixes for everyone.
@bar I would say that if the community is going to open the door to code that was not created by a person (and yes, I understand that people are just as fallible as an AI coder), it increases the need for automated testing coverage. I took a very quick look at the firmware repo, and I could be wrong, but it looks like there aren’t automated tests. And yes, I’m aware that in a project such as Maslow, where the final output is very much in the physical world, end-to-end (E2E) tests would have to involve the router itself and therefore be quite difficult to automate, but there is probably still a subset of the firmware behavior that can be tested in an automated fashion.
Either way, it may be worth considering vibe coding a fair number of automated tests, and including said tests in the build, just to provide some modicum of a safety mechanism, especially if AI is going to join the code submission party. Certainly, automated tests won’t protect against everything, but they can sometimes head an issue off at the pass (besides constituting working documentation of the code, nurturing thought about how each module in the code works, and nudging the code toward better quality).
(source: I’m a software engineer with close to 30 years of experience and am a big fan of automated testing.)
Right now, the code is going through extensive manual review before being accepted (the AI does the wrong thing, or solves the wrong problem, often enough that it can’t be trusted), but tests are welcome.
I’m setting up a few boards in a test setup that watches for new artifacts and loads them onto the boards to do sanity checks (initially, just ‘does it load and report the correct version after restart’), but I would welcome assistance and suggestions on what tests we can run. I also plan to have one or more boards hooked to motors/encoders so that we can do tests using some real-world elements.
Testing actual movement is going to be harder, but it can be done with smallish setups that don’t involve an actual router or cutting, so I plan to do something there as well.
As I say, suggestions welcome.
It would also be fantastic if you could review the PRs that are generated and suggest improvements. I am avoiding suggesting extensive changes/cleanups to code that we otherwise aren’t touching (minimizing the divergence from upstream in the hope that we can rebase onto their newer version and/or merge upstream).
I’ve submitted a dozen issues in the last week; all but one now have PRs that I am comfortable with, but the more eyes the better.
I couldn’t possibly agree more strongly. I think that testing is absolutely critical; for me, writing the code takes less than 10% of the time and testing takes the other 90%.
I would love to build out a hardware in the loop testing system which could test the hardware going through processes like calibration, but our system is quite complex and it could be tricky to implement.
Maybe a key first step is to work to eliminate the need to retract and extend the belts, that way a machine permanently attached to a frame might be able to run some tests automatically without a human needing to set it up each time.
Or maybe the simplest place to start could be with a really good checklist that needs to be followed before anything is merged?
I’ve been doing a lot of vibe coding these last few months.
It guesses, a lot.
With the FluidNC codebase you could tell that it was written by someone technically competent, but not a programmer. It’s simply not structured in the way a professional programmer would have done things. Vibe coding amplifies this sort of thing a lot, it will generate a lot of spaghetti code unless it is strictly instructed not to.
I would definitely recommend using AI to help in development, but the first uses of it should be highly disciplined. I would recommend getting it to focus on refactoring the code base, though not everything all at once: pick some area of the code’s functionality and keep it tightly focussed on that. It needs to be given the objective of reducing lines of code, not adding them, and of breaking the code into chunks that facilitate automated testing. Then get it to write the tests.
I would also get it to look for memory leaks. Or to hunt for “code smells” throughout the code base.
Always keep in mind with generative AI that its focus is on “generating” not on accuracy.
@md8n @Noreaster
I have submitted a dozen or so issues in the last week and helped copilot generate code to solve them. Most of them are very straightforward, ‘obvious’ fixes.
But I have one that is more complex, where copilot didn’t do the right thing immediately, and it really could use another set of eyes (at least).
This is code that, when your machine is connected to your home wifi and you have it enabled in maslow.yaml, will check for a new release when it connects to the wifi and download/install it.
I’ve finally fought through to get it to work, but have not yet looked at the main new file to see how much garbage there is from the false starts and dead-ends.
All the suspect code should be contained in the AutoUpdate.cpp file (it modifies a few other files, but they all look ‘obviously sane’).
I’ve had really good luck in cases like this re-writing the issue (or creating a new one) and having it start fresh. The AI struggles a bit with building on its own work and cleaning up things that went wrong, so sometimes a fresh start is the way to go.
I tried that for the autoupdate process, and the 2nd try went so badly that I did a new issue to reopen the first try (which had been going well until something went wrong and it stopped being able to respond).
I just haven’t dug into the main file to see how good/ugly that code is.
add a button to the UI that does the check for updates and auto-update
move the autoupdate files out of WebUI to ???
it does not currently do cert validation, which is a vulnerability, but we would have to make sure the time is valid if we enable it
change most of the logging from log_info to log_debug (keep decision results as info, but most of the details can be debug)
improve version comparison, starts at line
can the version comparison be short-circuited?
if the version numbers are the same, the git-annotated one with more patches is newer (v1.12-4-abcd is newer than v1.12-2-dcba)
isNewerThan should be part of isNewerVersion (which also handles the other cases)
when checking for firmware.bin in the release, should we look for firmware.bin* to allow for versioned file names? (as I discussed elsewhere, this would make it easier for people to keep track of what file was what when doing development)
I have found copilot to be pretty good for vibe coding. However, I would very much recommend that you try out the cursor IDE which is based on VS Code.
Right now I’ve been using VS Code to compile (because that’s what the instructions say), but I do almost all my editing in vi, and have now started compiling from the command line (I’m an old-school sysadmin).
I’ll try to remember to look at Cursor.
So far, all my interactions with Copilot have been via GitHub, using the mechanism that Bar set up to bounce my requests through a bot so that Copilot will pay attention to them.
@dlang We’ve got like 20 PRs open and I’d like to start merging them now that we’ve got 1.13 done. Can you let me know which ones are ready to go and I’ll start merging them?