Something I couldn't see was how those examples actually work, there are no actions specified. Do they watch a user, default to randomly hitting the keyboard, neither and you need to specify some actions to take?
What about rerunning things?
Is there shrinking?
edit - a suggestion for examples, have a basic UI hosted on a static page which is broken in a way the test can find. Like a thing with a button that triggers a notification and doesn't actually have a limit of 5 notifications.
Rerunning things: nothing built for that yet, but I do have some design ideas. Repros are notoriously shaky in testing like this (unless run against a deterministic app, or inside Antithesis), but I think Bombadil should offer best-effort repros if it can at least detect and warn when things diverge.
Shrinking: also nothing there yet. I'm experimenting with a state machine inference model as an aid to shrinking. It connects to the prior point about shaky repros, but I'm cautiously optimistic. Because the speed of browser testing isn't great, shrinking is also hard to do within reasonable time bounds.
Thanks for the questions and feedback!
Recently evaluated other testing tools/frameworks and if you're not already running the npm-dependencyhell-shitshow for your projects, most tools will pull in at least 100 dependencies.
I might be old fashioned but that's just too much for my taste. I love single-use tools with limited scope like e.g. esbuild or now this.
Will give this a try, soon.
Ok I will see myself out
(Yes I know it's actually from the Tolkien book)
Sometimes, only thing you can do is let the plague spread, and hope that the people who survive start showering and washing their hands.
[0]: I once interviewed at a company that sold a kind of on-prem VM hosting and storage product. They were shipping a physical machine with Linux and a custom filesystem (so not ZFS), and they bragged about how their filesystem was very fast, much faster than ZFS or Btrfs on SSDs. I asked them, if they were allowed to tell me how they achieved such consistent numbers. They listed a few things, one of which was: "we disabled block-level check-summing". I asked: "how do you prevent corruption?". They said: "we only enable check-summing during the nightly tests". So, a little unsettled, I asked: "you do not do _any_ check-summing at any point in production"? They replied: "Exactly. It's not necessary". So, throwing caution to the wind (at this point I did not care to get the job), I asked: "And you've never had data corruption in production"? They said: "Never. None". To which I replied: "But how do you _know_"? My no-longer-future-coworker thought for a few seconds, and realization flashed across his face. This was a company that had actual customers on 2 continents, and was pulling in at least millions per year. They were probably silently corrupting customer data, while promising that they were the solution -- a hospital selling snake-oil, while thinking it really is medicine.
Nice name, now who is he?
It's helpful to know what the tool maintainers see as upcoming or incomplete work. It also saves a consultant like me a lot of time to evaluate new tools for clients if I also know the limitations before diving in. Maybe a section in the manual for "What Bombadil can't do".
Great work!
Should be pretty easy to make it deterministic if you follow that precondition.
(How I had my review apps wired up was I dumped the staging DB nightly and containerized it, I believe Neon etc make it easy to do this kind of thing.)
Ages ago I wired up something much more basic than this for a Python API using hypothesis, and made the state machine explicit as part of the action generator (with the transitions library), what do you think about modeling state machines in your tests? (I suppose one risk is you donβt want to copy the state machine implementation from inside the app, but a nice fluent builder for simple state machines in tests could be a win.)
Microsoft had a remotely similar tool named Pex [1] but instead of randomly generating inputs, it instrumented the code to also enable executing the code symbolically and then used their Z3 theorem proofer to systematically find inputs to make all encountered conditions either true or false and with that incrementally explore all possible execution paths. If I remember correctly, it then generated a unit test for each discovered input with the corresponding output and you could then judge if the output is what you expected.
[1] https://www.microsoft.com/en-us/research/publication/pex-whi...
Regarding state machines: yeah, it can often become an as-complex mirror of the system your testing, if the system has a large complicated surface. If on the other hand the API is simple and encapsulates a lot of complexity (like Ousterhout's "Deep Modules") state machine specs and model-based testing make more sense. Testing a key-value store is a great example of this.
If you're curious about it, here's a very detailed spec for TodoMVC in Bombadil: https://github.com/owickstrom/bombadil-playground/blob/maste... It's still work-in-progress but pretty close to the original Quickstrom-flavored spec.
I work at Antithesis now so you can take that with a grain of salt, but for me, everything changed for me over a decade ago when I started applying PBT techniques broadly and widely. I have found so many bugs that I wouldn't have otherwise found until production.
So occasionally I got mails by "some colleague on behalf of Sauron" back then
Jokes aside, great project and documentation (manual)! Getting started was really simple, and I quickly understood what it can and cannot do.
Is there a video showing someone spinning this up and finding a bug in a simple app?
A broken counter app maybe?
"Just Another Victim Of The Ambient Morality" is one of my favorites.
Property-based testing for web UIs, autonomously exploring and validating correctness properties, finding harder bugs earlier.
Runs in your local developer environment, in CI, and inside Antithesis.
[!NOTE] Bombadil is new and experimental. Stuff is going to change in the early days. Even so, we hope you'll try it out!
Learn all about Bombadil with the following resources:
Or, if you want to hack on it, see Contributing.
Old Tom Bombadil is a merry fellow,
Bright blue his jacket is, and his boots are yellow.
Bugs have never fooled him yet, for Tom, he is the Master:
His specs are stronger specs, and his fuzzer is faster.
Built by Antithesis.
See, the idea with the semantic web, and the ARIA markup equivalents, is that things should have names, roles, values, and states. Devs frequently mess up role/keyboard interaction agreement (example: role=menu means a list will drop on Enter keypress, and arrow keys will change the active element), and with ensuring that state info (aria-expanded, aria-invalid, etc.) is updated when it should be.
Then I checked the Antithesis website. They don't even have focus state styling on any of the interactive elements. sigh
Unfortunately, I concluded that Bombadil is a toy. Not as in "Does some very nice things, but missing the features for enterprise adoption." I mean that in a very strong sense, as in: I could not figure out how to do anything real with it.
The place where this is most obvious is in its action generator type. You give it a function that generates actions, where each action is drawn from a small menu of choices like "scroll up this many pixel" or "click on this pixel." You cannot add new types of actions. If you want to click on a button, you need it to generate a sequence of actions to first scroll down enough, and then look up a pixel that's on that button.
Except that it selects actions randomly from your list, so you somehow need the action generator, when run the first time, to generate only the scroll action, and then have it, when run the second time, generate only the click action. If you are silly enough as to have an action generator that, you know, actually gives a list of reasonable actions, you'll get a tester that spends all its time going in circles clicking on the nav bar.
(Something in the docs claimed that actions are weighted, but I found nothing about an actual affordance for doing that. Having weights would make this go from basically impossible to somewhat impossible.)
(Edit: Found the weighted thing.)
I am terrified to imagine how to get Bombadil to fill out a form. At least that seems possible -- you can inspect the state of the web page to figure out what's already been filled out. But if you want the state to be based on anything not on the current page, like the action that you took on the previous page, or gasp the state on disk (as for an Electron app), that seems completely impossible. Action generators are based on the current state, and the state must be a pure function of the web page.
Its temporal logic has a cool time-bounded construct, but it's missing the U (until) operator. One of their few examples is "If you show the user a notification, it disappears within 5 seconds." But I want to say "When you click the Generate button, it says 'Generating...' up until it's finished generating." And I can't.
(Note: everything above is according to the docs. Hopefully everything I said is a limitation of the docs, not an actual limitation of the framework.)
I shared my comments with the author yesterday on LinkedIn, but he hasn't responded yet. Maybe I'll hear from him here.
I have a pretty positive opinion of Antithesis as a company and they seem to be investing seriously in it, and generally see it as a strong positive sign when someone knows what temporal logic is, so I have hopes for this framework. I am nonetheless disappointed that I can't use it now, especially because I was supposed to finish an internal GUI testing tool this week and my god I'm behind.
The bug is subtle: we do have styles for that, but in some change which I probably did we use css variable, which isn't there...
We will fix that soon.
Also, as you note, you can't implement custom actions. And that's just something that I haven't gotten to yet. It'd be quite straightforward to add a plain function (toStringed) as one of the variants of the Action type and send that to the browser for eval. (Btw, we're taking contributions!)
Weighting is also important right now. There's no smart exploration in Bombadil yet, only blind random, so manual weighting becomes pretty crucial to have more effective tests (i.e. unlikely to run in circles). I'd like to both make the Bombadil "fuzzer" better at this, but eventually you might want to run Bombadil inside Antithesis instead to get a much better exploration, even in a single campaign.
The Until operator is also one of those things that I haven't gotten to yet. I actually didn't expect someone to hit this as a major limitation of the tool so quickly, which is why I've focused on other things. Surprising!
To add some more context, Bombadil is an OSS project with one developer (but we're hiring!) and it's 4 months old and marked experimental. I'm sorry you were disappointed by its current state, but it's very early days, so expect a lot of these things to improve. And your feedback will be taken into account. Thanks!