Design of experiments has been around for nearly a century, and the software to run it for decades, yet most people still do not use it. It looks complicated, and people think a DoE requires hundreds of experiments. So most of us stuck with changing one factor at a time, which at least felt like less work. What I think really held DoE back was the expertise needed to drive it: knowing which of the many designs to pick and how to properly analyze and interpret the results.
I believe that generative AI will take all that off our hands. In the not-so-distant future we will simply specify a goal, and an AI agent will figure out the most efficient way to reach it with as few experiments as possible. It picks the right design, analyzes the results for you, and suggests the right follow-up experiments. At every step it explains its reasoning, so you always stay in the loop (until it gets so good that you no longer have to).
This is how I picture the future of experimental work. We won’t use DoE software tools anymore, and we won’t need statistical training. We’ll just have a conversation with an AI agent (or maybe several at once). The rest of this essay walks through an example of how I imagine that future.
What you had to learn to run DoE
I don’t think it is a coincidence that most DoE software providers also invest heavily in courses and other learning material. Design of experiments is not straightforward to apply, and there is quite a lot you have to learn and understand first.
The first is choosing the design. Most people who have heard of DoE know one design: the full factorial. It is the easiest to understand but unfortunately also the most expensive to run. For example, testing 7 factors at two levels each, which is not that uncommon, takes 2⁷ = 128 runs. But the full factorial is only one design among dozens, and the right fractional factorial design tests those same 7 factors in only 16 runs. Previously you needed to know that this fractional design existed and, more importantly, how to use it properly.
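To make the run counts concrete, here is a minimal Python sketch. It assumes two-level factors coded as -1 (low) and +1 (high). The 16-run design uses the generators E = ABC, F = BCD and G = ACD. That is one common resolution IV choice for seven factors, not the only one. The function names are made up for illustration.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design, coded -1 (low) / +1 (high)."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def fractional_factorial_7_3():
    """16-run 2**(7-3) design for factors A..G.

    A-D form a full 2**4 factorial; E, F and G are products of base
    columns (generators E = ABC, F = BCD, G = ACD).
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append([a, b, c, d, a * b * c, b * c * d, a * c * d])
    return runs


full = full_factorial(7)
frac = fractional_factorial_7_3()
print(len(full), len(frac))  # 128 16
```

The 16 runs are not an arbitrary subset. Every factor still appears 8 times at each level, and every pair of factor columns is orthogonal. Because of that, each main effect is estimated from all 16 runs.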
The second is analyzing the results. You had to fit a model and judge which main effects and interactions are significant using statistical methods like ANOVA. Along the way you also had to navigate the confounding introduced by the design itself, as in the fractional factorial design I just mentioned.
The third is choosing the next set of experiments, because you are usually not finished after a single design, and you shouldn’t be. The idea of sequential experimentation is that you move in small steps and reach your goal through multiple smaller designs rather than one large one. This is much more efficient, but you had to know which design makes sense to run next and how to adapt the design space cleverly.
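Here is a minimal sketch of one such step. It is an illustrative assumption, one common follow-up among several. Suppose a first round of 8 runs covers four two-level factors A to D as a half fraction with D = ABC, and it leaves some effects tangled together. Running the complementary half (D = -ABC) adds 8 more runs. Together the two halves form the full 16-run factorial, in which every effect can be estimated separately. The factor letters and function name are hypothetical.

```python
from itertools import product


def half_fraction(sign):
    """8-run half fraction of four two-level factors A-D, coded -1/+1.

    sign=+1 uses the generator D = ABC; sign=-1 gives the
    complementary half, D = -ABC.
    """
    return [(a, b, c, sign * a * b * c) for a, b, c in product((-1, 1), repeat=3)]


first_round = half_fraction(+1)  # initial 8 runs
follow_up = half_fraction(-1)    # 8 more runs, only if the first round calls for it

combined = set(first_round) | set(follow_up)
full = set(product((-1, 1), repeat=4))
print(len(first_round), len(combined), combined == full)  # 8 16 True
```

The first round costs 8 runs. The extra 8 are only spent if the first results show they are needed.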
These are all things you had to learn through a lot of dedication, expensive courses, or both.
Let the agent be the design expert
If the agent is the design expert, you just describe your goal, provide the necessary domain knowledge, and let the agent figure out the rest.
If we test every possible combination, that is called a full factorial design. These designs are quite detailed and usually provide much more information about the factors and their effects than you need. For example, a full factorial design can tell you whether the effect of temperature changes depending on which hardener and which catalyst concentration you use together. This is called a three-way interaction, and such interactions are usually negligible.
The design we use instead is a so-called fractional factorial design. Here, some effects end up sharing the same estimate. You get a combined number that contains the effect of temperature alone plus a three-way interaction, in this design the one between hardener, catalyst, and curing time, and you cannot separate the two. That is fine, because three-way interactions are usually so small that the combined number is almost entirely the effect of temperature. The design we chose also combines two-way interactions in pairs, which is slightly more problematic. For example, the interaction between temperature and catalyst concentration shares a number with the interaction between hardener and curing time, and the effect of temperature might really depend on the catalyst concentration. But we will figure that out as we go. Does that sound reasonable to you?
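The combined numbers follow from simple letter algebra. This minimal sketch assumes the eight-run design in the tables below, where curing time is set by D = ABC (A = temperature, B = hardener, C = catalyst, D = curing time). That gives the defining relation I = ABCD. Multiplying any effect by ABCD and cancelling letters that appear twice gives the effect it is combined with. The names `FACTORS`, `DEFINING_WORD` and `alias` are hypothetical.

```python
from itertools import combinations

FACTORS = "ABCD"  # A = temperature, B = hardener, C = catalyst, D = curing time
DEFINING_WORD = set("ABCD")  # generator D = ABC gives the defining relation I = ABCD


def alias(effect):
    """Return the effect whose estimate is combined with `effect`.

    Multiplying by the defining word cancels any letter that appears
    twice (A * A = I), i.e. the symmetric difference of the letter sets.
    """
    return "".join(sorted(set(effect) ^ DEFINING_WORD))


for size in (1, 2):
    for letters in combinations(FACTORS, size):
        name = "".join(letters)
        print(f"{name} = {alias(name)}")
```

Among other pairs, it reports that temperature (A) is combined with BCD, and that temperature × catalyst (AC) is combined with hardener × curing time (BD).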
| Factor | Low setting | High setting |
|---|---|---|
| Curing temperature | 60 °C | 80 °C |
| Hardener type | Type A | Type B |
| Catalyst level | 1 % | 3 % |
| Curing time | 30 min | 60 min |
| Run | Curing temperature | Hardener | Catalyst | Curing time |
|---|---|---|---|---|
| 1 | 60 °C | type A | 1 % | 30 min |
| 2 | 60 °C | type A | 3 % | 60 min |
| 3 | 60 °C | type B | 1 % | 60 min |
| 4 | 60 °C | type B | 3 % | 30 min |
| 5 | 80 °C | type A | 1 % | 60 min |
| 6 | 80 °C | type A | 3 % | 30 min |
| 7 | 80 °C | type B | 1 % | 30 min |
| 8 | 80 °C | type B | 3 % | 60 min |
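As a quick sanity check on the design table, here is a sketch that codes each low setting as -1 and each high setting as +1. The coding and the name `DESIGN` are my assumptions. It confirms three properties: curing time follows D = ABC, every factor is balanced, and all factor columns are orthogonal.

```python
# Design table above, coded: low level -> -1, high level -> +1.
# Columns: temperature (A), hardener (B), catalyst (C), curing time (D).
DESIGN = [
    (-1, -1, -1, -1),
    (-1, -1, +1, +1),
    (-1, +1, -1, +1),
    (-1, +1, +1, -1),
    (+1, -1, -1, +1),
    (+1, -1, +1, -1),
    (+1, +1, -1, -1),
    (+1, +1, +1, +1),
]

# Curing time is the product of the other three columns (D = ABC).
generator_holds = all(d == a * b * c for a, b, c, d in DESIGN)

# Each factor has 4 low and 4 high runs, and every pair of columns is orthogonal.
columns = list(zip(*DESIGN))
balanced = all(sum(col) == 0 for col in columns)
orthogonal = all(
    sum(x * y for x, y in zip(columns[i], columns[j])) == 0
    for i in range(4)
    for j in range(i + 1, 4)
)
print(generator_holds, balanced, orthogonal)  # True True True
```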
| Run | Curing temperature | Hardener | Catalyst | Curing time | Hardness |
|---|---|---|---|---|---|
| 1 | 60 °C | type A | 1 % | 30 min | 33 |
| 2 | 60 °C | type A | 3 % | 60 min | 87 |
| 3 | 60 °C | type B | 1 % | 60 min | 57 |
| 4 | 60 °C | type B | 3 % | 30 min | 68 |
| 5 | 80 °C | type A | 1 % | 60 min | 88 |
| 6 | 80 °C | type A | 3 % | 30 min | 72 |
| 7 | 80 °C | type B | 1 % | 30 min | 77 |
| 8 | 80 °C | type B | 3 % | 60 min | 84 |
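One simple way to analyze these results is to compute each effect as the average hardness at the high setting minus the average at the low setting. This minimal sketch uses the same -1/+1 coding as above, which is an assumption, and hypothetical names. With eight unreplicated runs, the four main effects and three combined two-way interaction columns use up all seven degrees of freedom. That leaves nothing for an ANOVA error term, so the sketch stops at effect estimates.

```python
from math import prod

# Coded design (low = -1, high = +1) and measured hardness from the table above.
# Columns: A = curing temperature, B = hardener, C = catalyst, D = curing time.
RUNS = [
    ((-1, -1, -1, -1), 33),
    ((-1, -1, +1, +1), 87),
    ((-1, +1, -1, +1), 57),
    ((-1, +1, +1, -1), 68),
    ((+1, -1, -1, +1), 88),
    ((+1, -1, +1, -1), 72),
    ((+1, +1, -1, -1), 77),
    ((+1, +1, +1, +1), 84),
]
NAMES = "ABCD"


def effect(letters):
    """Mean hardness where the product column is +1 minus where it is -1."""
    cols = [NAMES.index(ch) for ch in letters]
    plus = [y for levels, y in RUNS if prod(levels[i] for i in cols) > 0]
    minus = [y for levels, y in RUNS if prod(levels[i] for i in cols) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


effects = {name: effect(name) for name in NAMES}
# Only three two-way columns are distinct here: AB = CD, AC = BD, AD = BC.
effects.update({name: effect(name) for name in ("AB", "AC", "AD")})
for name, value in effects.items():
    print(f"{name}: {value:+.2f}")
# A: +19.00, B: +1.50, C: +14.00, D: +16.50, AB: -1.00, AC: -18.50, AD: -5.00
```

In this sketch, temperature, curing time and catalyst level show large main effects, while hardener type shows a small one. The AC column (temperature × catalyst, combined with hardener × curing time) comes out about as large as the biggest main effect. That is exactly the kind of combined number a follow-up round would have to untangle.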
From experiments to goals
This was just a small example of how the future could look, and I don’t think this future is far away.
And it is only the beginning. Classical design of experiments is a great tool, but you can imagine the same agent also reaching for Bayesian optimization and much more. In fact, there is research suggesting that LLMs, using their reasoning alone, can be quite good at running experiments and suggesting the next ones to try. As these models get smarter, you might not need a formal method at all. And once you connect your whole database of past experiments and results, it only gets better.
So the question is: will this make humans obsolete? I don’t think so, but we need to change and adapt to work with these systems. Our job moves toward deciding which problems are worth solving. The agent optimizes toward a target. Picking the target, and knowing why it’s the right one, is the part that stays human.