What is a model in DoE and why do I need one?

Jul 21

In practice, performing an experiment with only three factors rarely requires a complex model. But how often do we actually encounter such simple scenarios? More often, you’re dealing with more factors and that’s when the data starts to get overwhelming. This is where a model comes into play. But what actually is a model?

What is a Model and why is it important?

When we talk about models, we are essentially discussing a structured way to understand and predict the behavior of our system. A model is a mathematical representation of the relationship between your input factors and the output response.

Visualizing these relationships, as we did in previous blog posts, works well for main effects and simple two-way interactions. However, it becomes increasingly difficult when more than three factors are involved, because a plot is limited to two, maybe three dimensions. A model has no such limit: it covers all factors at once and gives us the full picture.

Components of a model

To build a model, we need to consider several key components:

Factors (Independent Variables): These are the inputs or variables that you manipulate in your experiments. Factors can be anything that might affect the outcome of your experiment, such as temperature, pressure, concentration of a chemical, or even the amount of time an operation is performed. Identifying the right factors is crucial because they form the foundation of your model.
Levels: Each factor can take on different values or levels. For example, if temperature is a factor, its levels could be 50°C, 100°C, and 150°C. The choice of levels depends on the range over which you want to study the factor's effect.
Responses (Dependent Variables): The response is the outcome or result that you measure in your experiment. It's what you're ultimately interested in predicting or optimizing. For instance, in a chemical reaction, the yield or the purity of the product could be the response.
Interactions: Interactions occur when the effect of one factor depends on the level of another factor. For instance, the effect of temperature on reaction yield might change depending on the pressure. Understanding interactions is vital because they can reveal more complex dependencies that simple main effects cannot show.
Error Terms: These represent the variability in your response that cannot be explained by the factors and their interactions. It's important to account for error because it provides a measure of the model's accuracy and reliability. Errors can arise from measurement inaccuracies, environmental fluctuations, or other uncontrolled variables.
Model Coefficients: In a mathematical model, coefficients quantify the relationship between factors and responses. They indicate the magnitude and direction of the effect of each factor and interaction on the response. In a linear model, these coefficients are constants that multiply the factor values.

The simplest type of model: a linear model

The simplest type of model you can use to describe the relationship between factors and responses is a linear model. Linear models are easy to understand and often sufficient for many practical applications.

A linear model assumes that the response is a sum of terms built from the input factors, each multiplied by a constant coefficient. The model is typically expressed as a mathematical equation. For two factors and their interaction, this would be something like:

$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_{12}X_1X_2 + \epsilon$

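where $Y$ is the response, $\beta_0$ is the intercept, $\beta_1$, $\beta_2$ and $\beta_{12}$ are the coefficients, $X_1$ and $X_2$ are the factors, $X_1X_2$ is their interaction, and $\epsilon$ is the error term.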
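To make the interaction term more tangible, here is a minimal Python sketch that evaluates the equation above without the error term. The function name `predict` and all coefficient values are hypothetical and chosen only for illustration; they do not come from any experiment.

```python
def predict(x1, x2, b0, b1, b2, b12):
    """Evaluate Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 (error term left out)."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

# Hypothetical coefficients, chosen only to illustrate the interaction term.
coeffs = dict(b0=10.0, b1=2.0, b2=3.0, b12=0.5)

# Effect of raising X1 from 0 to 1 at two different levels of X2.
effect_at_low_x2 = predict(1, 0, **coeffs) - predict(0, 0, **coeffs)
effect_at_high_x2 = predict(1, 4, **coeffs) - predict(0, 4, **coeffs)

print(effect_at_low_x2, effect_at_high_x2)
```

With a non-zero $\beta_{12}$, the same change in $X_1$ produces a different change in $Y$ depending on the level of $X_2$. That is exactly what an interaction means.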

An Example of Fitting a Model

Let's go through a simple example to illustrate how to fit a linear model.

Imagine you are working in the coatings industry and want to understand how temperature $X_1$ and curing time $X_2$ affect the hardness $Y$ of a coating.

Step 1: Identify the Factors and Response

Factors: Temperature $X_1$ and curing time $X_2$
Response: Hardness of the coating $Y$

Step 2: Design the Experiment

You decide to test three temperatures (150°C, 175°C, 200°C) and three curing times (30 minutes, 45 minutes, 60 minutes), resulting in 9 experiments.
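The nine combinations can be listed by hand, but a short Python sketch does the same job and scales to more factors. The variable names here are illustrative assumptions.

```python
from itertools import product

temperatures = [150, 175, 200]   # levels of X1 in °C
curing_times = [30, 45, 60]      # levels of X2 in minutes

# Every combination of every level: 3 x 3 = 9 runs.
design = list(product(temperatures, curing_times))

for run, (temperature, curing_time) in enumerate(design, start=1):
    print(run, temperature, curing_time)
```

In a real experiment you would usually randomize the run order before heading to the lab; the sorted order is kept here only for readability.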

Step 3: Collect Data

You perform the experiments and collect the following data:

Run	Temperature $X_1$ (°C)	Curing Time $X_2$ (min)	Hardness $Y$
1	150	30	40
2	150	45	42
3	150	60	45
4	175	30	50
5	175	45	55
6	175	60	60
7	200	30	55
8	200	45	60
9	200	60	65

Step 4: Fit the Model

To fit a linear model, we use a simple equation with main effects only, leaving out the interaction term:

$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon$

Based on our experiments, we need to find the coefficients $\beta_0$, $\beta_1$ and $\beta_2$. Of course computers do that for us, typically with the method of least squares, but simply explained, it works like trial and error:

You guess values for the coefficients
You calculate $Y$ for the tested factor settings
You calculate the difference between the calculated and the measured values
You try again until you have minimized that difference
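The four steps above can be sketched as a crude search in Python. This is only an illustration of the trial-and-error idea, assuming an arbitrary starting guess and arbitrary step sizes. Statistical software solves the problem directly instead of guessing, and this simple search may stop short of the best possible coefficients.

```python
# Measured data from the nine runs: (temperature, curing time, hardness).
runs = [
    (150, 30, 40), (150, 45, 42), (150, 60, 45),
    (175, 30, 50), (175, 45, 55), (175, 60, 60),
    (200, 30, 55), (200, 45, 60), (200, 60, 65),
]

def sum_of_squares(coeffs):
    """Sum of squared differences between calculated and measured hardness."""
    b0, b1, b2 = coeffs
    return sum((b0 + b1 * x1 + b2 * x2 - y) ** 2 for x1, x2, y in runs)

coeffs = [0.0, 0.0, 0.0]      # step 1: an arbitrary first guess
steps = [10.0, 0.1, 0.1]      # how far each coefficient is nudged per trial
initial_error = sum_of_squares(coeffs)
best_error = initial_error

for _ in range(500):
    improved = False
    for i in range(3):
        for direction in (1, -1):
            trial = list(coeffs)
            trial[i] += direction * steps[i]
            error = sum_of_squares(trial)    # steps 2 and 3
            if error < best_error:           # keep a guess only if it is better
                coeffs, best_error = trial, error
                improved = True
    if not improved:
        steps = [s / 2 for s in steps]       # step 4: refine and try again

print(coeffs, best_error)
```

The quantity being minimized is the sum of the squared differences, which is why the method is called least squares.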

Doing that for our example leads to the following model:

$\hat{Y} = -21.9 + 0.35X_1 + 0.27X_2$

Run	Temperature $X_1$	Curing Time $X_2$	Hardness $Y$	Predicted Hardness $\hat{Y}$
1	150	30	40	39
2	150	45	42	44
3	150	60	45	48
4	175	30	50	48
5	175	45	55	52
6	175	60	60	57
7	200	30	55	57
8	200	45	60	61
9	200	60	65	65
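The fit can be reproduced with a least-squares solver. The sketch below assumes NumPy is installed and uses `numpy.linalg.lstsq`; the variable names are illustrative, and this is one of several ways to fit such a model, not necessarily the tool used to produce the numbers in this post.

```python
import numpy as np

# Data from the nine runs.
x1 = np.array([150, 150, 150, 175, 175, 175, 200, 200, 200], dtype=float)  # °C
x2 = np.array([30, 45, 60, 30, 45, 60, 30, 45, 60], dtype=float)           # min
y = np.array([40, 42, 45, 50, 55, 60, 55, 60, 65], dtype=float)            # hardness

# Design matrix: a column of ones for the intercept, then one column per factor.
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution for beta_0, beta_1, beta_2.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = coeffs

# Predicted hardness for each run.
y_hat = A @ coeffs

print(b0, b1, b2)
print(np.round(y_hat))
```

The coefficients come out at approximately -21.89, 0.353 and 0.278, in line with the rounded values in the model above, and the rounded predictions match the last column of the table.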

Step 5: Interpret the Results

This means:

The intercept $\beta_0$ is -21.9. This is the predicted hardness when temperature and curing time are both at 0. There isn’t always a practical relevance to the intercept, and it shows that the model has its limits: 0°C and 0 minutes lie far outside the range we tested.
For each 1°C increase in temperature $X_1$, hardness increases by 0.35 units.
For each 1-minute increase in curing time $X_2$, hardness increases by 0.27 units.

With this model you can now predict the hardness of the coating based on temperature and curing time. This helps you understand and optimize the process to achieve the desired coating properties.
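As a small sketch of such a prediction, the fitted equation can be wrapped in a Python function. The function name and the setting of 180°C and 50 minutes are hypothetical; the setting was not part of the nine runs but lies inside the tested ranges.

```python
def predict_hardness(temperature_c, curing_time_min):
    """Predicted hardness from the fitted model of this example."""
    return -21.9 + 0.35 * temperature_c + 0.27 * curing_time_min

# Hypothetical new setting inside the tested ranges (150-200 °C, 30-60 min).
prediction = predict_hardness(180, 50)
print(prediction)

# One extra degree at a fixed curing time changes the prediction
# by the temperature coefficient.
delta = predict_hardness(181, 50) - predict_hardness(180, 50)
print(delta)
```

The prediction is about 54.6 hardness units. Predictions far outside the tested ranges should be treated with caution, as the negative intercept already suggests.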

As already mentioned in the beginning, for a simple experiment like this, data visualization would have been fine to understand the relationship between the factors temperature and curing time and the response hardness. However, in practice, coating hardness might also be influenced by the binder type, the catalyst, the hardener, their ratios and concentrations, and many more factors. Additionally, you might have not just one response variable but several, such as chemical resistance, flexibility, or price.

This is when the model becomes really useful.

Also, you need the model to perform an ANOVA and find out which parameters are significant. That’s what we will cover in the next blog post.

Marcel Butschle
