What is a model in DoE and why do I need one?

In practice, performing an experiment with only three factors rarely requires a complex model. But how often do we actually encounter such simple scenarios? More often, you're dealing with more factors, and that's when the data starts to get overwhelming. This is where a model comes into play. But what actually is a model?

What is a Model and why is it important?

When we talk about models, we are essentially discussing a structured way to understand and predict the behavior of our system. A model is a mathematical representation of the relationship between your input factors and the output response.

Visualizing these relationships, as we did in previous blog posts, works well for main effects and simple two-way interactions. However, it becomes increasingly difficult when more than three factors are involved, because a plot is limited to two, maybe three, dimensions. A model has no such limit: it describes the effect of all factors at once and gives us the full picture.

Components of a model

To build a model, we need to consider several key components:

  1. Factors (Independent Variables): These are the inputs or variables that you manipulate in your experiments. Factors can be anything that might affect the outcome of your experiment, such as temperature, pressure, concentration of a chemical, or even the amount of time an operation is performed. Identifying the right factors is crucial because they form the foundation of your model.
  2. Levels: Each factor can take on different values or levels. For example, if temperature is a factor, its levels could be 50°C, 100°C, and 150°C. The choice of levels depends on the range over which you want to study the factor's effect.
  3. Responses (Dependent Variables): The response is the outcome or result that you measure in your experiment. It's what you're ultimately interested in predicting or optimizing. For instance, in a chemical reaction, the yield or the purity of the product could be the response.
  4. Interactions: Interactions occur when the effect of one factor depends on the level of another factor. For instance, the effect of temperature on reaction yield might change depending on the pressure. Understanding interactions is vital because they can reveal more complex dependencies that simple main effects cannot show.
  5. Error Terms: These represent the variability in your response that cannot be explained by the factors and their interactions. It's important to account for error because it provides a measure of the model's accuracy and reliability. Errors can arise from measurement inaccuracies, environmental fluctuations, or other uncontrolled variables.
  6. Model Coefficients: In a mathematical model, coefficients quantify the relationship between factors and responses. They indicate the magnitude and direction of the effect of each factor and interaction on the response. In a linear model, these coefficients are constants that multiply the factor values.

The simplest type of model: a linear model

The simplest type of model you can use to describe the relationship between factors and responses is a linear model. Linear models are easy to understand and sufficient for many practical applications.

A linear model assumes that the response is a weighted sum of terms built from the input factors: the factors themselves and, optionally, their interactions. "Linear" refers to the coefficients, each of which simply multiplies its term. For two factors and their interaction, the equation looks like this:

$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_{12}X_1X_2 + \epsilon$

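where $Y$ is the response, $X_1$ and $X_2$ are the factors, $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the main-effect coefficients, $\beta_{12}$ is the coefficient of the interaction $X_1X_2$, and $\epsilon$ is the error term.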
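
To make the equation concrete, here is a minimal Python sketch that evaluates it for given coefficients. The function name `predict` and all coefficient values are made up purely for illustration, and the error term $\epsilon$ is left out because it cannot be predicted.

```python
def predict(x1, x2, b0, b1, b2, b12):
    """Evaluate Y = b0 + b1*X1 + b2*X2 + b12*X1*X2 (error term omitted)."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

# Hypothetical coefficients, chosen only to illustrate the interaction term.
coeffs = dict(b0=10.0, b1=2.0, b2=1.0, b12=0.5)

# Effect of raising X1 from 0 to 1 at two different levels of X2.
effect_at_low_x2 = predict(1, 0, **coeffs) - predict(0, 0, **coeffs)
effect_at_high_x2 = predict(1, 4, **coeffs) - predict(0, 4, **coeffs)

print(effect_at_low_x2)   # 2.0
print(effect_at_high_x2)  # 4.0
```

Because $\beta_{12}$ is not zero in this sketch, the effect of $X_1$ depends on the level of $X_2$, which is exactly what an interaction means.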

An Example of Fitting a Model

Let's go through a simple example to illustrate how to fit a linear model.

Imagine you are working in the coatings industry and want to understand how temperature $X_1$ and curing time $X_2$ affect the hardness $Y$ of a coating.

Step 1: Identify the Factors and Response

  • Factors: Temperature $X_1$ and curing time $X_2$
  • Response: Hardness of the coating $Y$

Step 2: Design the Experiment

You decide to test three temperatures (150°C, 175°C, 200°C) and three curing times (30 minutes, 45 minutes, 60 minutes), resulting in 9 experiments.
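
Testing every temperature with every curing time is a full factorial design. As a small sketch, the run list can be generated in Python with `itertools.product`; the variable names are only illustrative.

```python
from itertools import product

temperatures = [150, 175, 200]   # °C
curing_times = [30, 45, 60]      # minutes

# Full factorial design: every temperature combined with every curing time.
design = list(product(temperatures, curing_times))

for run, (temperature, curing_time) in enumerate(design, start=1):
    print(run, temperature, curing_time)
```

In a real experiment you would usually randomize the run order before going to the lab; the sketch keeps the systematic order for readability.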

Step 3: Collect Data

You perform the experiments and collect the following data:

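| Run | Temperature $X_1$ (°C) | Curing Time $X_2$ (min) | Hardness $Y$ |
|---|---|---|---|
| 1 | 150 | 30 | 40 |
| 2 | 150 | 45 | 42 |
| 3 | 150 | 60 | 45 |
| 4 | 175 | 30 | 50 |
| 5 | 175 | 45 | 55 |
| 6 | 175 | 60 | 60 |
| 7 | 200 | 30 | 55 |
| 8 | 200 | 45 | 60 |
| 9 | 200 | 60 | 65 |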

Step 4: Fit the Model

To keep things simple, we fit a linear model with main effects only, leaving out the interaction term:

$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \epsilon$

Based on our experiments, we need to find the coefficients $\beta_0$, $\beta_1$ and $\beta_2$. Computers do that for us, typically with the method of least squares, which calculates the best coefficients directly. Conceptually, though, you can think of it as systematic trial and error:

  1. You guess values for the coefficients.
  2. You calculate $Y$ for the tested factor settings.
  3. You calculate the difference between the calculated and the measured values.
  4. You try again until that difference is as small as possible.
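
The steps above can be sketched in a few lines of Python. This is only an illustration: it uses the sum of squared differences as the measure of "difference" (the usual least-squares criterion, which is an assumption here), and the guesses are arbitrary values rather than the output of a real optimizer.

```python
# Measured data from the example: (temperature °C, curing time min, hardness).
runs = [
    (150, 30, 40), (150, 45, 42), (150, 60, 45),
    (175, 30, 50), (175, 45, 55), (175, 60, 60),
    (200, 30, 55), (200, 45, 60), (200, 60, 65),
]

def sum_of_squared_differences(b0, b1, b2):
    """Steps 2 and 3: calculate Y for every run and compare with the measurement."""
    return sum((y - (b0 + b1 * x1 + b2 * x2)) ** 2 for x1, x2, y in runs)

# Step 1: a few guesses for (b0, b1, b2). These values are arbitrary.
guesses = [
    (0.0, 0.30, 0.20),
    (-10.0, 0.30, 0.25),
    (-20.0, 0.35, 0.25),
]

# Step 4: keep the guess with the smallest difference so far.
best = min(guesses, key=lambda g: sum_of_squared_differences(*g))

for g in guesses:
    print(g, round(sum_of_squared_differences(*g), 1))
print("best guess so far:", best)
```

A real fitting routine does not guess blindly. For a linear model, the coefficients that minimize this sum can be calculated directly.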

Doing that for our example leads to the following model:

$\hat{Y} = -21.9 + 0.35X_1 + 0.27X_2$

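| Run | Temperature $X_1$ (°C) | Curing Time $X_2$ (min) | Hardness $Y$ | Predicted Hardness $\hat{Y}$ (rounded) |
|---|---|---|---|---|
| 1 | 150 | 30 | 40 | 39 |
| 2 | 150 | 45 | 42 | 44 |
| 3 | 150 | 60 | 45 | 48 |
| 4 | 175 | 30 | 50 | 48 |
| 5 | 175 | 45 | 55 | 52 |
| 6 | 175 | 60 | 60 | 57 |
| 7 | 200 | 30 | 55 | 57 |
| 8 | 200 | 45 | 60 | 61 |
| 9 | 200 | 60 | 65 | 65 |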
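
The coefficients and predictions above can be reproduced with a short least-squares fit. The sketch below solves the so-called normal equations in plain Python. That is one standard way to compute a least-squares fit, not necessarily the tool used for this post, and in practice you would normally rely on a statistics library. The function name `fit_least_squares` is made up for this example.

```python
# Measured data from the example: (temperature °C, curing time min, hardness).
runs = [
    (150, 30, 40), (150, 45, 42), (150, 60, 45),
    (175, 30, 50), (175, 45, 55), (175, 60, 60),
    (200, 30, 55), (200, 45, 60), (200, 60, 65),
]

def fit_least_squares(runs):
    """Solve the normal equations (X^T X) b = X^T y for b = [b0, b1, b2]."""
    rows = [(1.0, x1, x2) for x1, x2, _ in runs]  # the leading 1 is for the intercept
    ys = [y for _, _, y in runs]
    n = 3
    # Build the augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [v - factor * p for v, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]

b0, b1, b2 = fit_least_squares(runs)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")

# Predicted hardness for every run, rounded as in the table.
predictions = [round(b0 + b1 * x1 + b2 * x2) for x1, x2, _ in runs]
print(predictions)
```

The fitted values agree with the model equation in the text up to how the last digit is rounded, and the rounded predictions match the table.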

Step 5: Interpret the Results

This means:

  • The intercept $\beta_0$ is -21.9. This is the predicted hardness when temperature and curing time are both at 0. A negative hardness has no practical meaning here: that point lies far outside the tested range (150 to 200°C, 30 to 60 minutes), which shows that the model has its limits.
  • For each 1°C increase in temperature $X_1$, hardness increases by 0.35 units.
  • For each 1-minute increase in curing time $X_2$, hardness increases by 0.27 units.

With this model you can now predict the hardness of the coating based on temperature and curing time. This helps you understand and optimize the process to achieve the desired coating properties.
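
As a quick illustration, the fitted equation can be wrapped in a small Python function and used for a setting that was never tested. The function name `predict_hardness` and the chosen setting are only examples; the coefficients are the rounded ones from the equation above.

```python
def predict_hardness(temperature_c, curing_time_min):
    """Fitted model from the text: Y = -21.9 + 0.35*X1 + 0.27*X2."""
    return -21.9 + 0.35 * temperature_c + 0.27 * curing_time_min

# A setting inside the tested range (150 to 200 °C, 30 to 60 min) that was not run.
print(f"{predict_hardness(180, 50):.1f}")  # prints 54.6
```

Predictions are most trustworthy inside the range covered by the experiments. The further you move outside it, the less the model can be relied on, as the intercept already showed.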

 
 

As mentioned at the beginning, for a simple experiment like this, data visualization would have been enough to understand how the factors temperature and curing time affect the response hardness. In practice, however, coating hardness might also be influenced by binder type, the catalyst and hardener used, their ratios and concentrations, and many more factors. Additionally, you might have not just one response variable but several, such as chemical resistance, flexibility, or price.

This is when the model becomes really useful.

Also, you need the model to perform an ANOVA, which tells you which parameters are significant. That's what we will cover in the next blog post.
