Evaluating Model Performance with Residual Analysis

You have conducted a DoE, visualized the results, and used ANOVA to build a model that you can now use for decision making. But how good will the decisions be that you base on this model? That is something you need to find out.

Visual Comparison

One of the simplest forms of model validation is visually comparing predicted values to measured values. This method offers a quick impression of whether the model is reasonable. Take a look at the plots below. Although the measured points may seem to deviate noticeably from the red line, the good fit still captures the overall trend of the data. That is not the case for the bad fit, where we clearly see a curvature that the model fails to account for.

However, as the number of factors and response variables increases, visual comparisons will be challenging. This is where residual analysis becomes more effective.
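As a sketch of such a comparison, the snippet below plots measured values against predicted ones (using the example data from the RSS table further down) together with a 45-degree reference line; a perfect model would put every point on that line. This is illustrative only and assumes matplotlib is available:

```python
# Sketch of a predicted-vs-measured plot, using the example data
# from the RSS table in this article.
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

measured = [5.0, 6.2, 7.1, 8.3, 9.0, 10.1]
predicted = [4.8, 6.0, 7.3, 8.1, 8.9, 10.0]

fig, ax = plt.subplots()
ax.scatter(predicted, measured, label="runs")
# 45-degree reference line: a perfect model would put every point on it
lims = [min(predicted + measured), max(predicted + measured)]
ax.plot(lims, lims, color="red", label="perfect fit")
ax.set_xlabel("Predicted")
ax.set_ylabel("Measured")
ax.legend()
fig.savefig("predicted_vs_measured.png")
```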

Residuals

What are residuals?

Residuals are the differences between the observed values and the values predicted by the model. They provide insight into the model's accuracy.

$e_i = y_i - \hat{y}_i$

Residual Sum of Squares

One way to assess the quality of your model is by calculating the residual sum of squares (RSS). As the name suggests, RSS is the sum of the squared residuals. Squaring the residuals ensures that positive and negative deviations do not cancel each other out. Generally, the smaller the RSS, the better the model fits the data.

$RSS = \sum (y_i - \hat{y}_i)^2$

where $y_i$ is the observed value and $\hat{y}_i$ is the predicted value.

Temperature | Concentration | Pressure | Measured (Y) | Predicted (Ŷ) | Residual (Y − Ŷ) | Squared Residual
25 | 0.1 | 1.0 | 5.0 | 4.8 | 0.2 | 0.04
30 | 0.2 | 1.5 | 6.2 | 6.0 | 0.2 | 0.04
35 | 0.3 | 2.0 | 7.1 | 7.3 | -0.2 | 0.04
40 | 0.4 | 2.5 | 8.3 | 8.1 | 0.2 | 0.04
45 | 0.5 | 3.0 | 9.0 | 8.9 | 0.1 | 0.01
50 | 0.6 | 3.5 | 10.1 | 10.0 | 0.1 | 0.01

Summing the last column gives RSS = 0.18.
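To make the calculation concrete, here is a small Python sketch that reproduces the residuals and the RSS from the table above (the factor columns are not needed for the RSS itself):

```python
# Compute residuals and the residual sum of squares (RSS)
# for the measured/predicted values from the table above.
measured = [5.0, 6.2, 7.1, 8.3, 9.0, 10.1]
predicted = [4.8, 6.0, 7.3, 8.1, 8.9, 10.0]

residuals = [y - y_hat for y, y_hat in zip(measured, predicted)]
rss = sum(r ** 2 for r in residuals)

print([round(r, 2) for r in residuals])  # [0.2, 0.2, -0.2, 0.2, 0.1, 0.1]
print(round(rss, 2))                     # 0.18
```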

One problem with the residual sum of squares (RSS) is that it is not standardized, meaning its value depends on the scale of the data and the number of data points. This makes it difficult to compare the goodness of fit across different models or datasets.

R-squared

R-squared, on the other hand, is standardized. Most people may know this statistical measure from Excel. It ranges from 0 to 1, making it easier to interpret and compare across different models. It represents how much of the variation in the response can be explained by the tested factors. An R-squared value closer to 1 indicates a better fit, showing that a larger proportion of the variance is explained by the model.

The formula to calculate R-squared is:

$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$

where:

$SS_{res}$ is the residual sum of squares (RSS).
$SS_{tot}$ is the total sum of squares, i.e., the sum of squared deviations of the observed values from their mean.

R-squared = 1: This indicates that the model explains 100% of the variance in the response variable. The predicted values perfectly match the observed data.

R-squared = 0: This indicates that the model does not explain any of the variance in the response variable. The model’s predictions are no better than the mean of the observed data.

0 < R-squared < 1: Values between 0 and 1 indicate the proportion of the variance in the dependent variable that is predictable from the independent variables. For example, an R-squared of 0.9 means that 90% of the variance in the response variable is explained by the model.
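Using the same example data as in the RSS table, R-squared can be computed by hand. This is a sketch of the formula above; in practice, packages such as statsmodels report it for you:

```python
# Compute R-squared from the RSS and the total sum of squares.
measured = [5.0, 6.2, 7.1, 8.3, 9.0, 10.1]
predicted = [4.8, 6.0, 7.3, 8.1, 8.9, 10.0]

ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(measured, predicted))
mean_y = sum(measured) / len(measured)
ss_tot = sum((y - mean_y) ** 2 for y in measured)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.99: the model explains about 99% of the variance
```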

The R-squared value also has its limitations: it does not reveal non-linearity or time-dependent effects. Residual plots, however, do.

Residual Plots

Residuals vs. Predicted:

A residuals vs. predicted plot is a graph that shows the residuals (errors) on the vertical axis and the predicted values on the horizontal axis. Ideally, it should show a random scatter, indicating a good fit. If the plot shows a pattern, such as a curve or a trumpet shape, it suggests that your model does not capture the underlying structure of the data. Sometimes it helps to transform your data (e.g., log or square root); sometimes an interaction term is missing; sometimes the model might need a quadratic term.
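Beyond eyeballing the plot, one crude numeric check for a trumpet shape is to compare the residual spread in the lower and upper halves of the predicted range. A minimal sketch with made-up numbers; the 2x threshold is an arbitrary assumption, not a standard rule:

```python
# Crude check for a "trumpet" pattern: does residual spread grow
# with the predicted value? (Made-up residuals for illustration.)
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 0.5, -0.6, 0.9, -1.1]

pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low_spread = max(abs(r) for _, r in pairs[:half])
high_spread = max(abs(r) for _, r in pairs[half:])

if high_spread > 2 * low_spread:  # arbitrary threshold for this sketch
    print("Residual spread grows with prediction: consider a transformation")
```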

Residuals vs. Run:

The residuals vs. run plot shows the residuals versus the run order of the experiments. A random scatter of points is again desired. Trends in this plot can indicate time-dependent effects (e.g., temperature changes during the experiment). Blocking and randomization help ensure that these trends do not affect the analysis.
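A simple numeric companion to this plot is the correlation between run order and residuals: a coefficient far from zero signals a drift over the course of the experiment. A sketch with made-up residuals that drift upward:

```python
# Crude check for a time trend: Pearson correlation between
# run order and residuals. (Made-up drifting residuals for illustration.)
residuals = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4]
runs = list(range(1, len(residuals) + 1))

n = len(residuals)
mean_r = sum(residuals) / n
mean_t = sum(runs) / n
cov = sum((t - mean_t) * (r - mean_r) for t, r in zip(runs, residuals))
sd_t = sum((t - mean_t) ** 2 for t in runs) ** 0.5
sd_r = sum((r - mean_r) ** 2 for r in residuals) ** 0.5
pearson = cov / (sd_t * sd_r)

print(round(pearson, 2))  # 1.0: a perfect upward drift in the residuals
```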

Model validation is a critical step that confirms the reliability of your experimental findings. Without it, decisions based on the model may be wrong.
