Visualizing data from a full factorial design with Python

In Design of Experiments (DoE), understanding how different factors influence outcomes is crucial. Two key concepts are main effects and interactions. A main effect is the change in the response variable caused by changing a single factor, averaged over the levels of all other factors. For instance, in a filtration experiment, the main effect of temperature shows how a change in temperature affects the filtration rate. An interaction, on the other hand, occurs when the effect of one factor depends on the level of another factor. For example, the impact of temperature on filtration rate might change depending on the stirring rate.
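To make the two definitions concrete, here is a small worked example for a two-level, two-factor design. The cell means are made up for illustration and are not taken from the filtration dataset. The function names are hypothetical, and the interaction is computed with the common convention of halving the difference between the two conditional effects.

```python
# Made-up cell means for a 2x2 design (illustrative only, not the post's data).
# Keys are (temperature level, stirring-rate level), coded -1 = low, +1 = high.
cell_means = {
    (-1, -1): 50.0,
    (+1, -1): 52.0,
    (-1, +1): 60.0,
    (+1, +1): 90.0,
}


def main_effect_of_temperature(means):
    """Mean response at high T minus mean response at low T, averaged over RPM."""
    high = (means[(+1, -1)] + means[(+1, +1)]) / 2
    low = (means[(-1, -1)] + means[(-1, +1)]) / 2
    return high - low


def interaction_effect(means):
    """Half the difference between the T effect at high RPM and at low RPM."""
    effect_at_high_rpm = means[(+1, +1)] - means[(-1, +1)]
    effect_at_low_rpm = means[(+1, -1)] - means[(-1, -1)]
    return (effect_at_high_rpm - effect_at_low_rpm) / 2


temperature_effect = main_effect_of_temperature(cell_means)
t_rpm_interaction = interaction_effect(cell_means)
print(temperature_effect, t_rpm_interaction)
```

In this made-up case, temperature barely matters at the low stirring rate (an increase of 2) but matters a lot at the high stirring rate (an increase of 30). The main effect alone averages these two very different situations into a single number.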

In this blog post, you will learn how to use Python to visualize main effects and interactions in your data. We'll use an example dataset related to a filtration rate experiment. The dataset includes the factors temperature (T), pressure (P), concentration of formaldehyde (CoF), and stirring rate (RPM).

Loading the Data

We'll start by loading the experimental data into a pandas DataFrame. For this example, the data is stored in an Excel file named example_filtration_rate_fullfact.xlsx.

import pandas as pd

df = pd.read_excel("example_filtration_rate_fullfact.xlsx")
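Before plotting, it can be worth checking that the loaded table really contains every combination of factor levels. The functions later in this post treat every column other than the response as a factor, so a stray column such as a run number would be plotted too. The helper below is a minimal sketch of such a check. `missing_runs` is a hypothetical name, and the small DataFrame is made-up demo data, not the contents of the Excel file.

```python
from itertools import product

import pandas as pd


def missing_runs(data, factors):
    """Return the factor-level combinations that do not appear in the data."""
    # Distinct levels of each factor, as plain Python values
    levels = [sorted(data[f].drop_duplicates().tolist()) for f in factors]
    expected = set(product(*levels))
    observed = set(data[factors].itertuples(index=False, name=None))
    return sorted(expected - observed)


# Made-up 2x2 design with one run deliberately left out
demo = pd.DataFrame(
    {
        "T": [-1, 1, -1],
        "RPM": [-1, -1, 1],
        "Filtration_rate": [45.0, 71.0, 68.0],
    }
)
missing = missing_runs(demo, ["T", "RPM"])
print(missing)
```

An empty list means the design is a complete full factorial for the factors you passed in.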

Main Effects Plots

Main effects plots show the effect of individual factors on the response variable. They are straightforward to create using seaborn.

import seaborn as sns
import matplotlib.pyplot as plt

def create_main_effects_plots(data, y):
    # Get all column names except for the response variable
    factors = [col for col in data.columns if col != y]

    # Create a main effects plot for each factor
    for x in factors:
        sns.pointplot(x=x, y=y, data=data)
        plt.title(f"Main Effects Plot for {x}")
        plt.xlabel(x)
        plt.ylabel(y)
        plt.show()

# Create main effects plots for all factors
create_main_effects_plots(df, 'Filtration_rate')

This function iterates over all factors in the dataset (excluding the response variable) and creates a point plot for each.
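Each plot produced this way gets its own automatically scaled y-axis, which can make it hard to compare effect sizes by eye. One option is to draw the level means for all factors in a single row of panels that share a y-axis. The sketch below does this with plain matplotlib, as an alternative to the seaborn approach used in this post. `plot_main_effects_grid` is a hypothetical helper and the demo values are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_main_effects_grid(data, y):
    """Plot the mean response per factor level, one panel per factor, shared y-axis."""
    factors = [col for col in data.columns if col != y]
    fig, axes = plt.subplots(
        1, len(factors), sharey=True, figsize=(3 * len(factors), 3), squeeze=False
    )
    for ax, factor in zip(axes[0], factors):
        level_means = data.groupby(factor)[y].mean()
        # Use the levels as category labels so both levels are evenly spaced
        labels = [str(level) for level in level_means.index]
        ax.plot(labels, level_means.to_numpy(), marker="o")
        ax.set_xlabel(factor)
    axes[0, 0].set_ylabel(f"Mean {y}")
    fig.suptitle("Main effects (shared y-axis)")
    return fig


# Made-up demo data, not the filtration dataset
demo = pd.DataFrame(
    {
        "T": [-1, 1, -1, 1],
        "P": [-1, -1, 1, 1],
        "Filtration_rate": [40.0, 60.0, 44.0, 64.0],
    }
)
fig = plot_main_effects_grid(demo, "Filtration_rate")
```

Call `plt.show()` afterwards to display the figure. With a shared axis, a steeper line directly means a larger main effect.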

You'll notice that temperature appears to have the largest effect on the filtration rate, while pressure has the smallest. Be careful with this reading, though: a main effect is an average, and the effect of temperature might depend on the level of another factor. Let's examine the two-way interactions.
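If you prefer numbers over reading slopes off a plot, the main effects can also be computed directly as the difference between the mean response at the highest and lowest level of each factor. This is a minimal sketch that assumes two-level numeric factors. `main_effects` is a hypothetical helper, and the demo values are made up rather than taken from the filtration dataset.

```python
import pandas as pd


def main_effects(data, y):
    """High-minus-low difference in mean response for each factor."""
    factors = [col for col in data.columns if col != y]
    effects = {}
    for factor in factors:
        # groupby sorts the levels, so the first row is the lowest level
        level_means = data.groupby(factor)[y].mean()
        effects[factor] = level_means.iloc[-1] - level_means.iloc[0]
    # Largest absolute effect first
    return pd.Series(effects).sort_values(key=abs, ascending=False)


# Made-up demo data, not the filtration dataset
demo = pd.DataFrame(
    {
        "T": [-1, 1, -1, 1],
        "P": [-1, -1, 1, 1],
        "Filtration_rate": [40.0, 60.0, 44.0, 64.0],
    }
)
effects = main_effects(demo, "Filtration_rate")
print(effects)
```

On your own data you would call `main_effects(df, 'Filtration_rate')` and compare the ranking with what the plots suggest.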

Interaction Plots

Interaction plots show how the effect of one factor depends on the level of another factor. We can create these plots using the lmplot function from seaborn.

from itertools import combinations

def create_interaction_plots(data, y):
    # Get all column names except for the response variable
    factors = [col for col in data.columns if col != y]

    # Generate all possible pairs of factors
    factor_pairs = combinations(factors, 2)

    # Create an interaction plot for each pair
    for x, hue in factor_pairs:
        sns.lmplot(x=x, y=y, hue=hue, data=data)
        plt.title(f"Interaction Plot for {x} and {hue}")
        plt.show()

# Create interaction plots for all pairs of factors
create_interaction_plots(df, 'Filtration_rate')

This function generates all possible pairs of factors and creates a linear model plot for each pair. The plots illustrate how the relationship between one factor and the response variable changes across different levels of another factor (two-way interactions).

Here, we observe that the effect of temperature on the filtration rate depends strongly on the levels of stirring rate (RPM) and concentration of formaldehyde (CoF). For instance, in the TxRPM interaction plot, a change in temperature has practically no effect on the filtration rate when the stirring rate is low (blue line). However, when the stirring rate is high (orange line), the filtration rate increases markedly with an increase in temperature. Similarly, in the TxCoF interaction plot, temperature changes hardly affect the filtration rate at a high CoF level (orange line), but an increase in temperature clearly boosts the filtration rate when the CoF level is low (blue line). Whether these effects are statistically significant is a question for the ANOVA, not for the plots.
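The pattern seen in an interaction plot can be cross-checked with a table of cell means: one row per level of the first factor, one column per level of the second. The sketch below builds such a table with `pivot_table` and condenses it into a single interaction estimate, using the convention of halving the difference between the two conditional effects. Both helper names are hypothetical, two-level factors are assumed, and the demo values are made up.

```python
import pandas as pd


def interaction_table(data, y, factor_a, factor_b):
    """Mean response for every combination of factor_a and factor_b levels."""
    return data.pivot_table(values=y, index=factor_a, columns=factor_b, aggfunc="mean")


def interaction_effect(table):
    """Half the difference between the row-factor effect at high and low column level."""
    effect_at_low_b = table.iloc[-1, 0] - table.iloc[0, 0]
    effect_at_high_b = table.iloc[-1, -1] - table.iloc[0, -1]
    return (effect_at_high_b - effect_at_low_b) / 2


# Made-up 2^3 demo data, not the filtration dataset
demo = pd.DataFrame(
    {
        "T": [-1, 1, -1, 1, -1, 1, -1, 1],
        "P": [-1, -1, 1, 1, -1, -1, 1, 1],
        "RPM": [-1, -1, -1, -1, 1, 1, 1, 1],
        "Filtration_rate": [50.0, 51.0, 52.0, 49.0, 60.0, 88.0, 62.0, 92.0],
    }
)
table = interaction_table(demo, "Filtration_rate", "T", "RPM")
t_rpm_interaction = interaction_effect(table)
print(table)
print(t_rpm_interaction)
```

A value near zero corresponds to roughly parallel lines in the interaction plot. The further it is from zero, the more the lines diverge or cross.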

That's it. We'll use the insights gained from visualizing the results when performing the ANOVA, which we will cover in the next post. Feel free to adapt and expand the provided code to suit your specific needs. Happy experimenting!
