Create a full factorial design in DoE with Python

Introduction

One of the fundamental types of experimental designs is the full factorial design. In this blog post, we'll explore how to create two-level and multi-level full factorial designs using Python and the pyDOE2 package. Let's dive in.

What is a Full Factorial Design?

A full factorial design tests every possible combination of factor levels in an experiment. This makes it possible to estimate all main effects and all interactions between factors. The price is the number of runs, which is the product of the number of levels of each factor: k two-level factors need 2^k runs, so four factors need 16 runs and ten factors already need 1,024. In return, no potential interaction is overlooked.
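
To get a feel for how quickly the run count grows, here is a small sketch in plain Python. It does not use pyDOE2, and the helper name `count_runs` is made up for this illustration.

```python
from itertools import product
from math import prod


def count_runs(levels):
    """Return the number of runs in a full factorial design.

    `levels` lists how many levels each factor has, e.g. [2, 2, 2, 2].
    """
    return prod(levels)


# Four two-level factors: 2 * 2 * 2 * 2 runs
two_level_runs = count_runs([2, 2, 2, 2])

# Factors with 2, 3, 4 and 5 levels: 2 * 3 * 4 * 5 runs
mixed_runs = count_runs([2, 3, 4, 5])

# Enumerating every combination explicitly gives the same count
all_combinations = list(product(*(range(n) for n in [2, 3, 4, 5])))

print(two_level_runs, mixed_runs, len(all_combinations))
```

Adding one more two-level factor doubles the number of runs, which is why full factorial designs are usually reserved for a handful of factors.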

Why Use Python and pyDOE2?

Python is an open-source programming language with a large ecosystem of libraries. The pyDOE2 package is designed specifically for constructing experimental designs, which makes it well suited to building full factorial designs. Working in Python also keeps the workflow flexible and reproducible, and the design integrates easily with other Python libraries when it is time to analyze the results.

Setting Up the Environment

Before we dive into the code, make sure Python and the necessary packages are installed. The examples below use pyDOE2 and pandas; if you haven't installed them yet, use pip:

pip install pyDOE2 pandas

If you are unfamiliar with Python, have a look at the post "Some basics in Python before you start with DoE" first.

Creating the Full Factorial Design

We'll use the pyDOE2 package to create a full factorial design for a hypothetical experiment involving four factors: Temperature (T), Pressure (P), Concentration (C), and Stirring Rate (RPM).

Two-Level Full Factorial Design

For a simple two-level full factorial design, you can use the ff2n function from the pyDOE2 package. It takes the number of factors as input and returns the two-level design matrix, with the low and high level of each factor coded as -1 and +1. You can then convert this matrix into a DataFrame and add factor names for better readability.

# Import necessary packages
from pyDOE2 import ff2n
import pandas as pd

# Define the number of factors
num_factors = 4

# Generate the full factorial design matrix
design_matrix = ff2n(num_factors)

# Convert the design matrix to a DataFrame for better readability
factor_names = ['T', 'P', 'C', 'RPM']
df = pd.DataFrame(design_matrix, columns=factor_names)

# Display the design matrix
print("Design Matrix for the Full Factorial Design:")
print(df)

This will output a design matrix like:

      T    P    C  RPM
0  -1.0 -1.0 -1.0 -1.0
1   1.0 -1.0 -1.0 -1.0
2  -1.0  1.0 -1.0 -1.0
3   1.0  1.0 -1.0 -1.0
4  -1.0 -1.0  1.0 -1.0
..  ...  ...  ...  ...
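
The coded levels -1 and +1 still have to be translated into real settings before you go to the lab. The sketch below shows one way to do that in plain Python. The low/high values and the helper name `decode` are assumptions made up for this example, and the coded design is rebuilt with itertools so the snippet runs on its own.

```python
from itertools import product

# Hypothetical low/high settings for each factor (illustrative values only)
factor_ranges = {
    "T": (20.0, 80.0),      # temperature
    "P": (1.0, 5.0),        # pressure
    "C": (0.5, 1.5),        # concentration
    "RPM": (100.0, 500.0),  # stirring rate
}


def decode(coded, low, high):
    """Map a coded level in [-1, 1] to the physical range [low, high]."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


# Coded two-level design with the first factor varying fastest,
# which is the row order of the design matrix shown above
coded_design = [row[::-1] for row in product([-1, 1], repeat=len(factor_ranges))]

# Translate every coded row into physical settings
actual_design = [
    {name: decode(level, *factor_ranges[name]) for name, level in zip(factor_ranges, row)}
    for row in coded_design
]

# Show the first three runs in physical units
for run in actual_design[:3]:
    print(run)
```

Keeping the coded matrix for the analysis and the decoded values for the lab sheet is a common split, since coded units put all factors on the same scale.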

Multi-Level Full Factorial Design

If your factors have more than two levels, or different factors have different numbers of levels, you can use the fullfact function. It takes a list of integers giving the number of levels for each factor and codes the levels of each factor as 0, 1, 2, and so on.

from pyDOE2 import fullfact
import pandas as pd

# Define the number of levels for each factor
levels = [2, 3, 4, 5]

# Generate the full factorial design matrix
design_matrix = fullfact(levels)

# Convert the design matrix to a DataFrame for better readability
factor_names = ['T', 'P', 'C', 'RPM']
df = pd.DataFrame(design_matrix, columns=factor_names)

# Display the design matrix
print("Design Matrix for the Full Factorial Design:")
print(df)

This will output a design matrix with 120 rows, representing all combinations of the factor levels:

       T    P    C  RPM
0    0.0  0.0  0.0  0.0
1    1.0  0.0  0.0  0.0
2    0.0  1.0  0.0  0.0
3    1.0  1.0  0.0  0.0
4    0.0  2.0  0.0  0.0
..   ...  ...  ...  ...
115  1.0  0.0  3.0  4.0
116  0.0  1.0  3.0  4.0
117  1.0  1.0  3.0  4.0
118  0.0  2.0  3.0  4.0
119  1.0  2.0  3.0  4.0

[120 rows x 4 columns]
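
The values in this matrix are level indices, not physical settings. A minimal sketch of how the indices could be turned into real values is shown below. The settings are hypothetical and chosen only to match the level counts [2, 3, 4, 5], and the index design is rebuilt with itertools so the snippet is self-contained.

```python
from itertools import product

# Hypothetical settings per factor; list lengths match levels = [2, 3, 4, 5]
settings = {
    "T": [25, 50],
    "P": [1, 2, 3],
    "C": [0.1, 0.2, 0.3, 0.4],
    "RPM": [100, 200, 300, 400, 500],
}

# Level indices with the first factor varying fastest, as in the output above
index_ranges = [range(len(values)) for values in reversed(list(settings.values()))]
index_design = [row[::-1] for row in product(*index_ranges)]

# Replace each index by the corresponding physical setting
actual_design = [
    {name: settings[name][i] for name, i in zip(settings, row)}
    for row in index_design
]

print(len(actual_design))
print(actual_design[0])
print(actual_design[-1])
```

With the DataFrame from the example above, the same lookup could be applied column by column.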

Randomization and Replication

To improve the reliability of our experimental results, we can introduce randomization and replication. Randomizing the order of experimental runs helps mitigate the impact of uncontrolled variables, while replication allows us to estimate experimental error.

# Number of replications
num_replications = 3

# Replicate the design first so that every run appears num_replications times
df_replicated = pd.concat([df] * num_replications, ignore_index=True)

# Then randomize the order of all runs, replicates included
# (random_state fixes the seed so the run order is reproducible)
df_replicated = df_replicated.sample(frac=1, random_state=42).reset_index(drop=True)

# Display the replicated and randomized design matrix
print("Replicated and Randomized Design Matrix:")
print(df_replicated)
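
Whatever order the runs end up in, every factor combination should still appear once per replicate. A quick sanity check along those lines is sketched below in plain Python. It uses a small two-factor stand-in design and a fixed seed, both of which are assumptions for this illustration.

```python
import random
from collections import Counter
from itertools import product

num_replications = 3

# A small two-factor, two-level design as stand-in data
base_design = list(product([-1, 1], repeat=2))

# Replicate the design, then shuffle the full run list with a fixed seed
run_order = base_design * num_replications
random.Random(42).shuffle(run_order)

# Each factor combination should occur exactly num_replications times
counts = Counter(run_order)
is_balanced = all(counts[run] == num_replications for run in base_design)

print(is_balanced)
```

Shuffling after replicating spreads the repeats of a combination across the whole run sequence instead of repeating one fixed order for each replicate.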

Saving the Design Matrix

For practical purposes, you may want to save the design matrix to an Excel file. Note that pandas needs the openpyxl package to write .xlsx files (pip install openpyxl).

# Save the design matrix to an Excel file
df_replicated.to_excel('full_factorial_design.xlsx', index=False)

That's it. Done! Happy experimenting.
