Create a Full Factorial Design in Python

Professional design of experiments (DOE) software can cost thousands of dollars, money that students and startups often don’t have. Fortunately, Python offers a free alternative: the pyDOE3 package lets you create professional experiment designs with just a few lines of code.

In this tutorial, we’ll walk through creating a full factorial design step by step. By the end, you’ll have a complete, randomized experiment plan ready for the lab.

Step 1: Installation

First, you’ll need to install three libraries: pyDOE3 generates the design matrix, pandas organizes it into a user-friendly table (a DataFrame), and NumPy supplies the array type that both of them build on.

Open your terminal or command prompt and run this command:

pip install pydoe3 pandas numpy
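If you want to confirm the installation worked, a small script like the one below checks whether Python can find each library without actually importing it. Keep in mind that import names are case-sensitive: the package installed as pydoe3 is imported as pyDOE3, the spelling used in the from pyDOE3 import ff2n line in Step 2. The check_installed helper is just an illustration, not part of any library.

```python
import importlib.util

# Import names are case-sensitive: "pip install pydoe3" provides the module "pyDOE3".
REQUIRED_MODULES = ["pyDOE3", "pandas", "numpy"]


def check_installed(modules):
    """Return {module_name: bool} saying whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_installed(REQUIRED_MODULES)
for name, found in status.items():
    print(f"{name:<8} {'OK' if found else 'MISSING (pip install ' + name.lower() + ')'}")
```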

Step 2: Generate a Basic 2-Level Factorial Design

A 2^k full factorial design is an experiment where you have k factors, each tested at two levels (a “low” and a “high” setting). This design tests every single combination of these settings.
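To make "every single combination" concrete, here is a standard-library sketch (no pyDOE3 needed) that enumerates all 2^k coded runs with itertools.product. The helper name two_level_runs is hypothetical. Each tuple is reversed so that the first factor changes fastest, which gives the same row order that ff2n() produces below.

```python
from itertools import product


def two_level_runs(k):
    """List every combination of -1 (low) and +1 (high) for k factors.

    itertools.product varies its last position fastest, so each tuple is
    reversed to make the first factor change fastest instead.
    """
    return [combo[::-1] for combo in product((-1, 1), repeat=k)]


runs = two_level_runs(3)
print(len(runs))  # 8, i.e. 2**3
for run in runs:
    print(run)
```

Each extra factor doubles the run count (2, 4, 8, 16, ...), which is why full factorials get expensive quickly as k grows.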

Let’s create a design for 3 factors (a 2³ = 8-run experiment). The pyDOE3 library makes this a one-liner with the ff2n() function.

from pyDOE3 import ff2n
import pandas as pd

# Generate the design for 3 factors
design_matrix = ff2n(3)

print(design_matrix)

What you’ll see:
[[-1. -1. -1.]
 [ 1. -1. -1.]
 [-1.  1. -1.]
 [ 1.  1. -1.]
 [-1. -1.  1.]
 [ 1. -1.  1.]
 [-1.  1.  1.]
 [ 1.  1.  1.]]

This NumPy array is the core of our design. Each row is an experimental run, and each column is a factor. The values -1 and +1 are coded units that represent the low and high levels for each factor, respectively.
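A useful property of this coded form: in a full factorial, each column holds the same number of -1s and +1s (it is balanced), and every pair of columns is orthogonal, which is what lets each factor’s main effect be estimated independently of the others. The sketch below checks both properties with NumPy on the 2³ design, rebuilt with itertools.product so it runs without pyDOE3.

```python
from itertools import product

import numpy as np

# Rebuild the 2^3 coded design (same rows and order as ff2n(3)).
X = np.array([combo[::-1] for combo in product((-1, 1), repeat=3)], dtype=float)

# Balance: each column sums to zero, i.e. four lows and four highs per factor.
print(X.sum(axis=0))

# Orthogonality: X^T X is diagonal (8 on the diagonal, 0 elsewhere).
print(X.T @ X)
```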

While correct, this array isn’t very descriptive. Let’s use pandas to make it a proper table.

# Convert the array into a pandas DataFrame for better readability
plan = pd.DataFrame(design_matrix, columns=['Factor A', 'Factor B', 'Factor C'])

print(plan)

Now you get a much clearer table:

   Factor A  Factor B  Factor C
0      -1.0      -1.0      -1.0
1       1.0      -1.0      -1.0
2      -1.0       1.0      -1.0
3       1.0       1.0      -1.0
4      -1.0      -1.0       1.0
5       1.0      -1.0       1.0
6      -1.0       1.0       1.0
7       1.0       1.0       1.0

This is a great start! We have a complete factorial plan. But “Factor A” and “-1” aren’t useful in the lab. Next, we’ll map these to real-world values.

Step 3: Map Coded Values to Real-World Units

Let’s imagine our experiment involves testing a chemical reaction. Our factors are:

  • Temperature: Low = 60°C, High = 90°C
  • Time: Low = 30 min, High = 60 min
  • Catalyst: Low = 0.10%, High = 0.30%

The best way to manage this is with a Python dictionary. This keeps your factor names and levels organized in one place.

# Define your factors and their low/high levels
factors = {
    "Temp_C":       (60, 90),
    "Time_min":     (30, 60),
    "Catalyst_pct": (0.10, 0.30)
}

# Generate the design using the number of factors
k = len(factors)
design_matrix = ff2n(k)

# Create the DataFrame with proper column names from our dictionary
plan = pd.DataFrame(design_matrix, columns=factors.keys())

# Now, map the -1 and +1 values to the real units
for factor, (low, high) in factors.items():
    plan[factor] = plan[factor].map({-1: low, 1: high})

print(plan)

Explanation:

  1. We created a factors dictionary where each key is the factor name (a string) and the value is a tuple (low, high).
  2. We use factors.keys() to automatically and correctly name the columns of our DataFrame.
  3. The for loop iterates through our dictionary. For each factor (e.g., “Temp_C”), it uses the .map() method to replace every -1 with its low value (60) and every +1 with its high value (90).

The result is a practical, easy-to-read plan:

   Temp_C  Time_min  Catalyst_pct
0      60        30           0.1
1      90        30           0.1
2      60        60           0.1
3      90        60           0.1
4      60        30           0.3
5      90        30           0.3
6      60        60           0.3
7      90        60           0.3
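The .map() lookup works because the design contains only -1 and +1. A more general option is the linear transform real = center + coded × half_range, where center = (low + high) / 2 and half_range = (high − low) / 2. It gives the same values for ±1 and also handles coded 0, the center point used in Step 4. The decode and encode helpers below are a hypothetical sketch, not part of pyDOE3.

```python
import numpy as np


def decode(coded, low, high):
    """Convert coded levels (-1 = low, 0 = center, +1 = high) to real units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + np.asarray(coded, dtype=float) * half_range


def encode(real, low, high):
    """Inverse of decode(): convert real units back to coded units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (np.asarray(real, dtype=float) - center) / half_range


print(decode([-1, 0, 1], 60, 90))    # [60. 75. 90.]
print(encode([60, 75, 90], 60, 90))  # [-1.  0.  1.]
```

Inside the Step 3 loop you could write plan[factor] = decode(plan[factor], low, high) instead of the .map() call; the values then come out as floats (60.0 rather than 60).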

Step 4: Finalize Your Plan with Best Practices

Our design is logically complete, but a real-world experiment requires a few more touches for statistical robustness. For a deep dive, check out our Principles of DoE: Randomization, Replication, Blocking article.

Add Replicates and Center Points (Optional)

  • Replicates: Repeating the entire set of factorial runs helps improve the precision of your results and provides a better estimate of experimental error.
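  • Center Points: These are runs conducted at the middle level of all factors (e.g., 75°C, 45 min, 0.20% catalyst). Comparing their average response with the average of the factorial runs reveals curvature (non-linear effects) in your system, and their spread gives an estimate of experimental error without replicating every factorial run.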
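Here is roughly how center points are used once results are in: compare the mean response of the factorial runs with the mean of the center runs, and use the spread of the center replicates as an estimate of pure error. This is the standard single-degree-of-freedom curvature test for 2-level designs. The response values below are made up purely for illustration, and curvature_check is a hypothetical helper, not a pyDOE3 function.

```python
import statistics


def curvature_check(factorial_y, center_y):
    """Single-degree-of-freedom curvature check for a 2-level design with center points.

    Returns (difference of means, curvature sum of squares, F ratio).
    Compare the F ratio against an F(1, len(center_y) - 1) critical value.
    """
    n_f, n_c = len(factorial_y), len(center_y)
    diff = statistics.fmean(factorial_y) - statistics.fmean(center_y)
    ss_curvature = n_f * n_c * diff**2 / (n_f + n_c)
    s2_pure_error = statistics.variance(center_y)  # needs at least 2 center points
    return diff, ss_curvature, ss_curvature / s2_pure_error


# Hypothetical responses: 8 factorial runs and 3 center runs
factorial_y = [52, 61, 55, 66, 54, 63, 57, 68]
center_y = [64, 65, 63]

diff, ss, f_ratio = curvature_check(factorial_y, center_y)
print(f"mean(factorial) - mean(center) = {diff:.2f}")
print(f"SS_curvature = {ss:.2f}, F = {f_ratio:.2f}")
```

A large F ratio suggests the response bends between the low and high settings, so a purely linear model may not be enough.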

Let’s build a new plan with 2 replicates of the factorial points and 3 center points.

# We'll use the 'plan' DataFrame from the previous step as our base
base_plan = plan.copy()

# 1. Add replicates
num_replicates = 2
replicated_plan = pd.concat([base_plan] * num_replicates, ignore_index=True)
replicated_plan["Type"] = "Factorial" # Add a tag for clarity

# 2. Add center points
num_center_points = 3
center_point = {factor: (low + high) / 2 for factor, (low, high) in factors.items()}
center_points_df = pd.DataFrame([center_point] * num_center_points)
center_points_df["Type"] = "Center" # Add a tag

# 3. Combine them
final_plan = pd.concat([replicated_plan, center_points_df], ignore_index=True)

Our final_plan now contains 2 * 8 = 16 factorial runs and 3 center point runs, for a total of 19 runs. The last step is crucial: randomization.
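Because the two replicates are identical copies, you can no longer tell them apart once the plan is shuffled. If you want to track them (for example, to check whether the second pass drifted), one option is to add a Replicate column before concatenating. The sketch below uses a small stand-in for base_plan so it runs on its own; with the real plan you would build replicated_plan the same way.

```python
import pandas as pd

# Small stand-in for the 8-run base_plan from Step 3 (two factors, four runs)
base_plan = pd.DataFrame({"Temp_C": [60, 90, 60, 90], "Time_min": [30, 30, 60, 60]})

num_replicates = 2
replicated_plan = pd.concat(
    # assign() returns a copy with an extra column, leaving base_plan untouched
    [base_plan.assign(Replicate=r) for r in range(1, num_replicates + 1)],
    ignore_index=True,
)
replicated_plan["Type"] = "Factorial"
print(replicated_plan)
```

When this replicated_plan is later concatenated with center_points_df, the center rows simply get a missing (NaN) Replicate value.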

Randomize the Run Order

Running experiments in a structured order (like the one in our table) is risky. Uncontrolled variables (like ambient temperature changing during the day or equipment wearing out) can systematically bias your results.

Randomization is your best defense against this bias. It mixes up the run order so that the influence of any lurking variables is spread randomly across the entire experiment.

# Randomize the entire plan
# `random_state` makes the shuffle reproducible. Anyone with this number gets the same order.
randomized_plan = final_plan.sample(frac=1, random_state=42).reset_index(drop=True)

# Add a "Run" column for easy tracking
randomized_plan.index += 1
randomized_plan.index.name = "Run"

print(randomized_plan)

Your final, randomized plan will look something like this (first four of 19 runs shown):

Run  Temp_C  Time_min  Catalyst_pct  Type
1      90.0      30.0           0.1  Factorial
2      75.0      45.0           0.2  Center
3      60.0      60.0           0.3  Factorial
4      90.0      60.0           0.1  Factorial
...
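DataFrame.sample(frac=1) returns every row exactly once in a random order, and fixing random_state makes that order repeatable. The quick check below, on a small hypothetical four-run plan, shows both properties: the same seed gives the same order, and the shuffle neither adds nor drops runs.

```python
import pandas as pd

plan = pd.DataFrame({"Temp_C": [60, 90, 60, 90], "Time_min": [30, 30, 60, 60]})

order_a = plan.sample(frac=1, random_state=42).index.tolist()
order_b = plan.sample(frac=1, random_state=42).index.tolist()

print(order_a == order_b)               # True: same seed, same run order
print(sorted(order_a) == [0, 1, 2, 3])  # True: a permutation of the original rows
```

Record the seed in your lab notebook; pick a different one (or omit random_state) when you want a fresh order for a new experiment.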

Step 5: Export the Plan to CSV

Finally, save your plan to a CSV file. It opens directly in Excel and can be imported into most lab software.

# Export the plan to a CSV file; the "Run" index is written as the first column
randomized_plan.to_csv("full_factorial_plan.csv")

That’s it! You now have a complete, randomized, and ready-to-execute experimental plan in a file named full_factorial_plan.csv.
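Because the Run numbers live in the DataFrame’s index, it’s worth checking that they survive a save-and-load round trip. The sketch below writes a tiny two-run stand-in plan to a temporary CSV file and reads it back with index_col="Run".

```python
import os
import tempfile

import pandas as pd

# Two-run stand-in for randomized_plan, indexed by "Run" as in Step 4
plan = pd.DataFrame(
    {"Temp_C": [90.0, 75.0], "Type": ["Factorial", "Center"]},
    index=pd.Index([1, 2], name="Run"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "full_factorial_plan.csv")
    plan.to_csv(path)                            # index written as a "Run" column
    loaded = pd.read_csv(path, index_col="Run")  # restore it as the index

print(loaded)
```

If you specifically need an .xlsx file, DataFrame.to_excel() also works, but it requires an Excel writer engine such as openpyxl (pip install openpyxl).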

Bonus: Mixed-Level Full Factorials

What if one of your factors has 3 levels? Or you have a mix of 2, 3, and 4-level factors? The ff2n() function won’t work here, but pyDOE3 provides a more general function: fullfact().

Let’s design an experiment with:

  • Resin Type: 2 levels (“A”, “B”)
  • Oven Temp: 3 levels (70, 85, 100°C)
  • Pigment %: 4 levels (0, 5, 10, 15%)

This is a 2 × 3 × 4 = 24-run design.
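Where does 24 come from, and in what order will the runs appear? The standard-library sketch below (a hypothetical helper, not pyDOE3’s implementation) enumerates every combination of 0-based level indices with the first factor changing fastest, which reproduces the first rows that fullfact() prints below.

```python
from itertools import product
from math import prod


def full_factorial_indices(levels):
    """All combinations of 0-based level indices, first factor changing fastest.

    product() varies its last argument fastest, so the factors are passed in
    reverse order and each resulting tuple is flipped back.
    """
    ranges = [range(n) for n in reversed(levels)]
    return [combo[::-1] for combo in product(*ranges)]


levels = [2, 3, 4]
runs = full_factorial_indices(levels)
print(len(runs), prod(levels))  # 24 24
print(runs[:5])  # [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 2, 0)]
```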

from pyDOE3 import fullfact
import pandas as pd

# Define the number of levels for each factor
levels = [2, 3, 4]
design_matrix = fullfact(levels)

# Note: fullfact() returns 0-indexed values (0, 1, 2...)
print(design_matrix[:5]) # Print first 5 rows
# [[0. 0. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [1. 1. 0.]
#  [0. 2. 0.]]

# Define the actual values for each level
factor_levels = {
    "Resin": ["A", "B"],
    "Oven_C": [70, 85, 100],
    "Pigment_pct": [0, 5, 10, 15]
}

# Create the DataFrame
plan_ml = pd.DataFrame(design_matrix, columns=factor_levels.keys())

# Map the indices to the real values
for col, values in factor_levels.items():
    plan_ml[col] = plan_ml[col].map(lambda idx: values[int(idx)])

# Randomize and save
plan_ml = plan_ml.sample(frac=1, random_state=42).reset_index(drop=True)
plan_ml.index += 1
plan_ml.index.name = "Run"
plan_ml.to_csv("mixed_level_plan.csv")

print(plan_ml.head())

This process gives you the flexibility to design for nearly any combination of factors and levels.
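If you’d rather skip the index-to-value mapping entirely, pandas can build the same 24 combinations straight from the level values with pd.MultiIndex.from_product. This is an alternative to fullfact(), not something pyDOE3 does: the row order differs (the last factor changes fastest here), which stops mattering once the plan is randomized.

```python
import pandas as pd

factor_levels = {
    "Resin": ["A", "B"],
    "Oven_C": [70, 85, 100],
    "Pigment_pct": [0, 5, 10, 15],
}

# Every combination of the level values, one row per run
plan_ml = (
    pd.MultiIndex.from_product(list(factor_levels.values()), names=list(factor_levels))
    .to_frame(index=False)
)

# Randomize and number the runs, as in the fullfact() version
plan_ml = plan_ml.sample(frac=1, random_state=42).reset_index(drop=True)
plan_ml.index += 1
plan_ml.index.name = "Run"
print(plan_ml.head())
```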

Quick Reference: Copy-Paste Template

Here’s a minimal template you can adapt for your own experiments:

from pyDOE3 import ff2n
import pandas as pd

# Edit these factor definitions for your experiment: name -> (low, high)
# The numbers below are placeholders; replace them with your own levels.
factors = {
    "Factor_A": (10, 20),
    "Factor_B": (1.0, 2.0),
    "Factor_C": (100, 200)
}

# Create and convert design
design = ff2n(len(factors))
df = pd.DataFrame(design, columns=factors.keys())
for factor, (low, high) in factors.items():
    df[factor] = df[factor].map({-1: low, +1: high})

# Randomize and add run numbers
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
df.index += 1
df.index.name = "Run"

# Add result column and save
df["Result"] = ""
df.to_csv("my_experiment.csv")
print(df)

Just change the factor definitions and you’re ready to go!
