Causal Inference Methods for Policy Evaluation

6-Difference in Differences

Jacopo Mazza

Utrecht School of Economics

2026

The politics of Minimum Wages

Minimum Wages around the World

The Minimum Wage Debate

Minimum wages are a hotly debated topic in economics and politics.

Proponents argue that minimum wages are necessary to protect workers from exploitation and poverty.
Opponents argue that minimum wages lead to unemployment and reduce the competitiveness of firms.
Labour economists have been studying the effects of minimum wages for decades.
Their answer:

How would you answer this question with data?

Do you see any complication?

Minimum Wage in New Jersey

The New Jersey Minimum Wage Experiment

In 1992, New Jersey increased the minimum wage from $4.25 to $5.05 per hour.
The minimum wage in neighboring Pennsylvania remained constant.
Card and Krueger (1994) used this natural experiment to study the effects of minimum wages on employment.

The idea

Compare the employment in fast-food restaurants in New Jersey and Pennsylvania before and after the minimum wage increase.

Why Pennsylvania?
- Nearby state, so likely exposed to the same unobservable shocks as New Jersey.
- Did not increase its minimum wage over the period.

Why Fast-food restaurants?
- Fast-food restaurants are major employers of low-wage workers.

Why not just compare the employment in New Jersey before and after the minimum wage increase?

Why not just compare the employment in New Jersey to that in Pennsylvania after the minimum wage increase?

Difference in Differences: The Idea

Find a control group that did not experience the treatment.
Compare the difference in the outcome variable between the treatment and control group before the treatment.
Compare the difference in the outcome variable between the treatment and control group after the treatment.
The difference between these two differences is the DiD estimate of the treatment effect.
Equivalently, the treatment effect is estimated as the difference between the two groups' changes over time.

Strengths and Requirements

Strengths:

Simple and intuitive.
Treated and control units can be systematically different.

Requirements:

Observe the same units before and after the treatment.

Key Assumption:

The treatment and control groups would have followed the same trend in the absence of the treatment.

Card and Krueger’s DiD Estimates

Outcome Variable: $Y$ = FTE (Full-Time Equivalent employment).
FTE decreased in Pennsylvania (−2.16) and increased slightly in New Jersey (0.59).
The difference in differences is 2.75.
This result speaks against the conventional wisdom that minimum wages reduce employment.

The DiD Estimator

Difference in Differences can be estimated in two ways:

Simple DiD: The change in the average outcome of the treatment group (after minus before) minus the change in the average outcome of the control group.
Regression DiD: A linear regression model that includes a treatment indicator, a time indicator, and their interaction.

Simple DiD

With one treatment group, one control group, both observed twice, the simple DiD estimator is:

\[ \hat{\delta}^{2\times 2}_{kU}=(\bar{y}^{post(k)}_k-\bar{y}^{pre(k)}_k)-(\bar{y}^{post(U)}_U-\bar{y}^{pre(U)}_U) \]

Where:

$k$ is the treatment group.
$U$ is the control group.

Rewriting as conditional expectations:

\[\begin{aligned} \hat{\delta}^{2\times 2}_{kU} &= (E[Y_{k}|Post]-E[Y_{k}|Pre]) \\ &- (E[Y_{U}|Post]-E[Y_{U}|Pre]) \end{aligned}\]

By the switching equation (and adding 0):

\[\begin{aligned} \hat{\delta}^{2\times 2}_{kU} &= (E[Y^1_{k}|Post]-E[Y^0_{k}|Pre])-(E[Y^0_{U}|Post]-E[Y^0_{U}|Pre]) \\ &+ E[Y_k^0|Post] - E[Y_k^0|Post] \end{aligned}\]

Rearrange to get conditional expected potential outcomes:

\[\begin{aligned} \hat{\delta}^{2\times 2}_{kU} &= E[Y^1_{k}|Post]-E[Y^0_{k}|Post] \\ &+ (E[Y^0_{k}|Post]-E[Y^0_{k}|Pre]) \\ &- (E[Y_U^0|Post] - E[Y_U^0|Pre]) \end{aligned}\]

This two-group, two-period setup is also called the 2×2 DiD design.

What Does Simple DiD Identify?

Rearranging reveals that the estimator equals:

\[\begin{aligned} \hat{\delta}^{2\times 2}_{kU} &= \underbrace{E[Y^1_{k}|Post]-E[Y^0_{k}|Post]}_{\text{ATT}} \\[6pt] &+ \underbrace{\bigl(E[Y^0_{k}|Post]-E[Y^0_{k}|Pre]\bigr) - \bigl(E[Y^0_{U}|Post]-E[Y^0_{U}|Pre]\bigr)}_{\text{Bias} = 0 \iff \text{Parallel Trends holds}} \end{aligned}\]

Parallel Trends is the assumption that the bias term equals zero.
Card-Krueger: $(21.03 - 20.44) - (21.17 - 23.33) = 0.59 - (-2.16) = \mathbf{2.75}$
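
As a numerical check on the decomposition above, here is a minimal sketch using made-up potential-outcome means (not Card and Krueger's data). The treated group's untreated trend is deliberately steeper than the control group's, so parallel trends fails and the bias term is non-zero. All variable names and numbers are hypothetical.

```r
# Hypothetical population means of potential outcomes (illustrative numbers only)
y0_k_pre  <- 10   # treated group k, untreated outcome, pre-period
y0_k_post <- 13   # treated group k, untreated outcome, post-period (counterfactual)
y0_u_pre  <- 8    # control group U, pre-period
y0_u_post <- 10   # control group U, post-period
att       <- 5    # assumed true effect on the treated in the post-period

# Observed means: only group k in the post-period reveals Y^1
y_k_pre  <- y0_k_pre
y_k_post <- y0_k_post + att
y_u_pre  <- y0_u_pre
y_u_post <- y0_u_post

# Simple 2x2 DiD from observed means
did <- (y_k_post - y_k_pre) - (y_u_post - y_u_pre)

# Bias term: difference in untreated trends between k and U
bias <- (y0_k_post - y0_k_pre) - (y0_u_post - y0_u_pre)

cat("DiD estimate:", did, "\n")
cat("ATT + bias:  ", att + bias, "\n")
```

Setting `y0_k_post <- 12` makes the two untreated trends equal, and the DiD estimate then coincides with the assumed ATT.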

Regression DiD

Difference in Differences can also be estimated using a linear regression model.

The goal is to control for group differences and time differences:

\[ Y_{it} = \alpha + \gamma D_i + \lambda POST_t + \delta (D_i \times POST_t) + \varepsilon_{it} \]

Where:

$Y_{it}$ is the outcome variable.
$D_i$ is a treatment-group indicator (1 for units in the treatment group).
$POST_t$ is a post-treatment time indicator.
$\delta$ is the DiD coefficient; under parallel trends it identifies the ATT.

Regression DiD Visually

OLS Specification of the DiD Equation

In the MW example the regression DiD model would be:

$Y_{ist} = \alpha + \gamma NJ_s + \lambda d_t + \delta(NJ \times d)_{st} + \varepsilon_{ist}$, for restaurant $i$ in state $s$ at time $t$.
Expected employment in each cell:
- PA pre-treatment: $\alpha$.
- PA post-treatment: $\alpha + \lambda$.
- NJ pre-treatment: $\alpha + \gamma$.
- NJ post-treatment: $\alpha + \gamma + \lambda + \delta$.
DiD = (NJ Post - NJ Pre) - (PA Post - PA Pre) = $(\lambda + \delta) - \lambda = \delta$
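
The equivalence between the interaction coefficient and the difference of cell means can be checked on simulated data. The sketch below is illustrative only: the data are generated, not Card and Krueger's, and the column names (`store`, `nj`, `post`) and parameter values are assumptions.

```r
set.seed(1)  # arbitrary seed; the data below are simulated

# Hypothetical 2x2 setup: 50 stores per state, each observed before and after
# (store is just an index within state)
n  <- 50
df <- expand.grid(store = 1:n, nj = c(0, 1), post = c(0, 1))

# Simulated outcome with alpha = 20, gamma = -1, lambda = -2, delta = 2.5 (all made up)
df$y <- 20 - 1 * df$nj - 2 * df$post + 2.5 * df$nj * df$post + rnorm(nrow(df))

# Regression DiD: the coefficient on nj:post is the DiD estimate
fit <- lm(y ~ nj * post, data = df)
delta_hat <- unname(coef(fit)["nj:post"])

# Simple DiD from the four cell means
m <- tapply(df$y, list(nj = df$nj, post = df$post), mean)
did_means <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

print(c(regression = delta_hat, cell_means = unname(did_means)))
```

Because the model is saturated in the two dummies, the two numbers agree up to floating-point error.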

The Two-Way Fixed Effects (TWFE) in R

library(tidyverse); library(modelsummary); library(fixest)
od <- causaldata::organ_donations

# Treatment variable
od <- od %>%
     mutate(Treated = State == 'California' & 
            Quarter %in% c('Q32011','Q42011','Q12012'))

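# Cluster standard errors by state explicitly
# rather than relying on the package default
clfe <- feols(Rate ~ Treated | State + Quarter,
           data = od, cluster = ~State)
# Treated is logical, so its coefficient is named "TreatedTRUE"
msummary(clfe, stars = c('*' = .1, '**' = .05, '***' = .01),
         coef_map = c("TreatedTRUE" = "Treatment"),
         gof_omit = "R2|AIC|BIC|RMSE|Std.Errors")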

Kessler and Roth (2014) study the effect of the introduction of an “active choice” policy in California on organ donations.
The policy was introduced in Q3 2011.
The outcome variable is the organ donation rate.
The treatment variable is a dummy for California in Q3 2011, Q4 2011, and Q1 2012.
The feols function from the fixest package estimates the DiD model with two-way fixed effects.

Effect of the “Active Choice” Policy on Organ Donations

	Organ Donation Rate
Treatment	-0.022***
	(0.006)
Num.Obs.	162
FE: State	X
FE: Quarter	X

Standard errors clustered at the state level. * p<0.1, ** p<0.05, *** p<0.01

DiD with Multiple Time Periods and Groups

The DiD estimator can be extended to multiple time periods and groups.
Advantages:
- More variation in the treatment and control groups.
- More variation over time.

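\[Y_{it} = \alpha + \sum_{k=2}^{I} \gamma_k D_{ki} + \sum_{j=2}^{T}\lambda_j P_{jt} + \delta_{DD} (D_i \times P_t) + \varepsilon_{it}\]

Here $D_{ki}$ and $P_{jt}$ are group and period dummies, with one of each omitted as the reference category, and $(D_i \times P_t)$ equals 1 when unit $i$ is treated in period $t$.
We are adding fixed effects for the treatment and control groups and for the time periods.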
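
A minimal sketch of the dummy-variable specification above, using base R's `lm()` with `factor()` terms on simulated data. The panel dimensions, effect size, and variable names are assumptions made for illustration.

```r
set.seed(2)  # arbitrary; simulated data for illustration only

# Hypothetical balanced panel: 6 units (3 treated), 5 periods, treatment from period 4
panel <- expand.grid(unit = 1:6, period = 1:5)
panel$d     <- as.numeric(panel$unit <= 3)      # treated-group indicator
panel$post  <- as.numeric(panel$period >= 4)    # post-treatment indicator
panel$treat <- panel$d * panel$post             # D_i x P_t

# Outcome: unit effects + period effects + treatment effect of 2 + noise (all made up)
panel$y <- 0.5 * panel$unit + 0.3 * panel$period + 2 * panel$treat +
  rnorm(nrow(panel), sd = 0.1)

# TWFE by dummy variables: factor() creates one dummy per unit and per period,
# dropping one reference level of each
twfe <- lm(y ~ treat + factor(unit) + factor(period), data = panel)
delta_dd <- unname(coef(twfe)["treat"])

# With a single treatment date and a balanced panel this matches the
# DiD of pre/post group averages
avg <- tapply(panel$y, list(panel$d, panel$post), mean)
did_avg <- (avg["1", "1"] - avg["1", "0"]) - (avg["0", "1"] - avg["0", "0"])

print(c(twfe = delta_dd, did_of_means = unname(did_avg)))
```

The agreement between the two numbers relies on the balanced panel and the single treatment date; it does not carry over to staggered adoption.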

Standard Errors in DiD

Error terms are often serially correlated within units over time.
- Ignoring this leads to biased (typically too small) standard errors.
Remedy: cluster standard errors at the level of the units (e.g., states, firms, individuals).
Influential paper: Bertrand, Duflo & Mullainathan (2004, QJE).
- They showed that many papers reported standard errors that were too small because they neglected serial correlation.
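
To see what clustering does mechanically, the sketch below simulates a panel with AR(1) errors within units and compares conventional standard errors with a hand-computed cluster-robust sandwich. It is an illustration under assumed parameters (40 units, 10 periods, autocorrelation 0.8) and uses the plain CR0 formula without small-sample corrections, so the numbers will differ somewhat from what packages report.

```r
set.seed(3)  # arbitrary; simulated data for illustration only

# Hypothetical panel: 40 units (half treated), 10 periods, treatment from period 6
n_units <- 40; n_periods <- 10
d <- expand.grid(period = 1:n_periods, unit = 1:n_units)  # rows ordered unit by unit
d$treated <- as.numeric(d$unit <= n_units / 2)
d$post    <- as.numeric(d$period >= 6)

# AR(1) errors within each unit (coefficient 0.8) create serial correlation
d$e <- unlist(lapply(1:n_units, function(i) {
  as.numeric(arima.sim(list(ar = 0.8), n = n_periods))
}))
d$y <- 1 + 0.5 * d$treated + 0.2 * d$post + 1 * d$treated * d$post + d$e

fit <- lm(y ~ treated * post, data = d)
X <- model.matrix(fit)
u <- resid(fit)

# Conventional (iid) standard errors
se_iid <- sqrt(diag(vcov(fit)))

# Cluster-robust sandwich (CR0): (X'X)^-1 [ sum_g X_g' u_g u_g' X_g ] (X'X)^-1,
# with clusters g = units
bread  <- solve(crossprod(X))
scores <- rowsum(X * u, group = d$unit)   # one row per unit: X_g' u_g
meat   <- crossprod(scores)
se_cluster <- sqrt(diag(bread %*% meat %*% bread))

print(round(cbind(iid = se_iid, clustered = se_cluster), 3))
```

With positively autocorrelated errors like these, the clustered standard error on `treated:post` will typically exceed the conventional one.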

Treatment Effects in DiD

For whom are the treatment effects estimated?

In the 2×2 case: the ATT — effect on the treated group in the post period.
With multiple periods and staggered adoption, TWFE estimates a weighted average of group-time ATTs — weights can be negative with heterogeneous treatment effects (Goodman-Bacon decomposition).
Preferred solution: Callaway & Sant’Anna (2021) — estimate each group-time ATT separately, then aggregate.
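
The idea of estimating each group-time ATT separately can be sketched by hand on noise-free simulated data. This is a simplified illustration, not the authors' `did` package or their full estimator: it uses never-treated units as the comparison group, period $g-1$ as the base period, no covariates, and no inference. The function name `att_gt_by_hand`, the cohort coding (0 = never treated), and all numbers are assumptions.

```r
# Hypothetical staggered design, no noise: cohorts first treated in period 3,
# period 5, or never (coded 0 here as a labeling convention)
dat <- expand.grid(unit = 1:9, period = 1:6)
dat$cohort <- c(3, 3, 3, 5, 5, 5, 0, 0, 0)[dat$unit]

# Untreated outcome: unit effect + common trend (parallel trends holds by construction)
y0 <- dat$unit + 2 * dat$period

# Heterogeneous effects: cohort 3's effect grows over time, cohort 5's is a constant 4
tau <- ifelse(dat$cohort == 3 & dat$period >= 3, dat$period - 2,
       ifelse(dat$cohort == 5 & dat$period >= 5, 4, 0))
dat$y <- y0 + tau

# ATT(g, t): 2x2 DiD of cohort g vs never-treated, from base period g - 1 to period t
att_gt_by_hand <- function(g, t, data) {
  change <- function(keep) {
    mean(data$y[keep & data$period == t]) - mean(data$y[keep & data$period == g - 1])
  }
  change(data$cohort == g) - change(data$cohort == 0)
}

# All post-treatment (g, t) pairs
pairs <- rbind(expand.grid(g = 3, t = 3:6), expand.grid(g = 5, t = 5:6))
pairs$att <- mapply(att_gt_by_hand, pairs$g, pairs$t, MoreArgs = list(data = dat))
print(pairs)

# One possible aggregation: simple average over all post-treatment (g, t) cells
mean(pairs$att)
```

Each comparison uses only never-treated units as controls, so already-treated units never serve as a comparison group, which is the source of the negative weights in TWFE.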

Supporting the Parallel Trends Assumption

The Parallel Trends Assumption

Parallel trends assumption is crucial for the validity of the DiD estimator.
The assumption is that the treatment and control groups would have followed the same trend in the absence of the treatment.
The assumption is untestable.
The assumption can be supported by:
- Graphical analysis.
- Event study analysis.
- Placebo tests.

Graphical Analysis

A typical way to support the parallel trends assumption is to plot the outcome variable for the treatment and control groups before the treatment.
Parallel pre-trends can mitigate concerns about violations of the parallel trends assumption.
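
A minimal sketch of such a plot in base R, on simulated data in which pre-trends are parallel by construction. Group sizes, the treatment date (period 6), and variable names are assumptions.

```r
set.seed(4)  # arbitrary; simulated data for illustration only

# Hypothetical panel: 20 units per group, 8 periods, treatment starts in period 6
g <- expand.grid(unit = 1:40, period = 1:8)
g$treated <- g$unit <= 20
g$y <- 5 + 2 * g$treated + 0.5 * g$period +
  1.5 * (g$treated & g$period >= 6) + rnorm(nrow(g), sd = 0.2)

# Mean outcome by period (rows) and group (columns: FALSE = control, TRUE = treated)
means <- tapply(g$y, list(period = g$period, treated = g$treated), mean)

# Plot both series and mark the start of treatment
matplot(as.numeric(rownames(means)), means, type = "b", pch = 16, lty = 1,
        xlab = "Period", ylab = "Mean outcome")
abline(v = 5.5, lty = 2)
legend("topleft", legend = c("Control", "Treated"), col = 1:2, lty = 1, pch = 16)

# Pre-treatment gap between groups; roughly constant if pre-trends are parallel
pre_gap <- means[1:5, "TRUE"] - means[1:5, "FALSE"]
print(round(pre_gap, 2))
```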

Event Study Analysis

With only one treatment group and one never-treated group, the event study is simply this OLS regression:

\[Y_{ist} = \alpha + \sum_{\ell = -q}^{-2}\mu_{\ell} (D_s \times \mathbf{1}_\ell) + \sum_{\ell = 0}^{m}\delta_{\ell} (D_s \times \mathbf{1}_\ell) + \lambda_t + \gamma D_s + \varepsilon_{ist}\]

$D_s$ is a treatment-group indicator for group $s$; $\mathbf{1}_\ell$ is an indicator for event-time $\ell$ (periods relative to treatment onset).
Leads ($\ell < 0$, back to $-q$) capture anticipatory effects; lags ($\ell \geq 0$, up to $m$) capture post-treatment effects; event time $\ell = -1$ is omitted as the reference period; $\lambda_t$ are time fixed effects.

Event Study: How it’s done

Plot the coefficients on the interaction terms together with their 95% confidence intervals.
Under parallel trends and no anticipation, the coefficients on the leads should be close to 0.
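
The following base-R sketch shows the mechanics on simulated data: build event-time interactions, omit event time -1 as the reference, and plot estimates with confidence intervals. The data-generating process, the treatment date (period 5), and all names are assumptions. The intervals use conventional standard errors for brevity; a real application would cluster them.

```r
set.seed(5)  # arbitrary; simulated data for illustration only

# Hypothetical panel: 30 units per group, 7 periods, treatment starts in period 5
es <- expand.grid(unit = 1:60, period = 1:7)
es$treated <- as.numeric(es$unit <= 30)
es$rel <- es$period - 5                       # event time: 0 = first treated period

# Assumed true effect: zero before treatment, then 1, 2, 3 at event times 0, 1, 2
es$y <- 3 + es$treated + 0.4 * es$period +
  es$treated * pmax(es$rel + 1, 0) + rnorm(nrow(es), sd = 0.2)

# Treated-group x event-time dummies, omitting event time -1 as the reference
rel_levels <- c(-4, -3, -2, 0, 1, 2)
E <- sapply(rel_levels, function(l) es$treated * (es$rel == l))
colnames(E) <- paste0("rel", rel_levels)

fit <- lm(y ~ E + treated + factor(period), data = es)

# Extract the event-study coefficients and their 95% confidence intervals
idx <- grep("^E", names(coef(fit)))
est <- coef(fit)[idx]
ci  <- confint(fit)[idx, ]

# Coefficient plot: leads should sit near zero, lags trace out the effect
plot(rel_levels, est, pch = 16, ylim = range(c(ci, 0)),
     xlab = "Event time", ylab = "Estimate")
arrows(rel_levels, ci[, 1], rel_levels, ci[, 2], angle = 90, code = 3, length = 0.05)
abline(h = 0, lty = 2)
```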

Event Study Application

Miller et al. (2019) study the effect of the Affordable Care Act (ACA) Medicaid expansions on mortality.
They focus on near-elderly adults in states with and without the expansions.
They find a 0.13-percentage-point decline in annual mortality (9.3% reduction over the sample mean) as a result of the ACA expansion.
This effect is driven by a reduction in disease-related deaths and grows over time.

Event Study Visualization

Estimates of Medicaid expansion’s effects on **eligibility**. Reprinted from Miller et al. (2019).

Event Study Visualization (cont.)

Estimates of Medicaid expansion’s effects on the **uninsured rate**. Reprinted from Miller et al. (2019).

Event Study in R

library(tidyverse); library(fixest)
od <- causaldata::organ_donations

# Treatment variable
od <- od %>% mutate(California = State == 'California')

# Interact quarter with being in the treated group using
# the fixest i() function, which also lets us specify
# a reference period (using the numeric version of Quarter)
clfe <- feols(Rate ~ i(Quarter_Num, California, ref = 3) | 
            State + Quarter_Num, data = od)

# And use coefplot() for a graph of effects
coefplot(clfe)


Placebo Tests

Placebo tests look for the effect of the treatment in periods or units where the treatment should not be present.
- Same outcome in groups that should not be affected by the treatment.
- Effect on outcomes that should not be affected by the treatment.
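
A placebo test in time can be sketched as follows: restrict the sample to pre-treatment periods, pretend the treatment started earlier, and re-run the DiD regression. The data below are simulated, and the true treatment date (period 6), the fake date (period 3), and the variable names are assumptions.

```r
set.seed(6)  # arbitrary; simulated data for illustration only

# Hypothetical panel: 30 units per group, 8 periods, real treatment starts in period 6
pl <- expand.grid(unit = 1:60, period = 1:8)
pl$treated <- as.numeric(pl$unit <= 30)
pl$y <- 2 + pl$treated + 0.3 * pl$period +
  1.5 * pl$treated * (pl$period >= 6) + rnorm(nrow(pl), sd = 0.2)

# Placebo in time: keep only pre-treatment periods (1-5) and pretend
# the treatment started in period 3
pre <- pl[pl$period <= 5, ]
pre$fake_post <- as.numeric(pre$period >= 3)
placebo <- lm(y ~ treated * fake_post, data = pre)

# Actual DiD on the full sample for comparison
pl$post <- as.numeric(pl$period >= 6)
actual <- lm(y ~ treated * post, data = pl)

print(c(placebo = unname(coef(placebo)["treated:fake_post"]),
        actual  = unname(coef(actual)["treated:post"])))
```

A placebo estimate far from zero would suggest the groups were already diverging before the treatment.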

Identifying Assumptions in DiD

Parallel Trends Assumption: The treatment and control groups would have followed the same trend in the absence of the treatment.
No selection into treatment: treatment assignment does not depend on past or future changes in the outcome variable (otherwise common trends is implausible)
No spillover effects: The treatment does not affect the control group (or vice versa — the Stable Unit Treatment Value Assumption, SUTVA)
No concurrent shocks: no other event occurring at the same time as the treatment affects the outcome differently in the treatment and control groups.

External Validity of DiD

DiD shows the effects for units with a change in treatment status relative to those without (control)
Can we generalize these effects to those units without a change in the treatment status?
Can we extrapolate the results to larger treatments?
How informative are the results for other situations (countries, etc.)?

Key Takeaways

DiD identifies the ATT: the effect on treated units in the post period, relative to what would have happened absent treatment.
Parallel Trends is the key assumption: treatment and control would have trended identically in the absence of the policy. It is untestable, but can be supported.
How to support parallel trends:
1. Graphical analysis of pre-treatment trends
2. Event-study plot (pre-treatment coefficients $\mu_\ell$ should be zero)
3. Placebo tests on outcomes or groups unaffected by the treatment
TWFE works well for the 2×2 case; with staggered adoption and heterogeneous effects use Callaway & Sant’Anna (2021) instead.