
DAGs and potential outcomes

Session 5

PMAP 8521: Program evaluation
Andrew Young School of Policy Studies

Plan for today

do()ing observational causal inference

Potential outcomes

do()ing observational causal inference

Structural models

The relationship between nodes can be described with equations

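$$
\begin{aligned}
\text{Loc} &= f_{\text{Loc}}(U_1) \\
\text{Bkgd} &= f_{\text{Bkgd}}(U_1) \\
\text{JobCx} &= f_{\text{JobCx}}(\text{Edu}) \\
\text{Edu} &= f_{\text{Edu}}(\text{Req}, \text{Loc}, \text{Year}) \\
\text{Earn} &= f_{\text{Earn}}(\text{Edu}, \text{Year}, \text{Bkgd}, \text{Loc}, \text{JobCx})
\end{aligned}
$$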

Structural models

dagify() in ggdag forces you to think this way

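$$
\begin{aligned}
\text{Earn} &= f_{\text{Earn}}(\text{Edu}, \text{Year}, \text{Bkgd}, \text{Loc}, \text{JobCx}) \\
\text{Edu} &= f_{\text{Edu}}(\text{Req}, \text{Loc}, \text{Year}) \\
\text{JobCx} &= f_{\text{JobCx}}(\text{Edu}) \\
\text{Bkgd} &= f_{\text{Bkgd}}(U_1) \\
\text{Loc} &= f_{\text{Loc}}(U_1)
\end{aligned}
$$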

library(ggdag)

dagify(
  Earn ~ Edu + Year + Bkgd + Loc + JobCx,
  Edu ~ Req + Loc + Bkgd + Year,
  JobCx ~ Edu,
  Bkgd ~ U1,
  Loc ~ U1
)

Causal identification

All these nodes are related; there's correlation between them all

We care about Edu → Earn, but what do we do about all the other nodes?

Causal identification

A causal effect is identified if the association between treatment and outcome is properly stripped of all non-causal associations and isolated

Paths and associations

Arrows in a DAG transmit associations

You can redirect and control those paths by "adjusting" or "conditioning"

Three types of associations

Confounding: common cause

Causation: mediation

Collision: selection / endogeneity
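Collision is the least intuitive of the three, so here is a minimal sketch in R (the language of the deck's own code). It assumes a made-up data-generating process in which X and Y are independent and both cause a hypothetical collider `c_collider`. Adjusting for the collider, here by adding it to a regression, creates an association between X and Y that doesn't exist.

```r
set.seed(1234)
n <- 10000

# Hypothetical DGP: X and Y are independent, and both cause the collider
x <- rnorm(n)
y <- rnorm(n)
c_collider <- x + y + rnorm(n)

# Leave the collider alone: no association between X and Y (slope near 0)
naive_slope <- coef(lm(y ~ x))["x"]

# Condition on the collider: a spurious negative association appears (about -0.5)
collider_slope <- coef(lm(y ~ x + c_collider))["x"]

round(c(naive = unname(naive_slope), conditioned = unname(collider_slope)), 2)
```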

Interventions

do-operator

Making an intervention in a DAG

$P[Y \mid do(X = x)]$ or $E[Y \mid do(X = x)]$

P = probability distribution, or E = expectation/expected value

Y = outcome, X = treatment, x = specific value of treatment

Interventions

E[Y | do(X=x)]

E[ Earnings | do(One year of college)]

E[ Firm growth | do(Government R&D funding)]

E[ Air quality | do(Carbon tax)]

E[ Juvenile delinquency | do(Truancy program)]

E[ Malaria infection rate | do(Mosquito net)]

Interventions

When you do() X, delete all arrows into it

Observational DAG

Experimental DAG
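One way to see what `do()` does to the graph is to treat the DAG as a plain edge list and drop every edge that points into the intervened node. This base R sketch uses the edges from the `dagify()` call earlier. `do_intervention()` is a hypothetical helper written for illustration, not a ggdag or dagitty function.

```r
# Edges of the education/earnings DAG from the dagify() call,
# stored as a simple from -> to edge list (no packages needed)
edges <- data.frame(
  from = c("Edu", "Year", "Bkgd", "Loc", "JobCx",
           "Req", "Loc", "Bkgd", "Year",
           "Edu", "U1", "U1"),
  to   = c("Earn", "Earn", "Earn", "Earn", "Earn",
           "Edu", "Edu", "Edu", "Edu",
           "JobCx", "Bkgd", "Loc"),
  stringsAsFactors = FALSE
)

# do(X = x): delete every arrow pointing *into* the intervened node,
# leaving its outgoing arrows intact (hypothetical helper)
do_intervention <- function(edges, node) {
  edges[edges$to != node, , drop = FALSE]
}

experimental <- do_intervention(edges, "Edu")

# Edu no longer has any parents, but still points to Earn and JobCx
experimental[experimental$from == "Edu", ]
```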

Interventions

E[Earnings | do(College education)]

Observational DAG

Experimental DAG

Undo()ing things

We want to know $P[Y \mid do(X)]$, but all we have is observational data X, Y, and Z

$P[Y \mid do(X)] \neq P(Y \mid X)$

Correlation isn't causation!
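A small simulation shows the gap. This sketch assumes a hypothetical linear world where Z confounds X and Y and the true effect of X on Y is 2. The observational slope reflects the association in P(Y | X). Setting X by fiat, which deletes the Z → X arrow, recovers the effect in P[Y | do(X)].

```r
set.seed(1234)
n <- 10000

# Hypothetical DGP: Z causes both X and Y; the true effect of X on Y is 2
z <- rnorm(n)
x <- z + rnorm(n)
y <- 2 * x + 3 * z + rnorm(n)

# Observational association, roughly what P(Y | X) tells us (about 3.5, not 2)
observed_slope <- coef(lm(y ~ x))["x"]

# Intervene: assign X ourselves, so Z no longer affects it (arrow into X deleted)
x_do <- rnorm(n)
y_do <- 2 * x_do + 3 * z + rnorm(n)

# Interventional slope, roughly what P[Y | do(X)] tells us (about 2)
interventional_slope <- coef(lm(y_do ~ x_do))["x_do"]
```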

Undo()ing things

Our goal with observational data:
Rewrite P[Y | do(X)] so that it doesn't have a do() anymore (is "do-free")

do-calculus

A set of three rules that let you manipulate a DAG in special ways to remove do() expressions

do-calculus rules

WAAAAAY beyond the scope of this class!
Just know it exists and computer algorithms can do it for you!

Special cases of do-calculus

Backdoor adjustment

Frontdoor adjustment

Backdoor adjustment

$P[Y \mid do(X)] = \sum_{Z} P(Y \mid X, Z) \times P(Z)$

↑ That's complicated!

The right-hand side of the equation means "the effect of X on Y after adjusting for Z"

There's no do() on that side!
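For a discrete Z, the backdoor formula can be computed by hand. Estimate E(Y | X, Z) in each stratum of Z, then average the stratum-specific differences using P(Z) as weights. This base R sketch assumes a hypothetical binary confounder and a true effect of 5. `backdoor_effect()` is an illustrative helper, not a library function.

```r
set.seed(1234)
n <- 100000

# Hypothetical DGP: binary confounder Z raises both P(X = 1) and Y; true effect = 5
z <- rbinom(n, 1, 0.4)
x <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))
y <- 5 * x + 10 * z + rnorm(n)

# Sum over z of [E(Y | X = 1, Z = z) - E(Y | X = 0, Z = z)] * P(Z = z)
backdoor_effect <- function(y, x, z) {
  p_z <- prop.table(table(z))                   # P(Z = z)
  cell_means <- tapply(y, list(z, x), mean)     # E(Y | Z = z, X = x); rows = z, cols = x
  effect_by_z <- cell_means[, "1"] - cell_means[, "0"]
  sum(effect_by_z * as.numeric(p_z[names(effect_by_z)]))
}

naive_effect    <- mean(y[x == 1]) - mean(y[x == 0])  # confounded, roughly 10.8
adjusted_effect <- backdoor_effect(y, x, z)            # roughly 5
```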

Frontdoor adjustment

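S → T has no open backdoor path, so it is identified directly; T → C is identified by adjusting for S, which blocks the backdoor path running through S
Combine the two effects to find S → C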
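In a linear setting, the front-door logic reduces to multiplying two separately identified effects. This sketch assumes a hypothetical linear data-generating process where an unobserved U affects both S and C, S affects C only through T, and the true S → C effect is 2 × 3 = 6. The product-of-coefficients shortcut only holds for linear models like this one.

```r
set.seed(1234)
n <- 10000

# Hypothetical linear DGP: U (unobserved) affects both S and C,
# and S affects C only through T. True effect of S on C = 2 * 3 = 6
u      <- rnorm(n)
node_s <- u + rnorm(n)
node_t <- 2 * node_s + rnorm(n)
node_c <- 3 * node_t + 4 * u + rnorm(n)

# Naive regression of C on S is biased by the backdoor S <- U -> C (about 8)
naive_sc <- coef(lm(node_c ~ node_s))["node_s"]

# Step 1: S -> T has no open backdoor path, so a simple regression identifies it
s_to_t <- coef(lm(node_t ~ node_s))["node_s"]

# Step 2: T -> C has a backdoor (T <- S <- U -> C); adjusting for S closes it
t_to_c <- coef(lm(node_c ~ node_t + node_s))["node_t"]

# Combine the two pieces; in a linear model the S -> C effect is their product
frontdoor <- unname(s_to_t * t_to_c)  # about 6
```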

Moral of the story

If you can transform do() expressions to do-free versions, you can legally make causal inferences from observational data

Backdoor adjustment is the easiest to see, and dagitty and ggdag can find the adjustment sets for you!

Fancy algorithms (found in the causaleffect package) can do the official do-calculus for you too

Potential outcomes

Program effect

Outcomes and program effect

Some equation translations

Causal effect = δ (delta)

$\delta = P[Y \mid do(X)]$

$\delta = E[Y \mid do(X)] - E[Y \mid \widehat{do}(X)]$

$\delta = (Y \mid X = 1) - (Y \mid X = 0)$

$\delta = Y_1 - Y_0$

Fundamental problem of causal inference

$\delta_i = Y_{1i} - Y_{0i}$ in real life is $\delta_i = Y_{1i} - \text{???}$

Individual-level effects are impossible to observe!

There are no individual counterfactuals!

Average treatment effect (ATE)

Solution: Use averages instead

$ATE = E(Y_1 - Y_0) = E(Y_1) - E(Y_0)$

Difference between the average/expected value when the program is on vs. the expected value when the program is off

$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$

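| Person | Age | Treated | Outcome with program | Outcome without program | Effect |
|---|---|---|---|---|---|
| 1 | Old | TRUE | 80 | 60 | 20 |
| 2 | Old | TRUE | 75 | 70 | 5 |
| 3 | Old | TRUE | 85 | 80 | 5 |
| 4 | Old | FALSE | 70 | 60 | 10 |
| 5 | Young | TRUE | 75 | 70 | 5 |
| 6 | Young | FALSE | 80 | 80 | 0 |
| 7 | Young | FALSE | 90 | 100 | -10 |
| 8 | Young | FALSE | 85 | 80 | 5 |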

$\delta = (\bar{Y} \mid P = 1) - (\bar{Y} \mid P = 0)$

$ATE = \frac{20 + 5 + 5 + 5 + 10 + 0 - 10 + 5}{8} = 5$
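The same arithmetic in R, as a minimal sketch using the eight people from the table. Seeing both potential outcomes for everyone is only possible in this made-up example.

```r
# Both potential outcomes for each person, copied from the table
people <- data.frame(
  person  = 1:8,
  age     = c("Old", "Old", "Old", "Old", "Young", "Young", "Young", "Young"),
  treated = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE),
  y1      = c(80, 75, 85, 70, 75, 80, 90, 85),   # outcome with program
  y0      = c(60, 70, 80, 60, 70, 80, 100, 80)   # outcome without program
)

# Individual effect: delta_i = Y1_i - Y0_i
people$effect <- people$y1 - people$y0

# ATE = E(Y1 - Y0), which equals E(Y1) - E(Y0)
ate <- mean(people$effect)
ate  # 5
```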

CATE

ATE in subgroups

Is the program more effective for specific age groups?

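| Person | Age | Treated | Outcome with program | Outcome without program | Effect |
|---|---|---|---|---|---|
| 1 | Old | TRUE | 80 | 60 | 20 |
| 2 | Old | TRUE | 75 | 70 | 5 |
| 3 | Old | TRUE | 85 | 80 | 5 |
| 4 | Old | FALSE | 70 | 60 | 10 |
| 5 | Young | TRUE | 75 | 70 | 5 |
| 6 | Young | FALSE | 80 | 80 | 0 |
| 7 | Young | FALSE | 90 | 100 | -10 |
| 8 | Young | FALSE | 85 | 80 | 5 |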

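$\delta = (\bar{Y}_O \mid P = 1) - (\bar{Y}_O \mid P = 0)$

$\delta = (\bar{Y}_Y \mid P = 1) - (\bar{Y}_Y \mid P = 0)$

$CATE_{\text{Old}} = \frac{20 + 5 + 5 + 10}{4} = 10$

$CATE_{\text{Young}} = \frac{5 + 0 - 10 + 5}{4} = 0$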
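Grouping the individual effects by age gives the CATEs. Here is a minimal R sketch that uses only the `Age` and `Effect` columns from the table.

```r
# Age group and individual effect (Y1 - Y0) for each person in the table
people <- data.frame(
  age    = c(rep("Old", 4), rep("Young", 4)),
  effect = c(20, 5, 5, 10, 5, 0, -10, 5)
)

# CATE: the average effect within each age group
cate <- tapply(people$effect, people$age, mean)
cate  # Old = 10, Young = 0
```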

ATT and ATU

Average treatment effect on the treated

ATT / TOT

Effect for those with treatment

Average treatment effect on the untreated

ATU / TUT

Effect for those without treatment

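| Person | Age | Treated | Outcome with program | Outcome without program | Effect |
|---|---|---|---|---|---|
| 1 | Old | TRUE | 80 | 60 | 20 |
| 2 | Old | TRUE | 75 | 70 | 5 |
| 3 | Old | TRUE | 85 | 80 | 5 |
| 4 | Old | FALSE | 70 | 60 | 10 |
| 5 | Young | TRUE | 75 | 70 | 5 |
| 6 | Young | FALSE | 80 | 80 | 0 |
| 7 | Young | FALSE | 90 | 100 | -10 |
| 8 | Young | FALSE | 85 | 80 | 5 |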

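$\delta = (\bar{Y}_T \mid P = 1) - (\bar{Y}_T \mid P = 0)$

$\delta = (\bar{Y}_U \mid P = 1) - (\bar{Y}_U \mid P = 0)$

$CATE_{\text{Treated}} = \frac{20 + 5 + 5 + 5}{4} = 8.75$

$CATE_{\text{Untreated}} = \frac{10 + 0 - 10 + 5}{4} = 1.25$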

ATE, ATT, and ATU

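The ATE is the weighted average of the ATT and ATU

$ATE = (\pi_{\text{Treated}} \times ATT) + (\pi_{\text{Untreated}} \times ATU)$

$\left(\frac{4}{8} \times 8.75\right) + \left(\frac{4}{8} \times 1.25\right)$

$4.375 + 0.625 = 5$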

π here means "proportion," not 3.1415
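The ATT, the ATU, and the weighted-average identity can be checked in a few lines of R. This sketch reuses the `Treated` and `Effect` columns from the table. The π's are simply the shares of treated and untreated people.

```r
# Treatment status and individual effect (Y1 - Y0) from the table
people <- data.frame(
  treated = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE),
  effect  = c(20, 5, 5, 10, 5, 0, -10, 5)
)

att <- mean(people$effect[people$treated])    # 8.75
atu <- mean(people$effect[!people$treated])   # 1.25

# Proportions treated and untreated (the pi's)
pi_treated   <- mean(people$treated)          # 4/8
pi_untreated <- 1 - pi_treated                # 4/8

# Weighted average of ATT and ATU equals the ATE
ate <- pi_treated * att + pi_untreated * atu  # 5
```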

Selection bias

ATE and ATT aren't always the same

ATE = ATT + Selection bias

$5 = 8.75 + x \;\Rightarrow\; x = -3.75$

Randomization fixes this: it makes x = 0 (in expectation)

Actual data

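| Person | Age | Treated | Actual outcome |
|---|---|---|---|
| 1 | Old | TRUE | 80 |
| 2 | Old | TRUE | 75 |
| 3 | Old | TRUE | 85 |
| 4 | Old | FALSE | 60 |
| 5 | Young | TRUE | 75 |
| 6 | Young | FALSE | 80 |
| 7 | Young | FALSE | 100 |
| 8 | Young | FALSE | 80 |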

Treatment not randomly assigned

We can't see unit-level causal effects

What do we do?!

Actual data

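| Person | Age | Treated | Actual outcome |
|---|---|---|---|
| 1 | Old | TRUE | 80 |
| 2 | Old | TRUE | 75 |
| 3 | Old | TRUE | 85 |
| 4 | Old | FALSE | 60 |
| 5 | Young | TRUE | 75 |
| 6 | Young | FALSE | 80 |
| 7 | Young | FALSE | 100 |
| 8 | Young | FALSE | 80 |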

Treatment seems to be correlated with age

Actual data

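| Person | Age | Treated | Actual outcome |
|---|---|---|---|
| 1 | Old | TRUE | 80 |
| 2 | Old | TRUE | 75 |
| 3 | Old | TRUE | 85 |
| 4 | Old | FALSE | 60 |
| 5 | Young | TRUE | 75 |
| 6 | Young | FALSE | 80 |
| 7 | Young | FALSE | 100 |
| 8 | Young | FALSE | 80 |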

We can estimate the ATE by finding the weighted average of age-based CATEs

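This works as long as we assume (or pretend) that treatment was randomly assigned within each age group; this assumption is called unconfoundedness

$\widehat{ATE} = \pi_{\text{Old}} \widehat{CATE}_{\text{Old}} + \pi_{\text{Young}} \widehat{CATE}_{\text{Young}}$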

Actual data

$\widehat{ATE} = \pi_{\text{Old}} \widehat{CATE}_{\text{Old}} + \pi_{\text{Young}} \widehat{CATE}_{\text{Young}}$

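| Person | Age | Treated | Actual outcome |
|---|---|---|---|
| 1 | Old | TRUE | 80 |
| 2 | Old | TRUE | 75 |
| 3 | Old | TRUE | 85 |
| 4 | Old | FALSE | 60 |
| 5 | Young | TRUE | 75 |
| 6 | Young | FALSE | 80 |
| 7 | Young | FALSE | 100 |
| 8 | Young | FALSE | 80 |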

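$\widehat{CATE}_{\text{Old}} = \frac{80 + 75 + 85}{3} - \frac{60}{1} = 20$

$\widehat{CATE}_{\text{Young}} = \frac{75}{1} - \frac{80 + 100 + 80}{3} = -11.667$

$\widehat{ATE} = \left(\frac{4}{8} \times 20\right) + \left(\frac{4}{8} \times -11.667\right) = 4.1667$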
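Here is the stratified estimate as a base R sketch using the actual data table. For comparison, it also computes the naive treated-vs-untreated difference, which ignores age entirely.

```r
# Observed data: only one outcome per person
actual <- data.frame(
  age     = c("Old", "Old", "Old", "Old", "Young", "Young", "Young", "Young"),
  treated = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE),
  outcome = c(80, 75, 85, 60, 75, 80, 100, 80)
)

# Estimated CATE within each age group: mean(treated) - mean(untreated)
cate_hat <- sapply(split(actual, actual$age), function(g) {
  mean(g$outcome[g$treated]) - mean(g$outcome[!g$treated])
})

# Weight each group's estimate by its share of the sample (the pi's)
pi_age  <- prop.table(table(actual$age))
ate_hat <- sum(cate_hat * as.numeric(pi_age[names(cate_hat)]))  # about 4.17

# For contrast: the naive difference in means ignores age (-1.25)
naive_hat <- mean(actual$outcome[actual$treated]) -
  mean(actual$outcome[!actual$treated])
```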

¡¡¡DON'T DO THIS!!!

$\widehat{ATE} = \widehat{CATE}_{\text{Treated}} - \widehat{CATE}_{\text{Untreated}}$

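| Person | Age | Treated | Actual outcome |
|---|---|---|---|
| 1 | Old | TRUE | 80 |
| 2 | Old | TRUE | 75 |
| 3 | Old | TRUE | 85 |
| 4 | Old | FALSE | 60 |
| 5 | Young | TRUE | 75 |
| 6 | Young | FALSE | 80 |
| 7 | Young | FALSE | 100 |
| 8 | Young | FALSE | 80 |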

$\widehat{CATE}_{\text{Treated}} = \frac{80 + 75 + 85 + 75}{4} = 78.75$

$\widehat{CATE}_{\text{Untreated}} = \frac{60 + 80 + 100 + 80}{4} = 80$

$\widehat{ATE} = 78.75 - 80 = -1.25$

You can only do this if treatment is random!

Matching and ATEs

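$\widehat{ATE} = \pi_{\text{Old}} \widehat{CATE}_{\text{Old}} + \pi_{\text{Young}} \widehat{CATE}_{\text{Young}}$

We used age here because it is related to both treatment and the outcome, so it confounds the relationship

And we assumed unconfoundedness: that treatment is randomly assigned within the groups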

Does attending a private university cause an increase in earnings?

Matching table from Mastering 'Metrics

This is tempting!

Average private − Average public

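$\frac{110 + 100 + 60 + 115 + 75}{5} = 92$

$\frac{110 + 30 + 90 + 60}{4} = 72.5$

$\left(92 \times \frac{5}{9}\right) - \left(72.5 \times \frac{4}{9}\right) \approx 18.889$ (thousands of dollars, about \$18,889)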

This is wrong!

$\widehat{ATE} = \pi_{\text{Private}} \widehat{CATE}_{\text{Private}} + \pi_{\text{Public}} \widehat{CATE}_{\text{Public}}$

Grouping and matching

Matching table from Mastering 'Metrics

These groups look like they have similar characteristics

Unconfoundedness?

Matching table from Mastering 'Metrics

CATE Group A + CATE Group B

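$\widehat{CATE}_{\text{Group A}} = \frac{110 + 100}{2} - 110 = -5$ (thousands of dollars, i.e. -\$5,000)

$\widehat{CATE}_{\text{Group B}} = 60 - 30 = 30$ (i.e. \$30,000)

$\left(-5 \times \frac{3}{5}\right) + \left(30 \times \frac{2}{5}\right) = 9$ (i.e. \$9,000)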

This is less wrong!

$\widehat{ATE} = \pi_{\text{Group A}} \widehat{CATE}_{\text{Group A}} + \pi_{\text{Group B}} \widehat{CATE}_{\text{Group B}}$
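The matched estimate can be reproduced in R. This sketch assumes the five students implied by the group calculations, with earnings in thousands of dollars: Group A has two private students at 110 and 100 and one public student at 110, and Group B has one private student at 60 and one public student at 30.

```r
# Students implied by the group arithmetic (earnings in thousands of dollars)
students <- data.frame(
  group    = c("A", "A", "A", "B", "B"),
  private  = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  earnings = c(110, 100, 110, 60, 30)
)

# CATE within each matched group: mean(private) - mean(public); A = -5, B = 30
cate_group <- sapply(split(students, students$group), function(g) {
  mean(g$earnings[g$private]) - mean(g$earnings[!g$private])
})

# Weight each group by its share of students (3/5 and 2/5)
pi_group    <- prop.table(table(students$group))
ate_matched <- sum(cate_group * as.numeric(pi_group[names(cate_group)]))  # 9, i.e. $9,000
```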

Matching with regression

$\text{Earnings} = \alpha + \beta_1 \text{Private} + \beta_2 \text{Group} + \epsilon$

model_earnings <- lm(earnings ~ private + group_A, data = schools_small)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 40000 | 11952.29 | 3.35 | 0.08 |
| privateTRUE | 10000 | 13093.07 | 0.76 | 0.52 |
| group_ATRUE | 60000 | 13093.07 | 4.58 | 0.04 |

$\beta_1 = \$10{,}000$

This is less wrong!

Significance details!
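Here is a sketch of what `schools_small` might contain, assuming it holds the same five students as the group calculations, with earnings in dollars. With these rows, `lm()` returns the coefficients shown in the table above (40,000, 10,000, and 60,000). The second half shows why β1 differs from the matched \$9,000. With a group dummy, OLS weights each group's CATE by n_g × p_g × (1 − p_g), the variance of treatment within the group, rather than by group size.

```r
# Hypothetical reconstruction of schools_small from the group calculations
schools_small <- data.frame(
  earnings = c(110000, 100000, 110000, 60000, 30000),
  private  = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  group_A  = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)

model_earnings <- lm(earnings ~ private + group_A, data = schools_small)
coef(model_earnings)  # (Intercept) 40000, privateTRUE 10000, group_ATRUE 60000

# OLS weights each group's CATE by n_g * p_g * (1 - p_g),
# where p_g is the share of private-school students in group g
cate <- c(A = -5000, B = 30000)
n_g  <- c(A = 3, B = 2)
p_g  <- c(A = 2/3, B = 1/2)
w    <- n_g * p_g * (1 - p_g)

reg_weighted <- sum(w * cate) / sum(w)  # 10000, same as the privateTRUE coefficient
```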
