- Concepts:
Exchangeability (the treated and untreated groups would have the same average outcome if their treatment assignments were swapped; treatment is exogenous);
Positivity (probability of receiving each treatment level > 0 within every stratum of L);
Common support (each stratum contains data for both treatment and control);
ATE (Average Treatment Effect) combines (Y1|D=1 - Y0|D=1) and (Y1|D=0 - Y0|D=0), weighted by the shares of treated and untreated units;
CATE (Conditional ATE);
LATE (Local ATE), also called the complier average causal effect;
ATT (Average Treatment Effect for the Treated) only considers (Y1|D=1 - Y0|D=1); ATT and ATE are the same in RCTs because Y0|D=1 equals Y0|D=0, and likewise for Y1 (i.e. baselines would be the same, outcomes would be the same)
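A minimal sketch of how ATE, ATT, and the effect on the untreated fit together, using a made-up table of potential outcomes. Both Y0 and Y1 are listed for every unit, which is never observable in real data; the numbers and the names `units`, `att`, `atu` are illustrative assumptions.

```python
# Hypothetical potential outcomes per unit: (Y0, Y1, D). Illustrative only.
units = [(1, 4, 1), (2, 5, 1), (3, 4, 0), (2, 3, 0), (1, 2, 0)]

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

ate = mean(y1 - y0 for y0, y1, d in units)             # everyone
att = mean(y1 - y0 for y0, y1, d in units if d == 1)   # treated only
atu = mean(y1 - y0 for y0, y1, d in units if d == 0)   # untreated only
p_treated = mean(d for _, _, d in units)

# ATE is the treated-share-weighted combination of ATT and ATU.
ate_from_parts = p_treated * att + (1 - p_treated) * atu
print(ate, att, atu, ate_from_parts)
```

Here ATT and ATE differ because the treated units were chosen to have larger effects; under randomization the two would agree in expectation.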
- Estimation Methods:
-- IP Weighting: models the treatment P(A=a|L), then computes the outcome in the reweighted pseudo-population
-- Standardization: models the outcome E(Y|A=a, L=l) directly, e.g. with regularized regressions, then averages over the distribution of L
-- Matching: exact match, distance based matching (bias correction methods can be used, ch5.3.2), propensity score based matching (ps + nn), coarsened exact matching
-- Instrumental Variables (IV) and 2SLS: estimates LATE
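-- Regression Discontinuity (RDD): does not coexist with matching (treated and control units do not overlap on the running variable, so there is no common support), estimates LATE at the cutoff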
-- Diff-in-diff (DD): estimates ATT, requires the parallel trends assumption
-- Synthetic control: e.g. forecasting to create a synthetic trend
-- Doubly robust estimator: combines IPW and Standardization using a canonical link
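As a sketch of the diff-in-diff entry in the list above: the 2x2 estimator subtracts the control group's before/after change from the treated group's change. The group means below are made-up numbers and `diff_in_diff` is a hypothetical helper; the result can be read as the ATT only if parallel trends holds.

```python
# Hypothetical group means of the outcome (illustrative numbers only).
means = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 15.0,
    ("control", "pre"): 8.0,
    ("control", "post"): 10.0,
}

def diff_in_diff(m):
    """2x2 DiD: change in the treated group minus change in the control group."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change

did = diff_in_diff(means)
print(did)  # 3.0
```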
- Inverse Probability Weighting (IPW): IP weighting creates a pseudo-population in which we get two copies of the data for each individual under condition L: one receives treatment A and the other receives no treatment. Inverse probability weighting proportionately "bumps up" the under-represented arm within a condition L (Chapter 2.1). Graphically, IP weighting breaks the link between condition L and treatment A, so that the association between A and Y is causal. The key is that the propensity score model should be a good representation of A given L. A robust variance estimator (e.g. via GEE) can be used to compute the average treatment effect and its confidence interval. IP weights: W = 1/f(A|L); stabilized IP weights: SW = f(A)/f(A|L), which can result in narrower confidence intervals for non-saturated models
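A minimal sketch of the two weighting schemes on a tiny made-up dataset. Assumptions: L and A are binary, the outcome is generated as Y = 1 + 2*A + 3*L (so the true effect is 2), and the propensity model is just the stratum frequency of treatment; all names are illustrative.

```python
# Hypothetical cells (L, A, count). L=1 units are treated more often, so L confounds.
cells = [(0, 0, 6), (0, 1, 2), (1, 0, 2), (1, 1, 6)]
rows = [(l, a, 1 + 2 * a + 3 * l) for l, a, count in cells for _ in range(count)]

n = len(rows)
p_a1 = sum(a for _, a, _ in rows) / n  # marginal P(A=1)

def f_a_given_l(a, l):
    """Saturated treatment model f(A=a | L=l) from stratum frequencies."""
    stratum = [ai for li, ai, _ in rows if li == l]
    p = sum(stratum) / len(stratum)
    return p if a == 1 else 1 - p

def f_a(a):
    return p_a1 if a == 1 else 1 - p_a1

def weighted_mean_y(arm, weights):
    """Weighted mean of Y in one arm, normalized by the arm's total weight."""
    num = sum(wt * y for (_, a, y), wt in zip(rows, weights) if a == arm)
    den = sum(wt for (_, a, _), wt in zip(rows, weights) if a == arm)
    return num / den

w = [1 / f_a_given_l(a, l) for l, a, _ in rows]        # W = 1/f(A|L)
sw = [f_a(a) / f_a_given_l(a, l) for l, a, _ in rows]  # SW = f(A)/f(A|L)

treated = [y for _, a, y in rows if a == 1]
control = [y for _, a, y in rows if a == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
ate_w = weighted_mean_y(1, w) - weighted_mean_y(0, w)
ate_sw = weighted_mean_y(1, sw) - weighted_mean_y(0, sw)
print(naive, ate_w, ate_sw, sum(w), sum(sw))
```

In this toy case the unadjusted difference is 3.5 while both weighted estimates recover roughly 2. The unstabilized weights sum to about twice the sample size (the "two copies" pseudo-population), and the stabilized weights sum to about the original sample size.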
- Linear Regression: a type of standardization. When does it fail? When the relationship is non-linear
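A minimal sketch of standardization with a saturated outcome model (stratum means) on a made-up dataset. Assumptions: binary L and A, and Y = 1 + 2*A + 3*L. A linear regression would replace `outcome_model` with fitted values and would then depend on the assumed functional form being right; the names here are illustrative.

```python
# Hypothetical cells (L, A, count); the true effect of A is 2.
cells = [(0, 0, 6), (0, 1, 2), (1, 0, 2), (1, 1, 6)]
rows = [(l, a, 1 + 2 * a + 3 * l) for l, a, count in cells for _ in range(count)]

def outcome_model(a, l):
    """Saturated outcome model: E[Y | A=a, L=l] estimated as a stratum mean."""
    ys = [y for li, ai, y in rows if li == l and ai == a]
    return sum(ys) / len(ys)

def standardized_mean(a):
    """Average the modeled outcome under A=a over the observed distribution of L."""
    return sum(outcome_model(a, l) for l, _, _ in rows) / len(rows)

ate_std = standardized_mean(1) - standardized_mean(0)
print(ate_std)  # 2.0
```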
- X-Learner: first, build one outcome model each for treatment and control; then, impute the treatment effect for each observation using the opposite model (e.g. Y1 - Yhat0 for a treated unit, where Yhat0 is predicted by the model built with control data); then, build a model of the imputed treatment effects within each group and combine the two models' predictions, weighted by the propensity score
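The X-learner stages can be sketched on simulated data as follows. Assumptions: a single binary covariate X, a known propensity score, and stratum means standing in for whatever regressors would be used in practice; `true_tau`, `propensity`, and the data-generating process are all made up for illustration.

```python
import random

random.seed(0)

def true_tau(x):
    return 1.0 if x == 0 else 3.0   # assumed heterogeneous effect

def propensity(x):
    return 0.2 if x == 0 else 0.5   # assumed known P(A=1 | X=x)

data = []
for _ in range(20000):
    x = random.randint(0, 1)
    a = 1 if random.random() < propensity(x) else 0
    y = 1.0 + x + a * true_tau(x) + random.gauss(0, 1)
    data.append((x, a, y))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Stage 1: one outcome model per arm (stratum means as a stand-in regressor).
mu0 = {x: mean(y for xi, a, y in data if xi == x and a == 0) for x in (0, 1)}
mu1 = {x: mean(y for xi, a, y in data if xi == x and a == 1) for x in (0, 1)}

# Stage 2: impute unit-level effects with the opposite arm's model.
d_treated = [(x, y - mu0[x]) for x, a, y in data if a == 1]  # Y1 - mu0_hat(X)
d_control = [(x, mu1[x] - y) for x, a, y in data if a == 0]  # mu1_hat(X) - Y0

# Stage 3: model the imputed effects within each arm.
tau1 = {x: mean(d for xi, d in d_treated if xi == x) for x in (0, 1)}
tau0 = {x: mean(d for xi, d in d_control if xi == x) for x in (0, 1)}

# Stage 4: combine with propensity weights g(x): tau = g*tau0 + (1 - g)*tau1.
cate = {x: propensity(x) * tau0[x] + (1 - propensity(x)) * tau1[x] for x in (0, 1)}
print(cate)
```

With saturated stratum means the two imputed-effect models coincide, so the weighting step changes nothing here; it starts to matter with regularized or otherwise imperfect learners and unbalanced arms.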
- Instrumental Variable: Z -> A <- U -> Y, with A -> Y; Wald estimator COV(Y,Z)/COV(A,Z) or (E(Y|Z=1)-E(Y|Z=0))/(E(A|Z=1) - E(A|Z=0)); estimates the complier average causal effect, or LATE (excluding defiers, always-takers, and never-takers); STRONG LIMITATIONS from Hernán and Robins: "...standard IV estimation is better reserved for settings with lots of unmeasured confounding, a truly dichotomous and time-fixed treatment, a strong and causal proposed instrument, and in which either effect homogeneity is expected to hold, or one is genuinely interested in the effect in the compliers and monotonicity is expected to hold."
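A minimal simulation of the Wald estimator under assumed conditions: a randomized binary instrument Z, no defiers (monotonicity), an unmeasured confounder U carried by the always-takers, and a constant treatment effect of 2. The compliance types, coefficients, and variable names are all illustrative assumptions.

```python
import random

random.seed(1)

rows = []
for _ in range(50000):
    kind = random.choice(["complier", "complier", "always", "never"])
    u = 1 if kind == "always" else 0            # unmeasured confounder
    z = random.randint(0, 1)                    # randomized instrument
    a = 1 if kind == "always" or (kind == "complier" and z == 1) else 0
    y = 2.0 * a + 3.0 * u + random.gauss(0, 1)  # true effect of A is 2
    rows.append((z, a, y))

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def cond_mean(col, z):
    """Mean of column `col` (1 = A, 2 = Y) among rows with Z == z."""
    return mean(r[col] for r in rows if r[0] == z)

# Wald estimator: (E[Y|Z=1] - E[Y|Z=0]) / (E[A|Z=1] - E[A|Z=0])
itt_y = cond_mean(2, 1) - cond_mean(2, 0)
itt_a = cond_mean(1, 1) - cond_mean(1, 0)
wald = itt_y / itt_a

# Unadjusted treated-vs-untreated contrast, confounded by U.
naive = mean(y for _, a, y in rows if a == 1) - mean(y for _, a, y in rows if a == 0)
print(wald, naive)
```

In this setup the Wald ratio should land near 2 while the unadjusted contrast is pulled upward by U. The denominator is the estimated share of compliers (about one half here), which is why a weak instrument makes the ratio unstable.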