Method for estimating a nonlinear causal relationship \(X\to Y\) that is assumed to linearly extrapolate outside of the support of \(X\).

NILE(
  Y,
  X,
  A,
  lambda.star = "test",
  intercept = TRUE,
  df = 100,
  x.new = NULL,
  test = "penalized",
  p.min = 0.05,
  plot = TRUE,
  f.true = NULL,
  par.x = list(),
  par.a = list(),
  par.cv = list(num.folds = 10, optim = "optimize")
)

Arguments

Y

A numeric vector with observations from the target variable

X

A numeric vector with observations from the predictor variable

A

A numeric vector with observations from the exogenous variable

lambda.star

either a positive numeric, Inf or "test"; weight determining the relative importance of the OLS and the TSLS loss, see Details.

intercept

logical; whether an intercept should be included in the regression model

df

positive integer; number of basis splines used to model the nonlinear function \(X\to Y\)

x.new

numeric; x-values at which predictions are desired

test

character; the type of test used to determine lambda.star (default is "penalized"); only relevant if lambda.star = "test".

p.min

numeric between 0 and 1; the significance level for the test that determines lambda.star.

plot

logical; whether diagnostic plots should be produced

f.true

real-valued function of one variable; if the ground truth is known, it can be supplied and will be included in diagnostic plots.

par.x

a list of different parameters determining the \(B\)-spline regression of \(Y\) onto \(X\)

  • breaks numeric vector; knots at which B-spline is placed

  • num.breaks positive integer; number of knots used (ignored if breaks is supplied)

  • n.order positive integer; order of B-spline basis (default is 4, which corresponds to cubic splines)

  • pen.degree positive integer; 1 corresponds to a penalty on the first, and 2 (default) corresponds to a penalty on the second order derivative.

par.a

a list of different parameters determining the B-spline regression of the residuals \(Y - B(X)\beta\) onto the instrument \(A\)

  • breaks same as above

  • num.breaks same as above

  • n.order same as above

  • pen.degree same as above

par.cv

A list with parameters for the cross-validation procedure.

  • num.folds either "leave-one-out" or positive integer; number of CV folds (default = 10)

  • optim one of "optimize" (default) or "grid.search"; optimization method used for cross-validation.

  • n.grid positive integer; number of grid points used in grid search
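For illustration, a hedged usage sketch of these options (assuming data vectors Y, X and A as generated in the Examples below): leave-one-out cross-validation with a grid search over 50 candidate penalty values could be requested as

```r
# hypothetical call illustrating the par.cv options; Y, X, A are assumed
# to be numeric vectors as in the Examples section
fit <- NILE(Y, X, A,
            par.cv = list(num.folds = "leave-one-out",
                          optim = "grid.search",
                          n.grid = 50))
```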

Value

An object of class AR, containing the following elements

  • coefficients estimated spline coefficients for the relationship \(X\to Y\)

  • residuals prediction residuals

  • fitted.values fitted values

  • betaA estimated spline coefficients for the regression of the prediction residuals onto the variable \(A\)

  • fitA fitted values for the regression of the prediction residuals onto the variable \(A\)

  • Y the response variable

  • BX basis object holding all relevant information about the B-spline basis used for the regression \(X\to Y\)

  • BA basis object holding all relevant information about the B-spline basis used for the regression of the residuals onto A

  • ols OLS loss at the estimated parameters

  • iv TSLS loss at the estimated parameters

  • ar NILE loss at the estimated parameters, that is, \(OLS(\theta) + \texttt{lambda.star}\times TSLS(\theta)\)

  • intercept was an intercept supplied?

  • lambdaX penalty parameter used for the spline regression \(X\to Y\)

  • lambdaA penalty parameter used for the regression of residuals onto \(A\)

  • lambda.star the (estimated) value for lambda.star

  • pred predicted values at the supplied vector x.new

Details

The NILE estimator can be used to learn a nonlinear causal influence of a real-valued predictor \(X\) on a real-valued response variable \(Y\). It exploits an instrumental variable setting: the variable \(A\) is assumed to be a valid instrument. The estimator uses B-splines to model the nonlinear relationship. It further assumes that the causal function extrapolates linearly outside of the empirical support of \(X\), and can therefore be used to obtain causal predictions (that is, predicted values for \(Y\) under confounding-removing interventions on \(X\)) even at values of \(X\) which lie outside the training support.

On a more technical side, the NILE estimator proceeds as follows. First, two B-spline bases \(B = (B_1,\dots,B_{df})\) and \(C = (C_1, \dots, C_{df})\) are constructed, which span the values of \(X\) and \(A\), respectively. These give rise to the loss functions OLS(\(\theta\)), which corresponds to the MSE of the prediction residuals \(Y - \theta^T B\), and TSLS(\(\theta\)), which measures the squared fitted values of the spline regression of the residuals \(Y - \theta^T B\) onto the spline basis \(C\). The NILE estimator then estimates \(\theta\) by minimizing the objective function $$OLS(\theta) + \texttt{lambda.star}\times TSLS(\theta) + PEN(\theta),$$ where PEN(\(\theta\)) is a quadratic penalty term which enforces smoothness.
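The structure of this objective can be sketched in a few lines of R. The following is a minimal, self-contained illustration, not the package implementation: the data-generating model, the use of splines::bs for both bases, and the second-difference penalty (a stand-in for the derivative-based penalty) are all assumptions made for brevity.

```r
library(splines)
library(stats)

set.seed(1)
n <- 200
A <- runif(n, -1, 1)                      # instrument
H <- runif(n, -1, 1)                      # hidden confounder
X <- A + H + rnorm(n, sd = 0.3)           # predictor
Y <- sin(2 * X) + H + rnorm(n, sd = 0.3)  # response

B <- bs(X, df = 10)  # basis spanning the values of X
C <- bs(A, df = 10)  # basis spanning the values of A

# OLS loss: MSE of the prediction residuals Y - B theta
ols.loss <- function(theta) mean((Y - B %*% theta)^2)

# TSLS loss: mean squared fitted values of the regression of the
# residuals onto the basis C (projection onto span(C))
tsls.loss <- function(theta) {
  r <- Y - B %*% theta
  mean((C %*% qr.solve(C, r))^2)
}

# crude smoothness penalty: squared second differences of the
# coefficients (a simplification of the quadratic penalty PEN)
pen <- function(theta) sum(diff(theta, differences = 2)^2)

objective <- function(theta, lambda.star, lambda.pen = 0.1) {
  ols.loss(theta) + lambda.star * tsls.loss(theta) + lambda.pen * pen(theta)
}

# minimize the combined objective for a fixed lambda.star
theta.hat <- optim(rep(0, ncol(B)), objective, lambda.star = 2,
                   method = "BFGS")$par
```

Setting lambda.star = 0 recovers a penalized OLS fit, while large values push the solution towards the TSLS (instrumental variable) fit, mirroring the trade-off described above.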

The parameter lambda.star is chosen as the largest positive value for which the corresponding solution yields prediction residuals that pass a test for vanishing product moment with all basis functions \(C_1(A), \dots, C_{df}(A)\).
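A minimal, self-contained sketch of such a moment test follows. The exact test statistic used by NILE may differ; the simulated residuals and the use of an F-test for the joint regression of the residuals on the basis functions are assumptions for illustration only.

```r
library(splines)
library(stats)

set.seed(1)
n <- 200
A <- runif(n, -1, 1)
r <- rnorm(n)            # stand-in for the prediction residuals of a
                         # candidate solution at some value of lambda.star
C <- bs(A, df = 5)       # basis functions C_1(A), ..., C_5(A)

# joint test for vanishing product moments of r with all C_j(A),
# here approximated by an F-test comparing the regression of r on C
# against the intercept-only model
p.value <- anova(lm(r ~ 1), lm(r ~ C))$`Pr(>F)`[2]
passes <- p.value >= 0.05  # residuals pass at level p.min = 0.05
```

Conceptually, lambda.star is then increased as long as the residuals of the corresponding solution still pass this test at level p.min.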

References

Christiansen R, Pfister N, Jakobsen ME, Gnecco N, Peters J (2020). “The difficult task of distribution generalization in nonlinear models.” Available from https://arxiv.org/abs/2006.07433.

Author

Rune Christiansen krunechristiansen@math.ku.dk

Examples

n.splines.true <- 4
fX <- function(x, extrap, beta){
  bx <- splines::ns(x,
                    knots = seq(from = extrap[1], to = extrap[2],
                                length.out = (n.splines.true + 1))[-c(1, n.splines.true + 1)],
                    Boundary.knots = extrap)
  bx %*% beta
}
# data generating model
n <- 200
set.seed(2)
beta0 <- runif(n.splines.true, -1, 1)
alphaA <- alphaEps <- alphaH <- 1/sqrt(3)
A <- runif(n, -1, 1)
H <- runif(n, -1, 1)
X <- alphaA * A + alphaH * H + alphaEps * runif(n, -1, 1)
Y <- fX(x = X, extrap = c(-.7, .7), beta = beta0) + .3 * H + .2 * runif(n, -1, 1)
x.new <- seq(-2, 2, length.out = 100)
f.new <- fX(x = x.new, extrap = c(-.7, .7), beta = beta0)
plot(X, Y, pch = 20)
lines(x.new,f.new,col="#0072B2",lwd=3)
## FIT!
fit <- NILE(Y,                    # response
            X,                    # predictors (so far, only 1-dim supported)
            A,                    # anchors (1- or 2-dim, although 2-dim is experimental so far)
            lambda.star = "test", # (0 = OLS, Inf = IV, (0, Inf) = nonlinear
                                  # anchor regression, "test" = NILE)
            test = "penalized",
            intercept = TRUE,
            df = 50,              # number of splines used for X -> Y
            p.min = 0.05,         # level at which test for lambda is performed
            x.new = x.new,        # values at which predictions are required
            plot = TRUE,          # diagnostic plots
            f.true = function(x) fX(x, c(-.7, .7), beta0), # if supplied, the true
                                  # causal function is added to the plot
            par.x = list(lambda = NULL,   # positive smoothness penalty for X -> Y;
                                          # if NULL, it is chosen by CV to minimize
                                          # the out-of-sample AR objective
                         breaks = NULL,   # if breaks are supplied, exactly these
                                          # will be used for the spline basis
                         num.breaks = 20, # will result in num.breaks + 2 splines,
                                          # ignored if breaks is supplied
                         n.order = 4      # order of splines
            ),
            par.a = list(lambda = NULL,   # positive smoothness penalty for the fit
                                          # of residuals onto A. If NULL, we first
                                          # compute the OLS fit of Y onto X, and
                                          # then choose lambdaA by CV to minimize
                                          # the out-of-sample MSE for predicting
                                          # the OLS residuals
                         breaks = NULL,   # same as above
                         num.breaks = 4,  # same as above
                         n.order = 4      # same as above
            ))
#> [1] "lambda.cv.a = 1.4293232458819"
#> [1] "lambda.cv.x = 0.00159332920036247"
#> [1] "lambda.star.p.uncorr = 3.5873327255249"