| Title: | Estimation under not Missing at Random Nonresponse |
|---|---|
| Description: | Methods to estimate finite-population parameters under nonresponse that is not missing at random (NMAR, nonignorable). Incorporates auxiliary information and user-specified response models, and supports independent samples and complex survey designs via objects from the 'survey' package. Provides diagnostics and optional variance estimates. For methodological background see Qin, Leung and Shao (2002) <doi:10.1198/016214502753479338> and Riddles, Kim and Im (2016) <doi:10.1093/jssam/smv047>. |
| Authors: | Maciej Beresewicz [aut, cre] (ORCID: <https://orcid.org/0000-0002-8281-4301>), Igor Kołodziej [aut, ctb], Mateusz Iwaniuk [aut, ctb] |
| Maintainer: | Maciej Beresewicz <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.2 |
| Built: | 2026-06-04 07:48:32 UTC |
| Source: | https://github.com/ncn-foreigners/nmar |
Returns missingness-model coefficients.
## S3 method for class 'nmar_result' coef(object, ...)## S3 method for class 'nmar_result' coef(object, ...)
object |
An 'nmar_result' object. |
... |
Ignored. |
A named numeric vector or 'NULL'.
Returns a coefficients table (Estimate, Std. Error, statistic, p-value) from a 'summary_nmar_result*' object when missingness-model coefficients and a variance matrix are available. If the summary does not carry missingness-model coefficients, returns 'NULL'.
## S3 method for class 'summary_nmar_result' coef(object, ...)## S3 method for class 'summary_nmar_result' coef(object, ...)
object |
An object of class 'summary_nmar_result' (or subclass). |
... |
Ignored. |
The statistic column is labelled "t value" when finite degrees of freedom are present in survey designs, otherwise it is labelled "z value".
A data.frame with rows named by coefficient, or 'NULL' if not available.
Wald confidence interval for NMAR results
## S3 method for class 'nmar_result' confint(object, parm, level = 0.95, ...)## S3 method for class 'nmar_result' confint(object, parm, level = 0.95, ...)
object |
An object of class 'nmar_result'. |
parm |
Ignored. |
level |
Confidence level. |
... |
Ignored. |
A 1x2 numeric matrix with confidence limits.
Returns Wald-style confidence intervals for missingness-model coefficients from a 'summary_nmar_result*' object. Uses t-quantiles when finite degrees of freedom are available, otherwise normal quantiles.
## S3 method for class 'summary_nmar_result' confint(object, parm, level = 0.95, ...)## S3 method for class 'summary_nmar_result' confint(object, parm, level = 0.95, ...)
object |
An object of class 'summary_nmar_result' (or subclass). |
parm |
A specification of which coefficients are to be given confidence intervals, either a vector of names or a vector of indices. By default, all coefficients are considered. |
level |
The confidence level required. |
... |
Ignored. |
A numeric matrix with columns giving lower and upper confidence limits for each parameter. Row names correspond to coefficient names. Returns 'NULL' if coefficients are unavailable.
Constructs an engine specification for the empirical likelihood estimator of a full-data mean under nonignorable nonresponse.
el_engine( standardize = TRUE, trim_cap = Inf, on_failure = c("return", "error"), variance_method = c("bootstrap", "none"), bootstrap_reps = 500, auxiliary_means = NULL, control = list(), strata_augmentation = TRUE, n_total = NULL, start = NULL, family = c("logit", "probit") )el_engine( standardize = TRUE, trim_cap = Inf, on_failure = c("return", "error"), variance_method = c("bootstrap", "none"), bootstrap_reps = 500, auxiliary_means = NULL, control = list(), strata_augmentation = TRUE, n_total = NULL, start = NULL, family = c("logit", "probit") )
standardize |
logical; standardize predictors. Default |
trim_cap |
numeric; cap for EL weights ( |
on_failure |
character; |
variance_method |
character; one of |
bootstrap_reps |
integer; number of bootstrap replicates when
|
auxiliary_means |
named numeric vector; population means for auxiliary
design columns. Names must match the materialized model.matrix column names
on the first RHS after formula expansion, e.g., factor indicator columns
created by |
control |
Optional solver configuration forwarded to
|
strata_augmentation |
logical; when |
n_total |
numeric; optional when supplying respondents-only data (no |
start |
list; optional starting point for the solver. Fields:
|
family |
Missingness model family. Either |
The implementation follows Qin, Leung, and Shao (2002): the response
mechanism is modeled as and
the joint distribution of is represented nonparametrically by respondent
masses that satisfy empirical likelihood constraints. The mean is estimated
as a respondent weighted mean with weights proportional to
, where are base
weights ( for IID data and for survey
designs) and is the EL denominator.
For data.frame inputs the estimator solves the Qin-Leung-Shao
estimating equations for with reparameterized
as , and profiles out the response multiplier
using the closed-form QLS identity (their Eq. 10). For
survey.design inputs the estimator uses a design-weighted analogue
(Chen and Sitter 1999, Wu 2005) with an explicit and an
additional linkage equation involving the nonrespondent design-weight total
.
Numerical stability:
is optimized on the logit scale so .
The response-model linear predictor is capped and EL denominators
are floored at a small positive value. The analytic Jacobian is
consistent with this guard via an active-set mask.
Optional trimming (trim_cap) is applied only after solving, to
the unnormalized masses . This changes the
returned weights and therefore the point estimate.
Formula syntax and data constraints: nmar() accepts a
partitioned right-hand side y_miss ~ auxiliaries | response_only. Variables
left of | enter auxiliary moment constraints. Variables right of |
enter only the response model. The outcome (LHS) is always included as a
response-model predictor through the evaluated LHS expression. Explicit use of
the outcome on the RHS is rejected. The response model always includes an
intercept, while the auxiliary block never includes an intercept.
To include a covariate in both the auxiliary constraints and the response
model, repeat it on both sides, e.g. y_miss ~ X | X.
Auxiliary means: If auxiliary_means = NULL (default) and the
outcome contains at least one NA, auxiliary means are estimated from the
full input and used as in the QLS constraints. For respondents-only
data (no NA in the outcome), n_total must be supplied, and if the
auxiliary RHS is non-empty, auxiliary_means must also be supplied.
When standardize = TRUE, supply auxiliary_means on the original
data scale, the engine applies the same standardization internally.
Survey scale: For survey.design inputs, n_total must
be on the same analysis scale as weights(design). The
default is sum(weights(design)).
Convergence and identification: the stacked EL system can have
multiple solutions. Adding response-only predictors (variables to the right
of |) can make the problem sensitive to starting values. Inspect
diagnostics such as jacobian_condition_number and consider supplying
start = list(beta = ..., W = ...) when needed.
Variance: The EL engine supports bootstrap standard errors via
variance_method = "bootstrap" or can skip variance with
variance_method = "none".
Bootstrap uses no additional packages for IID resampling, and will run
sequentially by default. If the suggested future.apply package is
installed, IID bootstrap can use future.apply::future_lapply() according to
the user's future::plan() for parallel execution.
Bootstrap backend is controlled by the package option nmar.bootstrap_apply:
"auto"(default) Use base::lapply() unless the current
future plan has more than one worker, in which case use
future.apply::future_lapply() if available.
"base"Always use base::lapply(), even
if future.apply is installed.
"future"Always use future.apply::future_lapply().
For survey.design inputs, replicate-weight bootstrap requires the
suggested packages survey and svrep.
A list of class "nmar_engine_el" containing configuration fields
to be supplied to nmar() together with a formula and data.
Qin, J., Leung, D., and Shao, J. (2002). Estimation with survey data under nonignorable nonresponse or informative sampling. Journal of the American Statistical Association, 97(457), 193-200. doi:10.1198/016214502753479338
Chen, J., and Sitter, R. R. (1999). A pseudo empirical likelihood approach for the effective use of auxiliary information in complex surveys. Statistica Sinica, 9, 385-406.
Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239-243.
nmar, weights.nmar_result, summary.nmar_result
set.seed(1) n <- 200 X <- rnorm(n) Y <- 2 + 0.5 * X + rnorm(n) p <- plogis(-0.7 + 0.4 * scale(Y)[, 1]) R <- runif(n) < p if (all(R)) R[1] <- FALSE df <- data.frame(Y_miss = Y, X = X) df$Y_miss[!R] <- NA_real_ # Estimate auxiliary mean from full data eng <- el_engine(auxiliary_means = NULL, variance_method = "none") # Put X in both the auxiliary block and the response model fit <- nmar(Y_miss ~ X | X, data = df, engine = eng) summary(fit) # Response-only predictors can be placed to the right of | Z <- rnorm(n) df2 <- data.frame(Y_miss = Y, X = X, Z = Z) df2$Y_miss[!R] <- NA_real_ eng2 <- el_engine(auxiliary_means = NULL, variance_method = "none") fit2 <- nmar(Y_miss ~ X | Z, data = df2, engine = eng2) print(fit2) # Survey design usage if (requireNamespace("survey", quietly = TRUE)) { des <- survey::svydesign(ids = ~1, weights = ~1, data = df) eng3 <- el_engine(auxiliary_means = NULL, variance_method = "none") fit3 <- nmar(Y_miss ~ X, data = des, engine = eng3) summary(fit3) }set.seed(1) n <- 200 X <- rnorm(n) Y <- 2 + 0.5 * X + rnorm(n) p <- plogis(-0.7 + 0.4 * scale(Y)[, 1]) R <- runif(n) < p if (all(R)) R[1] <- FALSE df <- data.frame(Y_miss = Y, X = X) df$Y_miss[!R] <- NA_real_ # Estimate auxiliary mean from full data eng <- el_engine(auxiliary_means = NULL, variance_method = "none") # Put X in both the auxiliary block and the response model fit <- nmar(Y_miss ~ X | X, data = df, engine = eng) summary(fit) # Response-only predictors can be placed to the right of | Z <- rnorm(n) df2 <- data.frame(Y_miss = Y, X = X, Z = Z) df2$Y_miss[!R] <- NA_real_ eng2 <- el_engine(auxiliary_means = NULL, variance_method = "none") fit2 <- nmar(Y_miss ~ X | Z, data = df2, engine = eng2) print(fit2) # Survey design usage if (requireNamespace("survey", quietly = TRUE)) { des <- survey::svydesign(ids = ~1, weights = ~1, data = df) eng3 <- el_engine(auxiliary_means = NULL, variance_method = "none") fit3 <- nmar(Y_miss ~ X, data = des, engine = eng3) summary(fit3) }
Extract engine configuration
engine_config(x)engine_config(x)
x |
An object inheriting from class 'nmar_engine'. |
A named list of configuration fields.
Returns identifier for an engine object.
engine_name(x)engine_name(x)
x |
An object inheriting from class 'nmar_engine'. |
A single character string, e.g. "empirical_likelihood".
Constructs a configuration for the exponential tilting estimator under
nonignorable nonresponse.
The estimator solves using nleqslv to apply EM algorithm.
exptilt_engine( standardize = FALSE, on_failure = c("return", "error"), variance_method = c("bootstrap", "none"), bootstrap_reps = 10, supress_warnings = FALSE, control = list(), family = c("logit", "probit"), y_dens = c("normal", "lognormal", "exponential", "binomial"), stopping_threshold = 1, sample_size = 2000 )exptilt_engine( standardize = FALSE, on_failure = c("return", "error"), variance_method = c("bootstrap", "none"), bootstrap_reps = 10, supress_warnings = FALSE, control = list(), family = c("logit", "probit"), y_dens = c("normal", "lognormal", "exponential", "binomial"), stopping_threshold = 1, sample_size = 2000 )
standardize |
logical; standardize predictors. Default |
on_failure |
character; |
variance_method |
character; one of |
bootstrap_reps |
integer; number of bootstrap replicates when
|
supress_warnings |
Logical; suppress variance-related warnings. |
control |
Named list of control parameters passed to
See |
family |
character; response model family, either |
y_dens |
Outcome density model ( |
stopping_threshold |
Numeric; early stopping threshold. If the maximum absolute value of the score function falls below this threshold, the algorithm stops early (default: 1). |
sample_size |
Integer; maximum sample size for stratified random sampling (default: 2000). When the dataset exceeds this size, a stratified random sample is drawn to optimize memory usage. The sampling preserves the ratio of respondents to non-respondents in the original data. |
The method is a robust Propensity-Score Adjustment (PSA) approach for Not Missing at Random (NMAR).
It uses Maximum Likelihood Estimation (MLE), basing the likelihood on the observed part of the sample (), making it robust against outcome model misspecification.
The propensity score is estimated by assuming an instrumental variable () that is independent of the response status given other covariates and the study variable.
Estimator calculates fractional imputation weights .
The final estimator is a weighted average, where the weights are the inverse of the estimated response probabilities , satisfying the estimating equation:
where is the set of observed respondents.
An engine object of class c("nmar_engine_exptilt","nmar_engine").
This is a configuration list; it is not a fit. Pass it to nmar.
Minsun Kim Riddles, Jae Kwang Kim, Jongho Im A Propensity-score-adjustment Method for Nonignorable Nonresponse Journal of Survey Statistics and Methodology, Volume 4, Issue 2, June 2016, Pages 215–245.
generate_test_data <- function( n_rows = 500, n_cols = 1, case = 1, x_var = 0.5, eps_var = 0.9, a = 0.8, b = -0.2 ) { # Generate X variables - fixed to match comparison X <- as.data.frame(replicate(n_cols, rnorm(n_rows, 0, sqrt(x_var)))) colnames(X) <- paste0("x", 1:n_cols) # Generate Y - fixed coefficients to match comparison eps <- rnorm(n_rows, 0, sqrt(eps_var)) if (case == 1) { # Use fixed coefficient of 1 for all x variables to match: y = -1 + x1 + epsilon X$Y <- as.vector(-1 + as.matrix(X) %*% rep(1, n_cols) + eps) } else if (case == 2) { X$Y <- -2 + 0.5 * exp(as.matrix(X) %*% rep(1, n_cols)) + eps } else if (case == 3) { X$Y <- -1 + sin(2 * as.matrix(X) %*% rep(1, n_cols)) + eps } else if (case == 4) { X$Y <- -1 + 0.4 * as.matrix(X)^3 %*% rep(1, n_cols) + eps } Y_original <- X$Y # Missingness mechanism - identical to comparison pi_obs <- 1 / (1 + exp(-(a + b * X$Y))) # Create missing values mask <- runif(nrow(X)) > pi_obs mask[1] <- FALSE # Ensure at least one observation is not missing X$Y[mask] <- NA return(list(X = X, Y_original = Y_original)) } res_test_data <- generate_test_data(n_rows = 500, n_cols = 1, case = 1) x <- res_test_data$X exptilt_config <- exptilt_engine( y_dens = 'normal', control = list(maxit = 10), stopping_threshold = 0.1, standardize = FALSE, family = 'logit', bootstrap_reps = 5 ) formula = Y ~ x1 res <- nmar(formula = formula, data = x, engine = exptilt_config, trace_level = 1) summary(res)generate_test_data <- function( n_rows = 500, n_cols = 1, case = 1, x_var = 0.5, eps_var = 0.9, a = 0.8, b = -0.2 ) { # Generate X variables - fixed to match comparison X <- as.data.frame(replicate(n_cols, rnorm(n_rows, 0, sqrt(x_var)))) colnames(X) <- paste0("x", 1:n_cols) # Generate Y - fixed coefficients to match comparison eps <- rnorm(n_rows, 0, sqrt(eps_var)) if (case == 1) { # Use fixed coefficient of 1 for all x variables to match: y = -1 + x1 + epsilon X$Y <- as.vector(-1 + as.matrix(X) %*% rep(1, n_cols) + eps) } else if (case == 2) { X$Y <- -2 + 0.5 * exp(as.matrix(X) %*% rep(1, n_cols)) + eps } else if (case == 3) { X$Y <- -1 + sin(2 * as.matrix(X) %*% rep(1, n_cols)) + eps } else if (case == 4) { X$Y <- -1 + 0.4 * as.matrix(X)^3 %*% rep(1, n_cols) + eps } Y_original <- X$Y # Missingness mechanism - identical to comparison pi_obs <- 1 / (1 + exp(-(a + b * X$Y))) # Create missing values mask <- runif(nrow(X)) > pi_obs mask[1] <- FALSE # Ensure at least one observation is not missing X$Y[mask] <- NA return(list(X = X, Y_original = Y_original)) } res_test_data <- generate_test_data(n_rows = 500, n_cols = 1, case = 1) x <- res_test_data$X exptilt_config <- exptilt_engine( y_dens = 'normal', control = list(maxit = 10), stopping_threshold = 0.1, standardize = FALSE, family = 'logit', bootstrap_reps = 5 ) formula = Y ~ x1 res <- nmar(formula = formula, data = x, engine = exptilt_config, trace_level = 1) summary(res)
Constructs a configuration for the nonparametric exponential tilting estimator
under nonignorable nonresponse.
This engine implements the "Fully Nonparametric Approach" from **Appendix 2**
of Riddles et al. (2016). The estimator uses an
Expectation-Maximization (EM) algorithm to directly estimate the
nonresponse odds for aggregated, categorical data.
exptilt_nonparam_engine(refusal_col = "", max_iter = 100, tol_value = 1e-06)exptilt_nonparam_engine(refusal_col = "", max_iter = 100, tol_value = 1e-06)
refusal_col |
character; the column name in |
max_iter |
integer; the maximum number of iterations for the EM algorithm. |
tol_value |
numeric; the convergence tolerance for the EM algorithm. The loop stops when the sum of absolute changes in the odds matrix is less than this value. |
This engine is designed for cases where all variables (outcomes $Y$,
response predictors $X_1$, and instrumental variables $X_2$) are categorical,
and the input data is pre-aggregated into strata.
The method assumes an instrumental variable is available. The
response probability is assumed to depend on and $Y$, but *not*
on .
The EM algorithm iteratively solves for the nonresponse odds:
where is the expected count of non-respondents
(calculated in the E-step) and is the observed count
of respondents for a given stratum $(x_1, y)$.
The final output from the nmar call is an object containing
data_to_return, an aggregated data frame where the original
'refusal' counts have been redistributed into the outcome columns
(e.g., 'Voted_A', 'Voted_B') as expected non-respondent counts.
An engine object of class c("nmar_engine_exptilt_nonparam","nmar_engine").
This is a configuration list; it is not a fit. Pass it to nmar.
Minsun Kim Riddles, Jae Kwang Kim, Jongho Im A Propensity-score-adjustment Method for Nonignorable Nonresponse Journal of Survey Statistics and Methodology, Volume 4, Issue 2, June 2016, Pages 215–245. (See **Appendix 2** for this specific method).
# Test data (Riddles 2016, Table 9) voting_data_example <- data.frame( Gender = rep(c("Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"), 1), Age_group = c("20-29", "30-39", "40-49", ">=50", "20-29", "30-39", "40-49", "50+"), Voted_A = c(93, 104, 146, 560, 106, 129, 170, 501), Voted_B = c(115, 233, 295, 350, 159, 242, 262, 218), Other = c(4, 8, 5, 3, 8, 5, 5, 7), Refusal = c(28, 82, 49, 174, 62, 70, 69, 211), Total = c(240, 427, 495, 1087, 335, 446, 506, 937) ) np_em_config <- exptilt_nonparam_engine( refusal_col = "Refusal", max_iter = 100, tol_value = 0.001 ) # Formula: Y1 + Y2 + ... ~ X1_vars | X2_vars # Here, Y = Voted_A, Voted_B, Other # x1 = Gender (response model) # x2 = Age_group (instrumental variable) em_formula <- Voted_A + Voted_B + Other ~ Gender | Age_group results_em_np <- nmar( formula = em_formula, data = voting_data_example, engine = np_em_config, trace_level = 0 ) # View the final adjusted counts # (Original counts + expected non-respondent counts) print(results_em_np$data_final)# Test data (Riddles 2016, Table 9) voting_data_example <- data.frame( Gender = rep(c("Male", "Male", "Male", "Male", "Female", "Female", "Female", "Female"), 1), Age_group = c("20-29", "30-39", "40-49", ">=50", "20-29", "30-39", "40-49", "50+"), Voted_A = c(93, 104, 146, 560, 106, 129, 170, 501), Voted_B = c(115, 233, 295, 350, 159, 242, 262, 218), Other = c(4, 8, 5, 3, 8, 5, 5, 7), Refusal = c(28, 82, 49, 174, 62, 70, 69, 211), Total = c(240, 427, 495, 1087, 335, 446, 506, 937) ) np_em_config <- exptilt_nonparam_engine( refusal_col = "Refusal", max_iter = 100, tol_value = 0.001 ) # Formula: Y1 + Y2 + ... ~ X1_vars | X2_vars # Here, Y = Voted_A, Voted_B, Other # x1 = Gender (response model) # x2 = Age_group (instrumental variable) em_formula <- Voted_A + Voted_B + Other ~ Gender | Age_group results_em_np <- nmar( formula = em_formula, data = voting_data_example, engine = np_em_config, trace_level = 0 ) # View the final adjusted counts # (Original counts + expected non-respondent counts) print(results_em_np$data_final)
Returns fitted response probabilities.
## S3 method for class 'nmar_result' fitted(object, ...)## S3 method for class 'nmar_result' fitted(object, ...)
object |
An 'nmar_result' object. |
... |
Ignored. |
A numeric vector (possibly length 0).
Returns a single concise line summarizing an engine configuration.
## S3 method for class 'nmar_engine' format(x, ...)## S3 method for class 'nmar_engine' format(x, ...)
x |
An engine object inheriting from 'nmar_engine'. |
... |
Unused. |
A length-1 character vector.
Returns the estimation formula if available.
## S3 method for class 'nmar_result' formula(x, ...)## S3 method for class 'nmar_result' formula(x, ...)
x |
An 'nmar_result' object. |
... |
Ignored. |
A formula or 'NULL'.
One-row diagnostics for NMAR fits.
## S3 method for class 'nmar_result' glance(x, ...)## S3 method for class 'nmar_result' glance(x, ...)
x |
An object of class 'nmar_result'. |
... |
Ignored. |
A one-row data frame with diagnostics and metadata.
Interface for NMAR estimation.
nmar() validates basic inputs and dispatches to an engine (for example
el_engine). The engine controls the estimation method and
interprets formula. See the engine documentation for model-specific
requirements.
nmar(formula, data, engine, trace_level = 0)nmar(formula, data, engine, trace_level = 0)
formula |
A two-sided formula. Engines support a partitioned
right-hand side via |
data |
A |
engine |
An NMAR engine configuration object created by
|
trace_level |
Integer 0-3; controls verbosity during estimation
(default
|
An object of class "nmar_result" with an engine-specific subclass
(for example "nmar_result_el"). Use summary(),
se, confint(), weights(), coef(),
fitted(), and generics::tidy() / generics::glance() to
access estimates, standard errors, weights, and diagnostics.
el_engine, exptilt_engine,
exptilt_nonparam_engine, summary.nmar_result,
weights.nmar_result
set.seed(1) n <- 200 x1 <- rnorm(n) z1 <- rnorm(n) y_true <- 0.5 + 0.3 * x1 + 0.2 * z1 + rnorm(n, sd = 0.3) resp <- rbinom(n, 1, plogis(2 + 0.1 * y_true + 0.1 * z1)) if (all(resp == 1)) resp[sample.int(n, 1)] <- 0L y_obs <- ifelse(resp == 1, y_true, NA_real_) # Empirical likelihood engine df_el <- data.frame(Y_miss = y_obs, X = x1, Z = z1) eng_el <- el_engine(variance_method = "none") fit_el <- nmar(Y_miss ~ X | Z, data = df_el, engine = eng_el) summary(fit_el) # Exponential tilting engine dat_et <- data.frame(y = y_obs, x2 = z1, x1 = x1) eng_et <- exptilt_engine( y_dens = "normal", family = "logit", variance_method = "none" ) fit_et <- nmar(y ~ x2 | x1, data = dat_et, engine = eng_et) summary(fit_et) # Survey design if (requireNamespace("survey", quietly = TRUE)) { w <- runif(n, 0.5, 2) des <- survey::svydesign(ids = ~1, weights = ~w, data = data.frame(Y_miss = y_obs, X = x1, Z = z1)) eng_svy <- el_engine(variance_method = "none") fit_svy <- nmar(Y_miss ~ X | Z, data = des, engine = eng_svy) summary(fit_svy) } # Bootstrap variance usage # future.apply is optional, if installed, bootstrap may run in parallel under # the user's future::plan() set.seed(2) eng_boot <- el_engine( variance_method = "bootstrap", bootstrap_reps = 20 ) fit_boot <- nmar(Y_miss ~ X | Z, data = df_el, engine = eng_boot) se(fit_boot)set.seed(1) n <- 200 x1 <- rnorm(n) z1 <- rnorm(n) y_true <- 0.5 + 0.3 * x1 + 0.2 * z1 + rnorm(n, sd = 0.3) resp <- rbinom(n, 1, plogis(2 + 0.1 * y_true + 0.1 * z1)) if (all(resp == 1)) resp[sample.int(n, 1)] <- 0L y_obs <- ifelse(resp == 1, y_true, NA_real_) # Empirical likelihood engine df_el <- data.frame(Y_miss = y_obs, X = x1, Z = z1) eng_el <- el_engine(variance_method = "none") fit_el <- nmar(Y_miss ~ X | Z, data = df_el, engine = eng_el) summary(fit_el) # Exponential tilting engine dat_et <- data.frame(y = y_obs, x2 = z1, x1 = x1) eng_et <- exptilt_engine( y_dens = "normal", family = "logit", variance_method = "none" ) fit_et <- nmar(y ~ x2 | x1, data = dat_et, engine = eng_et) summary(fit_et) # Survey design if (requireNamespace("survey", quietly = TRUE)) { w <- runif(n, 0.5, 2) des <- survey::svydesign(ids = ~1, weights = ~w, data = data.frame(Y_miss = y_obs, X = x1, Z = z1)) eng_svy <- el_engine(variance_method = "none") fit_svy <- nmar(Y_miss ~ X | Z, data = des, engine = eng_svy) summary(fit_svy) } # Bootstrap variance usage # future.apply is optional, if installed, bootstrap may run in parallel under # the user's future::plan() set.seed(2) eng_boot <- el_engine( variance_method = "bootstrap", bootstrap_reps = 20 ) fit_boot <- nmar(Y_miss ~ X | Z, data = df_el, engine = eng_boot) se(fit_boot)
This dataset is derived from the 'h05' dataset (Polish household budgets for 2005) found in the 'RClas' package. The original data was cleaned to remove all rows with missing values.
polish_householdspolish_households
A data frame with 19,330 rows and 17 columns. The key variables are:
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
TODO
Numeric. The **true** scaled expenditure ('expenditure / mean(expenditure)'). This is the complete study variable without missingness.
TODO
Integer. The simulated response indicator (1=responded, 0=nonresponse).
Numeric. The **observed** scaled expenditure, containing 7,778 'NA' values where 'R = 0'. This is the variable to be used as the NMAR-affected outcome.
To create a realistic test case for nonignorable nonresponse (NMAR), a nonresponse mechanism was simulated and applied to the scaled expenditure variable ('y_exp').
The key simulation steps were: 1. 'y_exp' (true study variable) was created by scaling total expenditure. 2. A true response probability ('resp') was created using the logistic model: 'plogis(1 - 0.6 * y_exp)'. 3. A response indicator ('R') was simulated based on this probability. 4. The final variable 'y_exp_miss' was generated by setting 'y_exp' to 'NA' wherever 'R' was 0.
The response is **nonignorable** because the probability of missingness depends directly on the value of the expenditure variable itself.
TODO
'riddles_case1', 'riddles_case2', 'riddles_case3', 'riddles_case4'
Compact summary for 'nmar_engine' objects.
## S3 method for class 'nmar_engine' print(x, ...)## S3 method for class 'nmar_engine' print(x, ...)
x |
An engine object inheriting from 'nmar_engine'. |
... |
Unused. |
'x', invisibly.
Print method for nmar_result
## S3 method for class 'nmar_result' print(x, ...)## S3 method for class 'nmar_result' print(x, ...)
x |
nmar_result object |
... |
Additional parameters |
'x', invisibly.
Print for objects of class nmar_result_el.
## S3 method for class 'nmar_result_el' print(x, ...)## S3 method for class 'nmar_result_el' print(x, ...)
x |
An object of class |
... |
Ignored. |
x, invisibly.
This print method is tailored for 'nmar_result_exptilt' objects and shows a concise, human-friendly summary of the estimation result together with exptilt-specific diagnostics (loss, iterations) and a compact view of the response coefficients stored in the fitted model.
## S3 method for class 'nmar_result_exptilt' print(x, ...)## S3 method for class 'nmar_result_exptilt' print(x, ...)
x |
An object of class 'nmar_result_exptilt'. |
... |
Ignored. |
'x', invisibly.
Print method for summary.nmar_result
## S3 method for class 'summary_nmar_result' print(x, ...)## S3 method for class 'summary_nmar_result' print(x, ...)
x |
summary_nmar_result object |
... |
Additional parameters |
'x', invisibly.
A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 1) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.
riddles_case1riddles_case1
A data frame with 500 rows and 4 variables:
Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
Numeric. The complete, true value of 'y' before missingness was introduced.
Integer. The response indicator (1 = responded, 0 = nonresponse).
This dataset was generated using the following model parameters (n = 500):
x ~ Normal(mean = 0, variance = 0.5)
e ~ Normal(mean = 0, variance = 0.9)
y_true = -1 + x + e
logit(pi) = 0.8 - 0.2 * y_true
Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.
A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 2) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.
riddles_case2riddles_case2
A data frame with 500 rows and 4 variables:
Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
Numeric. The complete, true value of 'y' before missingness was introduced.
Integer. The response indicator (1 = responded, 0 = nonresponse).
This dataset was generated using the following model parameters (n = 500):
x ~ Normal(mean = 0, variance = 0.5)
e ~ Normal(mean = 0, variance = 0.9)
y_true = -2 + 0.5 * exp(x) + e
logit(pi) = 0.8 - 0.2 * y_true
Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.
A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 3) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.
riddles_case3riddles_case3
A data frame with 500 rows and 4 variables:
Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
Numeric. The complete, true value of 'y' before missingness was introduced.
Integer. The response indicator (1 = responded, 0 = nonresponse).
This dataset was generated using the following model parameters (n = 500):
x ~ Normal(mean = 0, variance = 0.5)
e ~ Normal(mean = 0, variance = 0.9)
y_true = -1 + sin(2 * x) + e
logit(pi) = 0.8 - 0.2 * y_true
Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.
A simulated dataset of 500 observations based on Simulation Study I (Model 1, Case 4) of Riddles, Kim, and Im (2016). The data features a nonignorable nonresponse (NMAR) mechanism where the response probability depends on the study variable 'y'.
riddles_case4riddles_case4
A data frame with 500 rows and 4 variables:
Numeric. The auxiliary variable, x ~ Normal(0, 0.5).
Numeric. The study variable with nonignorable nonresponse. 'y' contains 'NA's for nonrespondents.
Numeric. The complete, true value of 'y' before missingness was introduced.
Integer. The response indicator (1 = responded, 0 = nonresponse).
This dataset was generated using the following model parameters (n = 500):
x ~ Normal(mean = 0, variance = 0.5)
e ~ Normal(mean = 0, variance = 0.9)
y_true = -1 + 0.4 * x^3 + e
logit(pi) = 0.8 - 0.2 * y_true
Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. Journal of Survey Statistics and Methodology, 4(1), 1-31.
Returns the standard error of the primary mean estimate.
se(object, ...)se(object, ...)
object |
An 'nmar_result' or subclass. |
... |
Ignored. |
Numeric scalar.
Summary method for nmar_result
## S3 method for class 'nmar_result' summary(object, conf.level = 0.95, ...)## S3 method for class 'nmar_result' summary(object, conf.level = 0.95, ...)
object |
nmar_result object |
conf.level |
Confidence level for intervals. |
... |
Additional parameters |
An object of class 'summary_nmar_result'.
Summarize estimation, standard error and missingness-model coefficients.
## S3 method for class 'nmar_result_el' summary(object, ...)## S3 method for class 'nmar_result_el' summary(object, ...)
object |
An object of class |
... |
Ignored. |
An object of class summary_nmar_result_el.
Summarize estimation, standard error and model coefficients.
## S3 method for class 'nmar_result_exptilt' summary(object, conf.level = 0.95, ...)## S3 method for class 'nmar_result_exptilt' summary(object, conf.level = 0.95, ...)
object |
An object of class 'nmar_result_exptilt'. |
conf.level |
Confidence level for confidence interval (default 0.95). |
... |
Ignored. |
An object of class 'summary_nmar_result_exptilt'.
Return a data frame with the primary estimate and missingness-model coefficients.
## S3 method for class 'nmar_result' tidy(x, conf.level = 0.95, ...)## S3 method for class 'nmar_result' tidy(x, conf.level = 0.95, ...)
x |
An object of class 'nmar_result'. |
conf.level |
Confidence level for the primary estimate. |
... |
Ignored. |
A data frame with one row for the primary estimate and, when available, additional rows for the response-model coefficients.
Variance-covariance for NMAR results
## S3 method for class 'nmar_result' vcov(object, ...)## S3 method for class 'nmar_result' vcov(object, ...)
object |
An object of class 'nmar_result'. |
... |
Ignored. |
A 1x1 numeric matrix (the variance of the primary estimate).
This dataset contains the aggregated exit poll results for the Gangdong-Gap district in Seoul from the 2012 nineteenth South Korean legislative election. The data is transcribed directly from Table 9 of Riddles, Kim, and Im (2016).
votingvoting
A data frame with 8 rows and 7 variables:
Factor. The gender of the voter ("Male", "Female").
Character. The age group of the voter.
Numeric. Count of respondents voting for Party A.
Numeric. Count of respondents voting for Party B.
Numeric. Count of respondents voting for another party.
Numeric. Count of sampled individuals who refused to respond (this is the nonresponse count).
Numeric. Total individuals sampled in the group (Responders + Refusals).
In the paper's application, 'Gender' is used as the nonresponse instrumental variable and 'Age_group' is the primary auxiliary variable .
Riddles, M. K., Kim, J. K., & Im, J. (2016). A Propensity-Score-Adjustment Method for Nonignorable Nonresponse. *Journal of Survey Statistics and Methodology*, 4(1), 1–31. (Data from Table 9, p. 20).
Return analysis weights stored in an 'nmar_result' as either probability-scale (summing to 1) or population-scale (summing to 'sample$n_total'). The function normalizes stored masses and attaches informative attributes.
## S3 method for class 'nmar_result' weights(object, scale = c("probability", "population"), ...)## S3 method for class 'nmar_result' weights(object, scale = c("probability", "population"), ...)
object |
An 'nmar_result' object. |
scale |
One of '"probability"' (default) or '"population"'. |
... |
Additional arguments (ignored). |
Numeric vector of weights with length equal to the number of respondents.