
nonprobsvy - Inference Based on Non-Probability Samples
Statistical inference with non-probability samples when auxiliary information from external sources such as probability samples or population totals or means is available. The package implements various methods such as inverse probability (propensity score) weighting, mass imputation and doubly robust approach. Details can be found in: Chen et al. (2020) <doi:10.1080/01621459.2019.1677241>, Yang et al. (2020) <doi:10.1111/rssb.12354>, Kim et al. (2021) <doi:10.1111/rssa.12696>, Yang et al. (2021) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2021001/article/00004-eng.htm> and Wu (2022) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022002/article/00002-eng.htm>. For details on the package and its functionalities see <doi:10.48550/arXiv.2504.04255>.
Last updated
inverse-probability-weightslasso-regressionnonprobability-samplingpropensity-scoressurveyopenblascpp
7.83 score 55 stars 1 dependents 91 scripts 315 downloadsblocking - Various Blocking Methods for Entity Resolution
The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
Last updated
annoyapproximate-nearest-neighbor-searchdeduplicationentity-resolutionhnswigraphrecord-linkage
6.98 score 14 stars 1 dependents 17 scripts 564 downloads
NMAR - Estimation under not Missing at Random Nonresponse
Methods to estimate finite-population parameters under nonresponse that is not missing at random (NMAR, nonignorable). Incorporates auxiliary information and user-specified response models, and supports independent samples and complex survey designs via objects from the 'survey' package. Provides diagnostics and optional variance estimates. For methodological background see Qin, Leung and Shao (2002) <doi:10.1198/016214502753479338> and Riddles, Kim and Im (2016) <doi:10.1093/jssam/smv047>.
Last updated
missing-datanon-ignorableselection-biassurvey
5.60 score 4 stars 8 scripts 156 downloadsautomatedRecLin - Record Linkage Based on an Entropy-Maximizing Classifier
The goal of 'automatedRecLin' is to perform record linkage (also known as entity resolution) in unsupervised or supervised settings. It compares pairs of records from two datasets using selected comparison functions to estimate the probability or density ratio between matched and non-matched records. Based on these estimates, it predicts a set of matches that maximizes entropy. For details see: Lee et al. (2022) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2022001/article/00007-eng.htm>, Vo et al. (2023) <https://ideas.repec.org/a/eee/csdana/v179y2023ics0167947322002365.html>, Sugiyama et al. (2008) <doi:10.1007/s10463-008-0197-x>.
Last updated
entity-resolutionrecord-linkage
5.32 score 2 stars 8 scripts 391 downloadsjointCalib - A Joint Calibration of Totals and Quantiles
A small package containing functions to perform a joint calibration of totals and quantiles. The calibration for totals is based on Deville and Särndal (1992) <doi:10.1080/01621459.1992.10475217>, the calibration for quantiles is based on Harms and Duchesne (2006) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X20060019255>. The package uses standard calibration via the 'survey', 'sampling' or 'laeken' packages. In addition, entropy balancing via the 'ebal' package and empirical likelihood based on codes from Wu (2005) <https://www150.statcan.gc.ca/n1/pub/12-001-x/2005002/article/9051-eng.pdf> can be used. See the paper by Beręsewicz and Szymkowiak (2023) for details <arXiv:2308.13281>. The package also includes functions to reweight the control group to the treatment reference distribution and to balance the covariate distribution using the covariate balancing propensity score via the 'CBPS' package for binary treatment observational studies.
Last updated
calibrationcausal-inferenceprobability-samplessamplingsurveysurvey-methodologyweighting
5.20 score 8 stars 8 scripts 224 downloads
