Package: blocking 1.0.3

Maciej Beręsewicz

blocking: Various Blocking Methods for Entity Resolution

The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
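
The description mentions shingles generated from character strings. As a rough illustration of that idea only, here is a minimal base R sketch that splits strings into overlapping character n-grams and compares two records by Jaccard similarity. The helpers `shingles()` and `jaccard()` and the example strings are hypothetical and are not part of the 'blocking' API; the package builds its representations with its own dependencies and searches them with ANN algorithms, which this sketch does not reproduce.

```r
# Hypothetical helper (not exported by 'blocking'): split a string into
# its unique overlapping character n-grams ("shingles").
shingles <- function(s, n = 2L) {
  s <- tolower(s)
  len <- nchar(s)
  if (len < n) return(s)                    # too short: keep the whole string
  starts <- seq_len(len - n + 1L)
  unique(substring(s, starts, starts + n - 1L))
}

# Hypothetical helper: Jaccard similarity between two shingle sets.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Made-up records: the second has two letters transposed.
rec_a <- shingles("montypython")
rec_b <- shingles("montypyhton")
rec_c <- shingles("johnsmith")

sim_ab <- jaccard(rec_a, rec_b)  # near-duplicates share most shingles
sim_ac <- jaccard(rec_a, rec_c)  # unrelated strings share few
```

Records with a typo keep most of their shingles in common, which is why shingle-based vectors are a reasonable input for nearest-neighbour search when forming blocks.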

Authors: Maciej Beręsewicz [aut, cre], Adam Struzik [aut, ctr]

blocking_1.0.3.tar.gz
blocking_1.0.3.zip (r-4.7), blocking_1.0.3.zip (r-4.6), blocking_1.0.3.zip (r-4.5)
blocking_1.0.3.tgz (r-4.6-any), blocking_1.0.3.tgz (r-4.5-any)
blocking_1.0.3.tar.gz (r-4.7-any), blocking_1.0.3.tar.gz (r-4.6-any)
blocking_1.0.3.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
blocking/json (API)
NEWS

# Install 'blocking' in R:
install.packages('blocking', repos = c('https://ncn-foreigners.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/ncn-foreigners/blocking/issues

Pkgdown/docs site: https://ncn-foreigners.ue.poznan.pl

Datasets:
  • census - Fictional census data
  • cis - Fictional customer data
  • foreigners - Fictional 2024 population of foreigners in Poland
  • RLdata500 - RLdata500 dataset from the RecordLinkage package

Topics: annoy, approximate-nearest-neighbor-search, deduplication, entity-resolution, hnsw, igraph, record-linkage

6.98 score, 14 stars, 1 package, 17 scripts, 564 downloads, 10 exports, 49 dependencies

Last updated from: f06c0737b9. Checks: 9 OK. Indexed: yes.

Target                 Result  Time
linux-devel-x86_64     OK      255
source / vignettes     OK      359
linux-release-x86_64   OK      216
macos-release-arm64    OK      190
macos-oldrel-arm64     OK      218
windows-devel          OK      160
windows-release        OK      177
windows-oldrel         OK      216
wasm-release           OK      146

Exports: blocking, control_annoy, control_hnsw, control_kd, control_lsh, control_nnd, controls_ann, controls_txt, est_block_error, pair_ann

Dependencies: BH, bit, bit64, cli, clipr, cpp11, crayon, data.table, digest, dqrng, float, glue, hms, igraph, lattice, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mlapi, mlpack, pillar, pkgconfig, prettyunits, progress, R6, Rcpp, RcppAnnoy, RcppArmadillo, RcppEnsmallen, RcppHNSW, readr, RhpcBLASctl, rlang, rnndescent, rsparse, sitmo, SnowballC, stringi, text2vec, tibble, tidyselect, tokenizers, tzdb, utf8, vctrs, vroom, withr

Blocking records for deduplication

Rendered from v1-deduplication.Rmd using knitr::rmarkdown on May 14 2026.

Last update: 2026-02-08
Started: 2023-11-05

Blocking records for record linkage

Rendered from v2-reclin.Rmd using knitr::rmarkdown on May 14 2026.

Last update: 2025-12-27
Started: 2023-11-05

Integration with existing packages

Rendered from v3-integration.Rmd using knitr::rmarkdown on May 14 2026.

Last update: 2025-12-27
Started: 2025-05-31