Package: blocking 1.0.3
blocking: Various Blocking Methods for Entity Resolution
The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via the 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and builds similarity vectors for record comparison. It also includes evaluation metrics for assessing blocking performance, including estimates of the false positive rate (FPR) and false negative rate (FNR). For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
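The base-R sketch below illustrates what shingling means. It splits a string into overlapping character n-grams (shingles) and compares two records by the Jaccard similarity of their shingle sets. The helpers `make_shingles()` and `jaccard()` are hypothetical and exist only for illustration. They are not part of the 'blocking' API, and the package's own text processing, configured through `controls_txt()`, may work differently.

```r
# Hypothetical helper (not part of 'blocking'): overlapping character n-grams.
make_shingles <- function(s, n = 2L) {
  # Normalise: lower-case and drop whitespace so "Jan Kowalski" -> "jankowalski"
  s <- tolower(gsub("[[:space:]]+", "", s))
  len <- nchar(s)
  if (len < n) return(s)  # too short to shingle: keep the whole string
  starts <- seq_len(len - n + 1L)
  # substring() is vectorised over start/stop positions
  unique(substring(s, starts, starts + n - 1L))
}

# Hypothetical helper: Jaccard similarity of two shingle sets.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

a <- make_shingles("Jan Kowalski")  # 10 bigrams: "ja" "an" "nk" "ko" ...
b <- make_shingles("Kowalski Jan")
print(jaccard(a, b))                # 9 shared of 11 distinct bigrams, about 0.818
```

Because shingles ignore word order, the two spellings score high even though they differ character by character at almost every position. This property makes shingle-based vectors a natural input for nearest-neighbour blocking.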
Authors:
Source: blocking_1.0.3.tar.gz
Windows: blocking_1.0.3.zip (R 4.7, R 4.6, R 4.5)
macOS: blocking_1.0.3.tgz (R 4.6, R 4.5)
Linux: blocking_1.0.3.tar.gz (R 4.7, R 4.6)
WebAssembly: blocking_1.0.3.tgz (R 4.6, emscripten)
Manual: manual.pdf | manual.html
Card: card.svg | card.png
API: blocking/json
NEWS
    # Install 'blocking' in R:
    install.packages('blocking', repos = c('https://ncn-foreigners.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/ncn-foreigners/blocking/issues
Pkgdown/docs site: https://ncn-foreigners.ue.poznan.pl
Datasets:
- census - Fictional census data
- cis - Fictional customer data
- foreigners - Fictional 2024 population of foreigners in Poland
- RLdata500 - RLdata500 dataset from the RecordLinkage package
Topics: annoy, approximate-nearest-neighbor-search, deduplication, entity-resolution, hnsw, igraph, record-linkage
Last updated from: f06c0737b9. Checks: 9 OK. Indexed: yes.
| Target | Result | Time |
|---|---|---|
| linux-devel-x86_64 | OK | 255 |
| source / vignettes | OK | 359 |
| linux-release-x86_64 | OK | 216 |
| macos-release-arm64 | OK | 190 |
| macos-oldrel-arm64 | OK | 218 |
| windows-devel | OK | 160 |
| windows-release | OK | 177 |
| windows-oldrel | OK | 216 |
| wasm-release | OK | 146 |
Exports: blocking, control_annoy, control_hnsw, control_kd, control_lsh, control_nnd, controls_ann, controls_txt, est_block_error, pair_ann
Dependencies: BH, bit, bit64, cli, clipr, cpp11, crayon, data.table, digest, dqrng, float, glue, hms, igraph, lattice, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mlapi, mlpack, pillar, pkgconfig, prettyunits, progress, R6, Rcpp, RcppAnnoy, RcppArmadillo, RcppEnsmallen, RcppHNSW, readr, RhpcBLASctl, rlang, rnndescent, rsparse, sitmo, SnowballC, stringi, text2vec, tibble, tidyselect, tokenizers, tzdb, utf8, vctrs, vroom, withr
Blocking records for deduplication
Rendered from v1-deduplication.Rmd using knitr::rmarkdown on May 14 2026. Last update: 2026-02-08
Started: 2023-11-05
Blocking records for record linkage
Rendered from v2-reclin.Rmd using knitr::rmarkdown on May 14 2026. Last update: 2025-12-27
Started: 2023-11-05
Integration with existing packages
Rendered from v3-integration.Rmd using knitr::rmarkdown on May 14 2026. Last update: 2025-12-27
Started: 2025-05-31
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| Block records based on character vectors | blocking |
| Fictional census data | census |
| Fictional customer data | cis |
| Controls for the Annoy algorithm | control_annoy |
| Controls for the HNSW algorithm | control_hnsw |
| Controls for the k-d tree algorithm | control_kd |
| Controls for the LSH algorithm | control_lsh |
| Controls for the NND algorithm | control_nnd |
| Controls for approximate nearest neighbours algorithms | controls_ann |
| Controls for processing character data | controls_txt |
| Estimate errors due to blocking in record linkage | est_block_error |
| Fictional 2024 population of foreigners in Poland | foreigners |
| Integration with the reclin2 package | pair_ann |
| RLdata500 dataset from the RecordLinkage package | RLdata500 |
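`est_block_error` estimates the errors introduced by blocking. When the true entity of every record is known, for example on a labelled test set, one common pair-level definition of the two rates can be computed directly. The minimal base-R sketch below does this for deduplication. The helper `blocking_error_rates()` is hypothetical and only illustrates the definitions. It is not the estimation method used by `est_block_error`.

```r
# Hypothetical helper (not part of 'blocking'): pair-level blocking error rates
# for deduplication, given the true entity id and the assigned block of each record.
blocking_error_rates <- function(entity, block) {
  stopifnot(length(entity) == length(block))
  pairs <- combn(seq_along(entity), 2)  # all unordered record pairs (quadratic: toy data only)
  same_entity <- entity[pairs[1, ]] == entity[pairs[2, ]]
  same_block  <- block[pairs[1, ]]  == block[pairs[2, ]]
  c(
    # FNR: share of true matching pairs that blocking puts into different blocks
    FNR = sum(same_entity & !same_block) / sum(same_entity),
    # FPR: share of true non-matching pairs that blocking keeps in the same block
    FPR = sum(!same_entity & same_block) / sum(!same_entity)
  )
}

entity <- c(1, 1, 2, 2, 3)
block  <- c("A", "A", "A", "B", "B")
print(blocking_error_rates(entity, block))  # FNR = 0.5, FPR = 0.375
```

Only pairs that share a block are passed on to detailed comparison. A high FNR therefore means matches are lost before comparison starts, while a high FPR means comparisons are wasted on non-matches.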
