Package: blocking 1.0.3
blocking: Various Blocking Methods for Entity Resolution
The goal of 'blocking' is to provide blocking methods for record linkage and deduplication using approximate nearest neighbour (ANN) algorithms and graph techniques. It supports multiple ANN implementations via 'rnndescent', 'RcppHNSW', 'RcppAnnoy', and 'mlpack' packages, and provides integration with the 'reclin2' package. The package generates shingles from character strings and similarity vectors for record comparison, and includes evaluation metrics for assessing blocking performance including false positive rate (FPR) and false negative rate (FNR) estimates. For details see: Papadakis et al. (2020) <doi:10.1145/3377455>, Steorts et al. (2014) <doi:10.1007/978-3-319-11257-2_20>, Dasylva and Goussanou (2021) <https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X202100200002>, Dasylva and Goussanou (2022) <doi:10.1007/s42081-022-00153-3>.
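The pipeline described above (shingles, nearest-neighbour search, graph-based blocks) can be sketched in base R without the package. This is a minimal illustration, not the package's implementation. The toy records, the helper `shingles()`, the shingle length `n = 2`, and the use of cosine similarity are assumptions made for the example. It uses an exact neighbour search where the package would use an ANN index, and a simple label-propagation loop in place of a graph library.

```r
# Split a string into overlapping character n-grams (shingles).
# Hypothetical helper for illustration; not a function of the package.
shingles <- function(s, n = 2L) {
  s <- tolower(s)
  if (nchar(s) < n) return(s)
  starts <- seq_len(nchar(s) - n + 1L)
  unique(substring(s, starts, starts + n - 1L))
}

# Hypothetical toy records: two spellings each of two different entities.
records <- c("jankowalski", "kowalskijan", "montypython", "pythonmonty")

# Build a binary record-by-shingle matrix.
sh <- lapply(records, shingles)
vocab <- sort(unique(unlist(sh)))
m <- t(vapply(sh, function(x) as.numeric(vocab %in% x), numeric(length(vocab))))
dimnames(m) <- list(records, vocab)

# Cosine similarity between all pairs of records.
norms <- sqrt(rowSums(m^2))
sim <- (m %*% t(m)) / (norms %o% norms)
sim_no_self <- sim
diag(sim_no_self) <- -Inf  # exclude self-matches

# Exact nearest neighbour of each record; an ANN index approximates this step.
nn <- unname(apply(sim_no_self, 1, which.max))

# Treat record -> neighbour links as graph edges and label connected
# components by propagating the smallest label along each edge.
block <- seq_along(records)
for (pass in seq_along(records)) {
  for (i in seq_along(records)) {
    lab <- min(block[i], block[nn[i]])
    block[i] <- lab
    block[nn[i]] <- lab
  }
}

data.frame(record = records, neighbour = records[nn], block = block)
```

Records that end up with the same `block` value would be compared in detail at the linkage or deduplication stage; pairs in different blocks are never compared, which is where the saving comes from.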
blocking_1.0.3.tar.gz
blocking_1.0.3.zip (r-4.7), blocking_1.0.3.zip (r-4.6), blocking_1.0.3.zip (r-4.5)
blocking_1.0.3.tgz (r-4.6-any), blocking_1.0.3.tgz (r-4.5-any)
blocking_1.0.3.tar.gz (r-4.7-any), blocking_1.0.3.tar.gz (r-4.6-any)
blocking_1.0.3.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
blocking/json (API)
# Install 'blocking' in R:
install.packages('blocking', repos = c('https://ncn-foreigners.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/ncn-foreigners/blocking/issues
Pkgdown/docs site: https://ncn-foreigners.ue.poznan.pl
Datasets:
- census - Fictional census data
- cis - Fictional customer data
- foreigners - Fictional 2024 population of foreigners in Poland
- RLdata500 - RLdata500 dataset from the RecordLinkage package
annoy, approximate-nearest-neighbor-search, deduplication, entity-resolution, hnsw, igraph, record-linkage
Last updated from: 6cfda9bab5. Checks: 8 OK, 1 ERROR. Indexed: yes.
| Target | Result | Time | Files | Syslog |
|---|---|---|---|---|
| linux-devel-x86_64 | OK | 212 | | |
| source / vignettes | OK | 349 | | |
| linux-release-x86_64 | OK | 200 | | |
| macos-release-arm64 | OK | 114 | | |
| macos-oldrel-arm64 | ERROR | 150 | | |
| windows-devel | OK | 150 | | |
| windows-release | OK | 156 | | |
| windows-oldrel | OK | 144 | | |
| wasm-release | OK | 131 | | |
Exports: blocking, control_annoy, control_hnsw, control_kd, control_lsh, control_nnd, controls_ann, controls_txt, est_block_error, pair_ann
Dependencies: BH, bit, bit64, cli, clipr, cpp11, crayon, data.table, digest, dqrng, float, glue, hms, igraph, lattice, lgr, lifecycle, magrittr, Matrix, MatrixExtra, mlapi, mlpack, pillar, pkgconfig, prettyunits, progress, R6, Rcpp, RcppAnnoy, RcppArmadillo, RcppEnsmallen, RcppHNSW, readr, RhpcBLASctl, rlang, rnndescent, rsparse, sitmo, SnowballC, stringi, text2vec, tibble, tidyselect, tokenizers, tzdb, utf8, vctrs, vroom, withr
Last update: 2026-06-14
Started: 2023-11-05
Last update: 2026-06-14
Started: 2025-05-31
Last update: 2025-12-27
Started: 2023-11-05
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| Block records based on character vectors | blocking |
| Fictional census data | census |
| Fictional customer data | cis |
| Controls for the Annoy algorithm | control_annoy |
| Controls for the HNSW algorithm | control_hnsw |
| Controls for the k-d tree algorithm | control_kd |
| Controls for the LSH algorithm | control_lsh |
| Controls for the NND algorithm | control_nnd |
| Controls for approximate nearest neighbours algorithms | controls_ann |
| Controls for processing character data | controls_txt |
| Estimate errors due to blocking in record linkage | est_block_error |
| Fictional 2024 population of foreigners in Poland | foreigners |
| Integration with the reclin2 package | pair_ann |
| RLdata500 dataset from the RecordLinkage package | RLdata500 |
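
To make the two blocking error rates concrete, the sketch below computes a pair-level false negative rate (true matches split across blocks) and false positive rate (non-matches placed in the same block) in base R. It is not the estimator behind `est_block_error`. It assumes the true entity identifiers are known, and the function name `block_errors()` and the toy labels are hypothetical.

```r
# Pair-level error rates of a blocking result, assuming true entity ids are known.
# Hypothetical helper for illustration; not a function of the package.
block_errors <- function(true_id, block_id) {
  stopifnot(length(true_id) == length(block_id))
  # Enumerate every unordered pair of records (one pair per column).
  pairs <- utils::combn(length(true_id), 2L)
  same_entity <- true_id[pairs[1, ]] == true_id[pairs[2, ]]
  same_block  <- block_id[pairs[1, ]] == block_id[pairs[2, ]]

  tp <- sum(same_entity & same_block)    # matches kept together
  fn <- sum(same_entity & !same_block)   # matches lost by blocking
  fp <- sum(!same_entity & same_block)   # non-matches still to be compared
  tn <- sum(!same_entity & !same_block)  # non-matches correctly separated

  c(fnr = fn / (fn + tp), fpr = fp / (fp + tn))
}

# Hypothetical labels for six records: three entities, three blocks.
true_id  <- c(1, 1, 2, 2, 3, 3)
block_id <- c(1, 1, 1, 2, 3, 3)
res <- block_errors(true_id, block_id)
res
```

A low false negative rate matters most for linkage quality, because a match lost at the blocking stage cannot be recovered later; the false positive rate mainly drives the number of comparisons left to perform.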
