bibnets constructs bibliometric networks from scholarly
metadata. It imports the export formats of the major bibliographic
databases, converts them internally to a common tabular representation,
and projects that representation into networks through a single function
per network type. The package covers the standard constructions and adds
several that are less commonly available: position-based attention
weighting, aggregation of entities into higher-level networks, a range
of counting and similarity weights, and temporal construction over time
windows.
bibnets reads Scopus, Web of Science, OpenAlex,
Lens.org, Dimensions, Crossref, BibTeX, and RIS exports.
read_biblio() detects the format from file content and
dispatches to the corresponding reader; all readers return an identical
schema, so records from different databases can be combined without
manual reconciliation. Multi-valued fields — authors, references, and
keywords — are parsed into list-columns. A data frame already in this
form is used directly, by naming the relevant column, without a
reader.
Dedicated builders construct co-authorship, co-citation, bibliographic coupling, keyword co-occurrence, direct citation, and historiograph networks; a generic builder covers other projections. The builders share one interface and return the same edge list, so the network type is determined by the function name.
Counting methods determine each publication’s contribution to an edge. They range from full and fractional counting to position-aware schemes (harmonic, geometric, golden-ratio, first, last, and first–last). Five similarity measures (cosine, association strength, Jaccard, inclusion, and equivalence) rescale the projected weights.
Attention weighting assigns each author a positional weight, with the weights summing to one across the byline, so a publication’s credit is distributed by byline position rather than equally and a large author list does not dominate the network. Aggregation pools the references or members of a group to construct collaboration and coupling networks among countries, institutions, or sources rather than individuals. Temporal construction applies any builder over fixed, sliding, or cumulative windows, and a disparity-filter backbone retains the edges that are significant relative to each node’s strength.
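To make position-based credit concrete, the sketch below computes harmonic and geometric byline weights from their usual textbook definitions. The helpers harmonic_weights() and geometric_weights() are hypothetical names introduced here for illustration, not bibnets functions, and the package's own schemes may differ in detail.

```r
# Hypothetical helpers (not part of bibnets): positional credit for a
# byline of n authors, normalised so the weights sum to one.

# Harmonic: the author in position i gets credit proportional to 1 / i.
harmonic_weights <- function(n) {
  raw <- 1 / seq_len(n)
  raw / sum(raw)
}

# Geometric: each position gets half the credit of the position before it.
geometric_weights <- function(n) {
  raw <- 2^(n - seq_len(n))
  raw / sum(raw)
}

# Credit for a four-author paper under each scheme.
round(harmonic_weights(4), 3)
round(geometric_weights(4), 3)
```

Because each vector sums to one, a paper contributes the same total credit whether it has three authors or thirty.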
The incidence matrix is stored as a sparse dgCMatrix and
projected with crossprod() or tcrossprod();
edges are extracted without forming a dense node-by-node matrix, so
memory scales with the number of non-zero co-occurrences rather than
with the square of the vocabulary. The package imports only Matrix,
stats, and utils.
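The projection step can be sketched with the Matrix package alone. This is a minimal illustration of the idea on a toy corpus, not the package's internal code; the object names (papers, B, P, edges) are chosen here for the example.

```r
library(Matrix)

# Toy corpus: one element per paper, holding that paper's authors.
papers <- list(
  c("SMITH J", "DOE A", "LEE K"),
  c("SMITH J", "LEE K"),
  c("DOE A", "LEE K"),
  c("SMITH J", "DOE A")
)
authors <- sort(unique(unlist(papers)))

# Sparse paper-by-author incidence matrix (rows = papers, cols = authors).
i <- rep(seq_along(papers), lengths(papers))
j <- match(unlist(papers), authors)
B <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                  dims = c(length(papers), length(authors)),
                  dimnames = list(NULL, authors))

# Author-by-author co-occurrence counts; the product stays sparse.
P <- crossprod(B)

# Keep the strict upper triangle (each undirected pair once, no diagonal)
# and read off the non-zero entries as (i, j, x) triplets.
trip <- summary(triu(P, k = 1))
edges <- data.frame(from  = authors[trip$i],
                    to    = authors[trip$j],
                    count = trip$x)
edges
```

Only the non-zero pairs are ever materialised, which is why the cost follows the number of observed co-occurrences rather than the number of possible pairs.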
Constructed networks are exported to igraph, tidygraph, cograph, Gephi, GraphML, and sparse-matrix representations.
Every builder returns a
bibnets_network: a tidy data frame with
four columns:

- from, to: the two endpoints of an edge,
- count: the raw binary co-occurrence count for that pair,
- weight: the analytical weight after counting and optional similarity normalization.

With counting = "full" and similarity = "none", weight equals
count. They diverge once fractional counting or a
similarity measure is applied.
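The difference between count and weight can be seen in a small base-R sketch. It assumes one common fractional-counting convention, in which a paper with n authors contributes 1 / (n - 1) to each of its author pairs; bibnets may define fractional counting differently, so treat the numbers as illustrative only.

```r
# Toy corpus: one element per paper, holding that paper's authors.
papers <- list(
  c("SMITH J", "DOE A", "LEE K"),
  c("SMITH J", "LEE K"),
  c("DOE A", "LEE K"),
  c("SMITH J", "DOE A")
)

# One row per (paper, author pair). `count` is the full-counting
# contribution; `weight` is the assumed fractional contribution 1 / (n - 1).
pair_rows <- lapply(papers, function(a) {
  a <- sort(unique(a))
  if (length(a) < 2) return(NULL)
  pairs <- t(combn(a, 2))
  data.frame(from = pairs[, 1], to = pairs[, 2],
             count = 1, weight = 1 / (length(a) - 1),
             stringsAsFactors = FALSE)
})

# Sum the contributions of all papers for each pair.
edges <- aggregate(cbind(count, weight) ~ from + to,
                   data = do.call(rbind, pair_rows), FUN = sum)
edges
```

Every pair co-occurs on two papers, so count is 2 throughout, while the three-author paper contributes only half a unit to weight.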
The builders, at a glance:
| Function | Nodes | An edge means |
|---|---|---|
| author_network() | authors | co-authorship, author coupling, or co-citation |
| reference_network() | cited references | two references cited together |
| document_network() | documents | shared references, shared citers, or direct citation |
| keyword_network() | keywords | two keywords appear together |
| source_network() | journals | sources share references or are co-cited |
| country_network() | countries | countries collaborate or share references |
| institution_network() | institutions | institutions collaborate or share references |
| conetwork() | any field | entities co-occur, or share values of another field |
| local_citations() | documents | within-corpus citation counts |
| historiograph() | documents | directed citation history among top-cited papers |
| temporal_network() | any builder’s nodes | the same network over time windows |
You do not need a special reader. Any data frame with one row per paper works — point a builder at the column that holds the entity and tell it the delimiter:
papers <- data.frame(
`Author Names` = c("Smith J, Doe A, Lee K", "Smith J, Lee K",
"Doe A, Lee K", "Smith J, Doe A"),
check.names = FALSE
)
author_network(papers, authors = "Author Names", sep = ",")
#> # bibnets network: author_collaboration | 3 nodes · 3 edges | counting: full
#> from to weight count
#> 1 DOE A LEE K 2 2
#> 2 DOE A SMITH J 2 2
#> 3 LEE K SMITH J 2 2

If your data is a scholarly export instead, read it first (the format is detected from the file content), then build with the defaults.
Either way the result is the same four-column edge list, ready to inspect, prune, or export.
read_biblio() accepts a file, a folder, or several
files, and detects Scopus, Web of Science, OpenAlex, BibTeX, RIS,
Lens.org, and Dimensions from the content:
data <- read_biblio("export.csv")
data <- read_biblio("folder_with_exports/")
data <- read_biblio(c("part_1.csv", "part_2.csv"))

The format-specific readers can also be called directly
(read_scopus(), read_wos(),
read_openalex_csv(), read_dimensions(),
read_lens(), read_bibtex(),
read_ris()).
For a CSV that matches no known export, map each source column onto a
standard field by name — authors,
keywords, references, countries,
affiliations, or journal. Naming any of them
reads the file as a generic CSV, so you do not pass format
yourself:
data <- read_biblio(
"custom.csv",
id = "paper_id",
authors = "Author Names",
keywords = "Tags",
sep = ","
)

Each mapped column is split on sep into the standard
list-column, so afterwards every builder works with its defaults.
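What splitting a delimited column into a list-column involves can be sketched in base R. The helper split_field() below is hypothetical and written for this example; it is not the package's parser, and the upper-casing step simply mirrors the label normalisation the builders are described as applying.

```r
# Hypothetical helper (not a bibnets function): turn a delimited character
# column into a list-column with one character vector per row.
split_field <- function(x, sep = ",", strip_quotes = TRUE) {
  lapply(strsplit(x, sep, fixed = TRUE), function(parts) {
    parts <- trimws(parts)
    if (strip_quotes) {
      # Remove quotes that surround an entry, keeping inner apostrophes.
      parts <- trimws(gsub("^[\"']+|[\"']+$", "", parts))
    }
    parts <- toupper(parts)
    parts[nzchar(parts)]  # drop empty entries left by stray delimiters
  })
}

raw <- c("Smith J, Doe A, Lee K", "\"Smith J\", 'Lee K'", "")
authors <- split_field(raw)
str(authors)
```

An empty cell becomes a zero-length vector, so a row without authors stays in the data without creating a node.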
As the quick start showed, you can skip the reader entirely and let the builder split a column for you. The same column arguments are available on every builder:
author_network(my_df, authors = "Author Names", sep = ",")
keyword_network(my_df, keywords = "Tags", sep = ",")

The work identifier is the id column. You need not
supply one: when no id column is present each row is
treated as one document; pass id = "paper_id" to use a
differently-named column. Surrounding quotes are stripped by default
(strip_quotes = TRUE), and in a coupling network the
references column takes its own references_sep. The
companion vignette("reading-data") covers every reader and
these options in full.
Readers return a common set of columns:
data(scopus_quantum_cloud)
sc <- scopus_quantum_cloud
names(sc)[1:12]
#> [1] "id" "title" "year" "journal"
#> [5] "doi" "cited_by_count" "abstract" "type"
#> [9] "authors" "references" "keywords" "affiliations"

The columns that matter for network construction are id,
the list-columns authors / references /
keywords, and year (used by
temporal_network()). Source-specific extras such as
countries, affiliations, and
keywords_plus are kept when available.
Two references are linked when a paper cites both:
refs <- reference_network(sc, min_occur = 2)
head(refs, 5)
#> # bibnets network: reference_co_citation | 7 nodes · 5 edges | counting: full
#> from to weight count
#> 1 HE K., ZHANG X., REN S., SUN … SIMONYAN K., ZISSERMAN A., VE… 10 10
#> 2 HE K., ZHANG X., REN S., SUN … SANDLER M., HOWARD A., ZHU M.… 8 8
#> 3 HAN S., MAO H., DALLY W.J., D… SIMONYAN K., ZISSERMAN A., VE… 8 8
#> 4 HE K., ZHANG X., REN S., SUN … SIMONYAN K., ZISSERMAN A., VE… 8 8
#> 5 KRIZHEVSKY A., HINTON G., LEA… SIMONYAN K., ZISSERMAN A., VE… 7 7

A similarity measure offsets the advantage of very frequently cited works:
head(reference_network(sc, min_occur = 2, similarity = "cosine"), 3)
#> # bibnets network: reference_co_citation | 6 nodes · 3 edges | counting: full | similarity: cosine
#> from to weight count
#> 1 ANDRI R., CAVIGELLI L., ROSSI… CAPOTONDI A., RUSCI M., FARIS… 1 2
#> 2 BAI Y., ZENG B., LI C., ZHANG… CASTELLI M., CLEMENTE F.M., P… 1 2
#> 3 CHEN K., ET AL., A DNN OPTIMI… CHEN Y., ET AL., SAMBA: SINGL… 1 2

Coupling links two documents that share cited references:
head(document_network(sc, type = "coupling", similarity = "cosine"), 5)
#> # bibnets network: document_coupling | 10 nodes · 5 edges | counting: full | similarity: cosine
#> from to weight count
#> 1 2-s2.0-85169545148 2-s2.0-85150169631 0.4671 12
#> 2 2-s2.0-85203687776 2-s2.0-85200587918 0.3872 10
#> 3 2-s2.0-85131677679 2-s2.0-85172072697 0.3424 7
#> 4 2-s2.0-85187392673 2-s2.0-85124224751 0.269 11
#> 5 2-s2.0-85161914543 2-s2.0-85100337829 0.2443 13

Direct citation is directed, with from citing to, and is restricted to the corpus: the cited work must also be a row in the data.

Keyword co-occurrence links two keywords that appear on the same paper:
kw <- keyword_network(sc, min_occur = 2)
head(kw, 5)
#> # bibnets network: keyword_co_occurrence | 5 nodes · 5 edges | counting: full
#> from to weight count
#> 1 EDGE COMPUTING QUANTIZATION 16 16
#> 2 DEEP LEARNING QUANTIZATION 14 14
#> 3 DEEP LEARNING EDGE COMPUTING 13 13
#> 4 DEEP LEARNING FPGA 10 10
#> 5 PRUNING QUANTIZATION 10 10

Labels are trimmed and upper-cased during construction, so
machine learning, Machine Learning, and
MACHINE LEARNING are one node. Association strength is a
common choice for co-occurrence maps:
head(keyword_network(sc, min_occur = 2, similarity = "association"), 3)
#> # bibnets network: keyword_co_occurrence | 6 nodes · 3 edges | counting: full | similarity: association
#> from to weight count
#> 1 AIR QUALITY PREDICTION POST-TRAINING QUANTISATION 0.5 2
#> 2 LFSR SEED QUANTIZATION (SIGNAL) 0.5 2
#> 3 BOOTH MULTIPLIERS SHIFT MULTIPLIERS 0.5 2

head(country_network(oa, counting = "fractional"), 5)
#> # bibnets network: country_collaboration | 8 nodes · 5 edges | counting: fractional
#> from to weight count
#> 1 BR CL 9.7 11
#> 2 CA US 9.5 13
#> 3 AU US 8.967 15
#> 4 DE NL 8.311 10
#> 5 CN US 8.2 11
head(institution_network(oa, counting = "fractional", min_occur = 2), 5)
#> # bibnets network: institution_collaboration | 10 nodes · 5 edges | counting: fractional
#> from to weight count
#> 1 FINLAND UNIVERSITY UNIVERSITY OF EASTERN FINLAND 5.778 13
#> 2 MAASTRICHT SCHOOL OF MANAGEME… MAASTRICHT UNIVERSITY 4.833 6
#> 3 ESCUELA SUPERIOR POLITECNICA … MONASH UNIVERSITY 4.667 6
#> 4 UNIVERSIDADE FEDERAL DE SANTA… UNIVERSITY OF VALPARAÍSO 4.417 10
#> 5 KUMAMOTO UNIVERSITY KYUSHU UNIVERSITY 4 4
head(source_network(sc, type = "coupling", min_occur = 2), 5)
#> # bibnets network: source_coupling | 5 nodes · 5 edges | counting: full
#> from to weight count
#> 1 IEEE TRANSACTIONS ON CIRCUITS… IEEE TRANSACTIONS ON COMPUTER… 48 48
#> 2 IEEE TRANSACTIONS ON CIRCUITS… PROCEEDINGS OF THE IEEE 40 40
#> 3 IEEE TRANSACTIONS ON CIRCUITS… IEEE TRANSACTIONS ON VERY LAR… 39 39
#> 4 IEEE JOURNAL OF SOLID-STATE C… IEEE TRANSACTIONS ON CIRCUITS… 31 31
#> 5 IEEE TRANSACTIONS ON COMPUTER… IEEE TRANSACTIONS ON VERY LAR… 29 29

For coupling networks, min_occur is applied to the
aggregated entity before the network is built.
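Aggregated coupling can be sketched as pooling each group's references and then counting the references two groups share. The example below uses the Matrix package and assumes the pooled references are treated as a set, so a reference cited by several papers of the same source counts once; that choice, like the object names, is an assumption of this sketch rather than a statement about bibnets.

```r
library(Matrix)

# Toy corpus: four documents, their sources, and their cited references.
docs <- data.frame(id     = c("d1", "d2", "d3", "d4"),
                   source = c("J1", "J1", "J2", "J3"),
                   stringsAsFactors = FALSE)
refs <- list(c("r1", "r2"), c("r2", "r3"), c("r2", "r3", "r4"), "r5")

# Pool references by source; unique() keeps each (source, reference) once.
pairs <- unique(data.frame(source = rep(docs$source, lengths(refs)),
                           ref    = unlist(refs),
                           stringsAsFactors = FALSE))
src_ids <- sort(unique(pairs$source))
ref_ids <- sort(unique(pairs$ref))

# Sparse source-by-reference incidence matrix.
M <- sparseMatrix(i = match(pairs$source, src_ids),
                  j = match(pairs$ref, ref_ids),
                  x = rep(1, nrow(pairs)),
                  dims = c(length(src_ids), length(ref_ids)),
                  dimnames = list(src_ids, ref_ids))

# Source-by-source coupling: the number of pooled references in common.
P <- tcrossprod(M)
trip <- summary(triu(P, k = 1))
trip <- trip[trip$x > 0, ]
edges <- data.frame(from  = src_ids[trip$i],
                    to    = src_ids[trip$j],
                    count = trip$x)
edges
```

Here J1 and J2 share r2 and r3, so they are coupled with a count of 2, while J3 shares nothing and produces no edge.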
conetwork() covers projections without a dedicated
wrapper. One field links entities that co-occur; a second field
(by) links them through a shared value:
head(conetwork(sc, "keywords", min_occur = 2), 3)
#> # bibnets network: keywords_co_occurrence | 3 nodes · 3 edges | counting: full
#> from to weight count
#> 1 EDGE COMPUTING QUANTIZATION 16 16
#> 2 DEEP LEARNING QUANTIZATION 14 14
#> 3 DEEP LEARNING EDGE COMPUTING 13 13
head(conetwork(sc, "authors", by = "keywords", min_occur = 2), 3)
#> # bibnets network: authors_by_keywords | 6 nodes · 3 edges | counting: full
#> from to weight count
#> 1 CAI H LIU B 36 36
#> 2 WANG Y YIN S 30 30
#> 3 AMROUCH H ANAGNOSTOPOULOS I 24 24

The second result links authors through shared keywords, giving a thematic similarity network rather than a co-authorship one.
The same raw counts support different similarity scores; only
weight changes, count does not:
none <- keyword_network(sc, min_occur = 2, similarity = "none")
cos <- keyword_network(sc, min_occur = 2, similarity = "cosine")
head(none[, c("from", "to", "weight", "count")], 3)
#> # bibnets network: unknown | 3 nodes · 3 edges
#> from to weight count
#> 1 EDGE COMPUTING QUANTIZATION 16 16
#> 2 DEEP LEARNING QUANTIZATION 14 14
#> 3 DEEP LEARNING EDGE COMPUTING 13 13
head(cos[, c("from", "to", "weight", "count")], 3)
#> # bibnets network: unknown | 6 nodes · 3 edges
#> from to weight count
#> 1 AIR QUALITY PREDICTION POST-TRAINING QUANTISATION 1 2
#> 2 LFSR SEED QUANTIZATION (SIGNAL) 1 2
#> 3 BOOTH MULTIPLIERS SHIFT MULTIPLIERS 1 2

normalize() uses the diagonal of the projected matrix as
each node’s total occurrence count:
| Similarity | Denominator | Meaning | When to use |
|---|---|---|---|
| "none" | No denominator; the projected matrix is returned as raw weighted co-occurrence, with the diagonal removed by the network builder unless self-loops are requested. | weight stays on the same scale as the counted projection. | Use when absolute co-occurrence or counted edge strength is the quantity of interest. |
| "cosine" | Square root of the product of the two node totals. | Symmetric size correction; pairs are high when their overlap is large relative to both nodes’ frequencies. | Use as a general-purpose correction for very frequent nodes while preserving a familiar similarity scale. |
| "association" | Product of the two node totals. | Symmetric association-strength normalization; strongly penalizes pairs involving very frequent nodes. | Use for co-occurrence maps where you want rare, unexpectedly tight pairings to stand out. |
| "jaccard" | Sum of the two node totals minus their observed edge value. | Symmetric overlap over a union-like total. | Use when the edge should represent shared occurrence as a share of either node’s combined footprint. |
| "inclusion" | The smaller of the two node totals. | Symmetric containment-oriented score; it reaches high values when the smaller node mostly appears with the larger one. | Use when subset or specialization relationships are more important than balanced overlap. |
| "equivalence" | Product of the two node totals, with the edge value squared before division. | Cosine-like normalization with stronger penalty for weak or occasional overlap. | Use when following equivalence-index conventions or when only consistently paired nodes should remain strong. |
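The denominators in the table translate directly into a few lines of base R. The function similarity() below is a hypothetical helper written for this example (it is not normalize()), applied to a small dense co-occurrence matrix whose diagonal holds each node's total.

```r
# Toy projected matrix: off-diagonal = co-occurrence, diagonal = node totals.
C <- matrix(c(4, 2, 1,
              2, 3, 0,
              1, 0, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical helper: apply one of the table's denominators to C.
similarity <- function(C, method) {
  s <- diag(C)  # node totals
  denom <- switch(method,
    cosine      = sqrt(outer(s, s)),
    association = outer(s, s),
    jaccard     = outer(s, s, "+") - C,
    inclusion   = outer(s, s, pmin),
    equivalence = outer(s, s),
    stop("unknown method: ", method)
  )
  # Equivalence squares the edge value before dividing.
  num <- if (method == "equivalence") C^2 else C
  W <- num / denom
  diag(W) <- 0  # self-loops are not of interest here
  W
}

# Score for the pair A-B under each measure.
sapply(c("cosine", "association", "jaccard", "inclusion", "equivalence"),
       function(m) similarity(C, m)["A", "B"])
```

On any such matrix the equivalence score is the square of the cosine score, which is why it penalizes weak overlap more strongly.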
edges <- author_network(oa, type = "collaboration")
c(all = nrow(edges),
threshold = nrow(prune(edges, threshold = 2)),
top_n = nrow(prune(edges, top_n = 5)),
top_nodes = nrow(filter_top(edges, n = 50)))
#> all threshold top_n top_nodes
#> 12270 779 12188 956

- prune(threshold = x): absolute edge-weight cutoff.
- prune(top_n = k): keep each node’s strongest edges.
- filter_top(n = k): keep edges among the most-connected nodes.

backbone() applies the disparity filter, which keeps
edges that are strong relative to a node’s local strength distribution
rather than applying a global cutoff.
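The idea behind the disparity filter can be sketched in base R. The function below follows the commonly cited formulation (Serrano, Boguñá and Vespignani, 2009), in which an edge carrying a share p of a node's strength gets the score (1 - p)^(k - 1) for a node of degree k. disparity_filter() is a hypothetical helper for illustration; backbone() may differ in its details and arguments.

```r
# Toy undirected edge list: A has one dominant tie and two weak ones.
edges <- data.frame(from   = c("A", "A", "A", "B"),
                    to     = c("B", "C", "D", "C"),
                    weight = c(10, 1, 1, 1),
                    stringsAsFactors = FALSE)

# Hypothetical helper: keep edges that are significant for at least one
# of their two endpoints at level `alpha`.
disparity_filter <- function(edges, alpha = 0.05) {
  # Strength and degree per node, counting each edge at both endpoints.
  ends     <- c(edges$from, edges$to)
  w_both   <- c(edges$weight, edges$weight)
  strength <- tapply(w_both, ends, sum)
  degree   <- tapply(w_both, ends, length)

  # Score of an edge as seen from one endpoint; degree-1 nodes get 1.
  score <- function(node, w) {
    k <- as.numeric(degree[node])
    p <- w / as.numeric(strength[node])
    ifelse(k > 1, (1 - p)^(k - 1), 1)
  }

  edges$alpha <- pmin(score(edges$from, edges$weight),
                      score(edges$to, edges$weight))
  edges[edges$alpha < alpha, , drop = FALSE]
}

disparity_filter(edges, alpha = 0.05)
```

Only the A-B tie survives: it carries most of A's strength, whereas the weight-1 edges are unremarkable for both of their endpoints.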
temporal_network() runs any builder over time windows
(fixed, sliding, or cumulative):
tn <- temporal_network(oa, author_network, "collaboration", window = 3)
names(tn)
#> [1] "2011-2013" "2014-2016" "2017-2019" "2020-2022" "2023-2025" "2026-2026"

Each window’s edge list carries a window column. Windows
with fewer than two records, or no surviving edges, are dropped; a
builder error inside a window becomes a warning labelled with that
window.
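How a year range breaks into windows can be sketched in base R. year_windows() is a hypothetical helper, and the way it interprets "sliding" (step of one year) and "cumulative" (every window starts at the first year) is an assumption of this sketch, not a description of temporal_network().

```r
# Hypothetical helper: window boundaries for a vector of publication years.
year_windows <- function(years, width,
                         mode = c("fixed", "sliding", "cumulative")) {
  mode <- match.arg(mode)
  lo <- min(years, na.rm = TRUE)
  hi <- max(years, na.rm = TRUE)
  if (mode == "sliding") {
    # One window per start year, moving forward a year at a time.
    start <- seq(lo, max(lo, hi - width + 1))
    end   <- pmin(start + width - 1, hi)
  } else {
    # Consecutive blocks of `width` years; the last block may be shorter.
    block_start <- seq(lo, hi, by = width)
    end   <- pmin(block_start + width - 1, hi)
    start <- if (mode == "fixed") block_start else rep(lo, length(end))
  }
  data.frame(start = start, end = end,
             label = paste(start, end, sep = "-"),
             stringsAsFactors = FALSE)
}

years <- c(2011, 2012, 2015, 2019, 2020, 2024, 2026)
w <- year_windows(years, width = 3)

# Records per window; windows with fewer than two records would be dropped.
n_records <- sapply(seq_len(nrow(w)), function(k) {
  sum(years >= w$start[k] & years <= w$end[k])
})
w[n_records >= 2, ]
```

A builder would then be applied to the rows of each retained window, with the window label attached to the resulting edges.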
local_citations() counts how often each document is
cited by others in the same corpus; historiograph() builds
the directed citation graph among the top-cited documents:
head(local_citations(sc), 5)
#> id lcs gcs year
#> 1 2-s2.0-105007159281 0 0 2025
#> 2 2-s2.0-105006878874 0 0 2025
#> 3 2-s2.0-85211114952 0 0 2024
#> 4 2-s2.0-105001072133 0 0 2025
#> 5 2-s2.0-85210832535 0 5 2025
#> title
#> 1 Quantum Computing in the RAN with Qu4Fec: Closing Gaps Towards Quantum-based FEC Processors
#> 2 An FPGA-based bit-level weight sparsity and mixed-bit accelerator for neural networks
#> 3 FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing
#> 4 SysCIM: A Heterogeneous Chip Architecture for High-Efficiency CNN Training at Edge
#> 5 Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-Performance and Energy-Efficient Object Detection
#> journal
#> 1 Proceedings of the ACM on Measurement and Analysis of Computing Systems
#> 2 Journal of Systems Architecture
#> 3 Proceedings - Design Automation Conference
#> 4 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
#> 5 Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
#> doi
#> 1 10.1145/3727128
#> 2 10.1016/j.sysarc.2025.103463
#> 3 10.1145/3649329.3656502
#> 4 10.1109/TVLSI.2025.3526363
#> 5 10.1007/978-3-031-73411-3_15
h <- historiograph(sc, n = 10)
h$nodes
#> [1] id lcs gcs year title journal doi
#> <0 rows> (or 0-length row.names)
head(h$edges, 5)
#> [1] from to year_from year_to
#> <0 rows> (or 0-length row.names)

Both require reference strings or IDs to match document IDs in the data; if the cited works are external, local counts stay low.
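Why matching matters can be seen in a small base-R sketch of within-corpus citation counting. It assumes references have already been resolved to the same identifiers as the id column; the object names are chosen for this example and the code is not the implementation of local_citations().

```r
# Toy corpus: d1 cites d2 and an external work x9; d2 cites d3.
docs <- data.frame(id = c("d1", "d2", "d3"), stringsAsFactors = FALSE)
docs$references <- list(c("d2", "x9"), "d3", character(0))

# Long table of citing -> cited pairs.
cites <- data.frame(from = rep(docs$id, lengths(docs$references)),
                    to   = unlist(docs$references),
                    stringsAsFactors = FALSE)

# Keep citations whose target is itself a row in the corpus.
local <- cites[cites$to %in% docs$id & cites$from != cites$to, ]

# Local citation score: how often each document is cited within the corpus.
docs$lcs <- as.integer(table(factor(local$to, levels = docs$id)))

local
docs[, c("id", "lcs")]
```

The citation to x9 is discarded because x9 is not in the corpus, so it adds neither an edge nor a local count.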
The edge list is already usable; converters cover the common targets:
edges <- keyword_network(sc, min_occur = 2)
m <- to_matrix(edges) # sparse adjacency matrix
m[1:4, 1:4]
#> 4 x 4 sparse Matrix of class "dgCMatrix"
#> ACCELERATION ACCELERATOR ACCURACY AI ACCELERATOR
#> ACCELERATION . . . .
#> ACCELERATOR . . . .
#> ACCURACY . . . .
#> AI ACCELERATOR . . . .
gephi <- to_gephi(edges) # Gephi node/edge tables
head(gephi$edges, 3)
#> Source Target Weight Type count
#> 1 EDGE COMPUTING QUANTIZATION 16 Undirected 16
#> 2 DEEP LEARNING QUANTIZATION 14 Undirected 14
#> 3 DEEP LEARNING EDGE COMPUTING 13 Undirected 13
cat(substr(to_graphml(edges), 1, 200)) # GraphML, no XML dependency
#> <?xml version="1.0" encoding="UTF-8"?>
#> <graphml xmlns="http://graphml.graphdrawing.org/graphml"
#> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
#> xsi:schemaLocation="http://graph

to_igraph(), to_tbl_graph(), and
to_cograph() are available when their (suggested) packages
are installed.
The bibnets_network object records how it was built, as attributes:
edges <- author_network(oa, type = "collaboration", counting = "harmonic")
c(type = attr(edges, "network_type"),
counting = attr(edges, "counting"),
sim = attr(edges, "similarity"))
#> type counting sim
#> "author_collaboration" "harmonic" "none"
summary(edges)
#> bibnets network
#> ------------------------------
#> Type : author_collaboration
#> Counting : harmonic
#> Similarity : none
#> Nodes : 4029
#> Edges : 12270
#> Density : 0.0015
#> Weight : min 1.43e-05 median 0.0106 max 1.41
#> Top nodes : DRAGAN GAŠEVIĆ(89), ARI KORHONEN(60), CLAUDIA SZABO(60), JUDY SHEARD(60), PAUL DENNY(56)

print() reports the network type, node and edge counts,
and the counting and similarity methods — so a saved edge list always
says how it was made.
The methodology implemented in bibnets is described
in:
López-Pernas, S., Saqr, M., & Apiola, M. (2023). Scientometrics: A Concise Introduction and a Detailed Methodology for Mapping the Scientific Field of Computing Education Research. In M. Apiola, S. López-Pernas, & M. Saqr (Eds.), Past, Present and Future of Computing Education Research: A Global Perspective (pp. 79–99). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-031-25336-2_5
Saqr, M., López-Pernas, S., Conde, M. Á., & Hernández-García, Á. (2024). Social Network Analysis: A primer, a guide and a tutorial in R. In M. Saqr & S. López-Pernas (Eds.), Learning Analytics Methods and Tutorials: A Practical Guide Using R (pp. 491–518). Springer, Cham. https://doi.org/10.1007/978-3-031-54464-4_15