WMAP (Weighted Multi-site Analysis Package) integrates evidence from multiple observational studies with different group compositions using balancing weights.
# Load demonstration dataset
data(demo)
# The demo data contains:
# S - Study/site indicators (7 hospitals)
# Z - Group indicators (2 groups: IDC vs ILC)
# X - Patient covariates (clinical variables)
# Y - Outcomes (8 gene expression measurements)
# naturalGroupProp - Known population group proportions# Estimate causal effects
result <- causal.estimate(
S = S, # Study indicators
Z = Z, # Group indicators
X = X, # Covariates
Y = Y, # Outcomes
B = 20, # Bootstrap samples (use 100+ in practice)
method = "IC", # Weighting method
naturalGroupProp = naturalGroupProp,
seed = 123 # For reproducibility
)
#> *************************
#> Bootstrap: 1
#> *************************
#> *************************
#> Bootstrap: 2
#> *************************
#> *************************
#> Bootstrap: 3
#> *************************
#> *************************
#> Bootstrap: 4
#> *************************
#> *************************
#> Bootstrap: 5
#> *************************
#> *************************
#> Bootstrap: 6
#> *************************
#> *************************
#> Bootstrap: 7
#> *************************
#> *************************
#> Bootstrap: 8
#> *************************
#> *************************
#> Bootstrap: 9
#> *************************
#> *************************
#> Bootstrap: 10
#> *************************
#> *************************
#> Bootstrap: 11
#> *************************
#> *************************
#> Bootstrap: 12
#> *************************
#> *************************
#> Bootstrap: 13
#> *************************
#> *************************
#> Bootstrap: 14
#> *************************
#> *************************
#> Bootstrap: 15
#> *************************
#> *************************
#> Bootstrap: 16
#> *************************
#> *************************
#> Bootstrap: 17
#> *************************
#> *************************
#> Bootstrap: 18
#> *************************
#> *************************
#> Bootstrap: 19
#> *************************
#> *************************
#> Bootstrap: 20
#> *************************# Print summary
summary(result)
#> ===========================================
#> Summary of WMAP Causal Estimates
#> ===========================================
#> Method: IC
#> Bootstrap samples: 20
#> Effective sample size (ESS): 44.20%
#>
#> Mean differences with 95% CI:
#> -------------------------------------------
#> Outcome Estimate (95% CI)
#> -------------------------------------------
#> Y 1 0.04 (-0.19, -0.14)
#> Y 2 -0.71 (-0.75, -0.71)
#> Y 3 -0.84 (-0.84, -0.82)
#> Y 4 -0.14 (-0.16, -0.11)
#> Y 5 0.37 ( 0.33, 0.37)
#> Y 6 -0.53 (-0.31, -0.31)
#> Y 7 0.02 (-0.03, 0.02)
#> Y 8 -0.45 (-0.51, -0.50)
#>
#> Standard Deviation Ratios with 95% CI:
#> -------------------------------------------
#> Outcome Ratio (95% CI)
#> -------------------------------------------
#> Y 1 1.52 ( 1.61, 1.69)
#> Y 2 1.38 ( 1.12, 1.15)
#> Y 3 1.41 ( 1.32, 1.36)
#> Y 4 1.60 ( 1.31, 1.35)
#> Y 5 1.77 ( 1.64, 1.65)
#> Y 6 1.11 ( 1.17, 1.23)
#> Y 7 0.77 ( 0.59, 0.74)
#> Y 8 1.19 ( 0.71, 0.76)
# Visualize bootstrap ESS distribution
plot(result)Effective Sample Size (ESS): The Percent ESS indicates how much information remains after weighting:
Mean Differences: Shows estimated differences between Group 1 and Group 2 with 95% confidence intervals.
SD Ratios: Ratios of standard deviations between groups.
WMAP provides three methods:
# Compare methods
result_ic <- causal.estimate(S, Z, X, Y, B = 100, method = "IC", naturalGroupProp)
result_igo <- causal.estimate(S, Z, X, Y, B = 100, method = "IGO", naturalGroupProp)
result_flexor <- causal.estimate(S, Z, X, Y, B = 100, method = "FLEXOR", naturalGroupProp)
# Compare ESS
result_ic$percentESS
result_igo$percentESS
result_flexor$percentESSBy default, causal.estimate uses observed outcomes
directly (outcome_model = FALSE). Set
outcome_model = TRUE to fit a random forest on X, S, Z and
substitute predicted outcomes for observed Y in all moment calculations
(mean, SD, median).
# Default: use observed outcomes directly
result <- causal.estimate(S, Z, X, Y, B = 100, method = "IC",
naturalGroupProp = naturalGroupProp)
# RF predictions substituted for observed Y
result <- causal.estimate(S, Z, X, Y, B = 100, method = "IC",
naturalGroupProp = naturalGroupProp,
outcome_model = TRUE)When outcome_model = TRUE, a random forest is fit per
outcome per bootstrap iteration, which increases runtime.
By default (crossfit_folds = NULL), the random forest is
fit on the full sample. Set crossfit_folds to an integer
>= 2 for K-fold cross-fitting:
# Minimum for stable CIs
result <- causal.estimate(S, Z, X, Y, B = 50, method = "IC", naturalGroupProp)
# Recommended for publication
result <- causal.estimate(S, Z, X, Y, B = 200, method = "IC", naturalGroupProp)
# High precision
result <- causal.estimate(S, Z, X, Y, B = 500, method = "IC", naturalGroupProp)Trade-off: More bootstrap samples = better precision but longer computation time
Skip bootstrap for quick exploration:
Speed up computation using multiple cores:
# Install required packages first
install.packages(c("future", "future.apply"))
# Define n_cores based on your system (e.g., 4)
result <- causal.estimate(
S, Z, X, Y,
B = 100,
method = "IC",
naturalGroupProp = naturalGroupProp,
parallel = TRUE,
n_cores = 4
)The parallelization strategy depends on the environment. On
Mac/Linux, multicore is used: workers are created by
duplicating the current R process, so data is already in memory and no
copying is required. On Windows, multisession is used: each
worker is a fresh R process that receives a copy of the data, which has
higher startup overhead. RStudio disables multicore
regardless of platform, so multisession will be used when
running inside RStudio. For best performance on Mac/Linux, run your
script from the terminal using Rscript or an interactive R
session.
WMAP automatically checks for common problems and warns you:
# 1. Load and check data
data(demo)
table(S, Z) # Check all cells have observations
sum(is.na(cbind(S, Z, X, Y))) # Check for missing values
# 2. Run analysis
result <- causal.estimate(
S, Z, X, Y,
B = 100,
method = "FLEXOR",
naturalGroupProp = naturalGroupProp,
seed = 123
)
# 3. Check diagnostics
summary(result)
plot(result)# Compare all three methods
set.seed(123)
results_ic <- causal.estimate(S, Z, X, Y, B = 100, method = "IC", naturalGroupProp)
results_igo <- causal.estimate(S, Z, X, Y, B = 100, method = "IGO", naturalGroupProp)
results_flexor <- causal.estimate(S, Z, X, Y, B = 100, method = "FLEXOR", naturalGroupProp)
# Compare ESS
cat("IC ESS:", results_ic$percentESS, "%\n")
cat("IGO ESS:", results_igo$percentESS, "%\n")
cat("FLEXOR ESS:", results_flexor$percentESS, "%\n")
# Compare estimates
summary(results_ic)
summary(results_igo)
summary(results_flexor)# Test sensitivity to bootstrap sample size
result_b50 <- causal.estimate(S, Z, X, Y, B = 50, method = "IGO", naturalGroupProp, seed = 123)
result_b100 <- causal.estimate(S, Z, X, Y, B = 100, method = "IGO", naturalGroupProp, seed = 123)
result_b200 <- causal.estimate(S, Z, X, Y, B = 200, method = "IGO", naturalGroupProp, seed = 123)
# Compare CI widths
summary(result_b50)
summary(result_b100)
summary(result_b200)Extract balancing weights without running full causal analysis: