Introduction
Meta-analysis combines quantitative findings from multiple studies to evaluate the consistency of research outcomes and derive a more precise estimate of an effect1. It has become indispensable in fields such as psychology, medicine, education, and the social sciences, where evidence synthesis is crucial2. As a fundamental tool in basic and clinical research, meta-analysis combines the results of multiple studies to obtain more objective conclusions. Increasing the effective sample size improves precision and statistical power, reducing the risk of false positives and false negatives. It also allows trends to be identified, heterogeneity between studies to be assessed, and publication bias to be detected. In medicine, it is key to guiding clinical decisions and generating high-quality evidence. Its ability to synthesize information makes it an essential methodology for validating treatments and understanding various health conditions1.
R is a free, open-source programming language offering extensive data management tools, statistical analysis, and graphics3. Among the many available packages, metafor4 stands out for its flexibility in performing both fixed- and random-effects meta-analyses. Other packages – including meta5, netmeta6, dmetar7, robumeta8, and bayesmeta9 – further enhance the analytical capabilities in R. These tools enable researchers to address specialized questions, such as comparing multiple interventions through network meta-analysis or examining trends over time with cumulative meta-analysis, and to work with individual participant data (IPD).
This study aims to review the fundamental concepts and statistical principles of meta-analysis and provide a step-by-step guide for conducting various types of meta-analyses in R. In addition, it explores alternative approaches, including network meta-analysis, cumulative meta-analysis, individual patient data (IPD) meta-analysis, Bayesian meta-analysis, and multivariate meta-analysis.
Theoretical background
Effect sizes and their importance
Effect sizes provide a standardized metric for comparing outcomes across studies. Typical measures include, among others:
- − Cohen’s d and Hedges’ g: used for continuous outcomes10.
- − Odds ratios: often applied in medical and epidemiological research11.
- − Correlation coefficients (r): frequently used in social science studies12.
Standardizing these measures enables researchers to combine data from studies that may have used different scales13. Cooper, Hedges, and Valentine note that selecting and carefully computing effect sizes is essential for drawing valid conclusions in meta-analysis.
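As a brief illustration, effect sizes rarely need to be computed by hand: metafor's escalc() function derives them from summary statistics. The sketch below uses hypothetical two-group data (the means, SDs, and sample sizes are invented for illustration):

```r
library(metafor)

# Hypothetical two-group summary statistics (invented for illustration)
dat <- data.frame(m1i = 5.2, sd1i = 1.1, n1i = 40,   # treatment group
                  m2i = 4.6, sd2i = 1.3, n2i = 42)   # control group

# measure = "SMD" yields Hedges' g (bias-corrected standardized mean difference)
dat <- escalc(measure = "SMD", m1i = m1i, sd1i = sd1i, n1i = n1i,
              m2i = m2i, sd2i = sd2i, n2i = n2i, data = dat)

dat$yi  # the effect size
dat$vi  # its sampling variance
```

escalc() also supports odds ratios (measure = "OR") and raw correlations (measure = "COR"), covering the three families of measures listed above.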
Fixed-effects versus random-effects models
A critical decision in meta-analysis is choosing between fixed-effects and random-effects models:
- − Fixed-effects model: assumes that all studies share a single true effect size and that any observed differences are due to random sampling error14.
- − Random-effects model: recognizes that the true effect sizes may vary from study to study due to differences in study characteristics, populations, or interventions15. This model incorporates a between-study variance component (τ2) and is generally more appropriate when combining diverse studies4.
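In metafor, this choice is expressed through the method argument of rma(); the sketch below fits both models to illustrative data (the yi/vi values are invented):

```r
library(metafor)

# Illustrative effect sizes (yi) and sampling variances (vi)
dat <- data.frame(yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
                  vi = c(0.04, 0.06, 0.05, 0.03, 0.04))

# Fixed-effects model: assumes one common true effect
res_fe <- rma(yi, vi, data = dat, method = "FE")

# Random-effects model: estimates between-study variance tau^2 (here via REML)
res_re <- rma(yi, vi, data = dat, method = "REML")

res_fe$beta  # pooled estimate under the fixed-effects assumption
res_re$tau2  # estimated between-study variance under the random-effects model
```

When τ2 is estimated as zero the two models coincide; otherwise the random-effects confidence interval is wider.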
Assessing heterogeneity
Heterogeneity refers to variations in study outcomes beyond what would be expected by chance. It is typically quantified using the following:
- − Cochran’s Q test: evaluates whether the variability in effect sizes exceeds what is expected by chance alone16.
- − I2 statistic: the most commonly used measure; it expresses the percentage of total variability attributable to heterogeneity rather than sampling error17.
These metrics help determine the appropriate model and guide the interpretation of the results13.
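All three quantities are returned by metafor's rma() and can be read directly from the fitted object; a sketch with illustrative data:

```r
library(metafor)

dat <- data.frame(yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
                  vi = c(0.04, 0.06, 0.05, 0.03, 0.04))
res <- rma(yi, vi, data = dat, method = "DL")

res$QE    # Cochran's Q statistic
res$QEp   # p-value of the Q test
res$I2    # I2: % of total variability attributed to heterogeneity
res$tau2  # tau^2: estimated between-study variance
```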
Publication bias
Publication bias occurs when studies with significant or positive results are more likely to be published, potentially distorting overall findings11,18. Standard tools for detecting publication bias include:
- − Funnel plots: graphs that plot effect sizes against study precision; symmetry suggests a low risk of bias18.
- − Egger’s test: a statistical method to assess the asymmetry of the funnel plot11.
Recent methodological advances have further refined these diagnostic tools, improving data quality and interpretation19,20.
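Both diagnostics are built into metafor; a sketch with illustrative data (bearing in mind that asymmetry tests have low power when few studies are available):

```r
library(metafor)

dat <- data.frame(yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
                  vi = c(0.04, 0.06, 0.05, 0.03, 0.04))
res <- rma(yi, vi, data = dat, method = "DL")

# Funnel plot: marked asymmetry may signal publication bias
funnel(res)

# Egger-type regression test for funnel-plot asymmetry
regtest(res, model = "lm")
```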
Conducting a meta-analysis in R
The first step is to compile and organize data from each study (data preparation). A typical dataset includes:
- − Study identifier: a unique name or code for each study.
- − Effect size (yi): the standardized measure of effect.
- − Variance (vi): the variance associated with the effect size.
Nonetheless, the variables to include depend on the type of meta-analysis, and the data can be described in multiple ways. For example, the following R code creates a sample dataset:
# Load the metafor package
library(metafor)
# Create a sample dataset with five studies
data <- data.frame(
  study = c("Study 1", "Study 2", "Study 3", "Study 4", "Study 5"),
  yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
  vi = c(0.04, 0.06, 0.05, 0.03, 0.04)
)
# Display the dataset
print(data)
Table 1 summarizes the sample data with study identifiers, effect sizes, and variances1,2.
Table 1. Sample data for meta-analysis
| Study | Effect size (yi) | Variance (vi) |
|---|---|---|
| Study 1 | 0.2 | 0.04 |
| Study 2 | 0.5 | 0.06 |
| Study 3 | −0.1 | 0.05 |
| Study 4 | 0.3 | 0.03 |
| Study 5 | 0.4 | 0.04 |
Performing the meta-analysis
Using the prepared data, a random-effects meta-analysis is conducted with the DerSimonian-Laird estimator. The following code illustrates the process:
# Perform a random-effects meta-analysis
res <- rma(yi, vi, data = data, method = "DL")
# Print the summary of the meta-analysis
summary(res)
The rma() function calculates the pooled effect size and estimates heterogeneity (τ2), providing key statistics such as Cochran’s Q and I2,4,14.
Visualization
Visual tools are essential for interpreting meta-analytic results. The forest plot and the funnel plot are the most common ways to present the data visually.
Forest plot
A forest plot displays individual study effect sizes, confidence intervals (CI), and the overall pooled effect. The following code generates a forest plot:
# Create a forest plot
forest(res, slab = data$study, xlab = "Effect Size", mlab = "Overall Effect")
Figure 1 schematically represents a typical forest plot1,15.
Figure 1. Schematic representation of a forest plot.
Funnel plot
A funnel plot helps assess publication bias by plotting effect sizes against study precision. The following code produces a funnel plot:
# Generate a funnel plot
funnel(res)
Figure 2 provides a schematic illustration of a funnel plot11,18–20.
Figure 2. Schematic representation of a funnel plot.
Prediction interval
In a random-effects meta-analysis, the pooled estimate reflects the average true effect across studies, but substantial heterogeneity means that the effect in a future study may differ21,22. A 95% prediction interval (PI) provides the range in which the true effect of a new, similar study is expected to fall with 95% probability23–26. Unlike the CI of the pooled effect, which reflects uncertainty around the mean effect, the PI incorporates both within-study error and between-study heterogeneity, offering a more clinically meaningful interpretation of variability across settings27,28.
In R, PIs can be easily computed:
# metafor
pred_res <- predict(res, digits = 3)
pred_res
# meta package
library(meta)
meta_res <- metagen(TE = data$yi,
                    seTE = sqrt(data$vi),
                    studlab = data$study,
                    sm = "SMD",
                    comb.random = TRUE,
                    prediction = TRUE)
summary(meta_res)
Including the PI is recommended, especially when heterogeneity (τ2, I2) is present, because it helps readers understand the possible range of effects in practice.
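For intuition, the PI printed by these functions follows a widely cited formulation (Higgins and colleagues): the pooled estimate plus or minus a critical quantile times sqrt(SE² + τ²), with packages differing in the quantile used (meta defaults to a t-quantile with k − 2 degrees of freedom, metafor to a normal quantile). A manual sketch of the t-based version, using illustrative data:

```r
library(metafor)

dat <- data.frame(yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
                  vi = c(0.04, 0.06, 0.05, 0.03, 0.04))
res <- rma(yi, vi, data = dat, method = "DL")

# 95% PI: pooled estimate +/- t(k - 2) * sqrt(SE^2 + tau^2)
t_crit <- qt(0.975, df = res$k - 2)
half_width <- t_crit * sqrt(res$se^2 + res$tau2)
c(lower = coef(res) - half_width, upper = coef(res) + half_width)
```

Because the PI adds τ2 under the square root, it is always at least as wide as the CI of the pooled effect.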
Alternative types of meta-analysis and R packages
Beyond traditional pairwise meta-analysis, several alternative methods address complex research questions and data structures. Below are five key approaches.
Network meta-analysis (NMA)
NMA (also known as multiple-treatments meta-analysis) compares three or more interventions simultaneously by synthesizing both direct (head-to-head) and indirect comparisons (via a common comparator). This approach is especially useful in clinical and public health research where direct comparisons may be limited.
METHODOLOGICAL CONSIDERATIONS
- − Assumptions: NMA assumes transitivity, i.e., that effect modifiers are similarly distributed across comparisons.
- − Consistency: it is vital to check for consistency between direct and indirect evidence.
- − Heterogeneity: as with pairwise analyses, heterogeneity is assessed using I2, Q, and τ2.
R PACKAGES AND IMPLEMENTATION
- − netmeta: a widely used package for frequentist NMAs that offers functions for consistency checks, network diagrams, and forest plots6.
Example code:
library(netmeta)
# Sample dataset for NMA
nma_data <- data.frame(
  study = c("StudyA", "StudyB", "StudyC", "StudyD"),
  treat1 = c("DrugA", "DrugA", "DrugB", "DrugC"),
  treat2 = c("DrugB", "DrugC", "DrugC", "DrugD"),
  TE = c(0.3, -0.1, 0.2, 0.5),
  seTE = c(0.1, 0.15, 0.1, 0.2)
)
# Conduct NMA using standardized mean differences
net <- netmeta(TE, seTE, treat1, treat2, studlab = study, data = nma_data, sm = "SMD")
summary(net)
netgraph(net)
Cumulative meta-analysis
Cumulative meta-analysis updates the pooled effect size as new studies are added, typically chronologically. This method reveals how evidence accumulates over time and helps determine when further studies may have little impact on overall conclusions20.
METHODOLOGICAL CONSIDERATIONS
- − Temporal trends: sequentially adding studies can highlight trends or shifts in effect sizes, indicating changes in study quality or interventions.
- − Stability: a stable cumulative effect suggests robustness, whereas fluctuations may indicate heterogeneity.
R PACKAGES AND IMPLEMENTATION
- − meta: provides functions to sequentially add studies and visualize evolving summary effects5.
Example code:
library(meta)
# Create a meta-analysis object using sample data
meta_res <- metagen(TE = data$yi, seTE = sqrt(data$vi), studlab = data$study, sm = "SMD")
# Perform cumulative meta-analysis (studies are added in the order given)
cum_meta <- metacum(meta_res)
# Plot cumulative results
forest(cum_meta)
IPD meta-analysis
IPD meta-analysis aggregates raw data from each study rather than relying solely on summary statistics. This approach allows for detailed subgroup analyses, covariate adjustments, and exploration of interactions, thereby offering higher precision21–24.
METHODOLOGICAL CONSIDERATIONS
- − Data harmonization: ensuring variable compatibility across studies is crucial.
- − Statistical modeling: mixed-effects or hierarchical models are typically used to account for within-study clustering.
- − Resource intensity: gathering individual-level data requires significant effort and collaboration.
R PACKAGES AND IMPLEMENTATION
- − ipdmeta: designed for pooling and analyzing IPD with models that account for clustering23,24.
- − metafor: can be adapted for IPD analysis by incorporating study-level random effects.
Example code:
library(lme4)
# Assume ipd_data is a data frame with columns: outcome, treatment, covariate, and study
ipd_model <- lmer(outcome ~ treatment + covariate + (1 | study), data = ipd_data)
summary(ipd_model)
Bayesian meta-analysis
Bayesian meta-analysis incorporates prior knowledge and quantifies uncertainty probabilistically. This framework is beneficial when data are sparse or studies are heterogeneous, offering complete posterior distributions and credible intervals25.
METHODOLOGICAL CONSIDERATIONS
- − Prior specification: choosing appropriate priors is critical; informative priors can improve estimates when data are limited.
- − Computational complexity: Bayesian models often use MCMC sampling, which may require significant computing resources.
- − Interpretation: results include posterior distributions and credible intervals, which can be more intuitive than traditional CIs.
R PACKAGES AND IMPLEMENTATION
- − bayesmeta: provides an accessible interface for Bayesian meta-analysis with options to specify priors9.
- − Other packages such as gemtc, rstanarm, and brms also support flexible Bayesian modeling.
Example code:
library(bayesmeta)
# Conduct Bayesian meta-analysis using sample data
bayes_res <- bayesmeta(y = data$yi, sigma = sqrt(data$vi))
summary(bayes_res)
plot(bayes_res)
Multivariate meta-analysis
Many studies report multiple correlated outcomes. Multivariate meta-analysis synthesizes these outcomes simultaneously, accounting for their correlations and providing a more comprehensive analysis than separate univariate models23.
METHODOLOGICAL CONSIDERATIONS
- − Correlation structure: accurately modeling within-study correlations is essential.
- − Model complexity: multivariate models are more complex and may require specialized software.
R PACKAGES AND IMPLEMENTATION
- − metafor: handles multivariate meta-analysis using the rma.mv() function, which allows for multiple random-effects terms and incorporates correlation structures23.
Example code:
library(metafor)
# Simulated data with two correlated outcomes per study, in long format
data_mv <- data.frame(
  study = rep(paste("Study", 1:5), times = 2),
  outcome = rep(c("outcome1", "outcome2"), each = 5),
  yi = c(0.2, 0.5, -0.1, 0.3, 0.4, 0.1, 0.4, 0.0, 0.2, 0.3),
  vi = c(0.04, 0.06, 0.05, 0.03, 0.04, 0.05, 0.07, 0.06, 0.04, 0.05)
)
# For demonstration, within-study sampling errors are treated as independent
mv_res <- rma.mv(yi, vi, random = ~ outcome | study, data = data_mv)
summary(mv_res)
Putting it all together in an example
Using the sample data from Table 1, the random-effects model produced the following summary output:
Random-effects model (k = 5; τ² estimator: DL)
- − τ2 (between-study variance): 0.0050 (SE = 0.0032).
- − τ (square root of τ²): 0.0707.
- − I² (percentage of variability due to heterogeneity): 27.4%.
- − H² (total variability/sampling variability): 1.38.
TEST FOR HETEROGENEITY
− Q (df = 4) = 5.23, p = 0.266.
Model results
| Estimate | Standard error | z-value | p | 95% confidence interval lower | 95% confidence interval upper |
|---|---|---|---|---|---|
| 0.2800 | 0.0750 | 3.73 | 0.0002 | 0.1330 | 0.4270 |
This output indicates a pooled effect size of approximately 0.28, with a 95% CI ranging from 0.133 to 0.427. An I2 of 27.4% suggests that about one-quarter of the variability is due to actual differences between studies, and the Q test does not indicate significant heterogeneity1,16.
In meta-analyses, the I2 statistic is widely used, but its interpretation as an absolute measure of heterogeneity can be misleading, as it depends on the sample sizes and the number of studies included. For this reason, many methodological guidelines recommend supplementing it with (or preferring) Cochran’s Q and tau-squared (τ2) to quantify the actual variability between studies and make an appropriate choice between fixed-effects and random-effects models. For example, in an analysis with few studies, a low I2 does not rule out substantial heterogeneity if τ2 is high; conversely, a high I2 may reflect the sample sizes and number of studies more than true clinical variability.
This approach – integrating Q, τ2, and I2 – allows a more accurate and reliable interpretation of combined results, especially in meta-analyses in urology, where the heterogeneous nature of the studies (small sample sizes, variability of techniques, heterogeneous designs) is common.
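That dependence is visible in the usual Higgins-Thompson formula relating I2 to Q and its degrees of freedom (df = k − 1). The sketch below is plain base R; note that metafor estimates I2 from τ2 and a "typical" within-study variance, so its reported value can differ slightly from this formula:

```r
# Higgins-Thompson formula: I^2 = max(0, (Q - df) / Q) * 100, with df = k - 1
i2_from_q <- function(Q, df) max(0, (Q - df) / Q) * 100

# With the Q reported above (Q = 5.23, df = 4):
i2_from_q(5.23, 4)  # about 23.5% by this formula

# When Q is below its degrees of freedom, I^2 is truncated to zero
i2_from_q(3.0, 4)
```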
Common pitfalls in meta-analysis in R and how to solve them
While R offers powerful tools for meta-analysis, several common pitfalls can compromise the quality and interpretability of your analysis. Awareness of these challenges and adopting best practices can significantly improve your work.
Poor data quality and incomplete data
Pitfall: inaccurate, incomplete, or poorly coded data can lead to biased estimates or incorrect conclusions.
Solution: ensure thorough data cleaning and validation. Develop a detailed codebook and verify data accuracy before analysis. Use reproducible data management workflows and consider sensitivity analyses to assess the impact of missing or uncertain data13.
Incorrect data formatting and input
Pitfall: misformatted data (e.g., incorrect column names or data types) can result in errors or unexpected results when using R packages.
Solution: follow the package documentation for required data structures. Validate data frames with summary statistics and visual inspections (e.g., using str() and summary()) before analysis.
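A minimal sketch of such checks on a dataset shaped like the one used throughout this guide:

```r
# Sample dataset in the format expected by rma(): one row per study
data <- data.frame(study = paste("Study", 1:5),
                   yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
                   vi = c(0.04, 0.06, 0.05, 0.03, 0.04))

# Inspect structure and ranges before fitting any model
str(data)      # yi and vi must be numeric, not character or factor
summary(data)  # NAs or implausible ranges show up here

# Defensive checks: effect sizes non-missing, variances strictly positive
stopifnot(is.numeric(data$yi), is.numeric(data$vi),
          !anyNA(data$yi), all(data$vi > 0))
```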
Model misspecification (fixed vs. random effects)
Pitfall: choosing an inappropriate model (fixed vs. random effects) may lead to misinterpretation of heterogeneity and effect size estimates.
Solution: assess study heterogeneity using statistics such as I², Q, and τ², and justify your model choice based on the data characteristics. If in doubt, perform both analyses and compare results.
Inadequate assessment of heterogeneity
Pitfall: overlooking the evaluation of heterogeneity can mask true variability across studies.
Solution: always compute heterogeneity measures and visualize them using forest plots. If heterogeneity is substantial, consider subgroup analyses or meta-regression16,17.
Ignoring publication bias
Pitfall: failure to assess publication bias can skew overall findings if studies with non-significant results are underrepresented.
Solution: utilize funnel plots and formal tests (e.g., Egger’s test) to detect publication bias. When bias is suspected, discuss its potential impact on the conclusions and consider adjusting your analysis accordingly11,18.
Overlooking sensitivity analyses and outlier detection
Pitfall: not testing the robustness of your results by exploring the influence of individual studies may lead to overconfidence in the findings.
Solution: conduct sensitivity analyses by removing outliers or influential studies, and compare the stability of your estimates. Packages like dmetar offer tools for such diagnostics.
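In metafor, leave1out() refits the model k times, omitting one study each time, and influence() computes case diagnostics; a sketch with illustrative data:

```r
library(metafor)

dat <- data.frame(yi = c(0.2, 0.5, -0.1, 0.3, 0.4),
                  vi = c(0.04, 0.06, 0.05, 0.03, 0.04))
res <- rma(yi, vi, data = dat, method = "DL")

# Pooled estimate and heterogeneity with each study removed in turn
leave1out(res)

# Studentized residuals, Cook's distances, and other case diagnostics
influence(res)
```

If removing a single study materially shifts the pooled estimate or τ2, that study deserves closer inspection.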
Misinterpretation of statistical outputs and visualizations
Pitfall: misreading the outputs (e.g., confusing statistical significance with clinical relevance) can misguide conclusions.
Solution: familiarize yourself with the statistical outputs of R packages and complement quantitative findings with clear visualizations (e.g., forest and funnel plots). Consult methodological references when in doubt4,15.
Inadequate reporting and reproducibility
Pitfall: failing to document code and data processing steps undermines the reproducibility of the meta-analysis.
Solution: maintain well-documented scripts and comment on your code. Consider sharing your code and data via repositories like GitHub to promote transparency and reproducibility3.
Interpretation of findings
The illustrative meta-analysis suggests a moderate overall effect size, indicating that the intervention under study has a positive impact. The low-to-moderate heterogeneity (I2 ≈ 27.4%) indicates that although some variability exists among studies, the overall findings are consistent. These results should be interpreted in the context of study designs, populations, and interventions2.
Advantages and limitations of meta-analysis in R
Advantages
- − Reproducibility: R’s scripting environment facilitates the sharing and replication of analyses, thereby enhancing transparency28.
- − Flexibility: R supports various meta-analytic approaches, from standard pairwise to advanced Bayesian models5, with various packages.
- − Visualization: R’s graphics capabilities enable the production of publication-quality plots that effectively communicate findings29.
- − Extensibility: R integrates well with other tools, enabling advanced techniques such as robust variance estimation and automated data extraction27.
Limitations
- − Learning curve: R requires programming knowledge, which can be a barrier for beginners3.
- − Data quality: the accuracy of a meta-analysis depends on the quality of the input data. Variability in study designs and reporting can introduce bias19.
- − Publication bias: even with diagnostic tools like funnel plots, publication bias remains a concern that must be addressed through careful sensitivity analyses11.
Recent advances and future directions
As discussed earlier, recent developments have further refined meta-analytic methods:
- − Bayesian meta-analysis: incorporating prior information offers more nuanced uncertainty quantification25.
- − NMA: this approach is increasingly popular for comparing multiple interventions simultaneously20.
- − Cumulative meta-analysis: analyzing how evidence accumulates over time can reveal when an intervention’s effect stabilizes20.
- − IPD meta-analysis: although resource-intensive, IPD meta-analysis remains the gold standard for detailed subgroup analyses23,24.
- − Automated data extraction: emerging natural language processing techniques are beginning to streamline the data extraction process27.
Practical recommendations for researchers
- − Data collection and management: carefully extract and code effect sizes and variances while maintaining a detailed codebook for transparency27.
- − Model selection: use heterogeneity statistics (Q, I2, τ2) to choose between fixed- or random-effects models and perform subgroup and sensitivity analyses.
- − Diagnostic testing: employ forest and funnel plots alongside formal tests to assess heterogeneity and publication bias11,18.
- − Specialized approaches: consider network, cumulative, or IPD meta-analyses for complex research questions.
- − Reproducibility: thoroughly document your code and consider sharing it via platforms like GitHub28.
- − Stay current: engage with ongoing research and methodological updates to refine your meta-analytic techniques3.
Applications in urology
In order to illustrate how the methodological procedures described in this manuscript can be applied, specific examples based on actual urological evidence are presented below, structured with published data and minimal reconstructions of summarized effects. This allows us to show how to reproduce analyses even when the individual data from each study are not available, facilitating a quantitative synthesis applicable to clinical practice.
Antibiotic prophylaxis in cystoscopy
Study: efficacy of antibiotic prophylaxis in cystoscopy to prevent urinary tract infection (UTI) – García-Perdomo et al., 201527. Total patients included: 3,038. Primary outcome (UTI): relative risk (RR) = 0.53 (95% CI: 0.31-0.90). Asymptomatic bacteriuria: RR = 0.28 (95% CI: 0.20-0.39). Since the primary data per arm are not available, a dataset can be constructed based on summary effects, transforming the published RRs to log(RR) and estimating the variance from the CIs. For example, in R:
library(metafor)
data <- data.frame(
  study = c("García-Perdomo 2015 – UTI", "García-Perdomo 2015 – Bacteriuria"),
  yi = c(log(0.53), log(0.28)),
  sei = c((log(0.90) - log(0.31)) / 3.92, (log(0.39) - log(0.20)) / 3.92)
)
data$vi <- data$sei^2
print(data)
This dataset allows you to reproduce a simplified meta-analysis with rma(), evaluate consistency of results, and demonstrate the quantitative synthesis process. An RR < 1 indicates a protective effect (risk reduction), although its clinical relevance should be interpreted with caution.
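A minimal sketch of that simplified synthesis, rebuilding the same dataset: the effects are pooled on the log scale and back-transformed to the RR scale with predict(..., transf = exp). (Pooling the two rows is purely illustrative, since they are different outcomes from the same review.)

```r
library(metafor)

# Reconstructed log(RR) effects and variances, as described above
data <- data.frame(
  study = c("García-Perdomo 2015 – UTI", "García-Perdomo 2015 – Bacteriuria"),
  yi  = c(log(0.53), log(0.28)),
  sei = c((log(0.90) - log(0.31)) / 3.92, (log(0.39) - log(0.20)) / 3.92)
)
data$vi <- data$sei^2

# Pool on the log scale, then back-transform to the RR scale
res <- rma(yi, vi, data = data, method = "DL")
predict(res, transf = exp)  # pooled RR with its confidence interval
```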
Predictors of ureteral stent failure in malignant obstruction
Study: Predictors for failure of endoscopic ureteric stenting in malignant ureteric obstruction – Guachetá-Bomba, Echeverría-García, and García-Perdomo (BJU Int., 2021)28. Total number of patients: 761; 30-day stent failure rate: 32% (95% CI: 21-45%). Reported risk factors include bladder invasion/trigone deformity (HR = 4.8; 95% CI: 1.28-8.5), severe hydronephrosis (HR = 3.92; 95% CI: 0.32-7.52), extensive tumor involvement (HR = 2.1; 95% CI: 1.1-3.9), and elevated creatinine (> 2 mg/dL) (HR = 1.7; 95% CI: 1.04-2.80). With these data, a dataset of summary effects (log-HR) and estimated variances can be constructed for each predictor:
library(metafor)
data_stent <- data.frame(
  predictor = c("InvasionTrigono", "HidronefrosisSevera", "CompromisoTumoral", "CreatininaElevada"),
  yi = c(log(4.8), log(3.92), log(2.1), log(1.7)),
  sei = c(
    (log(8.5) - log(1.28)) / 3.92,
    (log(7.52) - log(0.32)) / 3.92,
    (log(3.9) - log(1.1)) / 3.92,
    (log(2.80) - log(1.04)) / 3.92
  )
)
data_stent$vi <- data_stent$sei^2
print(data_stent)
This approach allows for exploratory meta-analysis by predictor, estimation of combined log-HRs, and evaluation of the robustness and heterogeneity of the results. A positive log-HR indicates an elevated risk of stent failure associated with the predictor.
Conclusion
Meta-analysis is key to synthesizing evidence and obtaining reliable conclusions. R, with its multiple packages (metafor, netmeta, meta, dmetar, bayesmeta, among others), provides a robust and reproducible environment for these studies. This manuscript reviews standard and advanced approaches with illustrative examples and graphics. Methods such as network, cumulative, IPD, Bayesian, and multivariate meta-analysis allow complex questions to be answered more precisely. Following best practices and updating methodologies ensures rigor, transparency, and better evidence-based decisions.
Funding
The authors declare that this work was carried out with the authors’ own resources.
Conflicts of interest
The authors declare that they have no conflicts of interest.
Ethical considerations
Protection of human subjects and animals. The authors declare that no experiments on humans or animals were performed for this research.
Confidentiality, informed consent, and ethical approval. This study does not involve personal patient data, medical records, or biological samples, and does not require ethical approval. SAGER guidelines do not apply.
Declaration on the use of artificial intelligence (AI). The authors declare that no generative artificial intelligence was used in the writing or creation of the content of this manuscript.
