Scott Alexander Malec Assistant Professor

Research Identifiers

ORCID iD: https://orcid.org/0000-0003-1696-1781

Keywords

Literature-based discovery; causal modeling; causal feature selection

Funding (4)

Using the literature to build causal models of retrospective observational data ✓ NIH

2023-08 to 2026-07 | Award

United States National Library of Medicine (Bethesda, Maryland, US)

Homepage URL: https://reporter.nih.gov/search/agkD0LqiWk-OD1wThSJt3Q/project-details/10879451

GRANT_NUMBER: R00LM013367

Organization identifiers

United States National Library of Medicine: Bethesda, Maryland, US

📄 Project Abstract (from NIH)

Health data contain a wealth of information for research. Health data, such as those found in electronic health
records (EHRs), allow for the identification of links between health events, such as drug exposures and
side-effects. Some of these links indicate stable dependencies deemed as causes. Causal insight allows
reverse-engineering disease. If confounding is not addressed, it will be difficult to distinguish causative from
correlative links. Our approach is to identify confounders explicitly. Graphical causal models (GCMs) can discover
causal links from data and prior knowledge. GCMs summarize causal links between variables. Automated
selection of variables would allow GCMs to scale and yield more insight from data. Literature-based discovery
(LBD) methods were developed to identify links between concepts in the literature. Advanced methods permit
the search for concepts linked to each other through specific verbs, e.g., “causes”, “treats”. Our hypothesis is
that we can exploit structured knowledge extracted from the literature to inform GCMs. In prior work, we found
that LBD + GCM was better at identifying side-effects in EHR data than traditional methods. Compared to
methods that use data alone, we hypothesize that our method will increase the ability to detect causal
relationships from EHR data. The first aim is to determine the extent to which LBD-informed GCM improves the
identification of causal links for drug safety. We will build LBD-informed GCMs using publicly available
reference datasets for drug safety. These reference datasets contain drug/side-effect pairs for performance
benchmarking. (A) Test the ability of GCM algorithms to identify known causal links solely using data. We will
systematically evaluate GCM algorithms based on their ability to re-discover causal links in a reference
standard. Results will guide our studies on how GCM can be tuned. (B) Determine the effect of adding different
subsets of LBD-derived information to GCMs on identifying drug side-effects. We will build causal models using
increasing numbers of confounders. The second aim is to test the ability of LBD built with disease-specific
literature to improve the relevance of LBD-derived confounders for Alzheimer's Disease (AD). We chose AD for
its high prevalence and relative lack of effective pharmacologic treatment. (A) Compare LBD strategies in a
disease-specific setting. We will test LBD variants using disease-specific literature or with LBD lacking
subject-matter restrictions. (B) Define the ability of robust LBD-informed GCM to validate drug repurposing candidates
for treating AD symptoms. We will test the ability of advanced methods to iteratively resolve latent
confounding, when detected, to improve effect estimates. The fulfillment of these aims will yield new methods
to combine insights from the literature with causal modeling to uncover causal relationships of drug exposures
on adverse events and on beneficial outcomes.
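
The idea of using literature-derived links to nominate confounders can be illustrated with a minimal sketch. It assumes that extracted knowledge is available as (subject, predicate, object) triples and treats any concept asserted to cause both the exposure and the outcome as a candidate confounder. The triples, the placeholder concept names, and the function name `candidate_confounders` are hypothetical and chosen for illustration; this is not the project's actual pipeline.

```python
from collections import defaultdict

# Hypothetical toy predications (subject, predicate, object); not real findings.
PREDICATIONS = [
    ("condition_a", "CAUSES", "drug_x"),    # condition_a leads to exposure
    ("condition_a", "CAUSES", "event_y"),   # ...and also to the outcome
    ("condition_b", "CAUSES", "event_y"),   # causes the outcome only
    ("drug_x", "TREATS", "condition_c"),    # different predicate, ignored
    ("drug_x", "CAUSES", "event_y"),        # the link under study
]


def candidate_confounders(predications, exposure, outcome, predicate="CAUSES"):
    """Return concepts asserted to cause both the exposure and the outcome."""
    effects = defaultdict(set)  # cause -> set of asserted effects
    for subj, pred, obj in predications:
        if pred == predicate:
            effects[subj].add(obj)
    return sorted(
        concept
        for concept, objs in effects.items()
        if exposure in objs
        and outcome in objs
        and concept not in (exposure, outcome)
    )


if __name__ == "__main__":
    print(candidate_confounders(PREDICATIONS, "drug_x", "event_y"))
```

In this toy input only `condition_a` qualifies, because it is the only concept with a `CAUSES` link to both `drug_x` and `event_y`.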

👤 Principal Investigator(s) (from NIH)

Scott Alexander Malec

🏛️ Recipient Organization (from NIH)

UNIVERSITY OF NEW MEXICO HEALTH SCIS CTR (ALBUQUERQUE, NM, UNITED STATES)

📅 Project Dates (from NIH)

Start: 2021-08-01T00:00:00
End: 2027-07-31T00:00:00

💰 Award Amount (from NIH)

$248,671

📊 Fiscal Year (from NIH)

2025

🏷️ Activity Code (from NIH)

R00

🔢 Project Number (from NIH)

5R00LM013367-05

Added

2023-11-16

Last modified

2023-11-16

Source:

Scott Alexander Malec | ✓ Enriched from NIH

Using the literature to build causal models of retrospective observational data ✓ NIH

2021-08-01 to 2023-07-31 | Grant

United States National Library of Medicine (Bethesda, US)

Homepage URL: https://app.dimensions.ai/details/grant/grant.9844339

GRANT_NUMBER: K99LM013367

Organization identifiers

United States National Library of Medicine: Bethesda, US

Funding project translated title

Funding project translated title (en)

Using the literature to build causal models of retrospective observational data

👤 Principal Investigator(s) (from NIH)

Scott Alexander Malec

🏛️ Recipient Organization (from NIH)

UNIVERSITY OF PITTSBURGH AT PITTSBURGH (PITTSBURGH, PA, UNITED STATES)

📅 Project Dates (from NIH)

Start: 2021-08-01T00:00:00
End: 2023-07-31T00:00:00

💰 Award Amount (from NIH)

$68,663

📊 Fiscal Year (from NIH)

2022

🏷️ Activity Code (from NIH)

K99

🔢 Project Number (from NIH)

5K99LM013367-02

Added

2022-07-20

Last modified

2022-07-20

Source:

DimensionsWizard via Scott Alexander Malec | ✓ Enriched from NIH

Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance

2013-09-01 to 2017-08-31 | Grant

United States National Library of Medicine (n/a, US)

Homepage URL: https://grants.uberresearch.com/100000092/R01LM011563/Using-Biomedical-Knowledge-to-Identify-Plausible-Signals-for-Pharmacovigilance

GRANT_NUMBER: grant.R01LM011563

Organization identifiers

United States National Library of Medicine: n/a, US

Funding project translated title

Funding project translated title (en)

Using Biomedical Knowledge to Identify Plausible Signals for Pharmacovigilance

Added

2018-02-06

Last modified

2018-02-06

Source:

DimensionsWizard via Scott Alexander Malec

NLM Training Program in Biomedical Informatics & Data Science for Predoctoral and Postdoctoral Fellows

1992-07-01 to 2018-06-30 | Grant

United States National Library of Medicine (n/a, US)

Homepage URL: https://grants.uberresearch.com/100000092/T15LM007093/NLM-Training-Program-in-Biomedical-Informatics-Data-Science-for-Predoctoral-and-Postdoctoral-Fellows

GRANT_NUMBER: grant.T15LM007093

Organization identifiers

United States National Library of Medicine: n/a, US

Funding project translated title

Funding project translated title (en)

NLM Training Program in Biomedical Informatics & Data Science for Predoctoral and Postdoctoral Fellows

Added

2018-02-06

Last modified

2018-02-06

Source:

DimensionsWizard via Scott Alexander Malec

Education and qualifications (4)

University of Pittsburgh: Pittsburgh, PA, US

2018-08-01 to 2023-07-31 | Postdoctoral Scholar (Department of Biomedical Informatics)

Education

Department

Department of Biomedical Informatics

Added

2018-09-20

Last modified

2024-09-05

Source: Scott Alexander Malec

University of Texas Health Science Center at Houston: Houston, Texas, US

2015-08-25 to 2018-08-15 | PhD (School of Biomedical Informatics)

Education

Department

School of Biomedical Informatics

Added

2018-02-06

Last modified

2018-09-20

Source: Scott Alexander Malec

Carnegie Mellon University: Pittsburgh, PA, US

2005-08-26 to 2010-05-15 | MSIT (CMU)

Education

Organization identifiers

RINGGOLD: 6612

Carnegie Mellon University : Pittsburgh, PA, US

Department

CMU

Added

2018-02-06

Last modified

2018-02-06

Source: Scott Alexander Malec

University of Pittsburgh: Pittsburgh, PA, US

2002-01-01 to 2003-12-13 | MLIS (Department of Library and Information Science)

Education

Organization identifiers

RINGGOLD: 6614

University of Pittsburgh : Pittsburgh, PA, US

Department

Department of Library and Information Science

Added

2018-02-06

Last modified

2018-02-06

Source: Scott Alexander Malec

VarEx: A Large Language Model Pipeline for Automated Extraction of Exposures, Outcomes, and Covariates from Epidemiologic Studies

2026-06-15 | Preprint

DOI: 10.64898/2026.06.13.26355589

Contributors: Manjil M. Pradhan; Rajesh Upadhayaya; Scott A. Malec

Homepage URL

https://doi.org/10.64898/2026.06.13.26355589

Contributors

Manjil M. Pradhan (Author)

Rajesh Upadhayaya (Author)

Scott A. Malec (Author)

External identifiers

DOI: 10.64898/2026.06.13.26355589

Added

2026-06-16

Last modified

2026-06-18

Source:

Crossref

CausalKnowledgeTrace: A Novel Computational Framework for Automated Literature-Based Causal Graph Construction and Evidence-Based Variable Selection in Biomedical Research

2026-05-12 | Preprint

DOI: 10.64898/2026.05.07.723601

Contributors: Rajesh Upadhayaya; Manjil Pradhan; Vincent Metzger; Scott Alexander Malec

Homepage URL

http://dx.doi.org/10.64898/2026.05.07.723601

Contributors

Rajesh Upadhayaya (Author)

Manjil Pradhan (Author)

Vincent Metzger (Author)

Scott Alexander Malec (Author)

External identifiers

DOI: 10.64898/2026.05.07.723601

Abstract

Background
Variable selection for causal inference from observational biomedical data is challenging, as overlooking confounders or conditioning on colliders leads to biased estimates. While vast causal knowledge exists in biomedical literature, manually extracting this information for principled variable selection is impractical at scale.

Methods
We developed CausalKnowledgeTrace, a Python-based computational framework with a Django web interface that systematically leverages structured causal knowledge from the Semantic MEDLINE Database (SemMedDB) to inform variable selection in causal studies. The system implements a six-stage analysis pipeline using NetworkX for graph operations: graph parsing, basic analysis, comprehensive cycle detection, systematic generic node removal, post-removal analysis, and formal causal inference with bias detection.
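
As a rough illustration of structural role identification in a causal graph, the sketch below labels nodes as confounders, mediators, or colliders relative to one exposure and one outcome. It is a simplification under stated assumptions: it looks only at direct edges, uses plain Python dictionaries rather than NetworkX, and the function name `classify_roles` and the toy edge list are hypothetical. The criteria used by CausalKnowledgeTrace itself may differ.

```python
from collections import defaultdict


def classify_roles(edges, exposure, outcome):
    """Label nodes by their direct-edge relationship to exposure and outcome.

    confounder: node -> exposure and node -> outcome
    mediator:   exposure -> node and node -> outcome
    collider:   exposure -> node and outcome -> node
    """
    children = defaultdict(set)  # node -> nodes it points to
    nodes = set()
    for cause, effect in edges:
        children[cause].add(effect)
        nodes.update((cause, effect))

    roles = {"confounder": [], "mediator": [], "collider": []}
    for node in sorted(nodes - {exposure, outcome}):
        if exposure in children[node] and outcome in children[node]:
            roles["confounder"].append(node)
        if node in children[exposure] and outcome in children[node]:
            roles["mediator"].append(node)
        if node in children[exposure] and node in children[outcome]:
            roles["collider"].append(node)
    return roles


# Hypothetical toy graph: x is the exposure, y is the outcome.
TOY_EDGES = [
    ("c", "x"), ("c", "y"),   # c is a common cause
    ("x", "m"), ("m", "y"),   # m lies on the path x -> m -> y
    ("x", "k"), ("y", "k"),   # k is a common effect
    ("x", "y"),
]

if __name__ == "__main__":
    print(classify_roles(TOY_EDGES, "x", "y"))
```

A fuller treatment would consider ancestors and descendants rather than direct neighbors only, and would need to handle the cycles that literature-derived graphs tend to contain.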

Results
Analysis of the hypertension-Alzheimer’s relationship across three degree neighborhoods (1-3) demonstrated systematic scaling of causal complexity: 361-866 variables, 429-1,442 relationships, with graph densities of 0.0033-0.0019. The analysis revealed complex cyclic structures with 54-606 baseline cycles across degree levels. Processing times ranged from 0.3 to 1.0 seconds across all three degrees, demonstrating computational efficiency for complex biomedical networks. Key confounders identified across all degrees included inflammation, diabetes, insulin resistance, obesity, and ischemia. In the third-degree graph, the pipeline structurally identified 39 confounders, 11 mediators, and 3 colliders. All of the key identified confounders and mediators, including obesity, oxidative stress, ischemia, and vascular diseases, were found to have strong supporting evidence in established epidemiological and pathophysiological literature.
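
The reported densities can be checked with the standard density formula for a simple directed graph, edges / (nodes × (nodes − 1)). The sketch below assumes that "variables" and "relationships" are node and edge counts, and that the range endpoints pair up as (361, 429) for the first degree and (866, 1,442) for the third; both are assumptions, since the abstract reports only ranges. The function name `directed_density` is hypothetical.

```python
def directed_density(n_nodes, n_edges):
    """Density of a simple directed graph: edges over possible ordered pairs."""
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1))


if __name__ == "__main__":
    # Assumed pairing of the range endpoints reported in the abstract.
    for nodes, edges in [(361, 429), (866, 1442)]:
        print(nodes, edges, round(directed_density(nodes, edges), 4))
```

Under this pairing the formula gives values that round to 0.0033 and 0.0019, matching the reported range.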

Conclusions
CausalKnowledgeTrace provides a scalable, evidence-based approach to causal graph construction that systematically identifies confounders and bias structures often missed by conventional approaches. The Python-Django architecture enables both standalone analysis and integration into larger computational workflows, representing a significant advance in computational support for causal inference in biomedical research.

Statement of Significance

Problem or Issue
Selecting proper confounders and variables for causal inference from observational biomedical datasets is challenging and often biased by limited expertise or manual review.

What is Already Known
Existing approaches rely on domain experts, statistical variable screening, or manual construction of causal graphs, but these often overlook literature-documented confounders and complex biases.

What this Paper Adds
This paper introduces an automated, literature-based framework for synthesizing and validating causal graphs, identifying critical variables and complex bias structures, such as M-bias and butterfly bias, with full evidentiary traceability.

Who would benefit from the new knowledge in this paper?
Epidemiologists, biomedical researchers, informaticians, and clinical investigators seeking reliable and transparent causal modeling for observational studies.

Added

2026-05-25

Last modified

2026-05-25

Source:

Scott Alexander Malec

Detecting Uncoded Self-Harm in Veterans' Electronic Health Records Using Positive and Unlabeled Learning: Retrospective Observational Study.

Journal of Medical Internet Research

2026-04-19 | Journal article

DOI: 10.2196/89071

Contributors: Kumar P; Viszolay AD; Upadhayaya R; Moomtaheen F; Greer DR (and 12 more)

Homepage URL

https://doi.org/10.2196/89071

Contributors

Kumar P (Author) [ORCID: 0000-0002-4981-9020]

Viszolay AD (Author)

Upadhayaya R (Author) [ORCID: 0009-0000-3045-1089]

Moomtaheen F (Author)

Greer DR (Author)

Bologa CG (Author)

Schneider KA (Author)

Davis SE (Author)

Matheny ME (Author) [ORCID: 0000-0003-3217-4147]

van der Goes D (Author)

Villarreal G (Author)

Zhu Y (Author)

Tohen M (Author)

Malec SA (Author)

Yang JJ (Author)

Fielstein EM (Author)

Lambert CG (Author)

External identifiers

PMID: 42001435

DOI: 10.2196/89071

Abstract

Background
Suicide and self-harm remain major public health concerns in the United States. Early identification is critical for effective intervention, yet underdiagnosis and undercoding are common across mental health conditions, and only positive cases are typically labeled in healthcare data. As a result, reliable negative examples are missing. Positive and Unlabeled (PU) learning is well-suited to such data, enabling estimation of phenotype prevalence and identification of undiagnosed individuals at elevated risk for self-harm as well as other mental illnesses.

Objective
To identify U.S. Veterans whose self-harm events were not explicitly captured through diagnostic codes in electronic health records (EHRs) and estimate the prevalence of ever self-harm cases among Veterans using a novel PU learning algorithm applicable to undetected mental health diagnoses.

Methods
We performed a retrospective observational study using Veterans Health Administration EHRs (from October 1, 1999 to August 31, 2019), selecting a random 25% sample of 1,329,120 Veterans out of 5,316,480 (1,193,563 males and 135,557 females) with at least 2 years of observation. The study cohort comprised 24,625 Veterans with coded self-harm and 1,304,495 uncoded for self-harm, with the mean age for coded individuals 38.39 (SD 12.17) and uncoded individuals 48.76 (SD 15.04). We applied our PULSNAR (Positive Unlabeled Learning Selected Not At Random) algorithm to estimate the proportion of individuals with uncoded self-harm. The selected covariates included age at enrollment and the presence or absence of recorded medical conditions, procedures, and clinical observations throughout the observation period. Four experts (raters) independently reviewed charts of 97 uncoded Veterans, each selected from 1% intervals of calibrated PULSNAR probabilities from 0.01 to 0.97. Agreement was assessed among raters, PULSNAR classifications, and consensus review decisions. Post hoc calibration was used to refine prevalence estimates.

Results
Of the 159,049 covariates in the dataset, the XGBoost model within our PULSNAR framework identified 1,302 (0.82%) as informative for classification. Only 1.85% (24,625/1,329,120) of Veterans had diagnostic codes indicating self-harm events, while PULSNAR estimated an overall prevalence of 10.46% (139,026/1,329,120) by identifying an additional α = 8.77% (114,404/1,304,495) of self-harm cases among the uncoded population. Of the 97 chart-reviewed patients, 39 had documented but uncoded self-harm. PULSNAR estimates were post hoc calibrated such that their sum over the 97 cases equaled 39, which resulted in PULSNAR adjusted coded and imputed estimation of 7.91% (105,133/1,329,120). When applied to the 1.3M Veterans, PULSNAR suggests that coded self-harm represents only 23.4% (95% CI: 17.76% to 31.51%) of all documented (coded + notes) self-harm.

Conclusions
Under the Selected Not At Random assumption, PULSNAR provides an innovative and scalable framework for estimating the clinically documented prevalence of mental health conditions and identifying the uncoded individuals with calibrated prediction, without requiring confirmed negative labels. This method offers an alternative to time-consuming chart reviews for detecting likely cases missing structured coding capture. By addressing diagnostic undercoding of mental health conditions in EHRs, this approach has the potential to enhance the estimation of mental health prevalence and support screening, activation of automated clinical decision support, targeted intervention, better resource allocation, and research to improve outcomes in real-world settings.
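
The headline percentages in this abstract follow from the reported counts by simple division, which the short sketch below reproduces. The counts are taken from the abstract; the constant and function names are hypothetical, and the last ratio (coded cases over the calibrated total of 105,133) is an assumed reading of how the 23.4% figure was obtained.

```python
# Counts reported in the abstract.
TOTAL_VETERANS = 1_329_120
CODED = 24_625
UNCODED = TOTAL_VETERANS - CODED        # 1,304,495 Veterans without a code
ESTIMATED_UNCODED_CASES = 114_404       # PULSNAR estimate before calibration
CALIBRATED_TOTAL_CASES = 105_133        # coded + imputed, after calibration


def percent(numerator, denominator, digits=2):
    """Return numerator / denominator as a rounded percentage."""
    return round(100 * numerator / denominator, digits)


if __name__ == "__main__":
    print("coded prevalence (%):", percent(CODED, TOTAL_VETERANS))
    print("uncoded cases among uncoded (%):",
          percent(ESTIMATED_UNCODED_CASES, UNCODED))
    print("calibrated prevalence (%):",
          percent(CALIBRATED_TOTAL_CASES, TOTAL_VETERANS))
    # Assumed derivation of the share of documented self-harm that is coded.
    print("coded share of documented (%):",
          percent(CODED, CALIBRATED_TOTAL_CASES, digits=1))
```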

Added

2026-04-27

Last modified

2026-04-27

Source:

Scott Alexander Malec

Computational Tools for Target Illumination and Early Stage Drug Discovery: DrugCentral, Badapple, Smartsfilter, TICTAC, TIN-X, TIGA, CKT, and more…

Zenodo

2026-03-08 | Conference poster

DOI: 10.5281/zenodo.18905711

Contributors: Jack Ringer; Bivek Sharma Panthi; Bat Ochir Artur; Jeremiah I Abok; Rajesh Upadhayaya (and 9 more)

Contributors

Jack Ringer (Author) [ORCID: 0009-0008-8493-0139]

Bivek Sharma Panthi (Author)

Bat Ochir Artur (Author)

Jeremiah I Abok (Author) [ORCID: 0000-0003-0119-9181]

Rajesh Upadhayaya (Author) [ORCID: 0009-0000-3045-1089]

Manjil M Pradhan (Author)

Vincent T Metzger (Author) [ORCID: 0000-0002-8041-0370]

Scott Alexander Malec (Author) [ORCID: 0000-0003-1696-1781]

Ian Watson (Author) [ORCID: 0000-0002-3086-9845]

Kerry Fowler (Author)

Cristian G Bologa (Author) [ORCID: 0000-0003-2232-4244]

Tudor I Oprea (Author) [ORCID: 0000-0002-6195-6976]

Christophe Gerard Lambert (Author) [ORCID: 0000-0003-1994-2893]

Jeremy J Yang (Author) [ORCID: 0000-0002-1476-6192]

External identifiers

DOI: 10.5281/zenodo.18905711

Abstract

Presented at the OpenEye - Cadence Molecular Sciences - CUP XXV Meeting, Santa Fe, New Mexico, March 10-12, 2026. New and improved and empowered web apps, APIs, and other computational tools, for target illumination and other early stage drug discovery applications, developed by the Translational Informatics Division, UNM School of Medicine Department of Internal Medicine, with support from many academic, government, and industrial colleagues.

Added

2026-03-08

Last modified

2026-03-08

Source:

DataCite

Predicting Alzheimer’s Disease Diagnosis, a Decade or more Years before Onset using the Electronic Health Record and Random Forest Machine Learning Models

2025-11-06 | Preprint

DOI: 10.1101/2025.11.04.25338396

Contributors: Sanya B. Taneja; Richard D. Boyce; Scott A. Malec; Steven M. Albert; C. Elizabeth Shaaban (and 9 more)

Homepage URL

https://doi.org/10.1101/2025.11.04.25338396

Contributors

Sanya B. Taneja (Author)

Richard D. Boyce (Author)

Scott A. Malec (Author)

Steven M. Albert (Author)

C. Elizabeth Shaaban (Author)

Arthur S. Levine (Author)

Paul Munro (Author)

Jiang Bian (Author)

Jie Xu (Author)

Demetrius Maraganore (Author)

Karen Schliep (Author)

Jonathan C. Silverstein (Author)

Michelle Kienholz (Author)

Helmet T. Karim (Author)

External identifiers

DOI: 10.1101/2025.11.04.25338396

Added

2025-11-07

Last modified

2025-11-12

Source:

Crossref

Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations

2025-10-12 | Conference paper

DOI: 10.1145/3765612.3767225

Contributors: Praveen Kumar; Vincent T. Metzger; Scott Alexander Malec

Homepage URL

https://doi.org/10.1145/3765612.3767225

Contributors

Praveen Kumar (Author)

Vincent T. Metzger (Author)

Scott Alexander Malec (Author)

External identifiers

DOI: 10.1145/3765612.3767225

Added

2025-12-10

Last modified

2025-12-10

Source:

Crossref

Detecting Opioid Use Disorder in Health Claims Data With Positive Unlabeled Learning

IEEE Journal of Biomedical and Health Informatics

2025-02 | Journal article

DOI: 10.1109/JBHI.2024.3515805

Contributors: Praveen Kumar; Fariha Moomtaheen; Scott A. Malec; Jeremy J. Yang; Cristian G. Bologa (and 9 more)

Homepage URL

https://doi.org/10.1109/JBHI.2024.3515805

Contributors

Praveen Kumar (Author)

Fariha Moomtaheen (Author)

Scott A. Malec (Author)

Jeremy J. Yang (Author)

Cristian G. Bologa (Author)

Kristan A Schneider (Author)

Yiliang Zhu (Author)

Mauricio Tohen (Author)

Gerardo Villarreal (Author)

Douglas J. Perkins (Author)

Elliot M. Fielstein (Author)

Sharon E. Davis (Author)

Michael E. Matheny (Author)

Christophe G. Lambert (Author)

External identifiers

DOI: 10.1109/JBHI.2024.3515805

Added

2025-02-10

Last modified

2025-02-19

Source:

Crossref

Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations

arXiv

2025 | Other

DOI: 10.48550/arXiv.2505.21824

Contributors: Kumar, P.; Metzger, V.T.; Malec, S.A.

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-105008088523&partnerID=MN8TOARS

Contributors

Kumar, P. (Author)

Metzger, V.T. (Author)

Malec, S.A. (Author)

External identifiers

DOI: 10.48550/arXiv.2505.21824

EID: 2-s2.0-105008088523

ISSN: 23318422

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

An open source knowledge graph ecosystem for the life sciences

Scientific Data

2024 | Journal article

DOI: 10.1038/s41597-024-03171-w

Contributors: Callahan, T.J.; Tripodi, I.J.; Stefanski, A.L.; Cappelletti, L.; Taneja, S.B. (and 27 more)

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-85190165835&partnerID=MN8TOARS

Contributors

Callahan, T.J. (Author)

Tripodi, I.J. (Author)

Stefanski, A.L. (Author)

Cappelletti, L. (Author)

Taneja, S.B. (Author)

Wyrwa, J.M. (Author)

Casiraghi, E. (Author)

Matentzoglu, N.A. (Author)

Reese, J. (Author)

Silverstein, J.C. (Author)

Hoyt, C.T. (Author)

Boyce, R.D. (Author)

Malec, S.A. (Author)

Unni, D.R. (Author)

Joachimiak, M.P. (Author)

Robinson, P.N. (Author)

Mungall, C.J. (Author)

Cavalleri, E. (Author)

Fontana, T. (Author)

Valentini, G. (Author)

Mesiti, M. (Author)

Gillenwater, L.A. (Author)

Santangelo, B. (Author)

Vasilevsky, N.A. (Author)

Hoehndorf, R. (Author)

Bennett, T.D. (Author)

Ryan, P.B. (Author)

Hripcsak, G. (Author)

Kahn, M.G. (Author)

Bada, M. (Author)

Baumgartner, W.A. (Author)

Hunter, L.E. (Author)

External identifiers

DOI: 10.1038/s41597-024-03171-w

EID: 2-s2.0-85190165835

ISSN: 20524463

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

An Open-Source Knowledge Graph Ecosystem for the Life Sciences

arXiv

2023 | Preprint

DOI: 10.48550/ARXIV.2307.05727

Contributors: Tiffany J. Callahan; Ignacio J. Tripodi; Adrianne L. Stefanski; Luca Cappelletti; Sanya B. Taneja (and 27 more)

Homepage URL

https://arxiv.org/abs/2307.05727

Contributors

Tiffany J. Callahan (Author)

Ignacio J. Tripodi (Author)

Adrianne L. Stefanski (Author)

Luca Cappelletti (Author)

Sanya B. Taneja (Author)

Jordan M. Wyrwa (Author)

Elena Casiraghi (Author)

Nicolas A. Matentzoglu (Author)

Justin Reese (Author)

Jonathan C. Silverstein (Author)

Charles Tapley Hoyt (Author)

Richard D. Boyce (Author)

Scott A. Malec (Author)

Deepak R. Unni (Author)

Marcin P. Joachimiak (Author)

Peter N. Robinson (Author)

Christopher J. Mungall (Author)

Emanuele Cavalleri (Author)

Tommaso Fontana (Author)

Giorgio Valentini (Author)

Marco Mesiti (Author)

Lucas A. Gillenwater (Author)

Brook Santangelo (Author)

Nicole A. Vasilevsky (Author)

Robert Hoehndorf (Author)

Tellen D. Bennett (Author)

Patrick B. Ryan (Author)

George Hripcsak (Author)

Michael G. Kahn (Author)

Michael Bada (Author)

William A. Baumgartner (Author)

Lawrence E. Hunter (Author)

External identifiers

DOI: 10.48550/ARXIV.2307.05727

Abstract

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to automatically construct them. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluate the ecosystem by surveying open-source KG construction methods and analyzing its computational performance when constructing 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Added

2023-11-16

Last modified

2025-02-19

Source:

Scott Alexander Malec

Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease

Journal of Biomedical Informatics

2023 | Journal article

DOI: 10.1016/j.jbi.2023.104368

Contributors: Malec, S.A.; Taneja, S.B.; Albert, S.M.; Elizabeth Shaaban, C.; Karim, H.T. (and 4 more)

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-85160204184&partnerID=MN8TOARS

Contributors

Malec, S.A. (Author)

Taneja, S.B. (Author)

Albert, S.M. (Author)

Elizabeth Shaaban, C. (Author)

Karim, H.T. (Author)

Levine, A.S. (Author)

Munro, P. (Author)

Callahan, T.J. (Author)

Boyce, R.D. (Author)

External identifiers

DOI: 10.1016/j.jbi.2023.104368

EID: 2-s2.0-85160204184

ISSN: 15320464

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: a use case studying depression as a risk factor for Alzheimer's disease

bioRxiv

2022 | Other

DOI: 10.1101/2022.07.18.500549

Contributors: Malec, S.A.; Taneja, S.B.; Albert, S.M.; Shaaban, C.E.; Karim, H.T. (and 4 more)

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-85136931452&partnerID=MN8TOARS

Contributors

Malec, S.A. (Author)

Taneja, S.B. (Author)

Albert, S.M. (Author)

Shaaban, C.E. (Author)

Karim, H.T. (Author)

Levine, A.S. (Author)

Munro, P. (Author)

Callahan, T.J. (Author)

Boyce, R.D. (Author)

External identifiers

DOI: 10.1101/2022.07.18.500549

EID: 2-s2.0-85136931452

ISSN: 26928205

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

Does clinical data capture modifiable midlife risk factors for Alzheimer’s disease?

Alzheimer's & Dementia

2021-12 | Journal article

DOI: 10.1002/alz.055756

Contributors: C Elizabeth Shaaban; Sanya B Taneja; Kailyn F Witonsky; Scott A Malec; Helmet T Karim (and 5 more)

Homepage URL

http://dx.doi.org/10.1002/alz.055756

Contributors

C Elizabeth Shaaban (Author)

Sanya B Taneja (Author)

Kailyn F Witonsky (Author)

Scott A Malec (Author)

Helmet T Karim (Author)

Sheila Pratt (Author)

Arthur S Levine (Author)

Paul Munro (Author)

Richard D Boyce (Author)

Steven M Albert (Author)

External identifiers

DOI: 10.1002/alz.055756

ISSN: 1552-5260

ISSN: 1552-5279

Added

2023-04-12

Last modified

2025-02-19

Source:

Scott Alexander Malec

Exploring Novel Computable Knowledge in Structured Drug Product Labels.

AMIA Joint Summits on Translational Science Proceedings

2020-05 | Journal article

Contributors: Malec SA; Boyce RD

Homepage URL

http://europepmc.org/abstract/med/32477661

Contributors

Malec SA (Author)

Boyce RD (Author)

External identifiers

PMID: 32477661

PMC: PMC7233092

Added

2020-07-08

Last modified

2025-02-19

Source:

Europe PubMed Central

Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance

medRxiv

2020 | Other

DOI: 10.1101/2020.07.08.20113035

Contributors: Malec, S.A.; Bernstam, E.V.; Wei, P.; Boyce, R.D.; Cohen, T.

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-85099418719&partnerID=MN8TOARS

Contributors

Malec, S.A. (Author)

Bernstam, E.V. (Author)

Wei, P. (Author)

Boyce, R.D. (Author)

Cohen, T. (Author)

External identifiers

DOI: 10.1101/2020.07.08.20113035

EID: 2-s2.0-85099418719

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

Literature-Based Discovery of Confounding in Observational Clinical Data

AMIA Annual Symposium Proceedings

2016 | Journal article

Contributors: Malec, S.A.; Wei, P.; Xu, H.; Bernstam, E.V.; Myneni, S. (and 1 more)

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-85028853045&partnerID=MN8TOARS

Contributors

Malec, S.A. (Author)

Wei, P. (Author)

Xu, H. (Author)

Bernstam, E.V. (Author)

Myneni, S. (Author)

Cohen, T. (Author)

External identifiers

EID: 2-s2.0-85028853045

ISSN: 1942597X

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

Literature-Based Discovery of Confounding in Observational Clinical Data.

AMIA Annual Symposium Proceedings

2016 | Journal article

Contributors: Malec SA; Wei P; Xu H; Bernstam EV; Myneni S (and 1 more)

Homepage URL

http://europepmc.org/abstract/med/28269951

Contributors

Malec SA (Author)

Wei P (Author)

Xu H (Author)

Bernstam EV (Author)

Myneni S (Author)

Cohen T (Author)

External identifiers

PMID: 28269951

PMC: PMC5333204

Added

2018-02-06

Last modified

2025-02-19

Source:

Europe PubMed Central

Propp Revisited: Integration of Linguistic Markup into Structured Content Descriptors of Tales

Digital Humanities 2010

2010-07 | Conference paper

Added

2018-02-07

Last modified

2025-02-19

Source:

Scott Alexander Malec

Integration of Linguistic Markup into Semantic Models of Folk Narratives: The Fairy Tale Use Case.

Unpublished

2010 | Conference paper

DOI: 10.13140/2.1.2365.2801

Contributors: Piroska Lendvai; Thierry Declerck; Sándor Darányi; Pablo Gervás; Raquel Hervás (and 2 more)

Contributors

Piroska Lendvai (Author)

Thierry Declerck (Author)

Sándor Darányi (Author)

Pablo Gervás (Author)

Raquel Hervás (Author)

Scott Malec (Author)

Federico Peinado (Author)

External identifiers

DOI: 10.13140/2.1.2365.2801

Added

2018-02-06

Last modified

2025-10-09

Source:

DataCite

Integration of linguistic markup into semantic models of folk narratives: The fairy tale use case

Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010)

2010 | Conference paper

Contributors: Lendvai, P.; Declerck, T.; Darányi, S.; Gervás, P.; Hervás, R. (and 2 more)

Homepage URL

https://www.scopus.com/inward/record.url?eid=2-s2.0-84891136006&partnerID=MN8TOARS

Contributors

Lendvai, P. (Author)

Declerck, T. (Author)

Darányi, S. (Author)

Gervás, P. (Author)

Hervás, R. (Author)

Malec, S. (Author)

Peinado, F. (Author)

External identifiers

EID: 2-s2.0-84891136006

ISBN: 2951740867 9782951740860

Added

2026-06-24

Last modified

2026-06-24

Source:

Scopus - Elsevier

AutoPropp: Toward the Automatic Markup, Classification, and Annotation of Russian Magic Tales

Other

Homepage URL

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.462.2443

Added

2018-02-06

Last modified

2025-02-19

Source:

Scott Alexander Malec
