Independent Study in Biomedical Informatics (ISBDS)
This document provides ideas for research projects and links to research plan templates, which are partially completed plans. Template files are available via the ISBDS course GitHub repository. A research plan template may address any biomedical science topic, but always includes a specific data source, an overall problem statement, and a methodological approach. Students must complete a template to produce a preliminary research plan, which requires approval prior to registration. Advisors are invited to contribute research plan templates in their areas of interest and expertise; these may be based on the generic ISBDS Research Plan Template.
General Suggestions
- Review and analysis of an important public dataset.
- Review and analysis of an important public informatics tool.
- Reproducing and extending a published analysis.
- Building a database from public sources for a biomedical topic of interest.
- Adapt approaches, projects, and learning objectives from an existing MOOC or other online course (e.g. Coursera, edX, Johns Hopkins, Indiana, Stanford, Hasso Plattner), with or without completing the course.
- Respond to an online data science challenge (e.g. Kaggle).
- Building an online app for researchers, clinicians, or patients.
- Create or improve an open source software package.
Bioinformatics
- Network Analysis in Systems Biology (coursera.org)
- Associating genes with diseases.
- Target Illumination GWAS Analytics (TIGA); see paper and repository.
- Knowledge Graph Analytics Platform (KGAP); see paper and repository.
- Biological network visualization, e.g. protein-protein interactions.
- Systems Biology; Metabolic engineering for synthetic biology.
- Structure to function.
- RNAseq data:
- Sequence alignment.
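As a starting point for the sequence alignment topic, the following is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch approach). The function name and the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, not taken from any course template; a real project would more likely use an established library and a substitution matrix.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative defaults (assumptions), with a
    simple linear gap penalty.
    """
    # prev[j] is the best score for aligning the rows of `a` processed
    # so far against the prefix b[:j]. Row 0 aligns "" with b[:j],
    # which costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0 aligns a[:i] with "", which costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Three matches and one gap: 3 * 1 + 1 * (-1) = 2
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Only two rows of the scoring table are kept in memory, which is enough to obtain the score; recovering the alignment itself would require storing the full table or traceback pointers.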
Cheminformatics
- PubChem analysis, descriptive or predictive
- ChEMBL analysis, descriptive or predictive
- DrugCentral analysis, descriptive or predictive
- Badapple analysis, descriptive or predictive
Drug Discovery
- Bioactivity prediction by machine learning (see https://predictor.ncats.io/, https://atomscience.org/, https://drugcentral.org/Redial, https://deepchem.io/).
- TEMPLATE: Homology Modeling (adapted from Intro to Biocomputing Unit 2 Assignment 1 and Assignment 2)
- TEMPLATE: Virtual Screening (adapted from Intro to Biocomputing Unit 3 Assignment 1 and Assignment 2)
- Chemical Predictive Modeling (Abhik Seal).
- Knime for Cheminformatics (Abhik Seal).
Medical Informatics
- OHDSI (Observational Health Data Sciences and Informatics): replicate, vary or extend published studies.
- Open Medical Record System (OpenMRS): the global OpenMRS community works together to build the world’s leading open source enterprise electronic medical record system platform. https://wiki.openmrs.org/
- Clinical Data Analysis in R (Abhik Seal)
Computational modeling
- NetLogo (agent-based modeling)
Public Health & Epidemiology
- Public Health: Big Cities Health Coalition (BCHC) and Big Cities Health Inventory (BCHI)
- Healthcare Cost and Utilization: HCUP-US Databases
- HealthData.gov
- SEER-Medicare Health Outcomes Survey (SEER-MHOS) Linked Data Resource (SEER: Surveillance, Epidemiology, and End Results).
- Medicare Provider Utilization and Payment Data: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
- CORD-19: COVID-19 Open Research Dataset.
- WHO Coronavirus (COVID-19) Dashboard, with vaccination data.
- Johns Hopkins Coronavirus Resource Center.
- EMBL-EBI COVID-19 Data Portal
Fitness, Wellness, & Health
- The Open Artificial Pancreas System project (OpenAPS.org) is an open and transparent effort to make safe and effective basic Artificial Pancreas System (APS) technology widely available, in order to more quickly improve and save as many lives as possible and reduce the burden of Type 1 diabetes. OpenAPS means basic overnight closed loop APS technology is more widely available to anyone with compatible medical devices who is willing to build their own system.
- ResearchKit & CareKit from Apple. CareKit allows developers to build apps that leverage a variety of customizable modules. CareKit apps let users regularly track care plans, monitor their progress, and share their insights with care teams. CareKit is open source, so developers can build upon existing modules and contribute new code to help users worldwide create a bigger and better picture of their health.
Natural language processing (NLP) and text mining
- PubMed named entity recognition (NER); see JensenLab Tools including Tagger.
- Twitter sentiment analysis
- Clustering by topic modeling
- See code and projects from Jason Timm.
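To make the named entity recognition topic concrete, here is a toy sketch of dictionary-based tagging, a common baseline for NER. It is not the JensenLab Tagger and does not use its API; the dictionary contents, the function name `tag_entities`, and the example sentence are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy dictionary (assumption): lowercase surface form -> entity type.
TOY_DICTIONARY = {
    "brca1": "gene",
    "tp53": "gene",
    "breast cancer": "disease",
}


def tag_entities(text, dictionary):
    """Return (start, end, matched_text, entity_type) for each dictionary hit.

    Matching is case-insensitive and restricted to whole words.
    """
    if not dictionary:
        return []
    # Try longer terms first so multi-word names win over shorter overlaps.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Mutations in BRCA1 and TP53 are linked to breast cancer."
    for hit in tag_entities(sentence, TOY_DICTIONARY):
        print(hit)
```

A dictionary lookup like this cannot resolve ambiguous names or recognize unseen entities; a project could compare it against an existing tool or a trained model on PubMed abstracts.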
Databases and datasets
- The UCI Machine Learning Repository is a very useful source of datasets including many relevant to biomedicine and health. Most include good metadata. Note that there are many approaches and methods available for data analysis, so “machine learning” is not a requirement and may not be advantageous depending on the goal.
- MHEALTH Dataset: body motion and vital signs.
- Kaggle (over 50,000 public datasets and 400,000 public notebooks).
- Aggregate Analysis of ClinicalTrials.gov (AACT) Database (Clinical Trials Transformation Initiative).
- Hetionet – An integrative network of biomedical knowledge assembled from 29 different databases of genes, compounds, diseases, and more. The network combines over 50 years of biomedical information into a single resource, consisting of 47,031 nodes (11 types) and 2,250,197 relationships (24 types).
- ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways): Robokop is a biomedical reasoning system that interacts with many biomedical knowledge sources to answer questions. Robokop is one of several prototype systems under active development with NIH NCATS.
- DrugCentral, online drug compendium.
- Illuminating the Druggable Genome (IDG): Pharos and Target Central Resource Database (TCRD).
- The openFDA FDA Adverse Event Reporting System (FAERS) is a database that contains information on adverse event and medication error reports submitted to FDA.
- New Mexico Decedent Image Database
- Embase, a highly versatile, multipurpose and up-to-date biomedical research and literature database.