- For postdoctoral, graduate, and advanced undergraduate students in Engineering, Sciences, and Medicine, and professionals in industry.
- Spring 2019, Mondays and Wednesdays 11:50am–1:10pm, LCB 115.
- Spring 2019 Calendar
- Safety
- Health, Wellness, and Counseling
- Student Code

Prerequisites: Some experience programming and instructor approval.

100% grade = 30% labs, 30% presentation, 30% class project, 10% class participation; class attendance is required.

Topics:

We will cover concepts in data science and machine learning, and their applications to discovery of principles from biomedical data, with a special emphasis on integration and comparison of different types of data from multiple sources.

- Biotechnologies, e.g., DNA sequencing, for high-throughput acquisition of different types of molecular biological data, e.g., omics, imaging, and patient clinical information.
- Databases, e.g., the Cancer Genome Atlas (TCGA) at the Genomic Data Commons (GDC), and other data sources, e.g., supplementary materials of scientific publications.
- Algorithms, from the generalized singular value decomposition (GSVD) and pseudoinverse projection to the tensor GSVD and regression, neural networks, and deep learning, with a special emphasis on statistics for medicine, e.g., survival analyses.
- Applications toward better understanding of biology and practice of medicine, e.g., personalized cancer diagnostics, prognostics, and therapeutics.

Skills:

- Proving mathematical theorems and programming symbolic computations.
- Designing algorithms and programming numerical computations.
- Working with databases and modeling biomedical data.

Activities:

- In-class presentations of scientific journal articles and patents.
- Participation in guest lectures and seminars on campus and discussions of conference reports.
- End-of-class celebration.

January 7:

- Welcome!

- So much more to discover:

Paper 1: Biparental Inheritance of Mitochondrial DNA in Humans, Luo et al.

- The Utah origin of the human genome project:

The Alta Summit

- Data science in personalized medicine:

How Bright Promise in Cancer Testing Fell Apart

- The singular value decomposition (SVD) in the news:

If You Liked This, You're Sure to Love That

January 9:

- Slides 1: Examples of high-throughput biotechnologies

- Genomics after the human genome project:

Paper 2: A Vision for the Future of Genomics Research, Collins et al.

Acknowledgements.

- Example of TCGA data:

Paper 3: Comprehensive Genomic Characterization Defines Human Glioblastoma Genes and Core Pathways, TCGA Research Network

January 14:

- Slides 2: From data organization to analysis and interpretation

- In-Class Project 1: Download two interrelated omic profiles from TCGA via GDC, e.g., (

January 16:

- Paper 4: Cluster Analysis and Display of Genome-Wide Expression Patterns, Eisen et al.

Paper 5: Systematic Determination of Genetic Network Architecture, Tavazoie et al.

Paper 6: GOrilla: A Tool for Discovery and Visualization of Enriched GO Terms in Ranked Gene Lists, Eden et al.

Paper 7: Discovering Motifs in Ranked Lists of DNA Sequences, Eden et al.

- In-Class Project 2: Derive the hypergeometric distribution from first principles of combinatorics.
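
As a starting point for the derivation, here is a minimal Python sketch of the counting argument. It is not the course's Mathematica code. The symbols `N` (population size), `K` (annotated items in the population), `n` (sample size), and `k` (annotated items in the sample) are assumptions chosen for illustration. Of the C(N, n) equally likely samples drawn without replacement, C(K, k) · C(N − K, n − k) contain exactly k annotated items.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k annotated items in a sample of n drawn without
    replacement from N items, K of which are annotated."""
    # Outside this range no sample can contain exactly k annotated items.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Favorable samples divided by all equally likely samples.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_pvalue(k, N, K, n):
    """P(X >= k): the upper-tail probability, often used as an enrichment P-value."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(n, K) + 1))


if __name__ == "__main__":
    # Hypothetical numbers: 20 genes, 7 annotated, a cluster of 5 with 4 annotated.
    print(hypergeom_pvalue(4, 20, 7, 5))
```

Exact integer binomial coefficients avoid rounding until the final division, which keeps the sketch reliable for gene lists of moderate size.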

January 21:

- Happy Dr. Martin Luther King, Jr. Day!

"Injustice anywhere is a threat to justice everywhere."

January 23:

- Lab 1: The Hypergeometric P-Value, from Mathematics to Data Analysis and Interpretation

Mathematica Code: Lab_1.nb

- Paper 8: GSVD Comparison of Patient-Matched Normal and Tumor aCGH Profiles Reveals Global Copy-Number Alterations Predicting Glioblastoma Multiforme Survival, Lee,* Alpert* et al.

January 25, Friday, 11:45am–12:45pm, SMBB 2650, in lieu of any one Lab:

- Department of Bioengineering Distinguished Seminar

Owen J. McCarty

January 25, Friday, 2:00–3:00pm, WEB 3780, in lieu of any one Lab:

- Scientific Computing and Imaging (SCI) Institute Distinguished Seminar

Rick L. Stevens

January 28:

- Slides 3: Discovery of data patterns by using the SVD

- Paper 9: Molecular Characterisation of Soft Tissue Tumours: a Gene Expression Study, Nielsen et al.

Supplement.

February 4:

- Mathematics of the SVD:

February 6:

- Lab 1 Due In-Class

February 11:

- Computation of the SVD:

- Notebook 1: SVD of Symbolic, Synthetic, and Measured Data

Mathematica Code: Notebook_1.nb

February 12:

- Happy International Darwin Day!

Timeline of the human genome project:

From Darwin and Mendel to the human genome project

February 13:

- In-Class Work on Lab 2:

Compute and visualize the SVD of your data. Interpret your data based upon its SVD. Use at least two different approaches each for preprocessing and sorting your data and for assessing the statistical significance of your interpretation.
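
To illustrate the kind of quantity Lab 2 asks you to interpret, the following Python sketch finds the largest singular value of a small matrix by power iteration and reports the fraction of the overall signal it captures. It is not the course's Mathematica code. The function names `leading_singular_triplet` and `signal_fraction` are hypothetical. Power iteration is swapped in here for illustration only; in practice a full SVD routine, such as Mathematica's `SingularValueDecomposition`, would be used.

```python
import math


def matvec(A, x):
    """Multiply matrix A (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def leading_singular_triplet(A, iterations=500):
    """Return (sigma, u, v) for the largest singular value of A, using power
    iteration on A^T A. Assumes A is nonzero, its largest singular value is
    distinct, and the all-ones start vector is not orthogonal to v."""
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))           # one step of A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]              # keep v at unit length
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))  # sigma = |A v|
    u = [x / sigma for x in Av]                # left singular vector
    return sigma, u, v


def signal_fraction(A, sigma):
    """sigma^2 divided by the sum of all squared singular values, which
    equals the squared Frobenius norm of A."""
    total = sum(x * x for row in A for x in row)
    return sigma ** 2 / total


if __name__ == "__main__":
    # Hypothetical 3-gene x 2-array data matrix.
    data = [[2.0, 1.9], [1.0, 1.1], [-3.0, -2.8]]
    sigma, u, v = leading_singular_triplet(data)
    print(sigma, signal_fraction(data, sigma))
```

A leading fraction close to 1 suggests that a single pattern dominates the data, which is one reason to compare different preprocessing choices, e.g., centering, before interpreting the SVD.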

February 19, Tuesday, 12:30–1:30pm, HCI Research South 1st Floor Auditorium, in lieu of any one Lab:

- Huntsman Cancer Institute (HCI) and the Department of Oncological Sciences (OncSci) Special Seminar

Orly Alter

February 21, Thursday, 8:00–11:00pm, Utah State Capitol Rotunda, 350 North State Street, Salt Lake City, in lieu of any one Lab:

- 2019 Utah American Cancer Society (ACS) Cancer Action Network (CAN) Day at the Capitol

February 22, Friday, 12:30–1:30pm, WEB 3780, in lieu of any one Lab:

- Scientific Computing and Imaging (SCI) Institute Distinguished Seminar

David J. Odde

February 27:

- Mathematical variations on the SVD and PCA:

Independent component analysis (ICA):

Paper 10: Linear Modes of Gene Expression Determined by Independent Component Analysis, Liebermeister

- Nonnegative matrix factorization (NMF):

Paper 11: Metagenes and Molecular Pattern Discovery Using Matrix Factorization, Brunet et al.

February 29:

- Gene H. Golub's Birthday!

Paper 12: Calculating the Singular Values and Pseudo-Inverse of a Matrix, Golub and Kahan

March 4:

- Slides 4: Data integration by using the pseudoinverse

- Paper 13: Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation between DNA Replication and RNA Transcription, Alter and Golub

Paper 14: Reconstructing the Pathways of a Cellular System from Genome-Scale Signals by Using Matrix and Tensor Computations, Alter and Golub

Paper 15: Distinct Physiological States of

Paper 16: Using Pre-Existing Microarray Datasets to Increase Experimental Power: Application to Insulin Resistance, Daigle et al.

- Mathematics of the pseudoinverse:

March 5, Tuesday, 10:00am–12:00pm, WEB 3780:

- Lab 2 "Data Clinic"

March 6:

- Slides 5: From correlation to causal coordination by using the pseudoinverse

- Computation of the pseudoinverse:

Notebook 2: Pseudoinverse Projection of Measured Data

Mathematica Code: Notebook_2.nb
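
For orientation, the standard linear algebra behind a pseudoinverse projection can be written as follows. The symbols are assumptions chosen for illustration, not the notebook's notation: B is a basis matrix whose columns are the basis profiles, D is the data matrix measured over the same rows, C holds the projection coefficients, and Σ⁺ inverts only the nonzero singular values.

```latex
B = U \Sigma V^{T}, \qquad
B^{+} = V \Sigma^{+} U^{T}, \qquad
C = B^{+} D, \qquad
\hat{D} = B C = B B^{+} D
```

Here \hat{D} is the orthogonal projection of the data onto the column space of the basis, so D − \hat{D} is the part of the data that the basis cannot explain.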

March 11 and 13:

- Happy Spring Break!

March 21, Thursday, 10:00am–12:00pm, WEB 3780:

- Lab 2 "Data Clinic"

March 25:

- "State of the Project" Presentations

March 27:

- "State of the Project" Presentations

April 1:

- Slides 6: Comparative Generalized SVD (GSVD)

- Computation of the GSVD:

- Notebook 3: GSVD of Synthetic Data

Mathematica Code: Notebook_3.nb
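
As a reminder of what the notebook computes, one common way to write the GSVD of two datasets with matched columns is sketched below. The symbols are assumptions chosen for illustration and the notebook's notation may differ: D_1 and D_2 are the two data matrices, U_1 and U_2 have orthonormal columns, Σ_1 and Σ_2 are diagonal, and V is shared by both factorizations. V is in general not orthogonal, and it is invertible when the two matrices stacked together have full column rank.

```latex
D_1 = U_1 \Sigma_1 V^{T}, \qquad
D_2 = U_2 \Sigma_2 V^{T}, \qquad
\theta_n = \arctan\!\left(\frac{\sigma_{1,n}}{\sigma_{2,n}}\right) - \frac{\pi}{4}
```

The angular distance θ_n lies between −π/4 and π/4. Values near π/4 indicate a pattern that is significant mostly in the first dataset, values near −π/4 mostly in the second, and values near 0 a pattern of similar significance in both.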

April 3:

- Paper 17: Generalized Singular Value Decomposition for Comparative Analysis of Genome-Scale Expression Datasets of Two Different Organisms, Alter et al.

Paper 18: Combining Transcriptional Datasets Using the Generalized Singular Value Decomposition, Schreiber et al.

Paper 19: Exploring Metabolic Pathway Disruption in the Subchronic Phencyclidine Model of Schizophrenia with the Generalized Singular Value Decomposition, Xiao et al.

Paper 20: Mathematically Universal and Biologically Consistent Astrocytoma Genotype Encodes for Transformation and Predicts Survival Phenotype, Aiello, Ponnapalli, and Alter

April 8:

- Paper 21: A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms, Ponnapalli et al.

Paper 22: Multi-Tissue Analysis of Co-expression Networks by Higher-Order Generalized Singular Value Decomposition Identifies Functionally Coherent Transcriptional Modules, Xiao et al.

- Paper 23: Tensor GSVD of Patient- and Platform-Matched Tumor and Normal DNA Copy-Number Profiles Uncovers Chromosome Arm-Wide Patterns of Tumor-Exclusive Platform-Consistent Alterations Encoding for Cell Transformation and Predicting Ovarian Cancer Survival, Sankaranarayanan et al.

Paper 24: TNF-Insulin Crosstalk at the Transcription Factor GATA6 is Revealed by a Model that Links Signaling and Transcriptomic Data Tensors, Chitforoushzadeh et al.

April 10 and 15:

- In-Class Work on Lab 3:

Select two or more datasets, and explain how you might compare or integrate these data by using, e.g., pseudoinverse projection or GSVD. Explain also what the mathematical variables, and if possible also the mathematical operations, of your integrative or comparative model might mean biologically.

April 16, Tuesday, 10:00am–12:00pm, WEB 3780:

- Class Project "Data Clinic"

Happy Summer Break!

- DNA from xkcd

See you in Fall 2019 in BIOEN 6900-003: Data Science for Bioengineers