Supplemental material for N. M. Bertagnolli, J. A. Drake, J. M. Tennessen and O. Alter, "SVD Identifies Transcript Length Distribution Functions from DNA Microarray Data and Reveals Evolutionary Forces Globally Affecting GBM Metabolism," Public Library of Science (PLoS) One 8 (11), article e78913 (November 2013); doi: 10.1371/journal.pone.0078913.
Abstract:
To search for evolutionary forces that might act upon transcript length, we use the singular value decomposition (SVD) to identify the length distribution functions of sets and subsets of human and yeast transcripts from profiles of mRNA abundance levels across gel electrophoresis migration distances that were previously measured by DNA microarrays. We show that the SVD identifies the transcript length distribution functions as "asymmetric generalized coherent states" from the DNA microarray data and with no a-priori assumptions. Comparing subsets of human and yeast transcripts of the same gene ontology annotations, we find that in both disparate eukaryotes, transcripts involved in protein synthesis or mitochondrial metabolism are significantly shorter than typical, and in particular, significantly shorter than those involved in glucose metabolism. Comparing the subsets of human transcripts that are overexpressed in glioblastoma multiforme (GBM) or normal brain tissue samples from The Cancer Genome Atlas, we find that GBM maintains normal brain overexpression of significantly short transcripts, enriched in transcripts that are involved in protein synthesis or mitochondrial metabolism, but suppresses normal overexpression of significantly longer transcripts, enriched in transcripts that are involved in glucose metabolism and brain activity. These global relations among transcript length, cellular metabolism and tumor development suggest a previously unrecognized physical mode for tumor and normal cells to differentially regulate metabolism in a transcript length-dependent manner. The identified distribution functions support a previous hypothesis from mathematical modeling of evolutionary forces that act upon transcript length in the manner of the restoring force of the harmonic oscillator.
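The SVD step named in the abstract can be illustrated with a minimal sketch. It assumes a matrix with one row per transcript and one column per gel slice. The function name `svd_profiles`, the synthetic Gaussian-shaped profiles, and the matrix size are hypothetical stand-ins, not the measured data or the authors' Mathematica code. Any normalization or other preprocessing applied in the paper is not reproduced here.

```python
import numpy as np

def svd_profiles(abundance):
    """SVD of a transcripts x gel-slices matrix of abundance levels.

    Returns the left singular vectors, the singular values, the right
    singular vectors (patterns across gel slices), and the fraction of
    the overall signal captured by each singular value.
    """
    u, s, vt = np.linalg.svd(abundance, full_matrices=False)
    fractions = s**2 / np.sum(s**2)
    return u, s, vt, fractions

# Synthetic stand-in data (an assumption for illustration only):
# each transcript peaks at one random position along the gel.
rng = np.random.default_rng(0)
n_transcripts, n_slices = 200, 50
slices = np.arange(n_slices)
peaks = rng.uniform(5, 45, size=n_transcripts)
abundance = np.exp(-0.5 * ((slices[None, :] - peaks[:, None]) / 3.0) ** 2)

u, s, vt, fractions = svd_profiles(abundance)
```

Each row of `vt` is a pattern across the gel slices, and `fractions` indicates how much of the overall signal each pattern captures, so the most significant patterns can be inspected first.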



A PDF format file, readable by Adobe Acrobat Reader.
Bertagnolli_et_al_PLoS_One_2013.pdf



A PDF format file, readable by Adobe Acrobat Reader.
Bertagnolli_et_al_PLoS_One_2013_Appendix.pdf



SVD identification of transcript length distribution functions from DNA microarray data.
A PDF format file, readable by Adobe Acrobat Reader. The corresponding Mathematica 8.0.1 code file, executable by Mathematica, is:



Human transcript lengths.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the profiles of mRNA abundance levels from Hurowitz et al. as well as the Gene Ontology (GO) annotations from Ashburner et al. for the 4,109 human genes with no missing data across 50 agarose gel slices, spanning an electrophoretic migration range of 26–124 mm and the corresponding transcript length range of ≈6,400–500 nt. A transcript is additionally annotated as overexpressed in either the normal brain or the GBM tumor if, for a cutoff of c = 250, 300, …, 500, it is among the c most expressed of the 4,109 transcripts in at least 20% of The Cancer Genome Atlas (TCGA) normal brain or GBM tumor samples, respectively, from the TCGA Research Network and Verhaak et al.
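The overexpression annotation described above can be sketched as follows. This is one reading of the rule, not the authors' code. The function name `overexpressed`, the random expression matrix, and the arbitrary handling of ties by the sort are all assumptions made for illustration.

```python
import numpy as np

def overexpressed(expression, c, min_fraction=0.2):
    """Flag transcripts that rank among the c most expressed in at
    least `min_fraction` of the samples.

    expression: transcripts (rows) x samples (columns).
    Returns one boolean per transcript.
    """
    # Row indices sorted by decreasing expression within each sample.
    order = np.argsort(-expression, axis=0)
    in_top = np.zeros(expression.shape, dtype=bool)
    # Mark the top-c transcripts of every sample (ties broken arbitrarily).
    np.put_along_axis(in_top, order[:c, :], True, axis=0)
    # Fraction of samples in which each transcript is in the top c.
    return in_top.mean(axis=1) >= min_fraction

# Hypothetical stand-in data: 4,109 transcripts x 10 samples of random values.
rng = np.random.default_rng(1)
expression = rng.normal(size=(4109, 10))

# One annotation per cutoff c = 250, 300, ..., 500.
annotations = {c: overexpressed(expression, c) for c in range(250, 501, 50)}
```

Keeping one boolean mask per cutoff makes it straightforward to check how the annotated set changes as c increases.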
Yeast transcript lengths.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the profiles of mRNA abundance levels from Hurowitz and Brown, the GO annotations from Ashburner et al., and DNA damage response annotations from Jelinsky and Samson for the 3,620 Saccharomyces cerevisiae ORFs with no missing data across 30 agarose gel slices, spanning electrophoretic migration of 42–100 mm and transcript lengths of ≈4,500–300 nt.
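The "no missing data" criterion used for both the human and the yeast files can be illustrated with a short sketch. It assumes that missing measurements are encoded as NaN, which is an assumption about the encoding rather than a description of the files. The function name `complete_rows` and the toy matrix are hypothetical.

```python
import numpy as np

def complete_rows(profiles):
    """Boolean mask of rows (transcripts) with no missing (NaN) entries."""
    return ~np.isnan(profiles).any(axis=1)

# Toy matrix: 3 transcripts x 3 gel slices, with one missing measurement.
profiles = np.array([
    [0.1, 0.5, 0.2],
    [0.3, np.nan, 0.4],
    [0.0, 0.2, 0.9],
])

mask = complete_rows(profiles)
kept = profiles[mask]  # only transcripts measured in every gel slice
```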
Human gene lengths.
A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the University of California at Santa Cruz (UCSC) human genome browser maximum and minimum gene lengths from Karolchik et al. and Kent et al., and GO annotations from Ashburner et al. for the 11,631 human genes. A gene is additionally annotated as overexpressed in either the normal brain or the GBM tumor if, for a cutoff of c = 250, 300, …, 500, it is among the c most expressed of the 11,631 genes in at least 20% of the TCGA normal brain or GBM tumor samples, respectively, from the TCGA Research Network and Verhaak et al. The normal brain and the GBM tumor gene expression data sets, reproducing the abundance levels of mRNA transcripts of the 11,631 human genes from ten TCGA normal brain tissue samples and 529 TCGA GBM tumor samples, respectively, are: