Pathophysiological significance and therapeutic targeting of germinal center kinase in diffuse large B-cell lymphoma.
Blood 128(2):239 (2016)
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma, yet 40% to 50% of patients will eventually succumb to their disease, demonstrating a pressing need for novel therapeutic options. Gene expression profiling has identified messenger RNAs that lead to transfo...
Sparse regression and marginal testing using cluster prototypes.
Biostatistics 17(2):364 (2016)
We propose a new approach for sparse regression and marginal testing, for data with correlated features. Our procedure first clusters the features, and then chooses as the cluster prototype the most informative feature in that cluster. Then we apply either sparse regression (lasso) or marginal s...
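The first two steps described in this abstract (cluster the features, then keep one prototype per cluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the single-linkage correlation threshold, the use of absolute Pearson correlation with the outcome as the measure of "most informative", and all function names and toy data are assumptions made here. The final lasso or marginal testing step is not shown.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def cluster_features(columns: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Assumed clustering rule: link two features when |corr| >= threshold
    and take connected components (single linkage) via union-find."""
    parent = list(range(len(columns)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(columns)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def select_prototypes(columns: List[Sequence[float]], y: Sequence[float],
                      threshold: float = 0.8) -> List[int]:
    """Per cluster, keep the feature with the largest |corr| with y
    (an assumed stand-in for 'most informative')."""
    return sorted(
        max(cluster, key=lambda j: abs(pearson(columns[j], y)))
        for cluster in cluster_features(columns, threshold)
    )


# Toy data (hypothetical): features 0 and 1 are correlated, as are 2 and 3.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = [
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    [1.5, 1.5, 3.5, 3.5, 5.5, 5.5],
    [2.0, -1.0, 0.5, 3.0, -2.0, 1.0],
    [2.2, -0.8, 0.4, 3.1, -2.1, 0.7],
]
clusters = cluster_features(columns)
prototypes = select_prototypes(columns, y)
print(clusters, prototypes)
```

The selected prototype columns would then be passed to a sparse regression or marginal test in place of the full feature set.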
QnAs with Robert Tibshirani.
PNAS 112(25):7621 (2015)
Biostatistics 16(2):326 (2015)
We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with these types of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of th...
Molecular subtyping for clinically defined breast cancer subgroups.
Breast Cancer Research (Online Edition) 17(1):29 (2015)
Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the ...
Pancancer analysis of DNA methylation-driven genes using MethylMix.
Genome Biology 16(1):17 (2015)
Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated ...
Quantitative SD-OCT Imaging Biomarkers as Indicators of Age-Related Macular Degeneration Progression.
Investigative Ophthalmology & Visual Science 55(11):7093 (2014)
We developed a statistical model based on quantitative characteristics of drusen to estimate the likelihood of conversion from early and intermediate age-related macular degeneration (AMD) to its advanced exudative form (AMD progression) in the short term (less than 5 years), a crucial task to e...
Active idiotypic vaccination versus control immunotherapy for follicular lymphoma.
Journal of Clinical Oncology 32(17):1797 (2014)
Idiotypes (Ids), the unique portions of tumor immunoglobulins, can serve as targets for passive and active immunotherapies for lymphoma. We performed a multicenter, randomized trial comparing a specific vaccine (MyVax), comprising Id chemically coupled to keyhole limpet hemocyanin (KLH) plus gra...
A multicentre study of primary breast diffuse large B-cell lymphoma in the rituximab era.
British Journal of Haematology 165(3):358 (2014)
Primary breast diffuse large B-cell lymphoma (DLBCL) is a rare subtype of non-Hodgkin lymphoma (NHL) with limited data on pathology and outcome. A multicentre retrospective study was undertaken to determine prognostic factors and the incidence of central nervous system (CNS) relapses. Data was r...
Increasing value and reducing waste in research design, conduct, and analysis.
The Lancet 383(9912):166 (2014)
Correctable weaknesses in the design, conduct, and analysis of biomedical and public health research studies can produce misleading results and waste valuable resources. Small effects can be difficult to distinguish from bias introduced by study design and analyses. An absence of detailed writte...
A shared transcriptional program in early breast neoplasias despite genetic and clinical distinctions.
Genome Biology 15(5):R71 (2014)
The earliest recognizable stages of breast neoplasia are lesions that represent a heterogeneous collection of epithelial proliferations currently classified based on morphology. Their role in the development of breast cancer is not well understood but insight into the critical events at this ear...
Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data.
Statistical Methods in Medical Research 22(5):519 (2013)
We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especia...
Classification of patients from time-course gene expression.
Biostatistics 14(1):87 (2013)
Classifying patients into different risk groups based on their genomic measurements can help clinicians design appropriate clinical treatment plans. To produce such a classification, gene expression data were collected on a cohort of burn patients, who were monitored across multiple time points....
Scientific research in the age of omics: the good, the bad, and the sloppy.
Journal of the American Medical Informatics Ass... 20(1):125 (2013)
It has been claimed that most research findings are false, and it is known that large-scale studies involving omics data are especially prone to errors in design, execution, and analysis. The situation is alarming because taxpayer dollars fund a substantial amount of biomedical research, and bec...
Genome-wide measurement of RNA folding energies.
Molecular Cell 48(2):169 (2012)
RNA structural transitions are important in the function and regulation of RNAs. Here, we reveal a layer of transcriptome organization in the form of RNA folding energies. By probing yeast RNA structures at different temperatures, we obtained relative melting temperatures (Tm) for RNA structures...
Normalization, testing, and false discovery rate estimation for RNA-sequencing data.
Biostatistics 13(3):523 (2012)
We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging ...
Transcriptional profiling of long non-coding RNAs and novel transcribed regions across a diverse panel of archived human cancers.
Genome Biology 13(8):R75 (2012)
Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs, as a new, relatively unstudied class of transcripts, provide a rich opportunity to identify both functional drivers...
A fused lasso latent feature model for analyzing multi-sample aCGH data.
Biostatistics 12(4):776 (2011)
Array-based comparative genomic hybridization (aCGH) enables the measurement of DNA copy number across thousands of locations in a genome. The main goals of analyzing aCGH data are to identify the regions of copy number variation (CNV) and to quantify the amount of CNV. Although there are many m...
Adaptive index models for marker-based risk stratification.
Biostatistics 12(1):68 (2011)
We use the term "index predictor" to denote a score that consists of K binary rules such as "age > 60" or "blood pressure > 120 mm Hg." The index predictor is the sum of these binary scores, yielding a value from 0 to K. Such indices are often used in clinical studies to stratify population risk:...
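As a minimal sketch of the definition above, an index predictor can be evaluated by counting how many binary rules a patient satisfies. The two rules mirror the examples quoted in the abstract; the field names, the patient record, and the function name are hypothetical. The sketch only scores fixed rules and does not show how the paper's adaptive method chooses them.

```python
from typing import Callable, Dict, List

Patient = Dict[str, float]
Rule = Callable[[Patient], bool]


def index_predictor(patient: Patient, rules: List[Rule]) -> int:
    """Sum of K binary rules, giving a score between 0 and K."""
    return sum(1 for rule in rules if rule(patient))


# Hypothetical rules modeled on the abstract's examples (K = 2).
rules: List[Rule] = [
    lambda p: p["age"] > 60,
    lambda p: p["blood_pressure"] > 120,  # mm Hg
]

# Hypothetical patient: satisfies the age rule only.
patient = {"age": 67, "blood_pressure": 115}
score = index_predictor(patient, rules)
print(score)
```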
Supervised multidimensional scaling for visualization, classification, and bipartite ranking.
Computational Statistics & Data Analysis 55(1):789 (2011)
Least squares multidimensional scaling (MDS) is a classical method for representing a dissimilarity matrix. One seeks a set of configuration points