Identification of outliers in gene expression data

Cardinal Scholar

dc.contributor.advisor Rahamataullaha Imana, E. Eica. Ema.
dc.contributor.author Farazi, Md Manzur Rahman
dc.date.accessioned 2015-05-14T13:56:18Z
dc.date.available 2015-05-14T13:56:18Z
dc.date.issued 2015-05-02
dc.description.abstract This work reports the application of techniques that proved useful in analyzing a large gene expression data set. Because it appears likely that genomic instability in cancers can optimize gene expression for cell growth, the differences between normal and tumor expression patterns might help us understand what is being selected for as cancerous tissues evolve. The molecular heterogeneity of cancer, caused by various gene mutations, can produce extensive heterogeneity in the gene expression profiles of cancer samples. Tomlins et al. (2005) argued that traditional analytical methods such as the two-sample t-statistic, which search for common activation of genes across a class of cancer samples, fail to detect cancer genes that are differentially expressed in only a subset of cancer samples (cancer outliers). They developed the “cancer outlier profile analysis” (COPA) method to detect cancer genes with such heterogeneous expression profiles within cancer samples, and used it to reveal subtypes of prostate cancer patients defined by recurrent chromosomal aberrations. Inspired by the COPA statistic, other authors have proposed methods for detecting cancer-related genes with cancer outlier profiles in the framework of multiple testing (Tibshirani 2007, Wu 2007, Lian 2008, Wang 2010). Such cancer outlier analyses suffer from several problems. In particular, when the data set contains outliers, classical measures of location and scale are seriously affected, so a test statistic built on these measures might not be appropriate for detecting outliers. In this study, we try to robustify some existing methods. We propose three new techniques for the identification of outliers: the Expressed Robust t-statistic (ERT), the Modified Outlier Robust t-statistic (MORT), and the Least Sum Square of Ordered Subset Robust t-statistic (LSOSRT).
The usefulness of the proposed methods is then investigated using Monte Carlo simulation and real cancer data. We find the new methods to be efficient. en_US
dc.description.sponsorship Department of Mathematical Sciences
dc.subject.lcsh Outliers (Statistics)
dc.subject.lcsh Gene expression -- Statistical methods.
dc.subject.lcsh Cancer -- Genetic aspects -- Statistical methods.
dc.title Identification of outliers in gene expression data en_US
dc.description.degree Thesis (M.S.) en_US
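
The contrast drawn in the abstract between a two-sample t-statistic and a COPA-style outlier statistic can be illustrated with a minimal sketch. It is not the thesis's code and does not implement ERT, MORT, or LSOSRT, which the abstract does not define. The COPA step follows the commonly described form (center by the pooled median, scale by the pooled median absolute deviation, then take an upper quantile of the tumor values). The function names, the nearest-rank quantile rule, the choice q = 0.9, and the toy expression values are all assumptions made for illustration.

```python
import math
import statistics


def mad(values):
    """Unscaled median absolute deviation about the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def t_statistic(normal, tumor):
    """Welch two-sample t-statistic (tumor mean minus normal mean)."""
    se = math.sqrt(statistics.variance(normal) / len(normal)
                   + statistics.variance(tumor) / len(tumor))
    return (statistics.mean(tumor) - statistics.mean(normal)) / se


def copa_statistic(normal, tumor, q=0.9):
    """COPA-style statistic for one gene (illustrative form).

    Values are centered by the pooled median and scaled by the pooled
    MAD; the result is the q-th quantile (nearest-rank rule, an
    assumption) of the standardized tumor values.
    """
    pooled = list(normal) + list(tumor)
    med = statistics.median(pooled)
    scale = mad(pooled)
    if scale == 0:
        raise ValueError("pooled MAD is zero; cannot standardize this gene")
    standardized = sorted((x - med) / scale for x in tumor)
    k = max(0, math.ceil(q * len(standardized)) - 1)
    return standardized[k]


# Hypothetical toy data for a single gene: only two of the ten tumor
# samples are over-expressed, so the group means differ only modestly
# relative to the inflated tumor variance.
normal = [-0.5, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, -0.2, 0.1]
tumor = [-0.4, -0.2, 0.0, 0.1, 0.2, 0.3, -0.1, 0.1, 6.0, 7.0]

print("t-statistic:", t_statistic(normal, tumor))
print("COPA-style statistic:", copa_statistic(normal, tumor, q=0.9))
```

On this toy gene the t-statistic stays below 2, while the COPA-style statistic is well above 10, because the median and MAD are barely moved by the two extreme tumor samples whereas the tumor variance used by the t-statistic is inflated by them.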

This item appears in the following Collection(s)

  • Master's Theses [5318]
    Master's theses submitted to the Graduate School by Ball State University master's degree candidates in partial fulfillment of degree requirements.
