Classification of High Dimensional Small Sample Genetic Data by Forward Maximum Likelihood Ratio Stepwise Logistic Regression
Abstract
Analyzing genetic data is an effective approach to the early diagnosis of cancer. However, machine learning methods usually require large data sets. For high-dimensional genetic data with a small sample size, applying machine learning directly generally yields unsatisfactory classification accuracy. This paper addresses classification modeling for high-dimensional genetic data with only 62 samples. Principal component extraction shows that the cumulative contribution rate of the first nine principal components reaches 80.73%, but classification based on these principal components performs poorly. To reduce the dimensionality, the maximum likelihood ratio test is applied to variable selection in stepwise logistic regression, which ensures that only variables with a significant influence on classification enter the model. With this procedure, the final model fits well and shows good predictive performance.
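The selection procedure summarized in the abstract can be illustrated with a minimal sketch of forward stepwise logistic regression driven by a likelihood ratio test. This is not the authors' implementation. The function names `fit_logistic` and `forward_lr_selection`, the entry threshold `alpha = 0.05`, the small ridge term added for numerical stability, and the synthetic data (62 samples, 20 features, one informative) are all assumptions made for illustration; the abstract does not state the threshold or software actually used.

```python
import math
import numpy as np

def fit_logistic(X, y, ridge=1e-3, n_iter=50):
    """Fit logistic regression by Newton's method and return the log-likelihood.

    The small ridge term is an assumption added here to keep the Hessian
    invertible when a small sample is (nearly) separable.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X1 @ beta, -30, 30)))
        grad = X1.T @ (y - p) - ridge * beta
        hess = (X1 * (p * (1 - p))[:, None]).T @ X1 + ridge * np.eye(len(beta))
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    p = 1.0 / (1.0 + np.exp(-np.clip(X1 @ beta, -30, 30)))
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard the logarithms
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def forward_lr_selection(X, y, alpha=0.05):
    """Add, one at a time, the variable with the largest likelihood ratio
    statistic, stopping when the best candidate is not significant at alpha."""
    selected = []
    ll_current = fit_logistic(X[:, np.array(selected, dtype=int)], y)
    while len(selected) < X.shape[1]:
        best_j, best_stat, best_ll = None, -1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = np.array(selected + [j], dtype=int)
            ll_new = fit_logistic(X[:, cols], y)
            stat = max(2.0 * (ll_new - ll_current), 0.0)  # LR statistic
            if stat > best_stat:
                best_j, best_stat, best_ll = j, stat, ll_new
        # Upper tail of a chi-square distribution with 1 degree of freedom.
        p_value = math.erfc(math.sqrt(best_stat / 2.0))
        if p_value >= alpha:
            break
        selected.append(best_j)
        ll_current = best_ll
    return selected

# Synthetic demonstration data (not the paper's data set):
# only feature 3 carries signal about the class label.
rng = np.random.default_rng(0)
X = rng.normal(size=(62, 20))
y = (X[:, 3] + 0.3 * rng.normal(size=62) > 0).astype(float)
selected = forward_lr_selection(X, y)
print("selected feature indices:", selected)
```

Because each step adds a single variable, the likelihood ratio statistic is compared against a chi-square distribution with one degree of freedom, whose upper tail can be computed from `math.erfc` without extra dependencies. A complete stepwise procedure may also re-test variables already in the model for removal; the sketch omits that step.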
Keywords
Classification, Logistic regression, Likelihood ratio, Principal components
DOI
10.12783/dtcse/icaic2019/29447