Classification of High Dimensional Small Sample Genetic Data by Forward Maximum Likelihood Ratio Stepwise Logistic Regression

Li-hao WANG, Qiao-han CHU, Yi-ying ZHANG, Kun-ping ZHU

Abstract


Analyzing genetic data has become an effective approach to the early diagnosis of cancer. However, machine learning methods usually require large amounts of data, and when they are applied directly to small-sample, high-dimensional genetic data, classification accuracy is generally unsatisfactory. This paper builds a classification model for high-dimensional genetic data with only 62 samples. Principal component analysis shows that the cumulative contribution rate of the first nine principal components reaches 80.73%, but classification based on these principal components performs poorly. To reduce dimensionality, the maximum likelihood ratio test is used to select variables in forward stepwise logistic regression, which ensures that only variables with a significant influence on classification enter the model. With this procedure, the final model fits well and has good predictive performance.
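The cumulative contribution rate quoted in the abstract is, in standard principal component analysis terms, the share of total variance captured by the leading components. Writing λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_p for the eigenvalues of the sample covariance (or correlation) matrix of the p genes, the usual textbook definitions are sketched below. The abstract does not say which matrix was used, so this shows the standard definition, not necessarily the authors' exact computation.

```latex
\mathrm{CR}_k = \frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i},
\qquad
\mathrm{CCR}_m = \sum_{k=1}^{m} \mathrm{CR}_k
             = \frac{\sum_{k=1}^{m} \lambda_k}{\sum_{i=1}^{p} \lambda_i},
\qquad
\mathrm{CCR}_9 = 0.8073 \ \text{(value reported in the abstract)}
```

Because the 62 samples are centered before the covariance or correlation matrix is formed, that matrix has rank at most 61. At most 61 eigenvalues are therefore nonzero however many genes are measured, and CCR_m reaches 1 by m = 61 at the latest.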
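A minimal sketch of forward stepwise logistic regression driven by likelihood-ratio tests, written under stated assumptions. At each step every remaining variable is tried, the one with the largest statistic G = 2(ℓ_new − ℓ_current) is chosen, and it enters only if its χ²(1) p-value is below α. The default α = 0.05 is an assumption, since the abstract does not give the entry threshold. The function names (`fit_logistic`, `lr_pvalue`, `forward_lr_stepwise`, `make_synthetic_data`), the Newton-Raphson fitter, and the synthetic data are hypothetical illustrations using only the Python standard library, not the authors' code. The χ²(1) tail uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math
import random

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def _softplus(t):
    # log(1 + exp(t)) without overflow
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def _loglik(rows, y, beta):
    # Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta))
    return sum(yi * _dot(beta, z) - _softplus(_dot(beta, z)) for z, yi in zip(rows, y))

def _solve(a, b):
    # Gaussian elimination with partial pivoting for a small square system a x = b
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("information matrix is numerically singular")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(X, y, cols, max_iter=50, tol=1e-10):
    """Maximized log-likelihood of a logistic model: intercept plus columns `cols`."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    beta = [0.0] * k
    ll = _loglik(rows, y, beta)
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X^T W X
        for z, yi in zip(rows, y):
            p = _sigmoid(_dot(beta, z))
            for a in range(k):
                grad[a] += (yi - p) * z[a]
                for c in range(k):
                    info[a][c] += p * (1.0 - p) * z[a] * z[c]
        step = _solve(info, grad)  # Newton-Raphson direction
        for _ in range(30):  # step halving keeps the log-likelihood non-decreasing
            cand = [b + s for b, s in zip(beta, step)]
            ll_new = _loglik(rows, y, cand)
            if ll_new >= ll:
                break
            step = [s / 2.0 for s in step]
        else:
            break  # no non-decreasing step found; keep the current estimate
        beta, gain, ll = cand, ll_new - ll, ll_new
        if gain < tol:
            break
    return ll

def lr_pvalue(stat):
    # upper tail of chi-square with 1 df: P(chi2_1 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def forward_lr_stepwise(X, y, alpha=0.05, max_vars=None):
    """Add the variable with the largest LR statistic while its p-value < alpha."""
    n_vars = len(X[0])
    limit = n_vars if max_vars is None else max_vars
    selected, history = [], []
    ll_cur = fit_logistic(X, y, selected)  # intercept-only model
    while len(selected) < limit:
        best = None
        for j in range(n_vars):
            if j in selected:
                continue
            ll_j = fit_logistic(X, y, selected + [j])
            stat = 2.0 * (ll_j - ll_cur)  # G = 2 (l_new - l_current), 1 df
            if best is None or stat > best[1]:
                best = (j, stat, ll_j)
        j, stat, ll_j = best
        pval = lr_pvalue(stat)
        if pval >= alpha:
            break  # no remaining candidate improves the fit significantly
        selected.append(j)
        history.append((j, stat, pval))
        ll_cur = ll_j
    return selected, history

def make_synthetic_data(n=200, n_vars=6, seed=0):
    # hypothetical data: only column 0 carries signal about the label
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n)]
    y = [1 if rng.random() < _sigmoid(3.0 * x[0]) else 0 for x in X]
    return X, y
```

For example, `selected, history = forward_lr_stepwise(*make_synthetic_data())` should pick column 0 first, since it is the only informative column in the synthetic data; each `history` entry records the entered column, its G statistic, and its p-value. With 62 samples and thousands of genes, a single gene can separate the two classes perfectly, in which case the maximum likelihood estimate does not exist. This sketch then either stops with a log-likelihood near 0 or raises `ValueError` on a numerically singular information matrix, and in neither case should the resulting statistic be trusted.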

Keywords


Classification, Logistic regression, Likelihood ratio, Principal components


DOI
10.12783/dtcse/icaic2019/29447
