Multiple Imputation by Chained Equations for Social Data

Wei HAN, Chuan-jun JI, Yun-wen CHEN, Ji-feng HUANG, Wu-xiong ZHANG

Abstract


Most machine learning techniques require a high level of data integrity to perform well. In real application scenarios with complex conditions, however, and especially in social networks, missing data is a common problem that degrades the results of social behavior data mining with machine learning. To address the missing data problem in social data mining, we propose combining Multiple Imputation by Chained Equations (MICE) with the Random Forest algorithm: missing values are imputed first, and the model is then trained and used for prediction. Experiments were conducted on the Titanic Passengers dataset for survival prediction. The experimental results showed that, compared with the original Random Forest and with other imputation methods combined with Random Forest, our approach achieved the best performance.
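The imputation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes only two numeric columns, uses ordinary least squares as the per-column model, and adds Gaussian residual noise instead of drawing regression parameters from a posterior, which is a simplification of full MICE. The function names (fit_line, chained_imputation, multiple_imputation) and the sample values are hypothetical and are not taken from the Titanic dataset. The Random Forest stage is omitted; each completed copy would be passed to the classifier for training.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual std dev)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = statistics.pstdev(resid) if len(resid) > 1 else 0.0
    return a, b, sd


def chained_imputation(rows, n_iter=10, rng=None):
    """Return one completed copy of two-column data; None marks a missing cell."""
    rng = rng or random.Random(0)
    cols = [[r[j] for r in rows] for j in range(2)]
    missing = [[v is None for v in col] for col in cols]

    # Step 1: start by filling each missing cell with its column's observed mean.
    for j in range(2):
        mean_j = statistics.fmean(v for v in cols[j] if v is not None)
        cols[j] = [mean_j if v is None else v for v in cols[j]]

    # Step 2: cycle over the columns, regressing each one on the other
    # (fitted on rows where the target column was observed) and redrawing
    # the missing cells from the fitted line plus residual noise.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j
            obs = [i for i, is_missing in enumerate(missing[j]) if not is_missing]
            a, b, sd = fit_line([cols[k][i] for i in obs],
                                [cols[j][i] for i in obs])
            for i, is_missing in enumerate(missing[j]):
                if is_missing:
                    cols[j][i] = a + b * cols[k][i] + rng.gauss(0.0, sd)

    return list(zip(cols[0], cols[1]))


def multiple_imputation(rows, m=5, seed=0):
    """Produce m completed datasets, each from a differently seeded chain."""
    return [chained_imputation(rows, rng=random.Random(seed + i)) for i in range(m)]


# Made-up (age, fare)-style values, for illustration only.
rows = [(22.0, 7.25), (38.0, 71.3), (None, 8.05),
        (35.0, None), (54.0, 51.9), (2.0, 21.1)]
completed = multiple_imputation(rows, m=3)
```

Observed cells are never overwritten; only cells that were missing differ between the completed copies, and that spread reflects the uncertainty of the imputation. A downstream model can be trained on each copy and the predictions combined, for example by majority vote.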

Keywords


Multiple Imputation by Chained Equations; Random Forest; Social Network; Missing Data


DOI
10.12783/dtcse/cst2017/12598
