Data-level Hybrid Strategy Selection for Disk Fault Prediction Model Based on Multivariate GAN


Shuangshuang Yuan, Peng Wu and Yuehui Chen, University of Jinan, China


Class imbalance is a common problem in classification, and minority-class samples are often both more important and more costly to misclassify, so addressing it is essential. Disk SMART data, which serve as a reliable indicator of a disk's health status, exhibit an evident class imbalance: they comprise a large number of healthy samples and comparatively few faulty samples. In this paper, we balance the disk SMART dataset at the data level by mixing and integrating data synthesised by multivariate generative adversarial networks (GANs), and we combine this with a genetic algorithm to obtain the best balanced dataset for a specific classification model and thereby higher disk fault prediction accuracy on that model.


Disk failure prediction, GAN, Genetic algorithm, Data imbalance, SMART data