Abstract: To improve model generalization in classification problems, this paper proposes an ensemble algorithm of XGBoost base classifiers based on K-means clustering. First, multiple XGBoost models are trained on the training data set; then, the experimental results of the different models are clustered with the K-means algorithm; finally, the classifier with the best generalization ability in each cluster is selected, and the selected classifiers are combined into an ensemble. The algorithm was applied to a practical classification problem at a company, and the results show that its generalization ability is greatly improved.

Key words: K-means clustering; XGBoost; ensemble algorithm; generalization ability

CLC number: TP391  Document code: A  Article ID: 1006-8228(2020)10-12-03

0 Introduction

In recent years, with the continuing progress of data science, the XGBoost (eXtreme Gradient Boosting) algorithm has been widely applied in fields such as commerce, networking, stock analysis, and electronic products [1]. XGBoost is a learning algorithm that improves on the gradient boosting algorithm (GBDT) [2]. It features low complexity, good parallel performance, and high computational accuracy [3], but its generalization ability leaves room for improvement. This paper adopts the Bagging idea of multi-model fusion and uses multiple XGBoost base classifiers, so that each base classifier fits only a subset of the feature attributes on a subset of the samples, and then