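| Shan Yuxiang*, Gao Yanghua*, Peng Deling**, Jiang Qi***, Pan Yingqi**, Sun Guodao**. Design and implementation of an interactive visual analytics system for feature selection[J]. Chinese High Technology Letters (高技术通讯), 2026, 36(4): 409-422 |
| Design and implementation of an interactive visual analytics system for feature selection |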
| |
| DOI: 10.3772/j.issn.1002-0470.2026.04.008 |
| Keywords: feature selection; visual analytics; multivariate time series forecasting; model similarity measurement; hierarchical node-link diagram |
| Funding: |
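| Author | Affiliation |
| Shan Yuxiang* | *China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310009 |
| Gao Yanghua* | *China Tobacco Zhejiang Industrial Co., Ltd., Hangzhou 310009 |
| Peng Deling** | **College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023 |
| Jiang Qi*** | ***School of Computer Science and Technology, Zhejiang University of Science and Technology, Hangzhou 310023 |
| Pan Yingqi** | **College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023 |
| Sun Guodao** | **College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023 |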
|
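| Chinese abstract (translated): |
| Feature selection is a key step in machine learning modeling, and its quality directly affects the final predictive performance of the model. However, selecting the optimal feature subset for a specific application scenario remains challenging. Existing automated methods can screen feature combinations efficiently but lack interpretability, whereas traditional manual selection is highly interpretable but inefficient. To address these challenges, this paper proposes an interactive visual analytics method for feature selection in time series forecasting tasks. It builds an ensemble baseline forecasting framework from extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and the PathFormer multi-scale Transformer model, and proposes a multi-dimensional model similarity measure to quantify the effect of feature combinations on model performance. On this basis, a visual analytics system is designed and implemented with multiple linked views, including a data overview, a feature subspace overview, a detail view, and a prediction curve comparison, to support users in reconstructing features and optimizing models through iterative exploration. The core view is a hierarchical node-link diagram that records and visualizes the user's exploration path in real time, revealing how models evolve across feature iterations. To resolve the visual clutter caused by edge crossings in this view, the paper further introduces a layout optimization algorithm based on integer linear programming, which improves the clarity and readability of the layout. A case study and a user study validate the effectiveness of the proposed method in improving the efficiency and performance of time series forecasting modeling. |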
| English abstract: |
| Feature selection is a critical preprocessing step in machine learning that directly influences model accuracy, robustness, and interpretability. However, identifying optimal feature subsets in high-dimensional spaces remains a significant challenge. Existing automated methods, while computationally efficient, often lack interpretability, whereas traditional manual approaches offer transparency but suffer from low efficiency. To address these limitations, this paper proposes a human-in-the-loop interactive visual analytics framework for feature selection in time series forecasting tasks. The proposed approach integrates an ensemble prediction model combining XGBoost, LightGBM, and PathFormer, and introduces a multi-dimensional model similarity metric to quantify the impact of feature combinations on prediction performance. Building on this foundation, we design and implement a visual analytics system with coordinated multiple views, including a data overview, feature subspace exploration, detail-on-demand views, and prediction curve comparison. The core component is a hierarchical node-link diagram that dynamically records users' exploration paths and reveals the evolutionary relationships among models throughout the feature iteration process. To reduce the visual clutter caused by edge crossings, we further incorporate an integer linear programming-based layout optimization algorithm that improves the clarity and readability of the visualization. A case study and a user evaluation indicate that the proposed framework improves both modeling efficiency and prediction performance for users with diverse backgrounds in time series forecasting tasks. |
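
The edge-crossing problem mentioned in the abstract can be illustrated with a small sketch. This is not the paper's integer linear programming formulation, which the abstract does not spell out; it is a brute-force stand-in that counts crossings between two adjacent layers of a node-link diagram and tries every ordering of the lower layer. The node names, the edge list, and the helper names `count_crossings` and `best_bottom_order` are all hypothetical.

```python
from itertools import combinations, permutations


def count_crossings(edges, top_order, bottom_order):
    """Count pairwise edge crossings between two adjacent layers."""
    top_pos = {node: i for i, node in enumerate(top_order)}
    bottom_pos = {node: i for i, node in enumerate(bottom_order)}
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        # Two edges cross when their endpoints appear in opposite
        # relative order on the top layer and on the bottom layer.
        if (top_pos[u1] - top_pos[u2]) * (bottom_pos[v1] - bottom_pos[v2]) < 0:
            crossings += 1
    return crossings


def best_bottom_order(edges, top_order, bottom_nodes):
    """Exhaustively search bottom-layer orderings (feasible for small layers only)."""
    return min(
        permutations(bottom_nodes),
        key=lambda order: count_crossings(edges, top_order, order),
    )


# Hypothetical toy data: parent models on the top layer,
# models derived from them on the bottom layer.
edges = [("A", "y"), ("A", "z"), ("B", "x"), ("B", "y"), ("C", "x")]
top = ["A", "B", "C"]
initial = ["x", "y", "z"]

optimized = best_bottom_order(edges, top, initial)
print(count_crossings(edges, top, initial), "crossings before reordering")
print(count_crossings(edges, top, optimized), "crossings after reordering:", optimized)
```

Exhaustive search grows factorially with layer size, which is one reason an exact optimization formulation such as integer linear programming, or a heuristic, is typically preferred for larger diagrams.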