Audio-driven 3D facial motion generation via Gaussian splatting

李小娟* **; 陈姝宇*

Article abstract

李小娟* **, 陈姝宇*. Audio-driven 3D facial motion generation via Gaussian splatting [J]. 高技术通讯 (Chinese edition), 2026, 36(3): 221-229

Audio-driven 3D facial motion generation via Gaussian splatting

DOI: 10.3772/j.issn.1002-0470.2026.03.001

Keywords: Gaussian splatting; audio-driven; parametric facial model

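Author	Affiliation
李小娟* **	(* Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190) (** University of Chinese Academy of Sciences, Beijing 100049)
陈姝宇*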

Abstract:

With the development of digital human technology, synthesizing highly realistic facial motion that is consistent with audio has become an active research topic. Existing image-based facial motion synthesis methods are usually restricted to the captured viewpoint and reproduce facial details poorly; the key problem is the lack of an effective three-dimensional (3D) representation. To address this, the paper proposes an audio-driven 3D Gaussian facial motion generation method. For face representation, the method combines 3D Gaussian splatting with a parametric facial model to build a 3D model of dynamic facial data, binding the Gaussian representation to the mesh model. For motion generation, the method maps audio to offsets on the vertices of the facial model and deforms the dynamic facial Gaussians through mesh deformation. Compared with existing methods, the proposed method achieves better 3D consistency and image quality, and its generation and rendering efficiency is significantly improved.
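
The abstract does not say how the Gaussians are bound to the mesh, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes each Gaussian center is attached to a single triangle through barycentric coordinates plus a signed height along the triangle normal, and it takes the audio-predicted vertex offsets as given. All names (`BoundGaussian`, `bind_gaussian`, `apply_offsets`, `gaussian_center`) are hypothetical, and rotation, scale, opacity, and color of the Gaussians are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit_normal(v0: Vec3, v1: Vec3, v2: Vec3) -> Vec3:
    # Assumes a non-degenerate triangle (non-zero area).
    n = cross(sub(v1, v0), sub(v2, v0))
    return scale(n, 1.0 / dot(n, n) ** 0.5)


@dataclass
class BoundGaussian:
    """Hypothetical binding record: one Gaussian center tied to one triangle."""
    face: Face      # vertex indices of the parent triangle
    bary: Vec3      # barycentric coordinates of the center projected onto the triangle
    height: float   # signed distance of the center along the triangle normal


def bind_gaussian(center: Vec3, vertices: Sequence[Vec3], face: Face) -> BoundGaussian:
    """Express a Gaussian center in the local coordinates of a rest-pose triangle."""
    v0, v1, v2 = (vertices[i] for i in face)
    n = unit_normal(v0, v1, v2)
    height = dot(sub(center, v0), n)
    p = sub(center, scale(n, height))  # projection of the center onto the triangle plane
    e0, e1, ep = sub(v1, v0), sub(v2, v0), sub(p, v0)
    d00, d01, d11 = dot(e0, e0), dot(e0, e1), dot(e1, e1)
    d20, d21 = dot(ep, e0), dot(ep, e1)
    denom = d00 * d11 - d01 * d01
    b1 = (d11 * d20 - d01 * d21) / denom
    b2 = (d00 * d21 - d01 * d20) / denom
    return BoundGaussian(face=face, bary=(1.0 - b1 - b2, b1, b2), height=height)


def apply_offsets(vertices: Sequence[Vec3], offsets: Sequence[Vec3]) -> List[Vec3]:
    """Add per-vertex offsets (assumed to come from an audio-to-motion model)."""
    return [add(v, o) for v, o in zip(vertices, offsets)]


def gaussian_center(g: BoundGaussian, vertices: Sequence[Vec3]) -> Vec3:
    """Recover the Gaussian center from the (possibly deformed) mesh vertices."""
    v0, v1, v2 = (vertices[i] for i in g.face)
    on_plane = add(add(scale(v0, g.bary[0]), scale(v1, g.bary[1])), scale(v2, g.bary[2]))
    return add(on_plane, scale(unit_normal(v0, v1, v2), g.height))
```

In this sketch the binding is computed once on the rest-pose mesh; for each frame, only `apply_offsets` and `gaussian_center` run, so the Gaussians follow the mesh without being re-optimized.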
