
Merging methods of speech visualization

  • The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a data-based di-viseme model and a rule-based dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Both face models can be driven by either articulation model, data-based or rule-based. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands; facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties; an example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that can increase the intelligibility of audible speech. Other well-known image-based audiovisual speech synthesis systems such as MIKETALK and VIDEO REWRITE concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads such as BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
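The deformation of the 3D head described above, per-articulator displacement vectors scaled by control-command values and added to a neutral mesh, can be sketched in a few lines. The following Python snippet is only an illustration of that general scheme as read from the abstract; the function name deform_head, the parameter names jaw_opening and lip_rounding, and all array shapes are hypothetical and not taken from MASSY.

import numpy as np

def deform_head(rest_vertices, displacement_sets, parameter_values):
    # rest_vertices     : (V, 3) array holding the neutral 3D head mesh
    # displacement_sets : dict mapping an articulation parameter name to a
    #                     (V, 3) array of per-vertex displacement vectors
    # parameter_values  : dict mapping a parameter name to its current value,
    #                     as delivered by the control commands (names are illustrative)
    deformed = rest_vertices.copy()
    for name, displacements in displacement_sets.items():
        weight = parameter_values.get(name, 0.0)
        deformed += weight * displacements  # linear combination of displacement sets
    return deformed

# Toy example: a 3-vertex "mesh" and two illustrative articulation parameters.
rest = np.zeros((3, 3))
sets = {
    "jaw_opening":  np.array([[0.0, -1.0, 0.0], [0.0, -0.5, 0.0], [0.0, 0.0, 0.0]]),
    "lip_rounding": np.array([[0.2, 0.0, 0.1], [0.0, 0.0, 0.0], [-0.2, 0.0, 0.1]]),
}
params = {"jaw_opening": 0.6, "lip_rounding": 0.3}
print(deform_head(rest, sets, params))

In this reading, the high-level synthesis would supply a new set of parameter values for every frame, so the mesh follows the articulation trajectory encoded in the control commands.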

Metadata
Author: Sascha Fagel
URN: urn:nbn:de:hebis:30:3-309231
URL: http://www.zas.gwz-berlin.de/191.html
ISSN: 1435-9588
ISSN: 0947-7055
Title of the parent work (English): Speech production and perception : experimental analyses and models / editors Susanne Fuchs, Pascal Perrier and Bernd Pompino-Marschall, Zentrum für Allgemeine Sprachwissenschaft, Sprachtypologie und Universalienforschung (Berlin): ZAS papers in linguistics ; Vol. 40 (2005)
Publisher: Zentrum für Allgemeine Sprachwissenschaft, Sprachtypologie und Universalienforschung
Place of publication: Berlin
Document type: Part of a book (chapter)
Language: English
Date of online publication: 14.11.2013
Year of first publication: 2005
Publishing institution: Universitätsbibliothek Johann Christian Senckenberg
Release date: 14.11.2013
GND subject headings: Computational linguistics; Visualization; Spoken language
Volume: 40
Number of pages: 14
First page: 19
Last page: 32
HeBIS-PPN: 381257908
DDC classification: 4 Language / 41 Linguistics / 410 Linguistics
Collections: Linguistics
Linguistics classification: Non-verbal communication
Linguistics classification: Computational linguistics
Journals / annual reports: ZAS papers in linguistics : ZASPiL / ZASPiL 40 = Speech production and perception : Experimental analyses and models
Parent unit: urn:nbn:de:hebis:30:3-306823
License (German): Deutsches Urheberrecht (German copyright law)