StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations

Sen Liu, Yiwei Guo, Xie Chen, Kai Yu

Shanghai Jiao Tong University, China

sen.liu@sjtu.edu.cn

dataset link

Abstract
While acoustic expressiveness has long been studied in expressive text-to-speech (ETTS), the expressiveness inherent in text has received far less attention, especially for ETTS of artistic works. In this paper, we introduce StoryTTS, a highly expressive TTS dataset recorded from a Mandarin storytelling show, which is rich in expressiveness from both acoustic and textual perspectives. We propose a systematic and comprehensive labeling framework for textual expressiveness: drawing on linguistics, rhetoric, and related fields, we define textual expressiveness in StoryTTS along five distinct dimensions (sentence pattern, scene, rhetorical device, emotional color, and imitated character). We then prompt a large language model (LLM) with a few manually annotated examples to annotate the corpus in batches. The resulting corpus contains 61 hours of continuous, highly prosodic speech with accurate text transcriptions and rich textual expressiveness annotations. StoryTTS can therefore support future ETTS research in fully exploiting the abundant intrinsic textual and acoustic features. Experiments validate that TTS models generate speech with improved expressiveness when the textual labels annotated in StoryTTS are integrated into them.
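The annotation step described above (prompting an LLM with a few manually annotated examples, then labeling the corpus in batches) can be sketched as a few-shot prompt builder. This is a minimal sketch under stated assumptions: the page does not give the actual prompt, so the instruction wording, the `Example` type, and the `build_prompt` and `format_labels` helpers are hypothetical; only the five dimension names and the sample labels come from Part I.

```python
from dataclasses import dataclass

# The five textual-expressiveness dimensions shown in the Part I annotations.
DIMENSIONS = (
    "Sentence Pattern",
    "Scene",
    "Rhetorical Device",
    "Emotional Color",
    "Imitated Character",
)


@dataclass(frozen=True)
class Example:
    """A manually annotated sentence used as a few-shot example (hypothetical)."""
    text: str
    labels: dict[str, str]  # maps each name in DIMENSIONS to a label string


def format_labels(labels: dict[str, str]) -> str:
    # One "Dimension: value" line per dimension, always in the same order.
    return "\n".join(f"{dim}: {labels[dim]}" for dim in DIMENSIONS)


def build_prompt(examples: list[Example], batch: list[str]) -> str:
    """Assemble a few-shot prompt asking an LLM to label a batch of sentences."""
    parts = [
        "Annotate each sentence along these dimensions: "
        + ", ".join(DIMENSIONS)
        + ". Answer with one 'Dimension: value' line per dimension."
    ]
    # Manually annotated examples come first, with their labels filled in.
    for ex in examples:
        parts.append(f"Text: {ex.text}\n{format_labels(ex.labels)}")
    # Unlabeled sentences follow; the LLM is expected to supply their labels.
    for i, sentence in enumerate(batch, start=1):
        parts.append(f"[{i}] Text: {sentence}")
    return "\n\n".join(parts)


# Usage with one of the Part I samples as the single few-shot example.
demo = Example(
    text="“快坐下,你别逞能了。”",
    labels={
        "Sentence Pattern": "Imperative sentence",
        "Scene": "Role-playing",
        "Rhetorical Device": "None",
        "Emotional Color": "Worry and blame",
        "Imitated Character": "Wei Qing (Protagonist 2)",
    },
)
prompt = build_prompt([demo], ["上回书说到大侠郭解,迁居茂陵。"])
print(prompt)
```

Sending several unlabeled sentences per request is one plausible reading of "batch annotation"; a real pipeline might also pass surrounding story context, which this sketch omits.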
This page presents audio samples for our paper.


Part I: The StoryTTS dataset


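Chinese text: 公孙弘冷笑了一声,心说:“还起得好呢,这回可让我们儒家找出破绽来了!”
English: Gongsun Hong gave a cold laugh and thought to himself: "Nicely done indeed! This time we Confucians have finally caught a flaw!"
LLM annotations
Sentence Pattern: Declarative sentence
Scene: Inner monologue
Rhetorical Device: None
Emotional Color: Pride and ridicule
Imitated Character: Gongsun Hong (middle-aged man)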
GT Audio

Chinese text: “难道你是平原郡门下的鸡鸣狗盗之徒吗?啊?”
English: "Could it be that you are one of those petty 'cock-crowing, dog-stealing' retainers of Pingyuan? Huh?"
LLM annotations
Sentence Pattern: Interrogative sentence
Scene: Role-playing
Rhetorical Device: Rhetorical question
Emotional Color: Mockery and questioning
Imitated Character: Dongfang Shuo (Protagonist 1)
GT Audio

Chinese text: “多好看呐!这大刀一举,咔嚓一下,这人头就掉下来了!”
English: "What a sight! The big blade goes up, then with a crack, the head drops right off!"
LLM annotations
Sentence Pattern: Exclamatory sentence
Scene: Role-playing
Rhetorical Device: Onomatopoeia
Emotional Color: Excitement and curiosity
Imitated Character: Huo Qubing (young man)
GT Audio

Chinese text: “快坐下,你别逞能了。”
English: "Sit down, quick! Stop trying to be a hero."
LLM annotations
Sentence Pattern: Imperative sentence
Scene: Role-playing
Rhetorical Device: None
Emotional Color: Worry and blame
Imitated Character: Wei Qing (Protagonist 2)
GT Audio

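Chinese text: 上回书说到大侠郭解,迁居茂陵。
English: In the last episode, we told how the great knight-errant Guo Xie moved to Maoling.
LLM annotations
Sentence Pattern: Declarative sentence
Scene: Aside (narration)
Rhetorical Device: None
Emotional Color: No emotional color
Imitated Character: Aside (narrator)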
GT Audio
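Annotations like those above are easiest to use downstream once each LLM reply is turned into a structured record. The sketch below assumes, hypothetically, that the LLM answers with one `Dimension: value` line per dimension; `parse_annotation` is an illustrative helper, not part of any released StoryTTS tooling.

```python
# The five textual-expressiveness dimensions shown in the Part I annotations.
DIMENSIONS = (
    "Sentence Pattern",
    "Scene",
    "Rhetorical Device",
    "Emotional Color",
    "Imitated Character",
)


def parse_annotation(reply: str) -> dict[str, str]:
    """Parse 'Dimension: value' lines from an LLM reply into an ordered record.

    Lines whose key is not a known dimension are ignored; a ValueError is
    raised if any of the five dimensions is missing.
    """
    labels = {}
    for line in reply.splitlines():
        # Split on the first ASCII colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in DIMENSIONS:
            labels[key] = value.strip()
    missing = [dim for dim in DIMENSIONS if dim not in labels]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    # Return the labels in the canonical dimension order.
    return {dim: labels[dim] for dim in DIMENSIONS}


reply = """Sentence Pattern: Declarative sentence
Scene: Inner monologue
Rhetorical Device: None
Emotional Color: Pride and ridicule
Imitated Character: Gongsun Hong (middle-aged man)"""
record = parse_annotation(reply)
print(record["Imitated Character"])  # Gongsun Hong (middle-aged man)
```

Rejecting incomplete replies, rather than silently filling gaps, makes it easy to re-query the LLM for just the sentences that failed.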

Part II: Speech synthesis expressiveness comparison


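Chinese Text: 这郭三可不干了,“皇上,您听他可骂人哪!”
English: Now Guo San wasn't having it: "Your Majesty, listen to him, he's insulting people!"
Audio samples: GT (reconstructed), Baseline, + Sentence Pattern, + Scene, + Rhetorical Device, + Emotional Color, + Imitated Character, + All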

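Chinese Text: 这时候东方蟹在旁边也学着杨得道的话:“爹爹,当心呐!”
English: At this point Dongfang Xie, standing beside him, also mimicked Yang Dedao's words: "Daddy, be careful!"
Audio samples: GT (reconstructed), Baseline, + Sentence Pattern, + Scene, + Rhetorical Device, + Emotional Color, + Imitated Character, + All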

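Chinese Text: “东方大人,你这是什么意思?” “什么意思?你认识!这是皇上赐给我的宝剑,我随时可以要你的命!”
English: "Lord Dongfang, what is the meaning of this?" "The meaning? You know what this is! This is the sword the Emperor bestowed on me; I can take your life whenever I please!"
Audio samples: GT (reconstructed), Baseline, + Sentence Pattern, + Scene, + Rhetorical Device, + Emotional Color, + Imitated Character, + All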

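Chinese Text: 武帝这时候传话了:“得意呀。” “奴才在!” “请东方朔!”
English: Then Emperor Wu gave the order: "Deyi!" "Your servant is here!" "Summon Dongfang Shuo!"
Audio samples: GT (reconstructed), Baseline, + Sentence Pattern, + Scene, + Rhetorical Device, + Emotional Color, + Imitated Character, + All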

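Chinese Text: “皇上我求求您了我可不跟他玩?” “怎么了,难道说东方朔会把你吃了吗?”
English: "Your Majesty, I beg you, can I please not go up against him?" "What's the matter? Do you think Dongfang Shuo is going to eat you?"
Audio samples: GT (reconstructed), Baseline, + Sentence Pattern, + Scene, + Rhetorical Device, + Emotional Color, + Imitated Character, + All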
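Reading the column names, each system in this comparison appears to be the baseline plus a single labeled dimension, with "+ All" using all five. The page does not say how the labels are fed to the TTS model, so the sketch below only builds those ablation configurations and blanks out disabled dimensions with a hypothetical `<none>` placeholder; `ablation_systems`, `mask_labels`, and `NULL_LABEL` are illustrative names, not the authors' code.

```python
# The five textual-expressiveness dimensions shown in the Part I annotations.
DIMENSIONS = (
    "Sentence Pattern",
    "Scene",
    "Rhetorical Device",
    "Emotional Color",
    "Imitated Character",
)
NULL_LABEL = "<none>"  # hypothetical placeholder for a disabled dimension


def ablation_systems() -> dict[str, frozenset[str]]:
    """Map each compared system name to the set of dimensions it uses."""
    systems = {"Baseline": frozenset()}
    for dim in DIMENSIONS:
        systems[f"+ {dim}"] = frozenset({dim})
    systems["+ All"] = frozenset(DIMENSIONS)
    return systems


def mask_labels(labels: dict[str, str], enabled: frozenset[str]) -> dict[str, str]:
    """Keep labels for enabled dimensions and replace the rest with NULL_LABEL."""
    return {dim: labels[dim] if dim in enabled else NULL_LABEL for dim in DIMENSIONS}


labels = {
    "Sentence Pattern": "Declarative sentence",
    "Scene": "Inner monologue",
    "Rhetorical Device": "None",
    "Emotional Color": "Pride and ridicule",
    "Imitated Character": "Gongsun Hong (middle-aged man)",
}
for name, enabled in ablation_systems().items():
    print(name, mask_labels(labels, enabled))
```

A placeholder distinct from the genuine label "None" keeps "this dimension is disabled" separate from "the annotator found no rhetorical device".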


Part III: Case Study


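Chinese Text: “皇上,您说吧,奴才我这等着呢。”
English: "Your Majesty, go ahead and speak; your servant is right here waiting."
Audio samples: GT (reconstructed), Baseline, + All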

Chinese Text: “金子你得给我。” “不行,我就是不给!”
English: "You have to give me the gold." "No, I simply won't give it to you!"
Audio samples: GT (reconstructed), Baseline, + All