Festival unit selection voice Missing diphone: # hash
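Some background: while trying to build a unit selection voice, I followed the steps here: https://github.com/CSTR-Edinburgh/CSTR-Edinburgh.github.io/blob/master/_posts/2016-8-21-Multisyn_unit_selection.md and used a voice definition from here: https://raw.githubusercontent.com/CSTR-Edinburgh/merlin/master/egs/hybrid_synthesis/s1/voice_definition_files/unit_selection/cstr_us_awb_arctic_multisyn.scm. Unfortunately, the wavs were too noisy, so I ended up labelling them by hand and skipping the automatic labelling process.
The voice works now, but it still needs some improvement. One error that occurs frequently is that Festival reports "Missing diphone" for any pause-to-phone transition, for example: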
festival> (utt.relation.print (SayText "I can say anything I want.") 'Unit)
Missing diphone: #_ay
diphone still missing, backing off: #_ay
backed off: #_ay -> #_ax
diphone still missing, backing off: #_ax
backed off: #_ay -> #_#
diphone still missing, backing off: #_#
backed off: #_ay ->
Missing diphone: ey_eh
Interword so inserting silence.
diphone still missing, backing off: ey_#
backed off: ey_eh -> ax_#
diphone still missing, backing off: ax_#
backed off: ey_eh -> #_#
diphone still missing, backing off: #_#
backed off: ey_eh ->
Missing diphone: #_eh
diphone still missing, backing off: #_eh
backed off: #_eh -> #_ax
diphone still missing, backing off: #_ax
backed off: #_eh -> #_#
diphone still missing, backing off: #_#
backed off: #_eh ->
Missing diphone: t_#
diphone still missing, backing off: t_#
backed off: t_# -> #_#
diphone still missing, backing off: #_#
backed off: t_# ->
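I tried replacing sil and sp in the labels (from the automatic process) with pau and h# (to be consistent with festival/lib/radio_phones.scm), and I also tried replacing them with #, but that did not change anything. The source wavs/labs definitely contain the transitions above (for example, several start with "I can"), but Festival never seems to use them.
How can I get Festival to use the pause-to-phone transitions in the source data?
Thanks!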
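What was happening is that when I ran the Multisyn unit selection scripts, the build_utts step partially failed and skipped utterances, because the hand-labelled labels did not exactly match what Festival predicted. For example, if the speaker said "extreme" as eh k s ..., but Festival computed ih k s ..., then the build_utts script would fail with an error like this: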
align missmatch at ih (0.000000) eh (2.810566)
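To see where a hand-labelled phone sequence diverges from a predicted one, something like the following may help. This is only a sketch: it assumes both sequences are already available as Python lists (extracting Festival's prediction is not shown), and `phone_diffs` is a hypothetical helper name, not part of Festival or the Multisyn scripts.

```python
from difflib import SequenceMatcher


def phone_diffs(predicted, labelled):
    """List the places where two phone sequences differ.

    Each entry is (tag, predicted_slice, labelled_slice), where tag is
    'replace', 'delete' or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(a=predicted, b=labelled, autojunk=False)
    return [
        (tag, predicted[i1:i2], labelled[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Illustrative sequences for "extreme"; the phones after "eh k s" are made up.
    predicted = ["#", "ih", "k", "s", "t", "r", "iy", "m", "#"]
    labelled = ["#", "eh", "k", "s", "t", "r", "iy", "m", "#"]
    for tag, pred, lab in phone_diffs(predicted, labelled):
        print(tag, pred, lab)
```

Each reported difference points at a label that either needs adjusting or reflects a pronunciation Festival did not predict.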
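I ran the build_utts script manually for each utterance and adjusted the labels accordingly. If you are foolish enough, as I was, to try labelling by hand yourself, here are some tips that helped me:
- Consider removing any phone closures, such as t_cl or d_cl, because they really mess things up when build_utts tries to match the labels.
- Make sure there is a pause (i.e. #) at the start and end of every utterance. The build_utts script will not complain if it is missing, but when you run the voice in Festival you will get an error like this: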
-=-=-=-=-=- EST Error -=-=-=-=-=-
{FND} Feature end not defined
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
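Both tips can be checked mechanically before building the voice. The sketch below is not part of Festival. It assumes an xwaves-style label file in which an optional header ends with a line containing only "#" and each data line ends with the phone label; `read_labels` and `check_labels` are hypothetical helper names, so adjust the parsing to your own label format.

```python
PAUSE = "#"  # pause symbol used in the labels above


def read_labels(text):
    """Return the phone labels from label-file text (assumed layout)."""
    lines = [line.strip() for line in text.splitlines()]
    # Assumption: the header, if present, ends with a line that is exactly "#".
    if "#" in lines:
        lines = lines[lines.index("#") + 1:]
    labels = []
    for line in lines:
        fields = line.split()
        # Assumption: a data line has at least a time and a label,
        # and the label is the last whitespace-separated field.
        if len(fields) >= 2:
            labels.append(fields[-1])
    return labels


def check_labels(labels):
    """Return a list of human-readable problems found in one utterance."""
    problems = []
    if not labels or labels[0] != PAUSE:
        problems.append("no pause at start")
    if not labels or labels[-1] != PAUSE:
        problems.append("no pause at end")
    closures = sorted({phone for phone in labels if phone.endswith("_cl")})
    if closures:
        problems.append("closure labels present: " + ", ".join(closures))
    return problems


if __name__ == "__main__":
    sample = "separator ;\nnfields 1\n#\n0.20 26 #\n0.31 26 ay\n0.40 26 k_cl\n0.45 26 k\n"
    for problem in check_labels(read_labels(sample)):
        print(problem)
```

Running this over every label file gives a list of utterances to fix by hand before running build_utts.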
Thanks to @NikolayShmyrev for pointing me in the right direction. He also suggested using Ossian instead of Festival, since Ossian uses Python rather than Festival's rather difficult code.