How to reconstruct a conversation from Watson Speech-to-Text output?
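I get JSON output from Watson's Speech-to-Text service, which I have converted into a list and then into a Pandas dataframe.
I'm trying to work out how to reconstruct the conversation (with timings) along these lines:
Speaker 0: said this [00.01 - 00.12]
Speaker 1: said that [00.12 - 00.22]
Speaker 0: said something else [00.22 - 00.56]
My dataframe has one row per word, with columns for the word, its start/end times, and the speaker tag (either 0 or 1).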
words = [['said', 0.01, 0.06, 0],['this', 0.06, 0.12, 0],['said', 0.12,
0.15, 1],['that', 0.15, 0.22, 1],['said', 0.22, 0.31, 0],['something',
0.31, 0.45, 0],['else', 0.45, 0.56, 0]]
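Ideally, what I want to create is the following, where words spoken by the same speaker are grouped together and the group is broken when the next speaker steps in:
grouped_words = [[['said','this'], 0.01, 0.12, 0],[['said','that'], 0.12,
0.22, 1],[['said','something','else'], 0.22, 0.56, 0]]
UPDATE: As per request, a link to a sample of the JSON file obtained is at https://github.com/cookie1986/STT_test
Loading the speaker labels into a Pandas dataframe gives a nice, easy-to-read view, and identifying the speaker changes from there should be fairly simple.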
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
Output:
   from  speaker    to          0     1     2
0  0.01        0  0.06       said  0.01  0.06
1  0.06        0  0.12       this  0.06  0.12
2  0.12        1  0.15       said  0.12  0.15
3  0.15        1  0.22       that  0.15  0.22
4  0.22        0  0.31       said  0.22  0.31
5  0.31        0  0.45  something  0.31  0.45
6  0.45        0  0.56       else  0.45  0.56
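The join above pairs rows purely by position, so it relies on speaker_labels and timestamps holding one entry per word in the same order. Below is a minimal sketch, under that assumption, that names the timestamp columns and checks the alignment before trusting it. The helper name build_word_table and the column names word/start/end are my own and not part of the Watson output. It also gathers timestamps from every entry in results rather than only results[0], on the assumption that a longer recording comes back as several result blocks.

```python
import pandas as pd

def build_word_table(jsonconvo):
    """Hypothetical helper: one row per word with word, start, end, speaker."""
    # Collect word timestamps from every result block, not just results[0]
    stamps = [ts
              for result in jsonconvo['results']
              for ts in result['alternatives'][0]['timestamps']]
    words = pd.DataFrame(stamps, columns=['word', 'start', 'end'])
    labels = pd.DataFrame(jsonconvo['speaker_labels'])[['from', 'to', 'speaker']]
    # A positional pairing is only valid if both lists describe the same words
    if len(words) != len(labels):
        raise ValueError('timestamps and speaker_labels differ in length')
    aligned = (words['start'].eq(labels['from']).all()
               and words['end'].eq(labels['to']).all())
    if not aligned:
        raise ValueError('timestamps and speaker_labels are not aligned')
    words['speaker'] = labels['speaker']
    return words

# Small hand-made sample in the same shape as the JSON in this answer
sample = {
    'results': [
        {'alternatives': [{'timestamps': [['said', 0.01, 0.06], ['this', 0.06, 0.12]]}]},
        {'alternatives': [{'timestamps': [['said', 0.12, 0.15], ['that', 0.15, 0.22]]}]},
    ],
    'speaker_labels': [
        {'from': 0.01, 'to': 0.06, 'speaker': 0},
        {'from': 0.06, 'to': 0.12, 'speaker': 0},
        {'from': 0.12, 'to': 0.15, 'speaker': 1},
        {'from': 0.15, 'to': 0.22, 'speaker': 1},
    ],
}
table = build_word_table(sample)
print(table)
```

Raising on a mismatch is a deliberate choice here: a silent positional join on misaligned lists would attach words to the wrong speaker without any visible error.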
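From there, you can pick out just the rows where the speaker changes and collapse the dataframe with a quick loop:
# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
turns=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # Last row of this turn is the one just before the next change (.loc is end-inclusive)
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # No further change point: the last speaker runs to the end
        temp=speakers.loc[currentindex:,:]
    turns.append(pd.DataFrame(
        [[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],
        columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so concatenate the per-turn frames instead
# (pass ignore_index=True if you want a 0..n-1 index)
Transcript=pd.concat(turns)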
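You take the start time from the first row of the temporary dataframe (hence head) and the end time from the last row (hence tail). To handle the last speaker, where looking up the next change point would normally raise an index-out-of-bounds error, you can use try/except.
Output:
   from    to speaker               transcript
0  0.01  0.12       0             [said, this]
0  0.12  0.22       1             [said, that]
0  0.22  0.56       0  [said, something, else]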
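As an alternative to the explicit loop, the same collapse can be sketched with a cumulative sum over the speaker-change flags followed by a groupby. This is a different technique from the loop above, not a restatement of it, and it is shown on a small hand-built frame so it runs on its own. The word and turn column names are assumptions of this sketch.

```python
import pandas as pd

# Hand-built stand-in for the joined frame, with the word column given a name
speakers = pd.DataFrame({
    'from':    [0.01, 0.06, 0.12, 0.15, 0.22, 0.31, 0.45],
    'to':      [0.06, 0.12, 0.15, 0.22, 0.31, 0.45, 0.56],
    'speaker': [0, 0, 1, 1, 0, 0, 0],
    'word':    ['said', 'this', 'said', 'that', 'said', 'something', 'else'],
})

# True on every row where the speaker differs from the previous row
changed = speakers['speaker'].ne(speakers['speaker'].shift())
# A running count of the changes gives each consecutive run its own turn id
speakers['turn'] = changed.cumsum()

# 'from' is a Python keyword, so the named aggregations are passed as a dict
transcript = (
    speakers.groupby('turn')
    .agg(**{
        'from': ('from', 'first'),
        'to': ('to', 'last'),
        'speaker': ('speaker', 'first'),
        'transcript': ('word', list),
    })
    .reset_index(drop=True)
)
print(transcript)
```

Grouping on the turn id rather than on speaker directly matters: grouping on speaker alone would merge both of speaker 0's turns into one row.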
Here is the complete code:
import json
import pandas as pd
jsonconvo=json.loads("""{
"results": [
{
"alternatives": [
{
"timestamps": [
[
"said",
0.01,
0.06
],
[
"this",
0.06,
0.12
],
[
"said",
0.12,
0.15
],
[
"that",
0.15,
0.22
],
[
"said",
0.22,
0.31
],
[
"something",
0.31,
0.45
],
[
"else",
0.45,
0.56
]
],
"confidence": 0.85,
"transcript": "said this said that said something else "
}
],
"final": true
}
],
"result_index": 0,
"speaker_labels": [
{
"from": 0.01,
"to": 0.06,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.06,
"to": 0.12,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.12,
"to": 0.15,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.15,
"to": 0.22,
"speaker": 1,
"confidence": 0.55,
"final": false
},
{
"from": 0.22,
"to": 0.31,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.31,
"to": 0.45,
"speaker": 0,
"confidence": 0.55,
"final": false
},
{
"from": 0.45,
"to": 0.56,
"speaker": 0,
"confidence": 0.54,
"final": false
}
]
}""")
speakers=pd.DataFrame(jsonconvo['speaker_labels']).loc[:,['from','speaker','to']]
convo=pd.DataFrame(jsonconvo['results'][0]['alternatives'][0]['timestamps'])
speakers=speakers.join(convo)
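# Index labels of the rows where the speaker differs from the previous row
ChangeSpeaker=speakers.loc[speakers['speaker'].shift()!=speakers['speaker']].index
turns=[]
for counter in range(0,len(ChangeSpeaker)):
    currentindex=ChangeSpeaker[counter]
    try:
        # Last row of this turn is the one just before the next change (.loc is end-inclusive)
        nextIndex=ChangeSpeaker[counter+1]-1
        temp=speakers.loc[currentindex:nextIndex,:]
    except IndexError:
        # No further change point: the last speaker runs to the end
        temp=speakers.loc[currentindex:,:]
    turns.append(pd.DataFrame(
        [[temp.head(1)['from'].values[0],temp.tail(1)['to'].values[0],temp.head(1)['speaker'].values[0],temp[0].tolist()]],
        columns=['from','to','speaker','transcript']))
# DataFrame.append was removed in pandas 2.0, so concatenate the per-turn frames instead
# (pass ignore_index=True if you want a 0..n-1 index)
Transcript=pd.concat(turns)
print(Transcript)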
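If the data is already in the words list form from the question, pandas is not strictly needed. Here is a minimal sketch using itertools.groupby, which only merges adjacent items and so splits naturally at each speaker change. The function name group_turns and the exact layout of the printed lines are illustrative assumptions, modelled on the format the question asks for.

```python
from itertools import groupby

# The per-word list from the question: [word, start, end, speaker]
words = [['said', 0.01, 0.06, 0], ['this', 0.06, 0.12, 0],
         ['said', 0.12, 0.15, 1], ['that', 0.15, 0.22, 1],
         ['said', 0.22, 0.31, 0], ['something', 0.31, 0.45, 0],
         ['else', 0.45, 0.56, 0]]

def group_turns(words):
    """Hypothetical helper: collapse consecutive words by the same speaker."""
    grouped = []
    # groupby only merges adjacent items, so each speaker change starts a new run
    for speaker, run in groupby(words, key=lambda w: w[3]):
        run = list(run)
        # Words of the run, start of its first word, end of its last word, speaker
        grouped.append([[w[0] for w in run], run[0][1], run[-1][2], speaker])
    return grouped

grouped_words = group_turns(words)
for text, start, end, speaker in grouped_words:
    print(f"Speaker {speaker}: {' '.join(text)} [{start:05.2f} - {end:05.2f}]")
```

The result has the same nesting as the grouped_words structure in the question, so it can be fed straight into whatever consumes that list.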