Python IBM Watson Speech to Text API Convert Transcript to CSV
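I am using the IBM Watson Speech to Text API in Python and storing the JSON response as a nested dictionary. I can access a single record with pprint(data_response['results'][0]['alternatives'][0]['transcript']), but I cannot print all of the transcripts. I need to dump the entire transcript to a .csv file. I tried a generator expression in the same format that was suggested to me, print(a["confidence"] for r in data_response["results"] for a in r["alternatives"]), but I must not understand how generator expressions work.
Here is what the nested dictionary looks like when pretty-printed: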
{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},
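Edit: Here is my final solution for converting a list of .pkl files to .csv files, based on @SeaChange's response, which helped me export only the transcript portion of the nested dictionary. I am sure there are more efficient ways to convert the files, but it works well for my application.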
import glob
import os
import pickle

import pandas as pd

# resolve the data folders relative to this script
base_dir = os.path.dirname(os.path.abspath(__file__))
# set the input path
input_path = os.path.join(base_dir, "00_data", "Watson Responses")
# set the output path
output_path = os.path.join(base_dir, "00_data", "Watson Scripts")
# set the list of all files under the input path with a file ending of pkl
files = glob.glob(os.path.join(input_path, "**", "*.pkl"), recursive=True)
# open each pkl file, convert the list to a dataframe, and export to a csv
for file in files:
    f_name, f_ext = os.path.splitext(os.path.basename(file))
    with open(file, 'rb') as pkl_file:
        data_response = pickle.load(pkl_file)
    transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
    dataframe = pd.DataFrame(transcripts)
    dataframe.to_csv(os.path.join(output_path, f'{f_name}.csv'), index=False, header=False)
transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
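This gives you a list of all the transcripts. From there it only depends on how you want to format the output file. If you want each transcript on its own line, you can use writelines for that.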
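To make the difference concrete, here is a minimal sketch of why the original print(...) call showed a generator object instead of values, and one possible way to write the list out with one transcript per row. The sample data is a shortened copy of the response in the question, write_transcripts is a hypothetical helper name, and the standard csv module is used here instead of writelines: file.writelines does not add newline characters on its own, so each string would need a "\n" appended first.

```python
import csv
import io

# Sample data shaped like the response in the question (shortened).
data_response = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# A generator expression is lazy: printing it shows the generator object,
# not the items it would produce.
gen = (a["confidence"] for r in data_response["results"] for a in r["alternatives"])
print(gen)        # a generator object, not the confidences
print(list(gen))  # consuming the generator yields the values

# A list comprehension builds the whole list up front.
transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]


def write_transcripts(rows, handle):
    """Write one transcript per CSV row (hypothetical helper)."""
    writer = csv.writer(handle)
    for text in rows:
        # strip() drops the trailing space Watson leaves on each transcript
        writer.writerow([text.strip()])


# An in-memory buffer keeps the sketch self-contained. For a real file,
# open it with newline="" as the csv module documentation recommends,
# e.g. open("transcripts.csv", "w", newline="", encoding="utf-8"),
# where the file name is only a placeholder.
buffer = io.StringIO()
write_transcripts(transcripts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Using csv.writer rather than plain string writes means a transcript that happens to contain a comma or a quote character is still quoted correctly in the output.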