Python IBM Watson Speech to Text API: Convert Transcript to CSV

Question

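I am using the IBM Watson Speech to Text API in Python and storing the JSON response as a nested dictionary. I can access a single record with pprint(data_response['results'][0]['alternatives'][0]['transcript']), but I cannot print all of the transcripts. I need to dump the entire transcript to a .csv file. I tried a generator expression in the same format that was suggested to me, print(a["confidence"] for r in data_response["results"] for a in r["alternatives"]), but I must not understand how generator expressions work.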

Here is what the nested dictionary looks like when pretty-printed:

{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},

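Edit: Here is my final solution for converting a list of .pkl files to .csv files, based on @SeaChange's response, which helped me export only the transcript portion of the nested dictionary. I am sure there are more efficient ways to convert the files, but it works well for my application.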

import glob
import os
import pickle

import pandas as pd

# set the input path (raw string, so the backslash is not read as an escape)
input_path = r"00_data\Watson Responses"

# set the output path
output_path = r"00_data\Watson Scripts"

# list every .pkl file under the input path, including subdirectories
files = glob.glob(os.path.join(input_path, "**", "*.pkl"), recursive=True)

# open each pkl file, convert the list to a dataframe, and export to a csv
for file in files:
    # file name without directory or extension, reused for the csv name
    f_name, f_ext = os.path.splitext(os.path.basename(file))
    # open the path returned by glob; the with block closes the file
    with open(file, 'rb') as pkl_file:
        data_response = pickle.load(pkl_file)
    # flatten every alternative of every result into a list of transcripts
    transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]
    dataframe = pd.DataFrame(transcripts)
    dataframe.to_csv(os.path.join(output_path, f'{f_name}.csv'), index=False, header=False)
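
If pandas is not needed for anything else, the same export could probably be done with the standard library's csv module. The following is only a sketch under assumptions: the helper name write_transcripts_csv, the sample dictionary, and the temporary output path are hypothetical and not part of my actual script.

```python
import csv
import os
import tempfile


def write_transcripts_csv(data_response, csv_path):
    """Write one transcript per row to csv_path (hypothetical helper)."""
    transcripts = [
        a["transcript"]
        for r in data_response["results"]
        for a in r["alternatives"]
    ]
    # newline="" lets the csv module control the line endings itself
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        # each row is a one-element list, giving a single-column file
        writer.writerows([t] for t in transcripts)
    return len(transcripts)


# Made-up sample shaped like the response shown above
sample = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# Assumed output location, used for illustration only
csv_path = os.path.join(tempfile.gettempdir(), "sample_transcripts.csv")
rows_written = write_transcripts_csv(sample, csv_path)
```

Inside the loop above, this helper would take the place of the two pandas lines, called with data_response and the output file path.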

Answer 1

transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]

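This gives you a list of all the transcripts. From there it just depends on how you want to format the output file. If you want each transcript on a new line, you could use writelines for that.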
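
A minimal sketch of both points, assuming made-up sample data and a hypothetical output file named transcripts.txt in the temp directory. Passing a generator expression to print shows the generator object itself, because nothing has consumed it yet, while a list comprehension (or list(...)) builds the values. Note that writelines does not add line breaks, so each transcript needs an explicit newline.

```python
import os
import tempfile

# Made-up sample shaped like the response in the question
data_response = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}],
         "final": True},
        {"alternatives": [{"confidence": 0.9,
                           "transcript": "good morning any this is "}],
         "final": True},
    ],
}

# A generator expression is lazy: printing it shows the generator object
gen = (a["confidence"] for r in data_response["results"] for a in r["alternatives"])
print(gen)

# Consuming the generator (here with list) produces the actual values
confidences = list(gen)
print(confidences)

transcripts = [a["transcript"] for r in data_response["results"] for a in r["alternatives"]]

# Assumed output location, used for illustration only
out_path = os.path.join(tempfile.gettempdir(), "transcripts.txt")

# writelines adds no separators, so append "\n" to each transcript
with open(out_path, "w", encoding="utf-8") as out_file:
    out_file.writelines(t.strip() + "\n" for t in transcripts)
```

The strip() call drops the trailing space that each transcript carries; leave it out if the raw text should be preserved exactly.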

python

json

dictionary

export-to-csv

ibm-watson