Locating a dynamic string in a text file

Question

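Hello, I have been struggling with my programming recently. I have managed to receive the output below from Google Speech to Text, but I cannot figure out how to extract data from this block.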

Excerpt 1:

[VoiceMain]: Successfully initialized

{"result":[]} {"result":[{"alternative":[{"transcript":"hello","confidence":0.46152416},{"transcript":"how low" },{"transcript":"how lo"},{"transcript":"how long"},{"transcript":"Polo"}],"final" :true}],"result_index":0}

[VoiceMain]: Successfully initialized

{"result":[]} {"result":[{"alternative":[{"transcript":"hello"},{"transcript":"how long"},{"transcript" :"how low"},{"transcript":"howlong"}],"final":true}],"result_index":0}

Objective:

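My objective is to extract the string "hello" (without the quotes) from the first transcript of each block and set it to a variable. The problem arises when I don't know what the phrase will be. Instead of "hello", the phrase could be a string of any length. Even if it is a different string, I would still like to set it to the same variable that "hello" would have been assigned to.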

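Furthermore, I would like to extract the number that follows the word "confidence". In this case it is 0.46152416. The data type of the confidence variable does not matter. The confidence value seems harder to extract from the block because it may or may not be present. If it is not present, it must be ignored. If it is present, however, it must be detected and stored as a variable.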

Also note that this block of text is stored in a file named "CurlOutput.txt".

Any help or advice related to solving this problem is greatly appreciated.

Answer 1

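You could do this with a regular expression, but I assume you will later want to use the result as a dictionary in your code. So here is a Python approach that loads each result as a dictionary.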

import json

flag = '{"result":[]} '  # empty result that prefixes every data line

with open('CurlOutput.txt') as f:
    lines = f.read().splitlines()

for line in lines:  # Loop through each line in the file
    if flag in line:  # Check if this is a line with data on it
        # Drop the empty prefix, then load the remaining JSON as a dict
        results = json.loads(line.replace(flag, ''))['result']
        if not results:
            continue  # Nothing was recognized on this line
        # If you just want to change the first alternative:
        # results[0]['alternative'][0]['transcript'] = 'myNewString'

        # If you want to check all alternatives for confidence and transcript
        for result in results[0]['alternative']:  # Loop over each alternative
            transcript = result['transcript']
            # confidence is optional, so fall back to None when it is missing
            confidence = result.get('confidence')
            # Now do whatever you want with confidence and transcript.
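
If you only need the first transcript of each block, as the question asks, something along these lines might be enough. It is a rough sketch, not a definitive solution: the function name `first_transcripts` and the inline `sample` string are made up for illustration, and it assumes every data line holds one or more complete JSON objects separated by whitespace, as in the excerpts above. Instead of stripping the `{"result":[]} ` prefix as a string, it uses `json.JSONDecoder.raw_decode` to read the objects on a line one after another.

```python
import json


def first_transcripts(text):
    """Return one (transcript, confidence) pair per non-empty result in text.

    confidence is None when the first alternative does not carry one.
    """
    decoder = json.JSONDecoder()
    found = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith('{'):
            continue  # skip blank lines and log lines such as "[VoiceMain]: ..."
        pos = 0
        while pos < len(line):
            # raw_decode parses one JSON value and reports where it ended
            obj, pos = decoder.raw_decode(line, pos)
            # step over the whitespace between consecutive objects
            while pos < len(line) and line[pos].isspace():
                pos += 1
            results = obj.get('result') or []
            if not results:
                continue  # the leading {"result":[]} carries no data
            alternatives = results[0].get('alternative') or []
            if not alternatives:
                continue
            best = alternatives[0]  # the first transcript of the block
            found.append((best.get('transcript'), best.get('confidence')))
    return found


# Hypothetical sample shaped like the excerpts in the question
sample = (
    '[VoiceMain]: Successfully initialized\n'
    '\n'
    '{"result":[]} {"result":[{"alternative":[{"transcript":"hello",'
    '"confidence":0.46152416},{"transcript":"how low"}],"final":true}],'
    '"result_index":0}\n'
    '{"result":[]} {"result":[{"alternative":[{"transcript":"hello"},'
    '{"transcript":"how long"}],"final":true}],"result_index":0}\n'
)

pairs = first_transcripts(sample)
for transcript, confidence in pairs:
    print(transcript, confidence)
```

For the real file you would pass in the contents of CurlOutput.txt, for example `first_transcripts(open('CurlOutput.txt').read())`. A line that starts with `{` but is not valid JSON will raise `json.JSONDecodeError`, so wrap the call in `try`/`except` if the output can be truncated.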

Tags: text, file, extraction, python-3.x