Python calculate mean values of specific key in a nested dictionary (IBM Watson Speech to Text API results)

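I'm comparing IBM Watson Speech to Text baseline confidence across multiple audio files. I can access the confidence of a single record with pprint(data_response['results'][0]['alternatives'][0]['confidence']), but I can't return multiple confidence values. I need to calculate the average confidence for the whole transcript. I've looked into iterating over nested dictionaries, but everything I've read so far says iteration only returns the keys, not the values.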
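Iterating over a dict directly does yield only its keys, but dict.values() and dict.items() give you the values as well. A minimal sketch, using a made-up record shaped like one entry of data_response["results"]:

```python
# Hypothetical record shaped like one entry of data_response["results"]
record = {
    "alternatives": [{"confidence": 0.9, "transcript": "good morning "}],
    "final": True,
}

keys = list(record)             # iterating a dict directly yields its keys
values = list(record.values())  # .values() yields the values
pairs = list(record.items())    # .items() yields (key, value) tuples

print(keys)
print(values[1])
for key, value in pairs:
    # "alternatives" maps to a list, "final" maps to a bool
    print(key, type(value).__name__)
```

Because "results" and "alternatives" are lists, reaching every confidence means looping over those lists and indexing each inner dict by key.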

What approach should I use to get the mean of all the confidence levels?

Here's what the nested dictionary looks like when pretty-printed:

{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.99, 'transcript': 'hello '}],
              'final': True},
             {'alternatives': [{'confidence': 0.9,
                                'transcript': 'good morning any this is '}],
              'final': True},
             {'alternatives': [{'confidence': 0.59,
                                'transcript': "I'm on a recorded morning "
                                              '%HESITATION today start running '
                                              "yeah it's really good how are "
                                              "you %HESITATION it's one three "
                                              'six thank you so much for '
                                              'asking '}],
              'final': True},
             {'alternatives': [{'confidence': 0.87,
                                'transcript': 'I appreciate this opportunity '
                                              'to get together with you and '
                                              '%HESITATION you know learn more '
                                              'about you your interest in '}],
              'final': True},
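If the nesting depth isn't known in advance, one option is a recursive generator that walks dicts and lists and yields every value stored under a given key. A minimal sketch; find_values is a hypothetical helper name, and the sample is trimmed to two results with shortened transcripts:

```python
from statistics import mean


def find_values(obj, key):
    """Yield every value stored under `key` in nested dicts/lists (hypothetical helper)."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v  # matched: yield the value, don't descend into it
            else:
                yield from find_values(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_values(item, key)


# Trimmed-down sample with the same nesting as the pprint output above
sample = {
    "result_index": 0,
    "results": [
        {"alternatives": [{"confidence": 0.99, "transcript": "hello "}], "final": True},
        {"alternatives": [{"confidence": 0.9, "transcript": "good morning "}], "final": True},
    ],
}

confidences = list(find_values(sample, "confidence"))
print(confidences)
print(mean(confidences))
```

This trades the explicit path for generality: it will also pick up a "confidence" key anywhere else in the response, which may or may not be what you want.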

You can use statistics.mean to calculate the mean of all the confidence levels:

from statistics import mean

data_response = {
    "result_index": 0,
    "results": [
        {
            "alternatives": [{"confidence": 0.99, "transcript": "hello "}],
            "final": True,
        },
        {
            "alternatives": [
                {"confidence": 0.9, "transcript": "good morning any this is "}
            ],
            "final": True,
        },
        {
            "alternatives": [
                {
                    "confidence": 0.59,
                    "transcript": "I'm on a recorded morning "
                    "%HESITATION today start running "
                    "yeah it's really good how are "
                    "you %HESITATION it's one three "
                    "six thank you so much for "
                    "asking ",
                }
            ],
            "final": True,
        },
        {
            "alternatives": [
                {
                    "confidence": 0.87,
                    "transcript": "I appreciate this opportunity "
                    "to get together with you and "
                    "%HESITATION you know learn more "
                    "about you your interest in ",
                }
            ],
            "final": True,
        },
    ],
}

m = mean(
    a["confidence"] for r in data_response["results"] for a in r["alternatives"]
)
print(m)

Prints:

0.8375
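Since the goal is comparing several audio files, the same expression can be wrapped in a function and applied per response. A sketch under a few assumptions: the IBM Watson documentation describes confidence as reported for the best alternative of final results, so this version keeps only final results and the first alternative, and skips entries without a confidence key instead of raising KeyError. It also returns None rather than letting statistics.mean raise StatisticsError on an empty list. The function name transcript_confidence, the file names, and the values are hypothetical:

```python
from statistics import mean


def transcript_confidence(response):
    """Mean confidence over final results; None if no confidence values are present."""
    scores = [
        alt["confidence"]
        for result in response.get("results", [])
        if result.get("final")                          # skip interim results
        for alt in result.get("alternatives", [])[:1]   # best (first) alternative only
        if "confidence" in alt
    ]
    return mean(scores) if scores else None


# Hypothetical responses keyed by audio file name (made-up values)
responses = {
    "call_a.wav": {"results": [
        {"alternatives": [{"confidence": 0.75, "transcript": "hi "}], "final": True},
        {"alternatives": [{"confidence": 0.5, "transcript": "there "}], "final": True},
    ]},
    "call_b.wav": {"results": [
        {"alternatives": [{"transcript": "partial "}], "final": False},
    ]},
}

for name, response in responses.items():
    print(name, transcript_confidence(response))
```

Here call_a.wav averages to 0.625 and call_b.wav, which has no final result, yields None.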