Substring string column Pandas Python

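I have a pandas DataFrame with two columns: ticket number and history.

History is a string with the structure shown below. I need to create a third column containing the name of the author who changed the status from New to Open. Is that possible?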

[
                {
                    "id": "1",
                    "author": {
                        "name": "user1",
                        "emailAddress": "user1@test.com",
                        "displayName": "user1"
                    },
                    "created": "2021-06-09T12:54:22.915+0000",
                    "items": [
                        {
                            "field": "name",
                            "from": "1",
                            "fromString": null,
                            "to": "2",
                            "toString": "test"
                        }
                    ]
                },
                {
                    "id": "2",
                    "author": {
                        "name": "user2",
                        "emailAddress": "user2@test.com",
                        "displayName": "user2"
                    },
                    "created": "2021-06-11T09:33:18.692+0000",
                    "items": [
                        {
                            "field": "status",
                            "from": "3",
                            "fromString": "New",
                            "to": "7",
                            "toString": "Open"
                        }
                    ]
                }]

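If your DataFrame is named df, the history column (the second column) is named history, and the values in that column really are JSON strings with a structure like the one you posted, you can do the following: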

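import json

def extract_author(json_string):
    """Return the name of the author who changed the status from New to Open."""
    records = json.loads(json_string)
    for record in records:
        # One history entry can bundle several field changes, so check each item
        # rather than only the first one.
        for item in record.get('items', []):
            if (item.get('field') == 'status'
                    and item.get('fromString') == 'New'
                    and item.get('toString') == 'Open'):
                return record['author']['name']
    # No New -> Open transition found in this ticket's history.
    return None

# Build the third column by applying the function to every history string.
df['author'] = df['history'].map(extract_author)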
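
If some cells in the history column might be empty or might not parse as JSON, the function above would raise. Below is a rough sketch of a more defensive variant. The name extract_author_safe, its from_status/to_status parameters, and the sample_history values are made up for illustration and are not part of your data, so adjust them as needed.

```python
import json

def extract_author_safe(value, from_status="New", to_status="Open"):
    """Return the author name for the first from_status -> to_status change, else None."""
    # Missing cells (None, NaN) are not strings, so there is nothing to parse.
    if not isinstance(value, str):
        return None
    try:
        records = json.loads(value)
    except json.JSONDecodeError:
        # Malformed JSON: report "no author" instead of raising.
        return None
    if not isinstance(records, list):
        return None
    for record in records:
        if not isinstance(record, dict):
            continue
        # "items" may be absent or null; treat both as an empty list.
        for item in record.get("items") or []:
            if (isinstance(item, dict)
                    and item.get("field") == "status"
                    and item.get("fromString") == from_status
                    and item.get("toString") == to_status):
                return (record.get("author") or {}).get("name")
    return None

# Hypothetical sample values standing in for the history column.
sample_history = [
    '[{"author": {"name": "user2"}, "items": [{"field": "status", '
    '"fromString": "New", "toString": "Open"}]}]',
    '[{"author": {"name": "user1"}, "items": [{"field": "name", '
    '"fromString": null, "toString": "test"}]}]',
    '[{"id": "1,',   # truncated, invalid JSON
    float("nan"),    # missing value, as pandas would store it
]

authors = [extract_author_safe(v) for v in sample_history]
print(authors)
```

It is applied the same way as before, i.e. df['author'] = df['history'].map(extract_author_safe). Rows with no matching transition, an empty cell, or unparseable JSON end up with None in the new column.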