Extracting a substring from a string column in pandas (Python)
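I have a pandas DataFrame with two columns: ticket number and history.
History is a string with the following structure. I need to create a third column containing the name of the author who changed the status from New to Open. Is that possible?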
[
  {
    "id": "1",
    "author": {
      "name": "user1",
      "emailAddress": "user1@test.com",
      "displayName": "user1"
    },
    "created": "2021-06-09T12:54:22.915+0000",
    "items": [
      {
        "field": "name",
        "from": "1",
        "fromString": null,
        "to": "2",
        "toString": "test"
      }
    ]
  },
  {
    "id": "2",
    "author": {
      "name": "user2",
      "emailAddress": "user2@test.com",
      "displayName": "user2"
    },
    "created": "2021-06-11T09:33:18.692+0000",
    "items": [
      {
        "field": "status",
        "from": 3,
        "fromString": "New",
        "to": "7",
        "toString": "Open"
      }
    ]
  }
]
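If your DataFrame is named df, the history column (the second column) is named history, and the entries in the history column are actually JSON strings with a structure like the one you provided, you can do the following: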
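import json

def extract_author(json_string):
    # Parse the JSON string into a list of change records.
    records = json.loads(json_string)
    for record in records:
        # Only the first entry of each record's "items" list is inspected.
        items = record['items'][0]
        if (items['field'] == 'status'
                and items['fromString'] == 'New'
                and items['toString'] == 'Open'):
            return record['author']['name']
    # No record changed the status from New to Open.
    return None

# Apply the function to every row of the history column.
df['author'] = df['history'].map(extract_author)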
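To try the approach above end to end, here is a minimal sketch on a small made-up DataFrame. The column name `ticket`, the sample values, and the helper name `extract_author_safe` are assumptions for illustration and do not come from the question. This variant also checks every entry in `items` rather than only the first, and returns None for missing or unparsable history, which may or may not matter for your data.

```python
import json

import pandas as pd


def extract_author_safe(json_string):
    """Return the author of the first New -> Open status change, else None.

    Hypothetical helper: a more defensive variant of extract_author.
    """
    # Missing values (None/NaN) and other non-string cells cannot be parsed.
    if not isinstance(json_string, str):
        return None
    try:
        records = json.loads(json_string)
    except json.JSONDecodeError:
        return None
    for record in records:
        # A single change record may bundle several field changes.
        for item in record.get('items', []):
            if (item.get('field') == 'status'
                    and item.get('fromString') == 'New'
                    and item.get('toString') == 'Open'):
                return record.get('author', {}).get('name')
    return None


# Made-up sample data; only the JSON layout follows the question.
history = json.dumps([
    {"id": "1", "author": {"name": "user1"},
     "items": [{"field": "name", "fromString": None, "toString": "test"}]},
    {"id": "2", "author": {"name": "user2"},
     "items": [{"field": "status", "fromString": "New", "toString": "Open"}]},
])
df = pd.DataFrame({"ticket": [101, 102, 103],
                   "history": [history, "[]", None]})
df["author"] = df["history"].map(extract_author_safe)
print(df[["ticket", "author"]])
```

Rows whose history contains no New to Open change, or no history at all, end up with a missing value in the new column, so they can be filtered afterwards with `df['author'].notna()`.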