How to convert python JSON list to dataframe columns without looping

Question

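I am using python and trying to figure out how to do the following without using a loop.

I have a dataframe with several columns, one of which contains a list of JSON objects. What I want to do is convert the JSON string column into columns of their own in the dataframe. For example, I have the following dataframe: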

name	age	group
John	35	[{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]
Ann	20	[{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]
Emma	25	[{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]

I want to get the marks for testid = 001 and testid = 002, as follows.

name	age	test_id1	test_id2
John	35	67	70
Ann	20	75	80
Emma	25	90	99

Here is my dataset:

[
   {
      "name":"John",
      "age":35,
      "group":[
         {
            "testid":"001",
            "marks":67
         },
         {
            "testid":"002",
            "marks":70
         }
      ]
   },
   {
      "name":"Ann",
      "age":20,
      "group":[
         {
            "testid":"001",
            "marks":75
         },
         {
            "testid":"002",
            "marks":80
         },
         {
            "testid":"003",
            "marks":87
         }
      ]
   },
   {
      "name":"Emma",
      "age":25,
      "group":[
         {
            "testid":"001",
            "marks":90
         },
         {
            "testid":"002",
            "marks":99
         }
      ]
   }
]

Any ideas would be much appreciated. Thanks.

Answer 1

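See the inline comments. apply() does the iteration for you; you only need to write the function.

import io
import json

import pandas as pd

data='''name|age|group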
John|35|[{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]
Ann|20|[{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]
Emma|25|[{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]'''
df = pd.read_csv(io.StringIO(data), sep='|', engine='python')

# create function for apply()
def expand_json(xname, x):
    for i, j in enumerate(json.loads(x), 1):
        # print(i, j)
        col = 'test_id'+str(i)
        # print(col)
        # print(j['marks'])
        df.loc[df.name==xname, col] = j['marks']
        
# dftemp is a throwaway so nothing prints to the screen. The function writes to the main df as a side effect.

dftemp = df.apply(lambda x: expand_json(x['name'], x['group']), axis=1)
print(df)

   name  age                                                                                             group  test_id1  test_id2  test_id3
0  John   35                                  [{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]    67.000    70.000       NaN
1   Ann   20  [{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]    75.000    80.000    87.000
2  Emma   25                                  [{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]    90.000    99.000       NaN
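
A variant of the same idea, sketched under the assumption that the group column holds JSON strings as in the pipe-separated sample above: the function returns a dict per row instead of writing into the outer df, and the dicts are assembled into new columns in one step. The helper name marks_by_position is made up for illustration.

```python
import io
import json

import pandas as pd

data = '''name|age|group
John|35|[{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]
Ann|20|[{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80}, {"testid": "003", "marks": 87}]
Emma|25|[{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]'''
df = pd.read_csv(io.StringIO(data), sep='|', engine='python')

def marks_by_position(group_json):
    # Hypothetical helper: parse one JSON string and label each mark
    # by its position in the list (test_id1, test_id2, ...)
    items = json.loads(group_json)
    return {f'test_id{i}': item['marks'] for i, item in enumerate(items, 1)}

# apply() yields one dict per row; the DataFrame constructor aligns the keys
# and fills NaN where a row has fewer tests
marks = pd.DataFrame(df['group'].apply(marks_by_position).tolist(), index=df.index)

# Drop the raw JSON column and attach the new columns by index
result = df.drop(columns='group').join(marks)
print(result)
```

Because the function has no side effects, it can be tested on a single string, and the original df is left unchanged. Like the answer above, it labels columns by position in the list, not by the testid value.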

Answer 2

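A list comprehension is handy for extracting the data (this assumes each entry in df.group is already a list of dicts, not a JSON string). As a side note, if you can, do the extraction before putting data like this into a dataframe, since that is more efficient:

outcome = [[item['marks']
            for item in entry
            if item['testid'] in ('001', '002')]
           for entry in df.group]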

print(outcome)
[[67, 70], [75, 80], [90, 99]]

Unzip the data and assign it to new column names in the dataframe:

test_id1, test_id2 = zip(*outcome)

df.filter(['name', 'age']).assign(test_id1 = test_id1, test_id2 = test_id2)

   name  age  test_id1  test_id2
0  John   35        67        70
1   Ann   20        75        80
2  Emma   25        90        99
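
If the raw dataset from the question is available as a Python list, the flattening can also be done with pd.json_normalize and a pivot, with no explicit loop or comprehension. This is a minimal sketch, not the method used above; it assumes that names are unique (they are used as the join key) and that each testid appears at most once per person.

```python
import pandas as pd

dataset = [
    {"name": "John", "age": 35,
     "group": [{"testid": "001", "marks": 67}, {"testid": "002", "marks": 70}]},
    {"name": "Ann", "age": 20,
     "group": [{"testid": "001", "marks": 75}, {"testid": "002", "marks": 80},
               {"testid": "003", "marks": 87}]},
    {"name": "Emma", "age": 25,
     "group": [{"testid": "001", "marks": 90}, {"testid": "002", "marks": 99}]},
]

# Top-level columns only, in the original row order
people = pd.DataFrame(dataset)[['name', 'age']]

# One row per (person, test): flatten "group" and carry the name along
long_df = pd.json_normalize(dataset, record_path='group', meta=['name'])

# Keep only the tests of interest, then go from long to wide on testid
wanted = long_df[long_df['testid'].isin(['001', '002'])]
marks = (
    wanted.pivot(index='name', columns='testid', values='marks')
    .rename(columns={'001': 'test_id1', '002': 'test_id2'})
)

# Join back on name (assumed unique) so the original row order is preserved
result = people.join(marks, on='name')
print(result)
```

Unlike the positional approaches, this keys the new columns on the testid value itself, so it still lines up if the tests appear in a different order for some people.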
