Create a new column in pandas whose values come from specific positions in a list

I have a DataFrame similar to the following:

>>>index    val
0    5      1231
1    3      741
2    0      132
3    8      912
....

In addition, I have the following list:

lst=['day',605.12,607.34,609.11,611.3,613.45,617.4,618.9,621.2...]

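I want to create a new column in my DataFrame whose values are taken from the list at the positions given in the "index" column, so the result should look like this: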

>>>index    val     value_from_list
0    5      1231       613.45
1    3      741        609.11
2    0      132        'day'
3    8      912        621.2
....

I tried this:

df['value_from_list']=lst[df['index']]

but it is not correct and raises this error:

TypeError: list indices must be integers or slices, not Series

How can I get the new column values from the list based on the "index" column?

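Try pd.Series() together with map(). The list becomes a Series whose labels are the list positions, and map() looks up each "index" value among those labels: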

df['val_from_list']=df['index'].map(pd.Series(lst))
# You can also use the replace() method in place of map()
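
For reference, here is a minimal runnable sketch of the map() approach on the sample data from the question. The list is cut off at the "..." in the question, so only the nine values shown there are used; the `lookup` name and the list-comprehension variant are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Sample data from the question (the list is truncated where the question elides it).
df = pd.DataFrame({"index": [5, 3, 0, 8], "val": [1231, 741, 132, 912]})
lst = ["day", 605.12, 607.34, 609.11, 611.3, 613.45, 617.4, 618.9, 621.2]

# pd.Series(lst) gets a default integer index 0..8, so map() can treat it
# as a position -> value lookup table.
lookup = pd.Series(lst)
df["value_from_list"] = df["index"].map(lookup)

# Plain-Python alternative: index the list once per row.
# Unlike map(), this raises IndexError for an out-of-range position.
df["via_listcomp"] = [lst[i] for i in df["index"]]

print(df)
```

With map(), a position that is not present in the lookup Series should come back as NaN instead of raising, which may or may not be what you want.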

Or use pd.Series() together with merge():

df=df.merge(pd.Series(lst).reset_index(name='val_from_list'),on='index')
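
One thing to watch with the merge() variant: merge() defaults to an inner join, so rows whose "index" value has no match in the list are dropped. The sketch below illustrates this with an out-of-range position (20, a made-up value that is not in the question) and shows `how="left"` as one possible way to keep such rows.

```python
import pandas as pd

lst = ["day", 605.12, 607.34, 609.11, 611.3, 613.45, 617.4, 618.9, 621.2]
# Hypothetical data: 20 is deliberately outside the range of the list.
df = pd.DataFrame({"index": [5, 3, 0, 20], "val": [1231, 741, 132, 912]})

# reset_index() turns the list positions into a column named "index".
lookup = pd.Series(lst).reset_index(name="val_from_list")

inner = df.merge(lookup, on="index")             # default how="inner"
left = df.merge(lookup, on="index", how="left")  # keeps every row of df

print(len(inner), len(left))
print(left)
```

With the left join, the unmatched row is kept and its `val_from_list` is NaN.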

I think using the apply function is the simplest solution in this case:

import pandas as pd
d = [{"a":1,"b":2},{"a":3,"b":4}]
df = pd.DataFrame(d)

l = [10,20]

# x.name is the row label (0, 1, ...), so each row gets the list element at its own label
df['new'] = df.apply(lambda x: l[x.name], axis=1)
Out[1]: 
   a  b  new
0  1  2   10
1  3  4   20
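
Note that the example above looks up the list by row label (`x.name`), not by the value stored in a column. For the data in the question, where the position lives in the "index" column, an adaptation might look like the sketch below. The `by_label` and `by_column` names are illustrative, and the `int()` cast is a precaution I added in case the row is upcast to float.

```python
import pandas as pd

df = pd.DataFrame({"index": [5, 3, 0, 8], "val": [1231, 741, 132, 912]})
lst = ["day", 605.12, 607.34, 609.11, 611.3, 613.45, 617.4, 618.9, 621.2]

# Lookup by row label: uses 0, 1, 2, 3 regardless of the "index" column.
by_label = df.apply(lambda row: lst[row.name], axis=1)

# Lookup by the value in the "index" column, which is what the question asks for.
by_column = df.apply(lambda row: lst[int(row["index"])], axis=1)

print(by_label.tolist())
print(by_column.tolist())
```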

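To answer my own question about the speed difference between the suggested solutions: the map version does indeed appear to be the fastest. Test setup: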

import pandas as pd
from random import random
from time import time
size = 10000000

# Note: random() returns floats in [0, 1), so 'index' does not hold integer list positions here
test_df = pd.DataFrame([{'index': random(), 'l': random()} for i in range(size)])
test_list = [random() for i in range(size)]


def map_version(df, l):
    df['val_from_list']=df['index'].map(pd.Series(l))

def merge_version(df, l):
    df=df.merge(pd.Series(l).reset_index(name='val_from_list'),on='index')

def apply_version(df, l):
    df['new'] = df.apply(lambda x: l[x.name], axis=1)
    
    

start_time = time()
map_version(test_df,test_list)
print("Map Version: ",time()-start_time)
start_time = time()
merge_version(test_df,test_list)
print("Merge Version: ",time()-start_time)
start_time = time()
apply_version(test_df,test_list)
print("Apply Version: ",time()-start_time)

Results (reported for n=10⁸, although the script above sets size = 10000000, i.e. 10⁷):

Map Version:  17.509589910507202
Merge Version:  23.45218276977539
Apply Version:  37.030272483825684
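
The benchmark above fills 'index' with random floats, and its apply version looks up by row label, so the three functions are not doing quite the same work. Below is a rough sketch of a more like-for-like comparison, assuming integer positions drawn with `random.randrange` and timing with `timeit`. The size of 20,000 and the choice to return results instead of assigning columns are my own assumptions, chosen to keep the run short and let the outputs be compared. No timings are claimed here; they will depend on your machine and pandas version.

```python
import random
import timeit

import pandas as pd

size = 20_000  # assumed small size so the sketch runs quickly
random.seed(0)

test_list = [random.random() for _ in range(size)]
test_df = pd.DataFrame({
    "index": [random.randrange(size) for _ in range(size)],  # valid list positions
    "l": [random.random() for _ in range(size)],
})


def map_version(df, l):
    return df["index"].map(pd.Series(l))


def merge_version(df, l):
    lookup = pd.Series(l).reset_index(name="val_from_list")
    return df.merge(lookup, on="index")["val_from_list"]


def apply_version(df, l):
    # Rows mix an int column and a float column, so the row is upcast to float;
    # int() converts the position back before indexing the list.
    return df.apply(lambda row: l[int(row["index"])], axis=1)


for name, fn in [("map", map_version), ("merge", merge_version), ("apply", apply_version)]:
    seconds = timeit.timeit(lambda: fn(test_df, test_list), number=1)
    print(f"{name} version: {seconds:.4f} s")
```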