Create a new column in pandas whose values come from specific positions in a list
I have a dataframe similar to the following:
   index   val
0      5  1231
1      3   741
2      0   132
3      8   912
....
In addition, I have the following list:
lst=['day',605.12,607.34,609.11,611.3,613.45,617.4,618.9,621.2...]
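I want to create a new column in my dataframe whose values are taken from the list at the positions given in the "index" column, so the result should look like this: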
   index   val value_from_list
0      5  1231          613.45
1      3   741          609.11
2      0   132           'day'
3      8   912           621.2
....
I tried this:
df['value_from_list']=lst[df['index']]
but that is not correct and raises this error:
TypeError: list indices must be integers or slices, not Series
How can I get the values of the new column from the list based on the "index" column?
Try pd.Series() together with map():
df['val_from_list']=df['index'].map(pd.Series(lst))
# You can also use the replace() method in place of map()
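To make the map() approach concrete, here is a minimal sketch using only the rows and list items shown in the question (the elided "..." entries are left out). The name out_of_range and the position 99 are made up for illustration. As far as I can tell, a position that is not in the list comes back as NaN rather than raising an error.

```python
import pandas as pd

# Sample data copied from the question (only the visible rows and list items).
df = pd.DataFrame({"index": [5, 3, 0, 8], "val": [1231, 741, 132, 912]})
lst = ["day", 605.12, 607.34, 609.11, 611.3, 613.45, 617.4, 618.9, 621.2]

# pd.Series(lst) is labelled 0..len(lst)-1, so map() looks up each position by label.
df["val_from_list"] = df["index"].map(pd.Series(lst))
print(df)

# Hypothetical check: position 99 does not exist in lst, so map() returns NaN for it.
out_of_range = pd.Series([0, 99]).map(pd.Series(lst))
print(out_of_range)
```

Because the list mixes a string with floats, the new column ends up with object dtype.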
Or use pd.Series() together with merge():
df=df.merge(pd.Series(lst).reset_index(name='val_from_list'),on='index')
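One thing to watch with the merge() version: merge() defaults to an inner join, so rows whose "index" value has no match in the list are dropped. The sketch below assumes one extra, made-up row (index 99, val 7) to show the difference; passing how="left" is my suggestion for keeping such rows, not part of the answer above.

```python
import pandas as pd

# Rows from the question plus one hypothetical row whose position (99) is not in the list.
df = pd.DataFrame({"index": [5, 3, 0, 8, 99], "val": [1231, 741, 132, 912, 7]})
lst = ["day", 605.12, 607.34, 609.11, 611.3, 613.45, 617.4, 618.9, 621.2]

# reset_index(name=...) turns the Series into a frame with columns 'index' and 'val_from_list'.
lookup = pd.Series(lst).reset_index(name="val_from_list")

# Default how="inner": the row with index 99 has no match and is dropped.
inner = df.merge(lookup, on="index")

# how="left": every row of df is kept, and the unmatched row gets NaN.
left = df.merge(lookup, on="index", how="left")
print(inner)
print(left)
```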
I think using the apply function is the simplest solution in this case:
import pandas as pd
d = [{"a":1,"b":2},{"a":3,"b":4}]
df = pd.DataFrame(d)
l = [10,20]
df['new'] = df.apply(lambda x: l[x.name], axis=1)
Out[1]:
   a  b  new
0  1  2   10
1  3  4   20
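Note that x.name in the snippet above is the row label, so it pairs row 0 with l[0], row 1 with l[1], and so on. For the data in the question, where the position is stored in the "index" column, a sketch along these lines may be closer to what was asked. The name via_apply is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question (only the visible rows and list items).
df = pd.DataFrame({"index": [5, 3, 0, 8], "val": [1231, 741, 132, 912]})
lst = ["day", 605.12, 607.34, 609.11, 611.3, 613.45, 617.4, 618.9, 621.2]

# Plain list comprehension: index the list with each value of the 'index' column.
df["value_from_list"] = [lst[i] for i in df["index"]]

# The same lookup written with Series.apply on the 'index' column.
via_apply = df["index"].apply(lambda i: lst[i])
print(df)
```

Unlike map(), both variants index the list directly, so a position outside the list raises IndexError instead of producing NaN.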
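Answering my own question about the speed difference between the suggested solutions: the map version does indeed seem to be the fastest. Test setup: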
import pandas as pd
from random import random, randrange
from time import time

size = 10000000
# 'index' holds integer positions so that every value is a valid position in test_list
test_df = pd.DataFrame([{'index': randrange(size), 'l': random()} for i in range(size)])
test_list = [random() for i in range(size)]

def map_version(df, l):
    df['val_from_list'] = df['index'].map(pd.Series(l))

def merge_version(df, l):
    # the merged frame is only bound locally; the caller's frame is unchanged
    df = df.merge(pd.Series(l).reset_index(name='val_from_list'), on='index')

def apply_version(df, l):
    # x.name is the row label, so this reads l at the row's own position
    df['new'] = df.apply(lambda x: l[x.name], axis=1)

start_time = time()
map_version(test_df, test_list)
print("Map Version: ", time() - start_time)

start_time = time()
merge_version(test_df, test_list)
print("Merge Version: ", time() - start_time)

start_time = time()
apply_version(test_df, test_list)
print("Apply Version: ", time() - start_time)
Results for n=10⁸:
Map Version: 17.509589910507202
Merge Version: 23.45218276977539
Apply Version: 37.030272483825684