pandas initialize dataframe column cells as empty lists
I need to initialize the cells in a DataFrame column as lists:
df['some_col'] = [[] for _ in range(no_of_rows)]
Is there a better way to do this in terms of time efficiency?
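The reason for using a comprehension is that `[]` is evaluated once per row, so every cell gets its own list object. A minimal sketch contrasting it with `[[]] * n`, which repeats a reference to one shared list (the frame `df` and the appended value `"x"` are illustrative only):

```python
import pandas as pd

# Plain Python: [[]] * n repeats ONE list, so a mutation shows up everywhere.
shared = [[]] * 3
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# A comprehension evaluates [] on every iteration: independent lists.
independent = [[] for _ in range(3)]
independent[0].append("x")
print(independent)  # [['x'], [], []]

# The same holds once the lists are stored in an object-dtype column.
df = pd.DataFrame({"a": [1, 2, 3]})
df["some_col"] = [[] for _ in range(len(df))]
df.at[0, "some_col"].append("x")  # mutates only the list in row 0
print(df["some_col"].tolist())  # [['x'], [], []]
```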
Try apply:
df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)
Sample:
df1 = pd.DataFrame({'a': pd.Series([1,2])})
print (df1)
   a
0  1
1  2
df1['some_col'] = ''
df1['some_col'] = df1['some_col'].apply(list)
print (df1)
   a some_col
0  1       []
1  2       []
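This works because `list("")` iterates over an empty string and returns a new empty list, and `Series.apply` calls it once per element. A small sketch checking that the result is an object column holding one distinct list per row (the two-row `df1` mirrors the sample above):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})

# list("") iterates over an empty string, so each call returns a fresh []
print(list(""))  # []

df1["some_col"] = ""
# Series.apply calls list() once per element, so every row gets its own list
df1["some_col"] = df1["some_col"].apply(list)

print(df1["some_col"].tolist())  # [[], []]
print(df1["some_col"].dtype)     # object
# Distinct objects: one id per row
print(len({id(cell) for cell in df1["some_col"]}))  # 2
```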
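Since you are looking for time efficiency, below are some benchmarks. A list comprehension is already fairly fast at creating the list of empty list objects, but you can squeeze out a marginal improvement with itertools.repeat. For inserting the column, apply is about 3x slower because it loops over every element: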
import numpy as np
import pandas as pd
from itertools import repeat
df = pd.DataFrame({"A": np.arange(100000)})
%timeit df['some_col'] = [[] for _ in range(len(df))]
100 loops, best of 3: 8.75 ms per loop
%timeit df['some_col'] = [[] for i in repeat(None, len(df))]
100 loops, best of 3: 8.02 ms per loop
%%timeit
df['some_col'] = ''
df['some_col'] = df['some_col'].apply(list)
10 loops, best of 3: 25 ms per loop
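The `%timeit` lines above are IPython magics. To rerun the same comparison as a plain script, a sketch using the standard `timeit` module could look like the following; the helper names `with_range`, `with_repeat` and `with_apply` are hypothetical, and the absolute timings will depend on your machine and pandas version, so none are shown here:

```python
import timeit
from itertools import repeat

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(100_000)})


def with_range():
    # List comprehension driven by range()
    df["some_col"] = [[] for _ in range(len(df))]


def with_repeat():
    # itertools.repeat(None, n) yields None n times without creating ints
    df["some_col"] = [[] for _ in repeat(None, len(df))]


def with_apply():
    # Fill with empty strings, then call list() on each one
    df["some_col"] = ""
    df["some_col"] = df["some_col"].apply(list)


results = {}
for name, fn in [("range", with_range), ("repeat", with_repeat), ("apply", with_apply)]:
    # Best of 3 repeats, 10 calls each, converted to milliseconds per call
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    results[name] = best * 1000
    print(f"{name:>6}: {results[name]:.2f} ms per call")
```

Taking the minimum of several repeats, as `%timeit` does, reduces noise from other processes.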