Applying linspace for every row in a pandas dataframe
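Say I have a dataframe df like this:
                       date number_range  id
0  [2010-01-01, 2010-03-01]      [5, 10]   1
1  [2010-02-01, 2010-06-01]       [1, 3]   1
2  [2010-07-01, 2010-11-01]     [12, 50]   1
I want to apply numpy.linspace to it by finding the date difference for each row and then calling linspace row by row. For example, the date difference for row 0 is 2, so apply linspace(5, 10, 2); for row 1 the difference is 4, so apply linspace(1, 3, 4).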
final result
-------------
                       date number_range  id                   linspace
0  [2010-01-01, 2010-03-01]      [5, 10]   1                    [5, 10]
1  [2010-02-01, 2010-06-01]       [1, 3]   1   [1, 1.66667, 2.3333, 3]
2  [2010-07-01, 2010-12-01]     [12, 50]   1  [12, 21.5, 31, 40.5, 50]
I tried
df.apply(lambda row: np.linspace(row['start_value'], row['end_value'], row['diff']))
but I keep getting a TypeError saying 'Series' object cannot be interpreted as an integer. I also tried diff.astype(int) and got the same error. I'm not sure where to go from here.
Make sure you understand what row is:
In [133]: def foo(row):
     ...:     print(row)
     ...:
In [134]: df.apply(foo)
0 [2010-01-01, 2010-03-01]
1 [2010-02-01, 2010-06-01]
2 [2010-07-01, 2010-11-01]
Name: date, dtype: object
0 [5, 10]
1 [1, 3]
2 [12, 50]
Name: number_range, dtype: object
0 1
1 1
2 1
Name: id, dtype: int64
Out[134]:
date None
number_range None
id None
dtype: object
In [136]: def foo(row):
     ...:     print(row['start_value'], row['end_value'], row['diff'])
     ...:
In [137]: df.apply(foo)
Traceback (most recent call last):
...
KeyError: 'start_value'
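The KeyError comes from the default axis=0, where apply hands the function one column at a time, so the Series it receives is indexed by row labels rather than column names. A minimal sketch of the difference, using a small hypothetical frame that mirrors the question's layout:

```python
import pandas as pd

# Hypothetical frame mirroring the layout in the question
df = pd.DataFrame({
    "number_range": [[5, 10], [1, 3], [12, 50]],
    "id": [1, 1, 1],
})

# axis=0 (default): each call receives a column, indexed by 0, 1, 2,
# so a column name is not a valid key inside the function
has_key_by_column = df.apply(lambda s: "number_range" in s.index)

# axis=1: each call receives a row, indexed by the column names
has_key_by_row = df.apply(lambda s: "number_range" in s.index, axis=1)

print(has_key_by_column)
print(has_key_by_row)
```

Only the axis=1 call can look up row['number_range'] or any other column by name.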
With axis=1, as suggested in the other answer:
In [148]: def foo(row):
     ...:     print(type(row))
     ...:     print(row)
     ...:     print(row['date'])
     ...:     print(row['number_range'])
     ...:
In [149]: df.apply(foo, axis=1)
<class 'pandas.core.series.Series'>
date [2010-01-01, 2010-03-01]
number_range [5, 10]
id 1
Name: 0, dtype: object
['2010-01-01', '2010-03-01']
[5, 10]
<class 'pandas.core.series.Series'>
...
['2010-02-01', '2010-06-01']
[1, 3]
<class 'pandas.core.series.Series'>
....
['2010-07-01', '2010-11-01']
[12, 50]
Now we can pull the endpoints out of the number_range column:
In [150]: def foo(row):
     ...:     nr = row['number_range']
     ...:     return np.linspace(nr[0], nr[1], 3)
     ...:
In [151]: df.apply(foo, axis=1)
Out[151]:
0 [5.0, 7.5, 10.0]
1 [1.0, 2.0, 3.0]
2 [12.0, 31.0, 50.0]
dtype: object
I can generate the same numbers with a single linspace call:
In [159]: df['number_range'].to_numpy()
Out[159]: array([list([5, 10]), list([1, 3]), list([12, 50])], dtype=object)
In [160]: nr = np.stack(df['number_range'].to_numpy())
In [161]: nr
Out[161]:
array([[ 5, 10],
[ 1, 3],
[12, 50]])
In [162]: np.linspace(nr[:,0],nr[:,1],3).T
Out[162]:
array([[ 5. , 7.5, 10. ],
[ 1. , 2. , 3. ],
[12. , 31. , 50. ]])
I chose 3 for all rows; I didn't try to figure out where you got 2, 4 and 5.
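A single vectorized linspace call only works when every row shares the same count, because num must be one scalar. The sketch below shows both cases: the shared-count call using the axis keyword (available in NumPy 1.16 and later) so no transpose is needed, and a per-row loop for differing counts. The counts 2, 4 and 5 are assumed from the question's example.

```python
import numpy as np

# One [start, stop] pair per row, as in the stacked array above
nr = np.array([[5, 10], [1, 3], [12, 50]])

# Shared count: array endpoints broadcast; axis=-1 places the samples
# on the last axis, giving one row of samples per input row
same_count = np.linspace(nr[:, 0], nr[:, 1], 3, axis=-1)

# Per-row counts (assumed to be 2, 4 and 5, as in the question):
# linspace accepts a single scalar num, so call it once per row
counts = [2, 4, 5]
ragged = [np.linspace(lo, hi, n) for (lo, hi), n in zip(nr, counts)]

print(same_count)
for row in ragged:
    print(row)
```

The per-row results have different lengths, so they stay in a plain Python list (or an object column) rather than a 2-D array.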
You can use apply() with axis=1 as follows (if diff = [2,4,5]):
df['linspace'] = df.apply(lambda x: np.round(
    np.linspace(x['number_range'][0], x['number_range'][1], x['diff']), 3),
    axis=1)
print(df)
Alternatively, you can first create start_value and end_value as in your question, and then create linspace as follows:
df[['start_value','end_value']] = pd.DataFrame(df['number_range'].to_list())
df['linspace'] = df.apply(lambda x: np.round(
    np.linspace(x['start_value'], x['end_value'], x['diff']), 3), axis=1)
print(df)
Output:
date number_range diff linspace
0 [2010-01-01, 2010-03-01] [5, 10] 2 [5.0, 10.0]
1 [2010-02-01, 2010-06-01] [1, 3] 4 [1.0, 1.667, 2.333, 3.0]
2 [2010-07-01, 2010-11-01] [12, 50] 5 [12.0, 21.5, 31.0, 40.5, 50.0]
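The snippets above assume a diff column already exists; the question never states how 2, 4 and 5 are derived from the dates. One possible reading, and it is only an assumption, is the number of whole calendar months between the two dates, using the 2010-12-01 end date that appears in the expected result for row 2. The helper months_between below is hypothetical and not part of the original thread.

```python
import numpy as np
import pandas as pd

# Dates follow the question's expected-result table (row 2 ends 2010-12-01)
df = pd.DataFrame({
    "date": [["2010-01-01", "2010-03-01"],
             ["2010-02-01", "2010-06-01"],
             ["2010-07-01", "2010-12-01"]],
    "number_range": [[5, 10], [1, 3], [12, 50]],
})

def months_between(pair):
    """Assumed definition of 'date difference': calendar months apart."""
    start, end = pd.to_datetime(pair)
    return (end.year - start.year) * 12 + (end.month - start.month)

df["diff"] = df["date"].apply(months_between)

# Row-wise linspace with a per-row count, rounded as in the answer above
df["linspace"] = df.apply(
    lambda r: np.round(
        np.linspace(r["number_range"][0], r["number_range"][1], int(r["diff"])), 3),
    axis=1,
)
print(df[["number_range", "diff", "linspace"]])
```

If the difference is meant to be in days, weeks or some other unit, only months_between needs to change.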