numpy vectorize np.prod: Cannot construct a ufunc with more than 32 operands
I know there is a similar question here: Python numpy.vectorize: ValueError: Cannot construct a ufunc with more than 32 operands
But my situation is different.
I have a df with 32 columns; you can get it by running the code below:
import numpy as np
import pandas as pd
from io import StringIO
dfs = """
M0 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 age
1 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 1 2 3 4 3.2
2 7 5 4 5 8 3 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 1 2 3 4 4.5
3 4 8 9 3 5 2 1 2 3 4 5 6 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 1 2 3 4 6.7
"""
df = pd.read_csv(StringIO(dfs.strip()), sep=r'\s+')
df
Following my business logic, I built a vectorized function. It works fine as long as the total number of arguments is less than 32:
M=["M0","M1","M2","M3","M4","M5","M6","M7","M8","M9","M10","M11","M12","M13","M14","M15","M16","M17","M18","M19",
"M20","M21","M22","M23","M24","M25","M26","M27","M28","M29"]
def func2(df, M):
    return [df[i].values for i in M]
def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol
vfunc = np.frompyfunc(func, len(M)+1, 1)
df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))
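To make this easier to follow: func2 is only there to keep the code concise. It generates all the arguments for func. Without func2, the code would look like this:
def func(age,M0,M1,M2,...,M29):
    Ms=(M0,M1,M2,...,M29)
    newcol=np.prod(Ms[0:age])
    return newcol
vfunc = np.frompyfunc(func, 31, 1)
df['newcol']=vfunc(df['age'].values.astype(int), df['M0'].values,...,df['M29'].values)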
The real problem appears as soon as the number of arguments reaches 32 or more, as shown below:
M=["M0","M1","M2","M3","M4","M5","M6","M7","M8","M9","M10","M11","M12","M13","M14","M15","M16","M17","M18","M19",
"M20","M21","M22","M23","M24","M25","M26","M27","M28","M29","M30"] # M30 is the only difference from the above function
def func2(df, M):
    return [df[i].values for i in M]
def func(age,*Ms):
    newcol=np.prod(Ms[0:age])
    return newcol
vfunc = np.frompyfunc(func, len(M)+1, 1)
df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))
I get this error:
ValueError Traceback (most recent call last)
<ipython-input-66-9a042ad44f9b> in <module>()
76 return newcol
77
---> 78 vfunc = np.frompyfunc(func, len(M)+1, 1)
79
80 df['newcol']=vfunc(df['age'].values.astype(int), *func2(df,M))
ValueError: Cannot construct a ufunc with more than 32 operands (requested number were: inputs = 32 and outputs = 1)
In my real business logic I have more than 100 columns that need to be combined with np.prod, so this really has me stuck. Can anyone help?
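For context, the error message counts inputs and outputs together: 32 inputs plus 1 output is 33 operands, one more than this NumPy build allows for a ufunc. Newer NumPy releases raise the ceiling, as far as I know, but it is still a fixed number, so 100+ columns passed as separate arguments would not fit either way. A minimal sketch of one way around it, assuming the M columns can be stacked into a single 2-D array so that no ufunc is constructed at all (the name `prefix_products` and the small sample data are hypothetical, not from the question):

```python
import numpy as np

def prefix_products(values, ages):
    """For each row, multiply the first int(age) entries of that row."""
    rows = np.asarray(values)
    # Same truncation as df['age'].values.astype(int) in the question.
    counts = np.asarray(ages).astype(int)
    # A plain Python loop over rows: one 2-D array replaces the 30+
    # separate column arguments, so the operand limit never applies.
    return np.array([np.prod(row[:n]) for row, n in zip(rows, counts)])

# Hypothetical data: 3 rows, 5 columns.
values = np.array([[1, 2, 3, 4, 5],
                   [7, 5, 4, 5, 8],
                   [4, 8, 9, 3, 5]])
ages = np.array([3.2, 4.5, 2.0])

result = prefix_products(values, ages)
print(result)
```

With the DataFrame from the question, the call would look something like `prefix_products(df[M].to_numpy(), df['age'].to_numpy())`. This still loops in Python, much like `np.frompyfunc` does, so it fixes the limit rather than the speed.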
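Here is one way to get the result. Select all the M columns with filter, use where to replace with NaN every value whose column position is beyond the value in the age column, then take the prod along the columns.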
df['newcol'] = (
    # keep only the Mx columns
    df.filter(like='M')
    # keep a value only when the (1-based) position of its column
    # is less than the age
    .where(lambda x: (np.arange(x.shape[1])+1)<df['age'].to_numpy()[:, None])
    # multiply all the non-NaN values per row
    .prod(axis=1)
)
print(df)
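The same masking idea can be sketched in plain NumPy. One detail worth checking against your data: the comparison above uses a 1-based position strictly less than age, so an integer-valued age such as 6.0 keeps 5 columns, whereas `Ms[0:age]` in the question keeps 6. The variant below compares the 0-based position with the truncated age so that it matches the slice. The function name `masked_prefix_prod` and the sample values are assumptions for illustration only.

```python
import numpy as np

def masked_prefix_prod(values, ages):
    """Row-wise product of the first int(age) columns, without a Python loop."""
    values = np.asarray(values, dtype=float)
    # Truncate the age exactly as astype(int) does in the question.
    counts = np.asarray(ages).astype(int)
    # keep[i, j] is True when column j belongs to the first counts[i] columns.
    keep = np.arange(values.shape[1]) < counts[:, None]
    # Masked-out entries become 1.0, the neutral element of multiplication.
    return np.where(keep, values, 1.0).prod(axis=1)

# Hypothetical data: the first six M columns of the question's three rows.
values = np.array([[1, 2, 3, 4, 5, 6],
                   [7, 5, 4, 5, 8, 3],
                   [4, 8, 9, 3, 5, 2]])
ages = np.array([3.2, 4.5, 6.0])

result = masked_prefix_prod(values, ages)
print(result)
```

Because the mask is built by broadcasting, the number of columns is not limited by the ufunc operand count, so the 100+ column case should go through the same way.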