Python ValueError: Bin edges must be unique
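Working in Python with pandas, I am trying to assign control and treatment groups to different customer segments.
I have a large dataset. Rather than give sample data, I will show a pivot, since it summarizes the most important data.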
pd.pivot_table(df,index=['Test Group'],values=["Customer_ID"],aggfunc=lambda x: len(x.unique()))
I get these counts:
Test Group        Customer_ID
Innovators        4634
Early Adopters    2622
Early Majority    8653
Late Majority     7645
Laggards          7645
Lost              4354
Prospective       653
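For reference, the same distinct-customer counts can usually be obtained with `groupby(...).nunique()`. This is only a sketch on made-up toy data (the rows below are hypothetical, not taken from the real dataset):

```python
import pandas as pd

# Toy data (hypothetical): customer 1 appears twice in "Lost".
toy = pd.DataFrame({
    "Test Group": ["Lost", "Lost", "Lost", "Prospective"],
    "Customer_ID": [1, 1, 2, 3],
})

# The pivot from the question: distinct customers per group.
pivot_counts = pd.pivot_table(
    toy, index=["Test Group"], values=["Customer_ID"],
    aggfunc=lambda x: len(x.unique()),
)

# Equivalent count via groupby.
groupby_counts = toy.groupby("Test Group")["Customer_ID"].nunique()
```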
I run the code below:
percentages = {'Innovators': [0.0, 1.0],
               'Early Adopters': [0.2, 0.8],
               'Early Majority': [0.1, 0.9],
               'Late Majority': [0.0, 1.0],
               'Laggards': [0.2, 0.8],
               'Lost': [0.1, 0.9],
               'Prospective': [0.1, 0.9]}
def assigner(gp):
    group = gp['Test Group'].iloc[0]
    cut = pd.qcut(
        np.arange(gp.shape[0]),
        q=np.cumsum([0] + percentages[group]),
        labels=range(len(percentages[group]))
    ).get_values()
    return pd.Series(cut[np.random.permutation(gp.shape[0])], index=gp.index, name='flag')
df['flag'] = df.groupby('Test Group', group_keys=False).apply(assigner)
ValueError: Bin edges must be unique: array([ 0, 0, 2621], dtype=int64).
You can drop duplicate edges by setting the 'duplicates' kwarg
... and I keep getting this error.
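A likely cause, sketched below: a group whose first share is 0.0 produces the quantile level 0 twice after `np.cumsum`, so the first two bin edges coincide. The array length 2622 is chosen only to match the upper edge 2621 in the message above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

shares = [0.0, 1.0]              # a split with an empty first bucket
q = np.cumsum([0] + shares)      # quantile levels: 0, 0, 1
values = np.arange(2622)         # row positions 0..2621

# Level 0 appears twice, so the first two edges are both 0.
edges = np.quantile(values, q)

try:
    pd.qcut(values, q=q, labels=range(len(shares)))
    error_message = None
except ValueError as exc:
    error_message = str(exc)     # the "Bin edges must be unique" message
```

So the duplicates come from the quantile levels in `q`, not from repeated values in the data being cut.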
I found this answer, which may help: "How to qcut with non unique bin edges?"; but rank does not work on a NumPy array:
def assigner(gp):
    group = gp['Campaign Test Description'].iloc[0]
    cut = pd.qcut(
        np.arange(gp.shape[0]).rank(method='first'),
        q=np.cumsum([0] + percentages[group]),
        labels=range(len(percentages[group]))
    ).get_values()
    return pd.Series(cut[np.random.permutation(gp.shape[0])], index=gp.index, name='flag')
AttributeError: 'numpy.ndarray' object has no attribute 'rank'
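`rank` is a pandas method, so the array would need to be wrapped in a `pd.Series` first. A small sketch (toy sizes, illustrative names) suggests that this alone would not help here: the positions from `np.arange` are already unique, and the repeated edge comes from the quantile levels.

```python
import numpy as np
import pandas as pd

positions = np.arange(5)

# Wrapping the array in a Series makes .rank available.
ranks = pd.Series(positions).rank(method="first")

# The quantile levels 0, 0, 1 still contain a duplicate.
q = np.cumsum([0] + [0.0, 1.0])
try:
    pd.qcut(ranks, q=q, labels=range(2))
    still_fails = False
except ValueError:
    still_fails = True
```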
I tried dropping the duplicates:
def assigner(gp):
    group = gp['Campaign Test Description'].iloc[0]
    cut = pd.qcut(
        np.arange(gp.shape[0]),
        q=np.cumsum([0] + percentages[group]),
        labels=range(len(percentages[group])),
        duplicates='drop'
    ).get_values()
    return pd.Series(cut[np.random.permutation(gp.shape[0])], index=gp.index, name='flag')
ValueError: Bin labels must be one fewer than the number of bin edges
I still get an error.
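The second error appears to follow from the first: `duplicates='drop'` removes the repeated edge, leaving two edges (one bin), while two labels are still passed. A sketch on a toy array, with `labels=False` used to get integer codes instead:

```python
import numpy as np
import pandas as pd

shares = [0.0, 1.0]
q = np.cumsum([0] + shares)      # 0, 0, 1
values = np.arange(10)

# Two labels for the single bin that remains after dropping the duplicate edge.
try:
    pd.qcut(values, q=q, labels=range(len(shares)), duplicates="drop")
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

# labels=False returns integer bin codes and avoids the label-count check.
codes = pd.qcut(values, q=q, labels=False, duplicates="drop")
```

Note that every row then gets code 0, even though the 100% share was the second entry (label 1) in `percentages`, so dropping edges silently shifts the labels.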
You are doing a train/test split, which is common in machine learning. Here is one way to do it (double-check that I got your percentages right):
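df_pct = pd.DataFrame({
    'Test Group': ['Innovators', 'Early Adopters', 'Early Majority', 'Late Majority', 'Laggards', 'Lost', 'Prospective'],
    'test_cutoff': [1, 0.8, 0.9, 0.1, 0.8, 0.9, 0.9]})
# Attach each group's cutoff to its rows.
df = df.merge(df_pct, on='Test Group')
# Rows whose uniform draw is at or above the cutoff are flagged as test.
df['is_test'] = np.random.uniform(0, 1, len(df)) >= df['test_cutoff']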
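To see how this behaves, here is a sketch on toy data. The frame, the seeded generator, and the use of `Test Group` as the merge key are assumptions made for illustration, not part of the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable

# Toy stand-in for df (hypothetical): 1000 customers in each of two groups.
toy = pd.DataFrame({
    "Customer_ID": np.arange(2000),
    "Test Group": ["Innovators"] * 1000 + ["Early Adopters"] * 1000,
})
cutoffs = pd.DataFrame({
    "Test Group": ["Innovators", "Early Adopters"],
    "test_cutoff": [1.0, 0.8],
})

toy = toy.merge(cutoffs, on="Test Group", how="left")
# Draws lie in [0, 1), so a cutoff of 1.0 never flags a row.
toy["is_test"] = rng.uniform(0, 1, len(toy)) >= toy["test_cutoff"]

# Observed share of flagged rows per group.
test_share = toy.groupby("Test Group")["is_test"].mean()
```

Because each row is drawn independently, the observed shares are only close to the targets (about 20% for a 0.8 cutoff), not exact.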
Also, your 'Late Majority' percentages do not add up to 100.
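If exact group sizes matter, one alternative that avoids `qcut` altogether is to turn the shares into row counts, build the flags with `np.repeat`, and shuffle them. This is a sketch under assumptions: `assign_flags`, the toy frame, and the rounding rule are illustrative choices, not the original code.

```python
import numpy as np
import pandas as pd

percentages = {"Innovators": [0.0, 1.0], "Early Adopters": [0.2, 0.8]}
rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable

def assign_flags(gp):
    """Return a shuffled flag (0, 1, ...) per row, sized by the group's shares."""
    shares = percentages[gp["Test Group"].iloc[0]]
    n = len(gp)
    # Cumulative cut points in row counts; the last one is forced to n.
    bounds = np.round(np.cumsum(shares) * n).astype(int)
    bounds[-1] = n
    counts = np.diff(np.concatenate(([0], bounds)))
    # A share of 0.0 simply yields zero rows for that flag.
    flags = np.repeat(np.arange(len(shares)), counts)
    return pd.Series(rng.permutation(flags), index=gp.index, name="flag")

# Toy stand-in for df (hypothetical data).
toy = pd.DataFrame({
    "Customer_ID": np.arange(15),
    "Test Group": ["Innovators"] * 5 + ["Early Adopters"] * 10,
})
toy["flag"] = pd.concat([assign_flags(gp) for _, gp in toy.groupby("Test Group")])
```

The explicit loop over groups is used instead of `groupby(...).apply` so the result is always a single Series aligned on the original index.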