Swap values within classes
How can I swap values within a class?
As shown in the table:
[Table: Before / After]
I want to do this because the data is oversampled. It is highly repetitive, which causes machine learning tools to overfit.
OK, try this:
import numpy as np
import pandas as pd

# Set up example dataframe
df = pd.DataFrame({"Class" : [1,2,1,3,1,2,1,3,1,2,1,3,1,2,1,3],
1:[1,1,1,0,1,0,1,0,1,0,1,0,1,0,1,1],
2:[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0],
3:[0,0,1,1,1,0,1,1,0,0,1,1,1,0,1,1],
4:[1,0,1,1,1,0,1,1,1,0,1,1,1,0,1,1],
5:[0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1],
6:[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}).set_index("Class")
# Do a filter on class, and store the positions/index of matching contents
class_to_edit=3
swappable_indices = np.where(df.index==class_to_edit)[0]
# Extract the column to edit
column_to_edit=1
column_values = df[column_to_edit].values
# Decide how many values to swap, and randomly assign swaps
# No guarantee here that the swaps will not contain the same values i.e. you could
# end up swapping 1's for 1's and 0's for 0's here - it's entirely random.
number_of_swaps = 2
swap_pairs = np.random.choice(swappable_indices,number_of_swaps*2, replace=False)
# Using the swap pairs, build a map of substitutions,
# starting with a vanilla no-swap map, then updating it with the generated swaps
swap_map = {e: e for e in range(len(column_values))}
# Pair up consecutive entries of swap_pairs (0 with 1, 2 with 3, ...) so the
# randomly drawn positions are the ones that actually get swapped
swap_map.update({swap_pairs[e]: swap_pairs[e+1] for e in range(0, len(swap_pairs), 2)})
swap_map.update({swap_pairs[e+1]: swap_pairs[e] for e in range(0, len(swap_pairs), 2)})
# Having built the swap-map, apply it to the data in the column,
column_values = [column_values[swap_map[e]] for e in range(len(column_values))]
# and then plug the column back into the dataframe
df[column_to_edit]=column_values
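It's a little messy, and I'm sure there is a more concise way to build the substitution map in a single-line list comprehension, but it should do the trick.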
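For what it's worth, here is a sketch of a more compact variant that skips the substitution map and swaps through NumPy index arrays instead. The function name `swap_within_class` and its parameters are made up for illustration, and it assumes the class label is in the index, as in the example dataframe.

```python
import numpy as np
import pandas as pd


def swap_within_class(df, class_label, column, number_of_swaps, rng=None):
    """Return a copy of df with random pairwise swaps in one column of one class."""
    rng = np.random.default_rng() if rng is None else rng
    # Positions (not labels) of the rows that belong to the class
    positions = np.flatnonzero(df.index == class_label)
    if 2 * number_of_swaps > len(positions):
        raise ValueError("not enough rows in this class for that many swaps")
    # Draw distinct positions and split them into two halves to exchange
    picked = rng.choice(positions, size=2 * number_of_swaps, replace=False)
    left, right = picked[:number_of_swaps], picked[number_of_swaps:]
    values = df[column].to_numpy().copy()
    # Fancy indexing on the right-hand side copies, so this is a safe swap
    values[left], values[right] = values[right], values[left]
    out = df.copy()
    out[column] = values
    return out


# Small demo: class 3 has exactly two rows, so one swap exchanges them
df = pd.DataFrame({"Class": [1, 2, 1, 3, 1, 2, 1, 3],
                   1: [1, 1, 1, 0, 1, 0, 1, 1]}).set_index("Class")
swapped = swap_within_class(df, class_label=3, column=1, number_of_swaps=1,
                            rng=np.random.default_rng(0))
```

As with the longer version, nothing stops a swap from exchanging two equal values, so some swaps may leave the data unchanged.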
Alternatively, the np.random.permutation function might get you somewhere in terms of adding some noise (though not by performing discrete swaps).
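A rough sketch of that permutation idea, assuming you want to shuffle one column among the rows of a single class and leave everything else alone. The small dataframe and the variable names here are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Class": [1, 2, 1, 3, 1, 2, 1, 3],
                   1: [1, 1, 0, 0, 1, 0, 0, 1]}).set_index("Class")

class_to_edit = 1
column_to_edit = 1

# Boolean mask of the rows that belong to the class
mask = df.index == class_to_edit
values = df[column_to_edit].to_numpy().copy()
# np.random.permutation returns a shuffled copy of the selected values
values[mask] = np.random.permutation(values[mask])

shuffled = df.copy()
shuffled[column_to_edit] = values
```

This keeps the number of 0's and 1's per class the same; only their order within the class changes.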
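[edit] For testing, try a less rigid dataset; here is a more randomly generated example. If you want to impose some order on the dataset, just edit out the columns you want to replace with fixed values.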
df = pd.DataFrame({"Class" : [1,2,1,3,1,2,1,3,1,2,1,3,1,2,1,3],
1:np.random.choice([0,1],16),
2:np.random.choice([0,1],16),
3:np.random.choice([0,1],16),
4:np.random.choice([0,1],16),
5:np.random.choice([0,1],16),
6:np.random.choice([0,1],16)}).set_index("Class")