Iterate over duplicate partitions/groups of a Pandas DataFrame
I have a df like this:
id val1 val2 val3
0 1 1 2
1 1 NaN 2
2 1 4 2
3 1 4 2
4 2 1 1
5 3 NaN 3
6 3 7 3
7 3 7 3
Then
temp_df = df.loc[df.duplicated(subset=['val1','val3'], keep=False)]
gives me this:
id val1 val2 val3
0 1 1 2
1 1 NaN 2
2 1 4 2
3 1 4 2
5 3 NaN 3
6 3 7 3
7 3 7 3
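To experiment with this, the sample frame can be rebuilt in code. The values are copied from the table above; storing the blanks as np.nan is an assumption about how the missing values are represented. The sketch shows what keep=False does: it marks every row whose (val1, val3) pair occurs more than once, so the singleton pair (2, 1) is dropped.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (np.nan assumed for the missing val2 entries)
df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7],
    'val1': [1, 1, 1, 1, 2, 3, 3, 3],
    'val2': [1, np.nan, 4, 4, 1, np.nan, 7, 7],
    'val3': [2, 2, 2, 2, 1, 3, 3, 3],
})

# keep=False flags *all* members of a duplicated (val1, val3) pair,
# not just the second and later occurrences
mask = df.duplicated(subset=['val1', 'val3'], keep=False)
temp_df = df.loc[mask]
print(temp_df['id'].tolist())  # [0, 1, 2, 3, 5, 6, 7]
```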
How do I iterate over each partition/group of duplicated values, so that
for partition in temp_df......:
    print(partition)
prints
id val1 val2 val3
0 1 1 2
1 1 NaN 2
2 1 4 2
3 1 4 2
id val1 val2 val3
5 3 NaN 3
6 3 7 3
7 3 7 3
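One way to get exactly this loop is to group the filtered frame by the same columns that were passed to duplicated: iterating over a pandas GroupBy object yields (group key, sub-DataFrame) pairs. A minimal sketch, reusing the sample data from the question:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7],
    'val1': [1, 1, 1, 1, 2, 3, 3, 3],
    'val2': [1, np.nan, 4, 4, 1, np.nan, 7, 7],
    'val3': [2, 2, 2, 2, 1, 3, 3, 3],
})
temp_df = df.loc[df.duplicated(subset=['val1', 'val3'], keep=False)]

# Grouping by the duplicated() subset gives one (key, sub-DataFrame)
# pair per partition; with two grouping columns the key is a tuple like (1, 2)
partitions = []
for key, partition in temp_df.groupby(['val1', 'val3']):
    print(key)
    print(partition)
    partitions.append(partition)
```

Because temp_df only contains duplicated pairs, every group here has at least two rows. Grouping df directly and using .filter(lambda g: len(g) > 1) is another way to drop the singleton groups.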
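The goal is to impute the NaN values with the mode of that partition's column. For example, mode(1, 4, 4) = 4, so I want to fill the NaN in the first partition with 4. Likewise, I want to fill the NaN in the second partition with 7.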
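The per-partition rule can also be written as a plain loop that mirrors the wording above. This is a sketch under two assumptions not stated in the question: ties in the mode are broken by taking the smallest value (Series.mode ignores NaN and returns the modes sorted), and a partition that is entirely NaN is left unfilled.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7],
    'val1': [1, 1, 1, 1, 2, 3, 3, 3],
    'val2': [1, np.nan, 4, 4, 1, np.nan, 7, 7],
    'val3': [2, 2, 2, 2, 1, 3, 3, 3],
})

# mode() drops NaN and returns all most-frequent values, sorted ascending
print(pd.Series([1, 4, 4]).mode().iloc[0])  # 4

# Iterate over the original frame's groups and write the results into a copy
filled = df.copy()
for key, partition in df.groupby(['val1', 'val3']):
    modes = partition['val2'].mode()
    if not modes.empty:  # an all-NaN partition has no mode; leave it as-is
        filled.loc[partition.index, 'val2'] = partition['val2'].fillna(modes.iloc[0])

print(filled['val2'].tolist())  # [1.0, 4.0, 4.0, 4.0, 1.0, 7.0, 7.0, 7.0]
```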
Update
Use groupby + apply:
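# group_keys=False keeps the original row index, so the result aligns with df
# (since pandas 2.0, apply otherwise prepends the group keys to the index)
# mode().iloc[0] takes the smallest mode on ties; all-NaN groups are left as-is
df['val2'] = df.groupby(['val1', 'val3'], group_keys=False)['val2'] \
    .apply(lambda x: x.fillna(x.mode().iloc[0]) if x.notna().any() else x)
print(df)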
# Output:
id val1 val2 val3
0 0 1 1.0 2
1 1 1 4.0 2
2 2 1 4.0 2
3 3 1 4.0 2
4 4 2 1.0 1
5 5 3 7.0 3
6 6 3 7.0 3
7 7 3 7.0 3
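Since the lambda returns a Series with the same index as its input, the same fill can be expressed with transform instead of apply; transform always returns a result indexed like the original frame, so it can be assigned straight back regardless of group_keys. A sketch of that variant; the helper name fill_with_mode is hypothetical, and breaking ties with the smallest mode is an assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id':   [0, 1, 2, 3, 4, 5, 6, 7],
    'val1': [1, 1, 1, 1, 2, 3, 3, 3],
    'val2': [1, np.nan, 4, 4, 1, np.nan, 7, 7],
    'val3': [2, 2, 2, 2, 1, 3, 3, 3],
})

def fill_with_mode(s: pd.Series) -> pd.Series:
    """Fill NaNs in one group with that group's mode (smallest mode on ties)."""
    modes = s.mode()
    return s if modes.empty else s.fillna(modes.iloc[0])

# transform output is aligned to df's index, so direct assignment is safe
df['val2'] = df.groupby(['val1', 'val3'])['val2'].transform(fill_with_mode)
print(df['val2'].tolist())  # [1.0, 4.0, 4.0, 4.0, 1.0, 7.0, 7.0, 7.0]
```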
Old answer
IIUC, sort the dataframe by val2, then use groupby and forward-fill:
df['val2'] = df.sort_values('val2').groupby(['val1', 'val3'])['val2'].ffill()
print(df)
# Output:
id val1 val2 val3
0 0 1 1.1 2.2
1 1 1 1.1 2.2
2 3 2 1.3 1.0
3 4 3 1.5 6.2
4 5 3 1.5 6.2
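Sorting by val2 puts NaN last within each group, so the forward fill copies the group's largest value rather than its mode. On the question's sample data the two agree, because in both partitions the mode is also the maximum, but they differ in general. A small, made-up counterexample (the data below is hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical single partition where the most frequent value (1)
# is not the largest value (4)
df = pd.DataFrame({'val1': [1, 1, 1, 1],
                   'val2': [1, 1, 4, np.nan],
                   'val3': [2, 2, 2, 2]})

# Old answer: NaN sorts last, so ffill copies the largest non-NaN value
by_ffill = df.sort_values('val2').groupby(['val1', 'val3'])['val2'].ffill()

# Updated answer's rule: fill with the group's mode
by_mode = df.groupby(['val1', 'val3'])['val2'].transform(
    lambda s: s.fillna(s.mode().iloc[0]))

print(by_ffill.loc[3], by_mode.loc[3])  # 4.0 1.0
```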