Python: Dropping specific rows in a dataframe and keeping specific ones

Suppose I have this dataframe:

import pandas as pd

Name = ['ID', 'Country', 'IBAN','ID_info_1', 'Dan_Age', 'ID_info_1','Dan_city','ID_info_1','Dan_country','ID_info_1', 'ID_info_2', 'ID_info_2','ID_info_2', 'Dan_sex', 'Dan_Age', 'Dan_country','Dan_sex' , 'Dan_city','Dan_country' ]
Value = ['TAMARA_CO', 'GERMANY','FR56', '12', '18','25','Berlin','34', '55','345','432', '43', 'GER', 'M', '22', 'FRA', 'M', 'Madrid', 'ESP']
Ccy = ['','','','EUR','EUR','EUR','','EUR','','','','EUR','EUR','USD','USD','','CHF', '','DKN']
Group = ['0','0','0','1','1','2','2','3','3','4','1','2','3','4','2','2','2','3','3']
df = pd.DataFrame({'Name':Name, 'Value' : Value, 'Ccy' : Ccy,'Group':Group})

print(df)

           Name      Value  Ccy Group
0            ID  TAMARA_CO          0
1       Country    GERMANY          0
2          IBAN       FR56          0
3     ID_info_1         12  EUR     1
4       Dan_Age         18  EUR     1
5     ID_info_1         25  EUR     2
6      Dan_city     Berlin          2
7     ID_info_1         34  EUR     3
8   Dan_country         55          3
9     ID_info_1        345          4
10    ID_info_2        432          1
11    ID_info_2         43  EUR     2
12    ID_info_2        GER  EUR     3
13      Dan_sex          M  USD     4
14      Dan_Age         22  USD     2
15  Dan_country        FRA          2
16      Dan_sex          M  CHF     2
17     Dan_city     Madrid          3
18  Dan_country        ESP  DKN     3

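I want to shrink this dataframe. Only the rows whose Name contains the string "info" should be reduced: for each such name, I want to keep just the row with the highest value in the "Group" column. In this dataframe, that means keeping the "ID_info_1" row in group 4 and the "ID_info_2" row in group 3. In addition, I want to change the value in the "Group" column of the kept rows to 1.

So in the end I want to get this new dataframe, with the index reset as well: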

           Name      Value  Ccy Group
0            ID  TAMARA_CO          0
1       Country    GERMANY          0
2          IBAN       FR56          0
3     ID_info_1         12  EUR     1
4       Dan_Age         18  EUR     1
5      Dan_city     Berlin          2
6   Dan_country         55          3
7     ID_info_1        345          1
8     ID_info_2        GER  EUR     1
9       Dan_sex          M  USD     4
10      Dan_Age         22  USD     2
11  Dan_country        FRA          2
12      Dan_sex          M  CHF     2
13     Dan_city     Madrid          3
14  Dan_country        ESP  DKN     3

Does anyone have an efficient idea?

Thanks

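You can create a mask with a lambda function that searches the Name column for the string 'info', then use the values in the Group column to decide which rows to drop.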

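arr = []
# Flag rows whose Name contains 'info'
mask = df.apply(lambda x: 'info' in x['Name'], axis=1)
for info in df[mask]['Name'].unique():
    # Lowest Group value among the rows sharing this Name
    min_val = df.loc[df['Name'] == info]['Group'].min()
    # Collect the index of every row of this Name above that minimum
    arr += list(df[(df['Name'] == info) & (df['Group'] > min_val)].index)

# Drop the collected rows and renumber the index without keeping the old one
df.drop(arr, inplace=True)
df.reset_index(drop=True, inplace=True)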


           Name      Value  Ccy Group
0            ID  TAMARA_CO          0
1       Country    GERMANY          0
2          IBAN       FR56          0
3     ID_info_1         12  EUR     1
4       Dan_Age         18  EUR     1
5      Dan_city     Berlin          2
6   Dan_country         55          3
7     ID_info_2        432          1
8       Dan_sex          M  USD     4
9       Dan_Age         22  USD     2
10  Dan_country        FRA          2
11      Dan_sex          M  CHF     2
12     Dan_city     Madrid          3
13  Dan_country        ESP  DKN     3

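I know the df doesn't look 100% like what you wanted, but this is how I understood your question. Let me know if I got it wrong.

Edit: Re-read the question and edited some of the code.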
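As a side note, the row-wise lambda mask and the loop in this answer can probably be expressed with vectorized pandas calls. The sketch below is only an illustration on a six-row subset of the question's rows (original rows 0, 3, 4, 5, 10 and 11); the names `sample`, `mask_lambda`, `mask_vector`, `group_min` and `reduced` are made up for this example, and `Group` is cast to integers as an assumption so that comparisons are numeric. Like the answer above, it keeps the row with the lowest Group per "info" name.

```python
import pandas as pd

# Small subset of the question's rows, for illustration only
sample = pd.DataFrame({
    "Name": ["ID", "ID_info_1", "Dan_Age", "ID_info_1", "ID_info_2", "ID_info_2"],
    "Group": ["0", "1", "1", "2", "1", "2"],
})
sample["Group"] = sample["Group"].astype(int)  # assumption: numeric comparison is wanted

# Row-wise lambda mask (as in the answer) and its vectorized counterpart
mask_lambda = sample.apply(lambda x: "info" in x["Name"], axis=1)
mask_vector = sample["Name"].str.contains("info", regex=False)

# Per-row minimum Group among rows sharing the same Name, replacing the loop
group_min = sample.groupby("Name")["Group"].transform("min")

# Drop "info" rows that sit above their name's minimum, then renumber the index
to_drop = sample.index[mask_vector & (sample["Group"] > group_min)]
reduced = sample.drop(to_drop).reset_index(drop=True)
print(reduced)
```

On this subset both masks flag the same four rows, and `reduced` keeps one row per "info" name plus the two untouched rows.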

How about this:

# select rows with "info"
di = df[df.Name.str.contains('info')]

# Find the rows below max for removal
di = di[di.groupby('Name')['Group'].transform('max') != di['Group']]

# Remove those rows and set a new index as requested
df = df.drop(di.index).reset_index(drop=True)

# Change group to one on remaining "info" rows
# (Group holds strings in the question's data, so assign '1' rather than 1)
df.loc[df.Name.str.contains('info'), 'Group'] = '1'
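
For anyone who wants to try the steps above end to end, here is a minimal self-contained sketch on the question's data. The helper name `keep_top_info_rows` and its parameters `pattern` and `new_group` are hypothetical, not part of pandas. It also casts `Group` to integers, which is an assumption on my part: the question stores the groups as strings, and string comparison would rank '10' below '9' if two-digit groups ever appeared.

```python
import pandas as pd

Name = ['ID', 'Country', 'IBAN', 'ID_info_1', 'Dan_Age', 'ID_info_1', 'Dan_city',
        'ID_info_1', 'Dan_country', 'ID_info_1', 'ID_info_2', 'ID_info_2', 'ID_info_2',
        'Dan_sex', 'Dan_Age', 'Dan_country', 'Dan_sex', 'Dan_city', 'Dan_country']
Value = ['TAMARA_CO', 'GERMANY', 'FR56', '12', '18', '25', 'Berlin', '34', '55', '345',
         '432', '43', 'GER', 'M', '22', 'FRA', 'M', 'Madrid', 'ESP']
Ccy = ['', '', '', 'EUR', 'EUR', 'EUR', '', 'EUR', '', '', '', 'EUR', 'EUR', 'USD',
       'USD', '', 'CHF', '', 'DKN']
Group = ['0', '0', '0', '1', '1', '2', '2', '3', '3', '4', '1', '2', '3', '4', '2',
         '2', '2', '3', '3']
df = pd.DataFrame({'Name': Name, 'Value': Value, 'Ccy': Ccy, 'Group': Group})


def keep_top_info_rows(frame, pattern='info', new_group=1):
    """Hypothetical helper: keep, per matching Name, only the row with the highest Group."""
    out = frame.copy()
    out['Group'] = out['Group'].astype(int)  # assumption: groups are numeric labels
    is_info = out['Name'].str.contains(pattern, regex=False)
    # Highest Group among rows sharing the same Name, aligned to every row
    group_max = out.groupby('Name')['Group'].transform('max')
    # Keep all non-matching rows, plus matching rows that sit at their name's maximum
    out = out[~is_info | (out['Group'] == group_max)].reset_index(drop=True)
    # Relabel the surviving matching rows
    out.loc[out['Name'].str.contains(pattern, regex=False), 'Group'] = new_group
    return out


result = keep_top_info_rows(df)
print(result)
```

Applied literally, the "highest Group per name" rule leaves 14 rows here, with ID_info_1/345 and ID_info_2/GER as the only remaining "info" rows, both in group 1.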