Drop duplicates while excluding specific columns and keep the lowest value
I have this sample dataset:
CPU_Sub_Series RAM Screen_Size Resolution Price
Intel i5 8 15.6 1920x1080 699
Intel i5 8 15.6 1920x1080 569
Intel i5 8 15.6 1920x1080 789
Ryzen 5 16 16.0 2560x1600 999
Ryzen 5 32 16.0 2560x1600 1299
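I want to check for and drop duplicate rows while ignoring the Price column, and keep the lowest Price within each set of duplicates.
So the output should look like this: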
CPU_Sub_Series RAM Screen_Size Resolution Price
Intel i5 8 15.6 1920x1080 569
Ryzen 5 16 16.0 2560x1600 999
Ryzen 5 32 16.0 2560x1600 1299
Should I sort by price first?
df.sort_values('Price')
And then what?
df.groupby(["CPU_Sub_Series","RAM","Screen_Size","Resolution"], as_index=False).min()
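On the sort-first idea from the question: one way it could work is to sort by Price and then drop duplicates on every column except Price, keeping the first (cheapest) row of each group. This is only a sketch on the sample data; the names key_cols and cheapest are made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299],
})

# Every column except Price defines what counts as a duplicate
key_cols = [c for c in df.columns if c != 'Price']

# Sort ascending by Price, then keep the first row seen for each key combination
cheapest = (
    df.sort_values('Price')
      .drop_duplicates(subset=key_cols, keep='first')
      .reset_index(drop=True)
)
print(cheapest)
```

Unlike the groupby version, the rows come back ordered by Price rather than by the grouping keys; on this sample both orders happen to coincide.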
In addition to @Daniele Bianco's answer, you can also get the result this way (nearly the same approach, in a slightly different form):
import pandas as pd
df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299]
})
df = df.groupby(["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"])['Price'].min().reset_index()
print(df)
# CPU_Sub_Series RAM Screen_Size Resolution Price
#0 Intel i5 8 15.6 1920x1080 569
#1 Ryzen 5 16 16.0 2560x1600 999
#2 Ryzen 5 32 16.0 2560x1600 1299
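One caveat if the real data has more columns than the sample: taking .min() per group works column by column, so it does not necessarily return one original row. A possible alternative is to look up the index of the cheapest row in each group with idxmin() and select those rows with df.loc. In the sketch below the Seller column and its values are hypothetical, added only to show that the whole row is carried along.

```python
import pandas as pd

df = pd.DataFrame({
    'CPU_Sub_Series': ['Intel i5', 'Intel i5', 'Intel i5', 'Ryzen 5', 'Ryzen 5'],
    'RAM': [8, 8, 8, 16, 32],
    'Screen_Size': [15.6, 15.6, 15.6, 16.0, 16.0],
    'Resolution': ['1920x1080', '1920x1080', '1920x1080', '2560x1600', '2560x1600'],
    'Price': [699, 569, 789, 999, 1299],
    # Hypothetical extra column, not in the original question
    'Seller': ['A', 'B', 'C', 'D', 'E'],
})

key_cols = ['CPU_Sub_Series', 'RAM', 'Screen_Size', 'Resolution']

# Index label of the lowest-priced row within each group
idx = df.groupby(key_cols)['Price'].idxmin()

# Select those complete rows, so Seller stays matched to its Price
cheapest_rows = df.loc[idx].reset_index(drop=True)
print(cheapest_rows)
```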