Getting the row with the maximum Length for each group of identical Link_Id values using a pandas DataFrame
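I have three separate columns in a DataFrame: Link_Id (named LINK_ID in the code), NEW, and Length. I want to group rows that share the same Link_Id, look at their Length values, and, for the row with the maximum Length in each group, return its Link_Id and NEW column values.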
import pandas as pd

# Columns to load into the DataFrame
cols = ['LINK_ID', 'NEW', 'Length']
# Generator that yields one row of attribute values per feature;
# vlayer is defined elsewhere (it looks like a QGIS vector layer)
datagen = ([f[col] for col in cols] for f in vlayer.getFeatures())
df = pd.DataFrame.from_records(data=datagen, columns=cols)
# Group consecutive rows that share the same LINK_ID and print each group
dff = df.groupby((df['LINK_ID'].shift() != df['LINK_ID']).cumsum())
for k, v in dff:
    print(f'[group {k}]')
    print(v)
# Maximum Length per LINK_ID (this loses the NEW column)
result = df.groupby('LINK_ID').agg({'Length': ['max']})
IIUC, try:
result = df.loc[df.groupby("LINK_ID")["Length"].idxmax()]
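To make the idxmax approach concrete, here is a minimal sketch on made-up sample data (the values are illustrative assumptions, not the asker's layer). It also shows a variant, not part of the answer above, that keeps every tied row by comparing each Length with its group maximum via transform("max").

```python
import pandas as pd

# Made-up sample data standing in for the layer attributes (assumption)
df = pd.DataFrame({
    "LINK_ID": [1, 1, 2, 2, 2, 3],
    "NEW": ["a", "b", "c", "d", "e", "f"],
    "Length": [10.0, 25.0, 7.0, 7.0, 3.0, 12.0],
})

# One row per LINK_ID: idxmax gives the index label of the first
# maximum Length in each group, and .loc pulls those rows out
longest = df.loc[df.groupby("LINK_ID")["Length"].idxmax(), ["LINK_ID", "NEW"]]

# Variant that keeps ties: mark every row whose Length equals
# the maximum Length of its LINK_ID group
is_max = df["Length"] == df.groupby("LINK_ID")["Length"].transform("max")
longest_with_ties = df.loc[is_max, ["LINK_ID", "NEW"]]

print(longest)
print(longest_with_ties)
```

With this sample, LINK_ID 2 has two rows tied at Length 7.0: the idxmax version keeps only the first of them, while the transform version keeps both. Which behaviour is wanted depends on the data, so treat the second form as an optional alternative.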