How to combine rows with the same name in a column of a Python (pandas) DataFrame
I have a simple DataFrame like this:
import pandas as pd

df = pd.DataFrame({'class': ['a', 'b', 'c', 'd', 'e'],
                   'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
                   'age': ['9', '8', '9', '9', '8'],
                   'score': ['40', '90', '35', '95', '85']})
which looks like this:
class name age score
a Adi 9 40
b leon 8 90
c adi 9 35
d leo 9 95
e andy 8 85
'Adi' and 'adi' are the same person, so I want them combined into a single row, with a score of 75 instead of 40 and 35.
You can lowercase the name column first and then use pandas.DataFrame.groupby together with the resulting GroupBy's aggregate:
import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85']
})

# Lowercase names so 'Adi' and 'adi' land in the same group
df['name'] = df['name'].str.lower()
# Scores are stored as strings; convert them so they can be summed
df['score'] = df['score'].astype(int)

aggregate_funcs = {
    # Join the distinct values; sorted() makes the order deterministic
    'class': lambda s: ', '.join(sorted(set(s))),
    'age': lambda s: ', '.join(sorted(set(s))),
    'score': 'sum'
}
df = df.groupby('name').aggregate(aggregate_funcs)
print(df)
Output:
class age score
name
adi a, c 9 75
andy e 8 85
leo d 9 95
leon b 8 90
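A variant of the same groupby approach, sketched under the assumption of pandas 0.25 or later (the version that introduced named aggregation), names each output column explicitly and turns `name` back into a regular column with `reset_index()`. Because `class` is a Python keyword and cannot be used as a keyword argument, the joined column is called `classes` here; that name and the helper `join_unique` are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85'],
})

# Normalize the grouping key so 'Adi' and 'adi' form one group
df['name'] = df['name'].str.lower()
df['score'] = df['score'].astype(int)


def join_unique(s):
    # Hypothetical helper: distinct values, sorted, joined into one string
    return ', '.join(sorted(set(s)))


# Named aggregation: output_column=(input_column, function)
result = (
    df.groupby('name')
    .agg(
        classes=('class', join_unique),
        age=('age', join_unique),
        score=('score', 'sum'),
    )
    .reset_index()  # move 'name' from the index back into a column
)
print(result)
```

Keeping `name` as a column rather than the index can make the result easier to merge or export afterwards.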
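Another approach with pandas is drop_duplicates(): write each name's total score onto every row of that name with transform, then keep a single row per name.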
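# df is the DataFrame from the question
df['name'] = df['name'].str.lower()
df['score'] = df['score'].astype(int)
# Every row gets the total score of its (lowercased) name
df['score'] = df.groupby('name')['score'].transform('sum')
# Keep only the first row for each name
df.drop_duplicates(subset='name', keep='first', inplace=True)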
Output:
class name age score
0 a adi 9 75
1 b leon 8 90
3 d leo 9 95
4 e andy 8 85
If you set keep='last', you get this output instead:
class name age score
1 b leon 8 90
2 c adi 9 75
3 d leo 9 95
4 e andy 8 85
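The snippet above relies on `df` from the question and modifies it in place. A self-contained sketch of the same transform-then-deduplicate idea, which builds both the `keep='first'` and `keep='last'` results as new DataFrames so they can be compared side by side (the variable names `first` and `last` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'class': ['a', 'b', 'c', 'd', 'e'],
    'name': ['Adi', 'leon', 'adi', 'leo', 'andy'],
    'age': ['9', '8', '9', '9', '8'],
    'score': ['40', '90', '35', '95', '85'],
})

df['name'] = df['name'].str.lower()
df['score'] = df['score'].astype(int)
# transform returns a Series aligned with df, so each row receives its group's total
df['score'] = df.groupby('name')['score'].transform('sum')

# keep='first' retains the earliest row per name, keep='last' the latest;
# without inplace=True, drop_duplicates returns a new DataFrame
first = df.drop_duplicates(subset='name', keep='first')
last = df.drop_duplicates(subset='name', keep='last')

print(first)
print(last)
```

The only difference between the two results is which row's non-aggregated columns (here `class`) survive: row 0 (`a`) for `keep='first'`, row 2 (`c`) for `keep='last'`. The summed score is the same either way.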