Add the value of the last row to this row
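I want to get the values of the previous rows when grouping by Name. For example, in row 2, the last occurrence of the name Walter, I want Col1 to contain Dog + ", " + Cat and Col3 to contain Beer + ", " + Wine. There are many columns, so I would like to do this based on column index/position rather than column names.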
+------+---------+-------+
| Col1 | Name | Col3 |
+------+---------+-------+
| Dog | Walter | Beer |
| Cat | Walter | Wine |
| Dog | Alfonso | Cider |
| Dog | Alfonso | Cider |
| Dog | Alfonso | Vodka |
+------+---------+-------+
This is my desired output:
+---------------+---------------------------+---------------------+
| Col1 | Name | Col3 |
+---------------+---------------------------+---------------------+
| Dog | Walter | Beer |
| Dog, Cat | Walter, Walter | Beer, Wine |
| Dog | Alfonso | Cider |
| Dog, Dog | Alfonso, Alfonso | Cider, Cider |
| Dog, Dog, Dog | Alfonso, Alfonso, Alfonso | Cider, Cider, Vodka |
+---------------+---------------------------+---------------------+
This is what I have tried (but it doesn't work):
for i in df:
    if df.loc[i,1] == df.loc[i+1,1]:
        df.loc[i,0] + ", " + df.loc[i+1,0]
    else:
        df.loc[i+1,0]
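I have read that iterating over rows in pandas with a for loop is frowned upon, so I would like to get the output using vectorization or apply (or some other efficient method).
You can use groupby and cumsum. If you don't mind an extra comma and space at the end (depending on how you use the result afterwards), you can do this: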
print(df.groupby('Name', group_keys=False)[['Col1', 'Col3']].apply(lambda x: (x + ', ').cumsum()))
Col1 Col3
0 Dog, Beer,
1 Dog, Cat, Beer, Wine,
2 Dog, Cider,
3 Dog, Dog, Cider, Cider,
4 Dog, Dog, Dog, Cider, Cider, Vodka,
But if you want to remove the extra comma and space, just add str[:-2] to each column, like this:
print(df.groupby('Name', group_keys=False)[['Col1', 'Col3']]
        .apply(lambda x: (x + ', ').cumsum())
        .apply(lambda x: x.str[:-2]))
Col1 Col3
0 Dog Beer
1 Dog, Cat Beer, Wine
2 Dog Cider
3 Dog, Dog Cider, Cider
4 Dog, Dog, Dog Cider, Cider, Vodka
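Since the question asks for column positions rather than names, the same idea can be sketched positionally. This is only an illustrative variant, not the code from the answer above: `key_pos`, `running_join`, and the sample frame are names assumed here, and `itertools.accumulate` stands in for the `cumsum` trick so that no trailing separator has to be trimmed.

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
})

key_pos = 1                          # assumed position of the grouping column
key = df.iloc[:, key_pos]            # grouping values, selected by position
value_cols = df.columns.delete(key_pos)  # every other column


def running_join(s, sep=", "):
    """Return the running concatenation of the strings in s (hypothetical helper)."""
    joined = accumulate(s, lambda acc, v: acc + sep + v)
    return pd.Series(list(joined), index=s.index)


# Apply the running join to each value column within each group
out = pd.DataFrame({
    col: df.groupby(key)[col].transform(running_join) for col in value_cols
})
print(out)
```

The rows come back in their original order because `transform` aligns the result with the input index.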
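What you basically want to do is run a cumulative aggregation function on each group. Pandas has cumsum for regular addition, but it does not support custom cumulative functions. For that, you may want to use some numpy functions: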
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": ["D", "C", "D", "D", "D"], "Name": ["W", "W", "A", "A", "A"],
                   "col3": ["B", "W", "C", "C", "V"]})

def ser_accum(op, ser):
    # Wrap the Python function as a ufunc so that .accumulate is available
    u_op = np.frompyfunc(op, 2, 1)  # two inputs, one output
    return u_op.accumulate(ser.to_numpy(dtype=object), dtype=object)

def plus(x, y):
    return x + "," + y

def accum(group):
    group = group.copy()  # avoid modifying the group in place
    for col in group.columns:
        group[col] = ser_accum(plus, group[col])
    return group

df.groupby("Name", group_keys=False).apply(accum)
The result is as follows:
col1 Name col3
0 D W B
1 D,C W,W B,W
2 D A C
3 D,D A,A C,C
4 D,D,D A,A,A C,C,V
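To see what the accumulate step does in isolation, here is a minimal sketch on a plain NumPy array. The names `join` and `values` are made up for illustration; the only library calls used are `np.frompyfunc` and the ufunc `accumulate` method.

```python
import numpy as np

# Wrap a two-argument Python function as a ufunc (2 inputs, 1 output)
join = np.frompyfunc(lambda a, b: a + "," + b, 2, 1)

# Object dtype is needed because the elements are Python strings
values = np.array(["D", "C", "V"], dtype=object)

# accumulate applies the function cumulatively, like a running total
running = join.accumulate(values, dtype=object)
print(running.tolist())
```

Each element of `running` is the function applied to the previous result and the next input, which is why the order of the rows matters here.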
If you only care about the last-row result for Col1 and Col3, try this:
df.groupby('Name').agg(', '.join)
Result:
Col1 Col3
Name
Alfonso Dog, Dog, Dog Cider, Cider, Vodka
Walter Dog, Cat Beer, Wine
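A self-contained version of the same call may be easier to try out. In this sketch the sample frame is rebuilt from the question, the value columns are selected explicitly, and `sort=False` is added as an assumption so that the groups keep their order of first appearance instead of being sorted alphabetically.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["Dog", "Cat", "Dog", "Dog", "Dog"],
    "Name": ["Walter", "Walter", "Alfonso", "Alfonso", "Alfonso"],
    "Col3": ["Beer", "Wine", "Cider", "Cider", "Vodka"],
})

# One row per group: all values of each column joined with ", "
last = df.groupby("Name", sort=False)[["Col1", "Col3"]].agg(", ".join)
print(last)
```

Each row of `last` equals the final row that the cumulative approaches produce for that group.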
Here is another approach, using accumulate on the index together with the df.agg method:
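from itertools import accumulate
import numpy as np
import pandas as pd

def fun(a):
    # One single-element list of index labels per row of the group
    l = [[i] for i in a.index]
    # Running concatenation of labels: [i0], [i0, i1], [i0, i1, i2], ...
    acc = list(accumulate(l, lambda x, y: np.concatenate([x, y])))
    # Join the rows selected by each prefix, one output row per prefix
    return pd.concat([a.loc[idx].agg(','.join) for idx in acc], axis=1).T

out = pd.concat([fun(v) for k, v in df.groupby('Name', sort=False)])
print(out)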
Col1 Name Col3
0 Dog Walter Beer
1 Dog,Cat Walter,Walter Beer,Wine
0 Dog Alfonso Cider
1 Dog,Dog Alfonso,Alfonso Cider,Cider
2 Dog,Dog,Dog Alfonso,Alfonso,Alfonso Cider,Cider,Vodka
You can add reset_index with drop=True at the end to reset the index.
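As a small illustration of that last step, the sketch below concatenates two made-up frames (`part_a` and `part_b` are placeholders, not the output of `fun`) and shows the index before and after the reset.

```python
import pandas as pd

# Two per-group results, each with its own 0-based index
part_a = pd.DataFrame({"Col1": ["Dog", "Dog,Cat"]})
part_b = pd.DataFrame({"Col1": ["Dog", "Dog,Dog", "Dog,Dog,Dog"]})

out = pd.concat([part_a, part_b])
before = out.index.tolist()   # index labels repeat across the parts

# drop=True discards the old index instead of keeping it as a column
out = out.reset_index(drop=True)
after = out.index.tolist()

print(before, after)
```

Passing `ignore_index=True` to `pd.concat` should give the same result in one step.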