Create a list with repeated terms by adding a multiplier index in a Pandas DataFrame
Given a DataFrame like this:
import pandas as pd
row1 = ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
row2 = ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']
col = {'List': [row1, row2]}
df = pd.DataFrame(col)
This results in:
                                         List
0  ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
1  ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']
I would like to produce the following DataFrame:
                                   List
0  ['AAA', 'BBB x 2', 'CCC', 'AAA x 2']
1         ['CCC x 2', 'BBB', 'AAA x 3']
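where the last column, List, contains a multiplier index indicating how many times each term occurs consecutively in the list.
Can you suggest a pandas instruction to solve this task?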
itertools groupby can do what you need. A custom function appends the count to the value whenever a group contains more than one record.
from itertools import groupby
import pandas as pd

row1 = ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
row2 = ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']
col = {'Lists': [row1, row2]}
df = pd.DataFrame(col)

def count_items(row):
    output = []
    # groupby yields one (key, group) pair per run of consecutive equal items
    for k, d in groupby(row):
        x = list(d)
        if len(x) > 1:
            # runs longer than one item get a ' x <count>' suffix
            output.append(' x '.join([k, str(len(x))]))
        else:
            output.append(k)
    return output

df['Lists'] = df['Lists'].apply(count_items)
print(df)
Output:
                          Lists
0  [AAA, BBB x 2, CCC, AAA x 2]
1       [CCC x 2, BBB, AAA x 3]
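To make the behaviour of itertools.groupby easier to inspect on its own, here is a minimal sketch that works on a plain list, without pandas. The helper names run_lengths and format_runs and the sep parameter are illustrative assumptions, not part of the answer above.

```python
from itertools import groupby

def run_lengths(items):
    """Return (value, run_length) pairs for runs of consecutive equal items."""
    # groupby only merges *adjacent* equal items, so 'AAA' can appear in several runs
    return [(key, sum(1 for _ in group)) for key, group in groupby(items)]

def format_runs(items, sep=" x "):
    """Append '<sep><count>' to values whose run is longer than one item."""
    return [f"{key}{sep}{n}" if n > 1 else key for key, n in run_lengths(items)]

row1 = ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
print(run_lengths(row1))  # [('AAA', 1), ('BBB', 2), ('CCC', 1), ('AAA', 2)]
print(format_runs(row1))  # ['AAA', 'BBB x 2', 'CCC', 'AAA x 2']
```

Because the input is not sorted first, the two separate runs of 'AAA' stay separate, which is what the question asks for.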
In your case, you may want to look at explode; we then create the subgroups with cumsum and shift.
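s = df.explode('List')
# a new run starts wherever a value differs from the previous one; cumsum numbers the runs
s = s.groupby([s.index, s['List'].shift().ne(s['List']).cumsum()])['List'].agg(['first', 'count'])
out = s['first'] + ' x ' + s['count'].astype(str)
# runs of length 1 keep the bare value; then collect the pieces back into one list per original row
out = out.mask(s['count'] == 1, s['first']).groupby(level=0).agg(list)
out
Out[202]:
0    [AAA, BBB x 2, CCC, AAA x 2]
1         [CCC x 2, BBB, AAA x 3]
dtype: object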
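The shift/ne/cumsum step is the least obvious part of this approach, so the following sketch prints the intermediate run identifiers for the question's data. The variable names exploded, starts and run_id are illustrative assumptions.

```python
import pandas as pd

row1 = ['AAA', 'BBB', 'BBB', 'CCC', 'AAA', 'AAA']
row2 = ['CCC', 'CCC', 'BBB', 'AAA', 'AAA', 'AAA']
df = pd.DataFrame({'List': [row1, row2]})

# One element per row; the original row label is repeated in the index
exploded = df.explode('List')['List']

# True wherever a value differs from the one before it, i.e. where a new run starts
starts = exploded.shift().ne(exploded)

# A running total of the run starts gives every run its own integer id
run_id = starts.cumsum()

print(run_id.tolist())  # [1, 2, 2, 3, 4, 4, 5, 5, 6, 7, 7, 7]
```

The ids are computed over the whole exploded column, so a run could in principle continue from the end of one original row into the start of the next. Grouping by the original index together with the run id, as the snippet above does, keeps such runs apart.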