How to get the count of items in rows, check whether one item is present, and finally keep the first row in Python?

Question

Suppose I have a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan, 'Orange', 'Apple', np.nan, 'Apple', 'Apple']})

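Based on these conditions:

If a value in Column A is repeated, count the occurrences of the word 'Orange' in Column B and put the result in a new Column C (for example, 12 has 3 rows and 'Orange' appears once, so 1 should go in Column C). For non-repeated rows, just put the corresponding count.
If a value in Column A is repeated, count the occurrences of the word 'Apple' in Column B and put the result in a new Column D (for example, 12 has 3 rows and 'Apple' appears twice, so 2 should go in Column D). For non-repeated rows, just put the corresponding count.
For both repeated and non-repeated rows of Column A, if the word 'Orange' appears in Column B, write 'Yes' in Column E, else 'No'.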

I am trying this in a Python Jupyter notebook. Can anyone help me get output like this:

      | Column A | Column B |Column C |Column D |Column E 
----- | -------- | ---------|---------|---------|---------
 0    | 12       | Apple    |1        |2        |Yes   
 1    | 13       | Apple    |0        |1        |No 
 2    | 15       | NaN      |NaN      |NaN      |NaN     
 3    | 16       | Orange   |1        |0        |Yes      
 4    | 141      | Apple    |0        |3        |No

Thanks in advance :)

Answer 1

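I don't think there is a powerful and simple solution to your question, but the following code works.

First, define a function count(x, a) that returns NaN if the first element of the list x is NaN, and otherwise returns the number of times a occurs in x. This function is passed to apply further down.

Then, use groupby and apply the list function.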

temp = df.copy().groupby('Column A')['Column B'].apply(list)

After that, temp becomes

Column A
12         [Apple, Apple, Orange]
13                        [Apple]
15                          [nan]
16                       [Orange]
141    [Apple, nan, Apple, Apple]
Name: Column B, dtype: object

So, from temp, we can count the number of apples and oranges in each group.
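The counting step can be sketched as follows. This is a minimal illustration that rebuilds the sample data and temp so it runs on its own; the names orange_counts and apple_counts are introduced here and are not part of the answer. It uses plain list.count, so a group containing only NaN yields 0 rather than NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan,
                 'Orange', 'Apple', np.nan, 'Apple', 'Apple'],
})

# One Python list of Column B values per distinct Column A value.
temp = df.groupby('Column A')['Column B'].apply(list)

# list.count tallies exact matches; NaN entries never equal a string.
orange_counts = temp.apply(lambda values: values.count('Orange'))
apple_counts = temp.apply(lambda values: values.count('Apple'))
```

Because the all-NaN group (Column A equal to 15) comes out as 0 here, the answer relies on a separate count helper to produce NaN for that row.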

Since df has duplicates, I dropped them and added the new columns (Column C, D, and E).

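df.drop_duplicates(subset=['Column A'], keep='first', inplace=True)
# temp is sorted by Column A; .values assumes df's rows are in that same order.
df['Column C'] = temp.apply(count, a='Orange').values
df['Column D'] = temp.apply(count, a='Apple').values
# Column E: 'Yes' if any 'Orange' was counted, 'No' if none, NaN if the count is NaN.
df['Column E'] = df['Column C'].apply(lambda x: np.nan if pd.isna(x) else ('Yes' if x >= 1 else 'No'))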
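The whole approach can also be sketched end to end. This is a variant under stated assumptions, not the answer's exact code: it writes to a new frame named result (a name introduced here), looks the counts up by the Column A key with map instead of relying on .values and row order, and derives Column E from the 'Orange' count as 'Yes'/'No' to match the output requested in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Column A': [12, 12, 12, 13, 15, 16, 141, 141, 141, 141],
    'Column B': ['Apple', 'Apple', 'Orange', 'Apple', np.nan,
                 'Orange', 'Apple', np.nan, 'Apple', 'Apple'],
})

def count(x, a):
    # Same rule as the answer's helper: NaN when the group's first value is NaN.
    if isinstance(x[0], float):
        return np.nan
    return x.count(a)

# One list of Column B values per Column A key.
temp = df.groupby('Column A')['Column B'].apply(list)

# Keep the first row of every Column A value and renumber the index.
result = df.drop_duplicates(subset=['Column A'], keep='first').reset_index(drop=True)

# Look counts up by key, so the row order of result does not matter.
result['Column C'] = result['Column A'].map(temp.apply(count, a='Orange'))
result['Column D'] = result['Column A'].map(temp.apply(count, a='Apple'))

# 'Yes' when at least one 'Orange' was counted, 'No' when none, NaN when unknown.
result['Column E'] = result['Column C'].map(
    lambda c: np.nan if pd.isna(c) else ('Yes' if c >= 1 else 'No')
)

print(result)
```

Mapping by key is a design choice that keeps the result correct even if Column A is not already in sorted order in the original frame.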

Edit

Sorry, I left out the count function:

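def count(x, a):
    # x is the list of Column B values for one group; NaN is stored as a float.
    if type(x[0]) == float:
        return np.nan
    else:
        # Number of entries in x equal to a.
        return x.count(a)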
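A few direct calls show how this helper behaves, including one edge case that the sample data does not exercise. The second function, count_or_nan, is a hypothetical variant introduced here for comparison only: it assumes NaN should be returned only when every value in the group is missing.

```python
import numpy as np
import pandas as pd

def count(x, a):
    # Helper from the answer: only the first element is checked for NaN.
    if type(x[0]) == float:
        return np.nan
    return x.count(a)

def count_or_nan(x, a):
    # Hypothetical variant: NaN only when the whole group is missing.
    if all(pd.isna(v) for v in x):
        return np.nan
    return x.count(a)

print(count(['Apple', 'Apple', 'Orange'], 'Orange'))         # 1
print(count([np.nan], 'Apple'))                              # nan
print(count(['Apple', np.nan, 'Apple', 'Apple'], 'Apple'))   # 3

# Edge case: a leading NaN hides the valid values that follow it.
print(count([np.nan, 'Apple'], 'Apple'))                     # nan
print(count_or_nan([np.nan, 'Apple'], 'Apple'))              # 1
```

On the data in the question both functions give the same results, since the only group that starts with NaN contains nothing else.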

python

count

calculated-columns

dataframe

pandas