Merge child data frame columns to parent data frame with None values in pandas

Question

I have a pandas data frame like this:

Promotion ID	Month	Products
PID-1	June	`Refer below for sample1`
PID-2	July	`Refer below for sample2`

Sample 1:

|Product Id|
|--|
|PROD1|
|PROD2|

Sample 2:

|Product Id|
|--|
|PROD1|
|PROD2|
|PROD3|

I want to convert this data frame into the following:

Promotion ID	Month	Products
PID-1	June	PROD1
		PROD2
PID-2	July	PROD1
		PROD2
		PROD3

The blank cells should hold only None or NA values. Is there a way to do this in pandas without iterating over the rows?

Answer 1

You can use explode to flatten your data frame like this:

import re

import pandas as pd

# generating data
df = pd.DataFrame([
    ['pid-1', 'June', '| Product Id|  |PROD1| |PROD2|'],
    ['pid-2', 'July', '| Product Id| |PROD1| |PROD2| |PROD3|']
], columns=['Promotion ID', 'Month', 'Products'])

# extracting the product list: split on pipes, drop empty pieces and the header
df['Products'] = df['Products'].apply(
    lambda s: [x for x in re.split(r' *\| *', s) if x not in ('', 'Product Id')]
)
# one row per product; ignore_index=True renumbers the rows from 0
exploded_df = df.explode('Products', ignore_index=True)

At this point df and exploded_df look like this:

# df
  Promotion ID Month                Products
0        pid-1  June          [PROD1, PROD2]
1        pid-2  July  [PROD1, PROD2, PROD3]

# exploded_df
  Promotion ID Month Products
0        pid-1  June    PROD1
1        pid-1  June    PROD2
2        pid-2  July    PROD1
3        pid-2  July    PROD2
4        pid-2  July    PROD3
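The code above assumes each Products cell is a pipe-delimited string. If the samples are actual child data frames instead, a similar flat table can be built with concat and merge. This is only a minimal sketch under that assumption: the names `parent`, `children`, `sample1`, `sample2` and `flat` are made up for illustration, and keeping the child frames in a dict keyed by Promotion ID is my guess at how they might be stored.

```python
import pandas as pd

# Hypothetical parent frame without the Products column
parent = pd.DataFrame({
    "Promotion ID": ["PID-1", "PID-2"],
    "Month": ["June", "July"],
})

# Hypothetical child frames mirroring sample 1 and sample 2
sample1 = pd.DataFrame({"Product Id": ["PROD1", "PROD2"]})
sample2 = pd.DataFrame({"Product Id": ["PROD1", "PROD2", "PROD3"]})
children = {"PID-1": sample1, "PID-2": sample2}

# Stack the child frames; the dict keys become an index level named
# "Promotion ID", which reset_index(level=0) turns back into a column
stacked = (
    pd.concat(children, names=["Promotion ID"])
    .reset_index(level=0)
    .rename(columns={"Product Id": "Products"})
)

# Attach Month to every product row
flat = parent.merge(stacked, on="Promotion ID")
print(flat)
```

The result has the same shape as exploded_df, so the blanking step works on it unchanged.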

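I would stop here. IMHO, keeping the Month and Promotion ID values only on the first row of each group will just make your life harder. However, since you asked, you can use rank and loc to assign None to every row that is not the first of its group: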

# rank needs a numeric column
exploded_df['index'] = exploded_df.index
# using rank to flag rows that are not the first of their group
mask = exploded_df.groupby('Promotion ID')['index'].rank(method='dense') > 1
# getting rid of the helper index column
exploded_df = exploded_df.drop(columns='index')
# and voila
exploded_df.loc[mask, ['Month', 'Promotion ID']] = None

Result:

  Promotion ID Month Products
0        pid-1  June    PROD1
1         None  None    PROD2
2        pid-2  July    PROD1
3         None  None    PROD2
4         None  None    PROD3
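If the helper column feels like a detour, the same selection can probably be made with duplicated instead of rank. This is a different technique from the one above, shown only as a sketch: it assumes each Promotion ID occupies one contiguous block of rows, and the names `repeated` and `blanked` are made up for illustration.

```python
import pandas as pd

# Same shape as exploded_df after the explode step
exploded_df = pd.DataFrame({
    "Promotion ID": ["pid-1", "pid-1", "pid-2", "pid-2", "pid-2"],
    "Month": ["June", "June", "July", "July", "July"],
    "Products": ["PROD1", "PROD2", "PROD1", "PROD2", "PROD3"],
})

# True for every row whose Promotion ID already appeared in an earlier row
repeated = exploded_df.duplicated(subset="Promotion ID")

# Work on a copy so the flat version stays available
blanked = exploded_df.copy()
blanked.loc[repeated, ["Promotion ID", "Month"]] = None
print(blanked)
```

duplicated keeps the first occurrence unmarked by default, so only the first row of each promotion keeps its Promotion ID and Month.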
