Merge child data frame columns to parent data frame with None values in pandas

I have a pandas data frame like this:


Promotion ID Month Products
PID-1 June Refer below for sample1
PID-2 July Refer below for sample2

Sample 1:

| Product Id |
|--|
| PROD1 |
| PROD2 |

Sample 2:

| Product Id |
|--|
| PROD1 |
| PROD2 |
| PROD3 |


I want to merge the samples into the parent data frame like this:

Promotion ID  Month  Products
PID-1         June   PROD1
                     PROD2
PID-2         July   PROD1
                     PROD2
                     PROD3

The blank cells can simply be None/NA values. Is there a way to do this in pandas without looping through the rows?

You can use explode to flatten your data frame like this:

import re

import pandas as pd

# generating data
df = pd.DataFrame([
    ['pid-1', 'June', '| Product Id|  |PROD1| |PROD2|'],
    ['pid-2', 'July', '| Product Id| |PROD1| |PROD2| |PROD3|']
], columns=['Promotion ID', 'Month', 'Products'])

# extracting the product list: split on pipes, drop empty strings and the header
df['Products'] = df['Products']\
    .apply(lambda s: [x for x in re.split(r' *\| *', s) if x != '' and x != 'Product Id'])
# one row per product
exploded_df = df.explode('Products', ignore_index=True)

At this point, df and exploded_df look like this:

# df
  Promotion ID Month               Products
0        pid-1  June         [PROD1, PROD2]
1        pid-2  July  [PROD1, PROD2, PROD3]

# exploded_df
  Promotion ID Month Products
0        pid-1  June    PROD1
1        pid-1  June    PROD2
2        pid-2  July    PROD1
3        pid-2  July    PROD2
4        pid-2  July    PROD3
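
If the products actually live in separate child data frames rather than in a delimited string, the same long format can probably be reached with concat and merge. This is only a sketch: the `children` dict keyed by promotion ID and the `child_row` level name are assumptions made for illustration, not something given in the question.

```python
import pandas as pd

# Parent frame: one row per promotion.
parent = pd.DataFrame(
    {"Promotion ID": ["PID-1", "PID-2"], "Month": ["June", "July"]}
)

# Hypothetical container: one child frame per promotion, keyed by its ID.
children = {
    "PID-1": pd.DataFrame({"Product Id": ["PROD1", "PROD2"]}),
    "PID-2": pd.DataFrame({"Product Id": ["PROD1", "PROD2", "PROD3"]}),
}

# Stack the child frames; the dict keys become the "Promotion ID" index level.
stacked = (
    pd.concat(children, names=["Promotion ID", "child_row"])
    .reset_index(level="Promotion ID")
    .rename(columns={"Product Id": "Products"})
)

# Attach Month from the parent: one row per (promotion, product) pair.
merged = parent.merge(stacked, on="Promotion ID", how="left")
print(merged)
```

The result has the same shape as exploded_df, so the masking step applies to it unchanged.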

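I would stop here. IMHO, keeping the values of Month and Promotion ID only on the first row of each group will just make your life harder. However, since you asked, you can use rank and loc to assign None to all rows that are not the first of their group: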

# rank needs a numeric column
exploded_df['index'] = exploded_df.index
# using rank to build a mask for rows that are not the first of their group
mask = exploded_df\
    .groupby(['Promotion ID'])['index']\
    .rank(method='dense') > 1
# getting rid of the index column
exploded_df = exploded_df.drop('index', axis=1)
# and voila
exploded_df.loc[mask, ['Month', 'Promotion ID']] = None


  Promotion ID Month Products
0        pid-1  June    PROD1
1         None  None    PROD2
2        pid-2  July    PROD1
3         None  None    PROD2
4         None  None    PROD3
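
As a possible shortcut, the helper index column can be skipped by numbering the rows within each group with cumcount. A minimal sketch, assuming exploded_df has the shape shown earlier (it is rebuilt here so the snippet runs on its own); the name `not_first` is just illustrative.

```python
import pandas as pd

# Rebuild the exploded frame so this snippet is self-contained.
exploded_df = pd.DataFrame({
    "Promotion ID": ["pid-1", "pid-1", "pid-2", "pid-2", "pid-2"],
    "Month": ["June", "June", "July", "July", "July"],
    "Products": ["PROD1", "PROD2", "PROD1", "PROD2", "PROD3"],
})

# cumcount numbers rows 0, 1, 2, ... within each promotion,
# so anything above 0 is not the first row of its group.
not_first = exploded_df.groupby("Promotion ID").cumcount() > 0

# Blank out the repeated parent values on those rows.
exploded_df.loc[not_first, ["Promotion ID", "Month"]] = None
print(exploded_df)
```

Depending on the pandas version and column dtype, the blanked cells may display as None or NaN; either way, isna() treats them as missing.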