Split/explode 单元格根据 pandas 数据框中的条件分成多行

Question

输入数据帧的代码是

import pandas as pd
df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))','Column2': 'XYZ + XXX/YYY'}])

输入数据帧：-

+---------------------------------+---------------------------------+
|              Column1            |              Column2            +
+---------------------------------+---------------------------------+
| ((CC ) + (A11/ABC/ZZ) + (!AAA)) |           XYZ + XXX/YYY         |
+---------------------------------+---------------------------------+

输入列表：-

list = [AAA,BBB,CCC]

条件：-

'+' should remain as such (similar to AND condition)
'/' means split the data into multiple cells (similar to OR condition)
'!' means replace with other elements in the corresponding list (similar to NOT condition)

因为！符号，该行变为

+------------------------------------+---------------------------------+
|              Column1               |              Column2            +
+------------------------------------+---------------------------------+
| ((CC ) + (A11/ABC/ZZ) + (BBB/CCC)) |           XYZ + XXX/YYY         |
+------------------------------------+---------------------------------+

请帮助我使用 Pandas

将单行拆分为多行，如下所示

+---------------------------------+---------------------------------+
|              Column1            |              Column2            +
+---------------------------------+---------------------------------+
|          CC + A11 + BBB         |             XYZ + XXX           |
+---------------------------------+---------------------------------+
|          CC + ABC + BBB         |             XYZ + XXX           |
+---------------------------------+---------------------------------+
|          CC + ZZ + BBB          |             XYZ + XXX           |
+---------------------------------+---------------------------------+
|          CC + A11 + CCC         |             XYZ + XXX           |
+---------------------------------+---------------------------------+
|          CC + ABC + CCC         |             XYZ + XXX           |
+---------------------------------+---------------------------------+
|          CC + ZZ + CCC          |             XYZ + XXX           |
+---------------------------------+---------------------------------+
|          CC + A11 + BBB         |             XYZ + YYY           |
+---------------------------------+---------------------------------+
|          CC + ABC + BBB         |             XYZ + YYY           |
+---------------------------------+---------------------------------+
|          CC + ZZ + BBB          |             XYZ + YYY           |
+---------------------------------+---------------------------------+
|          CC + A11 + CCC         |             XYZ + YYY           |
+---------------------------------+---------------------------------+
|          CC + ABC + CCC         |             XYZ + YYY           |
+---------------------------------+---------------------------------+
|          CC + ZZ + CCC          |             XYZ + YYY           |
+---------------------------------+---------------------------------+

Answer 1

看看这是否符合您的要求。评论解释了它是如何工作的。

#!/usr/bin/env python
import pandas as pd # tested with pd.__version__ 0.19.2
df = pd.DataFrame([{'Column1': '((CC ) + (A11/ABC/ZZ) + (!AAA))',
                    'Column2': 'XYZ + XXX/YYY'}])   # your input dataframe
list = ['AAA', 'BBB', 'CCC']                        # your input list
to_replace = dict()
for item in list:   # prepare the dictionary for the '!' replacements
    to_replace["!"+item+'\b'] = '/'.join([i for i in list if i != item])
df = df.replace(to_replace, regex=True) # do all the '!' replacements
import re
def expanded(s):    # expand series s to multiple string list around '/'
    l = s.str.replace('[()]', '').tolist()
    while True:     # in each loop cycle, handle one A/B/C... expression
        xl = []     # expanded list for this cycle
        for s in l: # for each string in the list so far
            m = re.search(r'\w+(/\w+)+', s) # look for a A/B/C... expression
            if m:   # if there is, add the individual expansions to the list
                xl.extend([m.string[:m.start()]+i+m.string[m.end():]
                                            for i in m.group().split('/')])
            else:   # if not, we're done
                return l
        l = xl      # expanded list for this cycle is now the current list
def expand(c):      # expands the column named c to multiple rows
    new = expanded(df[c])                       # get the new contents
    xdf = pd.concat(len(new)/len(df[c])*[df])   # create required rows
    xdf[c] = sorted(new)                        # set the new contents
    return xdf                                  # return new dataframe
df = expand('Column1')
df = expand('Column2')
print df

输出：

           Column1    Column2
0  CC  + A11 + BBB  XYZ + XXX
0  CC  + A11 + CCC  XYZ + XXX
0  CC  + ABC + BBB  XYZ + XXX
0  CC  + ABC + CCC  XYZ + XXX
0   CC  + ZZ + BBB  XYZ + XXX
0   CC  + ZZ + CCC  XYZ + XXX
0  CC  + A11 + BBB  XYZ + YYY
0  CC  + A11 + CCC  XYZ + YYY
0  CC  + ABC + BBB  XYZ + YYY
0  CC  + ABC + CCC  XYZ + YYY
0   CC  + ZZ + BBB  XYZ + YYY
0   CC  + ZZ + CCC  XYZ + YYY

Split/explode 单元格根据 pandas 数据框中的条件分成多行

Split/explode cells into multiple rows based on conditions in pandas dataframe

python

split

explode

dataframe

pandas