Identifying patterns from a string and updating the dataframe

Question

I have a list that follows a specific pattern, and I want to create and update a dataframe based on that format. Here is the list:

text =  ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']

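Looking closely, the pattern is this: an entry without a semicolon is a chocolate number, and the semicolon-separated entries that follow it are the chocolate names that belong to it.

I want to extract each chocolate name and put it next to its chocolate number. The final dataframe should look like this: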

|chocolate#|chocolateName|
|chocolate1|a|
|chocolate1|b|
|chocolate1|c|
|chocolate1|d|
|icecream|e|
|icecream|f|
|icecream|g|
|icecream|h|
|icecream|i|
|icecream|j|
|cookie|k|
|cookie|l|
|cookie|m|
|cookie|n|

Based on the data I have, I have been trying a few things, but nothing seems to work.

new_text = []
for line in text.splitlines():
    if len(line.split())==0 or len(line.split())==1:
      continue
    else:
      new_text.append(line)
for i in new_text[13:]:
  if ';' not in i:
    title_index = new_text.index(i)
    print(title_index)
    break

Answer 1

Try this:

import pandas as pd

# Create a pandas dataframe from list
text =  ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']
s = pd.Series(text)
df = s.to_frame(name='letters')

# Create a new column 'food' from the entries that do not contain ';'
df['food'] = df.loc[~df['letters'].str.contains(';'), 'letters']
# Forward-fill so every row carries the most recent food name
df['food'] = df['food'].ffill()

# Remove the rows whose 'letters' value does not contain ';'
df = df[df['letters'].str.contains(';')].copy()

# Explode letters into rows of dataframe
df['letters'] = df['letters'].str.split(';')
df_out = df.explode('letters')

# Drop the empty strings left behind by a trailing ';'
df_out = df_out[df_out['letters'] != '']

print(df_out)

Output:

  letters        food
1       a  chocolate1
1       b  chocolate1
2       c  chocolate1
2       d  chocolate1
4       e    icecream
4       f    icecream
5       g    icecream
5       h    icecream
6       i    icecream
6       j    icecream
8       k      cookie
8       l      cookie
9       m      cookie
9       n      cookie
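
If you need the exact column names and a clean 0-based index as in the question's table, a plain loop is one possible alternative to the explode approach. This is only a sketch: it assumes that any entry without a ';' is a group label, and the names `rows`, `current` and `df_loop` are made up for illustration.

```python
import pandas as pd

text = ['chocolate1', 'a;b;', 'c;d', 'icecream', 'e;f;', 'g;h', 'i;j', 'cookie', 'k;l', 'm;n']

rows = []
current = None  # most recent entry without ';' (assumed to be the group label)
for item in text:
    if ';' not in item:
        current = item
        continue
    # Split on ';' and skip the empty piece left by a trailing ';'
    for name in item.split(';'):
        if name:
            rows.append({'chocolate#': current, 'chocolateName': name})

# Build the frame with the column names used in the question
df_loop = pd.DataFrame(rows, columns=['chocolate#', 'chocolateName'])
print(df_loop)
```

With the pandas version above, renaming the columns and calling `reset_index(drop=True)` on `df_out` should get you to a similar layout; the loop just avoids the intermediate steps.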

Tags: python, text-processing, pandas