Identifying patterns in a list of strings and updating a DataFrame
I have a list that follows a specific pattern, and I want to build and update a DataFrame based on that format.
Here is the list:
text = ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']
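If you look closely, the pattern is: an entry without a `;` (for example `'chocolate1'`) is a name, and every following entry that contains `;` lists the items belonging to that name, until the next name appears.
I want to extract each chocolate name and pair it with its `chocolate#`.
The final DataFrame should look like this: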
|chocolate#|chocolateName|
|---|---|
|chocolate1|a|
|chocolate1|b|
|chocolate1|c|
|chocolate1|d|
|icecream|e|
|icecream|f|
|icecream|g|
|icecream|h|
|icecream|i|
|icecream|j|
|cookie|k|
|cookie|l|
|cookie|m|
|cookie|n|
Based on the data I have, I've tried a few things, but nothing seems to work:
new_text = []
# Note: text is a list, so text.splitlines() raises AttributeError here
for line in text.splitlines():
    if len(line.split()) == 0 or len(line.split()) == 1:
        continue
    else:
        new_text.append(line)
for i in new_text[13:]:
    if ';' not in i:
        title_index = new_text.index(i)
        print(title_index)
        break
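For comparison, here is a minimal sketch of how the loop idea above could work once it iterates over the list directly instead of calling `splitlines()`. It assumes that an entry without `;` is always a name and that item entries never appear before the first name; the column names `chocolate#` and `chocolateName` are taken from the desired output.

```python
import pandas as pd

text = ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']

rows = []
current = None                      # most recent entry without ';' (the name)
for entry in text:                  # text is already a list, so no splitlines() is needed
    if ';' not in entry:
        current = entry             # a new name starts a new group
        continue
    for letter in entry.split(';'):
        if letter:                  # skip the empty string produced by a trailing ';'
            rows.append((current, letter))

df = pd.DataFrame(rows, columns=['chocolate#', 'chocolateName'])
print(df)
```

Tracking the "current name" in a single variable avoids the index lookups (`new_text.index(i)`) in the original attempt, which would also return the wrong position if a name appeared twice.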
Try this:
import pandas as pd
# Create a pandas dataframe from list
text = ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']
s = pd.Series(text)
df = s.to_frame(name='letters')
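# New column 'food': copy the entries that contain no ';' (the names); all other rows become NaN
df['food'] = df.loc[~df['letters'].str.contains(';'), 'letters']
# Forward-fill so every item row carries the most recent name above it
df['food'] = df['food'].ffill()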
# Keep only the rows whose letters contain ';' (this drops the name rows)
df = df[df['letters'].str.contains(';')].copy()
# Explode letters into rows of dataframe
df['letters'] = df['letters'].str.split(';')
df_out = df.explode('letters')
# Drop the empty strings left behind by a trailing ';' (e.g. 'a;b;')
df_out = df_out[df_out['letters'] != '']
print(df_out)
Output:
letters food
1 a chocolate1
1 b chocolate1
2 c chocolate1
2 d chocolate1
4 e icecream
4 f icecream
5 g icecream
5 h icecream
6 i icecream
6 j icecream
8 k cookie
8 l cookie
9 m cookie
9 n cookie
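If you need the exact column names and a clean 0-based index from the question, the same steps (mark the name rows, forward-fill, split, explode, drop empty strings) can be chained into one expression. This is a sketch under the same assumption that any entry without `;` is a name; `is_item` and `result` are illustrative names, not part of the answer above.

```python
import pandas as pd

text = ['chocolate1','a;b;','c;d','icecream','e;f;','g;h', 'i;j', 'cookie', 'k;l', 'm;n']

s = pd.Series(text)
is_item = s.str.contains(';', regex=False)  # True for 'a;b;'-style entries

result = (
    pd.DataFrame({
        # Name rows keep their text, item rows get NaN, then ffill copies the name down
        'chocolate#': s.where(~is_item).ffill(),
        'chocolateName': s.str.split(';'),
    })
    [is_item]                                        # drop the name rows themselves
    .explode('chocolateName')                        # one row per letter
    .loc[lambda d: d['chocolateName'] != '']         # drop the piece left by a trailing ';'
    .reset_index(drop=True)                          # 0..n-1 instead of repeated labels
)
print(result)
```

Using `regex=False` treats `;` as a literal character, and `reset_index(drop=True)` replaces the repeated index labels (1, 1, 2, 2, ...) that `explode` produces.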