How can I extract the edition type, month, and year from this book price dataset in a simple way?

import pandas as pd

df = pd.DataFrame({'Edition_TypeDate':
                   ['2016', '5 Oct 2017', '2017', '2 Aug 2009', 'Illustrated, Import',
                    'Import, 22 Feb 2018', 'Import, 14 Dec 2017', 'Import, 1 Mar 2018',
                    'Abridged, Audiobook, Box set', 'International Edition, 26 Apr 2012',
                    'Import, 2018', 'Box set, 15 Jun 2014', 'Unabridged, 6 Jul 2007']})

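This is one of the columns in my book dataset. From this column, I want to create three new columns:

1. Edition_Type --> contains Import, Illustrated, or null if not mentioned

2. Edition_Month --> contains Aug, Oct, or null if not mentioned

3. Edition _Year --> contains 2016, 2017, 2018, or null if not mentioned

How can I do this? Please help me define a function that I can apply to this column.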

You can use Series.str.extract with the keywords joined by |, the regex OR operator. For the year, (\d{4}$) captures 4 digits at the end of the string:

df['Edition_Type'] = df['Edition_TypeDate'].str.extract(r'(Import|Illustrated)')
df['Edition_Month'] = df['Edition_TypeDate'].str.extract(r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)')
df['Edition _Year'] = df['Edition_TypeDate'].str.extract(r'(\d{4}$)')
print (df)
                      Edition_TypeDate Edition_Type Edition_Month  \
0                                 2016          NaN           NaN   
1                           5 Oct 2017          NaN           Oct   
2                                 2017          NaN           NaN   
3                           2 Aug 2009          NaN           Aug   
4                  Illustrated, Import  Illustrated           NaN   
5                  Import, 22 Feb 2018       Import           Feb   
6                  Import, 14 Dec 2017       Import           Dec   
7                   Import, 1 Mar 2018       Import           Mar   
8         Abridged, Audiobook, Box set          NaN           NaN   
9   International Edition, 26 Apr 2012          NaN           Apr   
10                        Import, 2018       Import           NaN   
11                Box set, 15 Jun 2014          NaN           Jun   
12              Unabridged, 6 Jul 2007          NaN           Jul   

   Edition _Year  
0           2016  
1           2017  
2           2017  
3           2009  
4            NaN  
5           2018  
6           2017  
7           2018  
8            NaN  
9           2012  
10          2018  
11          2014
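
Since the question asks for a function, the three extract calls above could be wrapped up roughly as follows. This is only a sketch: the name split_edition, the MONTHS constant, and the col parameter are made up for illustration. It also assumes that expand=False (so each call returns a Series) and \b word boundaries around the month names are acceptable; the word boundaries are an addition to the answer's pattern, meant to avoid matching a month abbreviation inside a longer word.

```python
import pandas as pd

# Month abbreviations joined with | (regex OR), as in the answer above.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"


def split_edition(df, col="Edition_TypeDate"):
    """Return a copy of df with the three extracted columns added.

    Hypothetical helper: the name and signature are illustrative only.
    """
    out = df.copy()
    s = out[col]
    # First occurrence of either keyword; NaN when neither is present.
    out["Edition_Type"] = s.str.extract(r"(Import|Illustrated)", expand=False)
    # Month abbreviation as a whole word (the \b guards are an assumption).
    out["Edition_Month"] = s.str.extract(rf"\b({MONTHS})\b", expand=False)
    # Four digits at the very end of the string.
    out["Edition _Year"] = s.str.extract(r"(\d{4})$", expand=False)
    return out


df = pd.DataFrame({'Edition_TypeDate':
                   ['2016', '5 Oct 2017', '2017', '2 Aug 2009', 'Illustrated, Import',
                    'Import, 22 Feb 2018', 'Import, 14 Dec 2017', 'Import, 1 Mar 2018',
                    'Abridged, Audiobook, Box set', 'International Edition, 26 Apr 2012',
                    'Import, 2018', 'Box set, 15 Jun 2014', 'Unabridged, 6 Jul 2007']})

result = split_edition(df)
print(result)
```

The extracted years are strings, not integers. If numeric years are needed, pd.to_numeric(result['Edition _Year']) should convert them, with missing values staying as NaN.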