Keep only the data inside square brackets in a data frame column using Python (regex)

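I have been working with a data frame in which each record holds useful information inside square brackets and useless information outside them.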

Sample data frame:

 Record        Data
      1          Rohan is [age:10] with [height:130 cm].
      2          Girish is [age:12] with [height:140 cm].
      3          Both kids live in [location:Punjab] and [location:Delhi].
      4          They love to play [Sport:Cricket] and [Sport:Football].

Expected output:

 Record        Data
      1          [age:10],[height:130 cm]
      2          [age:12],[height:140 cm]
      3          [location:Punjab],[location:Delhi]
      4          [Sport:Cricket],[Sport:Football]

I have been trying this, but I cannot get the desired output.

df['b'] = df['Record'].str.findall('([[][a-z \s]+[]])', expand=False).str.strip()
print(df['b'])

This does not seem to work.

I am new to Python.

I believe you need str.findall with str.join:

# Find every [...] chunk in each row, then join the matches into one string
df['b'] = df['Data'].str.findall(r'(\[.*?\])').str.join(', ')
print(df)

   Record                                               Data  \
0       1            Rohan is [age:10] with [height:130 cm].   
1       2           Girish is [age:12] with [height:140 cm].   
2       3   Both kids live in [location:Punjab] and [Delhi].   
3       4  They love to play [Sport:Cricket] and [Sport:F...   

                                   b  
0          [age:10], [height:130 cm]  
1          [age:12], [height:140 cm]  
2         [location:Punjab], [Delhi]  
3  [Sport:Cricket], [Sport:Football] 
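
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame is rebuilt by hand from the sample in the question, and it joins with ',' instead of ', ' only because the expected output shows no space after the comma. Note that the attempt in the question reads from the 'Record' column rather than 'Data', and that str.findall does not accept an expand argument.

```python
import pandas as pd

# Sample frame rebuilt by hand from the question (an assumption about how df was created)
df = pd.DataFrame({
    "Record": [1, 2, 3, 4],
    "Data": [
        "Rohan is [age:10] with [height:130 cm].",
        "Girish is [age:12] with [height:140 cm].",
        "Both kids live in [location:Punjab] and [location:Delhi].",
        "They love to play [Sport:Cricket] and [Sport:Football].",
    ],
})

# \[ and \] match literal brackets; .*? is lazy, so each match ends at the nearest ]
pattern = r"\[.*?\]"

# findall returns a list of matches per row; join glues them into a single string
df["b"] = df["Data"].str.findall(pattern).str.join(",")
print(df["b"].tolist())
```

Writing the pattern as a raw string (r"...") keeps Python from treating \[ as a string escape sequence.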

If you need the values in lists:

# The capture group sits inside the brackets, so only the inner text is kept
df['b'] = df['Data'].str.findall(r'\[(.*?)\]')
print(df)

   Record                                               Data  \
0       1            Rohan is [age:10] with [height:130 cm].   
1       2           Girish is [age:12] with [height:140 cm].   
2       3   Both kids live in [location:Punjab] and [Delhi].   
3       4  They love to play [Sport:Cricket] and [Sport:F...   

                                 b  
0          [age:10, height:130 cm]  
1          [age:12, height:140 cm]  
2         [location:Punjab, Delhi]  
3  [Sport:Cricket, Sport:Football]
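
The ? after .* is what makes both patterns above work. As a small illustration using only the standard re module on one row of the sample data, this sketch compares the greedy form, the lazy form, and a negated character class, which is a common alternative that is not part of the answer above. The variable names are only for illustration.

```python
import re

text = "Rohan is [age:10] with [height:130 cm]."

# Greedy: .* runs on to the LAST ], so both brackets collapse into one match
greedy = re.findall(r"\[(.*)\]", text)

# Lazy: .*? stops at the FIRST ], giving one match per bracket pair
lazy = re.findall(r"\[(.*?)\]", text)

# Alternative: match anything except ], which cannot run past a closing bracket
negated = re.findall(r"\[([^\]]*)\]", text)

print(greedy)   # ['age:10] with [height:130 cm']
print(lazy)     # ['age:10', 'height:130 cm']
print(negated)  # ['age:10', 'height:130 cm']
```

Each element in the lists shown by pandas above is a plain string like these; pandas simply prints them without quotes.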