Keep only the data inside square brackets in a DataFrame column using Python (regex)
I have been working with a DataFrame in which each record holds useful information inside square brackets and irrelevant text outside them.
Sample DataFrame:
Record Data
1 Rohan is [age:10] with [height:130 cm].
2 Girish is [age:12] with [height:140 cm].
3 Both kids live in [location:Punjab] and [location:Delhi].
4 They love to play [Sport:Cricket] and [Sport:Football].
Expected output:
Record Data
1 [age:10],[height:130 cm]
2 [age:12],[height:140 cm]
3 [location:Punjab],[location:Delhi]
4 [Sport:Cricket],[Sport:Football]
I have been trying the following, but cannot get the desired output.
df['b'] = df['Record'].str.findall('([[][a-z \s]+[]])', expand=False).str.strip()
print(df['b'])
This does not seem to work.
I am new to Python.
I believe you need str.findall with str.join:
df['b'] = df['Data'].str.findall(r'(\[.*?\])').str.join(', ')
print(df)
Record Data \
0 1 Rohan is [age:10] with [height:130 cm].
1 2 Girish is [age:12] with [height:140 cm].
2 3 Both kids live in [location:Punjab] and [Delhi].
3 4 They love to play [Sport:Cricket] and [Sport:F...
b
0 [age:10], [height:130 cm]
1 [age:12], [height:140 cm]
2 [location:Punjab], [Delhi]
3 [Sport:Cricket], [Sport:Football]
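For a self-contained check, here is a minimal sketch that rebuilds the sample data from the question and joins the matches with ',' (no space) so the result matches the expected output exactly. The DataFrame constructor is an assumption, since the post does not show how df was created.

```python
import pandas as pd

# Sample data copied from the question. How the asker built df is not shown,
# so this constructor is an assumption made for illustration.
df = pd.DataFrame({
    "Record": [1, 2, 3, 4],
    "Data": [
        "Rohan is [age:10] with [height:130 cm].",
        "Girish is [age:12] with [height:140 cm].",
        "Both kids live in [location:Punjab] and [location:Delhi].",
        "They love to play [Sport:Cricket] and [Sport:Football].",
    ],
})

# \[ and \] match literal brackets; .*? is non-greedy, so each
# bracket pair is matched separately instead of as one long span.
df["b"] = df["Data"].str.findall(r"\[.*?\]").str.join(",")

print(df["b"].tolist())
```

Note that the pattern is applied to the Data column, not Record, and is written as a raw string so the backslashes reach the regex engine unchanged.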
If you need the values in lists:
df['b'] = df['Data'].str.findall(r'\[(.*?)\]')
print(df)
Record Data \
0 1 Rohan is [age:10] with [height:130 cm].
1 2 Girish is [age:12] with [height:140 cm].
2 3 Both kids live in [location:Punjab] and [Delhi].
3 4 They love to play [Sport:Cricket] and [Sport:F...
b
0 [age:10, height:130 cm]
1 [age:12, height:140 cm]
2 [location:Punjab, Delhi]
3 [Sport:Cricket, Sport:Football]
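The difference between the two patterns comes from where the capture group sits. Series.str.findall applies re.findall to each element, so the behaviour can be sketched with the standard re module on a single string (the variable names here are illustrative only):

```python
import re

text = "Rohan is [age:10] with [height:130 cm]."

# No capture group: each match is the whole bracketed token.
with_brackets = re.findall(r"\[.*?\]", text)

# One capture group around .*?: each match is only the text inside the brackets.
inner = re.findall(r"\[(.*?)\]", text)

print(with_brackets)  # ['[age:10]', '[height:130 cm]']
print(inner)          # ['age:10', 'height:130 cm']
```

Placing the parentheses inside the escaped brackets is what drops the brackets from each list item.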