Regex: extract words from a string and store them in a Python list
I am new to regex and I want to extract specific words from a Python string. This is the string:
'1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>2. feature name: education<br>coefficient: 0.0726<br>3. feature name: occupation_Machine-op-inspct<br>coefficient: 0.0661<br>4. feature name: occupation_Armed-Forces<br>coefficient: 0.0006<br>5. feature name: workclass_Without-pay<br>coefficient: -0.0194<br>6. feature name: occupation_Handlers-cleaners<br>coefficient: -0.1256<br>7. feature name: occupation_Farming-fishing<br>coefficient: -0.3938<br>8. feature name: GDP Group<br>coefficient: -0.4138<br>9. feature name: occupation_Other-service<br>coefficient: -0.4294<br>10. feature name: occupation_Priv-house-serv<br>coefficient: -0.6560<br>'
The result I am looking for:
[occupation_Transport-moving,education,occupation_Machine-op-inspct,occupation_Armed-Forces,workclass_Without-pay,occupation_Handlers-cleaners,occupation_Farming-fishing,GDP Group,occupation_Other-service,occupation_Priv-house-serv]
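I have tried the following, but it returns the entire rest of the string starting at the first ':' instead of the individual names: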
re.findall(':\s(.*)<',txt)
Thanks in advance for your help.
Use
:\s*([^:.<]+)<
See the regex proof.
Explanation
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^:.<]+                  any character except: ':', '.', '<' (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  <                        '<'
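
As a minimal sketch of how the pattern above might be used from Python: the string here is shortened to three of the entries from the question, and the names pattern, features and greedy are only illustrative. It also shows why the original attempt fails: the greedy .* runs to the last '<' in the string, so findall yields a single long match.

```python
import re

# Shortened sample of the string from the question (three entries only).
txt = ('1. feature name: occupation_Transport-moving<br>coefficient: 0.1776<br>'
       '2. feature name: education<br>coefficient: 0.0726<br>'
       '8. feature name: GDP Group<br>coefficient: -0.4138<br>')

# Original attempt: .* is greedy, so it captures up to the LAST '<'.
greedy = re.findall(r':\s(.*)<', txt)
print(len(greedy))  # 1

# Suggested pattern: the captured text may not contain ':', '.' or '<',
# so each match stops at the next '<' and the coefficients (which
# contain '.') are skipped.
pattern = re.compile(r':\s*([^:.<]+)<')
features = pattern.findall(txt)
print(features)  # ['occupation_Transport-moving', 'education', 'GDP Group']
```

Because '.' is excluded from the captured group, a feature name that itself contains a dot would not be matched by this pattern; that is not an issue for the names in the question.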