Extract a Specific Part From Text - Python
I want to extract the part of a text that starts with "Hello" and ends with "goodbye".

Example: extract the sentence

    Hello i'm Gabi, :D goodbye

from

    asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija
You can use a very basic regular expression
(demo and explanation of how it works: https://regex101.com/r/bO0rL7/2):

    import re

    string = "asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija"
    # Case-insensitive match of everything from "hello" through "goodbye"
    match = re.findall(r'hello .+ goodbye', string, flags=re.IGNORECASE)
    if match:
        print(match[0])

    # prints: Hello i'm Gabi :D goodbye

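One thing to keep in mind with the pattern above: `.+` is greedy, so if the text contains more than one "goodbye", the match runs to the last one. The sketch below compares it with the lazy form `.+?`, which stops at the first "goodbye" after each "hello". The sample text and variable names are made up for illustration.

```python
import re

# Made-up sample with two Hello ... goodbye pairs
text = "x Hello one goodbye y Hello two goodbye z"

# Greedy: .+ grabs as much as possible, so the match ends at the LAST "goodbye"
greedy = re.findall(r"hello .+ goodbye", text, flags=re.IGNORECASE)

# Lazy: .+? grabs as little as possible, so each match ends at the NEXT "goodbye"
lazy = re.findall(r"hello .+? goodbye", text, flags=re.IGNORECASE)

print(greedy)  # ['Hello one goodbye y Hello two goodbye']
print(lazy)    # ['Hello one goodbye', 'Hello two goodbye']
```

With a single "Hello ... goodbye" pair, as in the question, both forms return the same result.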
Unless you want to go as far as NLP, and if you are not comfortable with regular expressions, a simple approach is:
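
    import sys

    s = "asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija"
    hello = s.find("Hello")
    # Look for "goodbye" only after "Hello", so the slice can never be reversed
    goodbye = s.find("goodbye", hello) if hello != -1 else -1
    if hello == -1 or goodbye == -1:
        print("Not found")
        sys.exit(0)
    # Move the end index past "goodbye" so the slice includes it
    goodbye += len("goodbye")
    print(s[hello:goodbye])
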
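If you need this in more than one place, the same `str.find` idea can be wrapped in a small function. This is only a sketch: the name `extract_between` and its parameters are hypothetical, and returning `None` instead of exiting the program is a design choice of this example, not something the answer above prescribes.

```python
from typing import Optional


def extract_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the substring from `start` through the first `end` after it,
    or None if either marker is missing."""
    begin = text.find(start)
    if begin == -1:
        return None
    # Search for the end marker only after the start marker
    stop = text.find(end, begin + len(start))
    if stop == -1:
        return None
    # Include the end marker itself in the result
    return text[begin:stop + len(end)]


s = "asdasd dwref ADSADSADA Hello i'm Gabi :D goodbye asd asl sodjasdji asdoija"
print(extract_between(s, "Hello", "goodbye"))                      # Hello i'm Gabi :D goodbye
print(extract_between("goodbye, then Hello", "Hello", "goodbye"))  # None
```

Note that `str.find` is case-sensitive, unlike the regular expression with `re.IGNORECASE`.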