How to replace and insert a new substring in Python?
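This is working code that replaces a substring with another, previously modified substring. It is probably not very efficient code.
Input strings:
text = ["part1 Pirates (2006)",
        "part2 Pirates (2006)"
        ]
Output strings: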
Pirates PT1 (2006)
Pirates PT2 (2006)
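Substrings such as 'part1' and 'part2' must be replaced with 'PT' plus the same number,
and the result must be moved between the title and the year substrings.
Code: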
import re

#'''''''''''''''''''''''''
# are there parentheses?
#
def parenth(stringa):
    count = 0
    for i in stringa:
        if i == "(":
            count += 1
        elif i == ")":
            count -= 1
        if count < 0:
            return False
    return count == 0

#'''''''''''''''''''''''''
# extract 'year' from
# the string
#
def getYear(stringa):
    if parenth(stringa) is True:
        return stringa[stringa.find("(")+1:stringa.find(")")]

#Start
for title in text:
    #Does the year exist ? try to Get it ---------> '2006'
    yearStr = getYear(title)
    #Get integer next to 'part' substring -------> '1'
    intPartStr = re.findall(r'part(\d+)', title)
    #Delete 'part' Substring (and the leftover leading space) --> 'Pirates (2006)'
    partStr = re.sub(r'part(\d+)', "", title).strip()
    #Build a new string -------------------------> "PT1 (2006)"
    newStr = "PT" + intPartStr[0] + " (" + yearStr + ")"
    #Update title with new String newStr --------> "Pirates PT1 (2006)"
    result = re.sub(r'\(([0-9]+)\)', newStr, partStr)
    #End
    print(result)
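For comparison, the `parenth`/`getYear` year lookup and the `re.findall` step could each be collapsed into a single `re.search`, which returns `None` when a title has no year or no part number, so the missing piece can be checked before building the new string (the code above would raise a `TypeError` or `IndexError` instead). A minimal sketch; `get_year` and `get_part` are hypothetical helper names, and the year pattern assumes exactly four digits:

```python
import re
from typing import Optional


def get_year(title: str) -> Optional[str]:
    """Return the four-digit year inside parentheses, or None if absent."""
    m = re.search(r"\((\d{4})\)", title)
    return m.group(1) if m else None


def get_part(title: str) -> Optional[str]:
    """Return the number after 'part' or 'pt' (optional whitespace allowed), or None."""
    m = re.search(r"\b(?:part|pt)\s*(\d+)", title)
    return m.group(1) if m else None


for title in ["part1 Pirates (2006)", "pt1 Pirates (2006)", "part 2 Pirates (2006)", "Pirates"]:
    print(title, "->", get_part(title), get_year(title))
```

Because `\s*` allows zero or more spaces, the same `get_part` pattern covers "part1", "pt1" and "part 2".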
But when the list looks like this:
text = ["pt1 Pirates (2006)",
        "part 2 Pirates (2006)"
        ]
I don't know how to extract the integer next to 'part', 'pt', 'part 2' and so on.
EDIT:
I thought these strings were all the same, but they are not, sorry.
How can I handle this one?
"part 2 the day sports stood still (2021)"
\w+ does not capture all the words.
Use a regular expression.
For example:
import re

text = ["part1 Pirates (2006)", "part2 Pirates (2006)", "pt1 Pirates (2006)", "part 2 Pirates (2006)"]
ptrn = re.compile(r"(part|pt)\s*(\d+)")
for i in text:
    m = ptrn.match(i)
    if m:
        # print(m.group(2))  # Integer part.
        nstring = ptrn.sub(f"PT {m.group(2)}", i)
        print(nstring)
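The loop above normalizes the prefix in place ("PT 1 Pirates (2006)") but leaves it in front of the title. `re.sub` also accepts a function as the replacement; it is called with each match object and returns the replacement text, so the number can be moved between the title and the year in the same pass. A sketch, assuming every title has the form prefix, number, title, year in parentheses; `fmt` is a hypothetical helper name:

```python
import re

text = ["part1 Pirates (2006)", "part2 Pirates (2006)",
        "pt1 Pirates (2006)", "part 2 Pirates (2006)"]

# group 1: part number, group 2: title (may contain spaces), group 3: year
ptrn = re.compile(r"^(?:part|pt)\s*(\d+)\s+(.+?)\s*\((\d{4})\)$")


def fmt(m: re.Match) -> str:
    # Rebuild the string as "<title> PT<number> (<year>)"
    return f"{m.group(2)} PT{m.group(1)} ({m.group(3)})"


for i in text:
    print(ptrn.sub(fmt, i))
```

Strings that do not match the pattern are returned unchanged by `sub`.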
You can do all the replacements at once:
import re

text = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)"
]

pattern = r'(?:part|pt)\s?(\d+)\s?(\b[\w\s]+\b)\s?\((\d+)\)'
substitute = r'\2 PT\1 (\3)'

# rebinding the loop variable would not change the list, so print each result
for title in text:
    print(re.sub(pattern, substitute, title))

# if you want the result in a new list:
text_formatted = [re.sub(pattern, substitute, title) for title in text]
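Regex explanation:
(?:part|pt)\s?(\d+)
matches the 'part'/'pt' text without capturing it and captures the number (group 1)
(\b[\w\s]+\b)
captures the title (group 2)
\((\d+)\)
captures the year inside the parentheses (group 3)
r'\2 PT\1 (\3)'
rebuilds your string from the group numbers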
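The substitution in this answer can also be written with named groups, which makes the replacement template easier to read than bare group numbers. A sketch using Python's `(?P<name>...)` group syntax and `\g<name>` references in the replacement string; the group names `num`, `title` and `year` are arbitrary choices:

```python
import re

text = [
    "part1 Pirates (2006)",
    "pt1 Pirates (2006)",
    "part 2 Pirates (2006)",
    "part 1 The day sports stood still (2021)",
]

# Same structure as the numbered pattern, with named groups instead
pattern = re.compile(
    r"(?:part|pt)\s?(?P<num>\d+)\s?(?P<title>\b[\w\s]+\b)\s?\((?P<year>\d+)\)"
)
# \g<name> refers to a named group inside a replacement template
substitute = r"\g<title> PT\g<num> (\g<year>)"

text_formatted = [pattern.sub(substitute, title) for title in text]
print(text_formatted)
```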
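You can simply substitute using groups: here group 1 captures the 'part' prefix, group 2 the part number, group 3 the movie name and group 4 the year in parentheses. For example: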
import re

movies = [
    "part1 Pirates (2006)",
    "part2 Pirates (2006)",
    "part 3 Pirates (2006)",
]

pattern = r"^([a-zA-Z]+\s?)(\d) (\w+) (\(\d{4}\))$"
replacement = r"\3 PT\2 \4"

replaced = [
    re.sub(pattern, replacement, movie) for movie in movies
]
print(replaced)
# ['Pirates PT1 (2006)', 'Pirates PT2 (2006)', 'Pirates PT3 (2006)']
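As the question's edit notes, `(\w+)` matches only a single word, so a title such as "the day sports stood still" does not match and `re.sub` returns the string unchanged. One possible adjustment, sketched here as an assumption rather than the answer's own code, is to let group 3 match any text lazily up to the space before the year, and to widen `(\d)` to `(\d+)` so multi-digit part numbers also work:

```python
import re

movies = [
    "part1 Pirates (2006)",
    "part 3 Pirates (2006)",
    "part 2 the day sports stood still (2021)",
]

# (.+?) replaces (\w+): it lazily matches one or more words, spaces included,
# up to the " (YYYY)" that must end the string because of the $ anchor
pattern = r"^([a-zA-Z]+\s?)(\d+) (.+?) (\(\d{4}\))$"
replacement = r"\3 PT\2 \4"

replaced = [re.sub(pattern, replacement, movie) for movie in movies]
print(replaced)
```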