ValueError: too many values to unpack (expected 2) when I try to extract only 2 substrings from a regex pattern
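This is the code. The part that fails is where I extract the substrings, after checking that the input matches the structure of the regex pattern.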
import re
from unicodedata import normalize

def name_and_img_identificator(input_text, text):
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1", normalize("NFD", input_text), 0, re.I)
    input_text = normalize( 'NFC', input_text) # -> NFC
    input_text_to_check = input_text.lower() # Convert everything to lowercase

    #regex_patron_01 = r"\s*\¿?(?:dime los|dime las|dime unos|dime unas|dime|di|cuales son los|cuales son las|cuales son|cuales|que animes|que|top)\s*((?:\w+\s*)+)\s*(?:de series anime|de anime series|de animes|de anime|animes|anime)\s*(?:similares al|similares a|similar al|similar a|parecidos al|parecidos a|parecido al|parecido a)\s*(?:la serie de anime|series de anime|la serie anime|la serie|anime|)\s*(llamada|conocida como|cuyo nombre es|la cual se llama|)\s*((?:\w+\s*)+)\s*\??"
    #Regex in english
    regex_patron_01 = r "\ s * \ ¿? (?: tell me the | tell me some| tell me | say | which are the | which are the | which are | which | which animes | which | top) \ s * ((?: \ w + \ s *) +) \ s * (?: anime series | anime series | anime | anime | anime | anime) \ s * (?: similar to | similar to | similar to | similar to | similar to | similar to | similar to | similar to) \ s * (?: the anime series | anime series | the anime series | the series | anime |) \ s * (called | known like | whose name is | which is called |) \ s * ((?: \ w + \ s *) +) \ s * \ ?? "

    m = re.search(regex_patron_01, input_text_to_check, re.IGNORECASE) # Check whether the input matches, i.e. whether to enter the block below
    if m:
        num, anime_name = m.groups()[2]
        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)
    return text

input_text_str = input("ingrese: ")
text = ""
print(name_and_img_identificator(input_text_str, text))
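It gives me the error below. The truth is that I don't know how to build this regex pattern so that it extracts only these 2 values (substrings) from the input.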
Traceback (most recent call last):
  File "serie_recommendarion_for_chatbot.py", line 154, in <module>
    print(serie_and_img_identificator(input_text_str, text))
  File "anime_recommendarion_for_chatbot.py", line 142, in name_and_img_identificator
    num, anime_name = m.groups()
ValueError: too many values to unpack (expected 2)
If I enter an input like this:
'Dame el top 8 de animes parecidos a Gundam'
'Give me the top 8 anime like Gundam'
I need it to extract:
num = '8'
anime_name = 'Gundam'
How should I fix my regex in this case?
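You can try taking only the first 2 values; maybe you are just missing a colon in the index.
num, anime_name = m.groups()[:2]
That may be the cause, since you are getting the "too many values to unpack" error.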
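To make the difference between indexing and slicing concrete, here is a small sketch. The tuple below is a hypothetical stand-in for what m.groups() might return with three capture groups; it is an assumption for illustration, not output taken from the question. Note that slicing with [:2] avoids the exception but, with three groups, would hand back groups 1 and 2 rather than the number and the name.

```python
# Hypothetical value of m.groups() for a pattern with three capture groups;
# the middle group matched nothing, so it is an empty string.
groups = ("8 de ", "", "gundam")

third = groups[2]       # indexing: a single string
first_two = groups[:2]  # slicing: a tuple holding groups 1 and 2

# Slicing unpacks without error, but the second value is the empty
# middle group, not the anime name.
num, second = first_two

# Picking the first and last group explicitly gets both wanted values.
num, anime_name = groups[0].strip(), groups[-1].strip()
print(third, first_two, num, anime_name)
```

So the slice only helps once the pattern itself has exactly the two groups you care about, or you pick the groups by position as in the last assignment.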
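Use two different patterns, one for the number and one for the name. For simplicity I have only included a few of the alternatives.
For the number:
(?<=(which are the|which|top)\s)[0-9]+(?=\s(anime series|anime))
For the name:
(?<=(like|called|which is called)\s)[A-Za-z]+
What remains is to implement the patterns in Spanish.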
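One caveat if you want to use this idea from Python: the standard re module only accepts fixed-width lookbehinds, and the alternatives above have different lengths, so these two patterns would be rejected as written. A minimal sketch of the same two-pattern approach using capture groups instead is shown below; the names NUM_PATTERN, NAME_PATTERN and extract_num_and_name are made up for this example, and the word lists are just the few alternatives from above.

```python
import re

# Same idea as the two lookbehind patterns, but the leading words are matched
# normally and only the wanted part is captured, because Python's re module
# does not allow variable-width lookbehind.
NUM_PATTERN = re.compile(
    r"(?:which are the|which|top)\s+([0-9]+)(?=\s+(?:anime series|animes?))",
    re.IGNORECASE,
)
NAME_PATTERN = re.compile(
    r"(?:like|called|which is called)\s+([A-Za-z]+)",
    re.IGNORECASE,
)

def extract_num_and_name(sentence):
    """Return (num, name); each is None when its pattern does not match."""
    num_match = NUM_PATTERN.search(sentence)
    name_match = NAME_PATTERN.search(sentence)
    num = num_match.group(1) if num_match else None
    name = name_match.group(1) if name_match else None
    return num, name

result = extract_num_and_name("Give me the top 8 anime like Gundam")
print(result)  # ('8', 'Gundam')
```

Because each value has its own pattern, a sentence that contains only one of the two still yields a partial result instead of no match at all.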
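You can try this out in a regex playground.
Not much changes: the first group is still the number of anime, and the second group is the name of the anime itself. I only simplified the regex a little (I removed some unnecessary bits for demonstration purposes). Most of it is unchanged from your version, which is actually a pretty solid regex.
Regex: \b(\d+).*(?:called|that are like|known like|whose name is|which is called)\s*((?:\w+\s*)+)\s*\??
Testing it with your original question, which I roughly translated to English :-)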
import re
from unicodedata import normalize

def name_and_img_identificator(input_text, text):
    # Strip combining accents but keep the tilde of "ñ"; \1 puts the base letter back
    input_text = re.sub(r"([^n\u0300-\u036f]|n(?!\u0303(?![\u0300-\u036f])))[\u0300-\u036f]+", r"\1",
                        normalize("NFD", input_text), flags=re.I)
    input_text = normalize('NFC', input_text)  # -> NFC
    input_text_to_check = input_text.lower()  # Convert everything to lowercase

    # Regex in english
    # original
    # note: you have extra spaces here, which regex might not like.
    # you can get rid of spaces and then it should hopefully be fine.
    # regex_patron_01 = r "\ s * \ ¿? (?: tell me the | tell me some| tell me | say | which are the | which are the | which are | which | which animes | which | top) \ s * ((?: \ w + \ s *) +) \ s * (?: anime series | anime series | anime | anime | anime | anime) \ s * (?: similar to | similar to | similar to | similar to | similar to | similar to | similar to | similar to) \ s * (?: the anime series | anime series | the anime series | the series | anime |) \ s * (called | known like | whose name is | which is called |) \ s * ((?: \ w + \ s *) +) \ s * \ ?? "
    # simplified
    regex_patron_01 = r'\b(\d+).*(?:called|that are like|known like|whose name is|which is called)\s*((?:\w+\s*)+)\s*\??'

    # Check whether the input matches, i.e. whether to enter the block below
    m = re.search(regex_patron_01, input_text_to_check, re.IGNORECASE)
    if m:
        num, anime_name = m.groups()[:2]
        num = num.strip()
        anime_name = anime_name.strip()
        print(num)
        print(anime_name)
    return text

#input_text_str = input("ingrese: ")
input_text_str = 'Tell me the top 8 animes that are like Gundam?'
text = ""
print(name_and_img_identificator(input_text_str, text))
Mistakes in the regex pattern
- You forgot to add ?: to make this group non-capturing. Change:
  regex_patron_01 = r"...(llamada|conocida como|cuyo nombre es|la cual se llama|)..."
  To:
  regex_patron_01 = r"...(?:llamada|conocida como|cuyo nombre es|la cual se llama|)..."
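- To avoid capturing extra spaces or words, the capture for num should be non-greedy, so that it does not swallow words like "de" and leaves them for the following part of the pattern to match. Change the first occurrence of:
  regex_patron_01 = r"...((?:\w+\s*)+)..."
  To:
  regex_patron_01 = r"...((?:\w+?\s*?)+)..."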
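- .groups() already returns a tuple of the matched strings, so indexing it gives you a single string. Unpacking that string iterates over its characters, which is the root cause of the error. Change:
  num, anime_name = m.groups()[2]
  To:
  num, anime_name = m.groups()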
With the changes above it succeeds and prints:
8
gundam
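The three fixes above can be sketched end to end as follows. To keep the example short, the pattern is a trimmed-down version of the Spanish regex (fewer alternatives, same group structure); that shortening and the helper names PREFIX, MIDDLE and NAME are assumptions made for this illustration, not part of the original code.

```python
import re

# Trimmed-down pieces of the question's Spanish pattern (assumption: only a
# few alternatives are kept; the group structure is unchanged).
PREFIX = r"(?:dime|cuales son|que|top)\s*"
MIDDLE = (r"\s*(?:de animes|de anime|animes|anime)"
          r"\s*(?:parecidos al|parecidos a|similares al|similares a)\s*")
NAME = r"((?:\w+\s*)+)\s*\??"

# Original shape: greedy num group and an accidental third capture group.
original = PREFIX + r"((?:\w+\s*)+)" + MIDDLE + r"(llamada|conocida como|)\s*" + NAME
# Fixed shape: lazy num group and a non-capturing (?:...) optional group.
fixed = PREFIX + r"((?:\w+?\s*?)+)" + MIDDLE + r"(?:llamada|conocida como|)\s*" + NAME

sentence = "dame el top 8 de animes parecidos a gundam"

m = re.search(original, sentence)
original_groups = m.groups()
print(original_groups)  # three groups, and the first one also swallowed "de"
try:
    # Indexing returns one string; unpacking it iterates over its characters.
    num, anime_name = original_groups[2]
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

m = re.search(fixed, sentence)
num, anime_name = (g.strip() for g in m.groups())
print(num)
print(anime_name)
```

With the fixed pattern there are exactly two groups, so the plain unpacking of m.groups() works and no slicing or indexing is needed.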
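Improvements
Your regex is too complicated and contains many hard-coded words, which differ from language to language. My suggestion is to settle on a standard format for the strings it accepts:
Any text here (num) any text here (anime_name)
This is already the format of your input:
Dame el top 8 de animes parecidos a Gundam
So you can drop that long regex and replace it with this one, and the output will be the same: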
regex_patron_01 = r"^.*?(\d+).*\s(.+)$"
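Note that this requires (anime_name) to be a single word. To support names with several words, we have to pick a special character that marks where the anime name starts, for example a colon (:):
Dame el top 8 de animes parecidos a: Gundam X
Then the regex would be: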
regex_patron_01 = r"^.*?(\d+).*:\s(.+)$"
Output
8
gundam x
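As a quick check of the two generic patterns, the sketch below runs both on lowercased inputs. The variable names single_word and with_colon are made up for this example. It also shows the single-word limitation: without the colon marker, only the last word of a multi-word name is captured.

```python
import re

# Generic patterns from above: first run of digits, then the name at the end.
single_word = re.compile(r"^.*?(\d+).*\s(.+)$")
with_colon = re.compile(r"^.*?(\d+).*:\s(.+)$")

one_word = single_word.search("dame el top 8 de animes parecidos a gundam").groups()
print(one_word)  # ('8', 'gundam')

# Without a marker, the name group only gets the last word.
truncated = single_word.search("dame el top 8 de animes parecidos a gundam x").groups()
print(truncated)  # ('8', 'x')

# With the colon marker, everything after ": " is the name.
multi_word = with_colon.search("dame el top 8 de animes parecidos a: gundam x").groups()
print(multi_word)  # ('8', 'gundam x')
```

The trade-off is that the user has to type the colon, but the pattern no longer depends on any language-specific wording.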