Removing numbers and _ symbol from a parsed URL using re module and sub() function in Python

Question

I am trying to strip digits and the symbols "-" and "_" from a string that I get from parsing a URL.

For example:

import re

string1 = 'historical-fiction_4'
# Remove every character that is not a lowercase letter
string_cleaned = re.sub("[^a-z]", "", string1)
print(string1)
print(string_cleaned)

historical-fiction_4
historicalfiction

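With re.sub("[^a-z]", "", string1) I keep only the characters a to z, but instead of the string "historicalfiction" I want to get "Historical Fiction".

Almost all of my data was collected with this structure: "name1-name2_number".

If anyone can help me improve the re.sub() call, I would appreciate it. Thanks a lot!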

Answer 1

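It seems that your logic is to replace dashes with spaces but to remove underscores and digits entirely. If so, use two separate replacement calls, str.replace() for the dashes and re.sub() for the rest: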

import re

inp = "historical-fiction_4"
# Dashes become spaces first, then digits and underscores are removed
output = re.sub(r'[0-9_]+', '', inp.replace("-", " "))
print(output)  # historical fiction
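
The snippet above leaves the result in lowercase, while the question asks for "Historical Fiction". As a minimal sketch, assuming every word should simply start with a capital letter, the same two calls can be followed by str.title():

```python
import re

inp = "historical-fiction_4"
# Dashes become spaces, digits and underscores are dropped,
# then str.title() capitalizes the first letter of each word.
output = re.sub(r"[0-9_]+", "", inp.replace("-", " ")).title()
print(output)  # Historical Fiction
```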

Answer 2

You can use str.title() to capitalize each word:

import re

string1 = "historical-fiction_4"

string1 = re.sub(r"[^a-z]", " ", string1).strip().title()
print(string1)

Prints:

Historical Fiction
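
Since the question says the data mostly follows the "name1-name2_number" structure, the same idea could be wrapped in a small function and applied to many strings. This is only a sketch: clean_slug is a hypothetical name, and every sample string except the first is made up for illustration. It uses the pattern [^A-Za-z]+ so that a run of several separators collapses into a single space and uppercase letters in the input are kept.

```python
import re

def clean_slug(slug: str) -> str:
    """Hypothetical helper: 'historical-fiction_4' -> 'Historical Fiction'."""
    # Collapse each run of non-letters (dashes, underscores, digits) into one space.
    words = re.sub(r"[^A-Za-z]+", " ", slug).strip()
    # Capitalize the first letter of every word.
    return words.title()

# Illustrative inputs; only the first comes from the question.
slugs = ["historical-fiction_4", "science-fiction_16", "poetry_23"]
cleaned = [clean_slug(s) for s in slugs]
print(cleaned)
```

Note that str.title() also lowercases the rest of each word, so this approach may not suit names that contain acronyms.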

Tags: python, regex, string, python-re