Removing punctuation/symbols from a list in Python, except periods and commas

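In Python, I need to remove almost all punctuation from a list but keep periods and commas. Should I write a function to do this, or just use a variable? Basically, I want to remove every symbol except letters (I have already converted uppercase letters to lowercase), periods, and commas (and possibly apostrophes).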

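# Clean tokens up (remove symbols except ',' and '.')
import string

# Characters to keep; add "'" to the set to keep apostrophes as well.
ALLOWED = set(string.ascii_lowercase) | {".", ","}

def depunctuate(lc_tokens):
    # Assumes each token is a single lowercase character.
    clean_tokens = []
    for i in lc_tokens:
        if i in ALLOWED:
            clean_tokens.append(i)
    return clean_tokens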

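You can build a set of the unwanted punctuation from string.punctuation (a string containing all ASCII punctuation characters), then use a list comprehension to filter out the tokens contained in that set: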

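import string

to_delete = set(string.punctuation) - {'.', ','}  # keep comma and full stop
lc_tokens = ['hello', ',', 'world', '!', '.']      # example tokens (hypothetical)
clean_tokens = [x for x in lc_tokens if x not in to_delete]
# clean_tokens == ['hello', ',', 'world', '.']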
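
Note that `x not in to_delete` only drops tokens that are exactly one punctuation character; a token such as `"world!"` passes through unchanged. If your tokens are words with punctuation attached, a minimal sketch (assuming `lc_tokens` is a list of lowercase word strings; the sample data is made up) deletes the unwanted characters inside each token with `str.maketrans` and `str.translate`, and also keeps the apostrophe as the question suggests:

```python
import string

# Punctuation to delete: everything except period, comma, and apostrophe.
to_delete = set(string.punctuation) - {".", ",", "'"}

# Translation table mapping each unwanted character to None (i.e. delete it).
table = str.maketrans("", "", "".join(to_delete))

# Hypothetical sample tokens, already lowercased.
lc_tokens = ["hello,", '"world!"', "don't", "#tag", "end."]

# Strip unwanted characters inside each token, then drop tokens left empty.
clean_tokens = [t.translate(table) for t in lc_tokens]
clean_tokens = [t for t in clean_tokens if t]

print(clean_tokens)  # ['hello,', 'world', "don't", 'tag', 'end.']
```

Building the table once and reusing it keeps the per-token work to a single `translate` call.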
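
Alternatively, if lc_tokens is a single string, whitelist the characters you want to keep instead of listing the ones to delete:

import string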

# Create a set of all allowed characters.
# {...} is the syntax for a set literal in Python.
allowed = {",", "."}.union(string.ascii_lowercase)

# This is our starting string.
lc_tokens = 'hello, "world!"'

# Now we use a list comprehension to keep only the characters in our allowed set.
# The result of a list comprehension is a list, so we use "".join(...) to
# turn it back into a string.
filtered = "".join([letter for letter in lc_tokens if letter in allowed])

# Our final result has everything but lowercase letters, commas, and
# periods removed.
assert filtered == "hello,world"
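
The same whitelist can also be written as a regular expression, close to the `[a-z.,]` character class sketched in the question. A minimal sketch using `re.sub` through a compiled pattern; the function name `depunctuate` is borrowed from the question, and adding `'` to the class to keep apostrophes is an optional assumption:

```python
import re

# Negated character class: matches any character that is NOT a lowercase
# ASCII letter, a period, a comma, or an apostrophe.
UNWANTED = re.compile(r"[^a-z.,']")

def depunctuate(text: str) -> str:
    """Delete every character outside the whitelist (spaces included)."""
    return UNWANTED.sub("", text)

print(depunctuate('hello, "world!"'))  # hello,world
print(depunctuate("it's done."))       # it'sdone.
```

Like the set-based version, this also removes spaces; add a space inside the brackets (`[^a-z.,' ]`) if you want to keep word boundaries.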