filter object in Python 3.x
In Python 3.x I have written the following code: one function for "text tokenizing" and another to "remove extra characters". In the remove_characters_after_tokenization function I use filter.
My problem: when I run my project, I see these lines in the console:
<filter object at 0x00000277AA20DE48> <filter object at 0x00000277AA44D160> <filter object at 0x00000277AA44D470>
How can I fix this?
Here is my project code:
import nltk
import re
import string
from pprint import pprint

corpus = ["The brown fox wasn't that quick and he couldn't win the race",
          "Hey that's a great deal! I just bought a phone for 9",
          "@@You'll (learn) a **lot** in the book. Python is an amazing language !@@"]

# Declare a function for "Tokenizing Text"
def tokenize_text(text):
    sentences = nltk.sent_tokenize(text)
    word_tokens = [nltk.word_tokenize(sentence) for sentence in sentences]
    return word_tokens

# Declare a function for "Removing Special Characters"
def remove_characters_after_tokenization(tokens):
    pattern = re.compile('[{}]'.format(re.escape(string.punctuation)))
    filtered_tokens = list(filter(None, [pattern.sub('', token) for token in tokens]))
    return filtered_tokens

token_list = [tokenize_text(text) for text in corpus]
pprint(token_list)

filtered_list_1 = list(filter(None, [remove_characters_after_tokenization(tokens)
                                     for tokens in sentence_tokens])
                       for sentence_tokens in token_list)
print(type(filtered_list_1))
print(len(filtered_list_1))
print(filtered_list_1)
The following line creates one filter object for each sentence_tokens in token_list:
filtered_list_1 = list(filter(None, [remove_characters_after_tokenization(tokens)
                                     for tokens in sentence_tokens])
                       for sentence_tokens in token_list)
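What is happening: filter() returns a lazy iterator, and printing an iterator only shows its repr, not its contents. Because the outer list() here wraps a generator expression that yields one filter(...) per text, filtered_list_1 ends up as a list of unevaluated filter objects. A minimal standalone sketch (not from your code) of the behavior:

f = filter(None, ['a', '', 'b'])
print(f)        # <filter object at 0x...> -- the iterator's repr
print(list(f))  # ['a', 'b'] -- list() consumes the iterator and shows the items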
You probably want to build a list of lists instead:
filtered_list_1 = list(filter(None, ([remove_characters_after_tokenization(tokens)
                                      for tokens in sentence_tokens]
                                     for sentence_tokens in token_list)))
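With this version the inner list comprehensions are actually evaluated, and the outer filter(None, ...) only drops empty per-text results, so filtered_list_1 is a plain list of lists. Printing it then shows the cleaned tokens; for the first text the output should look roughly like this (the exact tokens depend on your NLTK tokenizer version):

print(filtered_list_1[0])
# [['The', 'brown', 'fox', 'was', 'nt', 'that', 'quick', 'and',
#   'he', 'could', 'nt', 'win', 'the', 'race']]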