How do I create a list of TextBlobs?

Question

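This is my first post on Stack Overflow, so please forgive any faux pas I may commit. I'm also new to Python, so any tips are welcome. My problem is simple, but no matter what I've tried, I can't seem to figure it out. Here is my code: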

import os
from bs4 import BeautifulSoup
import string
import nltk
from nltk import WordNetLemmatizer
from nltk.corpus import stopwords
from nltk import FreqDist

# For TF-IDF calculations
import math
from textblob import TextBlob as tb

def tf(word, blob):
    return blob.words.count(word) / len(blob.words)

def n_containing(word, bloblist):
    return sum(1 for blob in bloblist if word in blob.words)

def idf(word, bloblist):
    return math.log(len(bloblist) / (1 + n_containing(word, bloblist)))

def tfidf(word, blob, bloblist):
    return tf(word, blob) * idf(word, bloblist)

rootDir = r'D:\rootDir'  # raw string, so that \r is not read as a carriage return
testPath = r'D:\testPath'
trainPath = r'D:\trainPath'

data = []
lemmatizer = WordNetLemmatizer()
stop = stopwords.words("english")
stop += ['also']

bloblist_train = [tb('')]  #Before EDIT: bloblist_train = tb('')
bloblist_test = [tb('')]   #Before EDIT: bloblist_test = tb('')

for currentDirPath, subDirs, files in os.walk(rootDir):
    for file in files:
        with open(os.path.join(currentDirPath, file)) as dataFile:
            inFile = dataFile.read()
            html = BeautifulSoup(inFile, "html.parser")
            text = html.get_text()
            text_no_punc = text.translate(str.maketrans("", "", string.punctuation))
            if testPath in currentDirPath:
                bloblist_test += (tb(text_no_punc))
            elif trainPath in currentDirPath:
                bloblist_train += tb(text_no_punc)
            words = text_no_punc.split()
            data = data + words

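I'm walking a larger directory of HTML documents and parsing each file, then trying to compute TF-IDF for every word. To do this I'm using a mix of packages and classes, including BeautifulSoup, NLTK, and TextBlob. I'm using TextBlob for the TF-IDF calculation, but I've run into a problem creating the list of TextBlobs. The specific lines I'm having trouble with are: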

if testPath in currentDirPath:
    bloblist_test += tb(text_no_punc)
elif trainPath in currentDirPath:
    bloblist_train += tb(text_no_punc)

As written, the code creates one giant TextBlob with every document concatenated into it. I want one TextBlob per document. I have also tried the following:

if testPath in currentDirPath:
    bloblist_test.append(tb(text_no_punc))
elif trainPath in currentDirPath:
    bloblist_train.append(tb(text_no_punc))

which gives the error:

AttributeError: 'TextBlob' object has no attribute 'append'

What am I missing? append is the method I normally use to build lists in Python, like this:

s1 = [1,2,3]
s2 = [4,5]
s1.append(s2)
# s1 is now [1, 2, 3, [4, 5]]

But TextBlob apparently doesn't support this.

So how do I create a list of these TextBlobs?

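EDIT:

So I've made some progress on my own, but I'm still having trouble getting the list right. Instead of initializing bloblist_train and bloblist_test to tb(''), I set them to [tb('')], because, as my question says, they should hold a LIST of TextBlobs, not a single TextBlob. So now it seems... it works! There is just one thing I still can't get right: this way the list is created with an empty TextBlob as its first item (e.g. [TextBlob(""), TextBlob("one two three")]).

I realize this is a slightly different question from the one I started with, so if anyone thinks I should close this question and open a separate one, please let me know. Again, I'm new here.

If not, I feel like I'm missing a simple keyword or a bit of syntax, and I'd really appreciate some input.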

Answer 1

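In the end I found the answer myself. I knew the answer would seem trivial, and it turned out to be, but because I was working with TextBlob I assumed that would change the nature of the answer. Well, it didn't. I'm disappointed that experienced Python users passed over my question without a second thought, because all I needed to do was: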

bloblist_train = []
bloblist_test = []

for currentDirPath, subDirs, files in os.walk(rootDir):
    for file in files:
        with open(os.path.join(currentDirPath, file)) as dataFile:
            # .
            # .
            # .
            if testPath in currentDirPath:
                bloblist_test += [tb(text_no_punc)]
            elif trainPath in currentDirPath:
                bloblist_train += [tb(text_no_punc)]
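
As a rough illustration of why the starting value matters, here is a minimal sketch that uses plain str as a stand-in for TextBlob. That substitution is an assumption made so the snippet runs without TextBlob installed; it relies only on the behavior described in the question, namely that += on a TextBlob concatenates. The names docs, merged, separate, and chars are made up for the example.

```python
# Plain str stands in for TextBlob here (assumption: both concatenate with +=).
docs = ["one two three", "four five"]

# Starting from a string-like object: += concatenates everything into one
# value, and the object has no .append method at all.
merged = ""
for text in docs:
    merged += text
has_append = hasattr(merged, "append")

# Starting from an empty list: each document stays a separate element.
separate = []
for text in docs:
    separate.append(text)  # same effect as: separate += [text]

# Pitfall with a str: list += str extends the list one character at a time,
# which is why the item is wrapped in [ ] when using +=.
chars = []
chars += "ab"

print(merged)
print(has_append)
print(separate)
print(chars)
```

With an empty list as the starting value there is also no leftover empty first element, which was the remaining issue mentioned in the EDIT.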

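I was confused because I come from languages like C++, where I have to initialize variables to their intended type. In Python, by contrast, I only need to create an empty list and then add things to it. For other beginners like me: remember that Python doesn't care what is in a list. It can hold any mix of objects you put in it: int, str, TextBlob, and so on. Just tell it you have a list, then add to it.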
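
For completeness, here is a small sketch of how a list with one entry per document feeds the TF-IDF functions from the question. To keep it runnable without TextBlob and its NLTK data, it uses a hypothetical Doc class that only exposes a .words list. That stand-in and the sample texts are assumptions for illustration, not part of the original code. Swapping Doc(text.split()) for tb(text) should work the same way, since the functions only use .words.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for TextBlob: only exposes a .words list."""
    words: list


# The TF-IDF helpers below are copied from the question.
def tf(word, blob):
    return blob.words.count(word) / len(blob.words)


def n_containing(word, bloblist):
    return sum(1 for blob in bloblist if word in blob.words)


def idf(word, bloblist):
    return math.log(len(bloblist) / (1 + n_containing(word, bloblist)))


def tfidf(word, blob, bloblist):
    return tf(word, blob) * idf(word, bloblist)


# Made-up sample documents, one string per document.
texts = ["apple banana apple", "banana cherry", "cherry date", "date egg"]

# Start from an empty list and append one object per document.
bloblist = []
for text in texts:
    bloblist.append(Doc(text.split()))

# Score every distinct word of the first document against the whole list.
first = bloblist[0]
scores = {word: tfidf(word, first, bloblist) for word in set(first.words)}

for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(word, round(score, 4))
```

Note that because idf adds 1 to the document count, a word that appears in most or all documents gets a zero or negative score with this formula.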
