glob filenames | AttributeError: 'str' object has no attribute 'content'
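I am running my own version of this Notebook, with the Apply DocumentClassifier section changed as shown below. Each doc object in documents is a str, which I don't think is right. What data type should doc be?
Jupyter Lab, kernel: conda_mxnet_latest_p37.
Cell: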
import glob

docs_to_classify = glob.glob('full-set-of-gri-standards-2021-english/*.pdf')

with open('filt_gri.txt', 'r') as filehandle:
    tags = [current_place.rstrip() for current_place in filehandle.readlines()]

doc_classifier = TransformersDocumentClassifier(model_name_or_path="cross-encoder/nli-distilroberta-base",
                                                task="zero-shot-classification",
                                                labels=tags,
                                                batch_size=16)

classified_docs = doc_classifier.predict(docs_to_classify)

all_docs = convert_files_to_dicts(dir_path=doc_dir)
preprocessor_sliding_window = PreProcessor(split_overlap=3,
                                           split_length=10,
                                           split_respect_sentence_boundary=False,
                                           split_by='passage')
Output:
INFO - haystack.modeling.utils - Using devices: CUDA
INFO - haystack.modeling.utils - Number of GPUs: 1
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-10-75f29230cd0e> in <module>
7 batch_size=16)
8
----> 9 classified_docs = doc_classifier.predict(docs_to_classify)
10
11 all_docs = convert_files_to_dicts(dir_path=doc_dir)
~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/haystack/nodes/document_classifier/transformers.py in predict(self, documents)
134 :return: List of Document enriched with meta information
135 """
--> 136 texts = [doc.content if self.classification_field is None else doc.meta[self.classification_field] for doc in documents]
137 batches = self.get_batches(texts, batch_size=self.batch_size)
138 if self.task == 'zero-shot-classification':
~/anaconda3/envs/mxnet_latest_p37/lib/python3.7/site-packages/haystack/nodes/document_classifier/transformers.py in <listcomp>(.0)
134 :return: List of Document enriched with meta information
135 """
--> 136 texts = [doc.content if self.classification_field is None else doc.meta[self.classification_field] for doc in documents]
137 batches = self.get_batches(texts, batch_size=self.batch_size)
138 if self.task == 'zero-shot-classification':
AttributeError: 'str' object has no attribute 'content'
Please let me know if there is anything else I should add to the post or clarify.
I forgot to include the following lines before classified_docs:
# convert to Document using a fieldmap for custom content fields the classification should run on
docs_to_classify = [Document.from_dict(d) for d in docs_sliding_window]
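To illustrate why the original cell failed: glob.glob() returns plain str file paths, while the list comprehension on line 136 of the traceback reads doc.content from every item. Below is a minimal sketch that reproduces that behaviour without Haystack installed. StubDocument and extract_texts are hypothetical stand-ins written for this example (they are not Haystack's Document or its predict method), and the file names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Hypothetical stand-in exposing only the attributes read in the traceback."""
    content: str
    meta: dict = field(default_factory=dict)


def extract_texts(documents, classification_field=None):
    # Same expression as line 136 in the traceback: each item must expose
    # .content (or .meta[...] when a classification field is set).
    return [
        doc.content if classification_field is None else doc.meta[classification_field]
        for doc in documents
    ]


# glob.glob() yields plain strings like these (made-up file names).
paths = ["gri-1.pdf", "gri-2.pdf"]

# Passing the paths directly fails, because str has no .content attribute.
error_message = None
try:
    extract_texts(paths)
except AttributeError as err:
    error_message = str(err)

# Wrapping the extracted text in document objects makes the same call work.
docs = [StubDocument(content=f"text of {p}", meta={"name": p}) for p in paths]
texts = extract_texts(docs)
```

In the real notebook the objects come from Document.from_dict, as in the lines above. docs_sliding_window is not defined in the cell shown in the question; presumably it is the output of running preprocessor_sliding_window over all_docs, but that is an assumption on my part.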