KeyBERT 包不适用于 Google Colab
KeyBERT package is not working on Google Colab
我在 Google Colab 上使用 KeyBERT 从文本中提取关键字。
from keybert import KeyBERT
model = KeyBERT('distilbert-base-nli-mean-tokens')
text_keywords = model.extract_keywords(my_long_text)
但是我收到以下错误:
OSError: 在模型名称列表中找不到模型名称 'distilbert-base-nli-mean-token'(distilbert-base-uncased、distilbert-base-uncased-distilled-squad)。我们假定 'distilbert-base-nli-mean-token' 是名为 config.json 的配置文件或包含此类文件的目录的路径或 url,但在此路径或 url 中找不到任何此类文件.
知道如何解决这个问题吗?
谢谢
Exception when trying to download http://sbert.net/models/distilbert-base-nli-mean-token.zip. Response 404
SentenceTransformer-Model http://sbert.net/models/distilbert-base-nli-mean-token.zip not found. Try to create it from scratch
Try to create Transformer Model distilbert-base-nli-mean-token with mean pooling
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
78 zip_save_path = os.path.join(model_path_tmp, 'model.zip')
---> 79 http_get(model_url, zip_save_path)
80 with ZipFile(zip_save_path, 'r') as zip:
11 frames
/usr/local/lib/python3.7/dist-packages/sentence_transformers/util.py in http_get(url, path)
241 print("Exception when trying to download {}. Response {}".format(url, req.status_code), file=sys.stderr)
--> 242 req.raise_for_status()
243 return
/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942
HTTPError: 404 Client Error: Not Found for url: https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/distilbert-base-nli-mean-token.zip
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
133 that will be used by default in the :obj:`generate` method of the model. In order to get the tokens of the
--> 134 words that should not appear in the generated text, use :obj:`tokenizer.encode(bad_word,
135 add_prefix_space=True)`.
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies)
181 except importlib_metadata.PackageNotFoundError:
--> 182 _timm_available = False
183
OSError: file distilbert-base-nli-mean-token not found
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-59-d0fa7b6b7cd1> in <module>()
1 doc = full_text
----> 2 model = KeyBERT('distilbert-base-nli-mean-token')
/usr/local/lib/python3.7/dist-packages/keybert/model.py in __init__(self, model)
46 * https://www.sbert.net/docs/pretrained_models.html
47 """
---> 48 self.model = select_backend(model)
49
50 def extract_keywords(self,
/usr/local/lib/python3.7/dist-packages/keybert/backend/_utils.py in select_backend(embedding_model)
40 # Create a Sentence Transformer model based on a string
41 if isinstance(embedding_model, str):
---> 42 return SentenceTransformerBackend(embedding_model)
43
44 return SentenceTransformerBackend("xlm-r-bert-base-nli-stsb-mean-tokens")
/usr/local/lib/python3.7/dist-packages/keybert/backend/_sentencetransformers.py in __init__(self, embedding_model)
33 self.embedding_model = embedding_model
34 elif isinstance(embedding_model, str):
---> 35 self.embedding_model = SentenceTransformer(embedding_model)
36 else:
37 raise ValueError("Please select a correct SentenceTransformers model: \n"
/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
93 save_model_to = model_path
94 model_path = None
---> 95 transformer_model = Transformer(model_name_or_path)
96 pooling_model = Pooling(transformer_model.get_word_embedding_dimension())
97 modules = [transformer_model, pooling_model]
/usr/local/lib/python3.7/dist-packages/sentence_transformers/models/Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case)
25 self.do_lower_case = do_lower_case
26
---> 27 config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
28 self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
29 self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
144 after the :obj:`decoder_start_token_id`. Useful for multilingual models like :doc:`mBART
145 <../model_doc/mbart>` where the first generated token needs to be the target language token.
--> 146 - **forced_eos_token_id** (:obj:`int`, `optional`) -- The id of the token to force as the last generated token
147 when :obj:`max_length` is reached.
148 - **remove_invalid_values** (:obj:`bool`, `optional`) -- Whether to remove possible `nan` and `inf` outputs of
OSError: Model name 'distilbert-base-nli-mean-token' was not found in model name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad). We assumed 'distilbert-base-nli-mean-token' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
我无法使用您提供的代码重现此问题,但根据提供的错误消息,我认为您只是在型号名称中缺少 's',因此只需确保型号名称如下:
distilbert-base-nli-mean-tokens
而不是
distilbert-base-nli-mean-token
另请参阅 this link 以了解所有可用的模型。
我在 Google Colab 上使用 KeyBERT 从文本中提取关键字。
from keybert import KeyBERT
model = KeyBERT('distilbert-base-nli-mean-tokens')
text_keywords = model.extract_keywords(my_long_text)
但是我收到以下错误:
OSError: 在模型名称列表中找不到模型名称 'distilbert-base-nli-mean-token'(distilbert-base-uncased、distilbert-base-uncased-distilled-squad)。我们假定 'distilbert-base-nli-mean-token' 是名为 config.json 的配置文件或包含此类文件的目录的路径或 url,但在此路径或 url 中找不到任何此类文件.
知道如何解决这个问题吗?
谢谢
Exception when trying to download http://sbert.net/models/distilbert-base-nli-mean-token.zip. Response 404
SentenceTransformer-Model http://sbert.net/models/distilbert-base-nli-mean-token.zip not found. Try to create it from scratch
Try to create Transformer Model distilbert-base-nli-mean-token with mean pooling
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
78 zip_save_path = os.path.join(model_path_tmp, 'model.zip')
---> 79 http_get(model_url, zip_save_path)
80 with ZipFile(zip_save_path, 'r') as zip:
11 frames
/usr/local/lib/python3.7/dist-packages/sentence_transformers/util.py in http_get(url, path)
241 print("Exception when trying to download {}. Response {}".format(url, req.status_code), file=sys.stderr)
--> 242 req.raise_for_status()
243 return
/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942
HTTPError: 404 Client Error: Not Found for url: https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/distilbert-base-nli-mean-token.zip
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
133 that will be used by default in the :obj:`generate` method of the model. In order to get the tokens of the
--> 134 words that should not appear in the generated text, use :obj:`tokenizer.encode(bad_word,
135 add_prefix_space=True)`.
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies)
181 except importlib_metadata.PackageNotFoundError:
--> 182 _timm_available = False
183
OSError: file distilbert-base-nli-mean-token not found
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-59-d0fa7b6b7cd1> in <module>()
1 doc = full_text
----> 2 model = KeyBERT('distilbert-base-nli-mean-token')
/usr/local/lib/python3.7/dist-packages/keybert/model.py in __init__(self, model)
46 * https://www.sbert.net/docs/pretrained_models.html
47 """
---> 48 self.model = select_backend(model)
49
50 def extract_keywords(self,
/usr/local/lib/python3.7/dist-packages/keybert/backend/_utils.py in select_backend(embedding_model)
40 # Create a Sentence Transformer model based on a string
41 if isinstance(embedding_model, str):
---> 42 return SentenceTransformerBackend(embedding_model)
43
44 return SentenceTransformerBackend("xlm-r-bert-base-nli-stsb-mean-tokens")
/usr/local/lib/python3.7/dist-packages/keybert/backend/_sentencetransformers.py in __init__(self, embedding_model)
33 self.embedding_model = embedding_model
34 elif isinstance(embedding_model, str):
---> 35 self.embedding_model = SentenceTransformer(embedding_model)
36 else:
37 raise ValueError("Please select a correct SentenceTransformers model: \n"
/usr/local/lib/python3.7/dist-packages/sentence_transformers/SentenceTransformer.py in __init__(self, model_name_or_path, modules, device)
93 save_model_to = model_path
94 model_path = None
---> 95 transformer_model = Transformer(model_name_or_path)
96 pooling_model = Pooling(transformer_model.get_word_embedding_dimension())
97 modules = [transformer_model, pooling_model]
/usr/local/lib/python3.7/dist-packages/sentence_transformers/models/Transformer.py in __init__(self, model_name_or_path, max_seq_length, model_args, cache_dir, tokenizer_args, do_lower_case)
25 self.do_lower_case = do_lower_case
26
---> 27 config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
28 self.auto_model = AutoModel.from_pretrained(model_name_or_path, config=config, cache_dir=cache_dir)
29 self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir, **tokenizer_args)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
144 after the :obj:`decoder_start_token_id`. Useful for multilingual models like :doc:`mBART
145 <../model_doc/mbart>` where the first generated token needs to be the target language token.
--> 146 - **forced_eos_token_id** (:obj:`int`, `optional`) -- The id of the token to force as the last generated token
147 when :obj:`max_length` is reached.
148 - **remove_invalid_values** (:obj:`bool`, `optional`) -- Whether to remove possible `nan` and `inf` outputs of
OSError: Model name 'distilbert-base-nli-mean-token' was not found in model name list (distilbert-base-uncased, distilbert-base-uncased-distilled-squad). We assumed 'distilbert-base-nli-mean-token' was a path or url to a configuration file named config.json or a directory containing such a file but couldn't find any such file at this path or url.
我无法使用您提供的代码重现此问题,但根据提供的错误消息,我认为您只是在型号名称中缺少 's',因此只需确保型号名称如下:
distilbert-base-nli-mean-tokens
而不是
distilbert-base-nli-mean-token
另请参阅 this link 以了解所有可用的模型。