ScispaCy in Google Colab

I am trying to build an NER model for clinical data using ScispaCy in Colab. I installed the packages like this:

!pip install spacy
!pip install scispacy
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_md-0.2.4.tar.gz  # pip install <Model URL>

Then I imported the packages:
import scispacy
import spacy
import en_core_sci_md

Then I used the following code to display the sentences and entities:

nlp = spacy.load("en_core_sci_md")
text ="""Myeloid derived suppressor cells (MDSC) are immature myeloid cells with immunosuppressive activity. They accumulate in tumor-bearing mice and humans with different types of cancer, including hepatocellular carcinoma (HCC)""" 
doc = nlp(text)
print(list(doc.sents))
print(doc.ents)

I get the following error:

OSError: [E050] Can't find model 'en_core_sci_md'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I don't know why I'm getting this error; I followed all the code in the official ScispaCy GitHub post. Any help would be appreciated. Thanks in advance.

I hope I'm not too late... I believe you were already very close to the right approach.

I'll write my answer step by step, and you can choose where to stop.

Step 1)

# Install the en_core_sci_lg package (large model) from the scispaCy release URL; en_core_sci_md (medium model) works the same way.
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.4/en_core_sci_lg-0.2.4.tar.gz 
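Before going on, it can help to confirm that the install is actually visible to the running notebook. Below is a small stdlib-only sketch; `is_importable` is a hypothetical helper, not part of spaCy or scispaCy. If `en_core_sci_lg` shows as found but `spacy.load("en_core_sci_lg")` still raises E050, importing the package directly (as in the next steps) is the workaround this answer relies on; restarting the Colab runtime after installing may also help.

```python
import importlib.util
import sys


def is_importable(package_name: str) -> bool:
    """Return True if this interpreter can locate a top-level module or
    package called `package_name`, without actually importing it."""
    return importlib.util.find_spec(package_name) is not None


# Run in a new cell after the `!pip install` above.
print("Interpreter:", sys.executable)
for name in ("spacy", "scispacy", "en_core_sci_lg"):
    status = "found" if is_importable(name) else "NOT found"
    print(f"{name}: {status}")
```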

Step 2)

# Import the large model package
import en_core_sci_lg

Step 3)

# Identify entities and render them inline in the notebook
from spacy import displacy

nlp = en_core_sci_lg.load()
doc = nlp(text)  # `text` is the string from the question
displacy.render(doc, jupyter=True, style="ent")
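Steps 2 and 3 import the model package and call its `load()` function instead of passing a name to `spacy.load()`. The same pattern can be written generically with `importlib`. This is a sketch that assumes, as the `en_core_sci_lg.load()` call above does, that the installed package exposes a module-level `load()`; `load_model_package` is a hypothetical helper name.

```python
import importlib
from typing import Any


def load_model_package(package_name: str, **overrides: Any) -> Any:
    """Import an installed model package and return the result of its load().

    Hypothetical helper: mirrors `import en_core_sci_lg; en_core_sci_lg.load()`
    for any package name. Keyword arguments are passed straight to load().
    """
    # Raises ModuleNotFoundError if the package is not installed.
    module = importlib.import_module(package_name)
    load = getattr(module, "load", None)
    if not callable(load):
        raise AttributeError(f"package {package_name!r} has no callable load()")
    return load(**overrides)


# Usage in the notebook:
# nlp = load_model_package("en_core_sci_lg")
```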

Step 4)

# Print only the entities
print(doc.ents)
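`doc.ents` is a tuple of `Span` objects, so `print(doc.ents)` shows only the entity text. Below is a sketch that also lists each entity's label and character offsets. It uses only the `Span` attributes `text`, `label_`, `start_char` and `end_char`; `entity_rows` is a hypothetical helper name.

```python
def entity_rows(ents):
    """Turn entity spans (e.g. doc.ents) into (text, label, start, end) tuples."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in ents]


# Usage (assuming `doc` from Step 3):
# for ent_text, label, start, end in entity_rows(doc.ents):
#     print(f"{start:>4}-{end:<4} {label:<10} {ent_text}")
```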

Step 5)

# Save the result 
save_res = [doc.ents]
save_res

Step 6)

# Save the results to a DataFrame
import pandas as pd

df_save_res = pd.DataFrame(save_res)
df_save_res
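As far as I can tell, `pd.DataFrame(save_res)` with `save_res = [doc.ents]` produces a single row with one column per entity, each cell holding a `Span` object. If one row per entity is more convenient, here is a sketch that assumes pandas (already used above) and the same `Span` attributes; `entities_to_frame` is a hypothetical name.

```python
import pandas as pd


def entities_to_frame(ents) -> pd.DataFrame:
    """Build a DataFrame with one row per entity: text, label and character offsets."""
    columns = ["text", "label", "start_char", "end_char"]
    rows = [(e.text, e.label_, e.start_char, e.end_char) for e in ents]
    return pd.DataFrame(rows, columns=columns)


# Usage (assuming `doc` from Step 3):
# df_entities = entities_to_frame(doc.ents)
# df_entities.to_csv("entities.csv", index=False)  # optional: persist to disk
```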

Step 7)

# In case you want to visualise the dependency parse
displacy.render(doc, jupyter=True, style="dep")
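`displacy.render(..., jupyter=True)` draws the parse inline in the notebook. For a plain-text view of the same parse, the sketch below lists each token with its dependency label and head, using `Token.text`, `Token.dep_` and `Token.head`. `dependency_triples` is a hypothetical helper name.

```python
def dependency_triples(doc):
    """Return (token, dependency label, head token) for every token in a parsed Doc.

    In spaCy the root token's head is the token itself, so the root appears as
    (word, "ROOT", word).
    """
    return [(tok.text, tok.dep_, tok.head.text) for tok in doc]


# Usage (assuming `doc` from Step 3):
# for word, dep, head in dependency_triples(doc):
#     print(f"{word:<20} {dep:<10} {head}")
```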