Training a spaCy model as a Vertex AI Pipeline "Component"
I am trying to train a spaCy model, but with the code turned into a Vertex AI Pipeline Component. My current code is:
@component(
    packages_to_install=[
        "setuptools",
        "wheel",
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml"
)
def train(train_name: str, dev_name: str) -> NamedTuple("output", [("model_path", str)]):
    """
    Trains a spacy model

    Parameters:
    ----------
    train_name : Name of the spaCy "train" set, used for model training.
    dev_name : Name of the spaCy "dev" set, used for model training.

    Returns:
    -------
    output : Destination path of the saved model.
    """
    import spacy
    import subprocess

    spacy.require_gpu()  # <=== IMAGE FAILS TO BE COMPILED HERE

    # NOTE: The remaining code has already been tested and proven to be functional.
    # It has been edited since the project is private.

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config", "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    location = "gcs/secret_model_destination_path/TestModel"
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", location,
                    "--paths.train", "gcs/secret_bucket/secret_path/{}.spacy".format(train_name),
                    "--paths.dev", "gcs/secret_bucket/secret_path/{}.spacy".format(dev_name),
                    "--gpu-id", "0"])

    return (location,)
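The Vertex AI logs point to that line as the main cause of the failure.
The libraries are installed successfully, yet I feel some library or setting is missing (as far as I know from experience); however, I don't know how to make it "Python-based Vertex AI Components compatible". By the way, using a GPU is mandatory in my code.
Any ideas?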
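Remove the failing line, i.e. spacy.require_gpu() # <=== IMAGE FAILS TO BE COMPILED HERE
Also adjust the install list to remove the CUDA extra, cuda113.
Your code is set up to use a GPU, but for a learning exercise you don't need one. I don't know, and you don't know, how to specify a GPU-enabled Vertex AI instance on GCP from Python. So drop the GPU requirement. Once the code is running, you can go back and adjust it to add the GPU.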
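One way to follow this advice without rewriting the component later is to make the GPU flag optional when assembling the `spacy train` command. A minimal sketch, assuming a hypothetical helper named `build_train_cmd` (not part of spaCy or KFP); the paths in the usage lines are placeholders, and leaving out `--gpu-id` keeps spaCy on its CPU default:

```python
import sys
from typing import List, Optional


def build_train_cmd(config: str, output: str, train: str, dev: str,
                    gpu_id: Optional[int] = None) -> List[str]:
    """Assemble a `spacy train` command; add --gpu-id only when a GPU is requested."""
    cmd = [
        sys.executable, "-m", "spacy", "train", config,
        "--output", output,
        "--paths.train", train,
        "--paths.dev", dev,
    ]
    if gpu_id is not None:
        cmd += ["--gpu-id", str(gpu_id)]
    return cmd


# Placeholder paths, for illustration only.
cpu_cmd = build_train_cmd("config.cfg", "model_out", "train.spacy", "dev.spacy")
gpu_cmd = build_train_cmd("config.cfg", "model_out", "train.spacy", "dev.spacy", gpu_id=0)
```

Either list can be passed to `subprocess.run(cmd, check=True)`, so that a failed training run raises an error instead of passing silently.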
OK, first make sure you have the CUDA 11.3 toolkit installed in your Google Cloud environment; you can install it with the following commands:
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian10/x86_64/ /"
sudo add-apt-repository contrib
sudo apt-get update
sudo apt-get -y install cuda-11-2
# optional
python -m spacy download en_core_web_trf
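Install the other pip packages and dependencies:
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
Point to the correct CUDA folder:
export CUDA_PATH="/usr/local/cuda-11"
Install spaCy with transformers support:
pip install -U spacy[cuda113,transformers]
If needed, install CuPy for CUDA 11.3 as well:
pip install cupy-cuda113
Now, if the libraries and packages were installed correctly, run this: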
>>> import spacy
>>> spacy.require_gpu()
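If `spacy.require_gpu()` raises, it can help to first check whether the machine exposes any NVIDIA device at all. A rough sketch, assuming `nvidia-smi` is on the PATH whenever a driver is installed; `gpu_visible` is a hypothetical helper name, not a spaCy function:

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi -L` runs and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools found on PATH
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `nvidia-smi -L` prints one line per device, each starting with "GPU <index>:"
    return result.returncode == 0 and "GPU " in result.stdout


print("GPU visible:", gpu_visible())
```

When this prints `False` inside the component, the problem is the machine the job runs on rather than the Python packages.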
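After some trial and error, I think I have figured out what my code was missing. Actually, the train component definition was correct (with some minor tweaks relative to what was originally posted); however, the pipeline was missing the GPU definition. I will first include a dummy example that trains a NER model with spaCy and orchestrates everything through a Vertex AI Pipeline: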
from kfp.v2 import compiler
from kfp.v2.dsl import pipeline, component, Dataset, Input, Output, OutputPath, InputPath
from datetime import datetime
from google.cloud import aiplatform
from typing import NamedTuple
# Component definition
@component(
    packages_to_install=[
        "setuptools",
        "wheel",
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="generate.yaml"
)
def generate_spacy_file(train_path: OutputPath(), dev_path: OutputPath()):
    """
    Generates a small, dummy 'train.spacy' & 'dev.spacy' file

    Returns:
    -------
    train_path : Relative location in GCS, for the "train.spacy" file.
    dev_path : Relative location in GCS, for the "dev.spacy" file.
    """
    import spacy
    from spacy.training import Example
    from spacy.tokens import DocBin

    # Entity annotations are (start_char, end_char, label); the end offset is exclusive.
    td = [  # Train (dummy) dataset, in 'spacy V2 presentation'
        ("Walmart is a leading e-commerce company", {"entities": [(0, 7, "ORG")]}),
        ("I reached Chennai yesterday.", {"entities": [(10, 17, "GPE")]}),
        ("I recently ordered a book from Amazon", {"entities": [(31, 37, "ORG")]}),
        ("I was driving a BMW", {"entities": [(16, 19, "PRODUCT")]}),
        ("I ordered this from ShopClues", {"entities": [(20, 29, "ORG")]}),
        ("Fridge can be ordered in Amazon ", {"entities": [(0, 6, "PRODUCT")]}),
        ("I bought a new Washer", {"entities": [(15, 21, "PRODUCT")]}),
        ("I bought a old table", {"entities": [(15, 20, "PRODUCT")]}),
        ("I bought a fancy dress", {"entities": [(17, 22, "PRODUCT")]}),
        ("I rented a camera", {"entities": [(11, 17, "PRODUCT")]}),
        ("I rented a tent for our trip", {"entities": [(11, 15, "PRODUCT")]}),
        ("I rented a screwdriver from our neighbour", {"entities": [(11, 22, "PRODUCT")]}),
        ("I repaired my computer", {"entities": [(14, 22, "PRODUCT")]}),
        ("I got my clock fixed", {"entities": [(9, 14, "PRODUCT")]}),
        ("I got my truck fixed", {"entities": [(9, 14, "PRODUCT")]}),
    ]
    dd = [  # Development (dummy) dataset (CV), in 'spacy V2 presentation'
        ("Flipkart started it's journey from zero", {"entities": [(0, 8, "ORG")]}),
        ("I recently ordered from Max", {"entities": [(24, 27, "ORG")]}),
        ("Flipkart is recognized as leader in market", {"entities": [(0, 8, "ORG")]}),
        ("I recently ordered from Swiggy", {"entities": [(24, 30, "ORG")]}),
    ]

    # Converting Train & Development datasets, from 'spaCy V2' to 'spaCy V3'
    nlp = spacy.blank("en")
    db_train = DocBin()
    db_dev = DocBin()
    for text, annotations in td:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        db_train.add(example.reference)
    for text, annotations in dd:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        db_dev.add(example.reference)

    db_train.to_disk(train_path + ".spacy")  # <== Obtaining and storing "train.spacy"
    db_dev.to_disk(dev_path + ".spacy")  # <== Obtaining and storing "dev.spacy"

# ----------------------- ORIGINALLY POSTED CODE -----------------------
@component(
    packages_to_install=[
        "setuptools",
        "wheel",
        "spacy[cuda113,transformers,lookups]",
    ],
    base_image="gcr.io/deeplearning-platform-release/base-cu113",
    output_component_file="train.yaml"
)
def train(train_path: InputPath(), dev_path: InputPath(), output_path: OutputPath()):
    """
    Trains a spacy model

    Parameters:
    ----------
    train_path : Relative location in GCS, for the "train.spacy" file.
    dev_path : Relative location in GCS, for the "dev.spacy" file.

    Returns:
    -------
    output : Destination path of the saved model.
    """
    import spacy
    import subprocess

    spacy.require_gpu()  # <=== IMAGE NOW MANAGES TO GET BUILT!

    # Presets for training
    subprocess.run(["python", "-m", "spacy", "init", "fill-config", "gcs/secret_path_to_config/base_config.cfg", "config.cfg"])

    # Training model
    subprocess.run(["python", "-m", "spacy", "train", "config.cfg",
                    "--output", output_path,
                    "--paths.train", "{}.spacy".format(train_path),
                    "--paths.dev", "{}.spacy".format(dev_path),
                    "--gpu-id", "0"])
# ----------------------------------------------------------------------

# Pipeline definition
# NOTE: PIPELINE_ROOT is not defined in this listing; set it to your own
# pipeline root location before compiling.
@pipeline(
    pipeline_root=PIPELINE_ROOT,
    name="spacy-dummy-pipeline",
)
def spacy_pipeline():
    """
    Builds a custom pipeline
    """
    # Generating dummy "train.spacy" + "dev.spacy"
    train_dev_sets = generate_spacy_file()

    # With the output of the previous component, train a spaCy model
    model = train(
        train_dev_sets.outputs["train_path"],
        train_dev_sets.outputs["dev_path"]
    # ------ !!! THIS SECTION DOES THE TRICK !!! ------
    ).add_node_selector_constraint(
        label_name="cloud.google.com/gke-accelerator",
        value="NVIDIA_TESLA_T4"
    ).set_gpu_limit(1).set_memory_limit('32G')
    # -------------------------------------------------

# Pipeline compilation
compiler.Compiler().compile(
    pipeline_func=spacy_pipeline, package_path="pipeline_spacy_job.json"
)

# Pipeline run
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
run = aiplatform.PipelineJob(  # Include your own naming here
    display_name="spacy-dummy-pipeline",
    template_path="pipeline_spacy_job.json",
    job_id="ml-pipeline-spacydummy-small-{0}".format(TIMESTAMP),
    parameter_values={},
    enable_caching=True,
)

# Pipeline gets submitted
run.submit()
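A side note on the dummy datasets in the listing: the entity annotations are `(start, end, label)` character offsets, which are easy to mistype by hand. A small sketch that derives them from the text instead, assuming a hypothetical helper `entity_span` (not a spaCy function) and that the phrase occurs in the sentence:

```python
from typing import Tuple


def entity_span(text: str, phrase: str, label: str) -> Tuple[int, int, str]:
    """Return (start, end, label) for the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return (start, start + len(phrase), label)


sentence = "I reached Chennai yesterday."
span = entity_span(sentence, "Chennai", "GPE")
print(span)  # (10, 17, 'GPE')
```

The resulting tuple can be placed directly in the `"entities"` list of a training example.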
Now, the explanation. According to Google:
By default, the component will run on as a Vertex AI CustomJob using an e2-standard-4 machine, with 4 core CPUs and 16GB memory.
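Therefore, when the train component was compiled, it failed because "it did not see any GPU available as a resource"; the same link, however, lists all the available CPU and GPU settings. As you can see, in my case I set the train component to run on one (1) NVIDIA_TESLA_T4 GPU card, and I also increased the CPU memory to 32GB. With these modifications, the resulting pipeline looks as follows:
As you can see, it compiles successfully, and it trains (and eventually produces) a functional spaCy model. From here, you can adapt this code to your own needs.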
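If several components need a GPU, the same settings can be wrapped in a small helper so that each task gets identical resources. This is only a sketch: `request_gpu` is a hypothetical name, it calls the same three task methods used in the pipeline above, and `FakeTask` is a stand-in class so the snippet runs without KFP installed:

```python
class FakeTask:
    """Stand-in for a KFP task; records the calls instead of configuring a job."""

    def __init__(self):
        self.calls = []

    def add_node_selector_constraint(self, label_name, value):
        self.calls.append(("node_selector", label_name, value))
        return self

    def set_gpu_limit(self, count):
        self.calls.append(("gpu_limit", count))
        return self

    def set_memory_limit(self, memory):
        self.calls.append(("memory_limit", memory))
        return self


def request_gpu(task, accelerator="NVIDIA_TESLA_T4", count=1, memory="32G"):
    """Apply the accelerator type, GPU count and memory limit to a pipeline task."""
    return (
        task.add_node_selector_constraint(
            label_name="cloud.google.com/gke-accelerator", value=accelerator
        )
        .set_gpu_limit(count)
        .set_memory_limit(memory)
    )


task = request_gpu(FakeTask())
print(task.calls)
```

Inside the pipeline function, the real task returned by `train(...)` would be passed in place of `FakeTask()`.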
I hope this helps anyone who might be interested.
Thank you.