How can I translate name and address text into English without changing the pronunciation, using the GCP Translate API in Python?
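I am trying to translate people's names and addresses from Hindi to English, and I want to keep the pronunciation intact. For example, "सौरव" should become "sourab". Is there a parameter in Google Translate for Python that does this? There are some HTML parameters, but are there equivalent parameters for Python?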
Set google translate don't translate name
import six
from google.cloud import translate_v2 as translate


def translate_text_with_model(target, text, model="nmt"):
    translate_client = translate.Client()

    # The API expects text, so decode raw bytes first.
    if isinstance(text, six.binary_type):
        text = text.decode("utf-8")

    result = translate_client.translate(text, target_language=target, model=model)
    print(u"Translation: {}".format(result["translatedText"]))


translate_text_with_model("en", "23 राज्सव", model="nmt")
Sourav, I was able to reproduce the issue. Running your code gives:
Translation: 23 Revenue
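whereas the expected output renders "राज्सव" as the name "Sourav".
Since you are dealing with proper nouns, I suggest creating a glossary so that the API uses the glossary's preset values instead of translating those words. The glossary documentation explains the format and how to implement it.
Here are the code and the glossary I created to get the expected output.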
glossary.csv
(this file must be uploaded to a Google Cloud Storage bucket)
hi,en,pos,description
राज्सव,Sourav,noun,Name of user
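With more than a handful of names, the glossary file can be generated instead of typed by hand. The following is a minimal sketch using only the standard csv module. The helper name write_glossary_csv and the entries list are illustrative assumptions; the column layout simply mirrors the file shown above.

```python
import csv


def write_glossary_csv(path, entries, source_lang="hi", target_lang="en"):
    """Write (source, target, pos, description) rows under a language-code header."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([source_lang, target_lang, "pos", "description"])
        writer.writerows(entries)


# Hypothetical entry list; add one tuple per name that must not be translated.
entries = [("राज्सव", "Sourav", "noun", "Name of user")]
write_glossary_csv("glossary.csv", entries)
```

The resulting file still has to be copied to the bucket, for example with `gsutil cp glossary.csv gs://YOUR_BUCKET_NAME/`.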
create_glossary.py
(YOUR_GLOSSARY_ID is any identifier you choose; you will use it later to refer to the glossary when translating text)
from google.cloud import translate_v3 as translate


def create_glossary(
    project_id="YOUR_PROJECT_ID",
    input_uri="YOUR_INPUT_URI",  # gs://YOUR_BUCKET_NAME/glossary.csv
    glossary_id="YOUR_GLOSSARY_ID",
    timeout=180,
):
    """
    Create an equivalent term sets glossary. Glossary can be words or
    short phrases (usually fewer than five words).
    https://cloud.google.com/translate/docs/advanced/glossary#format-glossary
    """
    client = translate.TranslationServiceClient()

    # Supported language codes: https://cloud.google.com/translate/docs/languages
    source_lang_code = "hi"
    target_lang_code = "en"
    location = "us-central1"  # The location of the glossary

    name = client.glossary_path(project_id, location, glossary_id)
    language_codes_set = translate.types.Glossary.LanguageCodesSet(
        language_codes=[source_lang_code, target_lang_code]
    )

    gcs_source = translate.types.GcsSource(input_uri=input_uri)
    input_config = translate.types.GlossaryInputConfig(gcs_source=gcs_source)

    glossary = translate.types.Glossary(
        name=name, language_codes_set=language_codes_set, input_config=input_config
    )

    parent = f"projects/{project_id}/locations/{location}"
    # glossary is a custom dictionary Translation API uses
    # to translate the domain-specific terminology.
    operation = client.create_glossary(parent=parent, glossary=glossary)

    # Glossary creation is a long-running operation; wait for it to finish.
    result = operation.result(timeout)
    print("Created: {}".format(result.name))
    print("Input Uri: {}".format(result.input_config.gcs_source.input_uri))


create_glossary()
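The `parent` and `name` values used above are plain resource-name strings. As a rough illustration of what they look like, the sketch below rebuilds them by hand. The two helper functions and the sample IDs are hypothetical; in real code the client's own `glossary_path` helper is the better choice.

```python
def location_path(project_id, location):
    """Resource name of a project location, used as `parent`."""
    return f"projects/{project_id}/locations/{location}"


def glossary_path(project_id, location, glossary_id):
    """Resource name of a glossary inside that location."""
    return f"{location_path(project_id, location)}/glossaries/{glossary_id}"


# Placeholder IDs for illustration only.
print(location_path("my-project", "us-central1"))
print(glossary_path("my-project", "us-central1", "names-hi-en"))
```

Seeing the strings side by side makes it clear that the glossary must be referenced with the same location it was created in.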
Finally, translate the text using the glossary. You can use the following function:
from google.cloud import translate


def translate_text_with_glossary(
    text="23 राज्सव",
    project_id="YOUR_PROJECT_ID",
    glossary_id="YOUR_GLOSSARY_ID",
):
    """Translates a given text using a glossary."""
    client = translate.TranslationServiceClient()
    location = "us-central1"  # The location of the glossary
    parent = f"projects/{project_id}/locations/{location}"

    glossary = client.glossary_path(project_id, location, glossary_id)
    glossary_config = translate.TranslateTextGlossaryConfig(glossary=glossary)

    # Supported language codes: https://cloud.google.com/translate/docs/languages
    response = client.translate_text(
        request={
            "contents": [text],
            "target_language_code": "en",
            "source_language_code": "hi",
            "parent": parent,
            "glossary_config": glossary_config,
        }
    )

    print("Translated text: \n")
    # glossary_translations holds the results with the glossary applied.
    for translation in response.glossary_translations:
        print("\t {}".format(translation.translated_text))


translate_text_with_glossary()
This outputs:
Translated text:
23 Sourav
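The response carries both `translations` (the model output without the glossary) and `glossary_translations`. The helper below is a small sketch, not part of the client library, that prefers the glossary result and falls back to the plain one. The `SimpleNamespace` objects are stand-ins for a real response so that the snippet runs without credentials, and the name `best_translations` is an assumption.

```python
from types import SimpleNamespace


def best_translations(response):
    """Return glossary translations when present, otherwise the plain ones."""
    chosen = response.glossary_translations or response.translations
    return [t.translated_text for t in chosen]


# Stand-in for the object returned by client.translate_text(...).
fake_response = SimpleNamespace(
    translations=[SimpleNamespace(translated_text="23 Revenue")],
    glossary_translations=[SimpleNamespace(translated_text="23 Sourav")],
)
print(best_translations(fake_response))
```

A fallback like this can be handy if the same function is also called without a glossary configured.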
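I found a solution, at least for Indic languages.
There is a package called indic-nlp-library. Converting text according to its pronunciation rather than its meaning is called transliteration.
Official link: http://anoopkunchukuttan.github.io/indic_nlp_library/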
from indicnlp.transliterate.unicode_transliterate import ItransTransliterator

input_text = 'आज मौसम अच्छा है। इसलिए हम आज खेल सकते हैं!'

# Transliterate Hindi to Roman ('hi' is the language code for Hindi)
print(ItransTransliterator.to_itrans(input_text, 'hi'))
Output:
"aaja mausama achchaa hai.isalie hama aaja khela sakate hai "
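The romanized output keeps the inherent vowel at the end of words ("aaja", "mausama"), which spoken Hindi usually drops. As a rough post-processing sketch, the snippet below removes a word-final short "a" that follows a consonant. This is a simple heuristic of my own and not a feature of indic-nlp-library; the function names are hypothetical, and it does not handle vowels dropped in the middle of a word.

```python
import re

VOWELS = set("aeiouAEIOU")


def drop_final_schwa(word):
    """Drop a trailing short 'a' that directly follows a consonant."""
    # Very short words such as "na" are left alone.
    if len(word) > 2 and word.endswith("a") and word[-2] not in VOWELS:
        return word[:-1]
    return word


def soften_itrans(text):
    """Apply the heuristic to every run of ASCII letters, keeping the rest as is."""
    return re.sub(r"[A-Za-z]+", lambda m: drop_final_schwa(m.group(0)), text)


print(soften_itrans("aaja mausama achchaa hai"))
```

Long vowels such as the final "aa" in "achchaa" are untouched, because the character before the last "a" is itself a vowel.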