How can I translate name and address text into English without changing the pronunciation, using the GCP Translate API in Python?

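I am trying to translate people's names and addresses from Hindi into English, and I want to keep the pronunciation intact. For example, “सौरव” should become “sourab”. Is there a parameter in the Google Translate client for Python that does this? There are some HTML attributes for it (see "Set google translate don't translate name"), but is there an equivalent parameter for Python?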

from google.cloud import translate_v2 as translate


def translate_text_with_model(target, text, model="nmt"):
    translate_client = translate.Client()

    # The client expects str, so decode raw bytes first.
    if isinstance(text, bytes):
        text = text.decode("utf-8")

    result = translate_client.translate(text, target_language=target, model=model)

    print("Translation: {}".format(result["translatedText"]))


# Target language is English ("en"), matching the output shown below.
translate_text_with_model("en", "23 राज्सव", model="nmt")

Sourav, I was able to reproduce the issue. Running your code gives:

Translation: 23 Revenue

The expected output, however, renders “राज्सव” as the proper noun “Sourav”.

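Since you are dealing with proper nouns or names here, I suggest creating a glossary so that the API uses the glossary's preset values instead of translating those words. You can read about glossaries and how to implement them in the documentation.

Here are the code and the glossary I created to get the expected output.

glossary.csv (this file must be uploaded to a Google Cloud Storage bucket)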

hi,en,pos,description
राज्सव,Sourav,noun,Name of user

Note that each new row represents another word you would like to override with your glossary and you can also add more language columns.
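
If the glossary has many entries, the CSV can be generated rather than typed by hand. The following is a minimal sketch using only the standard library; the helper name `build_glossary_csv` is hypothetical and not part of any Google client, and it assumes the same column layout as the file above (language codes first, then `pos` and `description`).

```python
import csv
import io


def build_glossary_csv(language_codes, rows):
    """Return glossary CSV text: one header row, then one row per term."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header: the language codes, followed by the pos and description columns.
    writer.writerow(list(language_codes) + ["pos", "description"])
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


glossary_text = build_glossary_csv(
    ["hi", "en"],
    [["राज्सव", "Sourav", "noun", "Name of user"]],
)
print(glossary_text)
```

The resulting text still has to be saved as glossary.csv and uploaded to the bucket before the glossary can be created.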

create_glossary.py (YOUR_GLOSSARY_ID is any identifier you choose; you will use it later to refer to the glossary when translating text)

from google.cloud import translate_v3 as translate


def create_glossary(
    project_id="YOUR_PROJECT_ID",
    input_uri="YOUR_INPUT_URI", #gs://YOUR_BUCKET_NAME/glossary.csv
    glossary_id="YOUR_GLOSSARY_ID",
    timeout=180,
):
    """
    Create an equivalent term sets glossary. Glossary entries can be words or
    short phrases (usually fewer than five words).
    https://cloud.google.com/translate/docs/advanced/glossary#format-glossary
    """
    client = translate.TranslationServiceClient()

    # Supported language codes: https://cloud.google.com/translate/docs/languages
    source_lang_code = "hi"
    target_lang_code = "en"
    location = "us-central1"  # The location of the glossary

    name = client.glossary_path(project_id, location, glossary_id)
    language_codes_set = translate.types.Glossary.LanguageCodesSet(
        language_codes=[source_lang_code, target_lang_code]
    )

    gcs_source = translate.types.GcsSource(input_uri=input_uri)

    input_config = translate.types.GlossaryInputConfig(gcs_source=gcs_source)

    glossary = translate.types.Glossary(
        name=name, language_codes_set=language_codes_set, input_config=input_config
    )

    parent = f"projects/{project_id}/locations/{location}"
    # glossary is a custom dictionary Translation API uses
    # to translate the domain-specific terminology.
    operation = client.create_glossary(parent=parent, glossary=glossary)

    result = operation.result(timeout)
    print("Created: {}".format(result.name))
    print("Input Uri: {}".format(result.input_config.gcs_source.input_uri))
    
create_glossary()
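
For orientation, the `parent` and glossary `name` used in the script are plain resource-name strings. The sketch below only illustrates their shape; `glossary_resource_names` is a hypothetical helper, and the project and glossary IDs are placeholders. In real code, `client.glossary_path(...)` from the script above is the call to use.

```python
def glossary_resource_names(project_id, location, glossary_id):
    """Build the parent and glossary resource names as plain strings."""
    parent = f"projects/{project_id}/locations/{location}"
    name = f"{parent}/glossaries/{glossary_id}"
    return parent, name


# Placeholder IDs, for illustration only.
parent, name = glossary_resource_names("my-project", "us-central1", "names-hi-en")
print(parent)
print(name)
```

The location must be the same when creating the glossary and when translating with it, which is why both scripts use "us-central1".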

Finally, translate the text using the glossary with the following function:

from google.cloud import translate


def translate_text_with_glossary(
    text="23 राज्सव",
    project_id="YOUR_PROJECT_ID",
    glossary_id="YOUR_GLOSSARY_ID",
):
    """Translates a given text using a glossary."""

    client = translate.TranslationServiceClient()
    location = "us-central1"
    parent = f"projects/{project_id}/locations/{location}"

    glossary = client.glossary_path(
        project_id, "us-central1", glossary_id  # The location of the glossary
    )

    glossary_config = translate.TranslateTextGlossaryConfig(glossary=glossary)

    # Supported language codes: https://cloud.google.com/translate/docs/languages
    response = client.translate_text(
        request={
            "contents": [text],
            "target_language_code": "en",
            "source_language_code": "hi",
            "parent": parent,
            "glossary_config": glossary_config,
        }
    )

    print("Translated text: \n")
    for translation in response.glossary_translations:
        print("\t {}".format(translation.translated_text))
        
translate_text_with_glossary()

This outputs:

Translated text: 
         23 Sourav

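I found a solution, at least for Indic languages. There is a package called indic-nlp-library. Converting text according to its pronunciation, rather than its meaning, is called transliteration.

Official link: http://anoopkunchukuttan.github.io/indic_nlp_library/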

from indicnlp.transliterate.unicode_transliterate import ItransTransliterator

input_text = 'आज मौसम अच्छा है। इसलिए हम आज खेल सकते हैं!'

# Transliterate Hindi to Roman (ITRANS); 'hi' is the language code for Hindi
print(ItransTransliterator.to_itrans(input_text, 'hi'))

Output: "aaja mausama achchaa hai.isalie hama aaja khela sakate hai "
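
To show what transliteration does at the character level, here is a toy sketch. It is not how indic-nlp-library is implemented; the table covers only the letters of the example name from the question, and `toy_transliterate` is a hypothetical name. Each consonant carries an inherent "a" unless a vowel sign or a virama follows it.

```python
# Toy table covering only the characters of the example name.
CONSONANTS = {
    "\u0938": "s",  # स
    "\u0930": "r",  # र
    "\u0935": "v",  # व
}
VOWEL_SIGNS = {
    "\u094c": "au",  # ौ
    "\u093e": "aa",  # ा
}
VIRAMA = "\u094d"  # ्, suppresses the inherent vowel


def toy_transliterate(text):
    """Map Devanagari characters to Latin letters, one character at a time."""
    pieces = []
    for index, char in enumerate(text):
        following = text[index + 1] if index + 1 < len(text) else ""
        if char in CONSONANTS:
            pieces.append(CONSONANTS[char])
            # Add the inherent vowel unless a vowel sign or virama follows.
            if following not in VOWEL_SIGNS and following != VIRAMA:
                pieces.append("a")
        elif char in VOWEL_SIGNS:
            pieces.append(VOWEL_SIGNS[char])
        elif char != VIRAMA:
            # Digits, spaces and unknown characters pass through unchanged.
            pieces.append(char)
    return "".join(pieces)


print(toy_transliterate("23 \u0938\u094c\u0930\u0935"))  # prints "23 saurava"
```

Like the library output above, this keeps the trailing inherent vowel ("saurava" rather than "sourav"), so a name that needs a specific spelling may still call for a glossary entry or manual cleanup.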