Python - loop through N records at a time and then start again

I'm trying to write a script that calls the Google Translation API in order to translate every row of an Excel file that has 1000 rows.

I'm using pandas to load the file and read the values from a specific column, append them from the data frame to a list, and then use the Google API to translate:

import os
from google.cloud import translate_v2 as translate
import pandas as pd
from datetime import datetime

# Variable for GCP service account credentials

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'path to credentials json'

# Path to the file

filepath = r'../file.xlsx'

# Instantiate the Google Translation API Client

translate_client = translate.Client()

# Read all the information from the Excel file within 'test' sheet name

df = pd.read_excel(filepath, sheet_name='test')

# Define an empty list

elements = []

# Loop the data frame and append the list

for i in df.index:
    elements.append(df['EN'][i])

# Loop the list and translate - note this passes the whole 'elements' list on every iteration
for item in elements:
    output = translate_client.translate(
        elements,
        target_language='fr'
    )


result = [
    element['translatedText'] for element in output
]

print("The values corresponding to key : " + str(result))

After I append to the list, the total number of elements is 1000. The problem with the Google Translation API is that if you send it too many segments (as they call them) in a single request, it returns the error below:

400 POST https://translation.googleapis.com/language/translate/v2: Too many text segments

I've looked into it, and I found that sending 100 rows at a time (in my case) is a solution. Now I'm a bit stuck.

How would I write the loop so that it iterates over 100 rows at a time, translates those 100 rows, does something with the result, and then moves on to the next 100 rows, and so on until the end?

Assuming you're able to pass a list into a single translate call, perhaps you could do something like this:

# Define a helper to step thru the list in chunks
def chunker(seq, size):
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))

# Then iterate and handle the chunks accordingly
output = []
for chunk in chunker(elements, 100):
    temp = translate_client.translate(
        chunk,
        target_language='fr'
    )
    output.extend(temp)
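
Each chunk should come back as a list of dicts, so extending output keeps the translations in the original order. From there you can pull out the translated strings the same way as in the question. As a rough sketch (the 'FR' column name and the output file name are just placeholders, assuming you want the translations written back next to the source column):

# Extract the translated strings from the combined output
result = [element['translatedText'] for element in output]

# Optionally write them back into the data frame and save a copy
df['FR'] = result
df.to_excel(r'../file_translated.xlsx', sheet_name='test', index=False)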