Python - loop through N records at a time and then start again
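I am trying to write a script that calls the Google Translation API to translate every row of an Excel file with 1000 rows.
I use pandas to load the file and read the values from a specific column, append them to a list, and then call the Google API to translate them: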
import os
from google.cloud import translate_v2 as translate
import pandas as pd
from datetime import datetime

# Variable for GCP service account credentials
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'path to credentials json'

# Path to the file
filepath = r'../file.xlsx'

# Instantiate the Google Translation API Client
translate_client = translate.Client()

# Read all the information from the Excel file within 'test' sheet name
df = pd.read_excel(filepath, sheet_name='test')

# Define an empty list
elements = []

# Loop the data frame and append the list
for i in df.index:
    elements.append(df['EN'][i])

# Loop the list and translate each line
for item in elements:
    output = translate_client.translate(
        elements,
        target_language='fr'
    )

result = [
    element['translatedText'] for element in output
]

print("The values corresponding to key : " + str(result))
After appending, the list holds 1000 elements. The problem is that the Google Translation API rejects a request that contains too many segments and returns the error below:
400 POST https://translation.googleapis.com/language/translate/v2: Too many text segments
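I looked into it and found that sending 100 rows at a time (in my case) is a workaround. Now I am a bit stuck.
How would I write a loop that iterates over 100 rows at a time, translates those 100 rows, does something with the result, and then moves on to the next 100 rows, and so on until the end?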
Assuming you can pass a list into a single translate call, you could do something like this:
# Define a helper to step through the list in chunks
def chunker(seq, size):
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))

# Then iterate and handle them accordingly
output = []
for chunk in chunker(elements, 100):
    temp = translate_client.translate(
        chunk,
        target_language='fr'
    )
    output.extend(temp)
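
To show the whole loop end to end without needing credentials, here is a minimal sketch of the same chunking idea. Note that `translate_in_batches` and `fake_translate` are names I made up for illustration; `fake_translate` is only a stand-in that returns dicts with a 'translatedText' key, which is the shape the question's code reads from the real client.

```python
def chunker(seq, size):
    """Return a generator of consecutive slices of seq, each at most `size` long."""
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


def translate_in_batches(texts, translate_batch, batch_size=100):
    """Translate texts one batch at a time and return results in input order.

    translate_batch is any callable that takes a list of strings and returns
    a list of dicts containing a 'translatedText' key.
    """
    translated = []
    for chunk in chunker(texts, batch_size):
        response = translate_batch(chunk)
        # Do something with each batch here; this sketch just collects the text.
        translated.extend(item['translatedText'] for item in response)
    return translated


# Hypothetical stand-in for the real API call so the sketch runs offline.
def fake_translate(batch):
    assert len(batch) <= 100, "batch larger than the limit assumed in the question"
    return [{'translatedText': 'fr:' + text, 'input': text} for text in batch]


elements = ['line %d' % i for i in range(250)]
result = translate_in_batches(elements, fake_translate)
print(len(result))  # 250
print(result[0])    # fr:line 0
```

With the real client you would pass something like `lambda chunk: translate_client.translate(chunk, target_language='fr')` instead of `fake_translate`. Since the results come back in the same order as the input, you should then be able to assign them to a new column of the data frame, assuming the API returns one result per row.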