How to reuse TCP connections with .NET BigQuery API?
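I am using .NET to stream data to the BigQuery API. In Process Explorer I noticed that new TCP/IP connections are created and torn down over and over. Is it possible to reuse a connection and avoid the considerable overhead of repeatedly opening and closing connections?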
    public async Task InsertAsync(BaseBigQueryTable table, IList<IDictionary<string, object>> rowList, GetBqInsertIdFunction getInsert, CancellationToken ct)
    {
        if (rowList.Count == 0)
        {
            return;
        }
        string tableId = table.TableId;
        IList<TableDataInsertAllRequest.RowsData> requestRows = rowList
            .Select(row => new TableDataInsertAllRequest.RowsData { Json = row, InsertId = getInsert(row) })
            .ToList();
        TableDataInsertAllRequest request = new TableDataInsertAllRequest { Rows = requestRows };
        bool needCreateTable = false;
        BigqueryService bqService = null;
        try
        {
            bqService = GetBigQueryService();
            TableDataInsertAllResponse response =
                await bqService.Tabledata.InsertAll(request, _account.ProjectId, table.DataSetId, tableId)
                    .ExecuteAsync(ct);
            IList<TableDataInsertAllResponse.InsertErrorsData> insertErrors = response.InsertErrors;
            if (insertErrors != null && insertErrors.Count > 0)
            {
                //handling errors, removed for easier reading..
            }
        }
        catch
        {
            //... removed for easier reading
        }
        finally
        {
            if (bqService != null)
                bqService.Dispose();
        }
    }

    private BigqueryService GetBigQueryService()
    {
        return new BigqueryService(new BaseClientService.Initializer
        {
            HttpClientInitializer = _credential,
            ApplicationName = _applicationName,
        });
    }
**Follow-up**
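The answer given below seems to be the only solution for reducing the number of HTTP connections. However, I found that batch requests may have some limitations when used on a large volume of real-time streaming data. See my other question: Google API BatchRequest: An established connection was aborted by the software in your host machine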
The link below describes how to batch API calls together to reduce the number of HTTP connections your client has to make:
https://cloud.google.com/bigquery/batch
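To make the batching idea concrete, here is a minimal sketch using the `BatchRequest` class from the Google API .NET client library (`Google.Apis.Requests`), which sends several queued calls in one HTTP request. The class name `BatchInsertSketch`, the method `InsertBatchAsync`, and its parameters are hypothetical names chosen for illustration. It assumes the `Google.Apis.Bigquery.v2` package is referenced and that the caller supplies an already-configured `BigqueryService`.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Google.Apis.Bigquery.v2;
using Google.Apis.Bigquery.v2.Data;
using Google.Apis.Requests;

public static class BatchInsertSketch
{
    // Hypothetical helper: queues one insertAll call per request object and
    // sends them together as a single batch HTTP request.
    // Returns the error messages of the inner requests that failed.
    public static async Task<IList<string>> InsertBatchAsync(
        BigqueryService service,
        string projectId,
        string datasetId,
        string tableId,
        IEnumerable<TableDataInsertAllRequest> requests,
        CancellationToken ct)
    {
        var failures = new List<string>();
        var batch = new BatchRequest(service);

        foreach (TableDataInsertAllRequest request in requests)
        {
            batch.Queue<TableDataInsertAllResponse>(
                service.Tabledata.InsertAll(request, projectId, datasetId, tableId),
                (response, error, index, message) =>
                {
                    // error is non-null when this inner request failed as a whole.
                    if (error != null)
                    {
                        failures.Add("request " + index + ": " + error.Message);
                    }
                    // Per-row problems, if any, are reported in response.InsertErrors.
                });
        }

        // Nothing was queued, so there is nothing to send.
        if (batch.Count == 0)
        {
            return failures;
        }

        await batch.ExecuteAsync(ct);
        return failures;
    }
}
```

Each inner request still gets its own callback, so per-request error handling stays possible. Whether batching suits a high-volume real-time stream is a separate question, as the follow-up above points out.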
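After the batch request has been sent, you can take the response and parse out all the job IDs involved. Alternatively, you can preset a job ID for each inner request in the batch. Note: you need to make sure these job IDs are unique.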
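One simple way to get unique job IDs is to append a GUID to a readable prefix. The sketch below is only an illustration: `JobIdFactory`, `NewJobId`, and `NewJobIds` are hypothetical names, not part of any Google library.

```csharp
using System;
using System.Collections.Generic;

public static class JobIdFactory
{
    // Hypothetical helper: builds a job ID from a caller-chosen prefix and a GUID
    // formatted as 32 hex digits, so IDs for different inner requests do not collide.
    public static string NewJobId(string prefix)
    {
        if (string.IsNullOrEmpty(prefix))
        {
            throw new ArgumentException("prefix must be non-empty", nameof(prefix));
        }
        return prefix + "_" + Guid.NewGuid().ToString("N");
    }

    // Generates one ID per inner request of a batch.
    public static IReadOnlyList<string> NewJobIds(string prefix, int count)
    {
        var ids = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            ids.Add(NewJobId(prefix));
        }
        return ids;
    }
}
```

The generated string would then be assigned to the job's `JobReference.JobId` before the request is queued, and kept on the client side so the job can be looked up later.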
After that, you can check the progress of each job via jobs.get: https://cloud.google.com/bigquery/docs/reference/v2/jobs/get
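Separately from batching, it may be worth checking how the service object is managed. The code in the question creates a new `BigqueryService` for every insert and disposes it afterwards. Disposing the service also disposes its underlying HTTP client, which would be consistent with the connection churn seen in Process Explorer. The sketch below assumes that this is the cause and keeps one service instance alive for reuse. `BigQueryServiceHolder` is a hypothetical name, and the constructor parameters stand in for the `_credential` and `_applicationName` fields from the question.

```csharp
using System;
using Google.Apis.Bigquery.v2;
using Google.Apis.Http;
using Google.Apis.Services;

public sealed class BigQueryServiceHolder : IDisposable
{
    // Lazy<T> is thread-safe by default, so concurrent callers share one instance.
    private readonly Lazy<BigqueryService> _service;

    public BigQueryServiceHolder(IConfigurableHttpClientInitializer credential, string applicationName)
    {
        _service = new Lazy<BigqueryService>(() =>
            new BigqueryService(new BaseClientService.Initializer
            {
                HttpClientInitializer = credential,
                ApplicationName = applicationName,
            }));
    }

    // Returns the same service (and therefore the same HTTP client) on every call.
    public BigqueryService Service
    {
        get { return _service.Value; }
    }

    // Dispose once, for example at application shutdown, not after each insert.
    public void Dispose()
    {
        if (_service.IsValueCreated)
        {
            _service.Value.Dispose();
        }
    }
}
```

With a holder like this, `InsertAsync` would call `holder.Service.Tabledata.InsertAll(...)` and drop the `finally` block that disposes the service after each request.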