Improve insert performance for a C# application into SQL database
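I need to improve the performance of the inserts in my C# application. I first go out and get the data from a view. Then I loop through it with a foreach and insert into a table. I am working with over 200,000 records, and the task takes a long time to run. I know that SaveChanges is a round trip to the database, but I'm not sure how to get around that. Is there anything I can do to cut down the time?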
var values = db.TodaysAirs.ToList();
foreach (TodaysAir x in values)
{
    //check to see if this is a new value or one that needs to be updated
    var checkForNew = db.TodaysAirValues
        .Where(m => m.ID == x.ID);
    //new record
    if (checkForNew.Count() == 0)
    {
        TodaysAirValue newRecord = new TodaysAirValue();
        newRecord.ID = x.ID;
        newRecord.Logger_Id = x.Logger_Id;
        newRecord.SiteName = x.SiteName;
        newRecord.Latitude = x.Latitude;
        newRecord.Longitude = x.Longitude;
        newRecord.Hour = x.Hour;
        newRecord.Parameter = x.Parameter;
        newRecord.Stan = x.Stan;
        newRecord.Units = x.Units;
        newRecord.InstrumentType = x.InstrumentType;
        newRecord.NowCast = x.NowCast;
        newRecord.AQIValue = x.AQIValue;
        newRecord.HealthCategory = x.HealthCategory;
        newRecord.Hr24Avg = x.Hr24Avg;
        newRecord.Hr24Max = x.Hr24Max;
        newRecord.Hr24Min = x.Hr24Min;
        newRecord.SID = DateTime.Now;
        db.TodaysAirValues.Add(newRecord);
        db.SaveChanges();
        // CallJenkinsJob();
    }
}
The goal should be to run a single raw SQL statement that looks very much like this:
INSERT INTO TodaysAirValues
(ID, Logger_id, SiteName, Latitude, Longitude, Hour, Parameter,
Stan, Units, InstrumentType, NowCast, AQIValue, HealthCategory,
Hr24Avg, Hr24Max, Hr24Min, SID)
SELECT ta.ID, ta.Logger_id, ta.SiteName, ta.Latitude, ta.Longitude,
ta.Hour, ta.Parameter, ta.Stan, ta.Units, ta.InstrumentType,
ta.NowCast, ta.AQIValue, ta.HealthCategory, ta.Hr24Avg,
ta.Hr24Max, ta.Hr24Min, current_timestamp
FROM TodaysAirs ta
LEFT JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE tav.ID IS NULL
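The table and column names may not all be exactly right if the EF mapping differs from the database. You can also use NOT EXISTS() instead of the LEFT JOIN ... WHERE NULL technique, which may run faster.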
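For comparison, here is a rough sketch of the same insert written with NOT EXISTS. It assumes the same table and column names as the statement above, which may not match your actual schema, and whether it is actually faster depends on your indexes and the query plan, so measure both.

```sql
-- Sketch only: same insert as above, using NOT EXISTS
-- instead of LEFT JOIN ... WHERE tav.ID IS NULL.
-- Table and column names are assumed from the EF entities in the question.
INSERT INTO TodaysAirValues
    (ID, Logger_id, SiteName, Latitude, Longitude, Hour, Parameter,
     Stan, Units, InstrumentType, NowCast, AQIValue, HealthCategory,
     Hr24Avg, Hr24Max, Hr24Min, SID)
SELECT ta.ID, ta.Logger_id, ta.SiteName, ta.Latitude, ta.Longitude,
       ta.Hour, ta.Parameter, ta.Stan, ta.Units, ta.InstrumentType,
       ta.NowCast, ta.AQIValue, ta.HealthCategory, ta.Hr24Avg,
       ta.Hr24Max, ta.Hr24Min, current_timestamp
FROM TodaysAirs ta
WHERE NOT EXISTS (
    -- keep only source rows that have no matching row in the target yet
    SELECT 1
    FROM TodaysAirValues tav
    WHERE tav.ID = ta.ID
);
```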
I also see this:
If the Count is greater than 0 it checks to see if any changes were made and if so update the record.
In that case, precede the INSERT above with an UPDATE (run this extra command first!) that looks like this:
UPDATE tav
SET tav.Logger_id=ta.Logger_id, tav.SiteName=ta.SiteName,
tav.Latitude=ta.Latitude, tav.Longitude=ta.Longitude, tav.Hour=ta.Hour,
tav.Parameter=ta.Parameter, tav.Stan=ta.Stan, tav.Units=ta.Units,
tav.InstrumentType=ta.InstrumentType, tav.NowCast=ta.NowCast,
tav.AQIValue=ta.AQIValue, tav.HealthCategory=ta.HealthCategory,
tav.Hr24Avg=ta.Hr24Avg,tav.Hr24Max=ta.Hr24Max, tav.Hr24Min=ta.Hr24Min,
tav.SID=ta.SID -- possibly current_timestamp here instead
FROM TodaysAirs ta
INNER JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE (
-- compare here to decide if the record needs to update or not
)
Unfortunately, I don't have enough information about what you want to give you the complete code.
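That said, here is one possible shape for the comparison, purely as a sketch. It assumes SQL Server (the UPDATE ... FROM syntax above already implies it), assumes every non-key column should take part in the change check, and assumes SID should be refreshed to the current time on update. None of that is confirmed by the question, so adjust it to your rules.

```sql
-- Sketch only: update rows whose non-key columns differ from the view.
-- EXCEPT treats two NULLs as equal, so nullable columns need no
-- extra IS NULL handling in the comparison.
UPDATE tav
SET tav.Logger_id=ta.Logger_id, tav.SiteName=ta.SiteName,
    tav.Latitude=ta.Latitude, tav.Longitude=ta.Longitude, tav.Hour=ta.Hour,
    tav.Parameter=ta.Parameter, tav.Stan=ta.Stan, tav.Units=ta.Units,
    tav.InstrumentType=ta.InstrumentType, tav.NowCast=ta.NowCast,
    tav.AQIValue=ta.AQIValue, tav.HealthCategory=ta.HealthCategory,
    tav.Hr24Avg=ta.Hr24Avg, tav.Hr24Max=ta.Hr24Max, tav.Hr24Min=ta.Hr24Min,
    tav.SID=current_timestamp -- assumption: stamp the time of the update
FROM TodaysAirs ta
INNER JOIN TodaysAirValues tav ON tav.ID = ta.ID
WHERE EXISTS (
    -- the EXCEPT result is non-empty only when at least one column differs
    SELECT ta.Logger_id, ta.SiteName, ta.Latitude, ta.Longitude, ta.Hour,
           ta.Parameter, ta.Stan, ta.Units, ta.InstrumentType, ta.NowCast,
           ta.AQIValue, ta.HealthCategory, ta.Hr24Avg, ta.Hr24Max, ta.Hr24Min
    EXCEPT
    SELECT tav.Logger_id, tav.SiteName, tav.Latitude, tav.Longitude, tav.Hour,
           tav.Parameter, tav.Stan, tav.Units, tav.InstrumentType, tav.NowCast,
           tav.AQIValue, tav.HealthCategory, tav.Hr24Avg, tav.Hr24Max, tav.Hr24Min
);
```

Running this UPDATE first and the INSERT second keeps freshly inserted rows from being compared against themselves right away.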