Why does the BadDataFound handler get called 2 times for 1 bad record?
I've wired up a BadDataFound handler in my CsvHelper configuration:
csvConfig.BadDataFound = args =>
{
_output.WriteLine($"BadDataFound: {args.Field}...");
};
Sample CSV (abbreviated):
case|unique_id|record
1731030|1|"{"apiversion\":\"1.0\",\"zone\":\"west\"}"
1478634|1|"{\"apiversion\":\"1.0\",\"zone\":\"north\"}"
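The test file I'm trying has 1 bad record, in which a quoted field is missing an escape character. When I write a log message or set a breakpoint, I see the handler being called 2 times for that 1 bad record.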
BadDataFound: "{"apiversion\":\"1.0\",\"zone\":\"west\"}"
BadDataFound: "{"apiversion\":\"1.0\",\"zone\":\"west\"}"
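The quote before apiversion is missing its escape, but only 1 record has this problem.
As a result, I end up logging the problem twice.
Why does this handler fire 2 times? Is there a configuration option that controls this behavior?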
Update
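I was just thinking: since you are using a pipe "|" as your delimiter, you might be able to use CsvMode.Escape. However, you will run into problems if your JSON data contains a "|" or a line break.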
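using System;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;
using CsvHelper.Configuration.Attributes;

public class Program
{
	public static void Main()
	{
		var config = new CsvConfiguration(CultureInfo.InvariantCulture)
		{
			Delimiter = "|",
			Escape = '\\',        // a backslash char literal must itself be escaped in C#
			Mode = CsvMode.Escape // precede special characters with Escape instead of wrapping fields in quotes
		};

		using (var reader = new StringReader("case|unique_id|record\n1731030|1|\"{\"apiversion\\\":\\\"1.0\\\",\\\"zone\\\":\\\"west\\\"}\""))
		using (var csv = new CsvReader(reader, config))
		{
			// The original used LINQPad's Dump(); ToList() plus Console output works outside LINQPad.
			var records = csv.GetRecords<Foo>().ToList();
			foreach (var record in records)
			{
				Console.WriteLine($"{record.Case} | {record.UniqueId} | {record.Record}");
			}
		}
	}
}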
public class Foo
{
[Name("case")]
public int Case { get; set; }
[Name("unique_id")]
public int UniqueId { get; set; }
[Name("record")]
public string Record { get; set; }
}
Regarding the BadDataFound issue
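Unfortunately, I think this is a bug. It was reported by someone else on 10/5/2021: https://github.com/JoshClose/CsvHelper/issues/1873
A second user in that issue, craigc39, describes a potential (and, by his own account, hacky) workaround for his problem: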
There is definitely a way around this - but it's incredibly hacky. You would have to use the CSV Helper library twice - once to scan the CSV and record the bad rows - including the exact row number where they happen. That way, when you are generating your bad rows list, you can ensure that there are no duplicates. Second time using the CSV Library to read all the rows - and skip any rows that you recorded in bad rows in the scan. That way, the bad rows don't actually end up going into the good rows. I'm about to test out this solution and hoping it works.
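The same idea can probably be done in a single pass instead of reading the file twice: record each bad row from inside the handler, ignore repeat callbacks for a row and field that were already reported, and skip flagged rows when collecting good ones. This is a minimal sketch, not the workaround from the issue, and it assumes CsvHelper 27 or later (where BadDataFound receives a BadDataFoundArgs exposing Field and Context) and that the parser stays on the same record between the duplicate callbacks. BadDataFilter and ReadSplit are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Hypothetical helper: reads a pipe-delimited CSV once, logs each bad field
// only once even if BadDataFound fires repeatedly, and keeps bad rows out of
// the "good" list.
public static class BadDataFilter
{
    public static (List<string[]> Good, List<string> Messages) ReadSplit(string csvText)
    {
        var seen = new HashSet<(int Row, string Field)>(); // (row, field) pairs already logged
        var badRows = new HashSet<int>();                  // rows with at least one bad field
        var messages = new List<string>();
        var good = new List<string[]>();

        var config = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            Delimiter = "|",
            BadDataFound = args =>
            {
                // Assumption: every callback about a record happens while the
                // parser is still on that record, so Row identifies duplicates.
                int row = args.Context.Parser.Row;
                badRows.Add(row);

                // HashSet.Add returns false for a pair seen before, so a second
                // callback for the same bad field is not logged again.
                if (seen.Add((row, args.Field)))
                {
                    messages.Add($"BadDataFound on row {row}: {args.Field}");
                }
            }
        };

        using var reader = new StringReader(csvText);
        using var csv = new CsvReader(reader, config);

        csv.Read();
        csv.ReadHeader();

        while (csv.Read())
        {
            // Read every field so any bad one is processed (and reported)
            // before the row is classified as good or bad.
            var fields = new string[csv.Parser.Count];
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = csv.GetField(i);
            }

            if (!badRows.Contains(csv.Parser.Row))
            {
                good.Add(fields);
            }
        }

        return (good, messages);
    }
}
```

Keying on (row, field) rather than the row alone means two different bad fields in the same record are each logged once, while a repeated callback for the same field is dropped. If you map to a class with GetRecords<Foo>() instead of reading fields by index, checking csv.Parser.Row against badRows inside the foreach should work the same way, though that variant is untested here.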