C#: take a duplicate entry in a CSV file and remove the duplicate by taking an average
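My program creates a .csv file containing people's names, each with an integer next to it.
Sometimes the file contains two entries for the same name, each with a different time. I only want one entry per person.
I would like to take the average of the two numbers and produce a single row for that name, where the number is the average of the two existing values.
Here Alex Pitt has two numbers. How can I take the average of 105 and 71 (in this case) to produce a row that contains just Alex Pitt, 88?
For reference, this is how I create the CSV file.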
public void CreateCsvFile()
{
    PaceCalculator ListGather = new PaceCalculator();
    List<string> NList = ListGather.NameGain();
    List<int> PList = ListGather.PaceGain();
    List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
    string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
    using (var file = File.CreateText(filepath))
    {
        foreach (var arr in nAndPList)
        {
            if (arr == null || arr.Length == 0) continue;
            file.Write(arr[0]);
            for (int i = 1; i < arr.Length; i++)
            {
                file.Write(arr[i]);
            }
            file.WriteLine();
        }
    }
}
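Change the following line:
    List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();
to:
    // Pair each name with its pace first, so the grouping key is still available,
    // then group by name and average the paces within each group.
    List<string> nAndPList = NList.Zip(PList, (a, b) => new { Name = a, Pace = b })
        .GroupBy(x => x.Name)
        .Select(g => g.Key + ", " + g.Average(x => x.Pace))
        .ToList();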
First, you can write your current CreateCsvFile much more simply, like this:
public void CreateCsvFile()
{
    var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
    var ListGather = new PaceCalculator();
    var records =
        ListGather.NameGain()
            .Zip(ListGather.PaceGain(),
                (a, b) => String.Format("{0},{1}", a, b));
    File.WriteAllLines(filepath, records);
}
Now, if you have duplicate names, it can easily be changed to work out the average pace, like this:
public void CreateCsvFile()
{
    var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
    var ListGather = new PaceCalculator();
    var records =
        from record in ListGather.NameGain()
            .Zip(ListGather.PaceGain(),
                (a, b) => new { Name = a, Pace = b })
        group record.Pace by record.Name into grs
        select String.Format("{0},{1}", grs.Key, grs.Average());
    File.WriteAllLines(filepath, records);
}
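One detail worth keeping in mind with the query above: Average() returns a double, so a pair such as 105 and 72 would be written as 88.5, and under a culture that uses a decimal comma the value could introduce an extra CSV column. The following is a minimal sketch of one way to get a whole number like the Alex Pitt, 88 row in the question. The names PaceAverager and AverageByName and the sample data in the usage note are hypothetical, and rounding midpoints away from zero is an assumed choice, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class PaceAverager
{
    // Hypothetical helper: pairs names with paces, averages the paces per name,
    // rounds to a whole number, and formats each row as "Name,Pace".
    public static List<string> AverageByName(IEnumerable<string> names, IEnumerable<int> paces)
    {
        return names
            .Zip(paces, (name, pace) => new { Name = name, Pace = pace })
            .GroupBy(r => r.Name, r => r.Pace)
            .Select(g => string.Format(
                CultureInfo.InvariantCulture, // keeps the output independent of the machine's locale
                "{0},{1}",
                g.Key,
                // Assumed rounding rule: x.5 rounds away from zero (88.5 becomes 89).
                (int)Math.Round(g.Average(), MidpointRounding.AwayFromZero)))
            .ToList();
    }
}
```

Called with the names Alex Pitt, Sam Hill, Alex Pitt and the paces 105, 90, 71 (Sam Hill being made-up sample data), this yields the rows Alex Pitt,88 and Sam Hill,90, which can then be passed to File.WriteAllLines as before.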
I would suggest merging the duplicates before you write anything to the CSV file.
Use:
// Holds each name exactly once. Distinct() removes the repeats, so this is the
// list of unique names rather than the list of duplicates. NList is used
// because I assume the names are the important part.
List<string> duplicateChecker = NList.Distinct().ToList();
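Now you can simply iterate over the new list and search for each of its values in your NList. Use a foreach loop to find the indices in NList at which the name occurs. After that, you can use those indices to combine the matching integers in PList with a simple math method.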
// Something like this:
Make a foreach loop over every entry in your duplicateChecker =>
Make sure duplicateChecker has been through Distinct so you won't go through the same name twice =>
Get the value of the current string and search for it in NList =>
Get the index of each matching element in NList and look up the same index in PList =>
Get the integer from PList and store it in an array =>
// Make sure your math method runs before a new name starts. After that, store the new values in your nAndPList.
Once the loop is through with the current name, apply the math method (here, the average).
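The outline above can be sketched roughly as follows. This is one possible reading of the steps, not a definitive implementation: the names DuplicateMerger and MergeDuplicates are hypothetical, the parameters stand in for NList and PList, and integer division (which drops any fraction) is an assumed choice for the "math method".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateMerger
{
    // Hypothetical helper following the outline: for each distinct name,
    // collect the paces at every index where that name occurs, then average them.
    public static List<string> MergeDuplicates(List<string> nList, List<int> pList)
    {
        var nAndPList = new List<string>();

        foreach (string name in nList.Distinct())
        {
            var paces = new List<int>();

            // Walk the name list by index so the matching pace can be read.
            for (int i = 0; i < nList.Count && i < pList.Count; i++)
            {
                if (nList[i] == name)
                {
                    paces.Add(pList[i]);
                }
            }

            // A name can end up with no paces if pList is shorter than nList.
            if (paces.Count == 0) continue;

            // Assumed "math method": integer average, truncating any fraction.
            int average = paces.Sum() / paces.Count;
            nAndPList.Add(name + ", " + average);
        }

        return nAndPList;
    }
}
```

This rescans the whole name list once per distinct name, which is fine for a small schedule file; a GroupBy-based version does the same work in a single pass.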
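I hope you understand what I'm trying to say. However, I would recommend using a unique identifier for your people. Sooner or later there will be two people with the same name (as in a large company).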
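To illustrate the unique-identifier suggestion, here is a minimal sketch that groups by an ID instead of by name. Everything in it is an assumption for illustration: the original program has no Id field, and PaceEntry, IdGrouping, and AverageById are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape: the Id property is not part of the original program.
public class PaceEntry
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Pace { get; set; }
}

public static class IdGrouping
{
    // Groups by Id rather than Name, so two different people who share a name
    // stay on separate rows. Each row is formatted as "Id,Name,AveragePace".
    public static List<string> AverageById(IEnumerable<PaceEntry> entries)
    {
        return entries
            .GroupBy(e => e.Id)
            .Select(g => string.Format("{0},{1},{2}",
                g.Key,
                g.First().Name,
                // Math.Round defaults to banker's rounding (midpoints go to the even number).
                (int)Math.Round(g.Average(e => e.Pace))))
            .ToList();
    }
}
```

With two entries for Id 1 (paces 105 and 71) and one entry for Id 2 (pace 90), all named Alex Pitt, the result is two rows, 1,Alex Pitt,88 and 2,Alex Pitt,90, instead of a single row that blends two different people.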