C# take a duplicate entry in a CSV file and remove the duplicate by taking an average

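My program creates a .csv file containing a person's name with an integer next to it.

Sometimes the file has two entries for the same name, but with different times. I only want one instance of each person.

I would like to take the average of the two numbers and produce a single row for that name, where the number is the average of the two existing ones.

Here Alex Pitt has two numbers. How can I take the average of 105 and 71 (in this case) to produce a row that contains just Alex Pitt, 88?

In case it is needed for reference, this is how I create the CSV file.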

public void CreateCsvFile()
    {
        PaceCalculator ListGather = new PaceCalculator();
        List<string> NList = ListGather.NameGain();
        List<int> PList = ListGather.PaceGain();

        List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();

        string filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";

        using (var file = File.CreateText(filepath))
        {
            foreach (var arr in nAndPList)
            {
                if (arr == null || arr.Length == 0) continue;
                file.Write(arr[0]);
                for (int i = 1; i < arr.Length; i++)
                {
                    file.Write(arr[i]);
                }
                file.WriteLine();
            }
        }
    }

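Change the following line:

 List<string> nAndPList = NList.Zip(PList, (a, b) => a + ", " + b).ToList();

to:

// Keep name and pace as separate fields so there is something to group by
List<string> nAndPList = NList.Zip(PList, (a, b) => new { Name = a, Pace = b })
                        .GroupBy(x => x.Name)   // the field you want to group by
                        // First() keeps only the first entry per name;
                        // use g.Average(x => x.Pace) here to average the duplicates instead
                        .Select(g => g.Key + ", " + g.First().Pace)
                        .ToList();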
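
For the averaging the question asks for, the same grouping idea can be sketched as below. This is only a sketch under assumptions: `PaceAverager` and `AverageByName` are hypothetical names, the two inputs are assumed to be parallel lists like `NList` and `PList`, and the average is rounded to a whole number with `Math.Round`, which rounds exact halves to the nearest even number by default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaceAverager
{
    // Hypothetical helper (not part of the original program): pairs each name
    // with its pace, groups the pairs by name, and returns one
    // "Name, average" line per distinct name.
    public static List<string> AverageByName(IEnumerable<string> names, IEnumerable<int> paces)
    {
        return names
            .Zip(paces, (name, pace) => new { Name = name, Pace = pace })
            .GroupBy(x => x.Name)
            // Average() returns a double; round it and cast to get a whole number
            .Select(g => g.Key + ", " + (int)Math.Round(g.Average(x => x.Pace)))
            .ToList();
    }
}
```

Calling `PaceAverager.AverageByName(new[] { "Alex Pitt", "Alex Pitt" }, new[] { 105, 71 })` should give the single line `Alex Pitt, 88`.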

First of all, you can write your current CreateCsvFile much more simply, like this:

public void CreateCsvFile()
{
    var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
    var ListGather = new PaceCalculator();

    var records =
        ListGather.NameGain()
            .Zip(ListGather.PaceGain(),
                (a, b) => String.Format("{0},{1}", a, b));

    File.WriteAllLines(filepath, records);
}

Now, if you have duplicate names, it can easily be changed to work out the average pace, like this:

public void CreateCsvFile()
{
    var filepath = @"F:\A2 Computing\C# Programming Project\ScheduleFile.csv";
    var ListGather = new PaceCalculator();

    var records =
        from record in ListGather.NameGain()
            .Zip(ListGather.PaceGain(),
                (a, b) => new { Name = a, Pace = b })
        group record.Pace by record.Name into grs
        select String.Format("{0},{1}", grs.Key, grs.Average());

    File.WriteAllLines(filepath, records);
}
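
One thing to watch with the averaged version: `Average()` returns a `double`, and `String.Format` without an explicit culture uses the current culture, so on a machine whose decimal separator is a comma a value such as 87.5 could be written as `87,5` and add an extra column to the CSV. A minimal sketch of one way around this, assuming a hypothetical helper named `CsvLineFormatter.FormatLine` and at most one decimal place in the output:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CsvLineFormatter
{
    // Hypothetical helper (not part of the original answer): formats one
    // "name,average" CSV line. InvariantCulture forces "." as the decimal
    // separator regardless of the machine's regional settings.
    public static string FormatLine(string name, IEnumerable<int> paces)
    {
        double average = paces.Average();
        // "0.#" prints whole numbers without a decimal part and keeps
        // at most one decimal digit otherwise.
        return string.Format(CultureInfo.InvariantCulture, "{0},{1:0.#}", name, average);
    }
}
```

With the numbers from the question, `CsvLineFormatter.FormatLine("Alex Pitt", new[] { 105, 71 })` should produce `Alex Pitt,88`.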

I would suggest merging the duplicates before you put everything into the CSV file.

Use:

// Will hold every name exactly once
List<string> duplicateChecker = new List<string>();

// Distinct() drops the repeated names, leaving one entry per person.
// I'm using NList because I assume the names are the important part.
duplicateChecker = NList.Distinct().ToList();

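Now you can simply iterate over the new list and search for its values in your NList. Use a foreach loop to find the indices of each name in NList. After that, you can use those indices to merge the integers with some simple math.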

//Something like this:
Make a foreach loop over every entry in your duplicateChecker =>

Use Distinct again on duplicateChecker to make sure you don't go through the same duplicate twice =>

Get the value of the current string and search for it in NList =>

Get the index of each matching element in NList and look up the same index in PList =>

Get the integer from PList and store it in an array =>

// Make sure your math method runs before a new name starts. After that, store the new values in your nAndPList
Once the loop is through with the first name, use a math method.
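
The steps above could be sketched roughly as follows. This is an illustration only: `DuplicateMerger.Merge` is a hypothetical name, the "math method" is assumed to be a rounded average, and names that have no matching pace are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateMerger
{
    // Hypothetical helper following the steps above: for each distinct name,
    // collect the paces stored at the same indices in the parallel pace list,
    // then merge them into one averaged line.
    public static List<string> Merge(List<string> nList, List<int> pList)
    {
        var result = new List<string>();

        foreach (string name in nList.Distinct())
        {
            var pacesForName = new List<int>();

            // Walk both lists by index so each name lines up with its pace
            for (int i = 0; i < nList.Count && i < pList.Count; i++)
            {
                if (nList[i] == name)
                {
                    pacesForName.Add(pList[i]);
                }
            }

            // Average() throws on an empty sequence, so skip names without a pace
            if (pacesForName.Count == 0) continue;

            // The "math method": here a rounded average (an assumption)
            int average = (int)Math.Round(pacesForName.Average());
            result.Add(name + ", " + average);
        }

        return result;
    }
}
```

This rescans the whole list once per distinct name, which is fine for small files; the GroupBy-based approach does the same work in a single pass.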

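I hope you understand what I'm trying to say. However, I would recommend using a unique identifier for your people. Sooner or later two different people with the same name will show up (as happens in a large company).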
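
To illustrate the unique-identifier suggestion, here is a minimal sketch that groups on an ID instead of the name. Everything in it is an assumption for illustration: the `PaceEntry` type, its `Id` property, and the `IdGrouping.AverageById` helper do not exist in the original program.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type: Id is an assumed unique identifier per person.
public sealed class PaceEntry
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int Pace { get; set; }
}

public static class IdGrouping
{
    // Groups by Id rather than Name, so two different people who share
    // a name are not averaged together. Output lines are "Id,Name,average".
    public static List<string> AverageById(IEnumerable<PaceEntry> entries)
    {
        return entries
            .GroupBy(e => e.Id)
            .Select(g => g.Key + "," + g.First().Name + ","
                         + (int)Math.Round(g.Average(e => e.Pace)))
            .ToList();
    }
}
```

With this shape, two entries sharing an Id are merged into one averaged row, while a second person who happens to have the same name keeps a separate row.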