C# compare fields from different lines in csv

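I am trying to compare the value at index 0 of the array for one line with the value at index 0 for the next line. Imagine a CSV where I have a unique identifier in the first column and a corresponding value in the second column.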

USER1, 1P
USER1, 3G
USER2, 1P
USER3, 1V

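What I would like to do is check the value at [0] on the following line (or the previous one, if that's easier) and, if the two are the same (as they are in the example), concatenate that line's value at index 1 onto the first. That is, the data should read as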

USER1, 1P, 3G
USER2, 1P
USER3, 1V

before it gets passed on to the next function. So far I have

private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null)
            {
                break;
            }
            contact.ContactId = parts[0];
            long nextLine;
            nextLine = parser.LineNumber + 1;
            // if line1 parts[0] == line2 parts[0] etc.
        }
    }
}

Does anyone have any suggestions? Thank you.

How about saving the array from the previous line into a variable:

private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        // Fields of the previous row; starts with an empty ID so there is always something to compare against
        string[] oldParts = new string[] { string.Empty };
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null || parts.Length < 1)
            {
                break;
            }

            if (oldParts[0] == parts[0])
            {
                // Same ID as the previous row: concat logic goes here
            }
            else
            {
                // Different ID: this row starts a new contact
                contact.ContactId = parts[0];
            }

            // Remember this row so the next iteration can compare against it
            oldParts = parts;
        }
    }
}
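To make the "concat logic" placeholder a bit more concrete, here is a minimal sketch of the previous-row comparison. It is not the asker's actual code: `AdjacentRowMerger` and `MergeAdjacent` are hypothetical names, the rows are passed in as already-split `string[]` values (for example from `ReadFields()`), and it assumes rows with the same ID are always next to each other, as in the sample data.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentRowMerger
{
    // Hypothetical helper: merges consecutive rows that share the same first field.
    // Assumption: rows with the same ID are adjacent (e.g. the file is sorted by ID).
    public static List<List<string>> MergeAdjacent(IEnumerable<string[]> rows)
    {
        var merged = new List<List<string>>();
        List<string> current = null;

        foreach (string[] parts in rows)
        {
            if (parts == null || parts.Length < 2)
            {
                continue; // skip blank or malformed rows
            }

            if (current != null && current[0] == parts[0])
            {
                // Same ID as the previous row: append this row's value to it.
                current.Add(parts[1]);
            }
            else
            {
                // New ID: start a new merged row holding the ID and its first value.
                current = new List<string> { parts[0], parts[1] };
                merged.Add(current);
            }
        }

        return merged;
    }
}
```

Each entry in the result holds the ID followed by all of its values, so `string.Join(", ", entry)` gives a line in the format asked for. If the same ID can show up again later in the file, this approach will produce two separate entries for it, and a dictionary or GroupBy is probably the safer choice.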

The easiest way to do something like this is to convert each line into an object. You can use CsvHelper, https://www.nuget.org/packages/CsvHelper/, to do the work for you, or you can iterate over each line and parse it into an object yourself. CsvHelper is a great tool, and it knows how to properly parse CSV files into a collection of objects. Then, whether you create the collection yourself or use CsvHelper, you can use LINQ to GroupBy, https://msdn.microsoft.com/en-us/library/bb534304(v=vs.100).aspx, your "key" (in this case UserId) and Aggregate, https://msdn.microsoft.com/en-us/library/bb549218(v=vs.110).aspx, the other property into a single string. You can then use the new grouped collection for your end goal (write it to a file, or use it wherever you need it).
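As a rough illustration of the GroupBy and Aggregate step, here is a sketch that works on a collection of objects that has already been built. The `UserValue` class, its property names, and `ToMergedLines` are all assumptions made up for this example; how the collection gets populated (CsvHelper or hand-rolled parsing) is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for one CSV line; the class and property names are assumptions.
public class UserValue
{
    public string UserId { get; set; }
    public string Value { get; set; }
}

public static class UserValueGrouping
{
    // Groups the records by UserId and folds each group's values into one string.
    public static List<string> ToMergedLines(IEnumerable<UserValue> records)
    {
        return records
            .GroupBy(r => r.UserId)
            .Select(g => g.Key + ", " +
                // A group always has at least one element, so the seedless
                // Aggregate overload is safe here.
                g.Select(r => r.Value).Aggregate((acc, next) => acc + ", " + next))
            .ToList();
    }
}
```

For the sample data this should give one string per user, with groups appearing in the order each UserId is first seen. `string.Join(", ", ...)` would do the same job as Aggregate here and is arguably easier to read; Aggregate is used only because that is what the answer suggests.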

You are basically finding all the unique entries, so put them into a dictionary keyed by contact ID, as follows:

// Requires: using System; using System.Collections.Generic; using System.Text;
//           using Microsoft.VisualBasic.FileIO;
private void csvParse(string path)
{
    using (TextFieldParser parser = new TextFieldParser(path))
    {
        parser.Delimiters = new string[] { "," };
        Dictionary<string, List<string>> uniqueContacts = new Dictionary<string, List<string>>();
        while (!parser.EndOfData)
        {
            string[] parts = parser.ReadFields();
            if (parts == null || parts.Length != 2)
            {
                break;
            }
            // if the contact id is not present in the dictionary, add it
            if (!uniqueContacts.ContainsKey(parts[0]))
                uniqueContacts.Add(parts[0], new List<string>());
            // now there's definitely an existing contact in the dictionary (the one
            // we've just added or a previously added one), so add to the
            // list of strings for that contact
            uniqueContacts[parts[0]].Add(parts[1]);
        }

        // now do something with that dictionary of unique user names and
        // lists of strings, for example dump them to the console in the
        // format you specify:
        foreach (var contactId in uniqueContacts.Keys)
        {
            var sb = new StringBuilder();
            sb.Append($"{contactId}, ");
            // string.Join puts ", " between values only; comparing each value
            // with Last() would drop a separator whenever a value repeats
            sb.Append(string.Join(", ", uniqueContacts[contactId]));
            Console.WriteLine(sb);
        }
    }
}

If I understand you correctly, what you are essentially asking is "how do I group the values in the second column based on the values in the first column?".

A quick and fairly concise way of doing this is to group by using LINQ:

var linesGroupedByUser =
    from line in File.ReadAllLines(path)
    let elements = line.Split(',')
    // Trim removes the space that follows each comma in the sample data
    let user = new { Name = elements[0].Trim(), Value = elements[1].Trim() }
    group user by user.Name into users
    select users;

foreach (var user in linesGroupedByUser)
{
    string valuesAsString = String.Join(", ", user.Select(x => x.Value));

    Console.WriteLine(user.Key + ", " + valuesAsString);
}

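I didn't use your TextFieldParser class, but you could easily use it instead. However, this approach does require that you can afford to load all the data into memory. You don't mention whether that is viable.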
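For what it's worth, here is one way the TextFieldParser could be plugged into the same LINQ grouping. Treat it as a sketch: `ParserGrouping`, `ReadRows` and `GroupLines` are hypothetical names, and it takes a `TextReader` rather than a path purely so it can be tried against a `StringReader`. Note that GroupBy still buffers every row before yielding the first group, so this does not lift the memory requirement mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO;

public static class ParserGrouping
{
    // Hypothetical helper: lazily yields the fields of each row read by TextFieldParser.
    public static IEnumerable<string[]> ReadRows(TextReader reader)
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.SetDelimiters(",");
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                // Only pass on rows that have both an ID and a value.
                if (fields != null && fields.Length >= 2)
                {
                    yield return fields;
                }
            }
        }
    }

    // Groups the second column by the first and formats one line per ID.
    public static List<string> GroupLines(TextReader reader)
    {
        return ReadRows(reader)
            .GroupBy(f => f[0], f => f[1])
            .Select(g => g.Key + ", " + string.Join(", ", g))
            .ToList();
    }
}
```

With a file you would call it as `ParserGrouping.GroupLines(new StreamReader(path))`. TextFieldParser trims whitespace around fields by default, so the space after each comma in the sample data should not end up in the values.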