Take unique records from a text file using LINQ
I have set up an ICollection<User> list:
public ICollection<User> MyUsers { get; set; }

public IList<User> GetUserList(string path)
{
    MyUsers = File.ReadAllLines(path)
        .Where(linia => linia.Length > 1)
        .Select(line => Parse(line))
        .ToList();
    return new List<User>(MyUsers);
}

private static User Parse(string line)
{
    var column = line.Split('|');
    return new User
    {
        ReadTime = column[0],
        idUser = column[1],
        LastName = column[2],
        FirstName = column[3],
        City = column[4]
    };
}
My source text file looks like this:
2019-03-03|1|LN1|FN1|Berlin
2019-03-03|2|LN2|FN2|Rome
2019-03-03|3|LN3|FN3|Wien
2019-03-03|4|LN4|FN4|Londyn
....
2019-03-27|1|LN1|FN1|Berlin
2019-03-27|2|LN2|FN2|Rome
2019-03-27|3|LN3|FN3|Wien
2019-03-27|4|LN4|FN4|Londyn
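When I run this, I get a list containing the same records several times, where only ReadTime differs.
How can I build a unique MyUsers list in which the ReadTime column holds the latest date?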
You can try a simple GroupBy approach:
MyUsers = File.ReadAllLines(path)
    .Where(linia => linia.Length > 1)
    .Select(line => Parse(line))
    .GroupBy(
        u => u.idUser,
        (key, grp) => new User() {
            // ReadTime is a string; Max works because yyyy-MM-dd sorts chronologically as text
            ReadTime = grp.Select(u => u.ReadTime).Max(),
            idUser = key,
            LastName = grp.Select(u => u.LastName).FirstOrDefault(),
            FirstName = grp.Select(u => u.FirstName).FirstOrDefault(),
            City = grp.Select(u => u.City).FirstOrDefault(),
        })
    .ToList();
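For anyone who wants to try the GroupBy idea without a file on disk, here is a minimal, self-contained sketch. It differs slightly from the snippet above: instead of combining Max with FirstOrDefault column by column, it keeps the whole row with the greatest ReadTime for each idUser. The shape of the User class, the Demo class, and the LatestPerUser helper are assumptions made for illustration; the sample lines are taken from the question. It also assumes ReadTime is always in yyyy-MM-dd form, so ordinal string comparison matches date order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the User class; the question does not show it.
public class User
{
    public string ReadTime { get; set; } = "";
    public string idUser { get; set; } = "";
    public string LastName { get; set; } = "";
    public string FirstName { get; set; } = "";
    public string City { get; set; } = "";
}

public static class Demo
{
    // Same column layout as Parse in the question.
    public static User Parse(string line)
    {
        var column = line.Split('|');
        return new User
        {
            ReadTime = column[0],
            idUser = column[1],
            LastName = column[2],
            FirstName = column[3],
            City = column[4]
        };
    }

    // Hypothetical helper: one User per idUser, taken from the row with the latest ReadTime.
    public static List<User> LatestPerUser(IEnumerable<string> lines) =>
        lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(Parse)
            .GroupBy(u => u.idUser)
            .Select(g => g.OrderByDescending(u => u.ReadTime, StringComparer.Ordinal).First())
            .ToList();

    public static void Main()
    {
        var lines = new[]
        {
            "2019-03-03|1|LN1|FN1|Berlin",
            "2019-03-03|2|LN2|FN2|Rome",
            "2019-03-27|1|LN1|FN1|Berlin",
            "2019-03-27|2|LN2|FN2|Rome"
        };

        foreach (var u in LatestPerUser(lines))
            Console.WriteLine($"{u.ReadTime}|{u.idUser}|{u.LastName}|{u.FirstName}|{u.City}");
        // Prints:
        // 2019-03-27|1|LN1|FN1|Berlin
        // 2019-03-27|2|LN2|FN2|Rome
    }
}
```

Taking the whole latest row means that if a user's City or name changes between reads, the result reflects the most recent values rather than the first ones seen.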
You can use the MoreLINQ NuGet package, which has a useful DistinctBy function:
MyUsers = File.ReadAllLines(path)
    .Where(linia => linia.Length > 1)
    .Select(line => Parse(line))
    // Newest first, so the row kept for each key is the one with the latest ReadTime
    .OrderByDescending(r => r.ReadTime)
    .DistinctBy(r => new { r.City, r.FirstName, r.idUser, r.LastName })
    .ToList();
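If adding MoreLINQ is not an option, .NET 6 and later include a built-in Enumerable.DistinctBy with the same call shape. The sketch below is only an illustration under a few assumptions: it targets .NET 6 or newer, uses a hypothetical User record in place of the class from the question, and relies on the first element seen for each key being the one that is kept, which is how the built-in implementation behaves as far as I know.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record standing in for the User class from the question.
public record User(string ReadTime, string idUser, string LastName, string FirstName, string City);

public static class DistinctByDemo
{
    // Same column layout as Parse in the question.
    public static User Parse(string line)
    {
        var c = line.Split('|');
        return new User(c[0], c[1], c[2], c[3], c[4]);
    }

    // Hypothetical helper: sort newest first, then keep the first row seen per user.
    public static List<User> LatestDistinct(IEnumerable<string> lines) =>
        lines
            .Where(line => line.Length > 1)
            .Select(Parse)
            .OrderByDescending(r => r.ReadTime, StringComparer.Ordinal)
            .DistinctBy(r => (r.idUser, r.LastName, r.FirstName, r.City)) // built-in since .NET 6
            .ToList();

    public static void Main()
    {
        var lines = new[]
        {
            "2019-03-03|1|LN1|FN1|Berlin",
            "2019-03-27|1|LN1|FN1|Berlin",
            "2019-03-03|2|LN2|FN2|Rome"
        };

        foreach (var u in LatestDistinct(lines))
            Console.WriteLine(u);
    }
}
```

The key here is a value tuple rather than an anonymous type; both compare by value, so either works as a DistinctBy key.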
We can GroupBy and find the Max date of each group:
IEnumerable<string> result = File
    .ReadLines(path)
    .Where(line => !string.IsNullOrWhiteSpace(line)) // to be on the safe side
    .Select(line => {
        int p = line.IndexOf('|');
        return new {
            date = line.Substring(0, p), // date to take Max
            key = line.Substring(p + 1)  // group key: everything after the date
        };
    })
    .GroupBy(item => item.key, item => item.date)
    // Put the date back in front so the line keeps the column order Parse expects
    .Select(chunk => string.Join("|", chunk.Max(item => item), chunk.Key));
Having filtered out the duplicates, we can parse the result into a collection:
MyUsers = result
    .Select(line => Parse(line))
    .ToList();