Assign unique value to field in duplicate records group during groupingBy
Based on the answer provided by devReddit, I grouped the CSV records (same client name) of the following test file (fake data):
CSV test file
id,name,mother,birth,center
1,Antonio Carlos da Silva,Ana da Silva, 2008/03/31,1
2,Carlos Roberto de Souza,Amália Maria de Souza,2004/12/10,1
3,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
4,Danilo da Silva Cardoso,Sônia de Paula Cardoso,2002/08/10,3
5,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
6,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
7,Antonio Carlos da Silva,Ana da Silva, 2008/03/31,1
8,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
9,Rosana Pereira de Campos,Ivana Maria de Campos,2002/07/16,3
10,Paula Cristina de Abreu,Cristina Pereira de Abreu,2014/10/25,2
11,Pedro de Albuquerque,Maria de Albuquerque,2006/04/03,2
12,Ralfo dos Santos Filho,Helena dos Santos,2012/02/21,4
Client entity
package entities;

public class Client {

    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;

    public Client() {
    }

    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getMother() {
        return mother;
    }

    public void setMother(String mother) {
        this.mother = mother;
    }

    public String getBirth() {
        return birth;
    }

    public void setBirth(String birth) {
        this.birth = birth;
    }

    public String getCenter() {
        return center;
    }

    public void setCenter(String center) {
        this.center = center;
    }

    @Override
    public String toString() {
        return "Client [id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth + ", center=" + center
                + "]";
    }
}
Program
package application;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import entities.Client;

public class Program {

    public static void main(String[] args) throws IOException {
        Pattern pattern = Pattern.compile(",");
        List<Client> file;
        // try-with-resources closes the file that Files.lines keeps open
        try (Stream<String> lines = Files.lines(Paths.get("src/Client.csv"))) {
            file = lines
                    .skip(1) // skip the header row
                    .map(line -> {
                        String[] fields = pattern.split(line);
                        return new Client(fields[0], fields[1], fields[2], fields[3], fields[4]);
                    })
                    .collect(Collectors.toList());
        }

        Map<String, List<Client>> grouped = file
                .stream()
                .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                .collect(Collectors.groupingBy(p -> p.getCenter(), LinkedHashMap::new,
                        Collectors.mapping(Function.identity(), Collectors.toList())));

        grouped.entrySet().forEach(System.out::println);
    }

    // Two records are duplicates when their ids differ but name, mother and birth are equal
    private static boolean isDuplicate(Client x, Client y) {
        return !x.getId().equals(y.getId())
                && x.getName().equals(y.getName())
                && x.getMother().equals(y.getMother())
                && x.getBirth().equals(y.getBirth());
    }
}
Final result (grouped by center)
1=[Client [id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1],
Client [id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1]]
2=[Client [id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]
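What I need
I need to assign a unique value to each group of duplicate records, restarting every time the center value changes, and also keep those records together, since the map doesn't guarantee that, as in the following example:
The number on the left is the grouping by center (1 and 2). Duplicate names share the same inner group number, starting at "1". When the center changes, the inner group number must restart at "1", and so on.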
1=[Client [group=1, id=1, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1],
Client [group=1, id=7, name=Antonio Carlos da Silva, mother=Ana da Silva, birth= 2008/03/31, center=1]]
// CENTER CHANGED (2) - Restart inner group number to "1" again.
2=[Client [group=1, id=3, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [group=1, id=6, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
Client [group=1, id=11, name=Pedro de Albuquerque, mother=Maria de Albuquerque, birth=2006/04/03, center=2],
// NAME CHANGED, BUT SAME CENTER YET - so increases by "1" (group=2)
Client [group=2, id=5, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [group=2, id=8, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2],
Client [group=2, id=12, name=Ralfo dos Santos Filho, mother=Helena dos Santos, birth=2012/02/21, center=2]]
If I understood correctly, you need to sort the already grouped entries by all three properties, name, mother and birth. You can apply that sort before collecting with groupingBy, using sorted:
Map<String, List<Client>> grouped = file.stream()
        .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
        .sorted(Comparator.comparing(Client::getName)
                .thenComparing(Client::getMother)
                .thenComparing(Client::getBirth))
        .collect(Collectors.groupingBy(Client::getCenter));
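Collectors.groupingBy uses Collectors.toList() as its default downstream, which keeps elements in encounter order, so each list preserves the order you already defined with sorted. The map returned by this overload has no ordering guarantee, though, so if the order of the centers themselves matters, keep the LinkedHashMap::new factory.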
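A minimal, self-contained sketch of the sort-then-group idea above. Assumptions: a trimmed stand-in `Client` class, a handful of the question's rows with the Ralfo records placed in center 2 (as in the question's printed output), and only the ids collected to keep the output short. `LinkedHashMap::new` is passed so that the order of the centers is predictable as well.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SortedGroupingDemo {

    // Trimmed stand-in for the question's Client entity (assumption: only these fields matter).
    static final class Client {
        private final String id, name, mother, birth, center;

        Client(String id, String name, String mother, String birth, String center) {
            this.id = id;
            this.name = name;
            this.mother = mother;
            this.birth = birth;
            this.center = center;
        }

        String getId() { return id; }
        String getName() { return name; }
        String getMother() { return mother; }
        String getBirth() { return birth; }
        String getCenter() { return center; }
    }

    static boolean isDuplicate(Client x, Client y) {
        return !x.getId().equals(y.getId())
                && x.getName().equals(y.getName())
                && x.getMother().equals(y.getMother())
                && x.getBirth().equals(y.getBirth());
    }

    // center -> ids of the duplicated clients, ordered by name, mother, birth
    static Map<String, List<String>> groupIds(List<Client> file) {
        return file.stream()
                .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
                .sorted(Comparator.comparing(Client::getName)
                        .thenComparing(Client::getMother)
                        .thenComparing(Client::getBirth))
                .collect(Collectors.groupingBy(Client::getCenter,
                        LinkedHashMap::new, // centers in order of first appearance
                        Collectors.mapping(Client::getId, Collectors.toList())));
    }

    static List<Client> sample() {
        return Arrays.asList(
                new Client("1", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("3", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("5", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"),
                new Client("6", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("7", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("8", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"),
                new Client("9", "Rosana Pereira de Campos", "Ivana Maria de Campos", "2002/07/16", "3"));
    }

    public static void main(String[] args) {
        System.out.println(groupIds(sample())); // {1=[1, 7], 2=[3, 6, 5, 8]}
    }
}
```

Because `Stream.sorted` is stable for ordered streams, records with equal name, mother and birth keep their file order (3 before 6, 5 before 8), and the non-duplicated row 9 is filtered out.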
Update:
To generate the groupId, you can have the Client entity generate it. Here is the updated Client:
package com.example.demo;

import java.util.Optional;

public class Client {

    private String id;
    private String name;
    private String mother;
    private String birth;
    private String center;
    private String groupId;

    public Client() {
    }

    public Client(String id, String name, String mother, String birth, String center) {
        this.id = id;
        this.name = name;
        this.mother = mother;
        this.birth = birth;
        this.center = center;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getMother() {
        return mother;
    }

    public void setMother(String mother) {
        this.mother = mother;
    }

    public String getBirth() {
        return birth;
    }

    public void setBirth(String birth) {
        this.birth = birth;
    }

    public String getCenter() {
        return center;
    }

    public void setCenter(String center) {
        this.center = center;
    }

    public Optional<String> getGroupId() {
        return Optional.ofNullable(groupId);
    }

    public void setGroupId(final String groupId) {
        this.groupId = groupId;
    }

    @Override
    public String toString() {
        return getGroupId().isPresent()
                ? "Client [groupId=" + groupId + ", id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth +
                        ", center=" + center + "]"
                : "Client [id=" + id + ", name=" + name + ", mother=" + mother + ", birth=" + birth + ", center=" + center + "]";
    }

    ///
    /// Other public methods
    ///
    public Client generateAndAssignGroupId() {
        setGroupId(String.format("**group=%s**", center));
        return this;
    }
}
And the new stream:
Map<String, List<Client>> grouped = file.stream()
        .filter(x -> file.stream().anyMatch(y -> isDuplicate(x, y)))
        .sorted(Comparator.comparing(Client::getName).thenComparing(Client::getMother).thenComparing(Client::getBirth))
        .collect(Collectors.groupingBy(Client::getCenter,
                Collectors.mapping(Client::generateAndAssignGroupId, Collectors.toList())));
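As written, `generateAndAssignGroupId` copies the center into `groupId`, so every client in a center gets the same value. To get numbers that restart at 1 in each center, as the question's example asks, one option is a second pass over each center's sorted list. This is a sketch under assumptions: the input already contains only duplicated records, `groupId` is an `int` here, and `duplicateKey` is a hypothetical helper combining name, mother and birth. The sample rows mirror the question's expected output.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class InnerGroupIdDemo {

    // Trimmed stand-in for the updated Client; groupId is an int here (assumption).
    static final class Client {
        private final String id, name, mother, birth, center;
        private int groupId;

        Client(String id, String name, String mother, String birth, String center) {
            this.id = id;
            this.name = name;
            this.mother = mother;
            this.birth = birth;
            this.center = center;
        }

        String getName() { return name; }
        String getMother() { return mother; }
        String getBirth() { return birth; }
        String getCenter() { return center; }
        void setGroupId(int groupId) { this.groupId = groupId; }

        // Hypothetical helper: the fields that identify the same person.
        String duplicateKey() { return name + "~" + mother + "~" + birth; }

        @Override
        public String toString() { return "Client [group=" + groupId + ", id=" + id + "]"; }
    }

    // Expects duplicated records only; numbers each duplicate set per center, restarting at 1.
    static Map<String, List<Client>> assignInnerGroups(List<Client> duplicates) {
        Map<String, List<Client>> byCenter = duplicates.stream()
                .sorted(Comparator.comparing(Client::getCenter) // centers compared as strings
                        .thenComparing(Client::getName)
                        .thenComparing(Client::getMother)
                        .thenComparing(Client::getBirth))
                .collect(Collectors.groupingBy(Client::getCenter, LinkedHashMap::new, Collectors.toList()));

        for (List<Client> clients : byCenter.values()) {
            Map<String, Integer> numbers = new HashMap<>(); // fresh numbering for every center
            for (Client c : clients) {
                Integer n = numbers.get(c.duplicateKey());
                if (n == null) {
                    n = numbers.size() + 1; // next unused number in this center
                    numbers.put(c.duplicateKey(), n);
                }
                c.setGroupId(n);
            }
        }
        return byCenter;
    }

    static List<Client> sample() {
        return Arrays.asList(
                new Client("1", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("3", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("5", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"),
                new Client("6", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("7", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("8", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"),
                new Client("11", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("12", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"));
    }

    public static void main(String[] args) {
        assignInnerGroups(sample()).forEach((center, clients) -> System.out.println(center + "=" + clients));
    }
}
```

A plain `HashMap` per center is enough because the numbers are handed out while walking the sorted list; the map only remembers which key already received which number.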
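Instead of calling file.stream() inside every filter (which rescans the whole list for each record), you can build a map keyed on the relevant fields.
New method in the Client class: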
public String getKey() {
    // id is left out on purpose: it is unique per row, so including it would make every count 1
    return String.format("%s~%s~%s", name, mother, birth);
}
Use it to build a map with the number of occurrences as the value:
Map<String, Long> countMap =
        file.stream()
                .map(Client::getKey)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
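A small runnable check of the count-map idea, assuming a trimmed `Client` with only the fields the key needs. It also shows why the key must not contain `id`: with the id in it, every key would be unique and no record would ever count as a duplicate.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class CountMapDemo {

    // Trimmed stand-in for Client (assumption); getKey combines name, mother and birth only.
    static final class Client {
        private final String id, name, mother, birth;

        Client(String id, String name, String mother, String birth) {
            this.id = id;
            this.name = name;
            this.mother = mother;
            this.birth = birth;
        }

        String getId() { return id; }

        // id is left out: it is unique per row, so it would make every count 1
        String getKey() { return String.format("%s~%s~%s", name, mother, birth); }
    }

    static Map<String, Long> countByKey(List<Client> file) {
        return file.stream()
                .map(Client::getKey)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Ids of rows whose key occurs more than once, in file order:
    // one pass to count, one pass with a map lookup per row.
    static List<String> duplicateIds(List<Client> file) {
        Map<String, Long> countMap = countByKey(file);
        return file.stream()
                .filter(c -> countMap.get(c.getKey()) > 1L)
                .map(Client::getId)
                .collect(Collectors.toList());
    }

    static List<Client> sample() {
        return Arrays.asList(
                new Client("1", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31"),
                new Client("3", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03"),
                new Client("6", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03"),
                new Client("7", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31"),
                new Client("9", "Rosana Pereira de Campos", "Ivana Maria de Campos", "2002/07/16"),
                new Client("10", "Paula Cristina de Abreu", "Cristina Pereira de Abreu", "2014/10/25"));
    }

    public static void main(String[] args) {
        System.out.println(duplicateIds(sample())); // [1, 3, 6, 7]
    }
}
```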
Then:
// For each inner group you need a separate id based on the name.
// The input is a map with the client name as the key and the
// corresponding list of clients as the value.
// The function below returns a new map whose key is an
// integer (the required unique id for each inner group).
// Requires: import java.util.concurrent.atomic.AtomicInteger;
Function<Map<String, List<Client>>, Map<Integer, List<Client>>> mapper = map -> {
    AtomicInteger i = new AtomicInteger(1);
    return map.entrySet().stream()
            .collect(Collectors.toMap(e -> i.getAndIncrement(), Map.Entry::getValue));
};

// assuming static import of "java.util.stream.Collectors.*"
// note: the inner groupingBy uses an unordered map, so which name gets 1 is not defined
Map<String, Map<Integer, List<Client>>> grouped =
        file.stream()
                .filter(x -> countMap.get(x.getKey()) > 1L) // indicates duplicate
                .collect(groupingBy(Client::getCenter,
                        collectingAndThen(groupingBy(Client::getName, toList()),
                                mapper /* the function above */)));
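A self-contained version of this pipeline with the mapper's missing parenthesis fixed and the key built without `id`. Beyond the answer, it assumes two things: `LinkedHashMap::new` is passed to both `groupingBy` calls and to `toMap` so the numbering follows file order rather than hash order, and the sample puts all Ralfo rows in center 2, as the question's printed output does. Like the answer, it groups names by `getName` only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toMap;

public class NestedGroupingDemo {

    // Trimmed stand-in for Client (assumption); toString prints only the id.
    static final class Client {
        private final String id, name, mother, birth, center;

        Client(String id, String name, String mother, String birth, String center) {
            this.id = id;
            this.name = name;
            this.mother = mother;
            this.birth = birth;
            this.center = center;
        }

        String getName() { return name; }
        String getCenter() { return center; }
        String getKey() { return String.format("%s~%s~%s", name, mother, birth); }

        @Override
        public String toString() { return id; }
    }

    // Renumbers one center's name groups as 1, 2, 3... in the map's iteration order.
    static final Function<Map<String, List<Client>>, Map<Integer, List<Client>>> MAPPER = map -> {
        AtomicInteger i = new AtomicInteger(1); // mutable counter a lambda can update
        return map.entrySet().stream()
                .collect(toMap(e -> i.getAndIncrement(), Map.Entry::getValue,
                        (a, b) -> a, // never called: every generated key is unique
                        LinkedHashMap::new));
    };

    static Map<String, Map<Integer, List<Client>>> group(List<Client> file) {
        Map<String, Long> countMap = file.stream().collect(groupingBy(Client::getKey, counting()));
        return file.stream()
                .filter(x -> countMap.get(x.getKey()) > 1L) // keep duplicates only
                .collect(groupingBy(Client::getCenter, LinkedHashMap::new,
                        collectingAndThen(groupingBy(Client::getName, LinkedHashMap::new, toList()),
                                MAPPER)));
    }

    static List<Client> sample() {
        return Arrays.asList(
                new Client("1", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("3", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("5", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"),
                new Client("6", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("7", "Antonio Carlos da Silva", "Ana da Silva", "2008/03/31", "1"),
                new Client("8", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"),
                new Client("10", "Paula Cristina de Abreu", "Cristina Pereira de Abreu", "2014/10/25", "2"),
                new Client("11", "Pedro de Albuquerque", "Maria de Albuquerque", "2006/04/03", "2"),
                new Client("12", "Ralfo dos Santos Filho", "Helena dos Santos", "2012/02/21", "2"));
    }

    public static void main(String[] args) {
        System.out.println(group(sample())); // {1={1=[1, 7]}, 2={1=[3, 6, 11], 2=[5, 8, 12]}}
    }
}
```

The `AtomicInteger` is used only because a lambda cannot reassign a local `int`; the stream is sequential, so the counter increments in encounter order.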
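The task is to group the CSV file by center and sort the names in ascending order within each group. Doing this in plain Java takes quite a lot of code.
With SPL, an open-source Java package, it is easy. One line of code is enough: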
A | |
---|---|
1 | =file("client.csv":"UTF-8").import@ct().sort(center,name).derive(ranki(name;center):group) |
SPL provides a JDBC driver that Java can call. Just save the SPL script above as dense_rank.splx and call it from Java the way you would call a stored procedure:
…
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
st = con.prepareCall("call dense_rank ()");
st.execute();
…
Or execute the SPL string directly from a Java program, the same way you would execute an SQL statement:
…
st = con.prepareStatement("==file(\"client.csv\":\"UTF-8\")"
        + ".import@ct().sort(center,name).derive(ranki(name;center):group)");
st.execute();
…