Group_by and group_concat in shell script

My goal is to identify duplicate jars on the classpath, so I did some preprocessing with the following command.

mvn -o dependency:list | grep ":.*:.*:.*" | cut -d] -f2- | sed 's/:[a-z]*$//g' | sort -u -t: -k2

The resulting file has the format

group_id:artifact_id:type:version

So, as an example, I now have the following two lines in the file

com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26

I would like to generate a file with the following content.

jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26

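The contents of this file vary. There can be multiple libraries with differing versions. Any idea how to do this with a shell script? I would like to avoid a database query.

Adding a snapshot of a sample file here...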

org.glassfish.jaxb:jaxb-runtime:jar:2.4.0-b180725.0644
    org.jboss.spec.javax.annotation:jboss-annotations-api_1.2_spec:jar:1.0.2.Final
    org.jboss.logging:jboss-logging:jar:3.3.2.Final
    org.jboss.spec.javax.transaction:jboss-transaction-api_1.2_spec:jar:1.0.1.Final
    org.jboss.spec.javax.websocket:jboss-websocket-api_1.1_spec:jar:1.1.3.Final
    com.github.stephenc.jcip:jcip-annotations:jar:1.0-1
    com.beust:jcommander:jar:1.72
    com.sun.jersey.contribs:jersey-apache-client4:jar:1.19.1
    org.glassfish.jersey.ext:jersey-bean-validation:jar:2.26
    com.sun.jersey:jersey-client:jar:1.19.1
    org.glassfish.jersey.core:jersey-client:jar:2.26
    org.glassfish.jersey.core:jersey-common:jar:2.26
    org.glassfish.jersey.containers:jersey-container-servlet:jar:2.26
    org.glassfish.jersey.containers:jersey-container-servlet-core:jar:2.26
    com.sun.jersey:jersey-core:jar:1.19.1
    org.glassfish.jersey.ext:jersey-entity-filtering:jar:2.26
    org.glassfish.jersey.inject:jersey-hk2:jar:2.31
    org.glassfish.jersey.media:jersey-media-jaxb:jar:2.26
    org.glassfish.jersey.media:jersey-media-json-jackson:jar:2.26
    org.glassfish.jersey.media:jersey-media-multipart:jar:2.26
    org.glassfish.jersey.core:jersey-server:jar:2.26
    org.glassfish.jersey.ext:jersey-spring4:jar:2.26
    net.minidev:json-smart:jar:2.3
    com.google.code.findbugs:jsr305:jar:3.0.1
    javax.ws.rs:jsr311-api:jar:1.1.1
    org.slf4j:jul-to-slf4j:jar:1.7.25
    junit:junit:jar:4.12
    org.latencyutils:LatencyUtils:jar:2.0.3
    org.liquibase:liquibase-core:jar:3.5.5
    log4j:log4j:jar:1.2.16
    org.apache.logging.log4j:log4j-api:jar:2.10.0
    com.googlecode.log4jdbc:log4jdbc:jar:1.2
    org.apache.logging.log4j:log4j-to-slf4j:jar:2.10.0
    ch.qos.logback:logback-classic:jar:1.2.3
    ch.qos.logback:logback-core:jar:1.2.3
    io.dropwizard.metrics:metrics-core:jar:4.1.6
    io.dropwizard.metrics:metrics-healthchecks:jar:4.1.6
    io.dropwizard.metrics:metrics-jmx:jar:4.1.6
    io.micrometer:micrometer-core:jar:1.0.6
    org.jvnet.mimepull:mimepull:jar:1.9.6
    com.microsoft.sqlserver:mssql-jdbc:jar:6.2.2.jre8
    com.netflix.netflix-commons:netflix-commons-util:jar:0.3.0
    com.netflix.netflix-commons:netflix-statistics:jar:0.1.1
    io.netty:netty-buffer:jar:4.1.27.Final
    io.netty:netty-codec:jar:4.1.27.Final
    io.netty:netty-codec-http:jar:4.1.27.Final
    io.netty:netty-common:jar:4.1.27.Final
    io.netty:netty-resolver:jar:4.1.27.Final
    io.netty:netty-transport:jar:4.1.27.Final
    io.netty:netty-transport-native-epoll:jar:4.1.27.Final
    io.netty:netty-transport-native-unix-common:jar:4.1.27.Final
    com.nimbusds:nimbus-jose-jwt:jar:8.3

There may be a simpler way, but this is what I can do right now... it could probably be narrowed down to a one-liner with some tweaks.

[07:38 am alex ~]$ date; cat a
Wed  4 Nov 07:38:21 GMT 2020
com.sun.jersey:jersey-client:jar:1.19.1
org.glassfish.jersey.core:jersey-client:jar:2.26

[07:38 am alex ~]$ FIRST=`cat a | awk -F'[:]' '{print $2}' | uniq`
[07:38 am alex ~]$ SECOND=`cat a | awk -F'[:]' '{print $1":"$4}' | xargs | sed 's/ /,/g'`
[07:38 am alex ~]$ echo "$FIRST | $SECOND"
jersey-client | com.sun.jersey:1.19.1,org.glassfish.jersey.core:2.26

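Could you please try the following; this can be done within a single awk. It is written purely based on the samples shown, so it expects exactly the two-line input.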

awk '
BEGIN{
  FS=":"
  OFS=" | "
}
FNR==1{
  first=$1
  version=$NF
  second=$2
  next
}
FNR==2{
  print second,first":"version","$1":"$NF
}
' Input_file

Explanation: Adding a detailed explanation of the above.

awk '                             ##Start the awk program.
BEGIN{                            ##Start the BEGIN section of the program.
  FS=":"                          ##Set the field separator to a colon.
  OFS=" | "                       ##Set the output field separator to space, pipe, space.
}
FNR==1{                           ##If this is the first line, do the following.
  first=$1                        ##Store the 1st field (group id) in first.
  version=$NF                     ##Store the last field (version) in version.
  second=$2                       ##Store the 2nd field (artifact id) in second.
  next                            ##Skip all further statements for this line.
}
FNR==2{                           ##If this is the second line, do the following.
  print second,first":"version","$1":"$NF   ##Print the artifact id, then both group:version pairs joined by a comma.
}
' Input_file                      ##Name of the input file.
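
For the longer sample file in the question, the same idea could be generalized with awk arrays instead of fixed line numbers. The following is only a sketch under a few assumptions: lines may carry leading indentation, the version is the last colon-separated field, and only artifact ids that occur more than once are of interest. The function name find_duplicate_artifacts is hypothetical.

```shell
# Hypothetical helper: report artifact ids that appear under more than one
# group_id/version. Reads the files given as arguments, or stdin if none.
find_duplicate_artifacts() {
  awk -F: '
    {
      gsub(/^[ \t]+|[ \t]+$/, "")          # trim indentation; awk re-splits the fields
    }
    NF >= 4 {
      if (!($2 in count)) order[++n] = $2  # remember first-seen order of artifact ids
      count[$2]++
      # Prefix a comma for every entry after the first one.
      list[$2] = (count[$2] > 1 ? list[$2] "," : "") $1 ":" $NF
    }
    END {
      for (i = 1; i <= n; i++)
        if (count[order[i]] > 1) print order[i] " | " list[order[i]]
    }
  ' "$@"
}

# Example call, with Input_file being the preprocessed list from the question:
# find_duplicate_artifacts Input_file
```

Keeping a separate order array avoids relying on the unspecified iteration order of `for (key in array)`, so duplicates are reported in the order they first appear in the input.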