grep, cut and remove \n from file

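I am working with an input file that contains a list of user IDs, one per line. In a bash script, I run a while loop over that input file, executing an ldapsearch query for each ID and filtering the results I want with grep -E. The resulting output file currently has the following format (/mountpoint/out_file_1.out):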

uid=user_id1,cn=Users,ou=Department,dc=myORG    
LDAPresource=myORG_RESname1   LDAPresource=myORG_RESname2  
uid=user_id2,cn=Users,ou=Department,dc=myORG  
LDAPresource=myORG_RESname2   LDAPresource=myORG_RESname3

However, the desired output should look like this:

user_id1;myORG_RESname1
user_id1;myORG_RESname2
user_id2;myORG_RESname2
user_id2;myORG_RESname3

So far I have tried to get the desired output above with grep and cut. This is the exact command I am running on the first result file shown above:

grep -E '(^uid=|myORG_RESname1|myORG_RESname2|myORG_RESname3)' /mountpoint/out_file_1.out | cut -d, -f1 >&5

This produces a second output (/mountpoint/out_file_2.out):

uid=user_id1  
LDAPresource=myORG_RESname1     
LDAPresource=myORG_RESname2  

Then I run another grep with cut:

grep -E 'LDAPresource|uid=' /mountpoint/out_file_2.out | cut -d= -f2 >&6

This finally produces the following output (/mountpoint/out_file_3.out):

user_id1  
myORG_RESname1  
myORG_RESname2  

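This is "almost" what I need. In this last output, the newlines need to be removed and the user ID repeated for every resource name found, as already described for the desired output (/mountpoint/final_output.out):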

user_id1;myORG_RESname1  
user_id1;myORG_RESname2 

Using:

tr '\n' ';' < input_file > output_file did not give me the desired result...

Any ideas on how to achieve this? Any help is much appreciated.

EDIT:

For reference, this is the actual bash script I am running:

#!/bin/bash

# assign file descriptor for input fd
exec 3< /mountpoint/userlist
# assign file descriptor for output fd unfiltered
exec 4> /mountpoint/out_file_1.out
# assign file descriptor for output fd filtered
exec 5> /mountpoint/out_file_2.out
# assign file descriptor for output fd final
exec 6> /mountpoint/out_file_3.out

while IFS= read -ru 3 LINE; do
    ldapsearch -h IPADDR -D "uid=admin,cn=Users,ou=Department,dc=myDC" -w somepwd "(uid=$LINE)" LDAPresource >&4
    grep -E '(^uid=|Resource1|Resource2|Resource3)' /mountpoint/out_file_1.out | cut -d, -f1 >&5
    grep -E 'TAMresource|uid=' /mountpoint/out_file_2.out | cut -d= -f2 >&6
    #tr '\n' ';' < input_filename > file
done
# close fd #3 inputfile
exec 3<&-
# close fd #4 & 5 outputfiles
exec 4>&-
exec 5>&-
# exit with 0 success status
exit 0

With your shown samples, please try the following. Written and tested with the shown samples in GNU awk.

awk '
match($0,/uid=[^,]*/){
  val1=substr($0,RSTART+4,RLENGTH-4)
  next
}
{
  while(match($0,/LDAPresource=[^ ]*/)){
    print val1 ";" substr($0,RSTART+13,RLENGTH-13)
    $0=substr($0,RSTART+RLENGTH)
  }
}' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                                 ##Starting awk program from here.
match($0,/uid=[^,]*/){                ##Using match function to match regex uid= till comma comes in current line.
  val1=substr($0,RSTART+4,RLENGTH-4)  ##Creating val1 variable which holds the matched text after uid=.
  next                                ##next will skip all further statements from here.
}
{
  while(match($0,/LDAPresource=[^ ]*/)){            ##Looping while the rest of the line still contains LDAPresource= followed by non-space characters.
    print val1 ";" substr($0,RSTART+13,RLENGTH-13)  ##Printing val1, a semicolon and the resource name from the match above.
    $0=substr($0,RSTART+RLENGTH)                    ##Saving rest of line in current line.
  }
}' Input_file                         ##Mentioning Input_file name here.

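The specification of the transformation you want to perform is unclear. It looks like you want to process the lines in pairs, taking the uid attribute given on the first line of each pair and the two LDAPresource attributes given on the second line, and combining them into two lines, each containing an id;resource pair.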

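First off, I would not use grep or cut for this. sed or awk would be a more suitable tool. I am more of a sed guy than an awk guy, but I am sure a pretty simple awk script could do the job in one pass. With sed, I would use two commands: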

  • First, one to get from your input to your third output:

    sed 's/^[^=]*=//; s/,.*//; n; s/LDAPresource=//g; s/ \{1,\}/\n/'
    
  • Second, one to combine the resulting triplets of lines into your desired output:

    sed 's/$/;/; h; N; x; N; H; x; s/;\n/;/g'
    

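You can combine these into a single command (though I would definitely recommend writing a script for this rather than typing it all at the command line):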

sed 's/^[^=]*=//; s/,.*//; n; s/LDAPresource=//g; s/ \{1,\}/\n/' /mountpoint/out_file_1.out |
  sed 's/$/;/; h; N; x; N; H; x; s/;\n/;/g'
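
As a minimal sketch of what such a script could look like, the two stages can be wrapped in a shell function. The function name ldap_pairs is hypothetical and not part of the original answer. The sketch assumes GNU sed (for \n in the replacement text) and input that strictly alternates between one uid line and one line with exactly two LDAPresource entries, with no trailing whitespace on the resource lines.

```shell
#!/bin/bash

# ldap_pairs (hypothetical name): reads uid/resource line pairs from the
# files given as arguments (or from stdin) and prints uid;resource lines.
# Assumes GNU sed and exactly two LDAPresource entries per resource line.
ldap_pairs() {
    # Stage 1: reduce each pair of lines to uid, resource1, resource2.
    sed 's/^[^=]*=//; s/,.*//; n; s/LDAPresource=//g; s/ \{1,\}/\n/' "$@" |
        # Stage 2: join each triplet into two uid;resource lines.
        sed 's/$/;/; h; N; x; N; H; x; s/;\n/;/g'
}
```

It could then be called as ldap_pairs /mountpoint/out_file_1.out > /mountpoint/final_output.out.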

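Explanation

Each of the sed commands given specifies a semicolon-separated sequence of steps that is executed in a loop until the input is exhausted.

Here is the first one in multi-line form, with comments: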

# The next line of input is implicitly read into sed's pattern space, sans trailing newline

# Replace the leading substring up to the first '=' with nothing (that is, delete it)
s/^[^=]*=//

# Replace the substring from the first comma to the end of the line with nothing.
# This leaves just the uid value.
s/,.*//

# Print the contents of the pattern space followed by a newline (supposes that the
# -n command line option has not been given) and replace the contents of the pattern
# space with the next line of input.
n

# Replace all substrings 'LDAPresource=' in the pattern space with nothing
s/LDAPresource=//g

# Replace the first (and only) run of one or more consecutive space characters with a newline
s/ \{1,\}/\n/

# The remaining contents of the pattern space and a trailing newline are printed at this point
# (assuming no '-n' option) and the cycle repeats.

The second one is:

# The next line of input is implicitly read into sed's pattern space sans trailing newline

# Substitute a semicolon (;) for the zero-length space at the end of the line (that
# is, append a semicolon).
s/$/;/

# Copy the contents of the pattern space into the hold space.  Both spaces then contain
# the uid plus a semicolon
h

# Append a newline followed by the next line of input (sans trailing newline) to the
# pattern space
N

# Swap the contents of the pattern and hold spaces.
x

# Append a newline followed by the next line of input (sans trailing newline) to the
# pattern space
N

# Append a newline followed by the contents of the pattern space to the hold space.
# After this, the contents of the hold space have the form
# <uid>;<newline><resource1><newline><uid>;<newline><resource2>
H

# Swap the pattern and hold spaces
x

# Replace each (semicolon, newline) pair with just a semicolon.  This completes
# joining the uid and resource pairs into semicolon-(only-)delimited form,
# leaving a newline between each pair
s/;\n/;/g

# The remaining contents of the pattern space and a trailing newline are printed at this
# point (assuming no '-n' option) and the cycle repeats.

$ awk -F'[=,[:space:]]+' -v OFS=',' 'NR%2{uid=$2; next} {print uid, $2 ORS uid, $4}' file
user_id1,myORG_RESname1
user_id1,myORG_RESname2
user_id2,myORG_RESname2
user_id2,myORG_RESname3
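
The one-liner above relies on exactly two resources per line and prints a comma between the fields. A possible variant, sketched here under stated assumptions, uses ';' as in the desired output from the question and loops over however many LDAPresource entries a line contains. The function name ldap_pairs_awk is hypothetical. The field separator is written as [=, \t]+ instead of a [:space:] class so that it also works in awk versions without POSIX character classes, and resource lines are assumed to have no leading whitespace.

```shell
#!/bin/bash

# ldap_pairs_awk (hypothetical name): prints one uid;resource line for
# every LDAPresource entry that follows a uid line.
ldap_pairs_awk() {
    awk -F'[=, \t]+' -v OFS=';' '
        # A line starting with uid= carries the user ID in field 2.
        /^uid=/ { uid = $2; next }
        {
            # Every field named LDAPresource is followed by its value.
            for (i = 1; i < NF; i++)
                if ($i == "LDAPresource")
                    print uid, $(i + 1)
        }' "$@"
}
```

Keying on /^uid=/ rather than on the line number means a uid line with no resource line after it simply produces no output instead of shifting the pairing.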