UNIX: How to cut columns from the right when not all fields are the same length

I have a list of data, and I need to remove certain characters from certain columns.

Here is the list:

JCG2380 GREEN, JULIE C          JR-II BISS CPSC BS   INFO TECH  XXX/XXX-9445
JAG1936 GREEN, JOE A.           SO-I  BISS CPSC BS   INFO TECH  XXX/XXX-7993
ACG4636 GREEN, ADAM C.          JR-II BISS CPSC BS   COMP SCI   XXX/XXX-0437
SPG1696 GREEN, SEAN P.          JR-I  BISS CPSC BS   COMP SCI   XXX/XXX-2398
SEG8835 GREEN, SHAWN E.         FR-II BISS CPSC BS   COMP SCI   XXX/XXX-7149
MCGo599 GREEN, MICHAEL C.       JR-I  BISS CPSC BS   COMP SCI   XXX/XXX-OOOO
GJG1887 GREEN, GREGORY J.       SO-II BISS CPSC BS   INFO TECH  XXX/XXX-4354
NGG5479 GREEN, NICHOLAS G       JR-I  BISS CPSC BS   INFO TECH  XXX/XXX-8268
ZTG7190 GREEN, ZACHARY T.       FR-II BISS CPSC BS   INFO TECH  XXX/XXX-1298
AXG9097 GREEN, ALEXANDER        SO-I  BISS CPSC BS   INFO TECH  XXX/XXX-0313
RJG6624 GREEN, ROBERT J.        SO-II BISS CPSC BS   COMP SCI   XXX/XXX-ZOZI
MWG1990 GREEN, MATTHEW W        SO-II BISS CPSC BS   INFO TECH  XXX/XXX-0581

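The problem is that not all fields are the same size. Notice that Alexander Green (third from the bottom) has no middle initial, so his line has one fewer whitespace-separated field than the others. That keeps me from using awk with the same field numbers on every line. My idea is to cut everything starting from the right side of the file, so the varying number of name fields doesn't throw everything off.

So how do I use the cut command to start at the rightmost column and work back 7 columns?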

You can use cut, because your data has fixed-width fields.

Here is what I get with the OCR'd text:

$ cut -c 33-51,73-77 input
JR-II BISS CPSC BS 9445
SO-I  BISS CPSC BS 7993
JR-II BISS CPSC BS 0437
JR-I  BISS CPSC BS 2398
FR-II BISS CPSC BS 7149
JR-I  BISS CPSC BS OOOO
SO-II BISS CPSC BS 4354
JR-I  BISS CPSC BS 8268
FR-II BISS CPSC BS 1298
SO-I  BISS CPSC BS 0313
SO-II BISS CPSC BS ZOZI
SO-II BISS CPSC BS 0581
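
If you would rather count from the right edge, as the question asks, one common idiom is to reverse each line, cut from the left, and reverse it back. This is only a minimal sketch: it assumes `rev` is installed (it ships with util-linux and the BSDs but is not required by POSIX) and that the lines carry no trailing whitespace. `last_chars` is a helper name made up for this example.

```shell
# Keep the last N characters of every line: reverse the line, cut from
# the left, then reverse back. Assumes `rev` is available.
last_chars() {
    rev | cut -c "1-$1" | rev
}

# Two sample lines in the same layout as the list above
printf '%s\n' \
  'AXG9097 GREEN, ALEXANDER        SO-I  BISS CPSC BS   INFO TECH  XXX/XXX-0313' \
  'MWG1990 GREEN, MATTHEW W        SO-II BISS CPSC BS   INFO TECH  XXX/XXX-0581' |
  last_chars 4
# prints:
# 0313
# 0581
```

Because the lines here are fixed width, this gives the same last four characters as `cut -c 73-77`; the reversed form only matters when the line length varies.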

To match the requirement you wrote in the comments:

Exactly what I'm trying to do is get the first character out of the columns that start (from the top entry) with JR, BISS, CPSC, INFO. Then I need the last 4 digits from the phone numbers on the right.

$ cut -c 32-33,38-39,43-44,48-49,64-64,73-77 input
 J B C B 9445
 S B C B 7993
 J B C B 0437
 J B C B 2398
 F B C B 7149
 J B C B OOOO
 S B C B 4354
 J B C B 8268
 F B C B 1298
 S B C B 0313
 S B C B ZOZI
 S B C B 0581

You will need to adjust the ranges for your actual data.
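
One way to find the ranges for your real file is to ask awk where a known token starts, since `index()` returns a 1-based position that can be used directly as the start of a `cut -c` range. A small sketch follows; `col_of` is a hypothetical helper name, and the sample line is rebuilt with `printf` padding on the assumption that it matches the layout shown in the question.

```shell
# Print the 1-based column at which a literal string first appears in
# each input line (0 means the string was not found).
col_of() {
    awk -v s="$1" '{ print index($0, s) }'
}

# Rebuild the first sample line with explicit field widths
line=$(printf '%-7s %-23s %-5s %-4s %-4s %-4s %-10s %s' \
    JCG2380 'GREEN, JULIE C' JR-II BISS CPSC BS 'INFO TECH' XXX/XXX-9445)

printf '%s\n' "$line" | col_of 'BISS'   # prints 39
printf '%s\n' "$line" | col_of 'XXX/'   # prints 65
```

With the phone number starting at column 65, its last four digits sit at columns 73 to 76, which agrees with the range used above.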

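The following does what I understand the requirement to be, except that I use a tab as the output field separator to make it easier for you to adjust: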

awk 'BEGIN {OFS="\t"} {
   # Each line is assumed to have a variable number
   # of name fields plus 8 other tokens
   # (the ID on the left and 7 tokens on the right):
   nnames = NF-8;

   # from the right:
   tel=$NF;
   subject2=$(NF-1);
   subject1=$(NF-2);
   bs=$(NF-3); cpsc=$(NF-4); biss=$(NF-5); data=$(NF-6);

   # the name starts at field 2; append the remaining name fields
   name=$2;
   for (i=2; i<=nnames;i++) {name=name " " $(i+1)}

   # Adjustments: drop the first character of each of these fields
   data=substr(data,2); biss=substr(biss,2); cpsc=substr(cpsc,2);
   subject1=substr(subject1,2)
   # keep only the part of the phone number after the first hyphen
   sub( /[^-]*-/,"", tel);

   print $1, name, data, biss, cpsc, bs, subject1 " " subject2, tel;
}' input

Output:

JCG2380 GREEN, JULIE C  R-II    ISS PSC BS  NFO TECH    9445
JAG1936 GREEN, JOE A.   O-I ISS PSC BS  NFO TECH    7993
ACG4636 GREEN, ADAM C.  R-II    ISS PSC BS  OMP SCI 0437
SPG1696 GREEN, SEAN P.  R-I ISS PSC BS  OMP SCI 2398
SEG8835 GREEN, SHAWN E. R-II    ISS PSC BS  OMP SCI 7149
MCGo599 GREEN, MICHAEL C.   R-I ISS PSC BS  OMP SCI OOOO
GJG1887 GREEN, GREGORY J.   O-II    ISS PSC BS  NFO TECH    4354
NGG5479 GREEN, NICHOLAS G   R-I ISS PSC BS  NFO TECH    8268
ZTG7190 GREEN, ZACHARY T.   R-II    ISS PSC BS  NFO TECH    1298
AXG9097 GREEN, ALEXANDER    O-I ISS PSC BS  NFO TECH    0313
RJG6624 GREEN, ROBERT J.    O-II    ISS PSC BS  OMP SCI ZOZI
MWG1990 GREEN, MATTHEW W    O-II    ISS PSC BS  NFO TECH    0581
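
If the comment is read the other way, as wanting to extract the first character of those columns rather than strip it, the same count-from-the-right idea still applies. This is a sketch under that assumption; it also assumes every line ends with the same 7 whitespace-separated tokens, and `first_chars` is a made-up helper name.

```shell
# Address fields relative to NF so the variable-length name never matters.
# Tokens from the right: phone, subject word 2, subject word 1, BS,
# CPSC, BISS, class (JR-II, SO-I, ...).
first_chars() {
    awk '{
        tel = $NF
        print substr($(NF-6), 1, 1),
              substr($(NF-5), 1, 1),
              substr($(NF-4), 1, 1),
              substr($(NF-2), 1, 1),
              substr(tel, length(tel) - 3)
    }'
}

printf '%s\n' \
  'JCG2380 GREEN, JULIE C          JR-II BISS CPSC BS   INFO TECH  XXX/XXX-9445' \
  'AXG9097 GREEN, ALEXANDER        SO-I  BISS CPSC BS   INFO TECH  XXX/XXX-0313' |
  first_chars
# prints:
# J B C I 9445
# S B C I 0313
```

The line without a middle initial is handled the same as the others, because nothing is counted from the left.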