Convert part of row to columns
I have a file with the following input:
rownum,identifier,items_in_list
1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}
The expected output is:
rownum,identifier,items_in_list
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A
I tried using awk, but what I found converts every item in a column to rows, whereas I only need some of the columns converted to rows.
My code:
echo "1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}" | awk -vRS="{" 'NF'
But this converts to:
1,ABC,
(123),(345),(69),(95),(90),(83),(3A)}
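Update:
All of your commands work, except for one small glitch. Sorry, as a newbie I can only vote for one answer.
Thanks! But I run into trouble when a row has only one number instead of several. For example, with this format:
Input: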
1,33262,"ABC",{(64)}
1,33263,"ABC",{(66),(57)}
Actual output:
1,33262,SOME_FIELD_NAME
1,33262,64
1,33263,SOME_FIELD_NAME
1,33262,65,66
Desired output:
1,33262,SOME_FIELD_NAME,64
1,33263,SOME_FIELD_NAME,65
1,33263,SOME_FIELD_NAME,66
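Update:
The "Actual output" above comes from the code Jotne suggested: awk -F, '{a=$1","$2;gsub(/[{()}]/,"");for (i=3;i<=NF;i++) print a","$i}' file
Sorry, my input sometimes has 2 leading fields and sometimes 3-10 leading fields, but the part of the row we want to convert to columns always starts with '{', each number is enclosed in '()', and the end of the row is marked by '}'. Jotne's code works for 2 leading fields but not for 3. Can someone suggest a generic way to parse the fields?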
Here is one way with awk:
awk -F, '{a=$1","$2;gsub(/[{()}]/,"");for (i=3;i<=NF;i++) print a","$i}' file
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A
Using RS:
awk -vRS=, '{gsub(/[{()}]/,"")} NR==1 {a=$1;next} NR==2 {a=a","$1;next} {print a","$1}' file
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A
If you are still looking for a Python solution:
# Python 3; prints space-separated values with the quotes stripped
line = '1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}'
for extra_char in '{}()"':
    line = line.replace(extra_char, '')
input_elems = line.split(',')
rownum, identifier = input_elems[0:2]
for item in input_elems[2:]:
    print(rownum, identifier, item)
A Python-based solution:
import csv
import re
data = ['rownum,identifier,items_in_list',
        '1,"ABC",{(123),(345),(69),(95),(90),(83),(3A)}']
reader = csv.reader(data)  # to read a file, pass open(filename, newline='') instead of data
pat = r'{*\(([0-9a-fA-F]+)\)}*'
next(reader)  # skip the header row
for row in reader:
    for elem in row[2:]:
        mat = re.search(pat, elem).group(1)
        print(','.join([row[0], '"{}"'.format(row[1]), mat]))
Output:
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A
awk -F, -v OFS=, '{gsub(/[)]./,ORS); gsub(/(^[^(]+)?[(]/, $1 OFS $2 OFS); printf "%s",$0}' file
1,"ABC",123
1,"ABC",345
1,"ABC",69
1,"ABC",95
1,"ABC",90
1,"ABC",83
1,"ABC",3A
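For the follow-up about a varying number of leading fields, one possible approach is to stop counting fields and instead split each line at the first '{'. Below is a minimal Python sketch of that idea, not a tested drop-in. The function names explode_line and explode_lines are made up for illustration, and it assumes that none of the leading fields contains a '{' and that every item is wrapped in '()' as described in the question. Lines without a '{', such as the header, are passed through unchanged.

```python
import re


def explode_line(line):
    """Return one output line per parenthesised item in the trailing {...} list.

    Lines that contain no '{' (for example the header) are returned unchanged.
    """
    line = line.rstrip("\n")
    prefix, brace, items = line.partition("{")
    if not brace:
        return [line]
    # Everything before '{' is the leading fields, however many there are.
    prefix = prefix.rstrip(",")
    # Each item is the text between one '(' and the matching ')'.
    values = re.findall(r"\(([^()]*)\)", items)
    return [f"{prefix},{value}" for value in values]


def explode_lines(lines):
    """Apply explode_line to every input line and flatten the results."""
    out = []
    for line in lines:
        out.extend(explode_line(line))
    return out


# Hypothetical sample data mixing 2 and 3 leading fields.
sample = [
    'rownum,identifier,items_in_list',
    '1,"ABC",{(123),(345),(69)}',
    '1,33262,"ABC",{(64)}',
    '1,33263,"ABC",{(66),(57)}',
]
for row in explode_lines(sample):
    print(row)
```

To run it on a real file, pass an open file object to explode_lines instead of sample. Note that a row with an empty list, '{}', produces no output lines in this sketch.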