Concat / Join / Transform multiple columns into one struct column
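I have a very large legacy file with about 5000 columns and a large number of records.
Many columns are named a_1, a_2, ..., a_200, and so on.
I would like to combine these columns into a struct (so the data is easier to process later). So instead of: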
_| a_1 | a_2 | a_3 |...
0| true | false | true |...
1| false | true | false |...
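I would like a struct a { 1: true, 2: false, ... 200: true }.
How can I do this transformation in Python, perhaps with pandas?
The columns always share the same prefix, such as a_, b_, and so on.
Regards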
Consider a CSV like this:
_|a_1|a_2|a_3|a_4|b_1|b_2|b_3|b_4
0|true|false|true|false|true|false|true|false
1|false|true|false|true|false|true|false|true
Here is a way to do it in Python using only the standard library:
import csv
with open("data.csv", newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    headers = next(reader)  # the first line holds the column names
    # extract the (letter, number) tuples from the column names, skipping the index column
    # dataranks = [('a', '1'), ('a', '2'), ('a', '3'), ('a', '4'), ('b', '1'), ('b', '2'), ('b', '3'), ('b', '4')]
    dataranks = [tuple(x.split("_")) for x in headers[1:]]
    joined_data = []
    for row in reader:
        # for each row, make a new dictionary
        aggregate = {}
        # associate each value in the row with its (letter, number) column tuple
        for value, ranks in zip(row[1:], dataranks):
            # for each letter, use a nested dict keyed by the number
            if ranks[0] not in aggregate:
                aggregate[ranks[0]] = {}
            # fill the nested dict
            aggregate[ranks[0]][ranks[1]] = value
        # and add the finished row to our list
        joined_data.append(aggregate)
print(joined_data)
joined_data then contains:
[{'a': {'1': 'true', '2': 'false', '3': 'true', '4': 'false'},
'b': {'1': 'true', '2': 'false', '3': 'true', '4': 'false'}},
{'a': {'1': 'false', '2': 'true', '3': 'false', '4': 'true'},
'b': {'1': 'false', '2': 'true', '3': 'false', '4': 'true'}}]
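The output above keeps both the keys and the values as strings, while the question asks for something closer to a { 1: true, 2: false, ... }. Below is a minimal sketch of one possible variation that converts the keys to integers and the values to booleans, and yields one row at a time so a large file does not have to be held in memory. The names SAMPLE, parse_bool and iter_structs are hypothetical and introduced only for this illustration. It assumes every column after the first is named prefix_number and every cell is either "true" or "false"; anything else raises a ValueError.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the real file (same layout as above).
SAMPLE = """_|a_1|a_2|a_3|a_4|b_1|b_2|b_3|b_4
0|true|false|true|false|true|false|true|false
1|false|true|false|true|false|true|false|true
"""


def parse_bool(text):
    """Map 'true'/'false' (any case, surrounding spaces ignored) to a bool."""
    cleaned = text.strip().lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def iter_structs(lines, delimiter="|"):
    """Yield one {prefix: {number: bool}} dict per data row."""
    reader = csv.reader(lines, delimiter=delimiter)
    headers = next(reader)
    # Assumption: every column after the first looks like "<prefix>_<number>".
    # Splitting on the last underscore lets a prefix contain underscores itself.
    keys = []
    for name in headers[1:]:
        prefix, _, number = name.strip().rpartition("_")
        keys.append((prefix, int(number)))
    for row in reader:
        grouped = defaultdict(dict)
        for value, (prefix, number) in zip(row[1:], keys):
            grouped[prefix][number] = parse_bool(value)
        yield dict(grouped)


structs = list(iter_structs(io.StringIO(SAMPLE)))
print(structs[0]["a"])  # {1: True, 2: False, 3: True, 4: False}
```

For a real file, the same generator could be fed an open file object instead of io.StringIO, for example iter_structs(open("data.csv", newline="")), and consumed row by row rather than collected into a list.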