How to map a state code from one CSV file to a state name in another CSV file in Python?
I have a CSV file, csv_file.csv, in which each state has multiple records and each state is identified by an ID. A sample looks like:
state_id,year,value
01,2012,8.0
01,2012,8.1
01,2012,8.0
01,2012,7.7
01,2013,7.3
01,2013,7.0
01,2013,7.0
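I want to convert the state_id in the dataset above to the corresponding state_name and write the records to another CSV file, output.csv, so that for each state the value fields are laid out on a single row and the output becomes: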
Alabama,8.0,8.1,8.0,7.7,7.3,7.0,7.0
Alaska,8.1,8.1,8.0,7.4,7.25,7.6,7.5
For the mapping, I have another CSV file, state.csv, which contains the mapping details:
state_id,state_name
01,Alabama
02,Alaska
04,Arizona
05,Arkansas
06,California
08,Colorado
09,Connecticut
I wrote this code, but it seems to convert only 4 records (the first 4 records for state_id 01 and year 2012). When I open Output.csv, I see only 4 records, and for those the value field is also repeated. My current code is:
reader_csv = csv.reader(open('csv_file.csv', 'rb'))
reader_state = csv.reader(open('states.csv', 'rb'))
file_write = open('Output.csv', 'a')
writer = csv.writer(file_write)
for line in reader_csv:
    for states in reader_state:
        if line[0] == states[0]:
            print line[0]+'='+states[1]
            writer.writerow([states[1]]+[line[1]]+[line[2]])
            break
file_write.close()
What mistake am I making here, and how should I do the mapping so that state_id is changed to state_name?
You should store your state identifiers in a dictionary. Then, for each row of csv_file.csv, look up the corresponding value in that dictionary.
import csv

reader_csv = csv.reader(open('csv_file.csv', 'r'))  # no b flag for python3
file_write = open('output.csv', 'a')
writer = csv.writer(file_write)

# Dictionary construction
with open('states.csv', mode='r') as infile:
    reader = csv.reader(infile)
    states_dict = {rows[0]: rows[1] for rows in reader}

# File writing
for line in reader_csv:
    writer.writerow([states_dict[line[0]]] + [line[1]] + [line[2]])
file_write.close()
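A likely reason the nested loop in the question stops matching after the first few rows is that a csv.reader is a one-shot iterator: once the inner loop has run through the mapping file, later passes over it yield nothing. The sketch below shows that behaviour on a small in-memory sample (the io.StringIO data is a hypothetical stand-in for the real file). Opening the output with mode 'a' also appends on every run, which may explain the repeated values.

```python
import csv
import io

# Hypothetical in-memory stand-in for the mapping file
state_file = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")
reader_state = csv.reader(state_file)

first_pass = list(reader_state)   # consumes every row, header included
second_pass = list(reader_state)  # the reader is now exhausted

print(len(first_pass))   # 3
print(len(second_pass))  # 0
```

Reading the mapping into a dictionary once, as in the answer above, avoids the problem because a dictionary can be queried any number of times.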
import csv

# Build the state_id -> state_name lookup
with open('state.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    states = {row.get('state_id'): row.get('state_name') for row in reader}

with open('csv_file.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    # On Python 3 the csv module needs text mode ('w'), not 'wb'
    with open('output.csv', 'w', newline='') as outfile:
        fieldnames = ['state_name', 'year', 'value']
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow({
                'state_name': states.get(row.get('state_id')),
                'year': row.get('year'),
                'value': row.get('value'),
            })
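One detail of the .get lookups above: an ID that is missing from the mapping comes back as None and is written as an empty field, whereas indexing with [] would raise KeyError. If silent blanks are not wanted, a visible placeholder is one option. This is only a sketch; the state_name helper, the UNKNOWN(...) label and the in-memory sample data are assumptions for illustration.

```python
import csv
import io

# Hypothetical in-memory samples; real code would open the files instead
state_csv = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")
data_csv = io.StringIO("state_id,year,value\n01,2012,8.0\n03,2012,7.9\n")

states = {row["state_id"]: row["state_name"] for row in csv.DictReader(state_csv)}


def state_name(state_id, lookup):
    """Return the mapped name, or a visible placeholder for an unknown ID."""
    return lookup.get(state_id, "UNKNOWN(" + state_id + ")")


names = [state_name(row["state_id"], states) for row in csv.DictReader(data_csv)]
print(names)  # ['Alabama', 'UNKNOWN(03)']
```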
Here is my approach: convert state.csv into a look-up dictionary, then read the input, translate, and write:
import csv

# Python 3: open CSV files in text mode with newline=''
with open('state.csv', newline='') as f:
    id2name = dict(csv.reader(f))

with open('csv_file.csv', newline='') as ifile, open('output.', 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    writer = csv.writer(ofile)
    for state_id, year, value in reader:
        state = id2name[state_id]
        writer.writerow([state, year, value])
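For anyone wondering why dict(csv.reader(f)) works: every row of the mapping file has exactly two fields, so each row is accepted as a (key, value) pair. The small sketch below, using a hypothetical in-memory sample, also shows that the header row ends up in the dictionary unless it is skipped, and that the IDs stay strings so the leading zeros are preserved.

```python
import csv
import io

# Hypothetical in-memory stand-in for state.csv
mapping_file = io.StringIO("state_id,state_name\n01,Alabama\n02,Alaska\n")

# Each two-field row becomes one key/value pair, header included
id2name = dict(csv.reader(mapping_file))
print(id2name)

# Rewind and skip the header to keep only real states
mapping_file.seek(0)
reader = csv.reader(mapping_file)
next(reader)
id2name_no_header = dict(reader)
print(id2name_no_header)
```

With the header kept, the header row of csv_file.csv is translated too (state_id becomes state_name), which is why the code above writes a sensible header line without any special handling.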
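Update
Updated the code to write all values on the same row. This solution uses the itertools.groupby function to group the records by the first field. The output will not have a header.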
import csv
from itertools import groupby

# Python 3: open CSV files in text mode with newline=''
with open('state.csv', newline='') as f:
    id2name = dict(csv.reader(f))

with open('csv_file.csv', newline='') as ifile, open('output.csv', 'w', newline='') as ofile:
    reader = csv.reader(ifile)
    next(reader)  # skip the header
    writer = csv.writer(ofile)

    # Group by the state_id, which is the first field (record[0])
    group_by_state_id = groupby(reader, lambda record: record[0])
    for state_id, record_group in group_by_state_id:
        state = id2name[state_id]
        values = [value for _, _, value in record_group]
        writer.writerow([state] + values)
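Note that itertools.groupby only merges consecutive records with the same key, so the approach above relies on csv_file.csv already being ordered by state_id, as it is in the sample. If the rows might be interleaved, collecting the values in a collections.defaultdict is one alternative. The following is a sketch on hypothetical in-memory data, not a drop-in replacement.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample: rows for state 01 are NOT adjacent
id2name = {"01": "Alabama", "02": "Alaska"}
data = io.StringIO(
    "state_id,year,value\n"
    "01,2012,8.0\n"
    "02,2012,8.1\n"
    "01,2013,7.3\n"
)

# Collect every value under its state_id, regardless of row order
values_by_state = defaultdict(list)
reader = csv.reader(data)
next(reader)  # skip the header
for state_id, year, value in reader:
    values_by_state[state_id].append(value)

# Write one row per state: name first, then all of its values
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for state_id, values in values_by_state.items():
    writer.writerow([id2name[state_id]] + values)

print(out.getvalue())
```

The trade-off is memory: all values are held in the dictionary until the end, whereas groupby can stream through sorted input one group at a time.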
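Update 2
If sqlite3 is installed on your system (my Mac came with it pre-installed), the following script will produce the desired result. Be sure to remove the headers from your CSV files first.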
-- script.sql
.mode csv
CREATE TABLE state (sid TEXT, name TEXT);
.import state.csv state
CREATE TABLE raw (sid TEXT, year INT, value REAL);
.import csv_file.csv raw
SELECT state.name, group_concat(raw.value)
FROM state, raw
WHERE state.sid = raw.sid
GROUP BY state.name;
Usage:
$ sqlite3 < script.sql > output.csv
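The same query can also be run from Python through the standard-library sqlite3 module, which may be convenient if the sqlite3 command-line tool is not available. This is a rough sketch using an in-memory database and a few hand-typed sample rows in place of the .import step; the table and column names follow the script above.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (sid TEXT, name TEXT)")
conn.execute("CREATE TABLE raw (sid TEXT, year INT, value REAL)")
conn.executemany(
    "INSERT INTO state VALUES (?, ?)",
    [("01", "Alabama"), ("02", "Alaska")],
)
conn.executemany(
    "INSERT INTO raw VALUES (?, ?, ?)",
    [("01", 2012, 8.0), ("01", 2012, 8.1), ("02", 2012, 7.4)],
)

# One row per state, with its values joined into a comma-separated string
rows = conn.execute(
    "SELECT state.name, group_concat(raw.value) "
    "FROM state JOIN raw ON state.sid = raw.sid "
    "GROUP BY state.name ORDER BY state.name"
).fetchall()
conn.close()

for name, values in rows:
    print(name + "," + values)
```

SQLite does not promise any particular order for the values inside group_concat, so if the order within a row matters, the pure-Python versions above are the safer choice.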