Handling error "TypeError: Expected tuple, got str" loading a CSV to pandas multilevel and multiindex (pandas)
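I'm trying to load a CSV file (this file) to create a multi-index, multi-level DataFrame. It has 5 (five) indexes and 3 (three) levels in the columns.
What should I do? Here is the code: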
df = pd.read_csv('./teste.csv'
,index_col=[0,1,2,3,4]
,header=[0,1,2,3]
,skipinitialspace=True
,tupleize_cols=True)
df.columns = pd.MultiIndex.from_tuples(df.columns)
Expected output:
variables u \
level 1
days 1 2
times 00h 06h 12h 18h 00h
wsid lat lon start prcp_24
329 -43.969397 -19.883945 2007-03-18 10:00:00 72.0 0 0 0 0 0
2007-03-20 10:00:00 104.4 0 0 0 0 0
2007-10-18 23:00:00 92.8 0 0 0 0 0
2007-12-21 00:00:00 60.4 0 0 0 0 0
2008-01-19 18:00:00 53.0 0 0 0 0 0
2008-04-05 01:00:00 80.8 0 0 0 0 0
2008-10-31 17:00:00 101.8 0 0 0 0 0
2008-11-01 04:00:00 82.0 0 0 0 0 0
2008-12-29 00:00:00 57.8 0 0 0 0 0
2009-03-28 10:00:00 72.4 0 0 0 0 0
2009-10-07 02:00:00 57.8 0 0 0 0 0
2009-10-08 00:00:00 83.8 0 0 0 0 0
2009-11-28 16:00:00 84.4 0 0 0 0 0
2009-12-18 04:00:00 51.8 0 0 0 0 0
2009-12-28 00:00:00 96.4 0 0 0 0 0
2010-01-06 05:00:00 74.2 0 0 0 0 0
2011-12-18 00:00:00 113.6 0 0 0 0 0
2011-12-19 00:00:00 90.6 0 0 0 0 0
2012-11-15 07:00:00 85.8 0 0 0 0 0
2013-10-17 00:00:00 52.4 0 0 0 0 0
2014-04-01 22:00:00 72.0 0 0 0 0 0
2014-10-20 06:00:00 56.6 0 0 0 0 0
2014-12-13 09:00:00 104.4 0 0 0 0 0
2015-02-09 00:00:00 62.0 0 0 0 0 0
2015-02-16 19:00:00 56.8 0 0 0 0 0
2015-05-06 17:00:00 50.8 0 0 0 0 0
2016-02-26 00:00:00 52.2 0 0 0 0 0
I need to handle the error "TypeError: Expected tuple, got str":
TypeError: Expected tuple, got str
You get this error because some of your columns are not tuples: the entries of df.columns from index 2368 to 2959 are strings.
The columns that are strings:
df.columns[2368:2959]
Index(['('z', '1', '1', '00h').1', '('z', '1', '1', '06h').1',
'('z', '1', '1', '12h').1', '('z', '1', '1', '18h').1',
'('z', '1', '2', '00h').1', '('z', '1', '2', '06h').1',
'('z', '1', '2', '12h').1', '('z', '1', '2', '18h').1',
'('z', '1', '3', '00h').1', '('z', '1', '3', '06h').1',
...
'('z', '1000', '2', '06h').1', '('z', '1000', '2', '12h').1',
'('z', '1000', '2', '18h').1', '('z', '1000', '3', '00h').1',
'('z', '1000', '3', '06h').1', '('z', '1000', '3', '12h').1',
'('z', '1000', '3', '18h').1', '('z', '1000', '4', '00h').1',
'('z', '1000', '4', '06h').1', '('z', '1000', '4', '12h').1'],
dtype='object', length=591)
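To locate such labels in your own data, a minimal sketch along these lines may help. It uses a small hand-made list (hypothetical, not the real file) standing in for df.columns, and the name string_positions is only illustrative:

```python
# Hypothetical labels standing in for df.columns: two real tuples and two
# strings of the kind shown above (a tuple repr with a ".1" suffix).
columns = [
    ('u', '1', '1', '00h'),
    ('u', '1', '1', '06h'),
    "('z', '1', '1', '00h').1",
    "('z', '1', '1', '06h').1",
]

# Positions of the labels that are plain strings rather than tuples.
string_positions = [i for i, col in enumerate(columns) if isinstance(col, str)]
print(string_positions)
```

With the real DataFrame, the same comprehension over df.columns should report the range quoted above, assuming the file loads the same way.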
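Since you want a DataFrame whose columns are a MultiIndex built from tuples, first clean these strings by extracting the needed substring with re.findall and the regex pattern = '(\(.*?\)).', then pass that value to ast.literal_eval to convert the string to a tuple. Finally, use pd.MultiIndex.from_tuples: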
import ast
import re

import pandas as pd

df = pd.read_csv('teste.csv', index_col=[0, 1, 2, 3, 4], header=[0, 1, 2, 3], parse_dates=True)

column_list = []
for column in df.columns:
    if isinstance(column, str):
        # Keep only the "(...)" part (dropping the ".1" suffix) and parse it into a tuple
        column_list.append(ast.literal_eval(re.findall(r'(\(.*?\)).', column)[0]))
    else:
        # Already a tuple, keep as-is
        column_list.append(column)

df.columns = pd.MultiIndex.from_tuples(column_list,
                                       names=('variables', 'level', 'days', 'times'))
print(df.iloc[:,:6].head())
variables u
level 1
days 1 2
times 00h 06h 12h 18h 00h 06h
wsid lat lon start prcp_24
329 -43.969397 -19.883945 2007-03-18 10:00:00 72.0 0 0 0 0 0 0
2007-03-20 10:00:00 104.4 0 0 0 0 0 0
2007-10-18 23:00:00 92.8 0 0 0 0 0 0
2007-12-21 00:00:00 60.4 0 0 0 0 0 0
2008-01-19 18:00:00 53.0 0 0 0 0 0 0
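If you want to try the cleaning step without the CSV, here is a minimal standalone sketch of the same regex plus ast.literal_eval idea. The helper name to_tuple and the sample labels are hypothetical; the pattern is the one used above, and the explicit error for labels that do not match is an addition for easier debugging:

```python
import ast
import re

# Same pattern as above: capture "(...)" that is followed by one more character.
PATTERN = re.compile(r'(\(.*?\)).')


def to_tuple(label):
    """Return label unchanged if it is already a tuple; otherwise parse the
    tuple text out of a string such as "('z', '1', '1', '00h').1"."""
    if isinstance(label, tuple):
        return label
    match = PATTERN.search(label)
    if match is None:
        raise ValueError(f"no tuple found in column label: {label!r}")
    return ast.literal_eval(match.group(1))


# Hypothetical sample: one proper tuple and one string label with a ".1" suffix.
labels = [('u', '1', '1', '00h'), "('z', '1', '1', '00h').1"]
cleaned = [to_tuple(label) for label in labels]
print(cleaned)
```

The resulting list of tuples can then be handed to pd.MultiIndex.from_tuples in the same way as column_list above.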