What is the problem with the pandas to csv in my code?
I am running code for a project I'm working on that looks for patterns in Disneyland wait times:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_pirates_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/pirates_of_caribbean_dlr.csv",
    usecols=['date', 'datetime', 'SPOSTMIN'],
    parse_dates=['date', 'datetime'],
)
df_pirates_all['ride'] = 'pirates'
df_pirates_all['open'] = ~((df_pirates_all['SPOSTMIN'] == -999))
df_pirates = df_pirates_all.set_index('datetime').sort_index()
df_pirates = df_pirates.loc['2017-01-01 06:00':'2017-02-01 00:00']
df_pirates = df_pirates.resample('15Min').ffill()
df_star_tours_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/star_tours_dlr.csv",
    usecols=['date', 'datetime', 'SPOSTMIN'],
    parse_dates=['date', 'datetime'],
)
df_star_tours_all['ride'] = 'star_tours'
df_star_tours_all['open'] = ~((df_star_tours_all['SPOSTMIN'] == -999))
df_star_tours = df_star_tours_all.set_index('datetime').sort_index()
df_star_tours = df_star_tours.loc['2017-01-01 06:00':'2017-02-01 00:00']
df_star_tours = df_star_tours.resample('15Min').ffill()
df_space_all = pd.read_csv(
    "https://cdn.touringplans.com/datasets/space_mountain_dlr.csv",
    usecols=['date', 'datetime', 'SPOSTMIN'],
    parse_dates=['date', 'datetime'],
)
df_space_all['ride'] = 'space'
df_space_all['open'] = ~((df_space_all['SPOSTMIN'] == -999))
df_space = df_space_all.set_index('datetime').sort_index()
df_space = df_space.loc['2017-01-01 06:00':'2017-02-01 00:00']
df_space = df_space.resample('15Min').ffill()
all_data = pd.concat([df_pirates, df_star_tours, df_space]).reset_index()
all_data = (
    all_data
    # Drop any "NaN" values in the column 'ride'
    .dropna(subset=['ride'])
    # Make datetime and ride a "Multi-Index"
    .set_index(['datetime', 'ride'])
    # Choose the column 'SPOSTMIN'
    ['SPOSTMIN']
    # Take the last index ('ride') and rotate to become column names
    .unstack()
)
# print (all_data)
for month, group in all_data.groupby(pd.Grouper(freq='M')):
    with pd.ExcelWriter(f'{month}.xlsx') as writer:
        for day, dfsub in group.groupby(pd.Grouper(freq='D')):
            dfsub.to_excel(writer, sheet_name='day')
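But when I run it, I get this error:
FileCreateError: [Errno 22] Invalid argument: '2017-01-31 00:00:00.xlsx'
and the traceback points to the dfsub.to_excel line.
This was mostly fixed thanks to the comments. However, only one sheet shows up, and it contains only the last day's data (1-31-17) instead of separate sheets for 1-1-17, 1-2-17, and so on.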
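For the first error: judging from the code, you don't care about the specific time of day, so do this:
with pd.ExcelWriter(f'{month.date()}.xlsx') as writer:
This converts the datetime object to a date object, so the file name no longer contains the colons from the time part, which are not valid in file names on Windows (a likely cause of the Errno 22).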
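A small sketch of what changes in the file name may help here. The `month` value below is an assumed stand-in for the month-end key that the grouping yields, chosen to match the name in the error message.

```python
import pandas as pd

# Assumed stand-in for the month-end group key from the question
month = pd.Timestamp("2017-01-31")

# Formatting the Timestamp directly keeps the time part, colons included
raw_name = f"{month}.xlsx"

# .date() drops the time part, leaving only year-month-day
safe_name = f"{month.date()}.xlsx"

print(raw_name)
print(safe_name)
```

The first name is the one that appears in the error message; the second contains no colons.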
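Your second error says that you are trying to use a column whose values are not all unique as an index, which pandas does not allow when reshaping.
Perhaps some fields can be combined, or another field used instead?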
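The thread does not show the second error itself, so the following is only a guess at the situation: a minimal sketch, with made-up rows, of how a repeated (datetime, ride) pair makes `unstack()` fail, and one possible way around it (keeping the last reading per pair). Whether dropping duplicates is appropriate depends on the real data.

```python
import pandas as pd

# Hypothetical rows: the same (datetime, ride) pair appears twice
rows = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2017-01-01 06:00", "2017-01-01 06:00", "2017-01-01 06:15"]
    ),
    "ride": ["pirates", "pirates", "pirates"],
    "SPOSTMIN": [5, 10, 15],
})

indexed = rows.set_index(["datetime", "ride"])["SPOSTMIN"]

# unstack() needs each (datetime, ride) pair to be unique
try:
    indexed.unstack()
    unstack_failed = False
except ValueError:
    unstack_failed = True

# One possible fix (an assumption, not the asker's method):
# keep only the last reading for each (datetime, ride) pair
deduped = indexed[~indexed.index.duplicated(keep="last")]
wide = deduped.unstack()
print(wide)
```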
The way to solve the problem is to change the code from
for month, group in all_data.groupby(pd.Grouper(freq='M')):
    with pd.ExcelWriter(f'{month}.xlsx') as writer:
        for day, dfsub in group.groupby(pd.Grouper(freq='D')):
            dfsub.to_excel(writer, sheet_name='day')
to
for month, group in all_data.groupby(pd.Grouper(freq='M')):
    with pd.ExcelWriter(f'{month.strftime("%B %Y")}.xlsx') as writer:
        for day, dfsub in group.groupby(pd.Grouper(freq='D')):
            dfsub.to_excel(writer, sheet_name=str(day.date()))
as per the suggestions made.
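The reason only one sheet appeared is that `sheet_name='day'` is a literal string, so every daily group is written under the same name. The sketch below illustrates this without writing any files: it uses synthetic data in place of the real wait times, and `sheet_names` is a hypothetical helper that only collects the names each approach would produce.

```python
import pandas as pd

# Synthetic stand-in for all_data: one reading every 15 minutes in January 2017
index = pd.date_range("2017-01-01 06:00", "2017-01-31 23:45", freq="15min")
all_data = pd.DataFrame({"pirates": 5, "space": 10}, index=index)


def sheet_names(df, name_for_day):
    """Return the distinct sheet names produced across the daily groups."""
    names = [name_for_day(day) for day, _ in df.groupby(pd.Grouper(freq="D"))]
    return sorted(set(names))


# Original code: the same literal name for every day
literal = sheet_names(all_data, lambda day: "day")

# Fixed code: one name per calendar day
per_day = sheet_names(all_data, lambda day: str(day.date()))

print(literal)
print(len(per_day), per_day[0], per_day[-1])
```

With the literal name there is a single distinct sheet name for the whole month, which matches the one sheet observed; with `str(day.date())` there is one per day. Note also that recent pandas releases may warn that the `'M'` frequency alias is deprecated in favour of `'ME'`.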