IndexError prevents code from working with a larger CSV file
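I have data from a CSV file that I group with groupby and then plot. I wrote the code against a small sample of the data, and it ran fine. Then I tried to run it on the full, much larger data file.
I'm new to Python and this problem has been frustrating, so even a suggestion on how to troubleshoot it would help.
My code stops at this part: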
import pandas as pd
df = pd.DataFrame.from_csv('MYDATA.csv')
mode = lambda ts: ts.value_counts(sort=True).index[0]
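I tried running the code on just a portion of the huge data file, but for the whole file I get this error:
IndexError: index 0 is out of bounds for axis 0 with size 0
But I looked at the two datasets side by side and the columns are the same! I noticed that the large file has some UTF-8 accent issues, which I'm working on, but this IndexError has me baffled.
Here is the traceback: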
runfile('C:/Users/jbyrusb/Documents/Python Scripts/Tests/tests/TopSixCustomersExecute.py', wdir='C:/Users/jbyrusb/Documents/Python Scripts/Tests/tests')
Traceback (most recent call last):
File "<ipython-input-45-53a2a006076e>", line 1, in <module>
runfile('C:/Users/jbyrusb/Documents/Python Scripts/Tests/tests/TopSixCustomersExecute.py', wdir='C:/Users/jbyrusb/Documents/Python Scripts/Tests/tests')
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Users/jbyrusb/Documents/Python Scripts/Tests/tests/TopSixCustomersExecute.py", line 23, in <module>
df = df.groupby('CompanyName')[['Column1','Name', 'Birthday', 'Country', 'County']].agg(mode).T.reindex(columns=cols)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.py", line 676, in agg
return self.aggregate(func, *args, **kwargs)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2674, in aggregate
result = self._aggregate_generic(arg, *args, **kwargs)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2722, in _aggregate_generic
return self._aggregate_item_by_item(func, *args, **kwargs)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2751, in _aggregate_item_by_item
colg.aggregate(func, *args, **kwargs), data)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2307, in aggregate
result = self._aggregate_named(func_or_funcs, *args, **kwargs)
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\groupby.py", line 2394, in _aggregate_named
output = func(group, *args, **kwargs)
File "C:/Users/jbyrusb/Documents/Python Scripts/Tests/tests/TopSixCustomersExecute.py", line 20, in <lambda>
mode = lambda ts: ts.value_counts(sort=True).index[0]
File "C:\Users\jbyrusb\AppData\Local\Continuum\Anaconda\lib\site-packages\pandas\core\index.py", line 915, in __getitem__
return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0
It's hard to say without seeing the data that causes the error, but try this:
mode = (lambda ts: ts.value_counts(sort=True).index[0]
if len(ts.value_counts(sort=True)) else None)
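A minimal sketch of why the guard above can help, under the assumption that some group in the large file has only missing values in one of the aggregated columns. By default `value_counts()` drops NaN, so such a group yields an empty result and `.index[0]` raises the IndexError from the traceback. The DataFrame contents and the name `safe_mode` are made up for illustration; only `CompanyName` and `Country` are taken from the question.

```python
import numpy as np
import pandas as pd


def safe_mode(ts):
    """Return the most frequent non-null value in ts, or None if there is none."""
    counts = ts.value_counts()  # NaN values are dropped by default
    return counts.index[0] if len(counts) else None


# Hypothetical data: company "B" has no Country values at all.
df = pd.DataFrame({
    "CompanyName": ["A", "A", "A", "B", "B"],
    "Country": ["US", "US", "FR", np.nan, np.nan],
})

# The guarded function no longer raises on the all-NaN group.
result = df.groupby("CompanyName")["Country"].agg(safe_mode)

# count() reports non-null values per group, so a 0 marks a group
# that would have triggered the IndexError with the unguarded lambda.
non_null = df.groupby("CompanyName")["Country"].count()
empty_groups = non_null[non_null == 0].index.tolist()
print(empty_groups)
```

Listing the empty groups this way is one option for locating the offending rows in the real file before deciding whether returning None is the right behavior for them.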
I solved the same problem by changing the sep parameter from sep='\t' to sep=','.
Hope this saves someone some trouble.
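As a rough illustration of the separator point, the sketch below parses the same comma-separated text with both settings. It uses `pd.read_csv` on an in-memory string rather than the `DataFrame.from_csv` call from the question, and the sample text is invented. How a wrong separator leads to this particular IndexError depends on the rest of the code, so treat this only as a way to check whether the file was parsed into the expected columns.

```python
import io

import pandas as pd

# Hypothetical comma-separated content standing in for the real file.
raw = "CompanyName,Country\nA,US\nB,FR\n"

# With the wrong separator every line is read as one single field.
wrong = pd.read_csv(io.StringIO(raw), sep="\t")

# With the matching separator the columns are split as intended.
right = pd.read_csv(io.StringIO(raw), sep=",")

print(wrong.shape, list(wrong.columns))
print(right.shape, list(right.columns))
```

Comparing `shape` and `columns` between the small sample and the large file is a quick first check when code that worked on one stops working on the other.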