Reading a CSV into a pandas DataFrame drops every column except the first

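I am cleaning a dataset and want to replace the outlier value -9999.9 with the median of the column it appears in. Each column represents a month, and the code below is what I wrote to handle the outliers. Once the outliers have been replaced with the median, I concatenate the reformatted columns with the columns I left untouched: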

    import pandas as pd
    import numpy as np
    #Abottsford British Columbia
    abottsfordbc = pd.read_csv("/Users/name/Desktop/Python_Scripts/wind_classifier/data_sets/canadian_windspeeds/abottsford_bc.csv", engine = 'python', sep = ',')
    df_abottsfordbc = pd.DataFrame(abottsfordbc)
    dataframe_labels = df_abottsfordbc[["Jan","Mar","Apr","May","Jun","Jul","Aug","Sep","Nov","Dec","Annual","Winter","Spring","Summer","Autumn"]]
    for i in df_abottsfordbc["Feb"]:
        if i == -9999.9:
            column_median = df_abottsfordbc["Feb"].median()
            outlier_convert = df_abottsfordbc["Feb"].replace(to_replace = [-9999.9], value = [0])
            zero_to_medianFeb = outlier_convert.replace(to_replace = [0], value = [column_median])
    for i in dataframe_labels["Oct"]:
        if i == -9999.9:
            column_median = df_abottsfordbc["Oct"].median()
            outlier_convert = df_abottsfordbc["Oct"].replace(to_replace = [-9999.9], value = [0])
            zero_to_medianOct = outlier_convert.replace(to_replace = [0], value = [column_median])
    abottsford_bc_concat = pd.concat([dataframe_labels, zero_to_medianFeb, zero_to_medianOct], axis = 1)

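I was hoping someone could help me with this problem. I recently moved the data from a Windows 10 machine to a Mac running macOS Catalina, and I am not sure why the script works on Windows 10 but not on macOS. On the Mac I am using Spyder 4.1.4 with Python 3.8. I checked the .csv file I am reading, and it looks completely fine in Microsoft Excel: all the column names are where they should be. However, when I print the DataFrame, I get this: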

    print(abottsfordbc)
                                                                                         Year
    1953 16.4 9.5  11.9 10.0 10.0 8.1  8.9  8.6 9.2  8.6  12.7 11.9 10.5 12.9 10.6 8.6   10.2
    1954 17.9 16.5 12.6 13.5 10.8 10.5 8.9  7.9 7.9  11.6 11.8 13.1 11.9 15.4 12.3 9.1   10.4
    1955 8.6  10.2 11.5 11.2 8.8  7.2  7.1  6.6 5.9  8.7  14.3 11.1 9.3  10.7 10.5 7.0    9.6
    1956 10.5 10.0 16.1 13.6 12.6 13.4 10.8 9.9 11.4 14.0 11.1 18.4 12.7 10.5 14.1 11.4  12.2
    1957 17.9 18.4 14.9 13.0 10.7 12.4 12.1 9.4 9.5  14.8 11.1 18.4 13.6 18.3 12.9 11.3  11.8
                                                                                      ...
    2010 10.1 9.0  10.9 13.2 10.3 9.5  9.8  8.5 8.5  7.9  12.3 10.9 10.1 9.4  11.4 9.3    9.6
    2011 10.2 14.5 13.2 11.7 9.0  10.4 9.3  7.8 8.0  7.4  10.5 7.7  10.0 11.9 11.3 9.2    8.6
    2012 12.9 11.1 13.2 9.8  10.4 9.6  9.2  7.8 6.6  10.6 9.7  10.2 10.1 10.6 11.2 8.8    9.0
    2013 7.5  10.4 10.6 11.7 9.0  9.2  10.8 8.4 9.1  7.9  9.2  11.4 9.6  9.3  10.5 9.5    8.7
    2014 9.9  17.5 11.5 11.4 9.6  10.3 10.1 8.6 8.7  10.0 13.4 11.4 11.0 12.9 10.8 9.7   10.7
    [62 rows x 1 columns]

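The error I keep getting is shown below. It makes sense, because when I print the DataFrame I can see that "Feb" is not a column. Does anyone know why read_csv() is not reading my .csv file correctly? Why does it interpret Year as the last column instead of the first and leave the rest of the columns blank? When I open the .csv file in Microsoft Excel it is formatted correctly, and it is saved as a CSV UTF-8 file. Any help would be greatly appreciated.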

    runfile('/Users/name/Desktop/Python_Scripts/wind_classifier/cleanwind.py', wdir='/Users/name/Desktop/Python_Scripts/wind_classifier')
    Traceback (most recent call last):
    
      File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
        return self._engine.get_loc(key)
    
      File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
    
      File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    
      File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
    
      File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    
    KeyError: 'Feb'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):

      File "/Users/bryanekeh/Desktop/Python_Scripts/wind_classifier/cleanwind.py", line 11, in <module>
        for i in abottsfordbc["Feb"]:

      File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
        indexer = self.columns.get_loc(key)

      File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))

      File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc

      File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc

      File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item

      File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item

    KeyError: 'Feb'

How do you know the file was not read in correctly? Did you print it before or after the column operations?
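One way to check is to look at the raw text of the file and at the parsed column names immediately after reading, before any column operation runs. The following is only a minimal sketch: the sample contents, the temporary file, and the helper name `head_raw` are hypothetical stand-ins so the example runs on its own, and they are not taken from the question's data.

```python
import os
import tempfile

import pandas as pd


def head_raw(path, n=3):
    """Return the first n lines of a text file as stored (line endings kept)."""
    # newline="" disables newline translation; utf-8-sig drops a leading BOM if present.
    with open(path, "r", encoding="utf-8-sig", newline="") as fh:
        return [fh.readline() for _ in range(n)]


# Hypothetical stand-in for the real CSV, only so the sketch is self-contained.
sample = "Year,Jan,Feb\n1953,16.4,9.5\n1954,17.9,-9999.9\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

raw = head_raw(sample_path)
for line in raw:
    print(repr(line))  # repr() makes separators and \r / \n characters visible

df = pd.read_csv(sample_path)
print(df.columns.tolist())  # inspect right after reading, before touching columns
print(df.shape)

os.remove(sample_path)  # clean up the temporary file
```

If the header line printed by `repr()` does not contain the separator passed to `read_csv`, or the column list is shorter than expected, the problem lies in how the file is stored rather than in the later column operations.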

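A few notes on your code:

  1. You don't need to pass the result of pd.read_csv() to pd.DataFrame. It is already a DataFrame object.

  2. Don't iterate over column values. There is almost always a better way in pandas. In this case, try

    df_abottsfordbc["Feb"] = df_abottsfordbc["Feb"].replace(-9999.9, df_abottsfordbc["Feb"].median())

The same approach works for df_abottsfordbc["Oct"]. You also don't need the concat.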
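The same idea can cover several columns at once. The sketch below is a variation rather than the exact one-liner above: it first turns the sentinel into NaN so that the sentinel itself does not enter the median, then fills each column with its own median. The small table is made up for illustration, and `SENTINEL` and `month_cols` are names introduced here.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999.9  # missing-value marker described in the question

# Hypothetical miniature of the wind-speed table, for illustration only.
df = pd.DataFrame(
    {
        "Year": [1953, 1954, 1955, 1956],
        "Feb": [9.5, SENTINEL, 10.2, 10.0],
        "Oct": [8.6, 11.6, SENTINEL, 14.0],
    }
)

month_cols = ["Feb", "Oct"]

# Mark the sentinel as missing first so it does not distort the median.
cleaned = df[month_cols].replace(SENTINEL, np.nan)

# DataFrame.median() skips NaN by default; fillna() with a Series
# matches the medians to columns by name.
df[month_cols] = cleaned.fillna(cleaned.median())

print(df)
```

Because the columns are updated in place, no separate `pd.concat` step is needed. If the sentinel is known at load time, passing `na_values=[-9999.9]` to `pd.read_csv` should achieve the first step while reading.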

Hope this helps.