How should I iterate over a CSV file row by row to get one prediction at a time

Question

My code:

counter = 0
while True :
    try :     
        test = pd.read_csv('test.csv' , nrows = 1 , skiprows = (counter) )
        counter += 1
    except:
        sleep(1)
        print('sleep')
        continue

    test = test.iloc[::,::]

    print('counter' , counter)

    test = test[newx]  # keep only the features the model was trained on
    print('########### Binary classification is loading #############')
    print('test' , test)
    result1 = GB_model.predict(test)  # run the prediction

    #print(result1)

    print('......Binary classfication result......')

    for i in result1:
        if i == 1:
            print('Attack traffic!!')
        else:
            print('Benign')

Example rows and columns from the CSV file:

https://pmqu-my.sharepoint.com/:x:/g/personal/3710137_upm_edu_sa/EWwYIThGjwhLnaHM-0OirdcBwQyIHfy8o1WG_M0tcohJOg?e=n8cvQR

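The error: only the first row gets predicted, and the next row raises the error below. I think it happens because the next row reaches the prediction function without the column headers (the feature names).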

└─# python script.py                                                                                                
/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
counter 1
########### Binary classification is loading #############
test    protocol  flow_duration  tot_fwd_pkts  tot_bwd_pkts  ...  init_bwd_win_byts  fwd_seg_size_min  active_mean  idle_mean
0        17   5.003721e+06             4             0  ...                  0                 8          0.0        0.0

[1 rows x 31 columns]
......Binary classfication result......
Benign
:( no attack traffic maybe next time
counter 2
Traceback (most recent call last):
  File "/home/kali/Desktop/cicflowmeter-0.1.6/project/script.py", line 90, in <module>
    test  = test[newx]
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/frame.py", line 3511, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5842, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['protocol', 'flow_duration', 'tot_fwd_pkts', 'tot_bwd_pkts',\n       'totlen_fwd_pkts', 'totlen_bwd_pkts', 'fwd_pkt_len_mean',\n       'fwd_pkt_len_std', 'bwd_pkt_len_mean', 'flow_byts_s', 'flow_pkts_s',\n       'flow_iat_std', 'flow_iat_min', 'fwd_iat_tot', 'fwd_iat_min',\n       'bwd_iat_tot', 'bwd_iat_min', 'fwd_psh_flags', 'fwd_urg_flags',\n       'bwd_pkts_s', 'fin_flag_cnt', 'rst_flag_cnt', 'psh_flag_cnt',\n       'ack_flag_cnt', 'urg_flag_cnt', 'down_up_ratio', 'init_fwd_win_byts',\n       'init_bwd_win_byts', 'fwd_seg_size_min', 'active_mean', 'idle_mean'],\n      dtype='object')] are in the [columns]"

Answer 1

If you don't want to read the whole file at once, you can read it in chunks (of a given number of rows):

import pandas as pd

for row in pd.read_csv('filename.csv', chunksize=1):
    print(row)
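
As a rough sketch of how the chunked read could feed the prediction loop from the question: each chunk is a one-row DataFrame that keeps the header, so selecting the feature columns works on every iteration. The inline CSV text, the `features` list (standing in for `newx`) and `fake_predict` (standing in for `GB_model.predict`) are made-up placeholders, not the asker's data or model. The last part shows one possible adjustment to the original `skiprows` approach: an integer `skiprows` skips that many lines from the top of the file, header included, whereas a range starting at 1 leaves the header line in place.

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv (the real file has 31 feature columns).
csv_text = (
    "protocol,flow_duration,tot_fwd_pkts\n"
    "17,5003721,4\n"
    "6,120,10\n"
    "6,98,7\n"
)

features = ["protocol", "flow_duration"]  # placeholder for newx


def fake_predict(frame):
    # Placeholder for GB_model.predict: marks protocol 6 rows as 1.
    return [1 if p == 6 else 0 for p in frame["protocol"]]


labels = []
# chunksize=1 yields one-row DataFrames, each with the header columns.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1):
    row = chunk[features]
    for pred in fake_predict(row):
        labels.append("Attack traffic!!" if pred == 1 else "Benign")

print(labels)


def read_nth_row(n):
    # Skip data lines 1..n but keep line 0 (the header), then read one row.
    return pd.read_csv(io.StringIO(csv_text), nrows=1,
                       skiprows=range(1, n + 1))


second = read_nth_row(1)  # the second data row, with column names intact
print(second)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path, and `fake_predict` by the trained model's `predict` method.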

But your real problem may be that your file is not a valid CSV.

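A CSV requires every row to have the same columns. If the rows in your file hold different sets of values, read it as a plain text file and parse each line separately with your own code.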
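
A minimal sketch of that plain-text route, using the standard library's `csv.DictReader` instead of pandas. The sample text, the helper name `iter_rows` and the two column names are illustrative assumptions, and the sketch assumes every line really contains the wanted columns with numeric values; a file with irregular lines would need extra checks.

```python
import csv
import io

# Hypothetical sample; a real run would use open('test.csv', newline='').
text = "protocol,flow_duration\n17,5003721\n6,120\n"


def iter_rows(handle, wanted):
    # DictReader takes the field names from the first line and
    # yields one dict per following line.
    reader = csv.DictReader(handle)
    for record in reader:
        # Keep only the wanted columns and convert them to numbers.
        yield {name: float(record[name]) for name in wanted}


rows = list(iter_rows(io.StringIO(text), ["protocol", "flow_duration"]))
print(rows)
```

Each yielded dict could then be turned into a one-row DataFrame before it is passed to the model.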


python

csv

pandas

catboost