How should I iterate over a CSV file row by row to get one prediction at a time?
My code:
counter = 0
while True:
    try:
        test = pd.read_csv('test.csv', nrows=1, skiprows=(counter))
        counter += 1
    except:
        sleep(1)
        print('sleep')
        continue
    test = test.iloc[::, ::]
    print('counter', counter)
    test = test[newx]  # the same features the model was trained on
    print('########### Binary classification is loading #############')
    print('test', test)
    result1 = GB_model.predict(test)  # run the prediction
    #print(result1)
    print('......Binary classfication result......')
    for i in result1:
        if i == 1:
            print('Attack traffic!!')
        else:
            print('Benign')
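Example of the rows and columns in the CSV file
The error: only the first row is predicted, and the next row then raises the error below. I think it happens because the next row is passed to the predict function without the header row (the feature names).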
└─# python script.py
/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
counter 1
########### Binary classification is loading #############
test protocol flow_duration tot_fwd_pkts tot_bwd_pkts ... init_bwd_win_byts fwd_seg_size_min active_mean idle_mean
0 17 5.003721e+06 4 0 ... 0 8 0.0 0.0
[1 rows x 31 columns]
......Binary classfication result......
Benign
:( no attack traffic maybe next time
counter 2
Traceback (most recent call last):
File "/home/kali/Desktop/cicflowmeter-0.1.6/project/script.py", line 90, in <module>
test = test[newx]
File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/frame.py", line 3511, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/home/kali/Desktop/conda/Desktop/envs/project/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 5842, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['protocol', 'flow_duration', 'tot_fwd_pkts', 'tot_bwd_pkts',\n 'totlen_fwd_pkts', 'totlen_bwd_pkts', 'fwd_pkt_len_mean',\n 'fwd_pkt_len_std', 'bwd_pkt_len_mean', 'flow_byts_s', 'flow_pkts_s',\n 'flow_iat_std', 'flow_iat_min', 'fwd_iat_tot', 'fwd_iat_min',\n 'bwd_iat_tot', 'bwd_iat_min', 'fwd_psh_flags', 'fwd_urg_flags',\n 'bwd_pkts_s', 'fin_flag_cnt', 'rst_flag_cnt', 'psh_flag_cnt',\n 'ack_flag_cnt', 'urg_flag_cnt', 'down_up_ratio', 'init_fwd_win_byts',\n 'init_bwd_win_byts', 'fwd_seg_size_min', 'active_mean', 'idle_mean'],\n dtype='object')] are in the [columns]"
If you don't want to read the whole file at once, you can read it in chunks (by number of rows):
import pandas as pd
for row in pd.read_csv('filename.csv', chunksize=1):
    print(row)
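To make the difference concrete, here is a minimal sketch comparing an integer `skiprows` with `chunksize=1`. The three-column sample data and the `FEATURES` list are made up for illustration (stand-ins for `test.csv` and `newx`), and the helper names are hypothetical. It suggests why the second read in the question loses the column names: an integer `skiprows` counts lines from the top of the file, header included.

```python
import io

import pandas as pd

# Hypothetical three-column sample standing in for test.csv
CSV_TEXT = "protocol,flow_duration,tot_fwd_pkts\n17,5003721,4\n6,120,2\n"
FEATURES = ["protocol", "tot_fwd_pkts"]  # stand-in for newx


def read_with_int_skiprows(counter):
    # An integer skiprows skips lines from the top of the file, header included,
    # so for counter >= 1 the next data line is parsed as the header.
    return pd.read_csv(io.StringIO(CSV_TEXT), nrows=1, skiprows=counter)


def iter_feature_rows(source, features):
    # chunksize=1 yields one-row DataFrames that all keep the real header.
    for chunk in pd.read_csv(source, chunksize=1):
        yield chunk[features]


broken = read_with_int_skiprows(1)
print(list(broken.columns))  # column names are now the values of the first data row

rows = list(iter_feature_rows(io.StringIO(CSV_TEXT), FEATURES))
for row in rows:
    print(row)  # each one-row frame could be passed to GB_model.predict(row)
```

If you would rather keep the `nrows=1` approach, passing a list-like such as `skiprows=range(1, counter + 1)` skips only the data lines and leaves the header line in place.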
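But your real problem may be that your file is not a proper CSV.
In a CSV, every row needs to have the same columns. If different rows hold different kinds of values, you should read the file as a plain text file and parse each line in its own way (using your own code).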
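As a rough sketch of the "parse it yourself" route, the standard `csv` module can read the file line by line and let you decide what to do with rows that do not match the header. The sample text and the `parse_lines` helper are assumptions for illustration, not taken from the question's data.

```python
import csv
import io

# Hypothetical file content whose rows do not all have the same number of fields
RAW_TEXT = "protocol,flow_duration,tot_fwd_pkts\n17,5003721,4\n6,120\n"


def parse_lines(handle):
    reader = csv.reader(handle)
    header = next(reader)  # the first line holds the column names
    records, rejected = [], []
    for fields in reader:
        if len(fields) == len(header):
            # Well-formed row: pair each value with its column name
            records.append(dict(zip(header, fields)))
        else:
            # Irregular row: set it aside for custom handling
            rejected.append(fields)
    return header, records, rejected


header, records, rejected = parse_lines(io.StringIO(RAW_TEXT))
print(records)
print(rejected)
```

The values come back as strings, so they would still need converting to numbers before being handed to a model.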