How to convert a pandas DataFrame to libsvm format?

I have a pandas DataFrame like the one below.

df
Out[50]: 
    0   1   2   3   4   5   6   7   8   9  ...  90  91  92  93  94  95  96  97 \
0   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
1   0   1   1   1   0   0   1   1   1   1 ...   0   0   0   0   0   0   0   0   
2   1   1   1   1   1   1   1   1   1   1 ...   0   0   0   0   0   0   0   0   
3   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
4   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
5   1   0   0   1   1   1   1   0   0   0 ...   0   0   0   0   0   0   0   0   
6   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   
7   0   0   0   0   0   0   0   0   0   0 ...   1   1   1   1   1   1   1   1   

[8 rows x 100 columns]

I have the target variable as an array, as shown below.

[1, -1, -1, 1, 1, -1, 1, 1]

How can I map this target variable onto the DataFrame and convert the result to libsvm format?

equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}
df["labels"] = df.index.map[(equi)]
d = df[np.setdiff1d(df.columns,['indx','labels'])]
e = df.label
dump_svmlight_file(d,e,'D:/result/smvlight2.dat')

Error:

 File "D:/spyder/april.py", line 54, in <module>
df["labels"] = df.index.map[(equi)]

TypeError: 'method' object is not subscriptable

When I use

df["labels"] = df.index.list(map[(equi)])

Error:

AttributeError: 'RangeIndex' object has no attribute 'list'

Please help me resolve these errors.

I think you need to convert the index with to_series and then call map (with parentheses, since map is a method):

df["labels"] = df.index.to_series().map(equi)

Or use rename on the index:

df["labels"] = df.rename(index=equi).index
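
To make the two options above concrete, here is a minimal sketch on a small stand-in frame (the 4 x 2 data and the names labels_from_map and labels_from_rename are made up for illustration, not taken from the question). It also shows why the original attempt failed: map has to be called with parentheses, not indexed with square brackets.

```python
import pandas as pd

# Hypothetical stand-in for the 8 x 100 frame in the question.
df = pd.DataFrame([[0, 1], [1, 0], [1, 1], [0, 0]])
equi = {0: 1, 1: -1, 2: -1, 3: 1}

# df.index.map[(equi)] subscripts the method object and raises TypeError.
# Calling the method with parentheses is what was intended.
labels_from_map = df.index.to_series().map(equi)

# rename(index=...) returns a new frame whose index holds the mapped values.
labels_from_rename = df.rename(index=equi).index

print(labels_from_map.tolist())
print(list(labels_from_rename))
```

Both variants should give the same labels in row order, so either can be assigned to df["labels"].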

All together:

For the difference of columns, pandas has difference:

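from sklearn.datasets import dump_svmlight_file

equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}

# map each row index to its target label
df["labels"] = df.rename(index=equi).index
e = df["labels"]
# keep every column except the labels (and 'indx', if present) as features
d = df[df.columns.difference(['indx','labels'])]

# write features d and labels e in svmlight / libsvm format
dump_svmlight_file(d,e,'C:/result/smvlight2.dat')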
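
One thing worth checking with difference is column order, because the position of each column becomes its feature index in the libsvm file. As far as I know, Index.difference returns its result sorted where possible, while DataFrame.drop keeps the original order. A small sketch with made-up column names (by_difference and by_drop are illustrative names):

```python
import pandas as pd

# Hypothetical frame whose feature columns are not in sorted order.
df = pd.DataFrame({"b": [1, 0], "a": [0, 1], "labels": [1, -1]})

# difference() returns the remaining column names sorted.
by_difference = df[df.columns.difference(["indx", "labels"])]

# drop() keeps the original column order; errors="ignore" skips
# names such as 'indx' that are not present in the frame.
by_drop = df.drop(columns=["indx", "labels"], errors="ignore")

print(list(by_difference.columns))
print(list(by_drop.columns))
```

With the integer column names 0 to 99 from the question the two orders coincide, so this only matters for frames with unsorted column names.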

It also seems the labels column is not necessary:

from sklearn.datasets import dump_svmlight_file

equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}
# mapped index values serve directly as the labels
e = df.rename(index=equi).index
# 'indx' is removed only if such a column exists
d = df[df.columns.difference(['indx'])]
dump_svmlight_file(d,e,'C:/result/smvlight2.dat')
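
If you want to check the written file, one option is to read it back with load_svmlight_file and compare. This is only a sketch on a small made-up frame; the temporary directory and the file name example.dat are assumptions for illustration, not the paths used above.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Hypothetical 4 x 3 stand-in for the frame in the question.
df = pd.DataFrame([[0, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]])
labels = np.array([1, -1, -1, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dat")

    # Arguments are: features, labels, output path.
    dump_svmlight_file(df.to_numpy(), labels, path)

    # Zero-valued entries are not stored in the file, so pass n_features
    # to get the original width back. The result is a sparse matrix.
    X_back, y_back = load_svmlight_file(
        path, n_features=df.shape[1], zero_based=True
    )

print(X_back.toarray())
print(y_back)
```

The loaded matrix and labels should match the originals, which confirms that rows and labels were lined up as intended.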