Loading the sklearn 20_newsgroups dataset into a pandas DataFrame
Python: I am trying to load the sklearn 20_newsgroups dataset, which comes back as a sklearn.utils.Bunch, into a pandas DataFrame.
I downloaded the dataset and loaded it from a local directory:
import sklearn.datasets

# Note: "alt.atheism" is listed twice and "comp.graphics" is absent in the original post.
categories = ["alt.atheism", "alt.atheism", "comp.os.ms-windows.misc", "comp.sys.ibm.pc.hardware",
              "comp.sys.mac.hardware", "comp.windows.x", "misc.forsale", "rec.autos", "rec.motorcycles",
              "rec.sport.baseball", "rec.sport.hockey", "sci.crypt", "sci.electronics", "sci.med", "sci.space",
              "soc.religion.christian", "talk.politics.guns", "talk.politics.mideast", "talk.politics.misc", "talk.religion.misc"]
docs_to_train = sklearn.datasets.load_files(
    "/home/Documents03-04-2019/dataset/20_newsgroups",
    description=None,
    categories=categories,
    load_content=True,
    encoding='ISO-8859-1',
    shuffle=True,
    random_state=42)
This is the code I tried:
docs_to_train.keys()
data1 = pd.DataFrame(docs_to_train.data, columns=docs_to_train.target_names])
data1['Target'] = pd.Series(data1=docs_to_train.target, index=data1.index)
Expected output
I ran the similar code below and it works. I need the newsgroups data in the same DataFrame format.
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()  # this line was missing from the post
data = pd.DataFrame(cancer.data, columns=[cancer.feature_names])
data['Target'] = pd.Series(data=cancer.target, index=data.index)
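Several of your names refer to the unrelated breast cancer code: you wrote cancer or data instead of data1, and there is an unmatched ]. Note also that pd.Series takes its values through the data keyword, so data1=... is rejected as an unexpected keyword argument.
Try this: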
data1 = pd.DataFrame(docs_to_train.data, columns=[docs_to_train.target_names])
data1['Target'] = pd.Series(data=docs_to_train.target, index=data1.index)
If that does not work, try this in place of the second line:
data1['Target'] = pd.Series(data=docs_to_train.target)
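One thing to watch for: with load_content=True, the data attribute returned by load_files is a list of document strings, so it forms a single column of text, and the 20 entries of target_names are class labels rather than column names. A minimal sketch of one way to lay this out follows, with one row per document. The helper name bunch_to_frame, the column names text, target and target_name, and the three made-up documents are all assumptions for illustration, not part of the original post.

```python
import pandas as pd
from sklearn.utils import Bunch


def bunch_to_frame(bunch):
    """Hypothetical helper: one row per document, with its label and label name."""
    # bunch.data is a list of strings, so it becomes a single "text" column.
    frame = pd.DataFrame({"text": bunch.data, "target": bunch.target})
    # Map each integer label to the matching newsgroup name.
    frame["target_name"] = [bunch.target_names[i] for i in frame["target"]]
    return frame


# Tiny stand-in for the Bunch returned by load_files (made-up documents).
demo = Bunch(
    data=["God does not exist", "Selling my old bike", "Is faith rational?"],
    target=[0, 1, 0],
    target_names=["alt.atheism", "misc.forsale"],
)
df = bunch_to_frame(demo)
print(df)
```

With the real data, the call would presumably be bunch_to_frame(docs_to_train), since load_files returns a Bunch with the same three attributes.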