Reading a .csv file from memory using Colab
Hi everyone. I have a .csv file that I want to read from my Drive, and I am using Colab to do it. I set up the csv file in Excel, but when I specify its location in Colab it still shows .xlsx, and I get the error below:
ParserError Traceback (most recent call last)
<ipython-input-4-b8dede7d2e2c> in <module>()
7
8 #load dataset
----> 9 dataset = pd.read_csv('/content/mnt/MyDrive/Colab Notebooks/salary_data.csv.xslx')
10
11 # split data into features and target
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in read(self, nrows)
2155 def read(self, nrows=None):
2156 try:
-> 2157 data = self._reader.read(nrows)
2158 except StopIteration:
2159 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()
ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
Here is the code (model.ipynb):
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import pickle
from sklearn.metrics import r2_score
#load dataset
dataset = pd.read_csv('/content/mnt/MyDrive/Colab Notebooks/salary_data.csv .xslx')
# split data into features and target
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
#split the data into train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.05, random_state = 0)
# create a model
regressor = LinearRegression()
#train the model
regressor.fit(X_train, y_train)
#perform prediction
y_pred = regressor.predict(X_test)
# you can check the performance of the model with the following code
#print("R2 score: {}".format(r2_score(y_test,y_pred)))
#save the trained model
pickle.dump(regressor, open('/content/mnt/MyDrive/Colab Notebooks/regressor.pkl','wb'))
Please help me solve this problem. Thanks.
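First, your path looks wrong. There is a space near the end:
('/content/mnt/MyDrive/Colab Notebooks/salary_data.csv .xslx')
Second, is the file actually a .csv or an .xlsx? The name has two extensions (and the second one is spelled .xslx in your path), so it is ambiguous.
If it is a .csv, remove the trailing .xslx from both the file name and the path.
If it is an .xlsx, you can use read_excel() instead of read_csv().
Alternatively, you can convert it to CSV in Excel: open the .xlsx in Excel -> File -> Save As -> CSV.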
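The advice above can be sketched as a small helper that picks the pandas reader from the file's real extension and fails early with a readable message otherwise. This is only an illustration: `load_table` is a hypothetical name, not part of pandas, and reading .xlsx files with `read_excel()` assumes an Excel engine such as openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def load_table(path):
    """Load a table with the pandas reader that matches the file extension.

    Hypothetical helper for illustration; not part of pandas.
    """
    path = Path(path)
    if not path.exists():
        # A stray space or a misspelled extension usually ends up here.
        raise FileNotFoundError(f"No such file: {str(path)!r}")

    # Path.suffix is everything from the last dot onward, lowercased here.
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xlsx", ".xls"}:
        # Assumes an Excel engine (for example openpyxl) is available.
        return pd.read_excel(path)

    raise ValueError(
        f"Unsupported extension {suffix!r} in {path.name!r}; "
        "rename the file to end in .csv or .xlsx"
    )
```

With this in place, `dataset = load_table('/content/mnt/MyDrive/Colab Notebooks/salary_data.csv')` would replace the `pd.read_csv(...)` call, assuming the file has been renamed to end in .csv. A name such as `salary_data.csv .xslx` is rejected with a `ValueError` instead of being passed to the CSV parser.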