How to save the results of an np.array for future use when using Google Colab
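I'm working on an information retrieval project, and I'm using Google Colab for it. I'm at the stage where I compute some features ("input_features") and have labels ("labels"). Computing them with a for loop took me about 4 hours.
So at the end I converted the results into arrays: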
input_features = np.array(input_features)
labels = np.array(labels)
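So my question is: is it possible to save these results so that I can reuse them when working in Google Colab?
I found two options that might work, but I don't know where the files are created.
1) Save them as CSV files. My code is: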
from numpy import savetxt
# save to csv file
savetxt('input_features.csv', input_features, delimiter=',')
savetxt('labels.csv', labels, delimiter=',')
To load them:
from numpy import loadtxt
# load array
input_features = loadtxt('input_features.csv', delimiter=',')
labels = loadtxt('labels.csv', delimiter=',')
# print the array
print(input_features)
print(labels)
But when I print them, I still don't get anything back.
2) Save the array results with pickle. I followed the instructions here:
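A minimal sketch of the CSV round trip, assuming the arrays are numeric and at most 2-D (np.savetxt only accepts 1-D or 2-D input). A relative file name such as 'input_features.csv' is written to the current working directory, which in Colab is normally /content on the virtual machine, not Google Drive; printing the absolute path shows where the file actually went. The small arrays below are hypothetical stand-ins for the real features and labels.

```python
import os
import numpy as np

# Hypothetical stand-in data: 3 samples with 4 features each, plus 3 labels
input_features = np.arange(12, dtype=float).reshape(3, 4)
labels = np.array([0.0, 1.0, 0.0])

# Relative paths resolve against the current working directory
np.savetxt("input_features.csv", input_features, delimiter=",")
np.savetxt("labels.csv", labels, delimiter=",")
print("Saved to:", os.path.abspath("input_features.csv"))

# Reload and confirm the data survived the round trip
loaded_features = np.loadtxt("input_features.csv", delimiter=",")
loaded_labels = np.loadtxt("labels.csv", delimiter=",")
print(loaded_features.shape, loaded_labels.shape)
print(np.array_equal(loaded_features, input_features))
```

If the shapes print as expected, the files exist and are readable; they are still lost when the Colab runtime is recycled unless you copy them somewhere persistent.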
https://colab.research.google.com/drive/1EAFQxQ68FfsThpVcNU7m8vqt4UZL0Le1#scrollTo=gZ7OTLo3pw8M
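from google.colab import files
import pickle
def features_pickeled(input_features, results):
    # input_features is the file name (without extension); results is the object to save
    input_features = input_features + '.txt'
    with open(input_features, 'wb') as f:  # closes the file before it is downloaded
        pickle.dump(results, f)
    files.download(input_features)  # downloads to your local machine, not to Google Drive
def labels_pickeled(labels, results):
    labels = labels + '.txt'
    with open(labels, 'wb') as f:
        pickle.dump(results, f)
    files.download(labels)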
And to load them:
from io import BytesIO
def load_features_from_local():
    loaded_features = {}
    uploaded = files.upload()  # dict mapping each uploaded file name to its bytes
    for name, data in uploaded.items():
        loaded_features[name] = pickle.load(BytesIO(data))
    return loaded_features
def load_labels_from_local():
    loaded_labels = {}
    uploaded = files.upload()
    for name, data in uploaded.items():
        loaded_labels[name] = pickle.load(BytesIO(data))
    return loaded_labels
# How do I print the pickled files to check that they are ready to use?
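files.upload() returns a dictionary that maps each uploaded file name to its raw bytes, so unpickling only needs a BytesIO wrapper around those bytes. The sketch below fakes that dictionary with pickle.dumps so it also runs outside Colab; the helper name unpickle_uploaded and the file names are hypothetical. Printing the type and shape of each loaded object is a simple way to check that the data is ready to use.

```python
import pickle
from io import BytesIO
import numpy as np

def unpickle_uploaded(uploaded):
    """Unpickle every entry of a {filename: bytes} dict, like the one files.upload() returns."""
    loaded = {}
    for name, data in uploaded.items():
        loaded[name] = pickle.load(BytesIO(data))
    return loaded

# Simulate an upload (hypothetical file names) instead of calling files.upload()
fake_upload = {
    "input_features.txt": pickle.dumps(np.ones((2, 3))),
    "labels.txt": pickle.dumps(np.array([1, 0])),
}

loaded = unpickle_uploaded(fake_upload)
for name, obj in loaded.items():
    # Inspect what came back before using it
    print(name, type(obj).__name__, obj.shape)
```

In Colab you would pass the real result of files.upload() instead of fake_upload.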
When using plain Python, I would do something like this with pickle:
# Create pickle file
with open("name.pickle", "wb") as pickle_file:
    pickle.dump(name, pickle_file)
# Load the pickle file
with open("name.pickle", "rb") as name_pickled:
    name_b = pickle.load(name_pickled)
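The same with-statement pattern works unchanged in Colab; what matters is only where the file lands. A sketch, assuming both arrays should travel together: pickling a dict keeps features and labels in one file and lets you print them right after loading. The file name results.pickle and the random stand-in data are illustrative choices.

```python
import pickle
import numpy as np

input_features = np.random.rand(5, 3)  # stand-in for the real features
labels = np.array([0, 1, 1, 0, 1])      # stand-in for the real labels

# Store both arrays under named keys in a single pickle file
with open("results.pickle", "wb") as f:
    pickle.dump({"input_features": input_features, "labels": labels}, f)

# Later (e.g. after reconnecting, once the file is back on disk)
with open("results.pickle", "rb") as f:
    results = pickle.load(f)

print(results["input_features"].shape)  # (5, 3)
print(results["labels"])                # [0 1 1 0 1]
```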
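But the problem is that I don't see any files being created in my Google Drive.
Is my code correct, or am I missing part of it?
Sorry for the long description; I hope it explains in detail what I want to do and what I have tried so far.
Thanks in advance for your help.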
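Google Colaboratory notebooks run on virtual machines, and an instance is never guaranteed to give you access to the same resources after you disconnect and reconnect. Files written to the VM's local disk (for example under /content) can therefore disappear, so you can't reliably "save" your data inside Colab itself. Here are some solutions:
- Colab saves your code. If the for loop you mentioned doesn't take long to run, just keep the code and rerun it each time you connect to the notebook.
- Look into np.save. This function saves an array to a binary file, which you can re-upload when you reconnect your notebook. Better yet, store the binary file on Google Drive, mount the drive in your notebook, and reference the file from there.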
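A minimal sketch of the np.save approach. np.savez stores several arrays in one .npz archive, and np.load returns a mapping keyed by the names you passed. For the file to survive a disconnect, folder should point inside the mounted Drive; a path like /content/gdrive/MyDrive/ir_project is an assumption (the exact folder depends on where and how the drive is mounted), so a temporary directory is used here to keep the sketch runnable anywhere. The helper names save_results and load_results are hypothetical.

```python
import os
import tempfile
import numpy as np

def save_results(folder, input_features, labels):
    """Save both arrays to <folder>/results.npz and return the file path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "results.npz")
    np.savez(path, input_features=input_features, labels=labels)
    return path

def load_results(path):
    """Load the arrays saved by save_results."""
    with np.load(path) as data:
        return data["input_features"], data["labels"]

# In Colab this might be e.g. "/content/gdrive/MyDrive/ir_project" (assumed path)
folder = tempfile.mkdtemp()
path = save_results(folder, np.eye(3), np.array([1, 0, 1]))
features, labels = load_results(path)
print(path, features.shape, labels)
```

For a single array, np.save("input_features.npy", input_features) followed by np.load("input_features.npy") does the same job without the archive.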
# Mount Google Drive (this also authenticates you to your Drive)
from google.colab import drive
drive.mount('/content/gdrive')
#---
# Import necessary libraries
import numpy as np
from numpy import savetxt
import pandas as pd
#---
# Create array
arr = np.array([1, 2, 3, 4, 5])
# save to csv file
savetxt('arr.csv', arr, delimiter=',') # You will see the results if you press in the File icon (left panel)
Then you can load it again with:
# You can copy the path when you find your file in the file icon
arr = pd.read_csv('/content/arr.csv', sep=',', header=None) # You can also save your result as a txt file
arr
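Note that pd.read_csv gives back a DataFrame rather than the original array, and np.savetxt writes a 1-D array as one value per line, so the result has a single column. A short sketch of getting a plain NumPy array back, either through pandas or directly with np.loadtxt:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])
np.savetxt("arr.csv", arr, delimiter=",")  # one value per line, written as floats

df = pd.read_csv("arr.csv", sep=",", header=None)
print(df.shape)  # (5, 1)

# Flatten the single column back to a 1-D array
from_pandas = df.to_numpy().ravel()
# Or skip pandas entirely
from_numpy = np.loadtxt("arr.csv", delimiter=",")

print(from_pandas)  # [1. 2. 3. 4. 5.]
print(np.array_equal(from_pandas, from_numpy))  # True
```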