Error: No module named 'Custom Class' when passing a Client object to a custom class's constructor in Dask
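I've been trying to write custom classes for Preprocessing, followed by Feature selection and Machine Learning algorithms.
I cracked this (preprocessing only) using @delayed. But when I read in the tutorials that the same can be achieved with Client, I tried that, and it caused two problems.
Running as a script, not as a Jupyter notebook.
First problem: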
# Haven't run any scheduler or worker manually
client = Client() # Nothing passed as an argument
# Local Cluster is not working;
Error:...
    if __name__ == '__main__':
        freeze_support()
        ...
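I tried the same thing in a Jupyter Notebook, without running any scheduler or workers in separate terminals. It worked!!
Now I have started 3 terminals with 1 scheduler and 2 workers, and changed the script to use Client('IP'). The error went away. Is there any reason for this behavior?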
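Editor's note: the freeze_support() text quoted above is the standard message Python's multiprocessing raises when a script starts child processes at import time. Client() with no arguments starts a local cluster, which launches worker processes, so the likely cause is that the script creates the client at module level. The sketch below uses only the standard library to show the guard pattern; run_task, describe and main are hypothetical names, and applying the same pattern to the Dask script is an assumption drawn from the error text, not something confirmed in the thread.

```python
import multiprocessing as mp


def describe(task_id):
    # Pure helper so the work done in the child is easy to check.
    return f"task {task_id} done"


def run_task(task_id):
    # Defined at module level so a spawned child can import it by name.
    print(describe(task_id))


def main():
    # "spawn" starts a fresh interpreter that re-imports this file, which is
    # why process creation has to sit behind the guard below.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=run_task, args=(i,)) for i in range(2)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == "__main__":
    # In the script from the question, `client = Client()` and the code that
    # uses it would move into main() in the same way (an assumption).
    print(main())
```

A notebook does not hit this because there is no script file for the child processes to re-import, and Client('IP') avoids it because the scheduler and workers were already started from separate terminals.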
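Second problem:
The error mentioned in the title. I pass client = Client('IP') as an argument to the constructor and use self.client.submit to send the work to the cluster. But it fails with the error message
Error: No module named 'diya_info'
The code is as follows:
main.py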
import dask.dataframe as dd
from diya_info import Diya_Info
import time
# from dask import delayed
from dask.distributed import Client
df = dd.read_csv(
    '/Users/asifali/workspace/playground/flask/yellow_tripdata_2015-01.csv')
# df = delayed(df.fillna(0.3))
# df = df.compute()
client = Client('192.168.0.129:8786')
X = df.drop('payment_type', axis=1).copy()
y = df['payment_type']
Instance = Diya_Info(X, y, client)
s = time.ctime(int(time.time()))
print(s)
Instance = Instance.fit(X, y)
e = time.ctime(int(time.time()))
print(e)
# print((e-s) % 60, ' secs')
diya_info.py
from sklearn.base import TransformerMixin, BaseEstimator
from dask.multiprocessing import get
from dask import delayed, compute

class Diya_Info(BaseEstimator, TransformerMixin):
    def __init__(self, X, y, client):
        assert X is not None, 'X can\'t be None'
        assert type(X).__name__ == 'DataFrame', 'X not of type DataFrame'
        assert y is not None, 'y can\'t be None'
        assert type(y).__name__ == 'Series', 'y not of type Series'
        self.client = client

    def fit(self, X, y):
        self.X = X
        self.y = y
        # X_status = self.has_null(self.X)
        # y_status = self.has_null(self.y)
        # X_len = self.get_len(self.X)
        # y_len = self.get_len(self.y)
        X_status = self.client.submit(self.has_null, self.X)
        y_status = self.client.submit(self.has_null, self.y)
        X_len = self.client.submit(self.get_len, self.X)
        y_len = self.client.submit(self.get_len, self.y)
        # X_null, y_null, X_length, y_length
        X_null, y_null, X_length, y_length = self.client.gather(
            [X_status, y_status, X_len, y_len])
        assert X_null == False, 'X contains some columns with null/NaN values'
        assert y_null == False, 'y contains some columns with null/NaN values'
        assert X_length == y_length, 'Shape mismatch, X and y are of different length'
        return self

    def transform(self, X):
        return X

    @staticmethod
    # @delayed
    def has_null(df):
        return df.isnull().values.any()

    @staticmethod
    # @delayed
    def get_len(df):
        return len(df)
Here is the complete stack trace:
Sat Aug 11 13:29:08 2018
distributed.utils - ERROR - No module named 'diya_info'
Traceback (most recent call last):
File "/anaconda3/lib/python3.6/site-packages/distributed/utils.py", line 238, in f
result[0] = yield make_coro()
File "/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/anaconda3/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.6/site-packages/distributed/client.py", line 1315, in _gather
traceback)
File "/anaconda3/lib/python3.6/site-packages/six.py", line 692, in reraise
raise value.with_traceback(tb)
File "/anaconda3/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'diya_info'
Traceback (most recent call last):
File "notebook/main.py", line 24, in <module>
Instance = Instance.fit(X, y)
File "/Users/asifali/workspace/pythonProjects/ML-engine-DataX/pre-processing/notebook/diya_info.py", line 28, in fit
X_status, y_status, X_len, y_len)
File "/anaconda3/lib/python3.6/site-packages/distributed/client.py", line 2170, in compute
result = self.gather(futures)
File "/anaconda3/lib/python3.6/site-packages/distributed/client.py", line 1437, in gather
asynchronous=asynchronous)
File "/anaconda3/lib/python3.6/site-packages/distributed/client.py", line 592, in sync
return sync(self.loop, func, *args, **kwargs)
File "/anaconda3/lib/python3.6/site-packages/distributed/utils.py", line 254, in sync
six.reraise(*error[0])
File "/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/anaconda3/lib/python3.6/site-packages/distributed/utils.py", line 238, in f
result[0] = yield make_coro()
File "/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/anaconda3/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "/anaconda3/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/anaconda3/lib/python3.6/site-packages/distributed/client.py", line 1315, in _gather
traceback)
File "/anaconda3/lib/python3.6/site-packages/six.py", line 692, in reraise
raise value.with_traceback(tb)
File "/anaconda3/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'diya_info'
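If I uncomment @delayed and comment out a few other lines, it works. But how can I make it work by passing client in as an argument?
The idea is to use the same client for all the libraries I'm going to write.
Update 1: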
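I fixed the second problem by removing the @staticmethod decorator and placing the functions inside the fit closure. But what's wrong with @staticmethod? These decorators are meant for things that don't depend on self, right?
Here's diya_info.py: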
...
    def fit(self, X, y):
        self.X = X
        self.y = y
        # function removed from @staticmethod
        def has_null(df): return df.isnull().values.any()
        # function removed from @staticmethod
        def get_len(df): return len(df)
        X_status = self.client.submit(has_null, self.X)
        y_status = self.client.submit(has_null, self.y)
        ...
Is there a way to do this with @staticmethod? I don't feel good about the way I solved this. And I still have no clue about Problem 1.
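ModuleNotFoundError: No module named 'diya_info'
This means that while your client has access to this module, your workers do not. A simple way to solve this is to upload your script to your workers.
client.upload_file('diya_info.py')
But in general, it is your responsibility to ensure that your workers and your client all have the same software environment.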
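To see why a missing module on the receiving side produces this exact error, it may help to look at how Python pickles a function that lives in an importable module: the payload stores only the module name and qualified name, and the receiver has to import that module to get the function back. The sketch below reproduces this with the standard library alone. demo_helper_mod and Helper are hypothetical stand-ins for diya_info and Diya_Info, and removing the module from sys.path and sys.modules only simulates a worker that lacks the file.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for diya_info.py.
MODULE_NAME = "demo_helper_mod"
MODULE_SOURCE = (
    "class Helper:\n"
    "    @staticmethod\n"
    "    def get_len(obj):\n"
    "        return len(obj)\n"
)


def pickle_static_method():
    """Import the module from a temp dir, pickle its static method, then
    forget the module so this process looks like a worker without it."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(MODULE_NAME)
            return pickle.dumps(module.Helper.get_len)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)


def try_load(data):
    """Return the error text if unpickling fails for lack of the module."""
    try:
        pickle.loads(data)
    except ModuleNotFoundError as exc:
        return str(exc)
    return None


payload = pickle_static_method()
# The payload carries the module name, not the function body.
name_in_payload = MODULE_NAME.encode() in payload
message = try_load(payload)
print(name_in_payload, message)
```

This would also be consistent with the closure workaround in Update 1: dask.distributed relies on cloudpickle, which, as far as I understand, serializes functions that cannot be imported by name (such as ones defined inside fit) by value, so the worker never needs to import diya_info for them. Keeping @staticmethod should work once the workers can import the module, for example after client.upload_file('diya_info.py').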