Pandas Modin ray库启动失败
Pandas Modin ray library fails to startup
我正在尝试使用 modin
加速我的 pandas 数据处理
import os
os.environ["MODIN_ENGINE"] = "ray"
import modin.pandas as pd
df = pd.read_csv(r"C:\Users\Harshad\Documents\Files\Data\Pre-processed\data.csv", low_memory=False)
我收到以下警告和错误:
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
Traceback (most recent call last):
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 240, in __init__
self.redis_password)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\services.py", line 328, in wait_for_node
raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Harshad/Documents/Code/data.py", line 18, in <module>
low_memory=False)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 58, in _read
Engine.subscribe(_update_engine)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\config\pubsub.py", line 213, in subscribe
callback(cls)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\__init__.py", line 127, in _update_engine
initialize_ray()
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\core\execution\ray\common\utils.py", line 185, in initialize_ray
ray.init(**ray_init_kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\worker.py", line 922, in init
ray_params=ray_params)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 243, in __init__
"The current node has not been updated within 30 "
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup.
虽然我已经清楚地重新运行代码,但它们之间的时间超过 30 秒。
当我在安装 modin 和 ray 后第一次 运行 它时,它 运行 相当好,只有以下警告:
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
然后我修改代码为:
import os
os.environ["MODIN_ENGINE"] = "ray"
import modin.pandas as pd
import ray
ray.init()
df = pd.read_csv(r"C:\Users\Harshad\Documents\Files\Data\Pre-processed\data.csv", low_memory=False)
我收到这个错误:
Traceback (most recent call last):
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 240, in __init__
self.redis_password)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\services.py", line 328, in wait_for_node
raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Harshad/Documents/Code/data.py", line 18, in <module>
low_memory=False)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 58, in _read
Engine.subscribe(_update_engine)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\config\pubsub.py", line 213, in subscribe
callback(cls)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\__init__.py", line 127, in _update_engine
initialize_ray()
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\core\execution\ray\common\utils.py", line 185, in initialize_ray
ray.init(**ray_init_kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\worker.py", line 922, in init
ray_params=ray_params)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 243, in __init__
"The current node has not been updated within 30 "
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup
当我查看 Github for this issue 时,发现它是一个错误
如何解决这些警告和错误?
编辑:我重新启动了我的 pycharm 环境,它允许一个 re运行 循环。这表明这是一个 Pycharm/environment 问题?
我该如何解决这个问题?
在导入之前尝试 init
ing ray
modin
:
import os
os.environ["MODIN_ENGINE"] = "ray"
import ray
ray.init()
import modin.pandas as pd
我正在尝试使用 modin
加速我的 pandas 数据处理import os
os.environ["MODIN_ENGINE"] = "ray"
import modin.pandas as pd
df = pd.read_csv(r"C:\Users\Harshad\Documents\Files\Data\Pre-processed\data.csv", low_memory=False)
我收到以下警告和错误:
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
Traceback (most recent call last):
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 240, in __init__
self.redis_password)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\services.py", line 328, in wait_for_node
raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Harshad/Documents/Code/data.py", line 18, in <module>
low_memory=False)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 58, in _read
Engine.subscribe(_update_engine)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\config\pubsub.py", line 213, in subscribe
callback(cls)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\__init__.py", line 127, in _update_engine
initialize_ray()
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\core\execution\ray\common\utils.py", line 185, in initialize_ray
ray.init(**ray_init_kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\worker.py", line 922, in init
ray_params=ray_params)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 243, in __init__
"The current node has not been updated within 30 "
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup.
虽然我已经清楚地重新运行代码,但它们之间的时间超过 30 秒。
当我在安装 modin 和 ray 后第一次 运行 它时,它 运行 相当好,只有以下警告:
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
然后我修改代码为:
import os
os.environ["MODIN_ENGINE"] = "ray"
import modin.pandas as pd
import ray
ray.init()
df = pd.read_csv(r"C:\Users\Harshad\Documents\Files\Data\Pre-processed\data.csv", low_memory=False)
我收到这个错误:
Traceback (most recent call last):
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 240, in __init__
self.redis_password)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\services.py", line 328, in wait_for_node
raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Harshad/Documents/Code/data.py", line 18, in <module>
low_memory=False)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\io.py", line 58, in _read
Engine.subscribe(_update_engine)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\config\pubsub.py", line 213, in subscribe
callback(cls)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\pandas\__init__.py", line 127, in _update_engine
initialize_ray()
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\modin\core\execution\ray\common\utils.py", line 185, in initialize_ray
ray.init(**ray_init_kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\worker.py", line 922, in init
ray_params=ray_params)
File "C:\Users\Harshad\Documents\pythonProject\venv\lib\site-packages\ray\node.py", line 243, in __init__
"The current node has not been updated within 30 "
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup
当我查看 Github for this issue 时,发现它是一个错误
如何解决这些警告和错误?
编辑:我重新启动了我的 pycharm 环境,它允许一个 re运行 循环。这表明这是一个 Pycharm/environment 问题?
我该如何解决这个问题?
在导入之前尝试 init
ing ray
modin
:
import os
os.environ["MODIN_ENGINE"] = "ray"
import ray
ray.init()
import modin.pandas as pd