vLLM on Ascend 910B fails with: cannot import name 'log' from 'torch.distributed.elastic
Running vLLM on an Ascend 910B fails with: ImportError: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api'
The full error output is as follows:
/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/latest owner does not match the current owner.
warnings.warn(f"Warning: The {path} owner does not match the current owner.")
/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch_npu/utils/collect_env.py:58: UserWarning: Warning: The /usr/local/Ascend/ascend-toolkit/8.0.0/x86_64-linux/ascend_toolkit_install.info owner does not match the current owner.
warnings.warn(f"Warning: The {path} owner does not match the current owner.")
INFO 04-24 11:04:32 __init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 04-24 11:04:32 __init__.py:32] name=ascend, value=vllm_ascend:register
INFO 04-24 11:04:32 __init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 04-24 11:04:32 __init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-24 11:04:32 __init__.py:44] plugin ascend loaded.
INFO 04-24 11:04:32 __init__.py:198] Platform plugin ascend is activated
INFO 04-24 11:04:32 __init__.py:30] Available plugins for group vllm.general_plugins:
INFO 04-24 11:04:32 __init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 04-24 11:04:32 __init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 04-24 11:04:32 __init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 04-24 11:04:32 __init__.py:44] plugin ascend_enhanced_model loaded.
WARNING 04-24 11:04:32 _custom_ops.py:21] Failed to import from vllm._C with ImportError('libcudart.so.12: cannot open shared object file: No such file or directory')
INFO 04-24 11:04:33 importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
[2025-04-24 11:04:33,706] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to npu (auto detect)
Traceback (most recent call last):
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1967, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/transformers/modeling_utils.py", line 158, in <module>
import deepspeed
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/__init__.py", line 22, in <module>
from . import module_inject
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/module_inject/__init__.py", line 6, in <module>
from .replace_module import replace_transformer_layer, revert_transformer_layer, ReplaceWithTensorSlicing, GroupQuantizer, generic_injection
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 607, in <module>
from ..pipe import PipelineModule
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/pipe/__init__.py", line 6, in <module>
from ..runtime.pipe import PipelineModule, LayerSpec, TiedLayerSpec
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/runtime/pipe/__init__.py", line 6, in <module>
from .module import PipelineModule, LayerSpec, TiedLayerSpec
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 19, in <module>
from ..activation_checkpointing import checkpointing
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 26, in <module>
from deepspeed.runtime.config import DeepSpeedConfig
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/runtime/config.py", line 41, in <module>
from ..elasticity import (
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/elasticity/__init__.py", line 10, in <module>
from .elastic_agent import DSElasticAgent
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/deepspeed/elasticity/elastic_agent.py", line 9, in <module>
from torch.distributed.elastic.agent.server.api import log, _get_socket_with_port
ImportError: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "./ascend/example_vllm.py", line 13, in <module>
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/utils.py", line 1022, in inner
return fn(*args, **kwargs)
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 212, in __init__
engine_args = EngineArgs(
File "<string>", line 107, in __init__
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 235, in __post_init__
load_general_plugins()
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/plugins/__init__.py", line 82, in load_general_plugins
func()
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm_ascend/__init__.py", line 28, in register_model
register_model()
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm_ascend/models/__init__.py", line 5, in register_model
from .qwen2_vl import CustomQwen2VLForConditionalGeneration # noqa: F401
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm_ascend/models/qwen2_vl.py", line 32, in <module>
from vllm.model_executor.models.qwen2_vl import (
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_vl.py", line 55, in <module>
from vllm.model_executor.model_loader.weight_utils import default_weight_loader
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/model_executor/model_loader/__init__.py", line 6, in <module>
from vllm.model_executor.model_loader.loader import (BaseModelLoader,
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/model_executor/model_loader/loader.py", line 45, in <module>
from vllm.model_executor.model_loader.utils import (ParamMapping,
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 35, in <module>
module: Optional[transformers.PreTrainedModel] = None) -> bool:
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1955, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1969, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'log' from 'torch.distributed.elastic.agent.server.api' (/data/miniconda3/envs/ascend-3.10.14/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py)
[ERROR] 2025-04-24-11:04:34 (PID:2035, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
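The failing statement in the traceback can be checked in isolation, without loading vLLM or transformers. The following is a minimal sketch: the helper name can_import_name is hypothetical (it is not part of torch or deepspeed), and it only reports whether a module imports and exposes a given attribute.

```python
import importlib


def can_import_name(module_name: str, attr_name: str) -> bool:
    """Return True if the module imports and exposes attr_name as an attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails to import.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Module and names taken from the failing import line in the traceback.
    target = "torch.distributed.elastic.agent.server.api"
    for name in ("log", "_get_socket_with_port"):
        print(f"{target}.{name}: {can_import_name(target, name)}")
```

If the check prints False for log in the affected environment, the installed torch does not provide the name that the installed deepspeed expects, which matches the ImportError above.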
When the problem occurred, my deepspeed version was: deepspeed==0.13.1
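To see which versions are involved before changing anything, the installed packages can be listed from Python. This is a sketch only: the helper installed_version is a hypothetical name, and the distribution names in the loop are assumptions based on the package directories visible in the traceback.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution names, derived from the paths in the traceback.
    for pkg in ("torch", "torch_npu", "deepspeed", "transformers", "vllm", "vllm_ascend"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```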
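There are two ways to fix this:
- Uninstall deepspeed. The traceback shows that the failing import is triggered by the "import deepspeed" statement in transformers/modeling_utils.py; once the package is removed, vLLM no longer goes through this backend and the problem is resolved.
pip uninstall deepspeed
Example run time: real 0m35.842s user 0m51.076s sys 0m7.987s
- Or upgrade deepspeed. This problem is also reported on the official site, where the recommended fix is to upgrade. I upgraded to 0.16.7.
pip install deepspeed==0.16.7
Example run time: real 0m35.958s user 0m48.466s sys 0m7.486s
Since my case is only a simple inference example, it makes little difference which fix is used.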
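The real/user/sys figures have the format printed by the shell's time command. As a rough sketch of how a comparable measurement could be taken from Python (the function time_command is a hypothetical helper, and the command in the example is a placeholder, not the author's script), one could write:

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, sys) seconds, similar to the shell's time."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    # children_user / children_system accumulate CPU time of waited-for child processes.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


if __name__ == "__main__":
    # Placeholder command; replace with e.g. [sys.executable, "./ascend/example_vllm.py"].
    real, user, system = time_command([sys.executable, "-c", "pass"])
    print(f"real {real:.3f}s user {user:.3f}s sys {system:.3f}s")
```

The child CPU times come from os.times(), which reports them only on Unix-like systems; on other platforms those fields are zero.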