High-Performance Application Service HAI

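I. Environment Overview

This environment comes with DeepSeek-v3 (0324 release) preinstalled and supports function calling. It runs only on the "Eight-GPU Flagship" (八卡旗舰型) compute plan of High-Performance Application Service HAI. That plan is available by allowlist only: to use it, submit a ticket to request review and approval.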

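II. Usage Instructions

This is a base image environment that contains two containers:

1. DeepSeek-v3 0324 model container

2. AnythingLLM project container

Both containers start automatically once the instance is created. Because the DeepSeek-v3 model is large, the first load takes about 30 minutes, and the model can be used only after loading completes.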

2.1 DeepSeek-v3 Model Container Guide

1. Enter the DeepSeek container

sudo docker exec -it deepseek-v3 bash

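2. Check the model loading progress. The first load takes about 30 minutes. After entering the container, run the following command to follow the progress.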

tail -f /cfs/ds3_infer.log
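
Readiness can also be polled from a script instead of watching the log by hand. The following is a minimal sketch, assuming the server answers GET /v1/models on port 6399 once the model has loaded (an assumption based on its OpenAI-compatible API, not something this guide states). The helper names `probe_models_endpoint` and `wait_until_ready` are hypothetical.

```python
import time
import urllib.request


def probe_models_endpoint(base_url="http://127.0.0.1:6399/v1", timeout=5):
    """Return True if GET {base_url}/models answers with HTTP 200.

    Assumption: the server exposes the OpenAI-style /v1/models route
    once the model has finished loading.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP errors:
        # the server is treated as not ready yet.
        return False


def wait_until_ready(probe=probe_models_endpoint, max_wait=3600, interval=30,
                     sleep=time.sleep):
    """Call probe() every `interval` seconds until it succeeds or max_wait elapses."""
    waited = 0
    while True:
        if probe():
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval
```

Calling `wait_until_ready()` on the instance blocks until the endpoint responds or an hour has passed. The default of 3600 seconds is chosen only to leave headroom over the roughly 30-minute first load.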

3. Run the following command to check GPU memory usage.

nvidia-smi
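
If the memory figures are needed in a script, nvidia-smi can print them as CSV through its `--query-gpu` and `--format` options. The sketch below is one possible wrapper, not part of the image; the function names `parse_gpu_memory` and `query_gpu_memory` are hypothetical.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs, one GPU per line, into a list of tuples."""
    usage = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        usage.append((used, total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi and return [(used_mib, total_mib), ...] for every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

On the instance, `query_gpu_memory()` should return one tuple per GPU, so eight entries are expected on the eight-GPU plan.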

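2.2 API Call Format

DeepSeek v3 is deployed with the SGLang framework and is compatible with the OpenAI API format. Once the model has finished loading, you can test it through the API. Note: replace 127.0.0.1 with your instance's public IP.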

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ''" \
  -d '{
    "model": "/cfs",
    "messages": [
      {"role": "user", "content": "Why is the sky blue?"}
    ],
    "temperature": 0.7
}' \
  "http://127.0.0.1:6399/v1/chat/completions"
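
The same request can be sent from Python using only the standard library. This is a sketch under the assumption that the endpoint needs no API key (the function-calling curl example in this guide sends none). `build_chat_request` and `send_chat_request` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(prompt, host="127.0.0.1", port=6399, model="/cfs",
                       temperature=0.7):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `send_chat_request(build_chat_request("Why is the sky blue?", host="<public-ip>"))` would return the reply text, with `<public-ip>` replaced by the instance's public IP.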

2.3 Function Calling

curl "http://127.0.0.1:6399/v1/chat/completions" -H "Content-Type: application/json" -d '{
    "temperature": 0,
    "max_tokens": 100,
    "model": "/cfs",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "query_weather",
                "description": "Get weather of a city, the user should supply a city first",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "The city, e.g. Beijing"
                        }
                    },
                    "required": [
                        "city"
                    ]
                }
            }
        }
    ],
    "messages": [
        {
            "role": "user",
            "content": "Hows the weather like in Qingdao today"
        }
    ]
}'

The output looks like this:

{"id":"cb97767a4b0d4333a9f0cf6c5758d3d2","object":"chat.completion","created":1745570792,"model":"/cfs","choices":[{"index":0,"message":{"role":"assistant","content":null,"reasoning_content":null,"tool_calls":[{"id":"0","type":"function","function":{"name":"query_weather","arguments":"{\"city\": \"Qingdao\"}"}}]},"logprobs":null,"finish_reason":"tool_calls","matched_stop":null}],"usage":{"prompt_tokens":123,"total_tokens":145,"completion_tokens":22,"prompt_tokens_details":null}}
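
Note that `arguments` in this response is a JSON-encoded string nested inside the JSON body, so it has to be decoded a second time before use. The sketch below shows one way to pull the tool calls out of such a response; `extract_tool_calls` is a hypothetical helper name, and `sample` is a trimmed copy of the response shown above.

```python
import json


def extract_tool_calls(response_text):
    """Return [(function_name, arguments_dict), ...] from a chat completion body."""
    response = json.loads(response_text)
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # "arguments" is itself a JSON-encoded string and needs a second decode.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Trimmed copy of the response shown above.
sample = (
    '{"choices":[{"index":0,"message":{"role":"assistant","content":null,'
    '"tool_calls":[{"id":"0","type":"function","function":{"name":"query_weather",'
    '"arguments":"{\\"city\\": \\"Qingdao\\"}"}}]},"finish_reason":"tool_calls"}]}'
)
print(extract_tool_calls(sample))  # prints [('query_weather', {'city': 'Qingdao'})]
```

The `or []` guard makes the function return an empty list when the model answers in plain text and `tool_calls` is missing or null.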

Python example:

import json
import random
from openai import OpenAI

def get_weather(location):
    # Simulate the weather for an arbitrary location
    weather_conditions = ["Sunny", "Cloudy", "Rainy", "Snowy"]
    temperature = random.randint(10, 30)  # random temperature between 10 and 30 °C

    # Return the weather information as a plain string
    weather_info = f"The weather in {location} is {random.choice(weather_conditions)} with a temperature of {temperature}°C."
    return weather_info

def send_messages(messages):
    response = client.chat.completions.create(
        model="/cfs",
        messages=messages,
        tools=tools
    )
    return response.choices[0].message

client = OpenAI(
    api_key="your_api_key",
    base_url="http://127.0.0.1:6399/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather of a location, the user should supply a location first",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    }
                },
                "required": ["location"]
            },
        }
    },
]

messages = [{"role": "user", "content": "How's the weather in Hangzhou?"}]
message = send_messages(messages)
print(f"User>\t {messages[0]['content']}")

tool = message.tool_calls[0]
tools_map = {
    "get_weather": get_weather
}
result = tools_map[tool.function.name](**json.loads(tool.function.arguments))

# Add the assistant message that carries tool_calls to the history,
# then the tool result that answers it
messages.append(message)
messages.append({"role": "tool", "tool_call_id": tool.id, "content": result})
message = send_messages(messages)
print(f"Model>\t {message.content}")
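
The script above assumes the model always returns exactly one tool call for a known function. A more defensive dispatch step might look like the sketch below, which handles zero, several, or unknown tool calls. `run_tool_calls` is a hypothetical helper, and `fake_message` is a hand-built stand-in that only mimics the attribute layout used above (`tool_calls`, `function.name`, `function.arguments`, `id`).

```python
import json
from types import SimpleNamespace


def run_tool_calls(message, tools_map):
    """Execute every tool call on `message` and return the matching tool messages."""
    tool_messages = []
    for call in message.tool_calls or []:
        handler = tools_map.get(call.function.name)
        if handler is None:
            content = f"Unknown tool: {call.function.name}"
        else:
            try:
                content = str(handler(**json.loads(call.function.arguments)))
            except (json.JSONDecodeError, TypeError) as exc:
                # Malformed JSON, or arguments that do not match the signature.
                content = f"Invalid arguments: {exc}"
        tool_messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": content}
        )
    return tool_messages


# Stand-in for an SDK message object, built by hand for illustration.
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(
        id="0",
        function=SimpleNamespace(name="get_weather",
                                 arguments='{"location": "Hangzhou"}'),
    )
])
replies = run_tool_calls(
    fake_message, {"get_weather": lambda location: f"Sunny in {location}"}
)
print(replies)
```

In the script, the returned list could be added to the history with `messages.extend(run_tool_calls(message, tools_map))` after the assistant message itself has been appended.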

2.4 AnythingLLM Guide

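AnythingLLM provides a visual interface for interacting with the model, so you can use it to quickly try out the model.

After the instance is created, AnythingLLM starts automatically. Connect to it at port 6889 of the instance's public IP (public IP:6889).

After startup, complete a short initial configuration first.

For LLM providers, select Local AI.
Set Local AI Base URL to the instance's public IP followed by :6399/v1. After the change, the 671B model is selected automatically; the chat model selection field showing "/cfs" is the expected result.
Configure the remaining options as needed. Note: if you set a password, save it somewhere safe, because the password reset process is rather complicated.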
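
If the chat model selection field stays empty, it can help to check what the model server itself reports. The sketch below assumes the server implements the OpenAI-style GET /v1/models route and that AnythingLLM fills the field from it; neither is confirmed by this guide. `model_ids` and `fetch_model_ids` are hypothetical helper names.

```python
import json
import urllib.request


def model_ids(models_payload):
    """Extract model ids from an OpenAI-style model list payload."""
    return [entry["id"] for entry in models_payload.get("data", [])]


def fetch_model_ids(base_url, timeout=10):
    """GET {base_url}/models and return the ids the server reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

With the Base URL configured above, `"/cfs" in fetch_model_ids("http://<public-ip>:6399/v1")` would be expected to be true once the model has loaded.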

