Installing the QwQ-32B model and the DeepSeek distilled models on a Huawei Ascend server (Ascend 300I Pro, 310P chip, 310P3)

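Prerequisite: Docker is already installed on the server.
1. Download the image: 1.0.0-300I-Duo-py311-openeuler24.03-lts
Note: downloading the image from the official site requires an application, and approval takes one or two days, which is frustrating. I have a copy ready: send me a private message.
Application URL: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f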
2. Download the model: Modelers community https://modelers.cn/models/Models_Ecosystem/QwQ-32B

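Installing the community download tool on the server and downloading there directly is faster (the commands below use the ModelScope CLI):

pip install modelscope

modelscope download "Qwen/QwQ-32B" --local_dir "/home/models/qwq"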

Note: once the model is on the server, set the permissions on config.json in the model directory:

chmod 750 config.json

3. Start the Docker container

Make sure the model directory on the server is mounted into the container:

docker run -it -d --net=host --shm-size=50g \
    --privileged \
    --name qwq-i \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
    -v /usr/local/sbin:/usr/local/sbin:ro \
    -v /home/models/qwq:/home/models/qwq:rw \
    swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-300I-Duo-py311-openeuler24.03-lts

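4. Enter the Docker container, for example with docker exec -it qwq-i bash (all of the following steps are performed inside the container)

Edit the configuration file.

Key settings:
ipAddress: the local server's IP address
httpsEnabled: false, which disables HTTPS
modelName: the model name
modelWeightPath: the model path (inside the container)
npuDeviceIds: the NPU device IDs (these depend on your hardware; check with npu-smi info)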
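
The five settings above can also be applied with a short script instead of editing the file by hand. The following is a minimal sketch, assuming the key layout of the config.json shown in this post (ServerConfig, BackendConfig, and the first entry of ModelDeployConfig.ModelConfig). The helper names apply_settings and update_config_file are hypothetical, and setting worldSize to the number of listed NPUs is an assumption based on the values used here.

```python
import json
from pathlib import Path

# Path taken from this post; adjust it if your MindIE install differs.
CONFIG_PATH = Path("/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json")


def apply_settings(cfg, ip, model_name, weight_path, device_ids):
    """Fill the five settings from this post into cfg (in place) and return it."""
    server = cfg["ServerConfig"]
    server["ipAddress"] = ip
    server["httpsEnabled"] = False  # plain HTTP, as in this post

    backend = cfg["BackendConfig"]
    backend["npuDeviceIds"] = [list(device_ids)]  # one device group for one instance

    model = backend["ModelDeployConfig"]["ModelConfig"][0]
    model["modelName"] = model_name
    model["modelWeightPath"] = weight_path  # path as seen inside the container
    # Assumption: worldSize should match the number of NPUs in the group.
    model["worldSize"] = len(device_ids)
    return cfg


def update_config_file(path=CONFIG_PATH, **settings):
    """Read the JSON file, apply the settings, and write it back."""
    cfg = json.loads(path.read_text(encoding="utf-8"))
    apply_settings(cfg, **settings)
    path.write_text(json.dumps(cfg, indent=4), encoding="utf-8")
```

Inside the container this could be called as update_config_file(ip="192.168.0.203", model_name="qwen", weight_path="/home/models/qwq", device_ids=[0, 1, 2, 3]), which mirrors the values used in this post.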

vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{
    "Version" : "1.1.0",
    "LogConfig" :
    {
        "logLevel" : "Info",
        "logFileSize" : 20,
        "logFileNum" : 20,
        "logPath" : "logs/mindservice.log"
    },

    "ServerConfig" :
    {
        "ipAddress" : "192.168.0.203",
        "managementIpAddress" : "127.0.0.2",
        "port" : 1025,
        "managementPort" : 1026,
        "metricsPort" : 1027,
        "allowAllZeroIpListening" : false,
        "maxLinkNum" : 1000,
        "httpsEnabled" : false,
        "fullTextEnabled" : false,
        "tlsCaPath" : "security/ca/",
        "tlsCaFile" : ["ca.pem"],
        "tlsCert" : "security/certs/server.pem",
        "tlsPk" : "security/keys/server.key.pem",
        "tlsPkPwd" : "security/pass/key_pwd.txt",
        "tlsCrlPath" : "security/certs/",
        "tlsCrlFiles" : ["server_crl.pem"],
        "managementTlsCaFile" : ["management_ca.pem"],
        "managementTlsCert" : "security/certs/management/server.pem",
        "managementTlsPk" : "security/keys/management/server.key.pem",
        "managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath" : "security/management/certs/",
        "managementTlsCrlFiles" : ["server_crl.pem"],
        "kmcKsfMaster" : "tools/pmt/master/ksfa",
        "kmcKsfStandby" : "tools/pmt/standby/ksfb",
        "inferMode" : "standard",
        "interCommTLSEnabled" : true,
        "interCommPort" : 1121,
        "interCommTlsCaPath" : "security/grpc/ca/",
        "interCommTlsCaFiles" : ["ca.pem"],
        "interCommTlsCert" : "security/grpc/certs/server.pem",
        "interCommPk" : "security/grpc/keys/server.key.pem",
        "interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath" : "security/grpc/certs/",
        "interCommTlsCrlFiles" : ["server_crl.pem"],
        "openAiSupport" : "vllm"
    },

    "BackendConfig" : {
        "backendName" : "mindieservice_llm_engine",
        "modelInstanceNumber" : 1,
        "npuDeviceIds" : [[0,1,2,3]],
        "tokenizerProcessNumber" : 8,
        "multiNodesInferEnabled" : false,
        "multiNodesInferPort" : 1120,
        "interNodeTLSEnabled" : true,
        "interNodeTlsCaPath" : "security/grpc/ca/",
        "interNodeTlsCaFiles" : ["ca.pem"],
        "interNodeTlsCert" : "security/grpc/certs/server.pem",
        "interNodeTlsPk" : "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath" : "security/grpc/certs/",
        "interNodeTlsCrlFiles" : ["server_crl.pem"],
        "interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
        "ModelDeployConfig" :
        {
            "maxSeqLen" : 32580,
            "maxInputTokenLen" : 30000,
            "truncation" : false,
            "ModelConfig" : [
                {
                    "modelInstanceType" : "Standard",
                    "modelName" : "qwen",
                    "modelWeightPath" : "/home/models/qwq",
                    "worldSize" : 4,
                    "cpuMemSize" : 5,
                    "npuMemSize" : -1,
                    "backendType" : "atb",
                    "trustRemoteCode" : false
                }
            ]
        },

        "ScheduleConfig" :
            {
            "templateType" : "Standard",
            "templateName" : "Standard_LLM",
            "cacheBlockSize" : 128,

            "maxPrefillBatchSize" : 50,
            "maxPrefillTokens" : 30000,
            "prefillTimeMsPerReq" : 150,
            "prefillPolicyType" : 0,

            "decodeTimeMsPerReq" : 50,
            "decodePolicyType" : 0,

            "maxBatchSize" : 200,
            "maxIterTimes" : 4096,
            "maxPreemptCount" : 0,
            "supportSelectBatch" : false,
            "maxQueueDelayMicroseconds" : 5000
        }
    }
}
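
Before starting the service, it can help to sanity-check how a few of these values relate to each other. The sketch below is illustrative only: the function name check_config is hypothetical, and the three rules it checks (worldSize equals the number of NPUs in each device group, maxInputTokenLen is smaller than maxSeqLen, maxPrefillTokens is at least maxInputTokenLen) are assumptions drawn from the values in this configuration, not rules stated in this post.

```python
import json


def check_config(cfg):
    """Return a list of problems found in a MindIE-style config dict."""
    problems = []
    backend = cfg["BackendConfig"]
    deploy = backend["ModelDeployConfig"]
    schedule = backend["ScheduleConfig"]
    model = deploy["ModelConfig"][0]

    # Assumed rule 1: every device group has exactly worldSize NPUs.
    for group in backend["npuDeviceIds"]:
        if len(group) != model["worldSize"]:
            problems.append(
                f"worldSize is {model['worldSize']} but device group {group} "
                f"has {len(group)} NPUs"
            )

    # Assumed rule 2: the input limit leaves room for generated tokens.
    if deploy["maxInputTokenLen"] >= deploy["maxSeqLen"]:
        problems.append("maxInputTokenLen should be smaller than maxSeqLen")

    # Assumed rule 3: a single prefill can hold the longest allowed input.
    if schedule["maxPrefillTokens"] < deploy["maxInputTokenLen"]:
        problems.append("maxPrefillTokens should be at least maxInputTokenLen")

    return problems


def check_config_file(path):
    """Load a config file and run the checks above on it."""
    with open(path, encoding="utf-8") as fh:
        return check_config(json.load(fh))
```

An empty list means none of the assumed rules were violated; it does not guarantee that the service will accept the file.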

Start the service:

cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon

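The service has started successfully once the console shows the startup success output (the original screenshot is not reproduced here).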

Testing: if the firewall is enabled, open port 1025.

sudo firewall-cmd --permanent --add-port=1025/tcp

sudo firewall-cmd --reload

Endpoint (use the ipAddress set in config.json):
POST http://192.168.0.203:1025/v1/chat/completions

{
    "model": "qwen",
    "messages": [{"role": "user", "content": "你是谁"}],
    "max_tokens": 32768,
    "stream": false
}
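
The same request can be sent from Python using only the standard library. This is a minimal sketch: the helper names build_chat_request and ask are hypothetical, and reading the reply from choices[0].message.content assumes the response follows the OpenAI chat-completions shape, which this post does not show.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, max_tokens=32768):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,  # must match modelName in config.json
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, prompt, timeout=600):
    """Send the request and return the generated text."""
    req = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response layout.
    return body["choices"][0]["message"]["content"]
```

For example, print(ask("http://192.168.0.203:1025", "qwen", "你是谁")), where the address is the ipAddress and port from config.json. The long default timeout is a guess to allow for slow generation of long answers.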

NPU utilization reached 88%.
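DeepSeek:
The 310P chip does not support the BF16 data type, so the model must run in FP16. This requires modifying config.json in the model weights directory.
Everything else is the same as the steps above: change the dtype field (torch_dtype in Hugging Face-style configs) in the downloaded model's config.json to float16 and save.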
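
The dtype change can be scripted so it is easy to repeat for each downloaded model. This is a minimal sketch, assuming the field is named torch_dtype as in Hugging Face-style config.json files; the helper name force_float16 is hypothetical.

```python
import json
from pathlib import Path


def force_float16(model_dir):
    """Set the dtype in model_dir/config.json to float16; return the old value."""
    config_path = Path(model_dir) / "config.json"
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumption: the key is "torch_dtype"; check your config.json first.
    previous = cfg.get("torch_dtype")
    cfg["torch_dtype"] = "float16"
    config_path.write_text(
        json.dumps(cfg, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return previous
```

For example, force_float16("/home/models/qwq") rewrites the file in place and returns the previous value, so it is worth keeping a backup copy of config.json before running it.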

Reposted from blog.csdn.net/ssp584731180/article/details/146158459