前提条件:服务器已安装docker
1.下载镜像: 1.0.0-300I-Duo-py311-openeuler24.03-lts
备注:官网镜像下载,需要申请,审批还得1,2天,这时你肯定想骂HW!没事,我已为您准备好了:请发私信!
申请地址: https://www.hiascend.com/developer/ascendhub/detail/af85b724a7e5469ebd7ea13c3439d48f
2.下载模型:魔乐社区(https://modelers.cn/models/Models_Ecosystem/QwQ-32B)
服务器上安装社区下载的比较快:
pip install modelscope
modelscope download "Qwen/QwQ-32B" --local_dir "/home/models/qwq"
注意事项:模型上传到服务器需要给于模型下config.json权限
chmod 750 config.json
3.docker 启动
注意映射的模型文件到服务器中:
docker run -it -d --net=host --shm-size=50g --privileged --name qwq-i --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm -v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro -v /usr/local/sbin:/usr/local/sbin:ro -v /home/models/qwq:/home/models/qwq:rw swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-300I-Duo-py311-openeuler24.03-lts
4.进入docker容器中(以下的操作全部是在docker容器中)
编辑配置文件:
注意点:
ipAddress: 本地服务器IP
httpsEnabled : false, 关闭https
modelName:模型名称
modelWeightPath:模型路径(容器内的)
npuDeviceIds:显卡ID (根据自己情况,npu-smi info 查看)
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{
"Version" : "1.1.0",
"LogConfig" :
{
"logLevel" : "Info",
"logFileSize" : 20,
"logFileNum" : 20,
"logPath" : "logs/mindservice.log"
},
"ServerConfig" :
{
"ipAddress" : "192.168.0.203",
"managementIpAddress" : "127.0.0.2",
"port" : 1025,
"managementPort" : 1026,
"metricsPort" : 1027,
"allowAllZeroIpListening" : false,
"maxLinkNum" : 1000,
"httpsEnabled" : false,
"fullTextEnabled" : false,
"tlsCaPath" : "security/ca/",
"tlsCaFile" : ["ca.pem"],
"tlsCert" : "security/certs/server.pem",
"tlsPk" : "security/keys/server.key.pem",
"tlsPkPwd" : "security/pass/key_pwd.txt",
"tlsCrlPath" : "security/certs/",
"tlsCrlFiles" : ["server_crl.pem"],
"managementTlsCaFile" : ["management_ca.pem"],
"managementTlsCert" : "security/certs/management/server.pem",
"managementTlsPk" : "security/keys/management/server.key.pem",
"managementTlsPkPwd" : "security/pass/management/key_pwd.txt",
"managementTlsCrlPath" : "security/management/certs/",
"managementTlsCrlFiles" : ["server_crl.pem"],
"kmcKsfMaster" : "tools/pmt/master/ksfa",
"kmcKsfStandby" : "tools/pmt/standby/ksfb",
"inferMode" : "standard",
"interCommTLSEnabled" : true,
"interCommPort" : 1121,
"interCommTlsCaPath" : "security/grpc/ca/",
"interCommTlsCaFiles" : ["ca.pem"],
"interCommTlsCert" : "security/grpc/certs/server.pem",
"interCommPk" : "security/grpc/keys/server.key.pem",
"interCommPkPwd" : "security/grpc/pass/key_pwd.txt",
"interCommTlsCrlPath" : "security/grpc/certs/",
"interCommTlsCrlFiles" : ["server_crl.pem"],
"openAiSupport" : "vllm"
},
"BackendConfig" : {
"backendName" : "mindieservice_llm_engine",
"modelInstanceNumber" : 1,
"npuDeviceIds" : [[0,1,2,3]],
"tokenizerProcessNumber" : 8,
"multiNodesInferEnabled" : false,
"multiNodesInferPort" : 1120,
"interNodeTLSEnabled" : true,
"interNodeTlsCaPath" : "security/grpc/ca/",
"interNodeTlsCaFiles" : ["ca.pem"],
"interNodeTlsCert" : "security/grpc/certs/server.pem",
"interNodeTlsPk" : "security/grpc/keys/server.key.pem",
"interNodeTlsPkPwd" : "security/grpc/pass/mindie_server_key_pwd.txt",
"interNodeTlsCrlPath" : "security/grpc/certs/",
"interNodeTlsCrlFiles" : ["server_crl.pem"],
"interNodeKmcKsfMaster" : "tools/pmt/master/ksfa",
"interNodeKmcKsfStandby" : "tools/pmt/standby/ksfb",
"ModelDeployConfig" :
{
"maxSeqLen" : 32580,
"maxInputTokenLen" : 30000,
"truncation" : false,
"ModelConfig" : [
{
"modelInstanceType" : "Standard",
"modelName" : "qwen",
"modelWeightPath" : "/home/models/qwq",
"worldSize" : 4,
"cpuMemSize" : 5,
"npuMemSize" : -1,
"backendType" : "atb",
"trustRemoteCode" : false
}
]
},
"ScheduleConfig" :
{
"templateType" : "Standard",
"templateName" : "Standard_LLM",
"cacheBlockSize" : 128,
"maxPrefillBatchSize" : 50,
"maxPrefillTokens" : 30000,
"prefillTimeMsPerReq" : 150,
"prefillPolicyType" : 0,
"decodeTimeMsPerReq" : 50,
"decodePolicyType" : 0,
"maxBatchSize" : 200,
"maxIterTimes" : 4096,
"maxPreemptCount" : 0,
"supportSelectBatch" : false,
"maxQueueDelayMicroseconds" : 5000
}
}
}
启动
cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon
看到如下界面就启动成功了!
测试:如果防火墙没关,请放开1025端口!
sudo firewall-cmd --permanent --add-port=1025/tcp
sudo firewall-cmd --reload
接口地址:
post:http://192.168.0.202:1025/v1/chat/completions
{
"model": "qwen",
"messages": [{
"role": "user", "content": "你是谁"}],
"max_tokens": 32768,
"stream": false
}
显卡使用情况:达到88%
deepseek:
310P 芯片仅支持FP16精度,并不支持BF16或INT8等数据类型,因此需要到模型权重文件中修改config.json:
和上述的操作一致:只需要将下载的模型的config.json中的 dtype改为:float16后保存