Fixing vLLM "out of memory" errors

Lowering the --gpu-memory-utilization ratio (default 0.9), which controls the fraction of GPU memory vLLM is allowed to claim for weights, activations, and the KV cache, can avoid this error, for example when other processes already occupy part of the GPU:

from vllm import LLM

# args.model_name_or_path and num_gpus come from the surrounding script
model = LLM(
    args.model_name_or_path,
    trust_remote_code=True,
    tensor_parallel_size=num_gpus,
    max_model_len=2048,
    gpu_memory_utilization=0.8,  # lowered from the default 0.9
)
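When running vLLM's OpenAI-compatible server instead of the Python API, the same knob is exposed as a command-line flag. A sketch, assuming a recent vLLM with the `vllm serve` entrypoint; the model name here is only a placeholder:

```shell
# Cap vLLM at 80% of GPU memory and limit context length to reduce KV-cache size
vllm serve Qwen/Qwen2-7B-Instruct \
    --max-model-len 2048 \
    --gpu-memory-utilization 0.8
```

Reducing --max-model-len also helps, since a shorter maximum context shrinks the KV cache that must fit in the reserved memory.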