LlamaGPT - Self-hosted ChatGPT-like chatbot based on Llama 2

LlamaGPT is a self-hosted, offline, ChatGPT-like chatbot powered by Llama 2. 100% private, no data leaves your device.



1. How to install LlamaGPT

LlamaGPT can be installed on any x86 or arm64 system.

First, make sure you have Docker installed. Then clone the repository and change into its directory:

git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt

Now you can run LlamaGPT with any of the following models, depending on your hardware:

Model size  Model used                            Minimum RAM required  How to start LlamaGPT
7b          Nous Hermes Llama 2 7B (GGML q4_0)    8GB                   docker compose up -d
13b         Nous Hermes Llama 2 13B (GGML q4_0)   16GB                  docker compose -f docker-compose-13b.yml up -d
70b         Meta Llama 2 70B Chat (GGML q4_0)     48GB                  docker compose -f docker-compose-70b.yml up -d

LlamaGPT can then be accessed at http://localhost:3000.
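Besides the web UI, LlamaGPT also exposes an OpenAI-compatible API. The sketch below shows how you might call it from Python; the port (3001) and endpoint path are assumptions based on LlamaGPT's default setup, so check your docker-compose file if they differ:

```python
import json
import urllib.request

# Assumed endpoint: LlamaGPT's OpenAI-compatible API is typically served on
# port 3001. Verify against your own docker-compose configuration.
API_URL = "http://localhost:3001/v1/chat/completions"

def build_request(prompt: str, temperature: float = 0.0) -> dict:
    """Build an OpenAI-style chat completion payload.

    temperature=0 gives deterministic output, as used in the benchmarks below.
    """
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the local LlamaGPT API and return the reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("How is the universe expanding?"))
```

Since the API follows the OpenAI wire format, any OpenAI-compatible client library should also work by pointing its base URL at the local server.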

To stop LlamaGPT, run:

docker compose down

2. Benchmarks

We tested the LlamaGPT models on the following hardware with the default system prompt and the user prompt "How is the universe expanding?" The temperature was set to 0 to guarantee deterministic results. Generation speed is the average over the previous 10 generations.

  • Nous Hermes Llama 2 7B (GGML q4_0)

Device                          Generation speed
M1 Max MacBook Pro (64GB RAM)   8.2 tokens/second
Umbrel Home (16GB RAM)          2.7 tokens/second
Raspberry Pi 4 (8GB RAM)        0.9 tokens/second

  • Nous Hermes Llama 2 13B (GGML q4_0)

Device                          Generation speed
M1 Max MacBook Pro (64GB RAM)   3.7 tokens/second
Umbrel Home (16GB RAM)          1.5 tokens/second
  • Meta Llama 2 70B Chat (GGML q4_0)

Unfortunately, we don't have any benchmarks for this model yet.
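A tokens/second figure like those in the tables above can be measured with a small timing helper. The sketch below is illustrative and not part of LlamaGPT; `generate` stands in for whatever inference call you want to benchmark, and is assumed to return the number of tokens it produced:

```python
import time

def tokens_per_second(generate, prompt: str, runs: int = 10) -> float:
    """Average generation speed over `runs` generations.

    Mirrors the benchmark methodology above: speed is averaged over
    the previous 10 generations. `generate(prompt)` must return the
    number of tokens produced for that run.
    """
    total_tokens = 0
    total_time = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate(prompt)
        total_time += time.perf_counter() - start
    # Guard against a zero denominator on pathologically fast stubs.
    return total_tokens / max(total_time, 1e-9)
```

Averaging over several runs matters because the first generation often pays a one-time cost (model load, cache warm-up) that would skew a single measurement.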


Original link: LlamaGPT self-hosted chatbot — BimAnt


Source: blog.csdn.net/shebao3333/article/details/132384070