LlamaGPT is a self-hosted, offline, ChatGPT-like chatbot powered by Llama 2. 100% private, no data leaves your device.
1. How to install LlamaGPT
LlamaGPT can be installed on any x86 or arm64 system.
First make sure you have Docker installed. Then, clone this repository and change into the directory:
git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt
Now you can run LlamaGPT with any of the following models, depending on your hardware:
Model size | Model used | Minimum RAM required | How to start LlamaGPT |
---|---|---|---|
7B | Nous Hermes Llama 2 7B (GGML q4_0) | 8GB | docker compose up -d |
13B | Nous Hermes Llama 2 13B (GGML q4_0) | 16GB | docker compose -f docker-compose-13b.yml up -d |
70B | Meta Llama 2 70B Chat (GGML q4_0) | 48GB | docker compose -f docker-compose-70b.yml up -d |
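If you are unsure which model your machine can handle, the RAM thresholds in the table above can be turned into a small helper. This is only a sketch (the `pick_compose` function name is ours, not part of LlamaGPT); it maps available memory in GB to the matching compose file:

```shell
# Choose a compose file from the RAM thresholds in the table above.
# Usage: pick_compose <available RAM in GB>
pick_compose() {
  if [ "$1" -ge 48 ]; then
    echo docker-compose-70b.yml      # 70B model
  elif [ "$1" -ge 16 ]; then
    echo docker-compose-13b.yml      # 13B model
  else
    echo docker-compose.yml          # 7B default
  fi
}

# On Linux, available memory can be read from /proc/meminfo:
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
docker compose -f "$(pick_compose "$mem_gb")" up -d
```

Note that these are minimums: a machine with 16GB can run the 13B model, but with little headroom for other workloads.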
LlamaGPT can then be accessed at http://localhost:3000.
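The containers can take a while to become ready after `up -d`, especially on first start, so the web UI may not answer immediately. A minimal shell sketch (assuming `curl` is installed; the `wait_for_ui` helper is our own, not part of LlamaGPT) that polls the address until it responds:

```shell
# Poll a URL until it answers, or give up after a number of attempts.
# Usage: wait_for_ui <URL> <max attempts>
wait_for_ui() {
  attempts=0
  until curl -fsS "$1" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$2" ]; then
      return 1                        # gave up: UI never responded
    fi
    sleep 2                           # wait before retrying
  done
  return 0                            # UI responded
}

wait_for_ui http://localhost:3000 60 && echo "LlamaGPT is up"
```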
To stop LlamaGPT, run:
docker compose down
2. Benchmark test
We benchmarked the LlamaGPT models on the following hardware, using the default system prompt and the user prompt "How is the universe expanding?", with the temperature set to 0 to guarantee deterministic results. Generation speed is the average over the previous 10 generations.
- Nous Hermes Llama 2 7B (GGML q4_0)
Device | Generation speed |
---|---|
M1 Max MacBook Pro (64GB RAM) | 8.2 tokens/second |
Umbrel Home (16GB RAM) | 2.7 tokens/second |
Raspberry Pi 4 (8GB RAM) | 0.9 tokens/second |
- Nous Hermes Llama 2 13B (GGML q4_0)
Device | Generation speed |
---|---|
M1 Max MacBook Pro (64GB RAM) | 3.7 tokens/second |
Umbrel Home (16GB RAM) | 1.5 tokens/second |
- Meta Llama 2 70B Chat (GGML q4_0)
Unfortunately, we don't have any benchmarks for this model yet.
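For reference, tokens-per-second figures like those above are simple averages over repeated runs. The arithmetic can be sketched in shell (the `avg_speed` helper is hypothetical, not part of the benchmark harness); each run is given as "tokens generated:seconds elapsed":

```shell
# Average generation speed over several runs, each passed as "tokens:seconds".
avg_speed() {
  printf '%s\n' "$@" | awk -F: '
    { sum += $1 / $2 }                      # per-run tokens/second
    END { printf "%.1f\n", sum / NR }       # mean across runs, 1 decimal
  '
}

avg_speed 82:10 41:5 164:20   # each run works out to 8.2 tokens/second
```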