How to Run Llama 3 with Ollama

Meta Llama 3 is a state-of-the-art open large language model, available in 8B and 70B parameter sizes. Let's see how to run Llama 3 with Ollama.

What Is Llama 3?

Meta Llama 3: The most capable openly available LLM to date

Llama 3 is a large language model developed by Meta AI, a research laboratory that focuses on natural language processing (NLP) and other AI-related areas.

What makes Llama 3 special is its ability to understand and respond to a wide range of topics and questions, often with a high degree of accuracy and coherence. It has been trained on a massive dataset of text from the internet and can adapt to different contexts and styles.

Key features of Llama 3

Llama 3 has many potential applications, such as chatbots, virtual assistants, language translation, and content generation. Its key features include:

Conversational dialogue: Llama 3 can engage in natural-sounding conversations, using context to respond to questions and statements.

Knowledge retrieval: It can access a vast knowledge base to provide accurate information on a wide range of topics.

Common sense: Llama 3 has been designed to understand common sense and real-world concepts, making its responses more relatable and human-like.

Fine-tuned and optimized: Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases and outperform many of the available open-source chat models on common benchmarks.

(Benchmark figures: Meta Llama 3 Instruct model performance; Meta Llama 3 pre-trained model performance.)

The most capable model

Llama 3 represents a large improvement over Llama 2 and other openly available models:

Trained on a dataset seven times larger than Llama 2

Double the context length: 8K tokens, up from 4K in Llama 2

Encodes language much more efficiently using a larger token vocabulary with 128K tokens

Less than 1/3 of the false “refusals” compared to Llama 2

How to run Llama 3 with Ollama

Llama 3 is now available to run using Ollama. To get started, download Ollama, then run Llama 3.

CLI

Open a terminal and run: ollama run llama3

The initial release of Llama 3 includes two sizes, 8B and 70B parameters:

# 8B Parameters
ollama run llama3:8b

# 70B Parameters
ollama run llama3:70b
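Which size fits your hardware comes down mostly to GPU memory. As a rough rule of thumb, the 4-bit quantized weights Ollama serves by default take about half a byte per parameter, so the 8B model fits comfortably on a 16GB card while the 70B model wants roughly 40GB or more. A back-of-the-envelope sketch (the bytes-per-parameter figures below are approximations for illustration, not Ollama specifics, and real usage adds KV cache and runtime overhead):

```python
# Rough weight-memory estimate for a model at a given quantization.
# Assumption: ~0.56 bytes/param for 4-bit quantization (e.g. q4_0),
# ~1.06 for 8-bit, 2.0 for fp16. These are ballpark figures only.
BYTES_PER_PARAM = {"q4_0": 0.56, "q8_0": 1.06, "fp16": 2.0}

def estimate_gb(params_billions: float, quant: str = "q4_0") -> float:
    """Approximate weight memory in GiB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM[quant] / 1024**3

for size in (8, 70):
    print(f"llama3:{size}b ~ {estimate_gb(size):.1f} GiB of weights (q4_0)")
```

By this estimate the 8B model needs roughly 4-5 GiB for weights alone and the 70B model roughly 36-40 GiB, which is why the larger model generally calls for a 48GB-class GPU or multiple cards.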

API

Example using curl:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
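By default the generate endpoint streams its answer as newline-delimited JSON objects, each carrying a "response" text fragment, with "done": true on the final object. A minimal sketch of reassembling those fragments into the full answer (the sample body is hardcoded here so the snippet runs without a live Ollama server):

```python
import json

# Hardcoded sample of the NDJSON stream /api/generate returns;
# a real client would read these lines from the HTTP response.
sample_stream = "\n".join([
    '{"model":"llama3","response":"The sky ","done":false}',
    '{"model":"llama3","response":"is blue.","done":true}',
])

def collect_response(ndjson: str) -> str:
    """Reassemble the full completion from a streamed NDJSON body."""
    parts = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):  # last object in the stream
            break
    return "".join(parts)

print(collect_response(sample_stream))  # -> The sky is blue.
```

If you prefer a single JSON object instead of a stream, the API also accepts "stream": false in the request body.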

Model variants

Instruct is fine-tuned for chat/dialogue use cases. Example:

ollama run llama3
ollama run llama3:70b

Pre-trained is the base model. Example:

ollama run llama3:text
ollama run llama3:70b-text

Additional: Some Good GPU Plans for Ollama AI

Professional GPU VPS - A4000

$90.30/mo (save 50%, was $179.00)
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Advanced GPU - A4000

$209.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good Choice for 3D Rendering, Video Editing, AI/Deep Learning, Data Science, etc

Advanced GPU - A5000

$242.10/mo (save 31%, was $349.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU - RTX A6000

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU - A40

$439.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A40
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 37.48 TFLOPS

Multi-GPU - 3xRTX A5000

$539.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Multi-GPU - 3xRTX A6000

$899.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS