How to Run Llama 3 with Ollama

Meta Llama 3 is a state-of-the-art open large language model, available in 8B and 70B parameter sizes. Let's see how to run Llama 3 with Ollama.

What Is Llama 3?

Meta Llama 3: The most capable openly available LLM to date

Llama 3 is a large language model developed by Meta AI, a research laboratory that focuses on natural language processing (NLP) and other AI-related areas.

What makes Llama 3 special is its ability to understand and respond to a wide range of topics and questions, often with a high degree of accuracy and coherence. It has been trained on a massive dataset of text from the internet and can adapt to different contexts and styles.

Key features of Llama 3

Llama 3 has many potential applications, such as chatbots, virtual assistants, language translation, and content generation. Its key features include:

Conversational dialogue: Llama 3 can engage in natural-sounding conversations, using context to respond to questions and statements.

Knowledge retrieval: It can access a vast knowledge base to provide accurate information on a wide range of topics.

Common sense: Llama 3 has been designed to understand common sense and real-world concepts, making its responses more relatable and human-like.

Fine-tuned and optimized: Llama 3 instruction-tuned models are fine-tuned and optimized for dialogue/chat use cases and outperform many of the available open-source chat models on common benchmarks.

(Benchmark figures: Meta Llama 3 Instruct model performance; Meta Llama 3 pre-trained model performance.)

The most capable model

Llama 3 represents a large improvement over Llama 2 and other openly available models:

Trained on a dataset seven times larger than Llama 2

Double the context length: 8K tokens, up from 4K in Llama 2

Encodes language much more efficiently using a larger token vocabulary with 128K tokens

Less than 1/3 of the false “refusals” compared to Llama 2

How to run Llama 3 with Ollama

Llama 3 is now available to run using Ollama. To get started, download Ollama, then run Llama 3.

CLI

Open a terminal and run: ollama run llama3

The initial release of Llama 3 includes two sizes, 8B and 70B parameters:

# 8B Parameters
ollama run llama3:8b

# 70B Parameters
ollama run llama3:70b
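Which size fits your hardware comes down mostly to GPU memory. As a rough rule of thumb, the 4-bit quantized weights Ollama serves by default take about half a byte per parameter, so the 8B model fits comfortably on a 16GB card while the 70B model wants roughly 40GB or more. A back-of-the-envelope sketch (the bytes-per-parameter figures below are approximations for illustration, not Ollama specifics, and real usage adds KV cache and runtime overhead):

```python
# Rough weight-memory estimate for a model at a given quantization.
# Assumption: ~0.56 bytes/param for 4-bit quantization (e.g. q4_0),
# ~1.06 for 8-bit, 2.0 for fp16. These are ballpark figures only.
BYTES_PER_PARAM = {"q4_0": 0.56, "q8_0": 1.06, "fp16": 2.0}

def estimate_gb(params_billions: float, quant: str = "q4_0") -> float:
    """Approximate weight memory in GiB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM[quant] / 1024**3

for size in (8, 70):
    print(f"llama3:{size}b ~ {estimate_gb(size):.1f} GiB of weights (q4_0)")
```

By this estimate the 8B model needs roughly 4-5 GiB for weights alone and the 70B model roughly 36-40 GiB, which is why the larger model generally calls for a 48GB-class GPU or multiple cards.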

API

Example using curl:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?"
}'
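By default the generate endpoint streams its answer as newline-delimited JSON objects, each carrying a "response" text fragment, with "done": true on the final object. A minimal sketch of reassembling those fragments into the full answer (the sample body is hardcoded here so the snippet runs without a live Ollama server):

```python
import json

# Hardcoded sample of the NDJSON stream /api/generate returns;
# a real client would read these lines from the HTTP response.
sample_stream = "\n".join([
    '{"model":"llama3","response":"The sky ","done":false}',
    '{"model":"llama3","response":"is blue.","done":true}',
])

def collect_response(ndjson: str) -> str:
    """Reassemble the full completion from a streamed NDJSON body."""
    parts = []
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):  # last object in the stream
            break
    return "".join(parts)

print(collect_response(sample_stream))  # -> The sky is blue.
```

If you prefer a single JSON object instead of a stream, the API also accepts "stream": false in the request body.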

Model variants

Instruct is fine-tuned for chat/dialogue use cases. Example:

ollama run llama3
ollama run llama3:70b

Pre-trained is the base model. Example:

ollama run llama3:text
ollama run llama3:70b-text

Additional: Some Good GPU Plans for Ollama AI

Professional GPU VPS - A4000

$90.30/mo (save 50%, was $179.00)
  • 32GB RAM
  • 24 CPU Cores
  • 320GB SSD
  • 300Mbps Unmetered Bandwidth
  • Once per 2 Weeks Backup
  • OS: Linux / Windows 10
  • Dedicated GPU: Quadro RTX A4000
  • CUDA Cores: 6,144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Available for Rendering, AI/Deep Learning, Data Science, CAD/CGI/DCC.

Advanced GPU - A4000

$209.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good Choice for 3D Rendering, Video Editing, AI/Deep Learning, Data Science, etc

Advanced GPU - A5000

$242.10/mo (save 31%, was $349.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU - RTX A6000

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU - A40

$439.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A40
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 37.48 TFLOPS

Multi-GPU - 3xRTX A5000

$539.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Multi-GPU - 3xRTX A6000

$899.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS