How To Install TextGen WebUI and Use ANY MODEL Locally!

TextGen WebUI is like Automatic1111 for LLMs. Easily run any open source model locally on your computer. This tutorial will show you how to install TextGen WebUI on Ubuntu and get models installed and running.

What is TextGen WebUI?

The Oobabooga Text-generation WebUI is an awesome open-source web interface that allows you to run open-source LLMs on your local computer for absolutely free! It provides a user-friendly interface to interact with these models and generate text, with features such as model switching, notebook mode, chat mode, and more. Next, let's see how to install Oobabooga and use it to run Llama 2 locally or on a remote server.

Features of TextGen WebUI

- User-friendly interface: Oobabooga provides a simple and intuitive interface for generating text. Users can simply enter a prompt and select the desired LLM and generation settings.

- Support for multiple LLMs: Oobabooga supports a variety of open-source LLMs, including Llama, GPT-J, and BLOOM. This allows users to choose the LLM that best suits their needs.

- Advanced generation settings: Oobabooga provides a number of advanced generation settings that allow users to control the quality and style of the generated text. These settings include temperature, top-p, and repetition penalty.

- Real-time feedback: Oobabooga provides real-time feedback on the generated text. This allows users to see how the text is changing as they adjust the generation settings.

- Code generation: Oobabooga can be used to generate code in a variety of programming languages. This makes it a valuable tool for developers and programmers.

System Requirements

Even the smallest 7B-parameter models demand significant hardware resources for smooth operation. Keep in mind that GPU memory (VRAM) is crucial. You might be able to manage with lower-spec hardware, but performance will be extremely slow.

The system requirements for installing TextGen WebUI are as follows:

- OS: Windows 10 or later, or Ubuntu 18.04 or later

- RAM: 8GB for 7B models, 16GB for 13B models, 32GB for 30B models; 64GB+ recommended

- CPU: 4+ cores; AVX2 support recommended (see the quick checks below)

- GPU: optional; for GPU acceleration, 16GB+ VRAM is recommended
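On Linux, you can quickly verify these requirements before installing. Below is a minimal sketch of the checks, assuming an NVIDIA card and that the NVIDIA driver (which provides nvidia-smi) is already installed:

# check for AVX2 support (prints "avx2" if the CPU supports it)
$ grep -o -m1 avx2 /proc/cpuinfo

# check total RAM
$ free -h

# check the GPU model and available VRAM
$ nvidia-smi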

8 Steps to Install TextGen WebUI and Run Model Locally

There are different installation methods available, including one-click installers for Windows, Linux, and macOS, as well as manual installation using Conda. Detailed installation instructions can be found in the Text Generation Web UI repository. Below we will demonstrate, step by step, how to install it on an Ubuntu Linux server with an NVIDIA RTX A5000 GPU.

Prerequisites

Before you begin this guide, you should have a regular, non-root user with sudo privileges and a basic firewall configured on your server. When you have an account available, log in as your non-root user to begin.
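One extra note for remote servers: the WebUI listens on port 7860 by default, and you will later need to start it with the --listen flag to accept connections from other machines. As a hedged example, assuming your firewall is ufw:

# open the default TextGen WebUI port (only needed for remote access)
$ sudo ufw allow 7860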

Step 1. Clone or Download Oobabooga Text Generation WebUI

First, let's download the Oobabooga installation code. There are two ways to do this: clone the project repository with git, or download the zip package and unzip it.

# way 1 - clone the Oobabooga git repo
$ git clone https://github.com/oobabooga/text-generation-webui.git

# way 2 - download Oobabooga zip package and unzip it
$ wget https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip
$ unzip main.zip

Step 2. Start Installing TextGen WebUI (Oobabooga)

Enter the text-generation-webui directory (or text-generation-webui-main if you used the zip package), and then execute the start_linux.sh script. Note that a different script is used on each operating system: start_linux.sh for Linux, start_windows.bat for Windows, start_macos.sh for macOS, and start_wsl.bat for WSL.

$ cd text-generation-webui/
$ sudo ./start_linux.sh
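If the script fails with a "permission denied" error (the zip download may not preserve the executable bit), make it executable first:

# restore the executable bit, then rerun the start script
$ chmod +x start_linux.sh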

Step 3. Select your GPU Vendor When Asked

Since we are using an NVIDIA RTX A5000 graphics card, choose A (NVIDIA) here.

[Screenshot: GPU vendor selection]

Step 4. Select NVIDIA CUDA Version

The installer next asks which CUDA version to use. Unless you have a very old GPU, accept the default (newer) CUDA version.

[Screenshot: CUDA version selection]

Step 5. Wait for the Automatic Installation to Complete

Next, the installer automatically downloads and installs PyTorch and the remaining dependencies. All we have to do is wait.

[Screenshot: automatic installation in progress]

The installation process takes about ten minutes. The output after the installation is completed is as follows:

[Screenshot: output after installation finishes]

Step 6. Download Llama 2 Models

After the installation is complete, you need to download a Llama 2 model before you can actually use the WebUI. Models should be placed in the folder text-generation-webui/models; they are usually downloaded from Hugging Face. For example, use wget to download a quantized GGUF model:

$ cd text-generation-webui/models/
$ wget "https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf?download=true" -O llama-2-7b-chat.Q4_K_M.gguf

Note: GGUF models are a single file and should be placed directly into models.
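As a quick sanity check, confirm that the file downloaded completely; this Q4_K_M quantization of Llama 2 7B Chat should be roughly 4GB:

# verify the downloaded file size (expect roughly 4GB)
$ ls -lh llama-2-7b-chat.Q4_K_M.gguf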

It is also possible to download models from the command line with the download-model.py script.

# python download-model.py organization/model
$ python3 download-model.py TheBloke/Llama-2-7B-Chat-GGUF

Note: Run python download-model.py --help to see all the options.
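If you have the huggingface_hub package installed, its huggingface-cli tool offers yet another way to fetch a single file from a repository. This is a sketch assuming a recent version of that tool, not part of the WebUI itself:

# assumes: pip install huggingface_hub (provides huggingface-cli)
$ huggingface-cli download TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf --local-dir models/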

The above are ways to download a model from the command line. Alternatively, you can use the "Model" tab of the TextGen WebUI to download a model from Hugging Face automatically.

[Screenshot: downloading a model from the Model tab]

When the status output displays "Done", the model download is complete. Click the refresh button on the right side of the Model selection bar shown below, then click the drop-down arrow; the model we just downloaded will appear in the list.

[Screenshot: model download completed in the Model tab]

Step 7. Select a Model and Load It

Select the model llama-2-7b-chat.Q4_K_M.gguf, select llama.cpp as the Model loader, and click Load. You will see a model loading success message:

[Screenshot: loading llama-2-7b-chat.Q4_K_M.gguf]
Successfully loaded llama-2-7b-chat.Q4_K_M.gguf.
It seems to be an instruction-following model with template "Llama-v2". In the chat tab, instruct or chat-instruct modes should be used.
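If you prefer, you can skip the manual selection and have the model loaded at startup via the WebUI's command-line flags. A minimal sketch, assuming the --model, --loader, and --n-gpu-layers flags of a current release (--n-gpu-layers controls how many layers llama.cpp offloads to the GPU; tune it to your VRAM):

# load the GGUF model automatically at startup
$ ./start_linux.sh --model llama-2-7b-chat.Q4_K_M.gguf --loader llama.cpp --n-gpu-layers 35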

Step 8. Run Llama 2 and Enjoy It

Click the Chat tab in the upper left corner to enter the chat page, and then you can ask any questions you want.

[Screenshot: chatting with Llama 2]
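You are not limited to the browser, either. Recent versions of the WebUI can expose an OpenAI-compatible API when started with the --api flag; the sketch below assumes that flag and the default API port of 5000:

# start the server with the API extension enabled
$ ./start_linux.sh --api

# send a chat request to the OpenAI-compatible endpoint
$ curl http://127.0.0.1:5000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'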

Conclusion

This article shows how to install the TextGen WebUI (Oobabooga) to run Llama 2 locally or on a remote server. Oobabooga is a text-generation WebUI with a chatbot where you can provide input prompts per your requirements. If you want to generate text with a chatbot model, the text-generation WebUI is an economical way to do it.

GPU Mart provides professional GPU hosting services optimized for high-performance computing projects. Here we recommend some bare metal GPU server plans suitable for running Llama 2 online. Choose the appropriate plan according to the model you want to use: for Llama 2 7B we recommend an 8GB graphics card, for Llama 2 13B a 16GB or 24GB graphics card, and for Llama 2 70B a 48GB or larger graphics card. You can start your journey at any time, and we will be happy to help you with any difficulties.

Advanced GPU - A4000

$167.20/mo (save 40%, was $279.00)
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A4000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 6144
  • Tensor Cores: 192
  • GPU Memory: 16GB GDDR6
  • FP32 Performance: 19.2 TFLOPS
  • Good choice for AI/Deep Learning, Data Science, CAD/CGI/DCC, etc.

Advanced GPU - A5000

$269.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2697v2
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 2
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Enterprise GPU - A40

$439.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A40
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 37.48 TFLOPS

Advanced GPU - V100

$229.00/mo
  • 128GB RAM
  • Dual 12-Core E5-2690v3
  • 240GB SSD + 2TB SSD
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia V100
  • Microarchitecture: Volta
  • Max GPUs: 1
  • CUDA Cores: 5,120
  • Tensor Cores: 640
  • GPU Memory: 16GB HBM2
  • FP32 Performance: 14 TFLOPS

Enterprise GPU - RTX A6000

$409.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Multi-GPU - 3xRTX A5000

$539.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A5000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 8192
  • Tensor Cores: 256
  • GPU Memory: 24GB GDDR6
  • FP32 Performance: 27.8 TFLOPS

Multi-GPU - 3xRTX A6000

$899.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: 3 x Quadro RTX A6000
  • Microarchitecture: Ampere
  • Max GPUs: 3
  • CUDA Cores: 10,752
  • Tensor Cores: 336
  • GPU Memory: 48GB GDDR6
  • FP32 Performance: 38.71 TFLOPS

Enterprise GPU - A100

$639.00/mo
  • 256GB RAM
  • Dual 18-Core E5-2697v4
  • 240GB SSD + 2TB NVMe + 8TB SATA
  • 100Mbps-1Gbps
  • OS: Windows / Linux
  • GPU: Nvidia A100
  • Microarchitecture: Ampere
  • Max GPUs: 1
  • CUDA Cores: 6912
  • Tensor Cores: 432
  • GPU Memory: 40GB HBM2e
  • FP32 Performance: 19.5 TFLOPS