LLaMA 2 is a family of generative text models that are optimized for assistant-like chat use cases or can be adapted for a variety of natural language generation tasks. It's a pre-trained and fine-tuned large language models (LLMs), ranging in scale from 7B to 70B parameters, from the AI group at Meta, the parent company of Facebook.
The Oobabooga Text-generation WebUI is an awesome open-source Web interface that allows you to run any open-source AI LLM models on your local computer for absolutely free! It provides a user-friendly interface to interact with these models and generate text, with features such as model switching, notebook mode, chat mode, and more. Next let's see how to install Oobabooga and use it to running Llama 2 locally or on a remote server.
- User-friendly interface: Oobabooga provides a simple and intuitive interface for generating text. Users can simply enter a prompt and select the desired LLM and generation settings.
- Support for multiple LLMs: Oobabooga supports a variety of LLMs, including GPT-3, GPT-J, and BLOOM. This allows users to choose the LLM that is best suited for their needs.
- Advanced generation settings: Oobabooga provides a number of advanced generation settings that allow users to control the quality and style of the generated text. These settings include temperature, top-p, and repetition penalty.
- Real-time feedback: Oobabooga provides real-time feedback on the generated text. This allows users to see how the text is changing as they adjust the generation settings.
- Code generation: Oobabooga can be used to generate code in a variety of programming languages. This makes it a valuable tool for developers and programmers.
Despite being the smallest 7B parameter model, it demands significant hardware resources for smooth operation. Keep in mind that GPU memory (VRAM) is crucial. You might be able to manage with lower-spec hardware, though performance was extremely slow.
The system requirements for installing Oobabooga are as follows:
- OS: Windows 10 or later, or Ubuntu 18.04 or later
- RAM: 8GB for 7B models, 16GB for 13B models, 32GB for 30B models, 64GB+ recommended
- CPU: Core 4+, Support AVX2 recommended
- GPU: Optional, if need GPU acceleration，16GB+ VRAM recommended
There are different installation methods available, including one-click installers for Windows, Linux, and macOS, as well as manual installation using Conda. Detailed installation instructions can be found in the Text Generation Web UI repository. Below we will demonstrate step by step how to install it on an A5000 GPU Ubuntu Linux server.
Before you begin this guide, you should have a regular, non-root user with sudo privileges and a basic firewall configured on your server. When you have an account available, log in as your non-root user to begin.
First, let's download the Oobabooga installation code. There are two ways, one is to use git clone Oobabooga project code directly, the other is to download the Oobabooga zip package and then unzip it.
# way 1 - clone the Oobabooga git repo $ git clone https://github.com/oobabooga/text-generation-webui.git # way 2 - download Oobabooga zip package and unzip it $ wget https://github.com/oobabooga/text-generation-webui/archive/refs/heads/main.zip $ unzip main.zip
Enter the text-generation-webui directory (or text-generation-webui-main), and then execute the sudo start_linux.sh script. Note that different scripts need to be selected on different systems.
$ cd text-generation-webui/ $ sudo ./start_linux.sh
Since we are using NVIDIA's RTX A5000 graphics card, choose A here.
Next, we will enter the automatic installation process of Pytorch. All we have to do is wait.
The installation process takes about ten minutes. The output after the installation is completed is as follows:
After the installation is complete, you need to download the Llama 2 models before you can actually use it. The models should be placed in the folder text-generation-webui/models. They are usually downloaded from Hugging Face. Use wget to download a model from Hugging Face.
$ cd text-generation-webui/models/ $ wget https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf?download=true -O ./llama-2-7b-chat.Q4_K_M.gguf
Note: GGUF models are a single file and should be placed directly into models.
It is also possible to download it via the command-line with download-model.py script.
# python download-model.py organization/model $ python3 download-model.py TheBloke/Llama-2-7B-Chat-GGUF
Note: Run python download-model.py --help to see all the options.
The above are two ways to download the model using the command line. We can also download it in the TexGen WebUI. Another way to download, you can use the "Model" tab of the UI to download the model from Hugging Face automatically.
When the prompt content displays "Done", the Model download is completed. Click the refresh button on the right side of the Model selection bar in the picture below, and then click the drop-down arrow. We will find that the model we just downloaded is there.
Select the model llama-2-7b-chat.Q4_K_M.gguf, select llama.cpp in the Model loader, and click Load. You will see the model loading success message:
Successfully loaded llama-2-7b-chat.Q4_K_M.gguf. It seems to be an instruction-following model with template "Llama-v2". In the chat tab, instruct or chat-instruct modes should be used.
Click the Chat tab in the upper left corner to enter the chat page, and then you can ask any questions you want.
This article shows how to install textgen webui Oobabooga to run Llama 2 locally or on a remote server. Oobabooga is a text-generation WebUI with a Chatbot where you can provide input prompts per your requirement. It is a different model that cannot be compared to any other Chatbot. The text-generation WebUI is more economical if you want to generate text using a chatbot model.
GPU Mart provides professional GPU hosting services optimized for high-performance computing projects. Here we recommend some Bare metal GPU server solutions suitable for running LLama 2 online. Choose the appropriate plan according to the model you want to use. For example, Llama 2 7B recommends using an 8GB graphics card, Llama 2 13B uses a 16GB or 24GB graphics card, and Llama 2 70B uses a 48GB and above graphics card. You can start your journey at any time and we will be happy to help you with any difficulties.
Advanced GPU - A4000
Advanced GPU - A5000
Enterprise GPU - RTX A6000
Multi-GPU - 3xRTX A5000
Multi-GPU - 3xRTX A6000
Enterprise GPU - A100