Ollama is an open-source tool for serving large language models that helps users quickly run large models locally. After a simple installation, users can launch open-source models such as Qwen with a single command. Ollama greatly simplifies the process of deploying and managing LLMs, much as Docker simplified working with containers, so you can get a model running on your own machine in minutes.
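For instance, once Ollama is installed, pulling and starting a model is a single command. A minimal sketch, assuming the qwen2 tag is available in the Ollama model library:

# Download (on first use) and start an interactive chat with a model
ollama run qwen2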
Llama 3.1 is a new state-of-the-art model from Meta available in 8B, 70B and 405B parameter sizes. Specifically, the "8B" denotes that this model has 8 billion parameters, which are the variables the model uses to make predictions.
Llama 3.1 8B balances performance and computational efficiency, making it suitable for a range of applications such as text generation, question answering, language translation, and code generation. Despite having fewer parameters than larger models like Llama 3.1 70B, it delivers impressive results across a variety of natural language processing tasks. Additionally, Meta's smaller models are competitive with closed and open models that have a similar number of parameters.
CPU >= 8 cores
RAM >= 16 GB
VRAM >= 8 GB
NVIDIA RTX 3070 or better is recommended for optimal performance.
Ollama is available for macOS, Linux (e.g., Ubuntu), and Windows (preview).
Ollama Download: https://ollama.com/download/windows
On the download page, click Windows and then click the Download button. Once the download is complete, double-click the downloaded installer and click Install. The installation completes without further prompts.
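Before running a model, it is worth confirming that the installation succeeded. Assuming the installer added Ollama to your PATH, a quick version check from any terminal looks like this:

# Print the installed Ollama version to verify the CLI is reachable
ollama --version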
Open a terminal; this article uses CMD as an example. Now that Ollama is installed, enter the following command in the terminal to run the Llama 3.1 8B language model as a test.
Note: The first run takes a while, because several gigabytes of model files have to be downloaded locally. Once the download finishes, you can interact with the llama3.1:8b model directly from the terminal.
# Run Llama 3.1:8b
ollama run llama3.1
You can then type the question you want to ask directly at the interactive prompt.
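Two related commands are useful at this point: ollama list shows which models have been downloaded to your machine, and typing /bye inside the interactive session returns you to the shell.

# List the models stored locally
ollama list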
Use curl to send a query to the running server. Note that the single-quote syntax below is written for a Unix-style shell such as Git Bash; in CMD or PowerShell you may need to adjust the quoting.
curl -X POST http://localhost:11434/api/generate -d '{ "model": "llama3.1", "prompt": "Why is the sky blue?", "stream": false }'
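The generate endpoint returns a single completion per prompt. For multi-turn conversations, the Ollama API also documents a chat endpoint; a minimal sketch of such a request (field names follow the API documentation linked below) looks like this:

curl -X POST http://localhost:11434/api/chat -d '{ "model": "llama3.1", "messages": [{ "role": "user", "content": "Why is the sky blue?" }], "stream": false }'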
For more information on the use of the model, please refer to: https://github.com/ollama/ollama/blob/main/docs/api.md#list-local-models
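The linked section covers listing local models, which is a simple GET request to the tags endpoint:

# List the models available locally via the REST API
curl http://localhost:11434/api/tags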