Installing and Running Local LLMs Through Python in 5 Minutes
The First Step in Building Multi-Agent Systems
In my previous blog post, I explored options for creating networks of Large Language Models (LLMs) that can communicate and collaborate to complete tasks. These networks are referred to as multi-agent systems (MAS). The first step in building such systems is to install local LLMs and get them running through Python.
Here, I’ll walk you through how I installed Ollama, an open-source framework for running LLMs locally, and got it up and running with Meta’s Llama 2 model on my Windows 10 gaming PC with a pretty powerful GPU. This setup will provide a solid foundation for experimenting with multi-agent systems.
Installing Local LLMs
To begin, I decided to use the Llama 2 model through Ollama. While Llama 3.2 is Meta’s newer and more capable model, I opted for Llama 2 for faster performance during testing. Here's how I set everything up:
Step 1: Install Ollama
Download and Install Ollama for Windows
Head over to Ollama’s GitHub releases and download the installer for Windows.
Follow the installation steps provided.
Install Ollama in Python
Install Anaconda on your machine if you don't have it already. If you already have a way to run Python on your machine, skip this step.
Open your Anaconda terminal and run:
pip install ollama
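To confirm that both the Python package and the local Ollama service are working, a quick sanity check like the one below is handy. This is a minimal sketch; it assumes the Ollama background service started after installation (on Windows it runs in the system tray by default).

import ollama

# Lists the models installed locally. Right after a fresh install the list is
# empty, but a successful call confirms the local Ollama server is reachable.
print(ollama.list())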
Step 2: Pull a Model
Once Ollama is installed, download the Llama 2 model using the Ollama CLI:
ollama pull llama2
This command fetches the model and makes it ready for use on your local machine.
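If you prefer to stay inside Python, the same download can be triggered through the client library. This is a small sketch assuming the ollama package from Step 1 is installed and the local service is running.

import ollama

# Equivalent to running `ollama pull llama2` in the terminal; downloads the
# model weights into the local Ollama store if they aren't already present.
ollama.pull('llama2')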
Step 3: Install Jupyter Notebook
For easier experimentation, I set up Jupyter Notebook:
pip install jupyter notebook
To start the notebook, run:
jupyter notebook
This opens a browser interface for writing and testing Python scripts interactively.
Step 4: Chat with the LLM
Here’s a simple example of how I started chatting with the Llama 2 model. I ran the following in a notebook cell:
import ollama

# Chat with the LLM using the Ollama server
response = ollama.chat(
    model='llama2',
    messages=[
        {'role': 'user', 'content': 'What would you do if you write and run Python scripts?'}
    ]
)

# Print the response
print(response['message']['content'])
This script connects to the Ollama server, sends a prompt to the Llama 2 model, and prints the response. Congrats! You have a local LLM running in Python!
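As an aside, the same call can stream the reply as it is generated, which is handy for longer answers. This is a hedged sketch based on the stream flag in the ollama Python library; the exact chunk format may vary slightly between library versions.

import ollama

# Request a streamed response; each chunk carries a partial message.
stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Explain what a multi-agent system is in one paragraph.'}],
    stream=True
)

# Print the pieces as they arrive instead of waiting for the full answer.
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)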
Verifying GPU Usage
Curious whether my GPU was being utilized, I checked the Task Manager:
Navigate to Performance > GPU.
Look for activity under the Compute (CUDA) section.
Sure enough, the GPU was active! Below is a screenshot showing the activity; the spikes in the top plot correspond to GPU activity.
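If you'd rather check GPU usage from code instead of the Task Manager, something like the following should do it. This is a rough sketch that assumes an NVIDIA GPU with the nvidia-smi utility available on the PATH.

import subprocess

# Query current GPU utilization and memory usage via nvidia-smi.
result = subprocess.run(
    ['nvidia-smi', '--query-gpu=utilization.gpu,memory.used,memory.total', '--format=csv,noheader'],
    capture_output=True, text=True, check=True
)
print(result.stdout.strip())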
Next Steps
With the local LLM running smoothly, the next step is to set up multi-agent systems using LangChain. Stay tuned!