Installing and Running Local LLMs Through Python in 5 Minutes
The First Step in Building Multi-Agent Systems
In my previous blog post, I explored options for creating networks of Large Language Models (LLMs) that can communicate and collaborate to complete tasks. These networks are referred to as multi-agent systems (MAS). The first step in building such systems is to install local LLMs and get them running through Python.
Here, I’ll walk you through how I installed Ollama, an open-source framework for running LLMs locally, and got it up and running with Meta’s Llama 2 model on my Windows 10 gaming PC with a pretty powerful GPU. This setup will provide a solid foundation for experimenting with multi-agent systems.
Installing Local LLMs
To begin, I decided to use the Llama 2 model through Ollama. While Llama 3.2 is Meta’s newer and more capable model, I opted for Llama 2 for faster performance during testing. Here's how I set everything up:
Step 1: Install Ollama
Download and Install Ollama for Windows
Head over to Ollama’s GitHub releases and download the installer for Windows.
Follow the installation steps provided.
Install Ollama in Python
Install Anaconda on your machine if you don't have it already. If you already have a way to run Python on your machine, skip this step.
Open your Anaconda terminal and run:
pip install ollama
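To confirm that both the Python package and the local Ollama service are working, a quick sanity check like the one below is handy. This is a minimal sketch; it assumes the Ollama background service started after installation (on Windows it runs in the system tray by default).

import ollama

# Lists the models installed locally. Right after a fresh install the list is
# empty, but a successful call confirms the local Ollama server is reachable.
print(ollama.list())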
Step 2: Pull a Model
Once Ollama is installed, download the Llama 2 model using the Ollama CLI:
ollama pull llama2
This command fetches the model and makes it ready for use on your local machine.
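If you prefer to stay inside Python, the same download can be triggered through the client library. This is a small sketch assuming the ollama package from Step 1 is installed and the local service is running.

import ollama

# Equivalent to running `ollama pull llama2` in the terminal; downloads the
# model weights into the local Ollama store if they aren't already present.
ollama.pull('llama2')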
Step 3: Install Jupyter Notebook
For easier experimentation, I set up Jupyter Notebook:
pip install jupyter notebook
To start the notebook, run:
jupyter notebook
This opens a browser interface for writing and testing Python scripts interactively.
Step 4: Chat with the LLM
Here’s a simple example of how I started chatting with the Llama 2 model. I ran the following in a notebook cell:
import ollama

# Chat with the LLM using the Ollama server
response = ollama.chat(
    model='llama2',
    messages=[
        {'role': 'user', 'content': 'What would you do if you write and run Python scripts?'}
    ]
)

# Print the response
print(response['message']['content'])
This script connects to the Ollama server, sends a prompt to the Llama 2 model, and prints the response. Congrats! You have a local LLM running in Python!
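As an aside, the same call can stream the reply as it is generated, which is handy for longer answers. This is a hedged sketch based on the stream flag in the ollama Python library; the exact chunk format may vary slightly between library versions.

import ollama

# Request a streamed response; each chunk carries a partial message.
stream = ollama.chat(
    model='llama2',
    messages=[{'role': 'user', 'content': 'Explain what a multi-agent system is in one paragraph.'}],
    stream=True
)

# Print the pieces as they arrive instead of waiting for the full answer.
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)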
Verifying GPU Usage
Curious whether my GPU was being utilized, I checked the Task Manager:
Navigate to Performance > GPU.
Look for activity under the Compute (CUDA) section.
Sure enough, the GPU was active! Below is a screenshot showing the activity; the spikes in the top plot correspond to GPU activity.
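If you'd rather check GPU usage from code instead of the Task Manager, something like the following should do it. This is a rough sketch that assumes an NVIDIA GPU with the nvidia-smi utility available on the PATH.

import subprocess

# Query current GPU utilization and memory usage via nvidia-smi.
result = subprocess.run(
    ['nvidia-smi', '--query-gpu=utilization.gpu,memory.used,memory.total', '--format=csv,noheader'],
    capture_output=True, text=True, check=True
)
print(result.stdout.strip())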
Next Steps
With the local LLM running smoothly, the next step is to set up multi-agent systems using LangChain. Stay tuned!