#SemanticKernel – 📎Chat Service demo running Llama2 LLM locally in Ubuntu

Hi!

Today’s post is a demo on how to interact with a local LLM using Semantic Kernel. In my previous post, I wrote about how to use LM Studio to host a local server. Today we will use ollama in Ubuntu to host the LLM.

Ollama

Ollama is an open-source language model platform designed for local interaction with large language models (LLMs). It provides developers with a convenient way to run LLMs on their own machines, allowing experimentation, fine-tuning, and customization. With Ollama, you can create and execute scripts directly, without relying on external tools. Notable features include Python and JavaScript libraries, integration of vision models, session management, and improved CPU support. Whether you’re a researcher, developer, or enthusiast, Ollama empowers you to explore and harness the capabilities of language models locally.

Run a local inference LLM server using Ollama

In their latest post, the Ollama team describes how to download and run locally a Llama2 model in a docker container, now also supporting the OpenAI API schema for chat calls (see OpenAI Compatibility).

They also describe the necessary steps to run this in a linux distribution. So, I got back to life on my Ubuntu using Windows Subsystem for Linux.

And if you want to know more, here are my Ubuntu specs:

neofetch with a display of the main specs of the Ubuntu distro running in WSL.

Now time to install ollama, run the server, and start a live journal track in a separate window using the following commands:

# install ollama
curl -fsSL https://ollama.com/install.sh | sh

# run ollama
ollama run llama2

/# show journal / logs in live model
journalctl -u ollama -f

The ollama server is up and running, hosting a llama2 model in the endpoint: http://localhost:11434/v1/chat/completions

Llama 2

In my previous post, I used Phi-2 as the LLM to test with Semantic Kernel. Ollama allows us to use a different set of models, this time I decided to test Llama 2.

Llama 2 is a family of transformer-based autoregressive causal language models. These models take a sequence of words as input and recursively predict—the next word(s).

Here are some key points about Llama 2:

Open Source: Llama 2 is Meta’s open-source large language model (LLM). Unlike some other language models, it is freely available for both research and commercial purposes.
Parameters and Features: Llama 2 comes in many sizes, with 7 billion to 70 billion parameters. It is designed to empower developers and researchers by providing access to state-of-the-art language models.
Applications: Llama 2 can be used for a wide range of applications, including text generation, inference, and fine-tuning. Its versatility makes it valuable for natural language understanding and creative tasks.
Global Support: Llama 2 has garnered support from companies, cloud providers, and researchers worldwide. These supporters appreciate its open approach and the potential it holds for advancing AI innovation.

Source: Conversation with Microsoft Copilot:

Llama. https://llama.meta.com/
Llama 2 is here – get it on Hugging Face. https://huggingface.co/blog/llama2
Download Llama. https://ai.meta.com/resources/models-and-libraries/llama-downloads/

📎 Semantic Kernel and Custom LLMs

If you want to learn more about Semantic Kernel, check the official repository here: https://aka.ms/ebsk

The whole sample can be found in: https://aka.ms/repo-skcustomllm01

In this new iteration, I added a few changes:

Create a shared class library “sk-customllm”. This class implements the Chat Completion Service from Semantic Kernel.
Added a few more fields to the models to work with the OpenAI API specification.

The new solution looks like this one:

This is the sample code of the main program. As you can see, it’s quite simple and runs in an uncomplicated way.

	// Copyright (c) 2024
	// Author : Bruno Capuano
	// Change Log :
	// – Sample console application to use llama2 LLM running locally in Ubuntu with Semantic Kernel
	//
	// The MIT License (MIT)
	//
	// Permission is hereby granted, free of charge, to any person obtaining a copy
	// of this software and associated documentation files (the "Software"), to deal
	// in the Software without restriction, including without limitation the rights
	// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
	// copies of the Software, and to permit persons to whom the Software is
	// furnished to do so, subject to the following conditions:
	//
	// The above copyright notice and this permission notice shall be included in
	// all copies or substantial portions of the Software.
	//
	// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
	// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
	// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
	// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
	// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
	// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
	// THE SOFTWARE.

	using Microsoft.Extensions.DependencyInjection;
	using Microsoft.SemanticKernel;
	using Microsoft.SemanticKernel.ChatCompletion;
	using sk_customllm;

	// llama2 in Ubuntu local in WSL
	var ollamaChat = new CustomChatCompletionService();
	ollamaChat.ModelUrl = "http://localhost:11434/v1/chat/completions";
	ollamaChat.ModelName = "llama2";

	// semantic kernel builder
	var builder = Kernel.CreateBuilder();
	builder.Services.AddKeyedSingleton<IChatCompletionService>("ollamaChat", ollamaChat);
	var kernel = builder.Build();

	// init chat
	var chat = kernel.GetRequiredService<IChatCompletionService>();
	var history = new ChatHistory();
	history.AddSystemMessage("You are a useful assistant that replies using a funny style and emojis. Your name is Goku.");
	history.AddUserMessage("hi, who are you?");

	// print response
	var result = await chat.GetChatMessageContentsAsync(history);
	Console.WriteLine(result[^1].Content);

view raw llama2semantickernellocalserver.cs hosted with ❤ by GitHub

In the next posts I’ll create a more complex system using local LLMs and Azure OpenAI services to host agents with Semantic Kernel.

Happy coding!

Greetings

El Bruno

#SemanticKernel – 📎Chat Service demo running Llama2 LLM locally in Ubuntu

Ollama

Run a local inference LLM server using Ollama

Llama 2

Source: Conversation with Microsoft Copilot:

📎 Semantic Kernel and Custom LLMs

2 comments

Leave a comment Cancel reply

Ollama

Run a local inference LLM server using Ollama

Llama 2

Source: Conversation with Microsoft Copilot:

📎 Semantic Kernel and Custom LLMs

Share this:

Related

2 comments

Leave a comment Cancel reply