Full Local RAG scenario using #Phi3, #SemanticKernel and TextMemory. Bonus: Test in Codespaces

Hi!

Today’s scenario once again uses Phi-3: a groundbreaking Small Language Model (SLM) that is redefining the capabilities of AI for developers and businesses alike.

In this blog post, we will explore the importance of leveraging Phi-3 for full-scale scenarios and how you can test these scenarios for free using the Ollama C# Playground.

The demo scenario below is designed to answer a specific question, “What is Bruno’s favourite super hero?”, using two different approaches.

  • Ask the question directly to the Phi-3 model. The model declines to answer: Phi-3 knows nothing about Bruno.
  • Ask the question to the Phi-3 model again, this time with a semantic memory object loaded with fan facts. Now the response is based on the semantic memory content.

[Demo animation: the app running, answering the question first without and then with semantic memory]

The Importance of a Full Scenario Using Only Phi-3 and Local Resources

Phi-3 represents a significant leap in the realm of Small Language Models, offering a unique blend of performance and efficiency. Unlike its larger counterparts, Phi-3 is designed to deliver high-quality results while maintaining a lightweight footprint, making it ideal for a wide range of applications. One of the key advantages of using Phi-3 is its ability to handle full scenarios independently. This means that developers can implement comprehensive solutions without relying on multiple models, thereby simplifying the development process and reducing integration complexities.

Explaining the Code

Let’s jump to the code. The file below is a C# console application that demonstrates the use of a local model hosted in Ollama and semantic memory for search.

Here’s a step-by-step breakdown of the program:

  1. The program starts by defining the question and announcing the two approaches it will use to answer it. The first approach is to ask the question directly to the Phi-3 model, and the second approach is to add facts to a semantic memory and ask the question again.
  2. The program creates a chat completion service using the Kernel.CreateBuilder() method. It adds chat completion using a local model and local text embedding generation to the builder, then builds the kernel.
  3. The program then asks the question directly to the Phi-3 model and prints the response.
  4. The program gets the embeddings generator service and creates a new semantic text memory with a volatile memory store and the embedding generator.
  5. The program adds facts to the memory collection. These facts are about Bruno and Gisela’s favourite super heroes and the last super hero movies they watched.
  6. The program creates a new text memory plugin with the semantic text memory and imports the plugin into the kernel.
  7. The program sets up the prompt execution settings and the kernel arguments, which include the question and the memory collection.
  8. Finally, the program asks the question again, this time using the semantic memory, and prints the response.

The program uses several external libraries, including:

  • Microsoft.Extensions.Configuration and Microsoft.Extensions.DependencyInjection for dependency injection and configuration.
  • Microsoft.KernelMemory, Microsoft.SemanticKernel, Microsoft.SemanticKernel.ChatCompletion, Microsoft.SemanticKernel.Connectors.OpenAI, Microsoft.SemanticKernel.Embeddings, Microsoft.SemanticKernel.Memory, and Microsoft.SemanticKernel.Plugins.Memory for the Semantic Kernel and memory functionalities. The AddLocalTextEmbeddingGeneration() extension used in the code comes from a local embeddings package (in the playground sample, SmartComponents.LocalEmbeddings.SemanticKernel).

This program is a great example of how AI can be used to answer questions using both direct model querying and semantic memory.

// Copyright (c) 2024
// Author : Bruno Capuano
// Change Log :
// – Sample console application to use a local model hosted in ollama and semantic memory for search
//
// The MIT License (MIT)
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
#pragma warning disable SKEXP0001
#pragma warning disable SKEXP0003
#pragma warning disable SKEXP0010
#pragma warning disable SKEXP0011
#pragma warning disable SKEXP0050
#pragma warning disable SKEXP0052
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.KernelMemory;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.Memory;
using Microsoft.SemanticKernel.Plugins.Memory;
var question = "What is Bruno's favourite super hero?";
Console.WriteLine($"This program will answer the following question: {question}");
Console.WriteLine("1st approach will be to ask the question directly to the Phi-3 model.");
Console.WriteLine("2nd approach will be to add facts to a semantic memory and ask the question again");
Console.WriteLine("");
// Create a chat completion service
var builder = Kernel.CreateBuilder();
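// Ollama exposes an OpenAI-compatible endpoint on localhost:11434,
// so the standard OpenAI connector works; the api key is just a placeholder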
builder.AddOpenAIChatCompletion(
    modelId: "phi3",
    endpoint: new Uri("http://localhost:11434"),
    apiKey: "apikey");
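// embeddings are generated locally, no cloud service needed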
builder.AddLocalTextEmbeddingGeneration();
Kernel kernel = builder.Build();
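// ask the question directly to the Phi-3 model, without any memory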
Console.WriteLine($"Phi-3 response (no memory).");
var response = kernel.InvokePromptStreamingAsync(question);
await foreach (var result in response)
{
    Console.Write(result);
}
// separator
Console.WriteLine("");
Console.WriteLine("==============");
Console.WriteLine("");
// get the embeddings generator service and build a volatile, in-memory semantic text memory
var embeddingGenerator = kernel.Services.GetRequiredService<ITextEmbeddingGenerationService>();
var memory = new SemanticTextMemory(new VolatileMemoryStore(), embeddingGenerator);
// add facts to the collection
const string MemoryCollectionName = "fanFacts";
await memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "Gisela's favourite super hero is Batman");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "The last super hero movie watched by Gisela was Guardians of the Galaxy Vol 3");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "Bruno's favourite super hero is Invincible");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "The last super hero movie watched by Bruno was Aquaman II");
await memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "Bruno don't like the super hero movie: Eternals");
TextMemoryPlugin memoryPlugin = new(memory);
// Import the text memory plugin into the Kernel.
kernel.ImportPluginFromObject(memoryPlugin);
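// allow the model to automatically invoke kernel functions, like the plugin's Recall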
OpenAIPromptExecutionSettings settings = new()
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions,
};
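// {{Recall}} calls the TextMemoryPlugin's Recall function, which searches
// the memory collection passed in the kernel arguments below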
var prompt = @"
Question: {{$input}}
Answer the question using the memory content: {{Recall}}";
var arguments = new KernelArguments(settings)
{
    { "input", question },
    { "collection", MemoryCollectionName }
};
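// ask the question again, this time grounded in the semantic memory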
Console.WriteLine($"Phi-3 response (using semantic memory).");
response = kernel.InvokePromptStreamingAsync(prompt, arguments);
await foreach (var result in response)
{
    Console.Write(result);
}
Console.WriteLine();

Test This Scenario for Free Using Ollama C# Playground

To help you get started with Phi-3 and experience its capabilities firsthand, we are thrilled to introduce the Ollama C# Playground. This open-source project, available on GitHub, provides a user-friendly environment where you can experiment with Phi-3 at no cost. The Ollama C# Playground is designed to make it easy for developers to test and refine their AI scenarios, offering a range of tools and features that facilitate rapid prototyping and deployment. And, as the post title promises, the repository is set up to run in GitHub Codespaces, so you can test the full scenario directly in your browser.

Conclusion

By embracing Phi-3 and local embeddings, and by utilizing the Ollama C# Playground, you can unlock new potential in your AI projects, streamline development processes, and deliver innovative solutions that drive tangible results.

And because the app is built on Semantic Kernel, it is easy to later switch to Azure OpenAI Service to scale at enterprise level!
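As a minimal sketch of that swap (the deployment name, endpoint, and key below are placeholders, not values from this sample), the change happens on the kernel builder:

// local Phi-3 via Ollama, as in the code above
builder.AddOpenAIChatCompletion(
    modelId: "phi3",
    endpoint: new Uri("http://localhost:11434"),
    apiKey: "apikey");

// Azure OpenAI equivalent
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "<your-deployment-name>",
    endpoint: "https://<your-resource>.openai.azure.com/",
    apiKey: "<your-api-key>");

The rest of the program, including the semantic memory and the prompt, stays the same.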

Happy coding!

Greetings

El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno

