⚠️ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.
Hi!
There’s a recurring debate every time we talk about AI agents:
Should I use Python or .NET?
After one too many conversations full of strong opinions and zero data, I decided to stop guessing and start measuring. And hey, this all started as a fun conversation with friends, and in the end… it was the perfect excuse to put GitHub Copilot to work on a Saturday morning.
Here's the 8-minute video, followed by more data:
Why this repo exists
This project started after a casual chat with friends.
The usual arguments came up:
- “Python is faster for AI.”
- “.NET scales better.”
- “It depends.”
All of those statements can be true — depending on the workload.
So instead of debating, Copilot built MAF-PerformanceComparison: a small, reproducible way to compare Microsoft Agent Framework (MAF) implementations in Python and .NET, using the same model, the same workload, and the same metrics.
What is being measured
Each test run executes the same agent workflow and captures:
- Average time per iteration
- Minimum and maximum iteration time (to see spikes and jitter)
- Memory usage
The tests are executed with a local Ollama setup using the ministral-3 model, and scaled across different iteration counts:
- 100
- 500
- 1,000
- 5,000
- 10,000
This makes it easy to observe how behavior changes as workloads grow.
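To make the three metrics concrete, here is a minimal Python sketch of how a per-iteration timing loop could compute them across the same iteration counts. This is not the repo's code. `run_agent_once` is a hypothetical stand-in for one agent workflow call (in the real tests that would be a request to the ministral-3 model through Ollama). Memory is approximated with the peak reported by Python's `tracemalloc`, which only tracks Python-level allocations, not total process memory.

```python
import time
import tracemalloc
from statistics import mean


def run_agent_once(i: int) -> int:
    # Hypothetical stand-in for one agent workflow call.
    # In the real benchmark this would send a prompt to the model via Ollama.
    return sum(range(50)) + i


def measure(iterations: int) -> dict:
    """Run the workload `iterations` times and collect timing and memory stats."""
    durations_ms = []
    tracemalloc.start()
    for i in range(iterations):
        start = time.perf_counter()
        run_agent_once(i)
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    # get_traced_memory() returns (current, peak) Python heap usage in bytes.
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "iterations": iterations,
        "avg_ms": mean(durations_ms),
        "min_ms": min(durations_ms),
        "max_ms": max(durations_ms),
        "peak_memory_mb": peak_bytes / (1024 * 1024),
    }


if __name__ == "__main__":
    # Same iteration counts as the post; only the count changes between runs.
    for n in (100, 500, 1_000, 5_000, 10_000):
        stats = measure(n)
        print(
            f"{n:>6} iters  avg={stats['avg_ms']:.4f} ms  "
            f"min={stats['min_ms']:.4f} ms  max={stats['max_ms']:.4f} ms  "
            f"peak={stats['peak_memory_mb']:.3f} MB"
        )
```

Keeping min and max next to the average is what exposes the spikes and jitter: a single average can look smooth while a few slow iterations hide inside it.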
What you’ll find in the repo
The repo is structured so you can quickly explore or reproduce the results:
tests_results/
- One folder per iteration size
- Raw metrics in JSON format
- A side-by-side comparison report
- A short analysis report with insights
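As an illustration of what a side-by-side comparison could do with the raw JSON metrics, here is a hedged Python sketch. The metric keys (`avg_ms`, `min_ms`, `max_ms`, `peak_memory_mb`) and the `compare.py` usage line are assumptions for this example, not the repo's actual schema or tooling; check the files under tests_results/ for the real format.

```python
import json
import sys
from pathlib import Path

# Assumed metric keys; adjust to match the actual JSON files in tests_results/.
METRICS = ("avg_ms", "min_ms", "max_ms", "peak_memory_mb")


def load_metrics(path: str) -> dict:
    """Read one raw-metrics JSON file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def compare(python_run: dict, dotnet_run: dict) -> list:
    """Return (metric, python value, .NET value, .NET/Python ratio) rows."""
    rows = []
    for key in METRICS:
        py_val = float(python_run[key])
        net_val = float(dotnet_run[key])
        ratio = net_val / py_val if py_val else float("nan")
        rows.append((key, py_val, net_val, ratio))
    return rows


def render(rows: list) -> str:
    """Format the rows as a plain-text side-by-side table."""
    lines = [f"{'metric':<16}{'python':>12}{'.net':>12}{'.net/py':>10}"]
    for key, py_val, net_val, ratio in rows:
        lines.append(f"{key:<16}{py_val:>12.2f}{net_val:>12.2f}{ratio:>10.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: compare.py <python_metrics.json> <dotnet_metrics.json>")
    else:
        python_run = load_metrics(sys.argv[1])
        dotnet_run = load_metrics(sys.argv[2])
        print(render(compare(python_run, dotnet_run)))
```

A ratio below 1.0 means the .NET run used less time or memory for that metric; above 1.0, the Python run did.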
You can start small (10 iterations for a quick demo) and scale up without changing the code.
👉 GitHub repository:
https://github.com/elbruno/MAF-PerformanceComparison/
Key takeaways
From these runs, a few patterns emerge:
- Python performs very well for short runs and prototyping, with low latency and smooth behavior.
- .NET tends to perform better for long-running workloads, especially when looking at average latency and memory usage at scale.
- There is no universal winner — context matters.
Your hardware, your model, and your concurrency level will influence the outcome more than language loyalty.
Measure first, then decide
This repo is not about proving one runtime “wins”.
It’s about giving you a simple way to measure performance on your own setup and make informed decisions. And, of course, it’s a starting point that can evolve into a more rigorous way to capture performance metrics on both platforms.
And hey, this triggered a personal note for myself: learn more about performance metrics in a scenario like this one. A fun challenge for 2026!
Happy coding!
Greetings
El Bruno
More posts on my blog, ElBruno.com.
More info at https://beacons.ai/elbruno