From PDFs to Markdown in Seconds: FastAPI + MarkItDown + .NET (in Docker)

⚠️ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hi!

If you work with LLMs, you want structured text, not mystery meat. Here’s a tiny FastAPI service that wraps Microsoft’s MarkItDown to convert PDFs, Word documents, PowerPoint decks, and more into clean Markdown. We’ll run it in Docker and drive it with a C# console app.

Repo & Upstream

MarkItDownServer (this post’s code): https://github.com/elbruno/MarkItDownServer
Microsoft MarkItDown library: https://github.com/microsoft/markitdown (background & docs)

Prereqs

Docker Desktop
.NET 8/9 SDK (for the sample client)
Python is baked into the container build via the Dockerfile

Run the Server in Docker

git clone https://github.com/elbruno/MarkItDownServer
cd MarkItDownServer
docker build -t markitdownserver .
docker run -d --name markitdownserver -p 80:80 markitdownserver

These are the same steps described in the project README.
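
Because docker run -d returns before the server inside the container is ready, it can help to wait for the port before sending the first request. Here is a minimal Python sketch using only the standard library. The helper name wait_for_port is hypothetical (not part of the repo), and it only checks that something accepts TCP connections on the port, not that the conversion route works.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.25)
```

With the -p 80:80 mapping above, wait_for_port("localhost", 80) should return True once the container is listening.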

Try It with the .NET Client

cd src
dotnet run

The sample app posts a file to the server and prints Markdown back (as outlined in the README “Usage”).

Sample C# (HttpClient sketch)


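// Assumes implicit usings (the default in .NET 6+ console templates)
// for System, System.IO, and System.Linq.
using System.Net.Http;
using System.Net.Http.Headers;

// File to convert: first command-line argument, or sample.pdf by default.
var filePath = args.FirstOrDefault() ?? "sample.pdf";

// The container is published on port 80, so no explicit port is needed.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost") };

// Send the file as multipart/form-data in a part named "file".
using var content = new MultipartFormDataContent();
await using var fs = File.OpenRead(filePath);
content.Add(new StreamContent(fs)
{
    Headers = { ContentType = MediaTypeHeaderValue.Parse("application/octet-stream") }
}, "file", Path.GetFileName(filePath));

var res = await http.PostAsync("/convert", content);   // adjust if your route differs
res.EnsureSuccessStatusCode();                          // throws on a non-2xx status

// The response body is the converted Markdown.
var markdown = await res.Content.ReadAsStringAsync();
Console.WriteLine(markdown);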

⚠️ Route note: The README explains the server “receives binary data from a file, converts to Markdown, and returns the Markdown.” If you renamed the route in app.py, update the client path accordingly.
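
To make that receive, convert, return flow concrete, here is a rough Python sketch of the server-side core. It is a sketch under stated assumptions, not a copy of app.py: the function name convert_upload and the injected convert_path callable are hypothetical. The bytes go to a temporary file that keeps the original extension, and the converter is handed the path.

```python
import os
import tempfile
from typing import Callable


def convert_upload(data: bytes, filename: str, convert_path: Callable[[str], str]) -> str:
    """Write uploaded bytes to a temp file and return the converter's Markdown."""
    # Keep the original extension so the converter can detect the file type.
    suffix = os.path.splitext(filename)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # convert_path is injected; it takes a file path and returns Markdown text.
        return convert_path(tmp_path)
    finally:
        # Always clean up the temp file, even if the conversion raises.
        os.remove(tmp_path)
```

With the real library, convert_path could be something like lambda p: MarkItDown().convert(p).text_content, called from a FastAPI route that reads the bytes from an UploadFile. Injecting the converter keeps the file handling testable without MarkItDown installed.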

(Optional) cURL Smoke Test

curl -X POST http://localhost/convert \
  -F "file=@path/to/your.docx" \
  -o output.md

If your endpoint name differs, change /convert to match your FastAPI route.
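
If you would rather run the smoke test from Python, the same multipart request can be built with the standard library alone. This is a minimal sketch: build_multipart and post_file are hypothetical helper names, and the "file" field name and /convert route are the same assumptions the cURL command makes.

```python
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_file(url: str, path: str, field: str = "file") -> str:
    """POST a file to the conversion endpoint and return the response text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(field, os.path.basename(path), f.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as res:
        return res.read().decode("utf-8")
```

For example, post_file("http://localhost/convert", "path/to/your.docx") mirrors the cURL call above and returns the Markdown as a string.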

Why MarkItDown?

MarkItDown preserves document structure (headings, lists, tables) in its Markdown output, which makes the text LLM-friendly. That is great for retrieval pipelines and for stuffing prompts with fewer tokens. See the upstream project for supported formats and details.
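
One way preserved headings pay off in a retrieval pipeline is chunking: each heading starts a new chunk, and the heading text becomes the chunk title. Here is a small Python sketch of that idea. The function split_by_heading is hypothetical, it only recognizes #-style headings, and it does not skip # lines inside fenced code blocks.

```python
import re

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into chunks, one per heading, keeping the heading as the title."""
    chunks = []
    title, lines = "", []

    def flush() -> None:
        # Emit the current chunk unless it is completely empty.
        if title or any(line.strip() for line in lines):
            chunks.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # a new heading closes the previous chunk
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Text before the first heading ends up in a chunk with an empty title, so nothing from the converted document is dropped.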

Where This Fits in Azure

Batch conversions in Azure Container Apps or AKS, fronted by a queue.
Searchable content via Azure AI Search (index Markdown or vectors).
Pipelines with Microsoft.Extensions.AI in .NET apps.
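
The "fronted by a queue" idea from the first bullet can be sketched locally in Python before any Azure service is involved. In this sketch, an in-process queue.Queue stands in for a managed queue, and convert is a hypothetical callable that takes a path and returns Markdown (for example, a function that posts the file to the container). Failures are collected per file so one bad document does not stop the batch.

```python
import queue
import threading
from typing import Callable


def run_batch(
    paths: list[str], convert: Callable[[str], str], workers: int = 4
) -> tuple[dict[str, str], dict[str, str]]:
    """Convert paths with a pool of worker threads; return (results, errors)."""
    jobs: queue.Queue = queue.Queue()
    results: dict[str, str] = {}
    errors: dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            path = jobs.get()
            if path is None:  # sentinel: no more work for this worker
                return
            try:
                markdown = convert(path)
                with lock:
                    results[path] = markdown
            except Exception as exc:
                # Record the failure and keep the worker alive.
                with lock:
                    errors[path] = str(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for path in paths:
        jobs.put(path)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results, errors
```

In a hosted setup the workers would be separate container instances reading from the managed queue, but the shape stays the same: pull a job, convert, store the result or the error.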

Resources

MarkItDownServer repo (code & README): https://github.com/elbruno/MarkItDownServer
Microsoft MarkItDown: https://github.com/microsoft/markitdown
FastAPI tutorial in VS Code: https://code.visualstudio.com/docs/python/tutorial-fastapi

Happy coding!

Greetings

El Bruno
