GitHub Copilot and tokens: how to keep using AI without burning your budget in three prompts (some personal lessons learned!)

⚠️ This blog post was created with the help of AI tools. Yes, I used a bit of magic from language models to organize my thoughts and automate the boring parts, but the geeky fun and the 🤖 in C# are 100% mine.

Hi!

For a long time, many of us used GitHub Copilot as if it were unlimited magic: autocomplete, chat, agent mode, code review, increasingly powerful models, massive context, and long-running sessions that sometimes felt like a pair-programming marathon.

And it worked. Well, mostly.

Now, with usage-based billing and AI Credits, many developers are seeing something that used to be mostly invisible: every AI interaction has a cost. And that cost is not only about “asking a question.” It depends on the model, the context, input tokens, output tokens, cached tokens, tools, files, logs, MCP servers, and how long we let an agent keep working.

GitHub explains this in the Copilot billing documentation: interactions consume input, output, and cached tokens; each model has its own pricing; and the total is converted into AI Credits. The same documentation also explains an important detail: code completions and Next Edit Suggestions are not charged as AI Credits in paid plans.

Source: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing
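To make the "it depends on the context" part concrete, here is a back-of-the-envelope sketch. The rates are made-up placeholders per million tokens. They are not GitHub's real prices and not the actual AI Credits conversion, so check the pricing page above for those. What matters is the shape of the formula: uncached input, cached input, and output are each priced separately, so a huge context can cost far more than a short answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    """Hypothetical per-million-token rates in an arbitrary unit (NOT real Copilot prices)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def estimate_cost(rates: ModelRates, input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Rough cost of one interaction: each token type is priced at its own rate."""
    return (
        input_tokens * rates.input_per_m
        + cached_tokens * rates.cached_input_per_m
        + output_tokens * rates.output_per_m
    ) / 1_000_000


rates = ModelRates(input_per_m=2.0, cached_input_per_m=0.5, output_per_m=8.0)

# Same short answer (300 output tokens), very different amounts of context.
short_question = estimate_cost(rates, input_tokens=500, cached_tokens=0, output_tokens=300)
long_session = estimate_cost(rates, input_tokens=60_000, cached_tokens=40_000, output_tokens=300)
print(f"short question: {short_question:.4f}, long session: {long_session:.4f}")
```

With these invented rates, the request that drags a long session along costs roughly 40 times more, even though the answer has exactly the same length.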

The natural reaction is to panic.

The useful reaction is to optimize.

Just like we optimize compute, storage, bandwidth, or GitHub Actions minutes, now we also need to optimize how we use tokens.

And yes, this applies to those of us using Copilot every day for .NET, AI, Azure, scripts, demos, documentation, refactors, and those beautiful moments when we tell the agent “just fix this test” and come back 20 minutes later to find a doctoral thesis in progress.

The real problem is not the prompt. It is the context

When we talk about tokens, we often think only about the text we type.

But in AI-assisted development tools, the expensive part is often everything that travels around the prompt:

chat history
open files
files attached as context
workspace search results
diffs
terminal output
build errors
long logs
tool calls
MCP server responses
custom instructions
agent memory
content the model decides to inspect while completing the task

A one-line question can be cheap.

A one-line question inside a conversation with 80 messages, 12 files, 3 logs, 5 tools, and an MCP server connected to half the universe… not so much.

The first optimization is mental: more context does not always mean a better answer.

Sometimes more context only means more tokens, more noise, and more chances for the model to get distracted.
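To get a feel for which part of the context is heavy, a rough estimate is enough. This sketch assumes the common rule of thumb of about four characters per token for English text. Real tokenizers vary by model, and code, logs, and non-English text usually produce more tokens per character. The item names are hypothetical.

```python
import math


def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token (a heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / 4)


def biggest_context_items(items: dict[str, str], top: int = 3) -> list[tuple[str, int]]:
    """Return the `top` largest context items by estimated token count."""
    sizes = [(name, estimate_tokens(text)) for name, text in items.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical context attached to a one-line question.
context = {
    "question": "Why does this test fail?",
    "build.log": "x" * 40_000,
    "OrderService.cs": "y" * 8_000,
}
for name, tokens in biggest_context_items(context):
    print(f"{name}: ~{tokens} tokens")
```

The question itself is a handful of tokens. The pasted log is thousands.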

1. Use autocomplete and Next Edit Suggestions before opening chat

Not everything needs a conversation.

For small tasks, Copilot directly in the editor is often the most efficient option:

completing a line
finishing a simple function
generating boilerplate
suggesting the next obvious change
completing a repeated pattern
adjusting names
writing a simple condition
generating a property, DTO, or mapping

If you can solve it with Tab, do not open a chat.

This is not just convenience. It is strategy. According to GitHub documentation, code completions and Next Edit Suggestions are not billed as AI Credits in paid plans.

Source: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing

Simple rule:

Use autocomplete for micro-tasks.
Use inline edit for local changes.
Use chat for questions that require reasoning.
Use agent mode for well-scoped multi-file tasks.
Use cloud agents when you really want to delegate a workflow, not when you only need to change three lines.

The most expensive model in the world should not be helping you write `public string Name { get; set; }`.

That is what Tab is for. And coffee.

2. Choose the right model for the task

Not every model has the same cost or the same purpose.

The VS Code documentation recommends using lighter models for quick edits, boilerplate, and direct questions, and reserving reasoning models for complex refactors, architecture decisions, and multi-step debugging.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

A practical pattern:

| Task type | Recommended model |
|---|---|
| Simple question | Lightweight model or Auto |
| Boilerplate | Lightweight model |
| Code explanation | Lightweight or medium model |
| Simple tests | Lightweight or medium model |
| Complex debugging | Reasoning model |
| Architecture | Reasoning or frontier model |
| Large refactor | Powerful model, but with limited scope |
| Initial documentation | Lightweight or local model |

GitHub also documents Auto Model Selection, which can choose a model based on task complexity, availability, and policies. The documentation also notes that Auto can improve efficiency by reserving more expensive models for tasks that actually need them.

Source: https://docs.github.com/en/copilot/concepts/auto-model-selection

My recommendation for most developers:

use Auto as the default
manually switch to a more powerful model only when you have a clear reason
switch back to Auto or a cheaper model when the complex task is done

Do not drive a truck to buy bread.

And do not use the most expensive model to ask how to center a div. Although, to be fair, sometimes centering a div does deserve an architecture review.

3. Start new chats when you change tasks

This is one of the simplest and most ignored optimizations.

The VS Code documentation is clear: when a conversation grows, it accumulates context from previous messages, tool outputs, and file contents. If you switch to an unrelated task inside the same session, the model still processes irrelevant history.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

Bad pattern:

```
Chat 1:
- Debug tests
- Then architecture
- Then generate README
- Then review Dockerfile
- Then explain an Azure error
- Then ask for a tweet
```

Better pattern:

```
Chat 1: Debug tests
Chat 2: Architecture design
Chat 3: README
Chat 4: Dockerfile
Chat 5: Azure deployment issue
```

New task, new chat.

Yes, it sounds simple.

Yes, it works.

And yes, it also helps your human brain, which sometimes has a smaller context window than the model.
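A rough model shows why this works. Assume, as a simplification (caching and compaction change the real numbers), that every request resends the instructions plus the full history so far. Then a long mixed session pays for old turns again and again, while separate chats pay for each task once. `session_input_tokens` is a hypothetical helper for this comparison.

```python
def session_input_tokens(turn_tokens: list[int], instruction_tokens: int = 0) -> int:
    """Total input tokens for one chat session under a simplified model:
    request N resends the instructions plus everything added in turns 1..N."""
    total = 0
    history = 0
    for added in turn_tokens:  # tokens a turn adds (prompt, files, tool output, reply lumped together)
        history += added
        total += instruction_tokens + history
    return total


tasks = [1_000] * 6  # six unrelated tasks, ~1,000 tokens each

one_long_chat = session_input_tokens(tasks, instruction_tokens=500)
one_chat_per_task = sum(session_input_tokens([t], instruction_tokens=500) for t in tasks)
print(one_long_chat, one_chat_per_task)
```

Under these assumptions, the single mixed chat processes 24,000 input tokens versus 9,000 for six fresh chats, and the gap grows quadratically as the session gets longer.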

4. Use `/compact` and `/fork` when it makes sense

When a conversation has useful context but starts getting too large, you do not always need to throw it away.

You can compact it or fork it.

The VS Code documentation describes new sessions, forking, and compaction as ways to manage context and reduce unnecessary tokens.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

Good practices:

use `/compact` when the conversation has useful information but too much history
use `/fork` when you want to explore an alternative without polluting the main conversation
start a new chat if the new task is unrelated
summarize the current state before continuing a long task

Useful prompt:

```
Summarize the current state, decisions made, files changed, and next steps. Keep it short and actionable.
```

Then copy that summary into a new conversation and continue with clean context.

Less noise. Fewer tokens. Better focus.
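Conceptually, compaction swaps old turns for a short summary while keeping the latest messages intact. This sketch illustrates only that idea. It is not how VS Code implements `/compact`, and the `Message` type and `compact` helper are hypothetical. The summary string is the kind of text the prompt above asks the model to produce.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user", or "assistant"
    text: str


def compact(history: list[Message], summary: str, keep_last: int = 4) -> list[Message]:
    """Replace older turns with one summary message and keep the most recent turns verbatim."""
    if len(history) <= keep_last:
        return list(history)  # nothing worth compacting yet
    recent = history[-keep_last:] if keep_last > 0 else []  # history[-0:] would keep everything
    return [Message("system", f"Summary of the conversation so far:\n{summary}")] + recent


history = [Message("user" if i % 2 == 0 else "assistant", f"turn {i}") for i in range(20)]
compacted = compact(history, "Fixed OrderService null check; next: update tests.", keep_last=4)
print(len(history), "->", len(compacted))
```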

5. Do not ask it to analyze the whole repo if you only need three files

This is the classic one.

Expensive prompt:

```
Analyze this entire repository and tell me what is wrong.
```

Better prompt:

```
Analyze only these files:
- src/MyApp.Api/Program.cs
- src/MyApp.Core/Services/OrderService.cs
- tests/MyApp.Tests/OrderServiceTests.cs
Goal: find why this test is failing.
Do not edit files yet. First explain the likely cause.
```

The difference is huge.

The first prompt invites the model to explore, read, search, open files, infer architecture, and consume context.

The second prompt defines:

files
goal
limits
working mode
expected output

In AI coding, scope is part of the prompt.

Small scope, better result.

Infinite scope, surprise in the bill.
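If you write scoped prompts often, a tiny helper keeps the structure consistent. `scoped_prompt` is a hypothetical helper that rebuilds the "better prompt" above from its parts (files, goal, working mode) and refuses to build a prompt with no files at all.

```python
DEFAULT_MODE = "Do not edit files yet. First explain the likely cause."


def scoped_prompt(files: list[str], goal: str, mode: str = DEFAULT_MODE) -> str:
    """Build a prompt that states its files, goal, and working mode explicitly."""
    if not files:
        raise ValueError("a scoped prompt needs at least one file")
    lines = ["Analyze only these files:"]
    lines += [f"- {path}" for path in files]
    lines.append(f"Goal: {goal}")
    lines.append(mode)
    return "\n".join(lines)


print(scoped_prompt(
    ["src/MyApp.Api/Program.cs",
     "src/MyApp.Core/Services/OrderService.cs",
     "tests/MyApp.Tests/OrderServiceTests.cs"],
    goal="find why this test is failing.",
))
```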

6. Separate planning, implementation, and validation

One of the most common mistakes with agent mode is asking for everything at once:

```
Analyze the issue, design the fix, implement it, run tests, fix errors, update docs, and create a summary.
```

That sounds productive.

But it can also trigger loops, tool calls, unnecessary changes, and high token consumption.

A better approach is to use phases.

Phase 1: plan

```
Create a short implementation plan. Do not modify files yet.
Focus only on the failing test and the minimal code path required to fix it.
```

Phase 2: scoped implementation

```
Implement step 1 only. Modify only the files listed in the plan.
```

Phase 3: validation

```
Run the relevant tests only. If they fail, explain the failure before changing code again.
```

Phase 4: cleanup

```
Now clean up the implementation without changing behavior. Keep the diff small.
```

The VS Code documentation also recommends planning before implementation to reduce rework and back-and-forth.

Source: https://code.visualstudio.com/docs/copilot/guides/optimize-usage

Sometimes the best prompt is not “do everything.”

It is “think first, touch little, validate quickly.”
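If you reuse this flow, keeping the phase prompts as data makes it easy to send them one at a time and to remember which phases should not touch files. This is a hypothetical personal prompt library, not a Copilot feature; the prompts are the four phases above, and the read-only flag is only a reminder for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    prompt: str
    may_edit_files: bool  # a reminder for the human, not something Copilot enforces


PHASES = (
    Phase("plan",
          "Create a short implementation plan. Do not modify files yet.\n"
          "Focus only on the failing test and the minimal code path required to fix it.",
          may_edit_files=False),
    Phase("scoped implementation",
          "Implement step 1 only. Modify only the files listed in the plan.",
          may_edit_files=True),
    Phase("validation",
          "Run the relevant tests only. If they fail, explain the failure before changing code again.",
          may_edit_files=False),
    Phase("cleanup",
          "Now clean up the implementation without changing behavior. Keep the diff small.",
          may_edit_files=True),
)


def render_phases(phases: tuple[Phase, ...] = PHASES) -> list[str]:
    """Render each phase as a numbered prompt, marking the read-only phases."""
    rendered = []
    for number, phase in enumerate(phases, start=1):
        marker = "" if phase.may_edit_files else " (read-only)"
        rendered.append(f"Phase {number}: {phase.name}{marker}\n{phase.prompt}")
    return rendered


for prompt in render_phases():
    print(prompt, end="\n\n")
```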

7. Be careful with logs: do not paste a novel if you only need the error

Logs are one of the silent token killers.

Typical example:

```
Here is my build log

[paste 2,000 lines]
```

And the real error was in the last 20 lines.

Better:

```
Here are the last 40 lines of the failing build log. Focus on the first real error, not the cascading errors.
```

Or even:

```
This is the error:
CS0246: The type or namespace name 'X' could not be found.
Relevant files:
- Program.cs
- MyService.cs
What is the likely fix?
```
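Finding "the first real error" can be scripted before you paste anything. The sketch assumes MSBuild-style diagnostics (`File.cs(line,col): error CS0246: ...`, the format `dotnet build` prints). The regular expression, the context size, and the 40-line fallback are assumptions to adapt to your own toolchain.

```python
import re

# Matches MSBuild-style diagnostics such as
# "src/Program.cs(12,5): error CS0246: The type or namespace name 'X' could not be found"
ERROR_PATTERN = re.compile(r"\berror [A-Z]+\d+:")


def first_relevant_error(log: str, context: int = 2, tail: int = 40) -> str:
    """Return the first error line plus a few surrounding lines,
    or the last `tail` lines if no recognizable error is found."""
    lines = log.splitlines()
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - context)
            return "\n".join(lines[start:index + context + 1])
    return "\n".join(lines[-tail:])


build_log = "\n".join([
    "  Determining projects to restore...",
    "  All projects are up-to-date for restore.",
    "src/Program.cs(12,5): error CS0246: The type or namespace name 'X' could not be found",
    "src/MyService.cs(30,9): error CS0103: The name 'x' does not exist in the current context",
    "Build FAILED.",
])
print(first_relevant_error(build_log, context=0))
```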

Good practices:

paste only the first relevant error
avoid full logs if errors are repeated
remove timestamps if they do not add value
remove duplicated stack traces
summarize what you already tried
include exact commands, not the whole terminal history
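Some of that cleanup is easy to automate. A crude sketch, assuming ISO-style timestamps at the start of each line: it strips them, drops blank lines, and keeps only the first copy of any repeated line. That also collapses repeated stack frames, so glance at the result before pasting it.

```python
import re

# Leading timestamps such as "2025-01-02T03:04:05.678Z " or "2025-01-02 03:04:05 "
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?\s*")


def shrink_log(log: str) -> str:
    """Strip leading timestamps, drop blank lines, and keep only the first copy of repeated lines."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in log.splitlines():
        line = TIMESTAMP.sub("", raw).rstrip()
        if not line or line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return "\n".join(kept)


noisy = (
    "2025-01-02T03:04:05.678Z Connection refused\n"
    "2025-01-02T03:04:06Z Connection refused\n"
    "\n"
    "2025-01-02 03:04:07 Retry limit reached"
)
print(shrink_log(noisy))
```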

Copilot can help a lot with logs.

But it does not need to read your CI/CD diary.

8. Review your custom instructions

Custom instructions are fantastic.

They can also become a token backpack if nobody maintains them.

A good `.github/copilot-instructions.md` file is:

short
specific
current
based on real repository rules
clear about build/test commands
clear about important conventions

A bad one is:

too long
duplicated
contradictory
based on old architecture
full of rules nobody follows
full of generic instructions that apply to every repo on the planet

Example of useful instructions:

```markdown
# Copilot instructions
- Use C# 14 and .NET 10 conventions.
- Keep changes minimal and focused.
- Do not introduce new dependencies without explaining why.
- Run `dotnet build -c Release` after code changes.
- Run relevant tests only unless asked for the full suite.
- Prefer Aspire service defaults when adding services.
- Do not modify generated files.
```

You do not need to write a national constitution for Copilot to understand your repo.

You need short rules that reduce repeated decisions.
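Keeping the file short is easier if something checks it. Here is a minimal, hypothetical lint: it counts words against a budget you choose (400 is an arbitrary number) and flags rules that appear twice. It cannot detect contradictions or outdated architecture; that still needs a human.

```python
def lint_instructions(text: str, max_words: int = 400) -> list[str]:
    """Flag an instructions file that is over a word budget or repeats the same rule."""
    warnings: list[str] = []
    words = len(text.split())
    if words > max_words:
        warnings.append(f"{words} words; consider trimming below {max_words}")
    seen: dict[str, int] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        rule = line.strip().lstrip("-*").strip().lower()  # normalize "- Rule" and "* rule"
        if not rule or rule.startswith("#"):
            continue  # skip blank lines and headings
        if rule in seen:
            warnings.append(f"line {number} duplicates line {seen[rule]}: {line.strip()}")
        else:
            seen[rule] = number
    return warnings


sample = """# Copilot instructions
- Keep changes minimal and focused.
- Do not modify generated files.
- keep changes minimal and focused.
"""
for warning in lint_instructions(sample):
    print(warning)
```

In a real repository you would pass it `Path(".github/copilot-instructions.md").read_text()` (with `from pathlib import Path`).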

9. Review the MCP servers and tools you have enabled

This is one of the areas where many developers may be consuming context without realizing it.

MCP is powerful because it lets agents connect to tools, resources, prompts, and external systems. But every server and every available tool can also affect context, tool selection, and the way the agent works.

The VS Code documentation explains that MCP servers can expose tools, resources, prompts, and apps, and it describes how to install, configure, enable, disable, and manage MCP servers from VS Code.

Source: https://code.visualstudio.com/docs/copilot/customization/mcp-servers

Practical recommendations:

do not keep every MCP server enabled all the time
enable only what you need for the current workspace
disable experimental MCP servers when you are not using them
review duplicated or overly generic tools
review tool descriptions: if they are too long or confusing, they may hurt tool selection
avoid MCP servers that return huge responses by default
limit resources that add too much context
check whether a server is bringing more information than needed
review MCP logs when something behaves strangely

Example:

If you are working on a local .NET API, maybe you do not need all of these enabled at the same time:

browser automation
extended filesystem
cloud docs search
GitHub
Jira
Slack
database explorer
Kubernetes
Playwright
internal wiki

Every extra tool can be useful.

But it can also expand the agent’s decision space.

And when an agent has too many tools, sometimes the problem is not lack of capability. It is too many temptations.

My personal rule:

MCP servers should be workspace-specific, not personality traits.

Enable what you need. Turn off what you do not.
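One way to keep MCP servers workspace-specific is to compare each workspace's configuration with a short team allowlist. This sketch assumes the VS Code workspace file `.vscode/mcp.json` with its top-level `servers` object, written as plain JSON without comments (Python's `json` module rejects comments). Check the VS Code MCP documentation for the current format; the server names, commands, and allowlist below are hypothetical.

```python
import json


def unexpected_mcp_servers(config_text: str, allowed: set[str]) -> list[str]:
    """List configured MCP servers that are not on the allowlist for this workspace."""
    config = json.loads(config_text)
    servers = config.get("servers", {})
    return sorted(name for name in servers if name not in allowed)


# Hypothetical .vscode/mcp.json content for a local .NET API workspace.
mcp_json = """
{
  "servers": {
    "github": { "type": "http", "url": "https://example.invalid/mcp" },
    "playwright": { "type": "stdio", "command": "npx", "args": ["hypothetical-playwright-mcp"] },
    "kubernetes": { "type": "stdio", "command": "hypothetical-k8s-mcp" }
  }
}
"""
print(unexpected_mcp_servers(mcp_json, allowed={"github"}))
```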

10. Use traditional tools for traditional work

Not everything needs AI.

For many tasks, traditional tools are better, faster, and cheaper:

formatter for formatting
linter for style
compiler for type errors
tests for validation
static analyzers for known rules
dependency scanners for known vulnerabilities
scripts for repeatable tasks

Copilot is excellent for reasoning, explaining, proposing, connecting ideas, and accelerating implementation.

But if you use a frontier model to discover a missing `using` directive, something went sideways.

Good pattern:

```
Run the build. Give Copilot only the first relevant compiler error. Ask for the minimal fix.
```

Bad pattern:

```
Ask Copilot to inspect the entire repository and find why the build might fail.
```

First, let deterministic tools do their job.

Then use AI where it adds value.
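The good pattern can be wired into a small script: run the deterministic tool first, and only if it fails, turn the first compiler error into a minimal prompt. This is a sketch assuming the build prints MSBuild-style `error CS....:` lines. The prompt wording is illustrative, and the demo uses a stand-in command so it runs without the .NET SDK.

```python
import re
import subprocess
import sys
from typing import Optional

# A whole output line containing an MSBuild-style diagnostic, e.g. "Program.cs(3,7): error CS0246: ..."
ERROR_LINE = re.compile(r"^.*\berror [A-Z]+\d+:.*$", re.MULTILINE)


def minimal_fix_prompt(command: list[str]) -> Optional[str]:
    """Run a build command; return None if it succeeds, else a prompt built from the first error."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return None  # the build passed, so there is nothing to ask the AI
    output = result.stdout + result.stderr
    match = ERROR_LINE.search(output)
    if match:
        first_error = match.group(0).strip()
    else:
        lines = output.strip().splitlines()
        first_error = lines[-1] if lines else f"exit code {result.returncode}"
    return f"This is the first compiler error:\n{first_error}\nWhat is the minimal fix?"


# Stand-in for a failing build so the sketch runs anywhere; in a real repo you would
# pass something like ["dotnet", "build", "-c", "Release"] instead.
fake_build = [sys.executable, "-c",
              "import sys; print('Program.cs(3,7): error CS0246: The type or namespace name X could not be found'); sys.exit(1)"]
print(minimal_fix_prompt(fake_build))
```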

11. Local models: they do not replace Copilot, but they can complement it very well

We are going to see more PCs with interesting local AI capabilities: stronger GPUs, NPUs, compact workstations, and machines designed to run models locally. NVIDIA, for example, positions RTX Spark as compact PCs and laptops with NVIDIA AI and RTX graphics capabilities.

Source: https://www.nvidia.com/en-us/products/rtx-spark/

This raises an interesting question:

Does everything need to go to the cloud?

Not necessarily.

There are tasks where a local model may be enough:

summarizing logs
generating documentation drafts
explaining small code snippets
creating scaffolding
generating initial tests
transforming text
preparing prompts
analyzing snippets
creating session summaries
reviewing basic style

And there are tasks where cloud/frontier models still make a lot of sense:

complex multi-file debugging
deep reasoning
large migrations
high-risk refactors
architecture
long agentic workflows
direct integration with GitHub, PRs, issues, and CI

The idea is not “local vs cloud.”

The idea is: local for simple work, cloud for work that really needs the cloud.

This post is not about advanced BYOK, routing, or gateway architectures. That deserves its own post.

But as a baseline idea: if you can solve repetitive tasks with local models, you can reserve Copilot and powerful models for the tasks where they really shine.

12. Code review: use it where it adds the most value

Copilot code review can be very useful, but it also has a cost.

GitHub documentation explains that Copilot code review is billed in two ways: token consumption through AI Credits and GitHub Actions minutes for the agentic review infrastructure.

Source: https://docs.github.com/en/copilot/reference/copilot-billing/models-and-pricing

Recommendations:

do not enable high-effort automatic review for every PR without measuring it
use linters and analyzers for mechanical rules
reserve Copilot review for meaningful PRs
define when to use standard vs higher effort
avoid full AI review if you only changed documentation
review usage by repository/team if you are in an organization

Copilot review can be excellent.

But you do not need an AI reviewer philosophically inspecting a three-line README change.
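If your team wants a rule of thumb for when to request AI review, it can be written down as a tiny, hypothetical policy: skip documentation-only PRs and very small diffs, review the rest. The suffix list and the 20-line threshold are arbitrary assumptions, and a three-line security fix can still deserve a careful review, so treat the threshold as a default rather than a rule. How you actually trigger or skip Copilot review depends on your repository settings.

```python
from pathlib import PurePosixPath

# Hypothetical: file types treated as "documentation only".
DOC_SUFFIXES = {".md", ".txt", ".rst"}


def needs_ai_review(changed_files: list[str], changed_lines: int, min_lines: int = 20) -> bool:
    """Hypothetical policy: no AI review for docs-only PRs or diffs under `min_lines` lines."""
    code_files = [path for path in changed_files
                  if PurePosixPath(path).suffix.lower() not in DOC_SUFFIXES]
    if not code_files:
        return False  # documentation-only change
    return changed_lines >= min_lines


print(needs_ai_review(["README.md"], changed_lines=3))
print(needs_ai_review(["docs/setup.md", "src/MyApp.Api/Program.cs"], changed_lines=120))
```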

13. Define usage profiles for your team

In enterprise teams, the worst-case scenario is letting everyone use any model, in any mode, for any task, with no guidance.

You do not need to block everything.

You need to educate people and provide clear profiles.

Daily coding profile

Auto Model Selection
autocomplete and Next Edit Suggestions first
short chat sessions
limited context
lightweight models for simple questions

Debugging profile

minimal logs
first relevant error
specific files
plan before changes
relevant tests, not always the full suite

Refactor profile

plan first
scope by folder or feature
changes in phases
more powerful model only during the hard part
frequent validation

Documentation profile

lightweight or local model
specific files
limited output
no agent mode unless needed

Agent mode profile

clear issue
clear stop condition
defined scope
defined validation commands
do not let it run unsupervised if the goal is ambiguous

This is not about saying “do not use Copilot.”

It is about saying “use the right mode for the right job.”

14. Quick checklist for developers

Before sending the next prompt, ask yourself:

Can I solve this with autocomplete?
Do I need chat, or is inline edit enough?
Do I need agent mode, or just an explanation?
Am I using Auto, or am I using a model that is too expensive for the task?
Does this conversation already have too much history?
Should I start a new chat?
Can I pass only two or three files?
Can I paste only the relevant error?
Can I ask for a plan first?
Do I have MCP servers enabled that I do not need?
Are my custom instructions too long?
Can a linter/test/build answer this before AI?
Could this task go to a local model?

If the answer is “yes” to several of these, you can probably save tokens without losing productivity.

15. Checklist for teams

For organizations, I would start here:

review usage by user, team, and repository
understand which models are used the most
review how much usage comes from agent mode
review Copilot code review usage
define internal model selection guidelines
promote Auto as the default
teach small, scoped prompts
clean up custom instructions by repository
review recommended and allowed MCP servers
create usage profiles by task type
measure before and after changes
do not block AI out of fear
govern it like any other cloud resource

Because this looks a lot like cloud cost optimization.

First, everyone celebrated how easy it was to create resources.

Then the bill arrived.

Then we learned FinOps.

Now we need something similar for AI-assisted development.

Conclusion

The new GitHub Copilot usage model does not mean we need to stop using AI to code.

It means we can no longer treat every interaction as free, infinite, and invisible.

The good news is that many optimizations are simple:

use autocomplete before chat
choose the model intentionally
start new chats
reduce context
limit logs
separate planning and implementation
review MCP servers
clean up custom instructions
use traditional tools when they fit
reserve powerful models for powerful problems
consider local models for simple tasks

The goal is not to use less Copilot.

The goal is to use fewer tokens and get better results.

Happy coding!

Greetings

El Bruno