Sorry, you have been rate-limited.

If you've been coding with GitHub Copilot in VS Code, you know what I'm talking about.

"Please wait a moment before trying again."

"Server Error: Sorry, you have exceeded your Copilot token usage."

... This comes out of nowhere and then boom. Your whole session just stops. It's almost as if it's a sign from someone watching over you that it's time to go to bed because you've been coding for 2 days straight.

Here's the good news though. With my fix, I went from being rate limited constantly to running 2 Claude Opus 4.5 terminals together for hours on end doing scoped and heavily planned work. I'm talking creating and running tests overnight while I enjoy my beauty sleep. If the rate limit was a sign, then this fix was a gift.

Before You Start: The One Thing Everyone Forgets

Your project needs to be in a Git repository AND pushed to GitHub. The instructions file we're about to create won't work if your code is just sitting locally on your machine. Copilot reads these instructions from your GitHub repo.

So if you haven't already:

Initialize git in your project (git init)
Create a repo on GitHub
Push your code up

No GitHub repo = no custom instructions = no fix. Don't skip this part and then wonder why nothing works. I'm speaking from experience here.

Setting Up Instructions in VS Code

Before I show you my fix, you need to know how to set up custom instructions for Copilot in VS Code. This is the foundation of the whole solution.

Step 1: Create the Instructions Folder

Open your project folder in VS Code
In the root of your project, create a new folder called .github
Inside .github, create another folder called instructions

Your folder structure should look like this:

your-project/
├── .github/
│   └── instructions/
│       └── (your instructions files go here)
└── (your other files)

Step 2: Create Your Instructions File

Inside the instructions folder, create a new file. You can name it whatever you want, but make it descriptive. I called mine rate-limit-prevention.instructions.md.
The file name MUST end with .instructions.md for Copilot to recognize it.

Step 3: Add the Front Matter

At the very top of your instructions file, you need to add something called "front matter" - this tells Copilot when to apply these instructions:

---
applyTo: '*'
---

The applyTo: '*' means "apply these instructions to everything." You could also set it to a glob pattern for specific file types, like applyTo: '**/*.js' for just JavaScript files.
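If you'd rather script Steps 1 to 3 than click through them, here's a minimal Python sketch. The helper name create_instructions_file is my own invention for illustration, not something Copilot or VS Code provides, and it assumes you pass in the root folder of your project.

```python
from pathlib import Path

# Front matter from Step 3: apply the instructions to everything.
FRONT_MATTER = "---\napplyTo: '*'\n---\n"


def create_instructions_file(project_root, name="rate-limit-prevention"):
    """Create .github/instructions/<name>.instructions.md with front matter.

    Hypothetical helper for illustration only.
    """
    folder = Path(project_root) / ".github" / "instructions"
    folder.mkdir(parents=True, exist_ok=True)   # Step 1: create the folders
    path = folder / f"{name}.instructions.md"   # Step 2: required suffix
    if not path.exists():                       # never overwrite an existing file
        path.write_text(FRONT_MATTER, encoding="utf-8")  # Step 3: front matter
    return path
```

Calling create_instructions_file(".") from your project root gives you an empty instructions file with the front matter already in place. You still paste in the rules yourself.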

Step 4: Push to GitHub

This is crucial. After creating your instructions file:

Commit your changes (git add . then git commit -m "Add Copilot instructions")
Push to GitHub (git push)

Copilot needs to see these instructions in your actual GitHub repository. Local-only files won't cut it.

Step 5: Verify It's Working

Open Copilot Chat in VS Code (Ctrl+Shift+I or Cmd+Shift+I on Mac)
Start a new conversation
Copilot should now follow your custom instructions automatically
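If Copilot doesn't seem to follow the instructions, a quick local check can rule out the naming and layout mistakes from the steps above. This is a rough sketch: check_instructions_setup and front_matter are hypothetical helpers I made up for this post. They only look at your local folder, so they can't tell you whether you actually pushed or whether Copilot loaded the file.

```python
from pathlib import Path


def front_matter(text):
    """Return the lines between the opening and closing '---', or None."""
    lines = [line.strip() for line in text.splitlines()]
    if not lines or lines[0] != "---" or "---" not in lines[1:]:
        return None
    end = lines.index("---", 1)  # position of the closing '---'
    return lines[1:end]


def check_instructions_setup(project_root):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(project_root)
    problems = []
    if not (root / ".git").exists():
        problems.append("no .git folder: run git init and push to GitHub")
    folder = root / ".github" / "instructions"
    files = sorted(folder.glob("*.instructions.md")) if folder.is_dir() else []
    if not files:
        problems.append("no *.instructions.md file in .github/instructions")
    for path in files:
        fm = front_matter(path.read_text(encoding="utf-8"))
        if fm is None:
            problems.append(f"{path.name}: missing front matter block")
        elif not any(line.startswith("applyTo:") for line in fm):
            problems.append(f"{path.name}: front matter has no applyTo")
    return problems
```

Running check_instructions_setup(".") from the project root and getting back an empty list means the folder, file name, and front matter match what this post describes.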

My Rate Limit Prevention Instructions

Now here's the actual instructions file I use. Create your file at .github/instructions/rate-limit-prevention.instructions.md and paste this in:

---
applyTo: '*'
---

# Rate Limit Prevention (CRITICAL)

**You cannot recover from rate limits - the chat session will stop completely. Prevention is essential. Do NOT ask for confirmation - work autonomously but cautiously.**

## Rules

- **Be concise**: Keep responses short and to the point. Avoid verbose explanations.
- **Don't repeat back**: Don't echo the user's request or repeat code they've already shared.
- **Plan first**: Before making ANY tool calls, create a complete mental plan. Identify the minimum number of calls needed.
- **Absolute minimum tool calls**: Only make tool calls that are strictly necessary.
- **Batch aggressively**: Combine multiple operations into single requests wherever possible.
- **Reuse information**: Never re-fetch data you already have from earlier in the conversation.
- **No exploratory calls**: No speculative searches or "just checking" requests.
- **Checkpoint silently**: After significant work, briefly note what's done so progress isn't lost if the session breaks.
- **If uncertain**: Make a reasonable decision and proceed - do not ask me to confirm.
- **Max 2 tool calls per response**: Never make more than 2 tool calls in a single response. Complete the response, then continue in the next one.

## Output Length Limits

**Responses that are too long will be cut off and fail. Stay within limits.**

- **Max ~300 lines per response**: If output would exceed this, split across multiple responses.
- **One file at a time**: When generating or editing multiple files, output one file per response, then continue.
- **Truncate examples**: Show only essential code snippets, not full files unless requested.
- **Use summaries**: For large changes, summarize what was done instead of showing everything.
- **Continue automatically**: If you need to split output, end with `... continuing` and proceed in the next response without asking.

## Visibility

When you take any rate-limit prevention action, output a short status line so I can monitor:

- `📦 BATCH: [what was combined]` - when combining multiple operations
- `♻️ REUSE: [what data]` - when reusing data instead of re-fetching
- `⏭️ SKIP: [what was skipped]` - when skipping an unnecessary call
- `💾 CHECKPOINT: [progress summary]` - when saving progress state
- `✂️ SPLIT: [what's being split]` - when breaking output into multiple responses

Why This Actually Works

Here's the thing about rate limits: they happen because you're using too many "tokens" too quickly. Every time Copilot reads your message, thinks about it, searches through files, and writes a response, it uses tokens. Use too many too fast, and boom, you're locked out.

What does this even mean?

Think of tokens like tickets at an arcade. You only get a certain number of tickets per hour. Every time Copilot does something, like reading a file, writing code, or even just explaining something, it costs tickets. If Copilot writes you a super long essay with tons of code examples, that's like spending 50 tickets at once. Do that a few times and you've used up all your tickets for the hour. Game over, you have to wait.

So what my instructions do is tell Copilot: "Hey, be smart about spending tickets."
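To make the arcade analogy concrete, here's a toy model in Python. Every number in it is made up for illustration (the 100-ticket budget and the 5-ticket short answer are my assumptions, and only the 50-ticket essay comes from the analogy above). Copilot's real limits and accounting will differ, so treat this as a way to see the shape of the problem, not as how Copilot actually counts.

```python
class TicketBudget:
    """Toy per-window budget: once a response no longer fits, you're 'rate limited'."""

    def __init__(self, tickets_per_window):
        self.remaining = tickets_per_window

    def spend(self, cost):
        """Pay for one response; return False if the budget can't cover it."""
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True


def responses_until_limited(budget, cost_per_response):
    """Count how many responses of a given cost fit before the budget runs out."""
    if cost_per_response <= 0:
        raise ValueError("cost_per_response must be positive")
    count = 0
    while budget.spend(cost_per_response):
        count += 1
    return count


HOURLY_TICKETS = 100  # hypothetical budget for one hour
LONG_ESSAY = 50       # the "50 tickets at once" response from the analogy
SHORT_ANSWER = 5      # hypothetical cost of a concise response

print(responses_until_limited(TicketBudget(HOURLY_TICKETS), LONG_ESSAY))    # 2
print(responses_until_limited(TicketBudget(HOURLY_TICKETS), SHORT_ANSWER))  # 20
```

Same budget, ten times as many responses. That's the whole idea behind the rules: make each response cheaper so the budget lasts longer.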

Breaking Down Each Rule

"Be concise" - Instead of Copilot writing a 200-line explanation, it gives you the short version. Fewer tokens used.

"Don't repeat back" - Normally Copilot might say "So you want me to create a function that does X, Y, and Z..." and then repeat your whole question back. That's wasted tokens. This rule stops that.

"Plan first" - Instead of Copilot making 10 separate file searches to find what it needs, it thinks first and figures out it only needs 2 searches. Way fewer tokens.

"Max 2 tool calls per response" - Tool calls are expensive (token-wise). This keeps each response cheap.

"Max ~300 lines per response" - Long outputs eat tokens fast. By splitting things up, you spread the cost over time and avoid hitting the limit all at once.

The visibility emojis - These let you actually see when Copilot is saving tokens. It's pretty satisfying to see 📦 BATCH show up and know you just avoided a rate limit.
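If you want numbers instead of just a good feeling, you can paste a chat's output into a script and tally the status lines. This is a small sketch under my own assumptions: count_status_lines is a hypothetical helper, and it assumes Copilot printed the markers in the format from the instructions file (for example "📦 BATCH: ...").

```python
from collections import Counter

# Marker words from the Visibility section of the instructions file.
MARKERS = ("BATCH", "REUSE", "SKIP", "CHECKPOINT", "SPLIT")


def count_status_lines(transcript):
    """Tally status lines such as '📦 BATCH: ...' in pasted chat output."""
    counts = Counter()
    for line in transcript.splitlines():
        for marker in MARKERS:
            if f"{marker}:" in line:
                counts[marker] += 1
                break  # count each line once
    return counts


sample = "\n".join([
    "📦 BATCH: read three files in one call",
    "Here is the updated function.",
    "♻️ REUSE: package.json contents",
    "📦 BATCH: two edits in one request",
])
print(count_status_lines(sample)["BATCH"])  # 2
```

It matches on the marker word plus a colon, so ordinary prose that happens to contain "BATCH:" would be counted too. For a quick look at how often the rules kick in, that's good enough.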

The Results

Since I started using these instructions, I've gone from hitting rate limits multiple times a day to almost never seeing them. The key is that Copilot is still just as helpful. It's just way more efficient about how it helps.

Give it a try. Drop that file into your project, push it to GitHub, start a new Copilot chat, and watch the difference. Your coding sessions will thank you.

Got questions or improvements? Let me know what works for you.

How I Fixed My Sleeping Schedule and the Copilot Rate Limit Error in VS Code