How AI Code Assistants Leak Your Secrets
Researchers extracted 2,702 hard-coded credentials from GitHub Copilot using crafted prompts. The implications for AI agents go far beyond code completion.
AI code assistants have transformed software development. GitHub Copilot alone has over 1.8 million paying subscribers and generates billions of lines of code annually. But there's a problem hiding in the training data — and it's one that threatens not just individual developers, but the entire agentic ecosystem being built on top of these models.
The models have memorized your secrets.
Researchers from the Chinese University of Hong Kong extracted 2,702 unique hard-coded credentials from GitHub Copilot using just 900 carefully crafted prompts: an average of roughly three credentials per prompt, and one of the most efficient secret-extraction attacks documented against a production LLM.
The research: secrets memorized in training data
In a paper published on arXiv (2408.11006), researchers demonstrated that large language models trained on public code repositories memorize and can reproduce real credentials that developers accidentally committed. This isn't a theoretical risk — it's a demonstrated, repeatable attack.
The attack methodology was systematic. The team designed 900 prompts that mimicked realistic coding scenarios — configuration files, database connection strings, API client setup, CI/CD pipeline definitions. When Copilot completed these prompts, it frequently inserted real API keys, database passwords, and OAuth tokens from its training data.
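The paper's exact prompts aren't reproduced here, but the shape of the attack is easy to picture: a half-finished configuration file whose blanks sit exactly where a credential belongs. The snippet below is purely illustrative, and every name in it is hypothetical.

```python
# Illustrative only, not a prompt from the study: a settings module left
# dangling exactly where credentials would go. A model that memorized leaked
# keys during training may "helpfully" complete these lines with real values.
DATABASE_URL = "postgres://admin:"      # completion requested after the colon
AWS_ACCESS_KEY_ID = "AKIA"              # real AWS access key IDs start with AKIA
AWS_SECRET_ACCESS_KEY = ""              # empty string invites a memorized value
STRIPE_API_KEY = "sk_live_"             # live-key prefix nudges a memorized suffix
```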
The numbers are staggering:
- 2,702 unique hard-coded credentials extracted from Copilot's completions, spanning AWS keys, database connection strings, API tokens, and private keys
- 99.4% success rate in jailbreaking Copilot's safety measures designed to prevent credential leakage
- 54 real email addresses and 314 physical addresses associated with GitHub users were also extracted in a separate study, demonstrating that PII memorization extends well beyond code
The source of these credentials? GitHub itself. GitGuardian reported that 10 million new secrets were detected on GitHub in 2022 alone. These leaked credentials in public repositories became training data for the next generation of AI coding tools — which then learned to reproduce them on demand.
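Detection at that scale is, at its core, pattern matching over enormous volumes of text. Here is a minimal sketch of the idea, using a few illustrative regexes; production scanners recognize far more credential formats and add entropy heuristics, and even then miss plenty.

```python
"""Minimal secret-detection sketch. The patterns are a small illustrative
subset; real scanners cover many more credential formats and still cannot
catch every one, which is why leaked keys keep reaching training data."""
import re
import sys

SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat":        re.compile(r"\bghp_[0-9A-Za-z]{36}\b"),
    "hardcoded_password": re.compile(
        r"(?i)\b(?:password|passwd|pwd)\s*=\s*['\"][^'\"]{4,}['\"]"
    ),
}

def scan(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_text) pairs found in the input."""
    hits: list[tuple[str, str]] = []
    for name, pattern in SECRET_PATTERNS.items():
        hits.extend((name, m.group(0)) for m in pattern.finditer(text))
    return hits

if __name__ == "__main__":
    # Usage: python scan.py < some_source_file
    for name, match in scan(sys.stdin.read()):
        print(f"{name}: {match}")
```

The same matching, run as a scrub-and-redact pass over a training corpus, is the "imperfect solution" discussed later: it reduces what a model can memorize but can't guarantee zero leakage.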
Not just Copilot: a systemic problem
This isn't a Copilot-specific vulnerability. Any code-generation model trained on public repositories is at risk. Amazon CodeWhisperer was independently found to emit other people's API keys from its training data, as reported by The Register in September 2023. The problem is architectural: if the training data contains secrets, the model will learn to output them.
And the attack surface extends beyond code completion. In May 2025, KrebsOnSecurity reported that an xAI developer accidentally leaked a private API key on GitHub that granted access to over 60 private large language models — including unreleased Grok models built with proprietary data from SpaceX, Tesla, and X (formerly Twitter). The key remained valid and publicly accessible for approximately two months before being revoked. This key, like thousands of others, became part of the public corpus that AI models are trained on.
The lifecycle of a leaked key follows a predictable path:
1. Developer accidentally commits an API key to a public repository
2. Key stays in git history even after "deletion" (see the sketch below)
3. Public repo scraped for LLM training data
4. Model memorizes key alongside code pattern
5. Another developer triggers similar pattern
6. Model outputs the original key as a "completion"
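Step 2 is the part developers most often underestimate: removing a key in a later commit does not remove it from the repository's history. The sketch below, which assumes git is installed and uses a single illustrative AWS-style pattern, shows how easily anything ever committed can be pulled back out.

```python
"""Minimal sketch: recover "deleted" secrets from git history.
Assumes git is on PATH and the script runs inside a repository;
the single AWS-style regex is illustrative, not a real scanner."""
import re
import subprocess

AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # classic AWS access key ID shape

def keys_in_history() -> set[str]:
    # `git log --all -p` emits every patch ever committed on any ref,
    # including hunks that later commits reverted or "removed".
    log = subprocess.run(
        ["git", "log", "--all", "-p"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(AWS_KEY_RE.findall(log))

if __name__ == "__main__":
    for key in sorted(keys_in_history()):
        print("still recoverable from history:", key)
```

Anything a scan like this turns up has to be treated as compromised and rotated, because scrapers assembling training corpora see the same history.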
Why this matters for AI agents
The code assistant credential leak is alarming on its own. But the real danger lies in what comes next: AI agents that write, execute, and deploy code autonomously.
Consider the emerging workflow: a developer asks an AI agent to build a feature; the agent writes the code, configures the infrastructure, and deploys it. If the agent's code-generation model has memorized credentials from its training data, those credentials could end up:
- Embedded in generated code — real API keys hardcoded into configuration files, connection strings, or test fixtures
- Deployed to production — if the agent has deployment access, memorized credentials could end up in live systems
- Mixed with the agent's own credentials — if the agent holds real API keys in its context alongside memorized ones, the blast radius of any leak compounds
And here's the critical insight: agents that hold credentials in their context window are doubly vulnerable. They can leak the credentials they were given and credentials memorized from training data. The two attack vectors compound each other.
The jailbreaking problem
One might hope that safety measures would prevent credential leakage. GitHub has invested significantly in Copilot's content filters. But the same research demonstrated a 99.4% success rate in bypassing these filters.
The researchers used techniques familiar from the broader jailbreaking literature: role-playing prompts, hypothetical scenarios, incremental extraction, and context manipulation. The safety measures, while well-intentioned, cannot fundamentally prevent a model from outputting information it has memorized. The information is baked into the weights.
This has direct parallels to agent security. If an agent's LLM backbone can be jailbroken to reveal memorized secrets, it can certainly be manipulated to reveal secrets explicitly placed in its context window. Content filters are a speed bump, not a wall.
The only real defense: keep secrets out of context
The credential memorization problem teaches us a fundamental lesson: any secret that touches an LLM — whether through training data or runtime context — should be considered compromised. Content filters fail. Safety measures get bypassed. The only reliable defense is to ensure secrets never enter the model's context in the first place.
For training data, this means better secret detection and scrubbing before model training, an imperfect solution since detection can't catch every credential format. But for runtime agents, the solution is architectural: don't give the agent the credential at all. Instead, route its API calls through a credential proxy that holds secrets on the agent's behalf (a minimal sketch follows the list). That pattern delivers four properties:
- Credentials never enter the agent's context window — they're injected server-side by the proxy, so there's nothing for prompt injection or jailbreaking to extract
- No credentials in training data pipelines — if your agent never sees your API keys, those keys can never end up in future model training data
- Scoped permissions with user approval — agents request access to specific APIs, users approve, and the proxy enforces scope limits
- Instant revocation without credential rotation — revoke an agent's access without affecting any other system using the same underlying credential
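What this looks like in practice varies by implementation, but the core logic is small. The sketch below is a toy version under stated assumptions: all names (VAULT, SCOPES, AgentRequest, the alias strings) are hypothetical, and a real proxy would run in a separate trust boundary from the agent process.

```python
"""Toy sketch of server-side credential injection (all names hypothetical).
The agent refers to a credential only by alias; the proxy resolves the alias,
enforces the user-approved scope, and injects the real secret into the
outbound request. The agent's context never contains the key, so prompt
injection or jailbreaking has nothing to extract."""
from dataclasses import dataclass

# Server-side state: the agent process never sees this table.
VAULT = {"github-readonly": "ghp_the-real-token-lives-only-here"}
SCOPES = {"github-readonly": {"api.github.com"}}  # hosts each alias may reach
REVOKED: set[str] = set()

@dataclass
class AgentRequest:
    credential_alias: str  # e.g. "github-readonly", never the secret itself
    host: str
    path: str

def build_outbound_headers(req: AgentRequest) -> dict[str, str]:
    """Return the headers the proxy attaches before forwarding the call."""
    if req.credential_alias in REVOKED:
        raise PermissionError("alias revoked; underlying credential untouched")
    if req.host not in SCOPES.get(req.credential_alias, set()):
        raise PermissionError("host is outside the user-approved scope")
    # Injection happens here, after the agent has already produced the request.
    return {"Authorization": f"Bearer {VAULT[req.credential_alias]}"}

if __name__ == "__main__":
    req = AgentRequest("github-readonly", "api.github.com", "/user/repos")
    print(build_outbound_headers(req))  # proxy forwards with the real token
    REVOKED.add("github-readonly")      # instant revocation, no key rotation
```

Revoking the alias cuts off the agent immediately without rotating the underlying token, which is exactly the property the last bullet describes.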
A pattern that keeps repeating
The Copilot credential extraction research is not an isolated finding. It fits a pattern we've seen across the AI industry: models absorb sensitive information from their training data, safety measures prove insufficient, and the attack surface grows with every new capability.
As AI agents gain the ability to write code, manage infrastructure, and make API calls on our behalf, the stakes only increase. Every credential memorized in training data, every API key placed in an agent's context, every secret stored in an environment variable — these are all liabilities waiting to be exploited.
The lesson is simple: secrets and LLMs don't mix. Not in training data. Not in context windows. Not ever. The only safe credential is the one the model never sees.
Sources
- "Researchers Extract 2,702 Credentials from GitHub Copilot" — SecurityBoulevard, March 2025
- "xAI Dev Leaks API Key for Private SpaceX, Tesla LLMs" — KrebsOnSecurity, May 2025
- "An Empirical Study on the Memorization of Copilot" — arXiv:2408.11006, 2024
- "Amazon CodeWhisperer Generates Other People's API Keys" — The Register, September 2023
- "10 Million Secrets Detected on GitHub in 2022" — GitGuardian State of Secrets Sprawl Report, 2023