Implementing JetBrains’ Observation Masking: 80% Context Reduction for AI Agents

Applying cutting-edge research to solve context overflow in LLM-powered coding assistants


The Problem: Context Window Bloat

If you’re building or using AI coding agents, you’ve probably hit this wall: your agent’s context window fills up fast, and eventually you get cryptic errors like:

{'error': 'Unable to trim conversation context!'}

The issue? AI agents “take notes” on everything they do: file contents, test logs, and command outputs all get appended to the conversation history, turn after turn. Eventually that accumulated history exceeds the model’s context window.

Sound familiar?


The Research: JetBrains’ December 2025 Study

On December 1, 2025, JetBrains Research published a blog post presenting their paper:

“Cutting Through the Noise: Smarter Context Management for LLM-Powered Agents”

Read the full blog post, or go straight to the paper (arXiv:2508.21433).

Their experiments compared two context management approaches:

Approach               Cost Reduction   Solve Rate Impact
Observation Masking    52%              +2.6%
LLM Summarization      ~50%             baseline
Raw (no management)    0%               baseline

The surprise: Simple observation masking outperformed the more sophisticated LLM summarization approach—while being cheaper to implement (no extra API calls).


What is Observation Masking?

In agent conversations, each turn typically has three parts:

  • Reasoning — The agent’s thinking and decisions
  • Action — What tool the agent called
  • Observation — The tool’s output (file contents, test results, etc.)

The key insight from JetBrains: 90-95% of tokens are observations, not reasoning.
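
Concretely, in an OpenAI-style chat transcript a single turn might look like this (an illustrative sketch; the field names and sizes are assumptions about your agent’s format, not taken from the paper):

turn = [
    # Reasoning + Action: the assistant's thinking plus its tool call (small)
    {
        "role": "assistant",
        "content": "I'll read the file to understand the bug...",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "read_file",
                         "arguments": '{"path": "auth.py"}'},
        }],
    },
    # Observation: the tool's output (huge - this is the 90-95%)
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": "...500 lines of auth.py...",
    },
]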

Before Masking

Turn 1 (OLD):
  Reasoning: "I'll read the file to understand the bug..."     ✓ Useful
  Action: [Tool: read_file] auth.py                           ✓ Useful  
  Observation: [500 lines of code...]                         ← 50KB! 😰

Turn 178-180 (RECENT):
  [Everything preserved]

After Masking

Turn 1 (OLD):
  Reasoning: "I'll read the file to understand the bug..."     ✓ KEEP
  Action: [Tool: read_file] auth.py                           ✓ KEEP
  Observation: [Result omitted - auth.py, 50KB]               ← MASKED 🎭

Turn 178-180 (RECENT):
  [Full content preserved]                                    ✓ KEEP ALL

Why it works:

  • Agent still sees what it decided and why (reasoning preserved)
  • Agent still sees what tools were called (actions preserved)
  • Only the verbose outputs from older turns are masked
  • Recent turns stay complete for immediate context

Our Implementation

We applied this research to our AI coding assistant to handle conversations with 180+ messages.

The Core Function

def mask_observations(
    messages: list,
    keep_recent: int = 10,           # Full detail for last N turns
    mask_tool_output: bool = True,   # Mask observation content
    mask_threshold_chars: int = 500  # Only mask outputs > this size
) -> list:
    """
    Apply observation masking per JetBrains 2025 research.

    Keeps agent reasoning and actions intact while replacing
    verbose tool outputs with placeholders for older turns.
    """
    # Sketch body: assumes OpenAI-style messages where tool results
    # carry role "tool"; adapt the check to your agent's format.
    cutoff = len(messages) - keep_recent
    masked = []
    for i, msg in enumerate(messages):
        content = msg.get("content") or ""
        if (mask_tool_output and i < cutoff
                and msg.get("role") == "tool"
                and len(content) > mask_threshold_chars):
            msg = {**msg, "content": f"[Result omitted - {len(content):,} chars]"}
        masked.append(msg)
    return masked

Adaptive Thresholds

One size doesn’t fit all. We added adaptive parameters based on conversation length:

# For very large conversations (100+ turns), be more aggressive
turn_count = len(messages)
if turn_count > 100:
    keep_recent = 15               # Fewer recent turns kept full
    mask_threshold_chars = 300     # More aggressive masking
else:
    keep_recent = 20
    mask_threshold_chars = 500

This matches JetBrains’ finding that hyperparameter tuning matters—different agent scaffolds need different window sizes.
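
Wiring the adaptive parameters into the masking call is then a single step (reusing the mask_observations sketch above):

masked = mask_observations(
    messages,
    keep_recent=keep_recent,
    mask_threshold_chars=mask_threshold_chars,
)
# masked keeps every turn's reasoning and actions, but old verbose
# tool outputs are now compact placeholders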


Results

Scenario             Before                After
180 messages         ❌ CONTEXT OVERFLOW    ✅ SUCCESS
Context reduction    0%                    ~80%
Quality impact       baseline              None observed

The 80% reduction comes from:

  • Tool outputs being 90-95% of tokens
  • Masking ~85% of turns (keeping the most recent 15-20 in full)

Back-of-the-envelope: removing ~90-95% of the tokens from ~85% of the turns cuts roughly 0.92 × 0.85 ≈ 78% of the context, in line with the ~80% we measured.
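
You can sanity-check the number on your own transcripts with a quick character count (a rough sketch using the mask_observations function above; characters stand in for tokens, and history is your agent’s message list):

def total_chars(msgs: list) -> int:
    # Characters as a cheap proxy for tokens
    return sum(len(m.get("content") or "") for m in msgs)

before = total_chars(history)
after = total_chars(mask_observations(history, keep_recent=15))
print(f"{1 - after / before:.0%} context reduction")  # e.g. "80% context reduction"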

Key Takeaways

  1. Simple beats complex — Observation masking outperformed LLM summarization despite being simpler
  2. Preserve reasoning, mask data — The agent’s decisions are valuable; verbose tool outputs are not
  3. Adaptive parameters matter — Tune the masking window for your specific agent
  4. No extra API calls — Unlike summarization, masking doesn’t require expensive LLM calls (which can add 7%+ to costs)
  5. Check your agent’s behavior — Some agents include retry attempts in history; adjust window size accordingly

Try It Yourself

The approach is straightforward to implement:

  1. Identify observation content in your agent’s message format (see the predicate sketch after this list)
  2. Keep recent N turns complete (10-20 depending on your agent)
  3. Replace older observations with placeholders like [Result omitted - filename, X bytes]
  4. Preserve reasoning and action content from all turns
  5. Tune parameters for your specific use case
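
For step 1, a hypothetical predicate covering two common message schemas (the field names are assumptions; verify them against your agent’s actual format):

def is_observation(msg: dict) -> bool:
    """Heuristic check: is this message a tool output?"""
    # OpenAI-style: tool results arrive with role "tool"
    if msg.get("role") == "tool":
        return True
    # Anthropic-style: tool results are content blocks of type "tool_result"
    content = msg.get("content")
    if isinstance(content, list):
        return any(isinstance(block, dict) and block.get("type") == "tool_result"
                   for block in content)
    return False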

References

  • “Cutting Through the Noise: Smarter Context Management for LLM-Powered Agents”, JetBrains Research (blog post and paper, arXiv:2508.21433)

If you liked this post, you can share it with your followers and/or follow me on Twitter!