# Winnow

**Keep the signal. Drop the noise.**

Open-source middleware that compresses RAG context before it hits the LLM, cutting token costs by ~50% with under 3 points of F1 loss.

v0.1.1 · LLMLingua-2 · MIT
- **~50%** token reduction
- **<3 pt** F1 score drop
- **85 ms** average latency
- **0** code changes with the OpenAI SDK

## Try It Live

Interactive demo: paste your context, pick a compression mode and ratio (default 0.5), and compare the output. Powered by Hugging Face Spaces · API docs →
## How It Works (3 Steps)

1. **Retrieve.** Your vector DB returns raw document chunks: verbose, overlapping, and expensive.
2. **Compress.** Winnow runs token-level compression guided by your query. Relevant tokens survive; filler is removed.
3. **Generate.** Your LLM receives a ~50% shorter prompt. Same answer, half the cost.
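The three steps above can be sketched end to end. This is only an illustration of where compression sits in the flow, not Winnow's actual compressor: the real model is LLMLingua-2, while the toy filter below simply keeps the sentences that share vocabulary with the query, up to a rough token budget.

```python
# Minimal sketch of retrieve -> compress -> generate.
# NOTE: this toy query-overlap filter is a stand-in for LLMLingua-2;
# it only shows where compression sits between retrieval and the LLM.

def compress(text: str, question: str, ratio: float = 0.5) -> str:
    """Keep sentences overlapping the query, within ~ratio of the tokens."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    budget = int(len(text.split()) * ratio)  # rough token budget
    kept, used = [], 0
    # Prefer sentences that share vocabulary with the question.
    for s in sorted(sentences, key=lambda s: -len(q_words & set(s.lower().split()))):
        n = len(s.split())
        if used + n <= budget:
            kept.append(s)
            used += n
    return ". ".join(kept)

chunks = "The warranty period is two years. Shipping takes five days. Our office has a nice view."
short = compress(chunks, "What is the warranty period?")
# Only the warranty sentence survives; the filler is dropped.
```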
## Benchmark Results (SQuAD · LLMLingua-2)

- **420** avg tokens in
- **210** avg tokens out
- **~50%** reduction
- **~85 ms** latency

| Preset | Tokens in | Tokens out | Reduction | F1 score | F1 drop |
|---|---|---|---|---|---|
| Aggressive (0.3) | 420 | 147 | ~65% | 73.4 | 5.0 pt |
| Balanced (0.5) | 420 | 210 | ~50% | 76.1 | 2.3 pt |
| Light (0.7) | 420 | 294 | ~30% | 77.6 | <1 pt |
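The reduction column follows directly from the token counts: reduction = 1 − tokens out / tokens in. A quick check of the table's arithmetic:

```python
# Verify the "Reduction" column from the table's token counts.
rows = {"aggressive": (420, 147), "balanced": (420, 210), "light": (420, 294)}
reduction = {name: 1 - out_ / in_ for name, (in_, out_) in rows.items()}
# balanced: 1 - 210/420 = 0.50 -> "~50%"
```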
## Integration (Drop-In)

Add Winnow in minutes: Python SDK, LangChain integration, raw HTTP, or an OpenAI-compatible proxy where you just swap your base URL. Zero config required.
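For the raw-HTTP path, here is a minimal sketch of a `/compress` request using only the standard library. The JSON field names are an assumption: they mirror the Python SDK's `compress()` parameters shown on this page, and the endpoint path comes from the self-host section below.

```python
import json
from urllib import request

# Hypothetical raw-HTTP call to a local Winnow server's /compress
# endpoint (assumption: the JSON fields mirror the SDK parameters).
payload = {
    "text": "Long retrieved context goes here...",
    "compression_ratio": 0.5,
    "rag_mode": True,
    "question": "What is the warranty period?",
}
req = request.Request(
    "http://localhost:8000/compress",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = request.urlopen(req)  # uncomment with a running server
```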
```python
from winnow import Winnow

client = Winnow(base_url="http://localhost:8000")

# Sentence-only compression
result = client.compress(text=input_text, compression_ratio=0.5)

# Question-guided compression (RAG-aware)
guided = client.compress(
    text=input_text,
    compression_ratio=0.5,
    rag_mode=True,
    question="What is the warranty period?",
)

print(result["output"])
print(result["original_tokens"])    # 420
print(result["compressed_tokens"])  # 210
```

## Self-Host in Seconds

One command · No cloud · No API key
```shell
docker run -p 8000:8000 itsaryanchauhan/winnow
```

The server listens on `localhost:8000` and exposes `/health` and `/compress`.