Keep the signal.

Drop the noise.

Open-source middleware that compresses RAG context before it hits the LLM, cutting token costs by ~50% with under a 3-point F1 drop.

V0.1.1 · LLMLINGUA-2 · MIT
RAW CONTEXT · 30/30 TOKENS · 0%
The retrieval augmented generation pipeline essentially works by fetching relevant document chunks from a large vector database and then basically feeding them directly into the LLM context window for processing
SIGNAL: 15 TOKENS PRESERVED · NOISE: 15 TOKENS REMOVED
01 · ~50% TOKEN REDUCTION
02 · <3 PT F1 SCORE DROP
03 · 85MS AVG LATENCY
04 · 0 CODE CHANGES W/ OPENAI SDK
Try It Live [API]
POWERED BY HUGGING FACE SPACES · API DOCS →
How It Works [3 STEPS]
01

RETRIEVE

Your vector DB returns raw document chunks. Verbose, overlapping, and expensive.

02

COMPRESS

Winnow runs token-level compression guided by your query. Relevant tokens survive. Filler is removed.

03

GENERATE

Your LLM receives a ~50% shorter prompt. Same answer. Half the cost.
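The three steps above can be sketched end to end. This is a toy illustration only: the retrieval and generation calls are stubs, and the compressor just drops common filler words until the target ratio is met, whereas Winnow itself uses LLMLingua-2 token-level scoring. All function names here are hypothetical.

```python
# Toy sketch of the pipeline: retrieve -> compress -> generate.
# NOT the real Winnow algorithm; the compressor below only strips
# filler words, while Winnow scores tokens with LLMLingua-2.

FILLER = {"essentially", "basically", "actually", "really", "just", "very"}

def retrieve(query: str) -> str:
    # Stand-in for a vector-DB lookup returning one raw chunk.
    return ("The retrieval augmented generation pipeline essentially works by "
            "fetching relevant document chunks and basically feeding them "
            "directly into the LLM context window.")

def compress(text: str, ratio: float = 0.5) -> str:
    # Keep non-filler tokens, then trim to the target token budget.
    tokens = text.split()
    kept = [t for t in tokens if t.lower() not in FILLER]
    target = max(1, int(len(tokens) * ratio))
    return " ".join(kept[:target]) if len(kept) > target else " ".join(kept)

def generate(prompt: str) -> str:
    # Stand-in for the LLM call; the point is the prompt got shorter.
    return f"answer grounded in {len(prompt.split())} context tokens"

chunk = retrieve("What is RAG?")
short = compress(chunk, ratio=0.5)
print(len(chunk.split()), "->", len(short.split()))  # 22 -> 11
print(generate(short))
```

The key design point mirrored here is that compression sits between retrieval and generation, so neither the vector DB nor the LLM call needs to change.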

Benchmark Results [SQUAD · LLMLINGUA-2]
420 · AVG TOKENS (IN)
210 · AVG TOKENS (OUT)
~50% · REDUCTION
~85ms · LATENCY
| PRESET           | TOKENS IN | TOKENS OUT | REDUCTION | F1 SCORE | F1 DROP |
|------------------|-----------|------------|-----------|----------|---------|
| AGGRESSIVE (0.3) | 420       | 147        | ~65%      | 73.4     | 5.0 PT  |
| BALANCED (0.5)   | 420       | 210        | ~50%      | 76.1     | 2.3 PT  |
| LIGHT (0.7)      | 420       | 294        | ~30%      | 77.6     | <1 PT   |
WINNOW V0.1.1 · LLMLINGUA-2 · FULL RESULTS →
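The reduction column follows directly from the token counts (reduction = 1 − tokens out / tokens in), so the table's arithmetic can be sanity-checked in a couple of lines:

```python
# Verify the benchmark table: reduction = 1 - tokens_out / tokens_in.
TOKENS_IN = 420
presets = {"aggressive (0.3)": 147, "balanced (0.5)": 210, "light (0.7)": 294}

for name, tokens_out in presets.items():
    reduction = 1 - tokens_out / TOKENS_IN
    print(f"{name}: {reduction:.0%} reduction")  # 65%, 50%, 30%
```

Note that the preset number is the target compression ratio (fraction of tokens kept), so a 0.3 preset keeps ~30% of tokens and yields ~65-70% reduction.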
Integration [DROP-IN]

Add Winnow
in minutes

Python SDK, LangChain integration, raw HTTP, or OpenAI-compatible proxy - just swap your base URL. Zero config required.

from winnow import Winnow

client = Winnow(base_url="http://localhost:8000")

# Sentence-only compression
result = client.compress(text=input_text, compression_ratio=0.5)

# Question-guided compression (RAG-aware)
guided = client.compress(
    text=input_text,
    compression_ratio=0.5,
    rag_mode=True,
    question="What is the warranty period?"
)

print(result["output"])
print(result["original_tokens"])    # 420
print(result["compressed_tokens"])  # 210

Self-Host
In Seconds

One command · No cloud · No API key

$ docker run -p 8000:8000 itsaryanchauhan/winnow
→ localhost:8000
→ /health
→ /compress
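Once the container is up, the `/compress` endpoint can be called directly over HTTP. The JSON field names below are an assumption, mirrored from the Python SDK parameters shown above (`text`, `compression_ratio`, `rag_mode`, `question`); check the API docs for the authoritative schema.

```python
import json

# Hypothetical request body for POST /compress, assuming the HTTP API
# mirrors the SDK's parameter names (an assumption, not confirmed here).
payload = {
    "text": "The retrieval augmented generation pipeline essentially works by ...",
    "compression_ratio": 0.5,
    "rag_mode": True,
    "question": "What is the warranty period?",
}

# Equivalent curl call against the self-hosted container:
#   curl -X POST http://localhost:8000/compress \
#        -H "Content-Type: application/json" \
#        -d @payload.json
print(json.dumps(payload, indent=2))
```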