API Reference

Base URL: http://localhost:8000 · Self-hosted, no API key required.

4 ENDPOINTS
POST /v1/compress

Compress a single input string, reducing token count while preserving meaning.

REQUEST BODY
FIELD                     TYPE      REQ  DESCRIPTION
input                     string    YES  The text to compress.
question                  string    NO   Optional question to guide RAG-mode compression.
compression_ratio         float     NO   Target compression ratio 0.1–0.9. Default: 0.5
protected_strings         string[]  NO   Strings that must not be removed. Default: []
rag_mode                  boolean   NO   Enable RAG-optimized compression. Default: false
diff                      boolean   NO   Return a diff showing removed tokens. Default: false
price_per_million_tokens  float     NO   Token price for savings estimate. Default: 0
RESPONSE
{
  "output": "...",
  "original_tokens": 420,
  "compressed_tokens": 210,
  "ratio": 2.0,
  "diff": null,
  "estimated_savings_usd": null
}
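
A minimal client sketch for this endpoint, using only the Python standard library. The helper names `build_compress_payload` and `compress` are hypothetical, and the client-side range check on `compression_ratio` is an assumption based on the documented 0.1–0.9 range; the server's own validation behavior is not described here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference


def build_compress_payload(text, compression_ratio=0.5, protected_strings=None,
                           rag_mode=False, question=None, diff=False,
                           price_per_million_tokens=0.0):
    """Assemble the JSON body for POST /v1/compress (hypothetical helper)."""
    # Assumption: reject values outside the documented 0.1-0.9 range early.
    if not 0.1 <= compression_ratio <= 0.9:
        raise ValueError("compression_ratio must be within 0.1-0.9")
    payload = {
        "input": text,
        "compression_ratio": compression_ratio,
        "protected_strings": list(protected_strings or []),
        "rag_mode": rag_mode,
        "diff": diff,
        "price_per_million_tokens": price_per_million_tokens,
    }
    # "question" is optional, so it is only sent when provided.
    if question is not None:
        payload["question"] = question
    return payload


def compress(text, **options):
    """Send the request and return the decoded JSON response as a dict."""
    body = json.dumps(build_compress_payload(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/v1/compress",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running at the base URL, `compress("some long text", protected_strings=["ACME-42"])` should return a dict with the fields shown in the response above.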
POST /v1/compress/batch

Compress multiple inputs in a single request.

REQUEST BODY
FIELD                     TYPE      REQ  DESCRIPTION
inputs                    string[]  YES  Array of text strings to compress.
question                  string    NO   Optional question to guide RAG-mode compression for all inputs.
compression_ratio         float     NO   Target compression ratio 0.1–0.9. Default: 0.5
protected_strings         string[]  NO   Strings that must not be removed. Default: []
rag_mode                  boolean   NO   Enable RAG-optimized compression. Default: false
diff                      boolean   NO   Return a diff showing removed tokens. Default: false
price_per_million_tokens  float     NO   Token price for savings estimate. Default: 0
RESPONSE
{
  "results": [
    { "output": "...", "original_tokens": 420, "compressed_tokens": 210, "ratio": 2.0, "diff": null, "estimated_savings_usd": null },
    { "output": "...", "original_tokens": 380, "compressed_tokens": 190, "ratio": 2.0, "diff": null, "estimated_savings_usd": null }
  ],
  "count": 2
}
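
A sketch of how a batch response might be aggregated on the client side. The helper names are hypothetical. Two things are inferred from the sample response rather than stated in this reference: that `count` equals the number of entries in `results`, and that `ratio` is original tokens divided by compressed tokens (420 / 210 = 2.0).

```python
def build_batch_payload(texts, compression_ratio=0.5):
    """Assemble a minimal JSON body for POST /v1/compress/batch."""
    return {"inputs": list(texts), "compression_ratio": compression_ratio}


def summarize_batch(response):
    """Total the token counts across all results in a batch response."""
    results = response["results"]
    # Assumption: "count" mirrors the length of "results".
    if len(results) != response["count"]:
        raise ValueError("count does not match the number of results")
    original = sum(r["original_tokens"] for r in results)
    compressed = sum(r["compressed_tokens"] for r in results)
    return {
        "original_tokens": original,
        "compressed_tokens": compressed,
        # Assumption: same definition as the per-result "ratio" field.
        "ratio": original / compressed if compressed else None,
    }


# The sample response from this section, trimmed to the fields used here.
sample = {
    "results": [
        {"output": "...", "original_tokens": 420, "compressed_tokens": 210},
        {"output": "...", "original_tokens": 380, "compressed_tokens": 190},
    ],
    "count": 2,
}
summary = summarize_batch(sample)
print(summary)  # {'original_tokens': 800, 'compressed_tokens': 400, 'ratio': 2.0}
```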
POST /v1/chat/completions

OpenAI-compatible proxy. Automatically compresses user messages before forwarding to OpenAI. Requires an OpenAI API key via the Authorization header.

REQUEST BODY
FIELD              TYPE      REQ  DESCRIPTION
model              string    YES  The OpenAI model to forward the request to.
messages           object[]  YES  Standard OpenAI messages array (role + content).
question           string    NO   Optional question to guide RAG-mode compression for user messages.
compression_ratio  float     NO   Compression ratio for user messages. Default: 0.5
protected_strings  string[]  NO   Strings to preserve during compression. Default: []
rag_mode           boolean   NO   Enable RAG-optimized compression. Default: false
RESPONSE
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 210,
    "completion_tokens": 42,
    "total_tokens": 252
  }
}
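
A sketch of preparing a request for the proxy with the Python standard library. The helper names `build_chat_request` and `send` are hypothetical. This reference only says the key goes in the Authorization header; the `Bearer` scheme below is an assumption carried over from how the OpenAI API itself is called. The model name used in the usage note is only an example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference


def build_chat_request(api_key, model, messages, compression_ratio=0.5,
                       protected_strings=None, rag_mode=False, question=None):
    """Prepare (but do not send) a POST /v1/chat/completions request."""
    payload = {
        "model": model,
        "messages": messages,
        "compression_ratio": compression_ratio,
        "protected_strings": list(protected_strings or []),
        "rag_mode": rag_mode,
    }
    # "question" is optional, so it is only sent when provided.
    if question is not None:
        payload["question"] = question
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is passed as an OpenAI-style bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_chat_request(key, "gpt-4o-mini", [{"role": "user", "content": "..."}]))` should return a dict shaped like the response above, with the assistant reply at `["choices"][0]["message"]["content"]`.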
GET /health

Health check. Returns server status.

RESPONSE
{
  "status": "ok"
}
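
A small readiness-check sketch built on this endpoint. The helper names `is_ok` and `check_health` are hypothetical. Treating any network error or unexpected body as "not healthy" is a design choice of this example, not documented server behavior.

```python
import json
import urllib.error
import urllib.request


def is_ok(body):
    """Return True when a /health response body reports status "ok"."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def check_health(base_url="http://localhost:8000", timeout=5):
    """Query GET /health and report whether the server says it is up."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return is_ok(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A caller could poll `check_health()` before sending compression requests, for instance in a container startup script.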