API Reference
Base URL: http://localhost:8000 · Self-hosted, no API key required.
4 ENDPOINTS
POST
/v1/compress
Compress a single input string, reducing its token count while preserving meaning.
REQUEST BODY
| FIELD | TYPE | REQ | DESCRIPTION |
|---|---|---|---|
| input | string | YES | The text to compress. |
| question | string | NO | Optional question to guide RAG-mode compression. |
| compression_ratio | float | NO | Target compression ratio 0.1–0.9. Default: 0.5 |
| protected_strings | string[] | NO | Strings that must not be removed. Default: [] |
| rag_mode | boolean | NO | Enable RAG-optimized compression. Default: false |
| diff | boolean | NO | Return a diff showing removed tokens. Default: false |
| price_per_million_tokens | float | NO | Token price for savings estimate. Default: 0 |
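A minimal client sketch for this endpoint, assuming the server is reachable at the base URL given above. The helper names `build_compress_payload` and `compress` are illustrative and not part of the API; only the endpoint path and field names come from this reference. The range check is a client-side convenience based on the documented 0.1–0.9 range; how the server treats out-of-range values is not specified here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference


def build_compress_payload(text, compression_ratio=0.5, protected_strings=None,
                           question=None, rag_mode=False, diff=False,
                           price_per_million_tokens=0.0):
    """Build the JSON body for POST /v1/compress (hypothetical helper)."""
    # Client-side check only; server behaviour for other values is an unknown.
    if not 0.1 <= compression_ratio <= 0.9:
        raise ValueError("compression_ratio must be between 0.1 and 0.9")
    payload = {
        "input": text,
        "compression_ratio": compression_ratio,
        "protected_strings": protected_strings or [],
        "rag_mode": rag_mode,
        "diff": diff,
        "price_per_million_tokens": price_per_million_tokens,
    }
    # "question" is optional, so it is only sent when provided.
    if question is not None:
        payload["question"] = question
    return payload


def compress(text, **options):
    """Send the request and return the decoded JSON response as a dict."""
    body = json.dumps(build_compress_payload(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/v1/compress",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a running server):
#   result = compress("Some long prompt text ...", protected_strings=["ACME-42"])
#   print(result["output"], result["original_tokens"], result["compressed_tokens"])
```

Only the standard library is used, so the sketch should run without extra dependencies.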
RESPONSE
{
  "output": "...",
  "original_tokens": 420,
  "compressed_tokens": 210,
  "ratio": 2.0,
  "diff": null,
  "estimated_savings_usd": null
}
POST
/v1/compress/batch
Compress multiple inputs in a single request.
REQUEST BODY
| FIELD | TYPE | REQ | DESCRIPTION |
|---|---|---|---|
| inputs | string[] | YES | Array of text strings to compress. |
| question | string | NO | Optional question to guide RAG-mode compression for all inputs. |
| compression_ratio | float | NO | Target compression ratio 0.1–0.9. Default: 0.5 |
| protected_strings | string[] | NO | Strings that must not be removed. Default: [] |
| rag_mode | boolean | NO | Enable RAG-optimized compression. Default: false |
| diff | boolean | NO | Return a diff showing removed tokens. Default: false |
| price_per_million_tokens | float | NO | Token price for savings estimate. Default: 0 |
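A short sketch of a batch call, again assuming the server runs at the documented base URL. `build_batch_payload`, `compress_batch`, and `tokens_saved` are hypothetical helper names; the `inputs` field and the `results`, `original_tokens`, and `compressed_tokens` response fields are the ones documented for this endpoint.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference


def build_batch_payload(texts, **options):
    """Build the JSON body for POST /v1/compress/batch (hypothetical helper).

    `options` may carry any optional field from the table above,
    e.g. compression_ratio=0.4 or protected_strings=["ACME-42"].
    """
    return {"inputs": list(texts), **options}


def compress_batch(texts, **options):
    """Send the batch request and return the decoded JSON response."""
    body = json.dumps(build_batch_payload(texts, **options)).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/v1/compress/batch",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tokens_saved(batch_response):
    """Sum the token reduction across every entry in "results"."""
    return sum(
        item["original_tokens"] - item["compressed_tokens"]
        for item in batch_response["results"]
    )


# Usage (requires a running server):
#   response = compress_batch(["first document ...", "second document ..."])
#   print(response["count"], tokens_saved(response))
```

Applied to the sample response for this endpoint, `tokens_saved` would add (420 - 210) and (380 - 190).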
RESPONSE
{
  "results": [
    { "output": "...", "original_tokens": 420, "compressed_tokens": 210, "ratio": 2.0, "diff": null, "estimated_savings_usd": null },
    { "output": "...", "original_tokens": 380, "compressed_tokens": 190, "ratio": 2.0, "diff": null, "estimated_savings_usd": null }
  ],
  "count": 2
}
POST
/v1/chat/completions
OpenAI-compatible proxy. Automatically compresses user messages before forwarding the request to OpenAI. Requires an OpenAI API key in the Authorization header.
REQUEST BODY
| FIELD | TYPE | REQ | DESCRIPTION |
|---|---|---|---|
| model | string | YES | The OpenAI model to forward the request to. |
| messages | object[] | YES | Standard OpenAI messages array (role + content). |
| question | string | NO | Optional question to guide RAG-mode compression for user messages. |
| compression_ratio | float | NO | Compression ratio for user messages. Default: 0.5 |
| protected_strings | string[] | NO | Strings to preserve during compression. Default: [] |
| rag_mode | boolean | NO | Enable RAG-optimized compression. Default: false |
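A hedged sketch of calling the proxy with only the standard library. It assumes the key is sent in the usual OpenAI form, `Authorization: Bearer <key>`, which this reference does not spell out, and that the key lives in an `OPENAI_API_KEY` environment variable. `build_chat_request` and `chat` are illustrative names, and the model string in the usage comment is a placeholder.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8000"  # base URL from this reference


def build_chat_request(messages, model, api_key, **compression_options):
    """Build (but do not send) a request for POST /v1/chat/completions.

    `compression_options` may carry the optional fields from the table above,
    e.g. compression_ratio=0.3 or protected_strings=["ACME-42"].
    """
    payload = {"model": model, "messages": messages, **compression_options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is passed as a standard bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(messages, model, api_key=None, **compression_options):
    """Send the request and return the assistant's reply text."""
    # Assumption: the key is stored in the OPENAI_API_KEY environment variable.
    api_key = api_key or os.environ["OPENAI_API_KEY"]
    request = build_chat_request(messages, model, api_key, **compression_options)
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]


# Usage (requires a running server and a valid key):
#   reply = chat([{"role": "user", "content": "Summarise this report ..."}],
#                model="<an OpenAI model name>", compression_ratio=0.3)
```

Because the endpoint is described as OpenAI-compatible, an existing OpenAI client pointed at this base URL may also work, though that is not confirmed here.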
RESPONSE
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [
    {
      "message": { "role": "assistant", "content": "..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 210,
    "completion_tokens": 42,
    "total_tokens": 252
  }
}
GET
/health
Health check. Returns server status.
RESPONSE
{
  "status": "ok"
}
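A small readiness probe built on this endpoint, sketched under the assumption that a healthy server answers with exactly the body shown above. `parse_health` and `is_healthy` are hypothetical helper names, and the timeout value is arbitrary.

```python
import json
import urllib.request


def parse_health(body):
    """Return True when the response body is a JSON object with status "ok"."""
    data = json.loads(body)
    return isinstance(data, dict) and data.get("status") == "ok"


def is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Call GET /health and report whether the server looks ready."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return parse_health(response.read())
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors raised by urllib;
        # ValueError covers a body that is not valid JSON.
        return False


# Usage:
#   if not is_healthy():
#       raise SystemExit("compression server is not reachable")
```

A check like this can run before sending compression requests, so a stopped server is reported early and clearly.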