Why STL — Fewer Tokens, Same Result

STL is a flat, line-oriented structured-output format for LLM agents. Equivalent calls cost roughly 30% fewer tokens than JSON, type safety carries over, and the parser is one line of Python. As a side effect, a 4B local model can finally generate it reliably after three example shots.

Where the savings happen

Every LLM-agent system pays tokens at the same step — when the model emits a structured call:

Today: LLM → JSON → Parser → Action
The model emits a JSON blob. A parser validates it. The program reads parameters and acts. Every token the model writes in that middle step is billed to you. That is where STL trims the bill.

The model and the parser are interchangeable parts of the chain: you can swap GPT for Claude for Llama, or Python's JSON parser for Rust's. The middle protocol is just as swappable, but it is less often treated that way. STL is what that middle step looks like when you optimize it for token cost.

Same chain, lighter middle

Today: LLM → JSON → Parser → Action
With STL: LLM → STL → Parser → Action
Same chain. Same parser concept. Same downstream program. Schema validation, type safety, streaming, retries — all carry over. The single thing that changes is the number of tokens the model has to emit.
This is a one-line change in your stack. Your agent framework — tool registry, retry logic, observability — keeps working unchanged. Only the parse step at the boundary swaps from json.loads() to stl_parser.parse(). Token savings show up on the very next call.
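A minimal sketch of that swap, using the stl_parser API shown in step 3 of "Try it in 5 minutes" below (model_output and dispatch stand in for your own stack):

# Before: JSON at the boundary
import json

call = json.loads(model_output)
dispatch(call["tool"], **call["parameters"])

# After: STL at the boundary; only the parse line changes
from stl_parser import parse

stmt = parse(model_output).statements[0]
dispatch(stmt.source.name.split(":")[1], **stmt.modifiers)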

Three reasons STL is shorter

01

Flatter structure

JSON nests objects; each level adds an opening brace, a closing brace, and a quoted key with a colon. STL is one line per call. Modifiers all live inside a single ::mod(...) block — zero nesting.

02

Lighter key syntax

STL writes key=value where JSON writes "key": "value". That's five characters saved per string-valued key: the two quotes around the key, the space after the colon, and the two quotes around the value (= simply replaces :). Multiply by 10 keys per call and the savings stack.

03

No envelope tax

A JSON tool call wraps the body: {"function": "X", "parameters": {...}}. STL puts the function name in the anchor: [Tool:X]. The envelope tokens disappear.

Full benchmark → Calibrated token measurements across tokenizers, model families, and tool-call shapes live on a dedicated page: /articles/stl-token-efficiency

A real tool call, side-by-side

Same agent task: "Search the web for the query 'STL semantic tension language' and return the top 5 results."

JSON tool call (current)

{
  "tool": "web_search",
  "parameters": {
    "query": "STL semantic tension language",
    "top_k": 5,
    "freshness": "month",
    "safe": true
  }
}
Tokens (cl100k): ~52
Brace pairs: 2
Escape risk: medium

STL tool call

[Tool:WebSearch] [Query:STL_SemanticTension]
  ::mod(
    action="search",
    top_k=5,
    freshness="month",
    safe=true
  )
Tokens (cl100k): ~36
Brace pairs: 0
Escape risk: low

Roughly 30% fewer tokens, zero brace nesting, one statement per call (the STL form above is wrapped for readability). Both forms carry the same semantic content; the parser on the program side does the same job in both worlds.
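The counts above are approximate. A minimal way to reproduce them, assuming OpenAI's tiktoken package (exact numbers shift with whitespace and formatting):

import tiktoken

json_call = '{"tool": "web_search", "parameters": {"query": "STL semantic tension language", "top_k": 5, "freshness": "month", "safe": true}}'
stl_call = '[Tool:WebSearch] [Query:STL_SemanticTension] ::mod(action="search", top_k=5, freshness="month", safe=true)'

enc = tiktoken.get_encoding("cl100k_base")   # the cl100k tokenizer cited above
for name, text in (("JSON", json_call), ("STL", stl_call)):
    print(name, len(enc.encode(text)))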

Bonus: small models can finally do this

An indirect consequence of the three reasons above: because STL is flat and uses syntax LLMs already see in their training data, smaller models can generate it reliably. JSON's structural strictness ate the smaller end of the model market; STL hands it back.

Why 4B struggles with JSON

  • Mismatched braces: {"args": {"key": "val"}, "x": 2 with the final close brace missing
  • Trailing comma: {"a": 1, "b": 2,} is invalid JSON, but the model emits it anyway
  • Unescaped quotes in strings: {"q": "she said "stop""} breaks at the inner double quote
  • Mid-stream truncation at the token limit: half-closed JSON, no recovery
  • Nested object confusion: three or more levels deep, the model loses depth tracking

Every failure mode above is structural. JSON demands precise paired punctuation. 100B+ models learned to obey because the training cost is amortized; 4B models lack the parameter budget.
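Each failure is a hard parse error on the program side, with no partial recovery:

import json

bad_outputs = [
    '{"a": 1, "b": 2,}',          # trailing comma
    '{"q": "she said "stop""}',   # unescaped inner quotes
    '{"args": {"key": "val"',     # truncated mid-stream
]
for bad in bad_outputs:
    try:
        json.loads(bad)
    except json.JSONDecodeError as err:
        print(f"{bad!r} -> {err.msg}")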

Why 4B succeeds with STL

STL is flat. No nesting. One line per edge. Modifiers are key=value pairs separated by commas in a single block. The training distribution already contains millions of key=value patterns from config files, command-line invocations, and Python kwargs.
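The same surface shape, side by side (illustrative examples):

.env config        TIMEOUT=30
CLI invocation     grep --max-count=5 pattern file.txt
Python kwargs      search(query="rag", top_k=5)
STL modifier       ::mod(action="search", top_k=5)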

Three example shots are typically enough to teach a 4B model the pattern:

SHOT 1: [Tool:WebSearch] [Query:OpenAI_GPT5] ::mod(action="search", top_k=3)
SHOT 2: [Tool:ReadFile] [Path:Config_YAML] ::mod(action="read", encoding="utf-8")
SHOT 3: [Tool:SendEmail] [Recipient:alice] ::mod(action="send", subject="Hello", body="Hi Alice")

USER: search the web for "vector databases for RAG"

[Tool:WebSearch] [Query:VectorDB_RAG] ::mod(action="search", top_k=5)

A 4B model picks up the pattern after three examples. No special fine-tuning required.

What you keep, what you lose, what you gain

Capability | JSON tool call | STL tool call
Schema validation at parse | ✓ JSON Schema | ✓ STL Schema (Lark + Pydantic)
Type safety | ✓ | ✓ via modifier types
Streamable parse | ~ partial (NDJSON or manual) | ✓ line-oriented native
Round-trip to JSON | native | stl_parser.to_json()
OpenAPI / function-calling spec | ✓ mature | ~ STL-schema equivalent, less ecosystem yet
Token efficiency | ~30% more | ✓ baseline
4B-model viability | ~ breaks frequently | ✓ works with 3-shot
Human-readable on the wire | brace soup | reads like a sentence
Existing parser tooling | universal | ~ stl_parser (Python), more coming

Honest costs. STL trades ecosystem maturity for protocol fit. OpenAPI tooling assumes JSON. Most observability platforms log JSON. If your stack is deeply JSON-coupled, the migration is incremental: start by emitting STL only inside the LLM-to-parser boundary, convert to JSON immediately after parse. You still get the model-side savings.
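A minimal sketch of that incremental pattern, assuming the stl_parser.to_json() round-trip listed in the table above (the exact call shape may differ):

import stl_parser

model_output = '[Tool:WebSearch] [Query:VectorDB_RAG] ::mod(action="search", top_k=5)'

result = stl_parser.parse(model_output)   # STL only at the LLM-to-parser boundary
payload = stl_parser.to_json(result)      # convert back to JSON right after parse
print(payload)                            # downstream logging and tooling stay JSON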

Try it in 5 minutes

Take any agent you currently run with JSON tool calls. Do three things:

1. Install the parser
pip install stl-parser
2. Replace your system-prompt's tool-call instruction
You can call tools by emitting one STL statement per call, one statement per line.
Format:
  [Tool:<ToolName>] -> [<Target>] ::mod(action="<verb>", <arg>=<value>, ...)

Available tools:
  - WebSearch (target: Query node; args: action, top_k, freshness)
  - ReadFile (target: Path node; args: action, encoding)
  - SendEmail (target: Recipient node; args: action, subject, body)

Examples:
  [Tool:WebSearch] -> [Query:STL_Lang] ::mod(action="search", top_k=5)
  [Tool:ReadFile] -> [Path:Config_YAML] ::mod(action="read", encoding="utf-8")
3. Parse on the program side
from stl_parser import parse

result = parse(model_output)
for stmt in result.statements:
    tool = stmt.source.name.split(":")[1]   # e.g. "WebSearch"
    args = stmt.modifiers                    # dict of typed kwargs
    dispatch(tool, **args)

That is the whole migration. The rest of your agent — retries, observability, tool registry — keeps working.
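The dispatch call in step 3 is your own tool registry. A minimal illustrative sketch (the handler bodies are stubs, not part of stl-parser):

def web_search(action, top_k=5, freshness=None, safe=True):
    print(f"searching: top_k={top_k}, freshness={freshness}")

def read_file(action, encoding="utf-8"):
    print(f"reading with encoding={encoding}")

HANDLERS = {"WebSearch": web_search, "ReadFile": read_file}

def dispatch(tool, **args):
    handler = HANDLERS.get(tool)
    if handler is None:
        raise ValueError(f"unknown tool: {tool}")   # surfaces to your retry logic
    return handler(**args)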

Repositories and tools

STL ecosystem entry points:
Language spec → github.com/scos-lab/semantic-tension-language
Parser + tools → github.com/scos-lab/STL-TOOLS  (PyPI: stl-parser)
Graph engine → github.com/scos-lab/stg-engine  (PyPI: stg-engine)
Empirical reports → stl-lang.org/articles