Why STL — Fewer Tokens, Same Result
Where the savings happen
Every LLM-agent system pays tokens at the same step — when the model emits a structured call.
The model and the parser are interchangeable parts of the chain — you can swap GPT for Claude for Llama, or Python's JSON parser for Rust's. The middle protocol is also swappable, just less often treated that way. STL is what that middle layer looks like when you optimize it for token cost.
Same chain, lighter middle: on the program side you change json.loads() to stl_parser.parse(), and the token savings show up on the very next call.
Three reasons STL is shorter
Flatter structure
JSON nests objects; each level adds an opening brace, a closing brace, and a quoted key with a colon.
STL is one line per call. Modifiers all live inside a single ::mod(...) block — zero nesting.
Lighter key syntax
STL writes key=value where JSON writes "key": "value". Per key, that drops two quotes, a colon, and a space (the = adds one character back), and string values lose two more quotes. Multiply by 10 keys per call and the savings stack.
No envelope tax
A JSON tool call wraps the body: {"function": "X", "parameters": {...}}.
STL puts the function name in the anchor: [Tool:X]. The envelope tokens disappear.
A real tool call, side-by-side
Same agent task: "Search the web for the query 'STL semantic tension language' and return the top 5 results."
JSON tool call (current)
```json
{
  "tool": "web_search",
  "parameters": {
    "query": "STL semantic tension language",
    "top_k": 5,
    "freshness": "month",
    "safe": true
  }
}
```

STL tool call

```
[Tool:WebSearch] → [Query:STL_SemanticTension] ::mod(action="search", top_k=5, freshness="month", safe=true)
```
Roughly 30% fewer tokens, zero brace nesting, single line per call. Both forms carry the same semantic content; the parser on the program side does the same job in both worlds.
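The exact ratio depends on the tokenizer and the payload, so measure it on your own calls. A minimal sketch using the tiktoken library (cl100k_base is an arbitrary encoding choice here, not an STL requirement):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")

json_call = (
    '{"tool": "web_search", "parameters": {"query": "STL semantic tension '
    'language", "top_k": 5, "freshness": "month", "safe": true}}'
)
stl_call = (
    '[Tool:WebSearch] → [Query:STL_SemanticTension] '
    '::mod(action="search", top_k=5, freshness="month", safe=true)'
)

j, s = len(enc.encode(json_call)), len(enc.encode(stl_call))
print(f"JSON: {j} tokens, STL: {s} tokens, saved: {1 - s / j:.0%}")
```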
Bonus: small models can finally do this
An indirect consequence of the three reasons above: because STL is flat and uses syntax LLMs already see in their training data, smaller models can generate it reliably. JSON's structural strictness locked the smaller end of the model market out of reliable tool calling; STL hands it back.
Why 4B struggles with JSON
- Mismatched braces: {"args": {"key": "val", "x": 2} — opens two objects, closes one
- Trailing comma: {"a": 1, "b": 2,} — invalid JSON, yet the model emits it anyway
- Unescaped quotes in strings: {"q": "he said "stop""} — the inner double quotes break the parse
- Mid-stream truncation at the token limit: half-closed JSON, no recovery
- Nested-object confusion: at 3+ levels of depth the model loses track of the braces
Every failure mode above is structural. JSON demands precise paired punctuation. 100B+ models learned to obey because the training cost is amortized; 4B models lack the parameter budget.
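The truncation failure in particular is easy to reproduce. The toy comparison below shows why line-oriented output degrades gracefully; the endswith check is a stand-in heuristic, not how stl_parser actually detects complete statements:

```python
import json

# Model output cut off mid-stream at the token limit.
truncated_json = '{"tool": "web_search", "parameters": {"query": "STL semanti'
truncated_stl = (
    '[Tool:WebSearch] -> [Query:STL_Lang] ::mod(action="search", top_k=5)\n'
    '[Tool:ReadFile] -> [Path:Config_YAML] ::mod(action="read", enco'
)

# JSON: one unterminated string poisons the whole payload.
try:
    json.loads(truncated_json)
except json.JSONDecodeError as err:
    print(f"JSON unrecoverable: {err}")

# STL: every complete line is still a valid statement; only the cut line is lost.
complete = [ln for ln in truncated_stl.splitlines() if ln.endswith(")")]
print(f"STL recovered {len(complete)} of 2 statements")
```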
Why 4B succeeds with STL
STL is flat. No nesting. One line per edge. Modifiers are key=value pairs separated by commas in a single block. The training distribution already contains millions of key=value patterns from config files, command-line invocations, and Python kwargs.
Three example shots are typically enough to teach a 4B model the pattern:

```
[Tool:WebSearch] → [Query:VectorDB_RAG] ::mod(action="search", top_k=5)
```

No special fine-tuning required.
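To see how little machinery the flat grammar demands, here is a hypothetical fallback parser in plain regex. The real stl_parser uses Lark + Pydantic (see the table below); treat this as an illustration of the grammar's flatness, not as the library's implementation:

```python
import re

# One pattern per statement: no recursion, no depth counter to maintain.
STMT = re.compile(
    r"\[Tool:(?P<tool>\w+)\]\s*(?:->|→)\s*\[(?P<target>[\w:]+)\]\s*"
    r"::mod\((?P<mods>[^)]*)\)"
)

def toy_parse(line: str) -> dict:
    m = STMT.match(line.strip())
    if not m:
        raise ValueError(f"not an STL statement: {line!r}")
    pairs = (p.split("=", 1) for p in m.group("mods").split(",") if p.strip())
    # Strip quotes from string values; numbers and booleans stay as raw text.
    mods = {k.strip(): v.strip().strip('"') for k, v in pairs}
    return {"tool": m.group("tool"), "target": m.group("target"), **mods}

print(toy_parse('[Tool:WebSearch] → [Query:VectorDB_RAG] ::mod(action="search", top_k=5)'))
```

The absence of recursion is the point: a statement either matches the one pattern or it does not, which is exactly the property that keeps small models from derailing mid-call.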
What you keep, what you lose, what you gain
| Capability | JSON tool call | STL tool call |
|---|---|---|
| Schema validation at parse | ✓ JSON Schema | ✓ STL Schema (Lark + Pydantic) |
| Type safety | ✓ | ✓ via modifier types |
| Streamable parse | ~ partial (NDJSON or manual) | ✓ line-oriented native |
| Round-trip to JSON | native | ✓ stl_parser.to_json() |
| OpenAPI / function-calling spec | ✓ mature | ~ STL-schema equivalent, smaller ecosystem so far |
| Token efficiency | ~30% more | ✓ baseline |
| 4B-model viability | ~ breaks frequently | ✓ works with 3-shot |
| Human-readable on the wire | brace soup | reads like a sentence |
| Existing parser tooling | universal | ~ stl_parser (Python), more coming |
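If something downstream still expects JSON, the round-trip row above covers the exit path. A sketch, assuming to_json() is importable at module level and accepts a parse result; the table names stl_parser.to_json() but its exact signature is not shown here:

```python
from stl_parser import parse, to_json  # to_json per the table row above; signature assumed

stl_line = '[Tool:WebSearch] → [Query:STL_SemanticTension] ::mod(action="search", top_k=5)'
print(to_json(parse(stl_line)))  # emit plain JSON for any downstream consumer
```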
Try it in 5 minutes
Take any agent you currently run with JSON tool calls. Do three things:
1. Install the parser
```
pip install stl-parser
```
2. Replace your system-prompt's tool-call instruction
```
You can call tools by emitting one STL statement per call, one statement per line.

Format: [Tool:<ToolName>] -> [<Target>] ::mod(action="<verb>", <arg>=<value>, ...)

Available tools:
- WebSearch (target: Query node; args: action, top_k, freshness)
- ReadFile (target: Path node; args: action, encoding)
- SendEmail (target: Recipient node; args: action, subject, body)

Examples:
[Tool:WebSearch] -> [Query:STL_Lang] ::mod(action="search", top_k=5)
[Tool:ReadFile] -> [Path:Config_YAML] ::mod(action="read", encoding="utf-8")
```
3. Parse on the program side
```python
from stl_parser import parse

result = parse(model_output)
for stmt in result.statements:
    tool = stmt.source.name.split(":")[1]  # e.g. "WebSearch"
    args = stmt.modifiers                  # dict of typed kwargs
    dispatch(tool, **args)
```

That is the whole migration. The rest of your agent — retries, observability, tool registry — keeps working.
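The dispatch call above stands for whatever routing your agent already has. For a self-contained smoke test, a minimal hypothetical registry is enough (web_search is a made-up handler, not part of stl-parser):

```python
# Hypothetical handler; your real tools go here.
def web_search(action: str, top_k: int = 5, freshness: str = "month", safe: bool = True):
    print(f"searching: top {top_k} results, freshness={freshness}")

REGISTRY = {"WebSearch": web_search}

def dispatch(tool: str, **args):
    REGISTRY[tool](**args)  # unknown tools raise KeyError, which your retry layer can catch
```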
Repositories and tools
- Language spec → github.com/scos-lab/semantic-tension-language
- Parser + tools → github.com/scos-lab/STL-TOOLS (PyPI: stl-parser)
- Graph engine → github.com/scos-lab/stg-engine (PyPI: stg-engine)
- Empirical reports → stl-lang.org/articles