If you started at the prompt and reached the harness, the final stage is owning the model itself. Nous Research's Hermes open models, its function-calling standard, and the data flywheel — with code.

2026년 6월 19일· TRAIL Labs

Open ModelsHermesNous ResearchFine-tuning

Open Models — Owning the Model Layer Itself

> Part 5 (final) of "The Evolution of Driving LLMs." ① Prompting · ② Vibe coding · ③ Agents · ④ Harness engineering · ⑤ Open models.

Up to here we rented the model. We typed prompts into a chat box (Part 1), handed off code (Part 2), handed off the loop (Part 3), and designed a harness around it (Part 4). The final stage is owning the model itself.

Moving from renting a model to placing an open-weight model inside your harness and tuning it on your own data — owning the whole stack

Why own the model

Renting a frontier model via API is convenient, but it has a price. You're dragged along when pricing or policy changes, you have to send data out, and you can't reshape the model to your domain. Open-weight models flip all three: you control cost and privacy, and you can tune the model on your own data.

The standout here is Nous Research's Hermes family. Through its most recent version, Hermes 4.3 (August 2025, built on ByteDance's Seed 36B), Hermes has established itself as an open model strong at system-prompt adherence, steerability, and function calling. What's interesting is that Nous ships more than a model — it also opens an agent framework (Hermes Agent) and a native desktop app (Hermes Desktop, released 2026, MIT). You can own the model + agent + harness as a whole.

Function calling has a standard too

Remember the structured output (tool_use) from Part 1? Open models have the same thing. The Hermes Function Calling standard puts tool definitions (JSON schema) in <tools>, calls in <tool_call>, and results in <tool_response>.

<!-- Hermes function calling — tool definitions as JSON schema in <tools> -->
<tools>
[{"name": "emit_slides",
  "parameters": {"type": "object",
    "properties": {"slides": {"type": "array"}},
    "required": ["slides"]}}]
</tools>

<!-- the model calls in this format -->
<tool_call>
{"name": "emit_slides", "arguments": {"slides": [/* … */]}}
</tool_call>

Only the format differs — it's exactly the idea from Part 1's tool_use: forcing the output to be an interface. So a harness designed for a frontier model ports to an open model almost as-is.

The harness is a data factory

This is where Part 4's side effect comes alive. A well-designed harness leaves structured data on every run: input → tool call → verified output. That is, directly, an instruction-tuning dataset.

Then a flywheel spins. The more you use the harness, the more domain data accumulates → you fine-tune an open model (e.g. Hermes) to your domain on that data → your existing workflows become the eval → a better-fit model runs the harness again. The structured outputs our generation pipeline leaves via tool_use, and the held-out judge we use for verification, are exactly the raw material for this flywheel.

Honestly, this stage is still closer to a hypothesis — it presumes enough data, quality control, and sustained investment. But the direction is clear: from renting a model toward owning the model, harness, and data as a whole.

Closing the series

We've passed through five stages — typing prompts into a chat box, building by vibes, handing the loop to an agent, designing a harness around it, and finally owning the model layer. The one line running through all of it: driving LLMs is moving from individual instinct to a system a team designs and owns.

The way we build content automation sits right on this arc — treating prompts as interfaces, bolting verification onto generation, raising the quality floor with a harness, and getting better off that data. Trail Studio is what that system looks like applied to content.

Sources: Hermes 3 Technical Report (arXiv) · Hermes-Function-Calling (GitHub) · Hermes 3 Llama 3.1 8B (Hugging Face)

Engineering2026년 6월 17일

Harness Engineering — What Makes the Same Model Behave Differently

Running agents safely and consistently is about the harness around the model — scoped tools, hooks, context layering, verification loops, executable knowledge. The MCP server we built (self-call = SSOT) and hook patterns, in code.

Engineering2026년 6월 15일

Agents — When the Model Started Running the Loop

OpenClaw (formerly Moltbot) put autonomous agents in the spotlight — the model uses tools, observes results, and runs the loop itself. The perceive→plan→act→observe loop, and the permission/HITL problem broad autonomy brings — with code.

Engineering2026년 6월 13일

Vibe Coding — Erasing the Friction of Building

Vibe coding, the term Karpathy popularized — describe the intent and the LLM writes the code. What it solved, where it breaks, and how to bolt a verification loop (isolate, run, check) onto it — with code.