Agently 4: Turn “Agent Shipping” into an Engineering Problem You Can Control

When we co-build with product and engineering teams, we keep hearing the same sentence: “The demo was easy. Production is hard.”
The hard part is not whether the model can answer. It’s whether the system can survive real traffic, real data, and real dependencies—while staying testable, observable, and maintainable.

Agently 4 is our answer: bring LLM uncertainty back inside an engineering boundary with controllable outputs, orchestrated workflows, and traceable execution—so agents can move from prototypes to reliable systems.

How We Designed Agently: Uncertainty, Contained

  • Schema first, generation second: turn free-form text into verifiable contracts via structured outputs. See Output Format Control.
  • Workflow first, autonomy inside nodes: use TriggerFlow to make multi-step agent behavior predictable and reviewable. See TriggerFlow Overview.
  • Evidence first, then ship: tool calls, metadata, and runtime events are retrievable for debugging and evaluation. See Tools, Response Result, and Runtime Stream.

Scenario 1: “JSON output” breaks the API contract

We’ve watched teams require JSON, ship quickly, and then hit production where the model returns missing keys, extra prose, or inconsistent structures. A single parse failure can cascade into queue backlogs, retry storms, and alerts across the stack.

What Agently does: define the contract with output(), validate critical paths with ensure_keys, and retry predictably.

The key difference is where this reliability comes from. Instead of relying on provider-specific switches (e.g. response_format / JSON schema flags) to “guarantee” valid JSON, Agently’s schema alignment, streaming parsing, key validation, and retries are handled inside the framework pipeline. As long as the backend can do normal chat/completion, you can keep your contract stable even when you switch models or inference servers—practically decoupled from model-provider features, and friendly to most modern 7B+ instruction models released since 2024.

Business impact:
  • Stable integration with downstream services
  • Controlled failure modes (retry/fallback) instead of cascading incidents
  • Faster delivery: product requirements become schemas engineers can enforce

from agently import Agently

agent = Agently.create_agent()

release = (
    agent.input("Write a weekly release announcement for enterprise customers")
    # Declare the output contract as a schema:
    # (type, "description") tuples define fields; a list marks an array.
    .output(
        {
            "title": (str, "Title"),
            "highlights": [(str, "3-5 highlights")],
            "compatibility": (str, "Compatibility notes"),
            "risk_notes": (str, "Risk and rollback notes"),
        }
    )
    .start(
        # Enforce the critical paths; retry once on validation failure
        # and degrade gracefully instead of raising into the caller.
        ensure_keys=["title", "highlights[*]", "risk_notes"],
        max_retries=1,
        raise_ensure_failure=False,
    )
)
print(release)

Scenario 2: The more tools you add, the more “mysterious” it gets

Agents inevitably need tools: ticketing, CRM, approvals, internal APIs, databases. The first tool is easy. The tenth is where you start seeing wrong parameters, unexpected tool timing, and failures nobody can reproduce.

What Agently does: standardized tool registration (built-ins + decorators) plus tool-call tracing via extra.

Just as importantly, Agently’s “tool planning” is framework-native: deciding whether to use a tool, which tool to use, and how to build arguments is implemented as a built-in planning step—so it doesn’t depend on whether your provider implements function calling / tool calling. In other words, even with a plain chat endpoint, Agently can still run tool chains and leave an auditable trail.

Business impact:
  • Lower long-term integration/maintenance cost
  • Faster debugging with auditable tool call evidence
  • Clearer safety boundaries: what can be called, with what data

from agently import Agently

agent = Agently.create_agent()

# Register a plain function as a tool: decorate it, then enable it on the agent.
@agent.tool_func
def lookup_order(order_id: str) -> str:
    # Demo stub; in production this would call a real order/CRM API.
    return f"order:{order_id}"

agent.use_tools(lookup_order)
response = agent.input("Check order A1001 and explain the status").get_response()

# Tool-call evidence is kept under `extra` for auditing and debugging.
extra = response.result.full_result_data.get("extra", {})
print("[tool_calls]", extra.get("tool_calls") or extra.get("tool_logs"))
print("[answer]", response.result.get_text())

Scenario 3: Multi-step agents work in tests, but you can’t trust them in production

Fully autonomous “plan → act → plan → answer” loops look great—until production. The agent drifts, loops, calls risky tools at the wrong time, or fails without a recoverable path.

What Agently does: TriggerFlow turns multi-step behavior into an event-driven graph with branching, collection, and result control. See Emit + When.

Business impact:
  • Predictable behavior you can QA and roll out gradually
  • Clear collaboration boundaries between business logic and model-driven nodes
  • Controlled failure paths instead of “whatever the model decides”
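
To make that concrete, here is a minimal emit/when sketch (the handler and signal names are illustrative; the wiring mirrors the companion-robot example in Scenario 4 below). Deterministic business logic decides the branch and emits a signal; model-driven work stays inside nodes, and every branch is a named function you can test, diff, and review.

from agently import TriggerFlow, TriggerFlowEventData

flow = TriggerFlow()

async def triage(data: TriggerFlowEventData):
    # Business logic picks the branch and emits a signal;
    # no free-form "let the model decide the next step" loop.
    if "refund" in str(data.value).lower():
        await data.async_emit("Ticket.Refund", data.value)
    else:
        await data.async_emit("Ticket.General", data.value)
    return "routed"

async def handle_refund(data: TriggerFlowEventData):
    # e.g. call an approval tool here, then draft the reply
    return f"refund path: {data.value}"

async def handle_general(data: TriggerFlowEventData):
    return f"general path: {data.value}"

flow.to(triage).end()
flow.when("Ticket.Refund").to(handle_refund).end()
flow.when("Ticket.General").to(handle_general).end()

Run it the same way as the Scenario 4 example (e.g. via flow.get_runtime_stream(...)).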

Scenario 4: Low-code orchestration becomes unmaintainable—and migration feels risky

A pattern we see often: teams start with visual builders (n8n / Dify / Coze) to validate workflows quickly. Then the graph grows: more branches, more state, more shared subflows—and suddenly maintenance becomes the bottleneck. Diff/review is hard, reuse turns into copy/paste, and CI/CD + tests become awkward.

TriggerFlow is built for this exact “second stage”: it translates low-code mental models into code in a very direct way—to(...) as nodes, when(...) as signal-driven branches, collect(...) as joins, for_each(...)/batch(...) as concurrency—and gives you version control, tests, code review, and long-term maintainability.

The real unlock is combining TriggerFlow’s signal model with Agently’s Instant mode: during a single model request, you can capture completed structured nodes and trigger downstream actions immediately—something many low-code tools struggle to express reliably.

Below is a companion-robot style example: while the agent streams speech, it also emits action signals as soon as each actions[*] item becomes complete.

import asyncio
from agently import Agently, TriggerFlow, TriggerFlowEventData

agent = Agently.create_agent()


class CompanionRobot:
    async def speak_delta(self, text_delta: str):
        # Demo stub; a real robot would forward each delta to a TTS engine.
        await asyncio.sleep(0)

    async def do_action(self, action: dict):
        print(f"\n[robot action] {action}\n", end="", flush=True)
        await asyncio.sleep(0.2)


robot = CompanionRobot()
flow = TriggerFlow()


async def plan_and_stream(data: TriggerFlowEventData):
    request = (
        agent.input({"user": data.value, "role": "companion"})
        .output(
            {
                "speech": (str, "What to say, warm and supportive"),
                "actions": [
                    {
                        "type": (str, "Robot action type, e.g. 'wave'/'nod'/'dance'"),
                        "args": (dict, "Action parameters"),
                    }
                ],
            }
        )
    )

    async for instant in request.get_async_generator(type="instant"):
        # Stream speech deltas the moment they parse.
        if instant.path == "speech" and instant.delta:
            data.put_into_stream(instant.delta)
            await robot.speak_delta(instant.delta)
        # Fire an action signal as soon as each actions[*] item is complete,
        # without waiting for the full response to finish.
        if instant.wildcard_path == "actions[*]" and instant.is_complete:
            await data.async_emit("Robot.Action", instant.value)

    data.stop_stream()
    return "done"


async def exec_action(data: TriggerFlowEventData):
    await robot.do_action(data.value)
    return "ok"


flow.to(plan_and_stream).end()
flow.when("Robot.Action").to(exec_action).end()

for event in flow.get_runtime_stream("I feel a bit down today. Can you keep me company?", timeout=None):
    print(event, end="", flush=True)

Scenario 5: Real traffic arrives, and downstream systems melt (timeouts, rate limits, retry storms)

A single user request can fan out into many tool calls. Under concurrency, bottlenecks usually show up in third-party APIs, gateways, or DB pools—and failures can amplify via retries.

What Agently does: workflow-level concurrency limits via batch / for_each(concurrency=...). See Concurrency.

Business impact:
  • Protect downstream dependencies and reduce cascade failures
  • Capacity planning becomes a knob, not guesswork
  • Better cost control by avoiding wasteful parallelism
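
The knob itself is simple: a cap on in-flight downstream calls. Here is a plain-asyncio illustration of the semantics that for_each(concurrency=...) / batch give you at the workflow level (this is not Agently API; see the Concurrency docs for the real signatures):

import asyncio

async def call_downstream(order_id: str, limiter: asyncio.Semaphore) -> str:
    async with limiter:  # at most N calls in flight at once
        await asyncio.sleep(0.1)  # stand-in for a tool / third-party API call
        return f"checked:{order_id}"

async def main():
    limiter = asyncio.Semaphore(5)  # the knob: capacity planning as a number
    orders = [f"A{i:04d}" for i in range(50)]
    results = await asyncio.gather(
        *(call_downstream(order_id, limiter) for order_id in orders)
    )
    print(len(results), "orders checked without flooding the downstream API")

asyncio.run(main())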

Scenario 6: Users feel it’s slow—and worse, it’s a black box

Latency is not just time; it’s perceived time. If nothing happens for 2 seconds, users assume the system is stuck. Multi-step flows make this worse: was it retrieval, tools, or generation?

What Agently does: streaming by design (delta/instant/specific) and TriggerFlow runtime streams to surface progress. See Streaming.

Business impact:
  • Lower perceived latency and more interactive UX
  • Better product experiences: progress views, step panels, structured UI cards
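
A minimal sketch of surfacing progress, reusing the instant-mode generator shown in the Scenario 4 example (the prompt and field name here are illustrative):

import asyncio
from agently import Agently

agent = Agently.create_agent()

async def main():
    request = (
        agent.input("Summarize this week's incidents for the status page")
        .output({"summary": (str, "One-paragraph summary")})
    )
    # Instant mode yields partial structured output while it is parsed;
    # flushing each delta immediately keeps perceived latency low.
    async for instant in request.get_async_generator(type="instant"):
        if instant.path == "summary" and instant.delta:
            print(instant.delta, end="", flush=True)

asyncio.run(main())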

Scenario 7: Answers must match the Knowledge Base—and be traceable

One of the most common “enterprise RAG” roadblocks is deceptively simple: how do you prove the answer actually comes from the knowledge base?
In production, stakeholders will ask:
  • “Which document, which paragraph supports this claim?”
  • “If the knowledge base doesn’t contain it, can you say ‘unknown’ instead of making it up?”

If you just paste retrieved text into a prompt, this often turns into prompt-tuning guesswork: sometimes the model cites, sometimes it doesn’t; sometimes it “cites” but the citation is vague; sometimes it invents a claim and presents it as sourced.

With Agently, you can make this a clean engineering contract: retrieval results are structured (id/document/metadata), you inject them explicitly, require “answer + citations that point to ids”, and enforce it with ensure_keys. See Knowledge Base and Output Format Control.

Business impact:
  • Auditable answers: every reply carries source ids and quotes for review/compliance
  • Better grounding: “answer only from retrieval_results” reduces drift and hallucinations
  • Clear iteration path: distinguish “retrieval miss” from “generation non-compliance”

from agently import Agently
from agently.integrations.chromadb import ChromaCollection

# 1) Build / reuse a KB (demo)
embedding = Agently.create_agent()
collection = ChromaCollection(collection_name="demo", embedding_agent=embedding)
collection.add(
    [
        {
            "id": "kb-q3-001",
            "document": "Q3 goal: reduce churn; key actions: improve onboarding and winback.",
            "metadata": {"dept": "sales", "doc": "OKR-2025Q3"},
        }
    ]
)

# 2) Retrieve evidence (each item has id/document/metadata/distance)
query = "What is our Q3 focus?"
retrieval_results = collection.query(query, top_n=5)

# 3) Answer with citations pointing back to retrieval_results[*].id
agent = Agently.create_agent()
result = (
    agent.input(query)
    .info({"retrieval_results": retrieval_results})
    .output(
        {
            "answer": (str, "Final answer (must be grounded in retrieval evidence)"),
            "citations": [
                {
                    "source_id": (str, "Must in {retrieval_results.[].id}"),
                    "quote": (str, "Direct quote from {retrieval_results}"),
                    "why": (str, "How this quote supports the answer"),
                }
            ],
            "not_found": (bool, "true when evidence is insufficient"),
        }
    )
    .instruct(
        "You must answer only from {retrieval_results}; "
        "every key claim must have a citation (source_id + quote); "
        "if evidence is insufficient, set not_found=true and explain what's missing."
    )
    .start(
        ensure_keys=["answer", "citations[*].source_id", "citations[*].quote", "not_found"],
        max_retries=1,
        raise_ensure_failure=False,
    )
)
print(result)

Closing Note

Agently won’t choose your model or run your GPU cluster. What we do is turn the most failure-prone parts of agent shipping into building blocks that fit into your code reviews, SLAs, and runbooks—so your agent can actually ship, operate, and iterate.