
Lobster Workflow Automation

AI agents are powerful but unreliable when left to their own process decisions. Before Lobster, our agents ran work through heartbeat-triggered checks and manual coordination. The results were predictable:

  • DoD fields got missed. Agents would close tickets without filling in Deliverable, Acceptance Criteria, or Method fields — the Definition of Done existed on paper but wasn’t enforced.
  • Tickets got duplicated. Context-limit restarts caused agents to lose memory of what they’d just created. One incident produced 14 duplicate tickets in 94 seconds.
  • Quality varied by session. Whether an agent followed the right process depended on what context it loaded, what model was running, and how much of the conversation history survived compaction.

The core insight: agents don’t need more instructions — they need guardrails that make the right process the only process.

Lobster is a workflow DSL that defines structured, resumable agent workflows as declarative pipelines. Each pipeline is a sequence of typed steps — shell commands, LLM tasks, API calls, approval gates — that execute in order with error handling and retry logic.
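Since the DSL itself isn't shown here, a minimal Python sketch of the execution model (typed steps running in order, each with simple retry) can illustrate the idea; the step fields, kinds, and names below are assumptions for illustration, not Lobster's actual schema:

```python
# Hypothetical sketch of a declarative pipeline runner: typed steps
# execute in order, each with simple retry logic. Step fields and
# names are illustrative, not Lobster's actual schema.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    kind: str                      # "shell" | "llm-task" | "api" | "approval"
    run: Callable[[dict], dict]    # takes pipeline state, returns updates
    retries: int = 2

def run_pipeline(steps: list[Step], state: dict) -> dict:
    for step in steps:
        for attempt in range(step.retries + 1):
            try:
                state.update(step.run(state))
                break
            except Exception:
                if attempt == step.retries:
                    raise
        # on success, fall through to the next step
    return state

# Usage: two trivial steps threaded through shared state.
steps = [
    Step("fetch", "api", lambda s: {"ticket": "FAD-101"}),
    Step("verify", "shell", lambda s: {"ok": s["ticket"].startswith("FAD")}),
]
result = run_pipeline(steps, {})
```

Because each step only reads and writes the shared state dict, a runner like this can also persist the state after each step, which is what makes resumption from the last completed step possible.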

The key properties that make it useful for agent orchestration:

  • Declarative. The pipeline definition is the process. No ambiguity about what should happen next.
  • Resumable. If a step fails or times out, the pipeline can resume from where it left off rather than starting over.
  • Enforced. Agents execute within the pipeline’s guardrails. They can’t skip steps, reorder them, or decide to “do it differently this time.”
  • Observable. Every step produces structured output that can be logged, audited, and fed into the next step.

Lobster runs as a first-class OpenClaw plugin. Pipelines execute internally through the lobster tool — no shell exec required, no exec security layer involvement. This means:

  • Pipelines can call openclaw.invoke steps to use any agent tool (web search, file operations, LLM tasks)
  • The llm-task step type runs structured JSON-in/JSON-out LLM calls with schema validation
  • Pipeline execution is isolated from the agent’s conversation context — no risk of context pollution
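The llm-task pattern can be sketched as follows; the schema format here is a simplified stand-in for whatever validation Lobster actually applies:

```python
# Illustrative sketch of the llm-task pattern: the model's reply is
# parsed as JSON and checked against a schema before the pipeline
# accepts it. The schema format is a simplified stand-in.
import json

def validate(payload: dict, schema: dict) -> dict:
    for key, expected_type in schema.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        if not isinstance(payload[key], expected_type):
            raise ValueError(f"bad type for {key}")
    return payload

def llm_task(raw_reply: str, schema: dict) -> dict:
    # A malformed or off-schema reply raises here, so downstream
    # steps only ever see validated, structured data.
    return validate(json.loads(raw_reply), schema)

reply = '{"summary": "Fix login bug", "issue_type": "Bug"}'
out = llm_task(reply, {"summary": str, "issue_type": str})
```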

Pipelines are organized by the lifecycle phase they govern:

| Pipeline | Purpose |
| --- | --- |
| dod-verify | 9-check compliance scan (deliverable, AC, method, Jira comment, memory updated, etc.) |
| jira-close | DoD-gated ticket closure — runs dod-verify as prerequisite, blocks close on failure |

The DoD verification pipeline replaced a manual heartbeat check that ran every 2 hours. It now runs automatically before any ticket can transition to Done, and reports all failing checks in a single run instead of failing on the first one.
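The report-everything behavior is a simple accumulate-then-fail pattern; a sketch, with illustrative check names:

```python
# Sketch of the "report all failures in one run" pattern the DoD
# verifier uses: every check runs, failures accumulate, and the
# pipeline fails once at the end. Check names are illustrative.
def verify_dod(ticket: dict, checks: dict) -> list[str]:
    failures = []
    for name, check in checks.items():
        if not check(ticket):
            failures.append(name)
    return failures  # empty list means the ticket may close

checks = {
    "deliverable": lambda t: bool(t.get("deliverable")),
    "acceptance_criteria": lambda t: bool(t.get("ac")),
    "method": lambda t: bool(t.get("method")),
}
failing = verify_dod({"deliverable": "report.md"}, checks)
```

Compared with failing on the first unmet check, this gives the agent one complete fix list per run instead of a fix-rerun loop per field.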

| Pipeline | Purpose |
| --- | --- |
| work-start | In Progress transition with story points validation |
| commit-first | Pre-commit checks before pushing code |
| ticket-pickup | Webhook-triggered autonomous claim from backlog |
| work-decompose | Decompose parent ticket into sub-tasks using workflow templates |

The ticket pickup pipeline is triggered by Jira webhooks flowing through a custom Go relay service (deployed on K8s with Cloudflare tunnel). When a ticket transitions to To Do, the webhook fires, NATS delivers it to the agent, and the pipeline evaluates whether to claim it — checking WIP limits, agent labels, priority, and blockers.
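The claim decision can be sketched as a predicate over the checks named above; the WIP threshold, field names, and priority values are assumptions for illustration:

```python
# Hypothetical claim evaluation mirroring the checks described above:
# WIP limit, agent label match, blockers, and priority. Thresholds
# and field names are assumptions, not the pipeline's real config.
def should_claim(ticket: dict, agent: dict, wip_limit: int = 3) -> bool:
    if agent["in_progress"] >= wip_limit:
        return False                                  # over WIP limit
    if agent["label"] not in ticket.get("labels", []):
        return False                                  # not this agent's work
    if ticket.get("blockers"):
        return False                                  # blocked upstream
    return ticket.get("priority", "Low") in ("High", "Critical")

ticket = {"labels": ["agent-a"], "priority": "High", "blockers": []}
claim = should_claim(ticket, {"label": "agent-a", "in_progress": 1})
```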

| Pipeline | Purpose |
| --- | --- |
| jira-create | Ticket creation with two-layer duplicate gate and configurable issue type |
| dedup-wrapper | Standalone duplicate prevention — searches for similar summaries before allowing creation |

The search-before-create gate is a hard gate, not advisory. It returns exit 0 (safe), exit 1 (duplicates found, review required), or exit 2 (search failed, do not create). This exists because of the 14-duplicate incident.
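The exit-code contract can be sketched directly; the search function below is a stand-in for the real Jira query:

```python
# Sketch of the hard gate's exit-code contract: 0 = safe to create,
# 1 = duplicates found, 2 = search failed (do not create). The
# search callable is a stand-in for the real Jira query.
def dedup_gate(summary: str, search) -> int:
    try:
        matches = search(summary)
    except Exception:
        return 2   # search failed: fail closed, do not create
    return 1 if matches else 0

code = dedup_gate("Fix login bug", lambda s: [])          # no matches
dup  = dedup_gate("Fix login bug", lambda s: ["FAD-1"])   # duplicate found
err  = dedup_gate("Fix login bug", lambda s: 1 / 0)       # search blew up
```

The important design choice is exit 2: a failed search fails closed rather than open, so an outage in the search path can never be mistaken for "no duplicates".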

Two-layer dedup. The original Jira search + LLM similarity gate had a blind spot: Jira Cloud’s search index takes 10–30 seconds to reflect newly created tickets. Three exact-duplicate pairs slipped through because the second creation call ran before the first ticket was indexed. The fix added a local file-based creation cache as a first-pass dedup layer — the agent checks its own recent-creation log before querying Jira. The Jira search + LLM scoring remains as the secondary check for catching older near-duplicates.
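A sketch of the two layers in order, with simplified data structures (the real cache is file-based, and the cache window is an assumed value):

```python
# Sketch of the two-layer dedup described above: a local
# recent-creation cache catches tickets Jira has not indexed yet
# (the 10-30 second indexing lag), then the Jira search catches
# older near-duplicates. Data structures are simplified.
import time

def is_duplicate(summary: str, local_cache: dict, jira_search,
                 cache_window_s: int = 300) -> bool:
    # Layer 1: did this agent create the same summary very recently?
    created_at = local_cache.get(summary)
    if created_at and time.time() - created_at < cache_window_s:
        return True
    # Layer 2: query Jira's (eventually consistent) search index.
    return bool(jira_search(summary))

cache = {"Fix login bug": time.time()}   # created seconds ago
dup = is_duplicate("Fix login bug", cache, lambda s: [])
```

Note that layer 1 catches exactly the failure mode of the incident: the Jira search (layer 2) returns nothing for a just-created ticket, but the local cache still knows about it.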

Configurable issue type. The pipeline originally hardcoded all tickets as Tasks. Bug reports created through the pipeline were mistyped — FAD-674 (a bug) was created as a Task because there was no override. An issue_type arg was added with Task as the default, allowing callers to specify Bug, Story, or other types.

| Pipeline | Purpose |
| --- | --- |
| self-learning | Tidy → promote → archive → share cycle for .learnings/ entries |
| memory-maintenance | Curate daily notes into MEMORY.md, archive old entries |
| heartbeat-reflection | Scheduled scan → promote → federate → share (crontab-driven) |

The self-learning pipeline replaced manual heartbeat-driven reflection. It processes .learnings/ERRORS.md and .learnings/LEARNINGS.md entries, checks evidence counters (Recurrence/Confirmation/Disconfirmation), promotes mature entries to MEMORY.md, archives stale ones, and shares cross-agent learnings via Graphiti.
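A possible triage rule over the evidence counters named above; the scoring and threshold are invented for illustration and are not the pipeline's actual logic:

```python
# Hypothetical triage rule for .learnings/ entries based on the
# evidence counters named above (Recurrence / Confirmation /
# Disconfirmation). Scoring and threshold are assumptions.
def triage(entry: dict, min_evidence: int = 3) -> str:
    score = entry["recurrence"] + entry["confirmation"] - entry["disconfirmation"]
    if entry["disconfirmation"] > entry["confirmation"]:
        return "archive"          # evidence points against the learning
    if score >= min_evidence:
        return "promote"          # mature: graduate to MEMORY.md
    return "keep"                 # not enough evidence yet

decision = triage({"recurrence": 2, "confirmation": 2, "disconfirmation": 0})
```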

Operationalization. The heartbeat-reflection pipeline existed but was never scheduled — raw learnings accumulated (249 entries, 216 active) without graduating to stable knowledge or cross-agent sharing. Scheduling it via crontab with a wrapper script that manages MCP Gateway port-forward lifecycle closed the loop. Two path bugs in the OpenClaw deployment were also fixed: the scripts_dir default pointed to the desktop path instead of the pod path, and the workflow file was seeded to the wrong directory (missing the workspace/ segment). Both issues followed a recurring pattern where pipelines federated from the desktop environment carried host-specific paths into the container environment.

| Pipeline | Purpose |
| --- | --- |
| sprint-planning | Weekly planning review workflow |
| agent-solicitation | NATS request-response for collecting agent input, then LLM merge into structured plan |
| nats-request-response | Batch inter-agent messaging with JetStream polling |

The solicitation pipeline automates what was previously a manual process: sending NATS messages to each agent asking for sprint contributions, waiting for responses, polling JetStream history (because push delivery is unreliable), and merging responses into a structured plan via LLM.
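The collect phase can be sketched as a poll-until-complete loop; fetch_messages stands in for the JetStream history read, and the timeout values are illustrative:

```python
# Sketch of the solicitation's collect phase: poll stream history
# until every agent has answered or a deadline passes (push delivery
# is unreliable, so polling the history is the source of truth).
# fetch_messages stands in for the real JetStream read.
import time

def collect_responses(agents: set, fetch_messages, timeout_s: float = 30,
                      poll_interval_s: float = 1.0) -> dict:
    responses: dict = {}
    deadline = time.time() + timeout_s
    while time.time() < deadline and set(responses) != agents:
        for msg in fetch_messages():
            if msg["agent"] in agents:
                responses[msg["agent"]] = msg["body"]
        if set(responses) != agents:
            time.sleep(poll_interval_s)
    return responses   # may be partial if the deadline passed

fetch = lambda: [{"agent": "a", "body": "plan A"},
                 {"agent": "b", "body": "plan B"}]
replies = collect_responses({"a", "b"}, fetch, timeout_s=5)
```

Returning a possibly partial dict at the deadline, rather than raising, lets the merge step proceed with whatever input arrived.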

| Pipeline | Purpose |
| --- | --- |
| research-push | Validate research doc → classify by domain → get GitHub token → push to agent-research repo → regenerate INDEX.json → update README |

The research-push pipeline solved a specific gap: research docs written during sessions were never making it to the agent-research repository. Agents would produce valuable research artifacts, mark the Jira ticket Done, and skip the git commit/push step — leaving files uncommitted. Five files spanning March 8–12 were found sitting uncommitted when the gap was discovered.

The pipeline enforces the full commit cycle as a structured workflow. It accepts a file path and optional Jira ticket key, validates the file exists and lives in an indexable content directory (research/, designs/, plans/, investigations/, fixes/, audits/), classifies it by domain subdirectory, gets a short-lived GitHub write token from the credential broker, pushes the doc via the GitHub Contents API (handling both create and update via SHA detection), regenerates INDEX.json by running the index generator, and updates the repo README.
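The create-vs-update step hinges on the GitHub Contents API convention that updating an existing file requires its current SHA while creating a new file omits it. A sketch, with http_get and http_put standing in for authenticated API calls:

```python
# Sketch of the create-vs-update step: the GitHub Contents API
# needs the existing file's SHA to update, and no SHA to create.
# http_get and http_put stand in for authenticated API calls;
# http_get returns None when the file does not exist yet.
import base64

def push_doc(path: str, content: str, http_get, http_put) -> dict:
    body = {
        "message": f"Add/update {path}",
        "content": base64.b64encode(content.encode()).decode(),
    }
    existing = http_get(f"/contents/{path}")
    if existing is not None:
        body["sha"] = existing["sha"]    # update path must carry the SHA
    return http_put(f"/contents/{path}", body)

# Update case: the file already exists, so its SHA is attached.
result = push_doc("research/notes.md", "# Notes",
                  http_get=lambda p: {"sha": "abc123"},
                  http_put=lambda p, b: b)
```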

| Pipeline | Purpose |
| --- | --- |
| portfolio-audit | Cross-reference resolved Jira tickets with page commit history → flag stale pages |
| portfolio-promote | Scan for Spencer’s approval comments → rewrite internal links to public paths → push to public tier → close ticket |

The portfolio audit pipeline detects stale portfolio pages by cross-referencing resolved Jira tickets with page commit history. It runs a 7-step workflow: validate inputs and compute lookback window → acquire GitHub read token → fetch commit history for all portfolio pages → query Jira for recently resolved tickets → match tickets to pages using a three-tier label resolution (multi-label → single-label → fallback) → optionally classify flagged pages via LLM → generate a structured report.
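The three-tier label resolution can be sketched as a cascade; the index layout and label names are illustrative assumptions:

```python
# Sketch of the three-tier ticket-to-page matching described above:
# exact multi-label combination first, then any single label, then
# a fallback. Index layout and names are illustrative.
def resolve_page(ticket_labels: list, page_index: dict, fallback: str) -> str:
    # Tier 1: a page registered under the exact multi-label combination.
    key = tuple(sorted(ticket_labels))
    if key in page_index:
        return page_index[key]
    # Tier 2: any single label that maps to a page on its own.
    for label in ticket_labels:
        if (label,) in page_index:
            return page_index[(label,)]
    # Tier 3: fallback page.
    return fallback

index = {("lobster", "workflow"): "lobster.md", ("jira",): "jira.md"}
page = resolve_page(["workflow", "lobster"], index, "misc.md")
```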

Three iterations of bug fixes hardened this pipeline into production-grade tooling. The initial version had a hardcoded 50-ticket limit, no last-audit timestamp tracking, and broken ticket-to-page matching. The second iteration fixed matching logic and added tmpdir cleanup (preventing stale data contamination between runs), but pagination still failed — Jira Cloud silently ignores the start_at parameter and requires keyset pagination via next_page_token, which the MCP gateway HTTP wrapper strips from responses. The final fix replaced the pagination loop with a single call at limit:500 (both the gateway and Jira honor it despite the schema documenting 1–50) plus a fail-loud guardrail that throws if the result count hits the limit, converting silent truncation into an immediate error.
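The final fix reduces to a single fetch plus a guardrail; a sketch, with the search callable standing in for the Jira query through the gateway:

```python
# Sketch of the fail-loud pagination fix: one call at a high limit,
# plus a guardrail that raises if the result count hits the limit,
# so silent truncation becomes an immediate, visible error.
def fetch_resolved_tickets(search, limit: int = 500) -> list:
    results = search(limit=limit)
    if len(results) >= limit:
        raise RuntimeError(
            f"result count hit limit={limit}; possible silent truncation"
        )
    return results

tickets = fetch_resolved_tickets(lambda limit: ["FAD-1", "FAD-2"])
```

The guardrail trades a rare hard failure (a sprint with 500+ resolved tickets) for the guarantee that a truncated result set can never silently pass as complete.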

Before Lobster:

  • Agent decides whether to check DoD fields → sometimes skips them
  • Agent creates tickets from memory → duplicates when context is lost
  • Agent picks up work based on what it remembers → misses priority ordering
  • Quality depends on which model, what context, how the prompt landed

After Lobster:

  • Pipeline enforces 9 DoD checks before any ticket closes → zero missed fields
  • Two-layer dedup gate (local cache + Jira search) before any ticket creation → zero duplicates since deployment
  • Pipeline evaluates WIP limits, priority, and blockers before claiming → consistent pickup behavior
  • Process is identical regardless of model, context, or session state

Process engineering for AI agents. The same principles that make CI/CD pipelines reliable for software — declarative definitions, automated gates, structured error handling — apply to agent workflows. Agents are more reliable when the process is externalized from their reasoning into an enforced pipeline.

Incremental adoption. Lobster wasn’t deployed all at once. It started with DoD verification (the most painful gap), expanded to ticket creation (the most embarrassing failures), and grew to cover the full work lifecycle. Each pipeline was motivated by a specific incident or quality gap.

Self-auditing infrastructure. The portfolio audit pipeline is Lobster auditing its own ecosystem — using structured workflows to detect when portfolio documentation has fallen behind the work those same workflows govern. The three-iteration bug-fix cycle that hardened it mirrors the broader Lobster pattern: deploy, discover edge cases in production, harden with structured fixes.

Observable automation. Every pipeline run produces structured output that feeds into monitoring. Failed DoD checks get logged. Duplicate detection events get tracked. Pickup decisions get audited. The system is transparent about what it’s doing and why.