
Lobster Workflow Automation

AI agents are powerful but unreliable when left to their own process decisions. Before Lobster, our agents ran work through heartbeat-triggered checks and manual coordination. The results were predictable:

  • DoD fields got missed. Agents would close tickets without filling in Deliverable, Acceptance Criteria, or Method fields — the Definition of Done existed on paper but wasn’t enforced.
  • Tickets got duplicated. Context-limit restarts caused agents to lose memory of what they’d just created. One incident produced 14 duplicate tickets in 94 seconds.
  • Quality varied by session. Whether an agent followed the right process depended on what context it loaded, what model was running, and how much of the conversation history survived compaction.

The core insight: agents don’t need more instructions — they need guardrails that make the right process the only process.

Lobster is a workflow DSL that defines structured, resumable agent workflows as declarative pipelines. Each pipeline is a sequence of typed steps — shell commands, LLM tasks, API calls, approval gates — that execute in order with error handling and retry logic.
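Since the DSL itself isn't shown here, a minimal Python sketch of the execution model (typed steps running in order, each with simple retry) can illustrate the idea; the step fields, kinds, and names below are assumptions for illustration, not Lobster's actual schema:

```python
# Hypothetical sketch of a declarative pipeline runner: typed steps
# execute in order, each with simple retry logic. Step fields and
# names are illustrative, not Lobster's actual schema.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    kind: str                      # "shell" | "llm-task" | "api" | "approval"
    run: Callable[[dict], dict]    # takes pipeline state, returns updates
    retries: int = 2

def run_pipeline(steps: list[Step], state: dict) -> dict:
    for step in steps:
        for attempt in range(step.retries + 1):
            try:
                state.update(step.run(state))
                break
            except Exception:
                if attempt == step.retries:
                    raise
        # on success, fall through to the next step
    return state

# Usage: two trivial steps threaded through shared state.
steps = [
    Step("fetch", "api", lambda s: {"ticket": "FAD-101"}),
    Step("verify", "shell", lambda s: {"ok": s["ticket"].startswith("FAD")}),
]
result = run_pipeline(steps, {})
```

Because each step only reads and writes the shared state dict, a runner like this can also persist the state after each step, which is what makes resumption from the last completed step possible.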

The key properties that make it useful for agent orchestration:

  • Declarative. The pipeline definition is the process. No ambiguity about what should happen next.
  • Resumable. If a step fails or times out, the pipeline can resume from where it left off rather than starting over.
  • Enforced. Agents execute within the pipeline’s guardrails. They can’t skip steps, reorder them, or decide to “do it differently this time.”
  • Observable. Every step produces structured output that can be logged, audited, and fed into the next step.

Lobster runs as a first-class OpenClaw plugin. Pipelines execute internally through the lobster tool — no shell exec required, no exec security layer involvement. This means:

  • Pipelines can call openclaw.invoke steps to use any agent tool (web search, file operations, LLM tasks)
  • The llm-task step type runs structured JSON-in/JSON-out LLM calls with schema validation
  • Pipeline execution is isolated from the agent’s conversation context — no risk of context pollution
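The llm-task pattern can be sketched as follows; the schema format here is a simplified stand-in for whatever validation Lobster actually applies:

```python
# Illustrative sketch of the llm-task pattern: the model's reply is
# parsed as JSON and checked against a schema before the pipeline
# accepts it. The schema format is a simplified stand-in.
import json

def validate(payload: dict, schema: dict) -> dict:
    for key, expected_type in schema.items():
        if key not in payload:
            raise ValueError(f"missing field: {key}")
        if not isinstance(payload[key], expected_type):
            raise ValueError(f"bad type for {key}")
    return payload

def llm_task(raw_reply: str, schema: dict) -> dict:
    # A malformed or off-schema reply raises here, so downstream
    # steps only ever see validated, structured data.
    return validate(json.loads(raw_reply), schema)

reply = '{"summary": "Fix login bug", "issue_type": "Bug"}'
out = llm_task(reply, {"summary": str, "issue_type": str})
```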

Pipelines are organized by the lifecycle phase they govern:

| Pipeline | Purpose |
| --- | --- |
| dod-verify | 9-check compliance scan (deliverable, AC, method, Jira comment, memory updated, etc.) |
| jira-close | DoD-gated ticket closure — runs dod-verify as prerequisite, blocks close on failure |

The DoD verification pipeline replaced a manual heartbeat check that ran every 2 hours. It now runs automatically before any ticket can transition to Done, and reports all failing checks in a single run instead of failing on the first one.
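The report-everything behavior is a simple accumulate-then-fail pattern; a sketch, with illustrative check names:

```python
# Sketch of the "report all failures in one run" pattern the DoD
# verifier uses: every check runs, failures accumulate, and the
# pipeline fails once at the end. Check names are illustrative.
def verify_dod(ticket: dict, checks: dict) -> list[str]:
    failures = []
    for name, check in checks.items():
        if not check(ticket):
            failures.append(name)
    return failures  # empty list means the ticket may close

checks = {
    "deliverable": lambda t: bool(t.get("deliverable")),
    "acceptance_criteria": lambda t: bool(t.get("ac")),
    "method": lambda t: bool(t.get("method")),
}
failing = verify_dod({"deliverable": "report.md"}, checks)
```

Compared with failing on the first unmet check, this gives the agent one complete fix list per run instead of a fix-rerun loop per field.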

| Pipeline | Purpose |
| --- | --- |
| work-start | In Progress transition with story points validation |
| commit-first | Pre-commit checks before pushing code |
| ticket-pickup | Webhook-triggered autonomous claim from backlog |
| work-decompose | Decompose parent ticket into sub-tasks using workflow templates |

The ticket pickup pipeline is triggered by Jira webhooks flowing through a custom Go relay service (deployed on K8s with Cloudflare tunnel). When a ticket transitions to To Do, the webhook fires, NATS delivers it to the agent, and the pipeline evaluates whether to claim it — checking WIP limits, agent labels, priority, and blockers.
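The claim decision can be sketched as a predicate over the checks named above; the WIP threshold, field names, and priority values are assumptions for illustration:

```python
# Hypothetical claim evaluation mirroring the checks described above:
# WIP limit, agent label match, blockers, and priority. Thresholds
# and field names are assumptions, not the pipeline's real config.
def should_claim(ticket: dict, agent: dict, wip_limit: int = 3) -> bool:
    if agent["in_progress"] >= wip_limit:
        return False                                  # over WIP limit
    if agent["label"] not in ticket.get("labels", []):
        return False                                  # not this agent's work
    if ticket.get("blockers"):
        return False                                  # blocked upstream
    return ticket.get("priority", "Low") in ("High", "Critical")

ticket = {"labels": ["agent-a"], "priority": "High", "blockers": []}
claim = should_claim(ticket, {"label": "agent-a", "in_progress": 1})
```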

| Pipeline | Purpose |
| --- | --- |
| jira-create | Ticket creation with two-layer duplicate gate and configurable issue type |
| dedup-wrapper | Standalone duplicate prevention — searches for similar summaries before allowing creation |

The search-before-create gate is a hard gate, not advisory. It returns exit 0 (safe), exit 1 (duplicates found, review required), or exit 2 (search failed, do not create). This exists because of the 14-duplicate incident.
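The exit-code contract can be sketched directly; the search function below is a stand-in for the real Jira query:

```python
# Sketch of the hard gate's exit-code contract: 0 = safe to create,
# 1 = duplicates found, 2 = search failed (do not create). The
# search callable is a stand-in for the real Jira query.
def dedup_gate(summary: str, search) -> int:
    try:
        matches = search(summary)
    except Exception:
        return 2   # search failed: fail closed, do not create
    return 1 if matches else 0

code = dedup_gate("Fix login bug", lambda s: [])          # no matches
dup  = dedup_gate("Fix login bug", lambda s: ["FAD-1"])   # duplicate found
err  = dedup_gate("Fix login bug", lambda s: 1 / 0)       # search blew up
```

The important design choice is exit 2: a failed search fails closed rather than open, so an outage in the search path can never be mistaken for "no duplicates".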

Two-layer dedup. The original Jira search + LLM similarity gate had a blind spot: Jira Cloud’s search index takes 10–30 seconds to reflect newly created tickets. Three exact-duplicate pairs slipped through because the second creation call ran before the first ticket was indexed. The fix added a local file-based creation cache as a first-pass dedup layer — the agent checks its own recent-creation log before querying Jira. The Jira search + LLM scoring remains as the secondary check for catching older near-duplicates.
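A sketch of the two layers in order, with simplified data structures (the real cache is file-based, and the cache window is an assumed value):

```python
# Sketch of the two-layer dedup described above: a local
# recent-creation cache catches tickets Jira has not indexed yet
# (the 10-30 second indexing lag), then the Jira search catches
# older near-duplicates. Data structures are simplified.
import time

def is_duplicate(summary: str, local_cache: dict, jira_search,
                 cache_window_s: int = 300) -> bool:
    # Layer 1: did this agent create the same summary very recently?
    created_at = local_cache.get(summary)
    if created_at and time.time() - created_at < cache_window_s:
        return True
    # Layer 2: query Jira's (eventually consistent) search index.
    return bool(jira_search(summary))

cache = {"Fix login bug": time.time()}   # created seconds ago
dup = is_duplicate("Fix login bug", cache, lambda s: [])
```

Note that layer 1 catches exactly the failure mode of the incident: the Jira search (layer 2) returns nothing for a just-created ticket, but the local cache still knows about it.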

Configurable issue type. The pipeline originally hardcoded all tickets as Tasks. Bug reports created through the pipeline were mistyped — FAD-674 (a bug) was created as a Task because there was no override. An issue_type arg was added with Task as the default, allowing callers to specify Bug, Story, or other types.

| Pipeline | Purpose |
| --- | --- |
| self-learning | Tidy → promote → archive → share cycle for .learnings/ entries |
| memory-maintenance | Curate daily notes into MEMORY.md, archive old entries |
| heartbeat-reflection | Scheduled scan → promote → federate → share (crontab-driven) |

The self-learning pipeline replaced manual heartbeat-driven reflection. It processes .learnings/ERRORS.md and .learnings/LEARNINGS.md entries, checks evidence counters (Recurrence/Confirmation/Disconfirmation), promotes mature entries to MEMORY.md, archives stale ones, and shares cross-agent learnings via Graphiti.
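A possible triage rule over the evidence counters named above; the scoring and threshold are invented for illustration and are not the pipeline's actual logic:

```python
# Hypothetical triage rule for .learnings/ entries based on the
# evidence counters named above (Recurrence / Confirmation /
# Disconfirmation). Scoring and threshold are assumptions.
def triage(entry: dict, min_evidence: int = 3) -> str:
    score = entry["recurrence"] + entry["confirmation"] - entry["disconfirmation"]
    if entry["disconfirmation"] > entry["confirmation"]:
        return "archive"          # evidence points against the learning
    if score >= min_evidence:
        return "promote"          # mature: graduate to MEMORY.md
    return "keep"                 # not enough evidence yet

decision = triage({"recurrence": 2, "confirmation": 2, "disconfirmation": 0})
```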

Operationalization. The heartbeat-reflection pipeline existed but was never scheduled — raw learnings accumulated (249 entries, 216 active) without graduating to stable knowledge or cross-agent sharing. Scheduling it via crontab with a wrapper script that manages MCP Gateway port-forward lifecycle closed the loop. Two path bugs in the OpenClaw deployment were also fixed: the scripts_dir default pointed to the desktop path instead of the pod path, and the workflow file was seeded to the wrong directory (missing the workspace/ segment). Both issues followed a recurring pattern where pipelines federated from the desktop environment carried host-specific paths into the container environment.

| Pipeline | Purpose |
| --- | --- |
| sprint-planning | Weekly planning review workflow |
| agent-solicitation | NATS request-response for collecting agent input, then LLM merge into structured plan |
| nats-request-response | Batch inter-agent messaging with JetStream polling |

The solicitation pipeline automates what was previously a manual process: sending NATS messages to each agent asking for sprint contributions, waiting for responses, polling JetStream history (because push delivery is unreliable), and merging responses into a structured plan via LLM.
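The collect phase can be sketched as a poll-until-complete loop; fetch_messages stands in for the JetStream history read, and the timeout values are illustrative:

```python
# Sketch of the solicitation's collect phase: poll stream history
# until every agent has answered or a deadline passes (push delivery
# is unreliable, so polling the history is the source of truth).
# fetch_messages stands in for the real JetStream read.
import time

def collect_responses(agents: set, fetch_messages, timeout_s: float = 30,
                      poll_interval_s: float = 1.0) -> dict:
    responses: dict = {}
    deadline = time.time() + timeout_s
    while time.time() < deadline and set(responses) != agents:
        for msg in fetch_messages():
            if msg["agent"] in agents:
                responses[msg["agent"]] = msg["body"]
        if set(responses) != agents:
            time.sleep(poll_interval_s)
    return responses   # may be partial if the deadline passed

fetch = lambda: [{"agent": "a", "body": "plan A"},
                 {"agent": "b", "body": "plan B"}]
replies = collect_responses({"a", "b"}, fetch, timeout_s=5)
```

Returning a possibly partial dict at the deadline, rather than raising, lets the merge step proceed with whatever input arrived.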

| Pipeline | Purpose |
| --- | --- |
| research-push | Validate research doc → classify by domain → get GitHub token → push to agent-research repo → regenerate INDEX.json → update README |

The research-push pipeline solved a specific gap: research docs written during sessions were never making it to the agent-research repository. Agents would produce valuable research artifacts, mark the Jira ticket Done, and skip the git commit/push step — leaving files uncommitted. Five files spanning March 8–12 were found sitting uncommitted when the gap was discovered.

The pipeline enforces the full commit cycle as a structured workflow. It accepts a file path and optional Jira ticket key, validates the file exists and lives in an indexable content directory (research/, designs/, plans/, investigations/, fixes/, audits/), classifies it by domain subdirectory, gets a short-lived GitHub write token from the credential broker, pushes the doc via the GitHub Contents API (handling both create and update via SHA detection), regenerates INDEX.json by running the index generator, and updates the repo README.
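The create-vs-update step hinges on the GitHub Contents API convention that updating an existing file requires its current SHA while creating a new file omits it. A sketch, with http_get and http_put standing in for authenticated API calls:

```python
# Sketch of the create-vs-update step: the GitHub Contents API
# needs the existing file's SHA to update, and no SHA to create.
# http_get and http_put stand in for authenticated API calls;
# http_get returns None when the file does not exist yet.
import base64

def push_doc(path: str, content: str, http_get, http_put) -> dict:
    body = {
        "message": f"Add/update {path}",
        "content": base64.b64encode(content.encode()).decode(),
    }
    existing = http_get(f"/contents/{path}")
    if existing is not None:
        body["sha"] = existing["sha"]    # update path must carry the SHA
    return http_put(f"/contents/{path}", body)

# Update case: the file already exists, so its SHA is attached.
result = push_doc("research/notes.md", "# Notes",
                  http_get=lambda p: {"sha": "abc123"},
                  http_put=lambda p, b: b)
```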

| Pipeline | Purpose |
| --- | --- |
| portfolio-audit | Cross-reference resolved Jira tickets with page commit history → flag stale pages |
| portfolio-promote | Scan for Spencer’s approval comments → rewrite internal links to public paths → push to public tier → close ticket |

The portfolio audit pipeline detects stale portfolio pages by cross-referencing resolved Jira tickets with page commit history. It runs a 7-step workflow: validate inputs and compute lookback window → acquire GitHub read token → fetch commit history for all portfolio pages → query Jira for recently resolved tickets → match tickets to pages using a three-tier label resolution (multi-label → single-label → fallback) → optionally classify flagged pages via LLM → generate a structured report.
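The three-tier label resolution can be sketched as a cascade; the index layout and label names are illustrative assumptions:

```python
# Sketch of the three-tier ticket-to-page matching described above:
# exact multi-label combination first, then any single label, then
# a fallback. Index layout and names are illustrative.
def resolve_page(ticket_labels: list, page_index: dict, fallback: str) -> str:
    # Tier 1: a page registered under the exact multi-label combination.
    key = tuple(sorted(ticket_labels))
    if key in page_index:
        return page_index[key]
    # Tier 2: any single label that maps to a page on its own.
    for label in ticket_labels:
        if (label,) in page_index:
            return page_index[(label,)]
    # Tier 3: fallback page.
    return fallback

index = {("lobster", "workflow"): "lobster.md", ("jira",): "jira.md"}
page = resolve_page(["workflow", "lobster"], index, "misc.md")
```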

Three iterations of bug fixes hardened this pipeline into production-grade tooling. The initial version had a hardcoded 50-ticket limit, no last-audit timestamp tracking, and broken ticket-to-page matching. The second iteration fixed matching logic and added tmpdir cleanup (preventing stale data contamination between runs), but pagination still failed — Jira Cloud silently ignores the start_at parameter and requires keyset pagination via next_page_token, which the MCP gateway HTTP wrapper strips from responses. The final fix replaced the pagination loop with a single call at limit:500 (both the gateway and Jira honor it despite the schema documenting 1–50) plus a fail-loud guardrail that throws if the result count hits the limit, converting silent truncation into an immediate error.
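The final fix reduces to a single fetch plus a guardrail; a sketch, with the search callable standing in for the Jira query through the gateway:

```python
# Sketch of the fail-loud pagination fix: one call at a high limit,
# plus a guardrail that raises if the result count hits the limit,
# so silent truncation becomes an immediate, visible error.
def fetch_resolved_tickets(search, limit: int = 500) -> list:
    results = search(limit=limit)
    if len(results) >= limit:
        raise RuntimeError(
            f"result count hit limit={limit}; possible silent truncation"
        )
    return results

tickets = fetch_resolved_tickets(lambda limit: ["FAD-1", "FAD-2"])
```

The guardrail trades a rare hard failure (a sprint with 500+ resolved tickets) for the guarantee that a truncated result set can never silently pass as complete.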

Before Lobster:

  • Agent decides whether to check DoD fields → sometimes skips them
  • Agent creates tickets from memory → duplicates when context is lost
  • Agent picks up work based on what it remembers → misses priority ordering
  • Quality depends on which model, what context, how the prompt landed

After Lobster:

  • Pipeline enforces 9 DoD checks before any ticket closes → zero missed fields
  • Two-layer dedup gate (local cache + Jira search) before any ticket creation → zero duplicates since deployment
  • Pipeline evaluates WIP limits, priority, and blockers before claiming → consistent pickup behavior
  • Process is identical regardless of model, context, or session state

Process engineering for AI agents. The same principles that make CI/CD pipelines reliable for software — declarative definitions, automated gates, structured error handling — apply to agent workflows. Agents are more reliable when the process is externalized from their reasoning into an enforced pipeline.

Incremental adoption. Lobster wasn’t deployed all at once. It started with DoD verification (the most painful gap), expanded to ticket creation (the most embarrassing failures), and grew to cover the full work lifecycle. Each pipeline was motivated by a specific incident or quality gap.

Self-auditing infrastructure. The portfolio audit pipeline is Lobster auditing its own ecosystem — using structured workflows to detect when portfolio documentation has fallen behind the work those same workflows govern. The three-iteration bug-fix cycle that hardened it mirrors the broader Lobster pattern: deploy, discover edge cases in production, harden with structured fixes.

Observable automation. Every pipeline run produces structured output that feeds into monitoring. Failed DoD checks get logged. Duplicate detection events get tracked. Pickup decisions get audited. The system is transparent about what it’s doing and why.