An SRE agent built on Amazon Bedrock AgentCore. It lives in Slack, thinks with Claude Opus 4.6,
and keeps your infrastructure in check — with human-in-the-loop safety for every dangerous action.
@mention Orbit in any Slack channel. It processes your request through a serverless pipeline with built-in safety rails.
1. Slack Trigger
User @mentions Orbit in a Slack thread. The message hits API Gateway, gets signature-verified, deduplicated, and kicks off a Step Functions workflow.
2. Agent Processing
Step Functions invokes the Orbit agent on AgentCore via callback pattern. Claude Opus 4.6 processes the request with access to CloudWatch, Datadog, Jira, Confluence, and more.
3. Safe Response
Every tool call passes through a four-tier permission guard. Structural shell bypasses and catastrophic commands are auto-denied, dangerous actions require Slack approval, and safe commands auto-allow. Responses are chunked and posted back to the thread.
Architecture
Main Request Flow
From @mention to response — follow the path of a Slack message through the entire serverless pipeline.
Slack / API Gateway
Lambda Functions
Step Functions
AgentCore Runtime
DynamoDB
Slack Workspace
@mention Orbit → API Gateway POST /slack/events
Approve / Reject → API Gateway POST /slack/actions
Two API Gateway HTTP routes receive all Slack traffic. Every request is verified with HMAC-SHA256 before any processing occurs.
The Events route handles @mentions; the Actions route handles interactive button clicks from the HITL approval flow.
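The signature check can be sketched as follows. This is a minimal illustration based on Slack's published signing scheme (`v0:{timestamp}:{body}` signed with the app's signing secret), not Orbit's actual code; the function name, the `now` parameter, and the 5-minute replay window are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_slack_signature(signing_secret: str, timestamp: str, body: str,
                           signature: str, max_skew_seconds: int = 300,
                           now: Optional[float] = None) -> bool:
    """Return True only if the request carries a valid, recent Slack signature."""
    if not signature:
        return False
    try:
        sent_at = int(timestamp)
    except (TypeError, ValueError):
        return False

    # Reject stale timestamps to limit replay attacks (window is an assumption).
    current = time.time() if now is None else now
    if abs(current - sent_at) > max_skew_seconds:
        return False

    # Slack signs the string "v0:<timestamp>:<raw body>" with HMAC-SHA256.
    base = f"v0:{timestamp}:{body}".encode("utf-8")
    digest = hmac.new(signing_secret.encode("utf-8"), base, hashlib.sha256).hexdigest()
    expected = f"v0={digest}"

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

In a Lambda handler, `timestamp` and `signature` would come from the `X-Slack-Request-Timestamp` and `X-Slack-Signature` headers, and `body` must be the raw, unparsed request body.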
Verification Lambda
1. HMAC-SHA256 signature check
2. Dedup via DynamoDB (1h TTL)
3. Start Step Functions
4. Return 200 within 3s
Timeout: 5 seconds
Why 3s? Slack retries if it doesn't get a 200 within 3 seconds. This Lambda must ACK fast, then start async processing via Step Functions.
Dedup: A DynamoDB table with a 1-hour TTL prevents duplicate processing from Slack's retry mechanism (up to 3 retries). The event_id is used as the partition key for atomic conditional puts.
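A conditional put of this kind might look like the sketch below. It assumes `table` is a boto3 DynamoDB `Table` resource (or anything with a compatible `put_item`); the function name and the `expires_at` TTL attribute are hypothetical, since the source only names `event_id` and the 1-hour TTL.

```python
import time
from typing import Optional

DEDUP_TTL_SECONDS = 3600  # 1-hour TTL, as described above


def claim_event(table, event_id: str, now: Optional[float] = None) -> bool:
    """Return True if this call is the first to record event_id, False on a duplicate."""
    current = int(time.time() if now is None else now)
    try:
        table.put_item(
            # "expires_at" is a hypothetical name for the table's TTL attribute.
            Item={"event_id": event_id, "expires_at": current + DEDUP_TTL_SECONDS},
            # The write succeeds only if no item with this partition key exists yet.
            ConditionExpression="attribute_not_exists(event_id)",
        )
        return True
    except Exception as exc:
        # botocore's ClientError exposes the service error code on `.response`.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # a Slack retry of an event that was already claimed
        raise
```

Because the existence check and the write happen in one DynamoDB operation, two concurrent deliveries of the same event cannot both succeed.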
Timeout: 10 seconds
Race prevention: Uses a DynamoDB ConditionExpression that only succeeds if status is still PENDING. A second click fails safely.
Two modes: Tool-level approval (agent polls DynamoDB) and workflow-level approval (SendTaskSuccess to Step Functions).
Step Functions (callback pattern)
▶PostThinking — post "Thinking…" to Slack
▶InvokeAgentWithCallback — waitForTaskToken
▶PostResult — update thread with response
▶Error handlers — 4 catch states
Callback pattern: Step Functions generates a unique task token and PAUSES at zero cost. The agent processes asynchronously and calls SendTaskSuccess when done.
Retry config: 6 attempts, 2s initial delay, 2x backoff, FULL jitter (prevents thundering herd).
Timeouts: 8h max execution, 1h heartbeat deadline.
Error states: PostAgentError, PostHeartbeatTimeout, PostTimeout, PostError, PostErrorNoThinking. Each posts a specific error message back to the Slack thread.
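To make the retry numbers concrete, the sketch below writes them as a Step Functions-style retry policy and computes the resulting delay bounds. Step Functions applies the backoff and jitter itself; the helper functions only illustrate the arithmetic. The `ErrorEquals` value and the reading of "6 attempts" as `MaxAttempts: 6` are assumptions.

```python
import random

RETRY_POLICY = {
    "ErrorEquals": ["States.TaskFailed"],  # assumption: matched errors are not listed in the source
    "IntervalSeconds": 2,   # 2s initial delay
    "MaxAttempts": 6,       # 6 attempts
    "BackoffRate": 2.0,     # 2x backoff
    "JitterStrategy": "FULL",
}


def backoff_ceiling(attempt: int, policy: dict = RETRY_POLICY) -> float:
    """Upper bound on the wait before retry number `attempt` (1-based)."""
    return policy["IntervalSeconds"] * policy["BackoffRate"] ** (attempt - 1)


def jittered_delay(attempt: int, rng: random.Random, policy: dict = RETRY_POLICY) -> float:
    """Full jitter: wait a uniformly random time between 0 and the exponential ceiling."""
    return rng.uniform(0.0, backoff_ceiling(attempt, policy))


if __name__ == "__main__":
    rng = random.Random()
    for attempt in range(1, RETRY_POLICY["MaxAttempts"] + 1):
        print(attempt, backoff_ceiling(attempt), round(jittered_delay(attempt, rng), 2))
```

Under these assumptions the ceilings are 2, 4, 8, 16, 32, and 64 seconds. Randomizing within each ceiling spreads out retries from many executions that failed at the same moment.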
invoke_agent Lambda
Generate session ID from Slack thread — sha256(channel:thread_ts)
Invoke AgentCore with task_token + prompt
Timeout: 30 seconds
Session ID: Deterministic, slack-thread-{sha256(channel:thread_ts)[:40]}. All messages in the same Slack thread share a session, enabling multi-turn conversation context.
Thread history: Fetches the full thread via the Slack conversations.replies API and passes it to AgentCore for context injection.
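The session ID formula translates directly into a few lines of Python. The function name is hypothetical, and the use of a UTF-8 encoded input and a hex digest is an assumption; the source gives only the formula.

```python
import hashlib


def session_id_for_thread(channel: str, thread_ts: str) -> str:
    """Derive a stable session ID from a Slack channel ID and thread timestamp."""
    digest = hashlib.sha256(f"{channel}:{thread_ts}".encode("utf-8")).hexdigest()
    # Keep the first 40 hex characters, as in slack-thread-{sha256(...)[:40]}.
    return f"slack-thread-{digest[:40]}"
```

Because the ID depends only on the channel and thread timestamp, every message in a thread maps to the same session without any lookup table.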
AgentCore Runtime (Orbit)
Spawns background thread, returns ACK
Claude Opus 4.6 processes the request
Sends SFN heartbeats every 30 min
Calls SendTaskSuccess when done
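The four steps above can be sketched as a background worker plus a heartbeat loop. This is an illustration rather than Orbit's implementation: `run_with_callback` and `work` are hypothetical names, and `sfn` is assumed to be a boto3 Step Functions client (the three methods used are real boto3 calls).

```python
import json
import threading


def run_with_callback(sfn, task_token: str, work,
                      heartbeat_seconds: float = 1800.0) -> threading.Thread:
    """Run `work()` in a background thread and report the outcome to Step Functions."""
    done = threading.Event()

    def heartbeat_loop():
        # Event.wait returns False on timeout, so this beats until `done` is set.
        while not done.wait(heartbeat_seconds):
            sfn.send_task_heartbeat(taskToken=task_token)

    def worker():
        beats = threading.Thread(target=heartbeat_loop, daemon=True)
        beats.start()
        try:
            result = work()
        except Exception as exc:
            done.set()
            beats.join()
            sfn.send_task_failure(taskToken=task_token,
                                  error=type(exc).__name__, cause=str(exc))
            return
        done.set()
        beats.join()  # ensure no heartbeat is sent after the final callback
        sfn.send_task_success(taskToken=task_token, output=json.dumps(result))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread  # the caller can return its ACK immediately
```

A 30-minute heartbeat against a 1-hour heartbeat deadline leaves room for one missed beat before Step Functions treats the task as timed out.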
Tool Permission Guard (tool_guard_hook)
SAFE auto-allow — Read, Grep, CloudWatch, Lumigo, etc.
When the tool guard classifies a command as dangerous, the agent pauses and asks a human reviewer via Slack buttons. Fail-closed on timeout.
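Fail-closed behavior means a missing answer counts as a rejection. A minimal polling sketch is shown below; `fetch_status` stands in for a DynamoDB read, and the status names `APPROVED` and `REJECTED`, the timeout, and the poll interval are assumptions (the source only names `PENDING`).

```python
import time


def wait_for_approval(fetch_status, timeout_seconds: float = 300.0,
                      poll_seconds: float = 2.0,
                      clock=time.monotonic, sleep=time.sleep) -> bool:
    """Poll until a reviewer decides; return True only on an explicit approval."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        status = fetch_status()  # e.g. a DynamoDB read of the approval record
        if status == "APPROVED":
            return True
        if status == "REJECTED":
            return False
        sleep(poll_seconds)  # still PENDING: wait and check again
    # Fail closed: no decision before the deadline means the tool call is denied.
    return False
```

The `clock` and `sleep` parameters exist only so the loop can be exercised in tests without real waiting.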
Agent detects danger
Tool classified as DANGEROUS tier
The tool_guard_hook runs before every tool call. When a bash command matches dangerous patterns (rm -rf, kill -9, etc.) or a WebFetch targets an untrusted domain, the agent initiates the approval flow.
post_approval_request
Post Slack buttons
Store approval_id in DynamoDB
Generates a unique approval_id, stores the tool call context (command, arguments, reason) in DynamoDB, and posts a Slack message with [Approve] and [Reject] buttons to the thread.
Slack Buttons
[Approve] [Reject]
Reviewer clicks to decide
handle_interactivity
Atomic DynamoDB update
Prevents double-click
Uses DynamoDB ConditionExpression: only succeeds if status = PENDING. If two reviewers click simultaneously, only the first write wins. Updates the Slack message to show who approved/rejected and when.
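The first-write-wins update could be written roughly as below. It assumes `table` is a boto3 DynamoDB `Table` resource keyed on `approval_id`; the function name, the `decided_by` attribute, and the `APPROVED`/`REJECTED` values are hypothetical.

```python
def record_decision(table, approval_id: str, decision: str, reviewer: str) -> bool:
    """Record a reviewer's decision; return False if someone else decided first."""
    if decision not in ("APPROVED", "REJECTED"):
        raise ValueError(f"unexpected decision: {decision!r}")
    try:
        table.update_item(
            Key={"approval_id": approval_id},
            # "status" is a DynamoDB reserved word, so it is aliased as #s.
            UpdateExpression="SET #s = :decision, decided_by = :reviewer",
            ConditionExpression="#s = :pending",
            ExpressionAttributeNames={"#s": "status"},
            ExpressionAttributeValues={
                ":decision": decision,
                ":reviewer": reviewer,
                ":pending": "PENDING",
            },
        )
        return True
    except Exception as exc:
        # botocore's ClientError exposes the service error code on `.response`.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # the record was no longer PENDING
        raise
```

A `False` return is the "second click fails safely" case: the handler can tell the late reviewer that a decision was already recorded.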
Try typing a bash command to see how the four-tier permission guard classifies it in real-time. Structural shell bypasses and catastrophic commands are auto-denied, dangerous commands require HITL approval, and safe commands auto-allow.
Try these examples:
ls -la /var/log
cat /etc/hosts
rm -rf /tmp/cache
kill -9 1234
:(){ :|:& };:
mkfs.ext4 /dev/sda1
chmod 777 /etc/passwd
systemctl stop nginx
dd if=/dev/zero of=/dev/sda
python3 -c "import os"
kubectl get pods
sed -i 's/foo/bar/' config
xargs rm *.log
shutdown -h now
rm -rf /
echo test | bash
eval "rm -rf /"
bash -c "whoami"
nc -l 4444
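A four-tier classifier of this shape can be sketched with ordered regular-expression rules. The tier names other than SAFE and DANGEROUS, and every pattern below, are illustrative assumptions: the source confirms only that rm -rf and kill -9 count as dangerous, and that structural shell bypasses and catastrophic commands are denied. The pattern lists are deliberately tiny and do not cover all of the examples above.

```python
import re
from enum import Enum


class Tier(Enum):
    BYPASS = "bypass"              # structural shell bypass: auto-deny
    CATASTROPHIC = "catastrophic"  # auto-deny
    DANGEROUS = "dangerous"        # needs Slack approval
    SAFE = "safe"                  # auto-allow


# Rules are checked in order, so the most severe matching tier wins.
RULES = [
    (Tier.BYPASS, [r"\|\s*(?:ba|z)?sh\b", r"\beval\b", r"\b(?:ba|z)?sh\s+-c\b"]),
    (Tier.CATASTROPHIC, [r"\brm\s+-rf\s+/(?:\s|$)", r"\bmkfs\b", r":\(\)\s*\{"]),
    (Tier.DANGEROUS, [r"\brm\s+-rf\b", r"\bkill\s+-9\b"]),
]

ACTIONS = {
    Tier.BYPASS: "deny",
    Tier.CATASTROPHIC: "deny",
    Tier.DANGEROUS: "ask",
    Tier.SAFE: "allow",
}


def classify(command: str) -> Tier:
    """Return the first tier whose patterns match the command."""
    for tier, patterns in RULES:
        if any(re.search(pattern, command) for pattern in patterns):
            return tier
    return Tier.SAFE  # sketch only: unmatched commands fall through to SAFE


def decide(command: str) -> str:
    """Map a command to 'deny', 'ask', or 'allow'."""
    return ACTIONS[classify(command)]
```

With these hypothetical rules, `decide("rm -rf /tmp/cache")` returns `"ask"` and `decide("rm -rf /")` returns `"deny"`. Unmatched commands fall through to SAFE only to keep the sketch short; the source does not say how Orbit treats commands that match no rule.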
Infrastructure
Lambda Functions
12 Python 3.12 Lambda functions on arm64. Lambdas needing slack_sdk share a Lambda Layer.