Architecture
A production-grade agent with clear separation between channels, inference, tools, and storage. Designed to run on constrained hardware without compromising capability.
Overview
The system has four layers:
- Channels — Telegram, Discord, Slack, Matrix, IRC, WhatsApp, XMPP, SMS, email, web chat, ScuttleBot (how users reach the agent)
- Core loop — message routing, tool dispatch, memory management
- Inference — local model + cloud LLM cascade
- Tools + Storage — external integrations and persistent state
All channels run concurrently via the ChannelRunner, sharing the same AgentLoop instance. Each channel implements a simple protocol: start(loop), stop(), send_message(user_id, text).
┌─────────────┐
│ Telegram │
│ Bot │
└──────┬──────┘
│
┌──────────┐ ┌──────┴──────┐ ┌──────────────┐
│ SMS ├────────┤ AgentLoop ├────────┤ Tool │
│ (Termux) │ │ (core) │ │ Registry │
└──────────┘ └──────┬──────┘ └──────┬───────┘
│ │
┌──────────┐ ┌──────┴──────┐ ┌──────┴───────┐
│ Web Chat ├────────┤ Inference │ │ Calendar │
│ (public) │ │ Cascade │ │ Email │
└──────────┘ └─────────────┘ │ Jira │
local → light → heavy │ Search │
│ Deploy │
Security boundary: │ MCP Gateway │
Web visitors get a └──────────────┘
sandboxed WebAgent
with NO tool access
Inference cascade
Messages flow through a three-tier cascade. Each tier is optional — if one isn't configured, it's skipped:
| Tier | Default | When it fires | Cost |
|---|---|---|---|
| Local | Phi-3.5 Mini (Q4) | Simple messages, routing decisions | Free (on-device) |
| Light | Gemini Flash | Medium complexity, longer context | Very cheap |
| Heavy | Claude Sonnet | Complex reasoning, tool orchestration | Standard API pricing |
The local model acts as a router: it classifies message complexity and decides which tier to forward to. Simple greetings stay local. Complex planning tasks go to the heavy tier. This keeps API costs low while maintaining high quality for tasks that need it.
Security model
The security architecture has one critical boundary:
Web visitors interact with a sandboxed WebAgent that has ZERO access to internal systems. It uses the cloud LLM for conversation only. No tools, no memory stores, no file access, no email, no calendar.
The internal AgentLoop — accessed via Telegram and SMS — has full tool access. These channels are authenticated (Telegram user IDs, phone number allowlists).
Defense in depth
- Input sanitization — all web inputs are cleaned and length-limited before reaching the LLM
- Rate limiting — per-IP chat and form rate limits prevent abuse
- Session caps — web sessions expire and message counts are limited
- Suspicious input logging — prompt injection attempts are flagged
- HTTPS enforcement — Cloudflare Tunnel terminates TLS; HTTP redirects to HTTPS
- CSP headers —
script-src 'self'blocks all inline scripts - Blessing gate — destructive operations (deploy, Cursor tasks) require Telegram approval
Memory system
Four independent SQLite databases handle different types of persistence:
| Store | File | What it holds |
|---|---|---|
| Conversation | conversations.db |
Full chat history per channel. Used for context in subsequent messages. |
| Structured | memories.db |
Extracted facts, preferences, relationships. The agent remembers what you told it. |
| Plans | plans.db |
Multi-step plans with status tracking. "Plan a trip to Tokyo" creates actionable steps. |
| Knowledge | knowledge.db |
Persistent key-value store for reference information. Survives conversation resets. |
All stores use async SQLite (via aiosqlite) and initialize lazily inside the event loop. Web visitors have their own ephemeral in-memory sessions — they never touch these databases.
Tool dispatch
Tools register with a central ToolRegistry. Each tool declares its name, description, and parameter schema. The LLM generates [TOOL:name] blocks in its responses, which the agent loop parses and dispatches.
The dispatch flow:
- LLM generates a response containing
[TOOL:calendar] create meeting tomorrow at 2pm - Agent loop extracts the tool call
- Registry looks up the tool by name
- Tool executes (API call, DB query, etc.)
- Result feeds back into the next LLM turn
Tools can be backed by MCP servers. The MCPGatewayTool wraps any MCP server as a standard tool, with automatic fallback to REST implementations if the MCP connection fails.
Web presence
The built-in web server (Starlette ASGI) provides:
- Landing page — configurable via
static/index.htmlwith persona placeholders - Chat widget — SSE-based chat with the sandboxed WebAgent
- Intake form — collects leads, notifies you on Telegram
- Blog engine — zero-dependency markdown-to-HTML with frontmatter parsing
- Lead outreach — auto-qualifies leads via LLM and sends branded follow-up emails
Everything runs behind Cloudflare Tunnel for TLS termination, DDoS protection, and edge caching.
Agent loop
The AgentLoop is the central conversation engine. For each incoming message:
- Load conversation history from memory
- Build the message context (system prompt + history + current message)
- Check goal alignment (12-Week-Year integration)
- Route to appropriate inference tier
- Parse and dispatch any tool calls
- Extract structured memories from the conversation
- Save conversation and return the response
The loop handles tool call chains — if a tool result triggers another tool call, it continues dispatching until the LLM produces a final response.
Sovereign engine
The autonomous engine handles complex, multi-step tasks that go beyond a single conversation turn. Triggered via /engine in Telegram or engine: <task> in any channel.
It runs in a separate context with its own LLM session, using the heavy cloud tier. The engine can:
- Break tasks into steps and execute them sequentially
- Use all registered tools
- Access structured memory and plans for context
- Report progress back through the channel
Destructive operations go through the BlessingGate — a human-in-the-loop approval system that sends a Telegram message asking for confirmation before proceeding.
MCP gateway
The agent acts as an MCP client. MCP servers are launched on-demand (lazy initialization) and communicate over stdio. The MCPGatewayTool wraps each server as a standard tool:
- Server starts on first tool use, not at boot
- Automatic reconnection on failure
- Optional REST fallback tools (e.g., Jira REST if Atlassian MCP fails)
- Environment variables auto-mapped from config sections
Project structure
src/palmtop/
├── __main__.py # Entry point
├── persona.py # Persona config → system prompts
├── brand.py # HTML email template (persona-driven)
├── config/settings.py # Config loader (TOML → dataclasses)
├── core/
│ ├── loop.py # AgentLoop — main conversation engine
│ ├── engine.py # Sovereign engine (autonomous tasks)
│ ├── blessing.py # Human-in-the-loop approval gate
│ ├── goal_aligner.py # 12-Week-Year goal alignment
│ ├── monitor.py # Proactive monitoring
│ └── tracing.py # Observability (SQLite / Langfuse)
├── inference/
│ ├── local.py # llama.cpp backend
│ └── cloud.py # Anthropic / Google / OpenAI backends
├── channels/
│ ├── telegram.py # Telegram bot
│ ├── sms.py # Termux SMS
│ └── sms_listener.py # Dual-channel SMS listener
├── tools/ # Calendar, email, Jira, search, deploy...
├── memory/ # Conversation, structured, plans
├── knowledge/ # SQLite knowledge base
├── mcp/ # MCP client, server, gateway
├── voice/ # STT + TTS
├── cursor/ # Cursor Cloud Agents bridge
└── web/
├── app.py # Starlette ASGI server
├── agent.py # Sandboxed WebAgent
├── blog.py # Blog engine
├── outreach.py # Lead qualification + auto-email
└── static/ # Landing page, CSS, JS, blog posts