Architecture

A production-grade agent with clear separation between channels, inference, tools, and storage. Designed to run on constrained hardware without compromising capability.

Overview

The system has four layers:

  1. Channels — Telegram, Discord, Slack, Matrix, IRC, WhatsApp, XMPP, SMS, email, web chat, ScuttleBot (how users reach the agent)
  2. Core loop — message routing, tool dispatch, memory management
  3. Inference — local model + cloud LLM cascade
  4. Tools + Storage — external integrations and persistent state

All channels run concurrently via the ChannelRunner, sharing the same AgentLoop instance. Each channel implements a simple protocol: start(loop), stop(), send_message(user_id, text).

                    ┌─────────────┐
                    │  Telegram   │
                    │    Bot      │
                    └──────┬──────┘
                           │
┌──────────┐        ┌──────┴──────┐        ┌──────────────┐
│   SMS    ├────────┤  AgentLoop  ├────────┤  Tool        │
│ (Termux) │        │  (core)     │        │  Registry    │
└──────────┘        └──────┬──────┘        └──────┬───────┘
                           │                      │
┌──────────┐        ┌──────┴──────┐        ┌──────┴───────┐
│ Web Chat ├────────┤  Inference  │        │ Calendar     │
│ (public) │        │  Cascade    │        │ Email        │
└──────────┘        └─────────────┘        │ Jira         │
                    local → light → heavy  │ Search       │
                                           │ Deploy       │
                    Security boundary:     │ MCP Gateway  │
                    Web visitors get a     └──────────────┘
                    sandboxed WebAgent
                    with NO tool access

Inference cascade

Messages flow through a three-tier cascade. Each tier is optional — if one isn't configured, it's skipped:

TierDefaultWhen it firesCost
Local Phi-3.5 Mini (Q4) Simple messages, routing decisions Free (on-device)
Light Gemini Flash Medium complexity, longer context Very cheap
Heavy Claude Sonnet Complex reasoning, tool orchestration Standard API pricing

The local model acts as a router: it classifies message complexity and decides which tier to forward to. Simple greetings stay local. Complex planning tasks go to the heavy tier. This keeps API costs low while maintaining high quality for tasks that need it.

Security model

The security architecture has one critical boundary:

Web visitors interact with a sandboxed WebAgent that has ZERO access to internal systems. It uses the cloud LLM for conversation only. No tools, no memory stores, no file access, no email, no calendar.

The internal AgentLoop — accessed via Telegram and SMS — has full tool access. These channels are authenticated (Telegram user IDs, phone number allowlists).

Defense in depth

Memory system

Four independent SQLite databases handle different types of persistence:

StoreFileWhat it holds
Conversation conversations.db Full chat history per channel. Used for context in subsequent messages.
Structured memories.db Extracted facts, preferences, relationships. The agent remembers what you told it.
Plans plans.db Multi-step plans with status tracking. "Plan a trip to Tokyo" creates actionable steps.
Knowledge knowledge.db Persistent key-value store for reference information. Survives conversation resets.

All stores use async SQLite (via aiosqlite) and initialize lazily inside the event loop. Web visitors have their own ephemeral in-memory sessions — they never touch these databases.

Tool dispatch

Tools register with a central ToolRegistry. Each tool declares its name, description, and parameter schema. The LLM generates [TOOL:name] blocks in its responses, which the agent loop parses and dispatches.

The dispatch flow:

  1. LLM generates a response containing [TOOL:calendar] create meeting tomorrow at 2pm
  2. Agent loop extracts the tool call
  3. Registry looks up the tool by name
  4. Tool executes (API call, DB query, etc.)
  5. Result feeds back into the next LLM turn

Tools can be backed by MCP servers. The MCPGatewayTool wraps any MCP server as a standard tool, with automatic fallback to REST implementations if the MCP connection fails.

Web presence

The built-in web server (Starlette ASGI) provides:

Everything runs behind Cloudflare Tunnel for TLS termination, DDoS protection, and edge caching.

Agent loop

The AgentLoop is the central conversation engine. For each incoming message:

  1. Load conversation history from memory
  2. Build the message context (system prompt + history + current message)
  3. Check goal alignment (12-Week-Year integration)
  4. Route to appropriate inference tier
  5. Parse and dispatch any tool calls
  6. Extract structured memories from the conversation
  7. Save conversation and return the response

The loop handles tool call chains — if a tool result triggers another tool call, it continues dispatching until the LLM produces a final response.

Sovereign engine

The autonomous engine handles complex, multi-step tasks that go beyond a single conversation turn. Triggered via /engine in Telegram or engine: <task> in any channel.

It runs in a separate context with its own LLM session, using the heavy cloud tier. The engine can:

Destructive operations go through the BlessingGate — a human-in-the-loop approval system that sends a Telegram message asking for confirmation before proceeding.

MCP gateway

The agent acts as an MCP client. MCP servers are launched on-demand (lazy initialization) and communicate over stdio. The MCPGatewayTool wraps each server as a standard tool:

Project structure

src/palmtop/
├── __main__.py          # Entry point
├── persona.py           # Persona config → system prompts
├── brand.py             # HTML email template (persona-driven)
├── config/settings.py   # Config loader (TOML → dataclasses)
├── core/
│   ├── loop.py          # AgentLoop — main conversation engine
│   ├── engine.py        # Sovereign engine (autonomous tasks)
│   ├── blessing.py      # Human-in-the-loop approval gate
│   ├── goal_aligner.py  # 12-Week-Year goal alignment
│   ├── monitor.py       # Proactive monitoring
│   └── tracing.py       # Observability (SQLite / Langfuse)
├── inference/
│   ├── local.py         # llama.cpp backend
│   └── cloud.py         # Anthropic / Google / OpenAI backends
├── channels/
│   ├── telegram.py      # Telegram bot
│   ├── sms.py           # Termux SMS
│   └── sms_listener.py  # Dual-channel SMS listener
├── tools/               # Calendar, email, Jira, search, deploy...
├── memory/              # Conversation, structured, plans
├── knowledge/           # SQLite knowledge base
├── mcp/                 # MCP client, server, gateway
├── voice/               # STT + TTS
├── cursor/              # Cursor Cloud Agents bridge
└── web/
    ├── app.py           # Starlette ASGI server
    ├── agent.py         # Sandboxed WebAgent
    ├── blog.py          # Blog engine
    ├── outreach.py      # Lead qualification + auto-email
    └── static/          # Landing page, CSS, JS, blog posts