EP13 advanced

TradingAgents Architecture: A Deep Dive

Inside the event-driven multi-agent system — 5 core agents, a message bus, shared state, and the decision pipeline from signal to execution.

TradingAgents isn’t one program. It’s five programs that talk to each other, disagree, negotiate, and occasionally override each other. Understanding the architecture before touching the code saves you from the kind of debugging sessions that eat entire weekends.

I’m going to walk through each component, how they communicate, and where the real complexity hides.

System Overview

┌─────────────────────────────────────────────────────┐
│                     Coordinator                     │
│           (orchestration + decision flow)           │
├──────────┬───────────┬───────────┬──────────────────┤
│  Market  │ Analysis  │   Risk    │    Execution     │
│  Data    │  Agent    │   Agent   │    Agent         │
│  Agent   │           │           │                  │
├──────────┴───────────┴───────────┴──────────────────┤
│                Message Bus (pub/sub)                │
├─────────────────────────────────────────────────────┤
│           Shared State Store (portfolio,            │
│           positions, orders, signals)               │
└─────────────────────────────────────────────────────┘

Five agents. One message bus. One shared state store. That’s the entire system.

Agent 1: Market Data Agent

This agent does one thing: get data into the system. It runs continuously, ingesting both real-time and historical data.

Real-time mode: Maintains WebSocket connections to market data providers. When a tick arrives, it publishes a MarketTick event to the message bus. Other agents subscribe to the ticks they care about.

Historical mode: On startup or when a new instrument is added, it backfills historical data into the shared state store. This is also what feeds the backtesting engine — same agent, same code, different data source.

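import asyncio

class MarketDataAgent:
    def __init__(self, bus: MessageBus, state: StateStore, sources: list[DataSource]):
        self.bus = bus
        self.state = state
        self.sources = sources

    async def run(self):
        # Register the tick handler on every source before any connection opens
        for source in self.sources:
            source.on_tick(self._handle_tick)
        # Connect to all sources concurrently; awaiting connect() one by one
        # would stall on the first source if connect() runs for the life of
        # the connection
        await asyncio.gather(*(source.connect() for source in self.sources))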

    def _handle_tick(self, tick: MarketTick):
        self.state.update_price(tick.symbol, tick.price, tick.volume)
        self.bus.publish("market.tick", tick)
        # Also check for price anomalies
        if self._is_anomaly(tick):
            self.bus.publish("market.anomaly", tick)

The anomaly detection is simple but useful: if a single tick moves the price more than 3 standard deviations away from its 20-period mean, the agent publishes a market.anomaly event. The Risk Agent listens for these events.
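
The 3-sigma rule can be sketched as a rolling window per symbol. This is a minimal sketch under my own assumptions, not the actual `_is_anomaly` implementation: the `AnomalyDetector` name, its parameters, and the choice to compare each new price against the previous 20 prices are all hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev


class AnomalyDetector:
    """Flags a price that sits more than `threshold` std devs from the rolling mean.

    Hypothetical helper: window=20 and threshold=3.0 mirror the rule in the text.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.window = window
        self.threshold = threshold
        # One bounded price history per symbol
        self._history = defaultdict(lambda: deque(maxlen=window))

    def is_anomaly(self, symbol: str, price: float) -> bool:
        history = self._history[symbol]
        flagged = False
        # Only judge once a full window of prior prices is available
        if len(history) == self.window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:  # a perfectly flat window gives no scale to compare against
                flagged = abs(price - mu) / sigma > self.threshold
        # The new price joins the window either way, so one spike
        # widens the band for the ticks that follow
        history.append(price)
        return flagged
```

Whether a flagged price should be kept out of the window is a design choice. Keeping it in, as here, means a real regime change stops being flagged after a few ticks.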

Agent 2: Analysis Agent

This is where the intelligence lives. The Analysis Agent consumes market data and produces trading signals.

It runs multiple analysis strategies in parallel:

  • Technical indicators: RSI, MACD, Bollinger Bands, moving average crossovers
  • Pattern recognition: Head-and-shoulders, double bottoms, breakout detection
  • Statistical models: Mean reversion Z-scores, momentum factors, volatility regime detection
  • ML models: Trained models that output predicted returns or signal probabilities

Each strategy produces a Signal object:

@dataclass
class Signal:
    symbol: str
    direction: str          # "long" | "short" | "flat"
    confidence: float       # 0.0 to 1.0
    strategy_name: str
    timestamp: datetime
    metadata: dict          # strategy-specific context

The agent doesn’t make trading decisions. It just produces signals. The decision happens downstream.

class AnalysisAgent:
    def __init__(self, bus: MessageBus, strategies: list[Strategy]):
        self.bus = bus
        self.strategies = strategies
        self.bus.subscribe("market.tick", self._on_tick)

    def _on_tick(self, tick: MarketTick):
        for strategy in self.strategies:
            signal = strategy.evaluate(tick)
            if signal:
                self.bus.publish("analysis.signal", signal)

The separation matters. When a trade loses money, you can trace back: was the signal wrong, or was the signal right but execution was bad? If signals are mixed with decisions, you lose that diagnostic clarity.
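
To make the signal contract concrete, here is a sketch of one strategy that could plug into the loop above. Everything in it is an assumption for illustration: the `MeanReversionStrategy` class, the `MarketTick` fields, the entry threshold, and the way a Z-score is mapped to confidence.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import mean, pstdev
from typing import Optional


@dataclass
class MarketTick:  # assumed shape: only the fields used in this article
    symbol: str
    price: float
    volume: float


@dataclass
class Signal:
    symbol: str
    direction: str
    confidence: float
    strategy_name: str
    timestamp: datetime
    metadata: dict = field(default_factory=dict)


class MeanReversionStrategy:
    """Hypothetical strategy: fade moves beyond `entry_z` standard deviations."""

    def __init__(self, window: int = 20, entry_z: float = 2.0):
        self.window = window
        self.entry_z = entry_z
        self._prices: dict[str, deque] = {}

    def evaluate(self, tick: MarketTick) -> Optional[Signal]:
        prices = self._prices.setdefault(tick.symbol, deque(maxlen=self.window))
        prices.append(tick.price)
        if len(prices) < self.window:
            return None  # not enough history yet
        sigma = pstdev(prices)
        if sigma == 0:
            return None  # flat window, Z-score undefined
        z = (tick.price - mean(prices)) / sigma
        if abs(z) < self.entry_z:
            return None  # no edge: stay silent
        return Signal(
            symbol=tick.symbol,
            direction="short" if z > 0 else "long",
            # |z| == entry_z maps to 0.5, |z| >= 2 * entry_z maps to 1.0
            confidence=min(1.0, 0.5 * abs(z) / self.entry_z),
            strategy_name="mean_reversion_z",
            timestamp=datetime.now(timezone.utc),
            metadata={"z_score": round(z, 3)},
        )
```

Note that the strategy returns a signal or nothing. It never sizes a position or touches the broker, which is the diagnostic separation described above.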

Agent 3: Risk Agent

The Risk Agent is the system’s immune system. It monitors three things: portfolio-level risk, individual position risk, and incoming signal risk.

Portfolio risk: Total exposure, sector concentration, correlation risk, VaR (Value at Risk). If the portfolio is 80% in tech stocks and a new signal says “buy more tech,” the Risk Agent will flag or veto it.
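
The tech-concentration example can be sketched as a single function. The function name, the input shapes, and the 40% limit are hypothetical; the numbers just reproduce the "80% in tech" scenario.

```python
def sector_weight_after(positions: dict[str, float], sectors: dict[str, str],
                        total_value: float, symbol: str, order_value: float) -> float:
    """Portfolio weight of `symbol`'s sector if `order_value` more of it were bought.

    `positions` maps symbol -> market value, `sectors` maps symbol -> sector,
    and `total_value` includes cash (a buy moves cash into the position,
    so the total stays the same).
    """
    if total_value <= 0:
        raise ValueError("total_value must be positive")
    sector = sectors[symbol]
    held = sum(value for sym, value in positions.items() if sectors[sym] == sector)
    return (held + order_value) / total_value


# 80% of a $100K portfolio is already in tech; the signal wants $5K of NVDA
positions = {"AAPL": 50_000.0, "MSFT": 30_000.0, "XOM": 10_000.0}
sectors = {"AAPL": "tech", "MSFT": "tech", "XOM": "energy", "NVDA": "tech"}
weight = sector_weight_after(positions, sectors, 100_000.0, "NVDA", 5_000.0)
status = "FAIL" if weight > 0.40 else "PASS"  # 0.40 is an illustrative limit
```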

Position risk: Per-position drawdown, holding period, unrealized P&L. If a position has been losing for 15 consecutive days, the Risk Agent can force a close.

Signal risk: Before any signal becomes an order, the Risk Agent checks it against constraints.

class RiskAgent:
    def __init__(self, bus: MessageBus, state: StateStore, limits: RiskLimits):
        self.bus = bus
        self.state = state
        self.limits = limits
        self.bus.subscribe("analysis.signal", self._on_signal)

    def _on_signal(self, signal: Signal):
        portfolio = self.state.get_portfolio()
        assessment = self._assess(signal, portfolio)
        self.bus.publish("risk.assessment", assessment)

    def _assess(self, signal: Signal, portfolio: Portfolio) -> RiskAssessment:
        checks = []

        # Max position size
        proposed_size = self._calculate_position_size(signal, portfolio)
        if proposed_size > self.limits.max_position_pct * portfolio.total_value:
            checks.append(RiskCheck("position_size", "FAIL",
                f"Position would be {proposed_size/portfolio.total_value:.1%} of portfolio"))

        # Max drawdown
        if portfolio.current_drawdown > self.limits.max_drawdown:
            checks.append(RiskCheck("drawdown", "FAIL",
                f"Portfolio drawdown {portfolio.current_drawdown:.1%} exceeds limit"))

        # Correlation check
        existing_symbols = [p.symbol for p in portfolio.positions]
        avg_corr = self._avg_correlation(signal.symbol, existing_symbols)
        if avg_corr > self.limits.max_correlation:
            checks.append(RiskCheck("correlation", "WARN",
                f"Avg correlation {avg_corr:.2f} with existing positions"))

        passed = all(c.status != "FAIL" for c in checks)
        return RiskAssessment(signal=signal, approved=passed, checks=checks,
                            suggested_size=proposed_size if passed else 0)

The Risk Agent has veto power. If it says no, the order doesn’t go through. Period. No override from the Analysis Agent. This is a design choice: it’s better to miss a good trade than to take a catastrophic one.
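
The snippet above uses RiskCheck and RiskAssessment without showing them. One hypothetical way to define them, which differs from the constructor call above, is to derive `approved` from the checks instead of passing it in. That way an assessment with a FAIL can never be marked approved by mistake. The field names here are my assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskCheck:
    name: str
    status: str   # "FAIL" vetoes the signal, "WARN" is advisory only
    message: str


@dataclass(frozen=True)
class RiskAssessment:
    symbol: str
    checks: tuple[RiskCheck, ...]
    proposed_size: float

    @property
    def approved(self) -> bool:
        # A single FAIL is a veto; warnings alone do not block the trade
        return all(check.status != "FAIL" for check in self.checks)

    @property
    def suggested_size(self) -> float:
        return self.proposed_size if self.approved else 0.0


warned = RiskAssessment(
    "NVDA", (RiskCheck("correlation", "WARN", "Avg correlation 0.82"),), 5_000.0)
vetoed = RiskAssessment(
    "NVDA", (RiskCheck("drawdown", "FAIL", "Drawdown 12.0% exceeds limit"),), 5_000.0)
```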

Agent 4: Execution Agent

Once a signal passes risk checks, the Execution Agent turns it into a real order.

This is more complex than “send order to broker.” The Execution Agent handles:

  • Order type selection: Market, limit, stop-loss. Depends on urgency and spread.
  • Order splitting: A large order split into smaller chunks to reduce market impact.
  • Broker API integration: Authentication, order submission, status polling, fill confirmation.
  • Slippage tracking: Records the difference between expected price and actual fill price.

class ExecutionAgent:
    def __init__(self, bus: MessageBus, state: StateStore, broker: BrokerAPI):
        self.bus = bus
        self.state = state
        self.broker = broker
        self.bus.subscribe("risk.assessment", self._on_assessment)

    def _on_assessment(self, assessment: RiskAssessment):
        if not assessment.approved:
            return

        order = self._create_order(assessment)
        fill = self.broker.submit_order(order)

        self.state.update_position(fill)
        self.bus.publish("execution.fill", fill)

        # Track slippage
        slippage = abs(fill.price - order.expected_price) / order.expected_price
        self.bus.publish("execution.slippage", {
            "symbol": order.symbol,
            "slippage_bps": slippage * 10000,
        })
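
The snippet above hides order type selection and order splitting inside `_create_order`. A minimal sketch of both follows, with hypothetical function names and thresholds. Real splitting logic would also schedule the child orders over time, which this does not attempt.

```python
def choose_order_type(spread_bps: float, urgent: bool,
                      max_spread_bps: float = 5.0) -> str:
    """Pick "market" when speed matters or the spread is tight, else "limit"."""
    if urgent or spread_bps <= max_spread_bps:
        return "market"
    return "limit"


def split_order(total_qty: int, max_child_qty: int) -> list[int]:
    """Split `total_qty` shares into child orders of at most `max_child_qty`."""
    if total_qty <= 0 or max_child_qty <= 0:
        raise ValueError("quantities must be positive")
    full_chunks, remainder = divmod(total_qty, max_child_qty)
    children = [max_child_qty] * full_chunks
    if remainder:
        children.append(remainder)  # last child carries the leftover shares
    return children


children = split_order(10_000, 3_000)
```

Here a 10,000-share order becomes three children of 3,000 and one of 1,000.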

Agent 5: Coordinator

The Coordinator doesn’t analyze data or manage risk. It manages the other agents.

Responsibilities:

  • Startup sequence: Ensures Market Data Agent is connected and backfilled before Analysis Agent starts generating signals.
  • Health monitoring: Pings each agent. If one stops responding, it restarts it or enters a safe mode (close all positions, stop new orders).
  • Decision pipeline: Enforces the signal → risk → execution flow. No shortcuts.
  • Configuration: Pushes updated parameters to agents without restarting the system.

import asyncio

class Coordinator:
    def __init__(self, agents: dict[str, Agent], bus: MessageBus):
        self.agents = agents
        self.bus = bus

    async def start(self):
        # Strict startup order
        await self.agents["market_data"].start()
        await self._wait_for_backfill()
        await self.agents["analysis"].start()
        await self.agents["risk"].start()
        await self.agents["execution"].start()

        # Monitor health
        while True:
            await asyncio.sleep(30)
            for name, agent in self.agents.items():
                if not agent.is_healthy():
                    await self._handle_unhealthy(name, agent)
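
The body of `_handle_unhealthy` is not shown. A sketch of the "restart it or enter safe mode" policy might look like the following. The `restart()` method, the retry count, the backoff, and the `FlakyAgent` stand-in are all assumptions of mine.

```python
import asyncio


class FlakyAgent:
    """Stand-in agent for the example; real agents would expose the same methods."""

    def __init__(self, recovers: bool):
        self.recovers = recovers
        self.healthy = False
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.healthy

    async def restart(self) -> None:
        self.restarts += 1
        self.healthy = self.recovers


async def handle_unhealthy(agent, enter_safe_mode, max_restarts: int = 3,
                           backoff_s: float = 0.0) -> bool:
    """Try to restart `agent`; fall back to safe mode if it stays unhealthy.

    Returns True if the agent recovered, False if safe mode was entered.
    """
    for attempt in range(max_restarts):
        await agent.restart()
        if agent.is_healthy():
            return True
        # Exponential backoff between attempts (0 by default to keep the demo fast)
        await asyncio.sleep(backoff_s * (2 ** attempt))
    # Safe mode as described above: close all positions, stop new orders
    await enter_safe_mode()
    return False
```

Passing `enter_safe_mode` in as a coroutine function keeps the policy testable without a broker.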

The Message Bus

Agents communicate through publish/subscribe. No direct calls between agents. This decoupling is what makes the system testable — you can run the Analysis Agent alone with mock market data and verify its signals without a live broker connection.
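
Here is a minimal in-process sketch of that idea, with a toy agent tested in isolation. The `MessageBus` internals and the `ThresholdAgent` are hypothetical; a production bus would add async dispatch, error isolation per handler, and persistence.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class MessageBus:
    """Minimal synchronous pub/sub bus (a sketch, not a production broker)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Iterate over a copy so a handler may subscribe during dispatch
        for handler in list(self._handlers.get(topic, ())):
            handler(event)


class ThresholdAgent:
    """Toy stand-in for an analysis agent: signals when price exceeds a level."""

    def __init__(self, bus: MessageBus, level: float) -> None:
        self.bus = bus
        self.level = level
        bus.subscribe("market.tick", self._on_tick)

    def _on_tick(self, tick: dict) -> None:
        if tick["price"] > self.level:
            self.bus.publish(
                "analysis.signal", {"symbol": tick["symbol"], "direction": "long"})


# Test the agent alone: mock ticks in, captured signals out, no broker involved
bus = MessageBus()
captured: list[dict] = []
bus.subscribe("analysis.signal", captured.append)
agent = ThresholdAgent(bus, level=100.0)
bus.publish("market.tick", {"symbol": "AAPL", "price": 99.0})
bus.publish("market.tick", {"symbol": "AAPL", "price": 101.0})
```

After the two mock ticks, `captured` holds exactly one long signal for AAPL.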

The event types flow in one direction:

market.tick → analysis.signal → risk.assessment → execution.fill
                                                 → execution.slippage
market.anomaly → risk.alert

Each event is immutable. Once published, it can’t be modified. Agents react to events by publishing new events, never by mutating existing ones.
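
In Python, one way to get this guarantee is a frozen dataclass. This is a sketch of the idea, not necessarily how the events are defined in the codebase, and the `MarketTick` fields are assumed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MarketTick:
    symbol: str
    price: float
    volume: float


tick = MarketTick("AAPL", 190.0, 500.0)

try:
    tick.price = 191.0            # mutation attempt on a published event
except AttributeError as exc:     # FrozenInstanceError subclasses AttributeError
    rejected = type(exc).__name__

# React by creating a new event instead of editing the old one
corrected = replace(tick, price=191.0)
```

One caveat: `frozen=True` only blocks attribute assignment. A dict or list stored inside an event can still be mutated in place, so event fields should hold immutable values too.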

Shared State Store

The state store is the system’s single source of truth for mutable data: current positions, pending orders, cash balance, and historical fills.

All agents read from it. Only specific agents write to specific sections:

Section    | Written By         | Read By
Prices     | Market Data Agent  | All
Positions  | Execution Agent    | Risk, Analysis
Orders     | Execution Agent    | Risk, Coordinator
Signals    | Analysis Agent     | Risk

This prevents conflicts. The Risk Agent never modifies positions directly — it can only approve or reject signals that the Execution Agent then acts on.
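
The write-ownership rule can be enforced in code instead of by convention. The sketch below is hypothetical: the agent identifiers, the `write`/`read` method names, and the in-memory dict are my assumptions, and read permissions are left unenforced for brevity.

```python
class StateStore:
    """Hypothetical in-memory store that allows exactly one writer per section."""

    # Mirrors the "Written By" column of the ownership table
    WRITERS = {
        "prices": "market_data",
        "positions": "execution",
        "orders": "execution",
        "signals": "analysis",
    }

    def __init__(self) -> None:
        self._data: dict[str, dict] = {section: {} for section in self.WRITERS}

    def write(self, agent: str, section: str, key: str, value) -> None:
        owner = self.WRITERS[section]
        if agent != owner:
            raise PermissionError(
                f"{agent} may not write to {section} (owner: {owner})")
        self._data[section][key] = value

    def read(self, section: str, key: str, default=None):
        return self._data[section].get(key, default)


store = StateStore()
store.write("market_data", "prices", "AAPL", 190.0)

try:
    store.write("risk", "positions", "AAPL", 0)  # Risk Agent tries to close a position
except PermissionError as exc:
    denied = str(exc)
```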

Scaling Considerations

Horizontal scaling of Analysis Agents: If you run 20 strategies, you can split them across multiple Analysis Agent instances. Each publishes to the same message bus. The Risk Agent aggregates signals regardless of source.

Vertical scaling of Market Data Agent: Real-time data ingestion is I/O-bound, not CPU-bound. Scaling up means more WebSocket connections, more memory for tick buffers, and faster network. One beefy machine usually beats three small ones for data ingestion.

Database choice: The state store can be in-memory (fastest, not durable), Redis (fast, somewhat durable), or PostgreSQL (slowest, fully durable). For live trading, I use Redis with periodic snapshots to PostgreSQL. For backtesting, in-memory only — no reason to persist intermediate state.

When Is This Worth the Complexity?

Not every portfolio needs five agents and a message bus.

Situation                                   | Recommendation
< $100K, < 10 instruments                   | Single-script backtester is fine
$100K-$1M, 10-50 instruments                | Multi-agent adds value in risk management
> $1M or > 50 instruments                   | Multi-agent is practically required
Multiple strategies running simultaneously  | Multi-agent from day one
Regulatory reporting requirements           | Multi-agent: you need the audit trail

The audit trail point is underrated. Every event on the message bus can be logged. When a regulator asks “why did you buy 10,000 shares at 14:32?”, you can replay the exact sequence: this tick arrived, this signal fired, this risk check passed, this order was submitted. Try doing that with a single-file script.
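
A sketch of that replay, assuming every bus event is also appended to a log as one JSON line. The `AuditLog` class, its field names, and the sample payloads are made up for illustration; a real log would go to durable storage, not a Python list.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical append-only log of bus events, stored as JSON lines."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def record(self, topic: str, payload: dict) -> None:
        entry = {
            "seq": len(self._lines),  # publication order
            "ts": datetime.now(timezone.utc).isoformat(),
            "topic": topic,
            "payload": payload,
        }
        self._lines.append(json.dumps(entry, sort_keys=True))

    def replay(self, symbol: str) -> list[dict]:
        """Return every logged event for `symbol`, in publication order."""
        events = (json.loads(line) for line in self._lines)
        return [e for e in events if e["payload"].get("symbol") == symbol]


log = AuditLog()
log.record("market.tick", {"symbol": "AAPL", "price": 190.0})
log.record("market.tick", {"symbol": "MSFT", "price": 410.0})
log.record("analysis.signal", {"symbol": "AAPL", "direction": "long"})
log.record("risk.assessment", {"symbol": "AAPL", "approved": True})
log.record("execution.fill", {"symbol": "AAPL", "qty": 10_000, "price": 190.02})

trail = [event["topic"] for event in log.replay("AAPL")]
```

`trail` then reads as the decision pipeline for that one trade: tick, signal, risk assessment, fill.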

The Honest Assessment

This architecture is battle-tested in institutional trading. It’s not experimental. But it carries real operational overhead. You’re running five processes, a message bus, and a state store. Monitoring, alerting, and deployment are all more complex than a single Python script.

Start with the single-script approach. When you feel the pain — signals getting missed because risk checks are blocking the event loop, or you can’t debug why a trade happened — that’s when the multi-agent architecture earns its keep.