TradingAgents Architecture: A Deep Dive
Inside the event-driven multi-agent system — 5 core agents, a message bus, shared state, and the decision pipeline from signal to execution.
TradingAgents isn’t one program. It’s five programs that talk to each other, disagree, negotiate, and occasionally override each other. Understanding the architecture before touching the code saves you from the kind of debugging sessions that eat entire weekends.
I’m going to walk through each component, how they communicate, and where the real complexity hides.
System Overview
┌──────────────────────────────────────────────────────┐
│ Coordinator │
│ (orchestration + decision flow) │
├──────────┬───────────┬───────────┬──────────────────┤
│ Market │ Analysis │ Risk │ Execution │
│ Data │ Agent │ Agent │ Agent │
│ Agent │ │ │ │
├──────────┴───────────┴───────────┴──────────────────┤
│ Message Bus (pub/sub) │
├─────────────────────────────────────────────────────┤
│ Shared State Store (portfolio, │
│ positions, orders, signals) │
└─────────────────────────────────────────────────────┘
Five agents. One message bus. One shared state store. That’s the entire system.
Agent 1: Market Data Agent
This agent does one thing: get data into the system. It runs continuously, ingesting both real-time and historical data.
Real-time mode: Maintains WebSocket connections to market data providers. When a tick arrives, it publishes a MarketTick event to the message bus. Other agents subscribe to the ticks they care about.
Historical mode: On startup or when a new instrument is added, it backfills historical data into the shared state store. This is also what feeds the backtesting engine — same agent, same code, different data source.
class MarketDataAgent:
    def __init__(self, bus: MessageBus, state: StateStore, sources: list[DataSource]):
        self.bus = bus
        self.state = state
        self.sources = sources

    async def run(self):
        # Register the tick callback on each source, then connect to it
        for source in self.sources:
            source.on_tick(self._handle_tick)
            await source.connect()

    def _handle_tick(self, tick: MarketTick):
        self.state.update_price(tick.symbol, tick.price, tick.volume)
        self.bus.publish("market.tick", tick)
        # Also check for price anomalies
        if self._is_anomaly(tick):
            self.bus.publish("market.anomaly", tick)
The anomaly detection is simple but useful: if a single tick lands more than 3 standard deviations away from the 20-period mean price, the agent publishes a market.anomaly event. The Risk Agent listens for these events.
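The article doesn't show the body of `_is_anomaly`, so here is a minimal sketch of the 3-sigma rule under my own assumptions: one detector per symbol, a window of the last 20 prices, and population standard deviation. The class name `RollingAnomalyDetector` and its parameters are hypothetical, not part of TradingAgents.

```python
from collections import deque
from statistics import mean, pstdev


class RollingAnomalyDetector:
    """Flags a price more than `threshold` standard deviations away from
    the mean of the previous `window` prices (hypothetical helper)."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.window = window
        self.threshold = threshold
        self.prices = deque(maxlen=window)

    def is_anomaly(self, price: float) -> bool:
        flagged = False
        # Only test once the window is full; earlier readings are too noisy.
        if len(self.prices) == self.window:
            mu = mean(self.prices)
            sigma = pstdev(self.prices)
            # A perfectly flat window has zero deviation; skip the check
            # instead of dividing by zero.
            if sigma > 0:
                flagged = abs(price - mu) / sigma > self.threshold
        # The new price joins the window only after being tested against it.
        self.prices.append(price)
        return flagged


detector = RollingAnomalyDetector()
history = [100.0 + 0.1 * (i % 5) for i in range(20)]  # quiet market
warmup_flags = [detector.is_anomaly(p) for p in history]
spike_flag = detector.is_anomaly(105.0)  # sudden jump
```

Testing the tick against the window before appending it matters: if the spike were added first, it would inflate the standard deviation and partly hide itself.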
Agent 2: Analysis Agent
This is where the intelligence lives. The Analysis Agent consumes market data and produces trading signals.
It runs multiple analysis strategies in parallel:
- Technical indicators: RSI, MACD, Bollinger Bands, moving average crossovers
- Pattern recognition: Head-and-shoulders, double bottoms, breakout detection
- Statistical models: Mean reversion Z-scores, momentum factors, volatility regime detection
- ML models: Trained models that output predicted returns or signal probabilities
Each strategy produces a Signal object:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Signal:
    symbol: str
    direction: str       # "long" | "short" | "flat"
    confidence: float    # 0.0 to 1.0
    strategy_name: str
    timestamp: datetime
    metadata: dict       # strategy-specific context
The agent doesn’t make trading decisions. It just produces signals. The decision happens downstream.
class AnalysisAgent:
    def __init__(self, bus: MessageBus, strategies: list[Strategy]):
        self.bus = bus
        self.strategies = strategies
        self.bus.subscribe("market.tick", self._on_tick)

    def _on_tick(self, tick: MarketTick):
        # Every strategy sees every tick; a strategy returns None when it has nothing to say
        for strategy in self.strategies:
            signal = strategy.evaluate(tick)
            if signal:
                self.bus.publish("analysis.signal", signal)
The separation matters. When a trade loses money, you can trace back: was the signal wrong, or was the signal right but execution was bad? If signals are mixed with decisions, you lose that diagnostic clarity.
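To make the `strategy.evaluate(tick)` contract concrete, here is a sketch of one strategy from the list above, a moving average crossover. The `MarketTick` fields, the window lengths, and the confidence formula (a 1% gap between the averages maps to full confidence) are my assumptions for illustration, not the project's actual values.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MarketTick:
    # Minimal stand-in; the real MarketTick fields are not shown in the article.
    symbol: str
    price: float
    volume: float = 0.0


@dataclass
class Signal:
    symbol: str
    direction: str  # "long" | "short" | "flat"
    confidence: float  # 0.0 to 1.0
    strategy_name: str
    timestamp: datetime
    metadata: dict = field(default_factory=dict)


class MovingAverageCrossover:
    """Emits a signal only when the fast average crosses the slow one."""

    def __init__(self, fast: int = 2, slow: int = 4):
        if fast >= slow:
            raise ValueError("fast window must be shorter than slow window")
        self.fast = fast
        self.slow = slow
        self.prices = defaultdict(lambda: deque(maxlen=slow))
        self.last_side = {}  # symbol -> side seen on the previous tick

    def evaluate(self, tick: MarketTick) -> Optional[Signal]:
        window = self.prices[tick.symbol]
        window.append(tick.price)
        if len(window) < self.slow:
            return None  # not enough history yet
        recent = list(window)
        fast_ma = sum(recent[-self.fast:]) / self.fast
        slow_ma = sum(recent) / self.slow
        if fast_ma == slow_ma:
            return None
        side = "long" if fast_ma > slow_ma else "short"
        previous = self.last_side.get(tick.symbol)
        self.last_side[tick.symbol] = side
        if previous is None or previous == side:
            return None  # first reading, or no change of side: not a crossover
        gap = abs(fast_ma - slow_ma) / slow_ma
        return Signal(
            symbol=tick.symbol,
            direction=side,
            confidence=min(1.0, gap / 0.01),  # assumed scaling, tune per market
            strategy_name="ma_crossover",
            timestamp=datetime.now(timezone.utc),
            metadata={"fast_ma": fast_ma, "slow_ma": slow_ma},
        )


strategy = MovingAverageCrossover(fast=2, slow=4)
signals = [strategy.evaluate(MarketTick("AAPL", p)) for p in (10, 9, 8, 7, 8, 10)]
```

Note that the strategy says nothing about position size or whether to trade at all. It returns a `Signal` or `None`, which keeps the diagnostic separation described above.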
Agent 3: Risk Agent
The Risk Agent is the system’s immune system. It monitors three things: portfolio-level risk, individual position risk, and incoming signal risk.
Portfolio risk: Total exposure, sector concentration, correlation risk, VaR (Value at Risk). If the portfolio is 80% in tech stocks and a new signal says “buy more tech,” the Risk Agent will flag or veto it.
Position risk: Per-position drawdown, holding period, unrealized P&L. If a position has been losing for 15 consecutive days, the Risk Agent can force a close.
Signal risk: Before any signal becomes an order, the Risk Agent checks it against constraints.
class RiskAgent:
    def __init__(self, bus: MessageBus, state: StateStore, limits: RiskLimits):
        self.bus = bus
        self.state = state
        self.limits = limits
        self.bus.subscribe("analysis.signal", self._on_signal)

    def _on_signal(self, signal: Signal):
        portfolio = self.state.get_portfolio()
        assessment = self._assess(signal, portfolio)
        self.bus.publish("risk.assessment", assessment)

    def _assess(self, signal: Signal, portfolio: Portfolio) -> RiskAssessment:
        checks = []
        # Max position size
        proposed_size = self._calculate_position_size(signal, portfolio)
        if proposed_size > self.limits.max_position_pct * portfolio.total_value:
            checks.append(RiskCheck("position_size", "FAIL",
                f"Position would be {proposed_size/portfolio.total_value:.1%} of portfolio"))
        # Max drawdown
        if portfolio.current_drawdown > self.limits.max_drawdown:
            checks.append(RiskCheck("drawdown", "FAIL",
                f"Portfolio drawdown {portfolio.current_drawdown:.1%} exceeds limit"))
        # Correlation check (a WARN does not block the trade)
        existing_symbols = [p.symbol for p in portfolio.positions]
        avg_corr = self._avg_correlation(signal.symbol, existing_symbols)
        if avg_corr > self.limits.max_correlation:
            checks.append(RiskCheck("correlation", "WARN",
                f"Avg correlation {avg_corr:.2f} with existing positions"))
        # Approved only if no check failed
        passed = all(c.status != "FAIL" for c in checks)
        return RiskAssessment(signal=signal, approved=passed, checks=checks,
                              suggested_size=proposed_size if passed else 0)
The Risk Agent has veto power. If it says no, the order doesn’t go through. Period. No override from the Analysis Agent. This is a design choice: it’s better to miss a good trade than to take a catastrophic one.
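The sector concentration example from earlier (a portfolio that is 80% tech receiving another tech signal) isn't covered by the checks in `_assess`. It could be added as one more check in the same style. This is a sketch under assumptions: the `Position` shape, the `max_sector_pct` default of 40%, and measuring weight against invested value only (ignoring cash) are all mine.

```python
from dataclasses import dataclass


@dataclass
class Position:
    # Hypothetical shape; the article's Position class is not shown.
    symbol: str
    sector: str
    market_value: float


def sector_weight_after(positions, sector, added_value):
    """Weight of `sector` once `added_value` of new exposure is added to it."""
    total = sum(p.market_value for p in positions) + added_value
    if total <= 0:
        return 0.0
    in_sector = sum(p.market_value for p in positions if p.sector == sector)
    return (in_sector + added_value) / total


def check_sector_concentration(positions, sector, added_value, max_sector_pct=0.40):
    """Returns (status, detail), mirroring the fields passed to RiskCheck."""
    weight = sector_weight_after(positions, sector, added_value)
    status = "FAIL" if weight > max_sector_pct else "PASS"
    return status, f"{sector} would be {weight:.1%} of portfolio"


book = [
    Position("AAPL", "tech", 50_000.0),
    Position("MSFT", "tech", 30_000.0),
    Position("JNJ", "health", 20_000.0),
]
tech_result = check_sector_concentration(book, "tech", 10_000.0)
health_result = check_sector_concentration(book, "health", 5_000.0)
```

Whether a breach should be a hard "FAIL" or a "WARN" like the correlation check is a policy decision; the sketch picks "FAIL" only to show the veto path.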
Agent 4: Execution Agent
Once a signal passes risk checks, the Execution Agent turns it into a real order.
This is more complex than “send order to broker.” The Execution Agent handles:
- Order type selection: Market, limit, stop-loss. Depends on urgency and spread.
- Order splitting: A large order split into smaller chunks to reduce market impact.
- Broker API integration: Authentication, order submission, status polling, fill confirmation.
- Slippage tracking: Records the difference between expected price and actual fill price.
class ExecutionAgent:
    def __init__(self, bus: MessageBus, state: StateStore, broker: BrokerAPI):
        self.bus = bus
        self.state = state
        self.broker = broker
        self.bus.subscribe("risk.assessment", self._on_assessment)

    def _on_assessment(self, assessment: RiskAssessment):
        # Rejected signals stop here; the veto is final
        if not assessment.approved:
            return
        order = self._create_order(assessment)
        fill = self.broker.submit_order(order)
        self.state.update_position(fill)
        self.bus.publish("execution.fill", fill)
        # Track slippage (relative difference, reported in basis points)
        slippage = abs(fill.price - order.expected_price) / order.expected_price
        self.bus.publish("execution.slippage", {
            "symbol": order.symbol,
            "slippage_bps": slippage * 10000,
        })
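The code above submits one order per assessment, so the order splitting mentioned in the list isn't shown. A minimal sketch of the simplest version, fixed-size child orders, might look like this. The function name `split_order` and the idea of a fixed `max_child_qty` are assumptions; production systems often use time- or volume-weighted schedules instead.

```python
def split_order(total_qty: int, max_child_qty: int) -> list[int]:
    """Splits a parent quantity into child quantities no larger than max_child_qty."""
    if total_qty <= 0 or max_child_qty <= 0:
        raise ValueError("quantities must be positive")
    full_chunks, remainder = divmod(total_qty, max_child_qty)
    children = [max_child_qty] * full_chunks
    if remainder:
        children.append(remainder)  # the last child carries what is left over
    return children


children = split_order(10_000, 3_000)
small = split_order(500, 3_000)
```

Each child would then go through `broker.submit_order` separately, and slippage would be tracked per child fill against the parent's expected price.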
Agent 5: Coordinator
The Coordinator doesn’t analyze data or manage risk. It manages the other agents.
Responsibilities:
- Startup sequence: Ensures Market Data Agent is connected and backfilled before Analysis Agent starts generating signals.
- Health monitoring: Pings each agent. If one stops responding, it restarts it or enters a safe mode (close all positions, stop new orders).
- Decision pipeline: Enforces the signal → risk → execution flow. No shortcuts.
- Configuration: Pushes updated parameters to agents without restarting the system.
import asyncio

class Coordinator:
    def __init__(self, agents: dict[str, Agent], bus: MessageBus):
        self.agents = agents
        self.bus = bus

    async def start(self):
        # Strict startup order
        await self.agents["market_data"].start()
        await self._wait_for_backfill()
        await self.agents["analysis"].start()
        await self.agents["risk"].start()
        await self.agents["execution"].start()
        # Monitor health every 30 seconds
        while True:
            await asyncio.sleep(30)
            for name, agent in self.agents.items():
                if not agent.is_healthy():
                    await self._handle_unhealthy(name, agent)
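`_handle_unhealthy` isn't shown, but the responsibilities list describes its behavior: restart the agent, or fall back to safe mode. One way that could look, as a sketch: the retry count, the backoff, the `enter_safe_mode` callback, and the `StubAgent` test double are all assumptions of mine.

```python
import asyncio


class StubAgent:
    """Test double: becomes healthy after `recovers_after` starts (None = never)."""

    def __init__(self, recovers_after):
        self.recovers_after = recovers_after
        self.starts = 0

    async def start(self):
        self.starts += 1

    def is_healthy(self):
        return self.recovers_after is not None and self.starts >= self.recovers_after


async def handle_unhealthy(name, agent, enter_safe_mode, max_restarts=3, delay=0.0):
    """Tries to restart the agent; enters safe mode if it never recovers."""
    for attempt in range(1, max_restarts + 1):
        await agent.start()
        if agent.is_healthy():
            return "restarted"
        # Linear backoff between attempts (zero in this demo to keep it fast).
        await asyncio.sleep(delay * attempt)
    await enter_safe_mode(name)
    return "safe_mode"


async def demo():
    safe_mode_calls = []

    async def enter_safe_mode(name):
        # A real implementation would close all positions and stop new orders.
        safe_mode_calls.append(name)

    first = await handle_unhealthy("analysis", StubAgent(2), enter_safe_mode)
    second = await handle_unhealthy("execution", StubAgent(None), enter_safe_mode)
    return first, second, safe_mode_calls


results = asyncio.run(demo())
```

Capping the restarts is the important part. An agent that flaps forever is worse than one that is cleanly declared dead.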
The Message Bus
Agents communicate through publish/subscribe. No direct calls between agents. This decoupling is what makes the system testable — you can run the Analysis Agent alone with mock market data and verify its signals without a live broker connection.
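The `MessageBus` class itself never appears in the article, only its `subscribe` and `publish` calls. A minimal in-memory version consistent with those calls might look like the sketch below. The synchronous dispatch is my assumption, and the subscriber here is a toy threshold rule rather than the real Analysis Agent.

```python
from collections import defaultdict
from typing import Any, Callable


class MessageBus:
    """Minimal synchronous pub/sub bus (in-memory, single process)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Copy the handler list so a handler may subscribe during dispatch.
        for handler in list(self._subscribers.get(topic, ())):
            handler(event)


# Testing one subscriber in isolation: mock ticks in, signals out, no broker.
bus = MessageBus()
received = []
bus.subscribe("analysis.signal", received.append)


def on_tick(tick):
    # Toy rule standing in for a real strategy.
    if tick["price"] > 100:
        bus.publish("analysis.signal", {"symbol": tick["symbol"], "direction": "long"})


bus.subscribe("market.tick", on_tick)
for price in (99.0, 101.0):
    bus.publish("market.tick", {"symbol": "AAPL", "price": price})
```

With synchronous dispatch, a slow handler blocks the publisher. That is fine for tests and backtests, but a live system would likely want queued or asynchronous delivery.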
The event types flow in one direction:
market.tick → analysis.signal → risk.assessment → execution.fill
                                                → execution.slippage
market.anomaly → risk.alert
Each event is immutable. Once published, it can’t be modified. Agents react to events by publishing new events, never by mutating existing ones.
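In Python, one straightforward way to get this immutability is a frozen dataclass. This is a sketch of the idea, not necessarily how TradingAgents defines its events; the `MarketTick` fields are assumed.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class MarketTick:
    symbol: str
    price: float
    volume: float


tick = MarketTick("AAPL", 189.5, 1200.0)

try:
    tick.price = 0.0  # any attempt to mutate a published event fails
    mutated = True
except FrozenInstanceError:
    mutated = False

# A correction is a new event, not an edit of the old one.
corrected = replace(tick, price=189.6)
```

One caveat: `frozen=True` only blocks attribute assignment. A mutable field such as the `metadata` dict on `Signal` can still be changed in place, so it needs to be copied or treated as read-only by convention.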
Shared State Store
The state store is the system’s single source of truth for mutable data: latest prices, current positions, pending orders, recent signals, cash balance, and historical fills.
All agents read from it. Only specific agents write to specific sections:
| Section | Written By | Read By |
|---|---|---|
| Prices | Market Data Agent | All |
| Positions | Execution Agent | Risk, Analysis |
| Orders | Execution Agent | Risk, Coordinator |
| Signals | Analysis Agent | Risk |
This prevents conflicts. The Risk Agent never modifies positions directly — it can only approve or reject signals that the Execution Agent then acts on.
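The ownership table can be enforced in code rather than by convention. Here is a sketch of a state store that rejects writes from the wrong agent. The `write`/`read` methods and the agent name strings are hypothetical; the article's `StateStore` exposes methods like `update_price` and `update_position` instead.

```python
class StateStore:
    """In-memory store where each section accepts writes from one agent only."""

    # Mirrors the table above: section -> the only agent allowed to write it.
    WRITERS = {
        "prices": "market_data",
        "positions": "execution",
        "orders": "execution",
        "signals": "analysis",
    }

    def __init__(self):
        self._data = {section: {} for section in self.WRITERS}

    def write(self, agent: str, section: str, key: str, value) -> None:
        owner = self.WRITERS.get(section)
        if owner is None:
            raise KeyError(f"unknown section {section!r}")
        if owner != agent:
            raise PermissionError(f"{agent} may not write to {section}")
        self._data[section][key] = value

    def read(self, section: str, key: str, default=None):
        return self._data[section].get(key, default)


store = StateStore()
store.write("market_data", "prices", "AAPL", 189.5)

try:
    store.write("risk", "positions", "AAPL", 100)
    rejected = False
except PermissionError:
    rejected = True
```

Failing loudly on a bad write turns an ownership violation into an immediate test failure instead of a corrupted position weeks later.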
Scaling Considerations
Horizontal scaling of Analysis Agents: If you run 20 strategies, you can split them across multiple Analysis Agent instances. Each publishes to the same message bus. The Risk Agent aggregates signals regardless of source.
Vertical scaling of Market Data Agent: Real-time data ingestion is I/O-bound, not CPU-bound. Scaling up means more WebSocket connections, more memory for tick buffers, and faster network. One beefy machine usually beats three small ones for data ingestion.
Database choice: The state store can be in-memory (fastest, not durable), Redis (fast, somewhat durable), or PostgreSQL (slowest, fully durable). For live trading, I use Redis with periodic snapshots to PostgreSQL. For backtesting, in-memory only — no reason to persist intermediate state.
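For the horizontal split of Analysis Agents described above, the simplest assignment is round-robin by strategy. A sketch, assuming strategies are independent of each other and every instance receives the full tick stream; the function name `partition_strategies` is mine.

```python
def partition_strategies(strategies: list, n_instances: int) -> list[list]:
    """Deals strategies out to instances round-robin, like cards."""
    if n_instances <= 0:
        raise ValueError("need at least one instance")
    return [strategies[i::n_instances] for i in range(n_instances)]


names = [f"strategy_{i}" for i in range(20)]
groups = partition_strategies(names, 3)
sizes = [len(g) for g in groups]
```

Round-robin balances strategy count, not CPU cost. If one ML model takes as long as ten indicator strategies, weighting the assignment by measured evaluation time would be the next step.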
When Is This Worth the Complexity?
Not every portfolio needs five agents and a message bus.
| Situation | Recommendation |
|---|---|
| < $100K, < 10 instruments | Single-script backtester is fine |
| $100K-$1M, 10-50 instruments | Multi-agent adds value in risk management |
| > $1M or > 50 instruments | Multi-agent is practically required |
| Multiple strategies running simultaneously | Multi-agent from day one |
| Regulatory reporting requirements | Multi-agent — you need the audit trail |
The audit trail point is underrated. Every event on the message bus can be logged. When a regulator asks “why did you buy 10,000 shares at 14:32?”, you can replay the exact sequence: this tick arrived, this signal fired, this risk check passed, this order was submitted. Try doing that with a single-file script.
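A sketch of what that replay could look like: every event is appended as one JSON line, and a query filters the log by symbol. The `AuditLog` class, the in-memory list (a real system would append to durable storage), and the dict-shaped payloads are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Append-only event log, one JSON document per line."""

    def __init__(self):
        self.lines = []  # stand-in for an append-only file

    def record(self, topic: str, payload: dict, ts: datetime = None) -> None:
        ts = ts or datetime.now(timezone.utc)
        entry = {"ts": ts.isoformat(), "topic": topic, "payload": payload}
        self.lines.append(json.dumps(entry, sort_keys=True))

    def replay(self, symbol: str) -> list:
        """Returns every logged event for `symbol`, in the order it was recorded."""
        events = (json.loads(line) for line in self.lines)
        return [e for e in events if e["payload"].get("symbol") == symbol]


log = AuditLog()
log.record("market.tick", {"symbol": "AAPL", "price": 189.5})
log.record("market.tick", {"symbol": "MSFT", "price": 410.2})
log.record("analysis.signal", {"symbol": "AAPL", "direction": "long"})
log.record("risk.assessment", {"symbol": "AAPL", "approved": True})
log.record("execution.fill", {"symbol": "AAPL", "qty": 10_000, "price": 189.6})

trail = [event["topic"] for event in log.replay("AAPL")]
```

Because events are immutable and flow in one direction, subscribing a logger like this to every topic on the bus is enough to reconstruct the decision chain.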
The Honest Assessment
The pattern behind this architecture (separate data, signal, risk, and execution stages connected by events) is well established in institutional trading. It’s not experimental. But it carries real operational overhead. You’re running five processes, a message bus, and a state store. Monitoring, alerting, and deployment are all more complex than with a single Python script.
Start with the single-script approach. When you feel the pain — signals getting missed because risk checks are blocking the event loop, or you can’t debug why a trade happened — that’s when the multi-agent architecture earns its keep.