EP10 advanced

Qlib: Microsoft's AI-Powered Quantitative Trading Framework

A 5-stage quant research pipeline with 158 pre-built factors, 27 models, and one YAML file to run a full backtest.

Most quant trading frameworks make you build everything from scratch. Data pipelines, feature engineering, model training, backtesting — each piece a separate headache. Qlib takes a different approach. Microsoft open-sourced it in 2020, and it ships a complete pipeline where a single YAML config file can take you from raw market data to backtest results.

I spent two weeks with it. Here’s what I learned.

The 5-Stage Pipeline

Qlib’s architecture is a linear flow:

Raw Data → Factor Engine → ML Models → Portfolio Strategy → Backtesting

Each stage feeds the next. You can swap components at any stage without touching the others. Want to replace LightGBM with a Transformer? Swap the model block in the config. Want different factors? Point to a different dataset handler.

| Stage | What It Does | Key Components |
| --- | --- | --- |
| Data Engine | Downloads, stores, caches market data | qlib.init(), local data cache |
| Factor Engine | Computes features from raw OHLCV | Alpha158, Alpha360, custom expressions |
| ML Models | Trains on factor data, predicts returns | 27 built-in models |
| Strategy | Converts predictions to trading signals | TopK, WeightStrategy |
| Backtesting | Simulates trades, computes metrics | Backtest module, risk analysis |
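
To make the Strategy stage concrete, here is a minimal sketch of a top-k rule with capped turnover, in plain Python. It is not Qlib's implementation. The function name `topk_with_dropout` and its parameters are hypothetical, loosely modeled on the idea behind a top-k strategy that replaces only a few holdings per step.

```python
def topk_with_dropout(scores, held, k, n_drop):
    """Return new holdings given predicted scores (higher is better).

    scores: dict mapping instrument -> predicted score
    held:   instruments currently held
    k:      target number of holdings
    n_drop: maximum number of holdings to replace in one step
    """
    # Keep only holdings that still have a score today.
    held = [s for s in held if s in scores]
    # Names not held, best predicted score first.
    candidates = sorted((s for s in scores if s not in held),
                        key=scores.get, reverse=True)
    # Current holdings, worst predicted score first.
    worst_first = sorted(held, key=scores.get)

    new_held = list(held)
    swaps = 0
    for old in worst_first:
        if swaps >= n_drop or not candidates:
            break
        # Swap only when the best outside name beats the holding.
        if scores[candidates[0]] > scores[old]:
            new_held.remove(old)
            new_held.append(candidates.pop(0))
            swaps += 1
        else:
            break
    # Fill any empty slots, then trim to k names.
    while len(new_held) < k and candidates:
        new_held.append(candidates.pop(0))
    return sorted(new_held, key=scores.get, reverse=True)[:k]
```

Capping the swaps per step is what keeps turnover, and therefore trading cost, under control: the portfolio drifts toward the top-ranked names instead of jumping to them every day.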

Alpha158: Domain Expert Knowledge in a Box

This is the part that saves you months of work. Alpha158 is a pre-built dataset handler containing 158 technical factors. These aren’t random — they come from quantitative research literature and domain experts.

The factors fall into categories:

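  • Candlestick shape: body, range, and shadow ratios computed from each day's open, high, low, and close
  • Price and trend: rate of change, moving averages, rolling highs, lows, and quantiles, regression slope and R-squared
  • Volatility: rolling standard deviation of price, regression residuals
  • Volume: rolling volume mean and standard deviation, volume-weighted price volatility, price-volume correlation
  • Relative position: time-series rank of the current price within its window, RSV (a stochastic-style oscillator), counts and sums of up versus down moves

Most are computed over 5, 10, 20, 30, and 60-day windows. Familiar indicators such as MACD, Bollinger Bands, ATR, and OBV are not part of the set; add them as custom expressions if you need them.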

You don’t configure these individually. Point your YAML at Alpha158 and all 158 factors get computed automatically.

data_handler_config: &data_handler_config
  class: Alpha158
  module_path: qlib.contrib.data.handler
  kwargs:
    start_time: "2008-01-01"
    end_time: "2020-08-01"
    instruments: csi300

Alpha360: When You Want Raw Features

Alpha360 takes a different philosophy. Instead of hand-crafted indicators, it gives you 360 raw time-series features: the past 60 days of six fields (open, high, low, close, VWAP, volume), each scaled by the latest close or volume. The idea: let the ML model figure out which patterns matter.
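
A rough sketch of that feature layout in plain Python, assuming 60 daily lags of six fields, with prices scaled by the latest close and volume by the latest volume. The function `alpha360_style_row` and the feature naming are illustrative, not Qlib's code.

```python
FIELDS = ("close", "open", "high", "low", "vwap", "volume")
LOOKBACK = 60


def alpha360_style_row(history):
    """Build one feature row for the most recent day.

    history: dict mapping field name -> list of daily values, oldest first,
             with at least LOOKBACK entries per field.
    """
    last_close = history["close"][-1]
    # Small epsilon guards against a zero-volume day.
    last_volume = history["volume"][-1] + 1e-12
    row = {}
    for field in FIELDS:
        window = history[field][-LOOKBACK:]
        scale = last_volume if field == "volume" else last_close
        for lag in range(LOOKBACK):
            # lag 0 is today, lag 59 is the oldest day in the window.
            row[f"{field.upper()}{lag}"] = window[-1 - lag] / scale
    return row
```

Six fields times 60 lags gives the 360 columns. There is no indicator logic at all, which is why this style of input tends to pair with sequence models rather than tree models.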

Alpha360 uses lazy processing. Features are computed on-demand, not pre-materialized. This matters when you’re experimenting with different training windows, because you’re not recomputing everything each time.

My take: start with Alpha158 for interpretability. Switch to Alpha360 when you want the model to discover patterns that humans might miss. In practice, LightGBM on Alpha158 is a very tough baseline to beat.

27 Built-In Models

Qlib ships models across four families. Ten of them are listed here:

| Model | Type | Best For |
| --- | --- | --- |
| Linear | Baseline | Sanity checks, feature importance |
| LightGBM | Gradient boosting | Fast training, interpretable, strong baseline |
| CatBoost | Gradient boosting | Categorical features, robust defaults |
| XGBoost | Gradient boosting | When you need fine-grained tuning |
| LSTM | Recurrent neural net | Sequential patterns, regime detection |
| GRU | Recurrent neural net | Lighter alternative to LSTM |
| Transformer | Attention-based | Long-range dependencies |
| ALSTM | Attention + LSTM | Hybrid sequential modeling |
| TCN | Temporal convolution | Parallel training, fixed receptive field |
| TabNet | Attention + tabular | Feature selection built-in |

LightGBM is the workhorse. It trains in minutes, produces readable feature importances, and consistently ranks near the top of Qlib’s own benchmarks. I’d start every experiment there before trying anything fancier.

The neural models (LSTM, Transformer) need significantly more data and tuning to outperform gradient boosting on daily frequency data. On minute-bar or tick data, they start to shine.

Getting Started: Environment to Backtest in 10 Minutes

Step 1: Install

pip install pyqlib
# The neural models (LSTM, GRU, Transformer, ...) also need PyTorch, installed separately:
pip install torch

Step 2: Download Data

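import qlib
from qlib.config import REG_CN  # or REG_US for the US market
from qlib.tests.data import GetData

provider_uri = "~/.qlib/qlib_data/cn_data"
# Download the public dataset on the first run; skip if it is already on disk.
GetData().qlib_data(target_dir=provider_uri, region=REG_CN, exists_skip=True)
# Point Qlib at the local data directory.
qlib.init(provider_uri=provider_uri, region=REG_CN)

qlib.init() only points Qlib at a local data directory; it does not download anything by itself. The GetData call fetches the historical data into your local cache on the first run and skips the download afterward. US market data covers S&P 500 constituents. Chinese market data covers CSI 300 and CSI 500.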

Step 3: Run a Backtest

Create a YAML config (or use one of the 27 pre-built examples):

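qlib_init:
  provider_uri: "~/.qlib/qlib_data/cn_data"
  region: cn

market: &market csi300
benchmark: &benchmark SH000300

data_handler_config: &data_handler_config
  class: Alpha158
  module_path: qlib.contrib.data.handler
  kwargs:
    start_time: "2008-01-01"
    end_time: "2020-08-01"
    # Window used to fit any data processors (training period only).
    fit_start_time: "2008-01-01"
    fit_end_time: "2014-12-31"
    instruments: *market

# Strategy and backtest settings, modeled on Qlib's example LightGBM workflow.
port_analysis_config: &port_analysis_config
  strategy:
    class: TopkDropoutStrategy
    module_path: qlib.contrib.strategy
    kwargs:
      signal: <PRED>
      topk: 50
      n_drop: 5
  backtest:
    start_time: "2017-01-01"
    end_time: "2020-08-01"
    account: 100000000
    benchmark: *benchmark
    exchange_kwargs:
      limit_threshold: 0.095
      deal_price: close
      open_cost: 0.0005
      close_cost: 0.0015
      min_cost: 5

task:
  model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
      loss: mse
      num_leaves: 128
      num_boost_round: 1000
      early_stopping_rounds: 50
  dataset:
    class: DatasetH
    module_path: qlib.data.dataset
    kwargs:
      handler: *data_handler_config
      segments:
        train: ["2008-01-01", "2014-12-31"]
        valid: ["2015-01-01", "2016-12-31"]
        test: ["2017-01-01", "2020-08-01"]
  record:
    - class: SignalRecord
      module_path: qlib.workflow.record_temp
    - class: SigAnaRecord
      module_path: qlib.workflow.record_temp
    # Without PortAnaRecord the run stops at signal analysis and never backtests.
    - class: PortAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        config: *port_analysis_config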

Run it:

qrun config.yaml

That’s it. One command. Qlib handles data loading, feature computation, train/valid/test splitting, model training, prediction generation, and signal analysis.

Reading the Results

After a backtest, you care about four numbers:

| Metric | What It Means | Good Range |
| --- | --- | --- |
| IC (Information Coefficient) | Correlation between predicted and actual returns | > 0.03 |
| ICIR (IC Information Ratio) | IC stability (IC mean / IC std) | > 0.3 |
| Annual Return | Strategy return minus benchmark | > 10% |
| Max Drawdown | Worst peak-to-trough decline | < 20% |

IC is the most informative single metric. An IC of 0.05 is solid for daily predictions. An IC of 0.10 is exceptional and probably suspicious — check for lookahead bias.
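
For intuition, IC and ICIR can be computed by hand. The sketch below uses Pearson correlation per day across instruments, then takes mean over standard deviation of the daily values. The function names are mine, and it assumes each day has at least two instruments with non-constant predictions and returns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def daily_ic(preds_by_day, rets_by_day):
    """One cross-sectional correlation per trading day."""
    return [pearson(p, r) for p, r in zip(preds_by_day, rets_by_day)]


def icir(ics):
    """Mean of daily IC divided by its sample standard deviation."""
    mean = sum(ics) / len(ics)
    var = sum((ic - mean) ** 2 for ic in ics) / (len(ics) - 1)
    return mean / math.sqrt(var)
```

Swapping the raw values for their ranks before calling `pearson` gives the rank-based variant (Rank IC), which is less sensitive to outliers.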

Sharpe ratio matters too, but it’s meaningless without context. A Sharpe of 2.0 in a backtest often becomes 0.4-0.7 live.
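
Sharpe and max drawdown are both simple functions of the daily return series. A generic sketch follows, assuming compounded returns, a zero risk-free rate, and 252 trading days per year. Qlib's own risk analysis may use different accounting conventions, so don't expect the numbers to match exactly.

```python
import math


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled to a yearly horizon."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded equity curve,
    returned as a positive fraction (0.2 means a 20% drawdown)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst
```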

The Reality Check

Here’s where most Qlib tutorials stop. I won’t.

Public factors are overcrowded. Alpha158 is open-source. Thousands of quants use the same 158 factors. When everyone trades the same signals, the alpha erodes. These factors still work as a baseline and for learning — but don’t expect to deploy Alpha158 to production and print money.

Backtest results lie. Not intentionally, but systematically. Slippage, market impact, trading costs, and execution delays all eat returns. My rule of thumb: multiply your backtest annual return by 0.2 to 0.33 for a realistic live estimate. A 30% backtest return might become 6-10% live.

Factor decay is real. A factor that worked from 2010-2020 might be dead by 2022. Markets adapt. Other participants discover the same signal and arbitrage it away. You need to monitor factor IC over rolling windows and retire factors that flatline.
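
One way to monitor this is a trailing mean of daily IC with a retirement threshold. The sketch below is a minimal version; the 60-day window and 0.01 threshold are placeholder values I picked for illustration, not recommendations from Qlib.

```python
def rolling_mean(values, window):
    """Trailing mean over the last `window` values; None until enough history exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def flag_decayed(daily_ic, window=60, threshold=0.01):
    """True when the most recent rolling-mean IC sits below the threshold.

    Expects a non-empty list of daily IC values, oldest first.
    """
    latest = rolling_mean(daily_ic, window)[-1]
    return latest is not None and latest < threshold
```

A single dip below the threshold is usually noise. In practice you would want the flag to persist for some time before retiring a factor.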

Data quality matters more than model choice. I’ve seen people spend weeks tuning Transformer hyperparameters when their data had survivorship bias. Clean data with LightGBM beats dirty data with any model.

Where Claude Code Fits

Qlib’s YAML config system is perfect for AI-assisted experimentation. You can ask Claude Code to:

  1. Generate config variations (sweep over num_leaves, learning_rate, training windows)
  2. Parse backtest results and flag anomalies (suspiciously high IC, drawdowns exceeding thresholds)
  3. Write custom factor expressions and plug them into the pipeline
  4. Automate the data-download-train-evaluate loop across multiple markets

The config-driven architecture means Claude Code doesn’t need to understand Qlib’s internals — it just needs to produce valid YAML. That’s a much easier problem.
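
The sweep in item 1 is mostly dictionary manipulation. Here is a sketch that expands a base config into one variant per hyperparameter combination. The helper `sweep_configs` is hypothetical, and the base config is trimmed to just the model block for brevity.

```python
import copy
import itertools


def sweep_configs(base_config, grid):
    """Yield one deep-copied config per combination of model kwargs in `grid`."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        config = copy.deepcopy(base_config)  # never mutate the base config
        config["task"]["model"]["kwargs"].update(dict(zip(keys, values)))
        yield config


base = {
    "task": {
        "model": {
            "class": "LGBModel",
            "module_path": "qlib.contrib.model.gbdt",
            "kwargs": {"loss": "mse", "num_leaves": 128},
        }
    }
}
grid = {"num_leaves": [64, 128, 256], "learning_rate": [0.05, 0.1]}
variants = list(sweep_configs(base, grid))
print(len(variants), "configs generated")
```

Each variant can then be written out as its own YAML file (for example with PyYAML, an extra dependency) and passed to qrun, which keeps every run reproducible from a single file.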

Bottom Line

Qlib is the fastest path from “I want to try quant trading” to “I have backtest results.” The 158 pre-built factors, 27 models, and single-YAML workflow remove weeks of boilerplate. Start with LightGBM on Alpha158. Graduate to custom factors and neural models when you’ve exhausted what the defaults can teach you. And always, always discount your backtest results before getting excited.