Qlib: Microsoft's AI-Powered Quantitative Trading Framework

Most quant trading frameworks make you build everything from scratch. Data pipelines, feature engineering, model training, backtesting — each piece a separate headache. Qlib takes a different approach. Microsoft open-sourced it in 2020, and it ships a complete pipeline where a single YAML config file can take you from raw market data to backtest results.

I spent two weeks with it. Here’s what I learned.

The 5-Stage Pipeline

Qlib’s architecture is a linear flow:

Raw Data → Factor Engine → ML Models → Portfolio Strategy → Backtesting

Each stage feeds the next, and you can swap components at any stage without touching the others. Want to replace LightGBM with a Transformer? Change the model's class and module_path in the config. Want different factors? Point to a different dataset handler.

Stage	What It Does	Key Components
Data Engine	Downloads, stores, caches market data	`qlib.init()`, local data cache
Factor Engine	Computes features from raw OHLCV	Alpha158, Alpha360, custom expressions
ML Models	Trains on factor data, predicts returns	27 built-in models
Strategy	Converts predictions to trading signals	TopkDropoutStrategy, WeightStrategyBase
Backtesting	Simulates trades, computes metrics	Backtest module, risk analysis
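
To make the Strategy row concrete, here is a minimal sketch of top-k selection: rank one day's predicted scores and hold the k highest at equal weight. It is a simplification for illustration, not Qlib's TopkDropoutStrategy (which also limits how many positions turn over each day). The function name and the sample scores are made up.

```python
def top_k_equal_weight(scores, k):
    """Return {instrument: weight} for the k highest-scoring instruments."""
    if k <= 0:
        return {}
    # Sort by score, highest first; break ties by instrument code for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    picks = [name for name, _ in ranked[:k]]
    return {name: 1.0 / len(picks) for name in picks}


# Hypothetical model scores for a single trading day.
scores = {"SH600000": 0.012, "SH600519": 0.031, "SZ000001": -0.004, "SZ000002": 0.020}
weights = top_k_equal_weight(scores, k=2)
print(weights)
```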

Alpha158: Domain Expert Knowledge in a Box

This is the part that saves you months of work. Alpha158 is a pre-built dataset handler containing 158 technical factors. These aren’t random — they come from quantitative research literature and domain experts.

The factors fall into categories:

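Candlestick shape: open-to-close, high-to-low, and shadow-length ratios for the current bar
Price: open, high, low, and VWAP, each scaled by the latest close
Rolling price statistics: rate of change, moving average, standard deviation, regression slope and R-squared, rolling max/min, quantiles, and time-series rank over 5-, 10-, 20-, 30-, and 60-day windows
Volume: rolling volume mean and standard deviation, volume-weighted price volatility, and price-volume correlation over the same windows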

You don’t configure these individually. Point your YAML at Alpha158 and all 158 factors get computed automatically.

data_handler_config: &data_handler_config
  class: Alpha158
  module_path: qlib.contrib.data.handler
  kwargs:
    start_time: "2008-01-01"
    end_time: "2020-08-01"
    instruments: csi300
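
Each factor is a short expression over price and volume fields, for example Mean($close, 20)/$close for a 20-day moving average scaled by the latest close. The sketch below reproduces that idea in plain Python on a toy series. The helper name, the prices, and the 3-day window are my own, and Qlib's operators treat warm-up periods and missing data differently.

```python
from statistics import mean, stdev


def rolling_ratio(closes, window, stat):
    """Apply `stat` to each trailing window and divide by that day's close."""
    out = []
    for i, close in enumerate(closes):
        if i + 1 < window:
            out.append(None)  # warm-up: not enough history yet
        else:
            out.append(stat(closes[i + 1 - window : i + 1]) / close)
    return out


closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]  # toy prices
ma3 = rolling_ratio(closes, 3, mean)    # analogue of Mean($close, 3)/$close
std3 = rolling_ratio(closes, 3, stdev)  # analogue of Std($close, 3)/$close
print(ma3[-1], std3[-1])
```

Dividing by the latest close keeps the features comparable across stocks trading at very different price levels.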

Alpha360: When You Want Raw Features

Alpha360 takes a different philosophy. Instead of hand-crafted indicators, it gives you 360 raw time-series features: the past 60 trading days of six fields (open, high, low, close, VWAP, and volume), normalized by the latest close or volume. The idea: let the ML model figure out what patterns matter.
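
As I read the handler source, the 360 columns are six fields times 60 daily lags. The sketch below builds the expression strings that way. Treat the exact strings and column names as my reconstruction, and check them against the Alpha360 handler in your installed version. The function name is hypothetical.

```python
def alpha360_style_fields(lookback=60):
    """Build (expression, name) pairs: six fields x `lookback` daily lags."""
    pairs = []
    for field in ["close", "open", "high", "low", "vwap"]:
        # Oldest lag first; every price is divided by the latest close.
        for lag in range(lookback - 1, 0, -1):
            pairs.append((f"Ref(${field}, {lag})/$close", f"{field.upper()}{lag}"))
        pairs.append((f"${field}/$close", f"{field.upper()}0"))
    # Volume is scaled by the latest volume; the epsilon avoids division by zero.
    for lag in range(lookback - 1, 0, -1):
        pairs.append((f"Ref($volume, {lag})/($volume+1e-12)", f"VOLUME{lag}"))
    pairs.append(("$volume/($volume+1e-12)", "VOLUME0"))
    return pairs


fields = alpha360_style_fields()
print(len(fields))  # 360
print(fields[0])
```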

Alpha360 uses lazy processing. Features are computed on-demand, not pre-materialized. This matters when you’re experimenting with different training windows, because you’re not recomputing everything each time.

My take: start with Alpha158 for interpretability. Switch to Alpha360 when you want the model to discover patterns that humans might miss. In practice, LightGBM on Alpha158 is a very tough baseline to beat.

27 Built-In Models

Qlib ships models across several families. Here are ten of the more commonly used ones:

Model	Type	Best For
Linear	Baseline	Sanity checks, feature importance
LightGBM	Gradient boosting	Fast training, interpretable, strong baseline
CatBoost	Gradient boosting	Categorical features, robust defaults
XGBoost	Gradient boosting	When you need fine-grained tuning
LSTM	Recurrent neural net	Sequential patterns, regime detection
GRU	Recurrent neural net	Lighter alternative to LSTM
Transformer	Attention-based	Long-range dependencies
ALSTM	Attention + LSTM	Hybrid sequential modeling
TCN	Temporal convolution	Parallel training, fixed receptive field
TabNet	Attention + tabular	Feature selection built-in

LightGBM is the workhorse. It trains in minutes, produces readable feature importances, and consistently ranks near the top of Qlib’s own benchmarks. I’d start every experiment there before trying anything fancier.

The neural models (LSTM, Transformer) need significantly more data and tuning to outperform gradient boosting on daily frequency data. On minute-bar or tick data, they start to shine.

Getting Started: Environment to Backtest in 10 Minutes

Step 1: Install

pip install pyqlib
# The neural models need PyTorch, which is installed separately:
pip install torch

Step 2: Download Data

import qlib
from qlib.config import REG_CN  # or REG_US for US market

provider_uri = "~/.qlib/qlib_data/cn_data"
qlib.init(provider_uri=provider_uri, region=REG_CN)

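Note that qlib.init() doesn't download anything; it only points Qlib at a data directory that already exists. Fetch the data first with the get_data.py script from the Qlib repository (python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn). US market data covers S&P 500 constituents. Chinese market covers CSI 300 and CSI 500.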
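
Before calling qlib.init(), it can help to confirm the data directory is actually populated. A minimal sketch, assuming the usual Qlib layout of calendars, instruments, and features sub-folders; the helper name is hypothetical and the check is deliberately shallow.

```python
from pathlib import Path

# Sub-directories a Qlib-format data folder is expected to contain
# (assumption: the layout produced by Qlib's data download script).
EXPECTED_SUBDIRS = ("calendars", "instruments", "features")


def missing_data_dirs(provider_uri):
    """Return the expected sub-directories that are absent under provider_uri."""
    root = Path(provider_uri).expanduser()
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


missing = missing_data_dirs("~/.qlib/qlib_data/cn_data")
if missing:
    print("Data not ready, missing:", ", ".join(missing))
else:
    print("Data directory looks populated; safe to call qlib.init().")
```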

Step 3: Run a Backtest

Create a YAML config (or use one of the 27 pre-built examples):

qlib_init:
  provider_uri: "~/.qlib/qlib_data/cn_data"
  region: cn

market: &market csi300
benchmark: &benchmark SH000300

data_handler_config: &data_handler_config
  class: Alpha158
  module_path: qlib.contrib.data.handler
  kwargs:
    start_time: "2008-01-01"
    end_time: "2020-08-01"
    fit_start_time: "2008-01-01"
    fit_end_time: "2014-12-31"
    instruments: *market

task:
  model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
      loss: mse
      num_leaves: 128
      num_boost_round: 1000
      early_stopping_rounds: 50
  dataset:
    class: DatasetH
    module_path: qlib.data.dataset
    kwargs:
      handler: *data_handler_config
      segments:
        train: ["2008-01-01", "2014-12-31"]
        valid: ["2015-01-01", "2016-12-31"]
        test: ["2017-01-01", "2020-08-01"]
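  record:
    - class: SignalRecord
      module_path: qlib.workflow.record_temp
    - class: SigAnaRecord
      module_path: qlib.workflow.record_temp
    # PortAnaRecord runs the actual backtest. Strategy and cost settings follow
    # Qlib's bundled LightGBM example; the <PRED> placeholder is version-dependent.
    - class: PortAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        config:
          strategy:
            class: TopkDropoutStrategy
            module_path: qlib.contrib.strategy
            kwargs:
              signal: <PRED>
              topk: 50
              n_drop: 5
          backtest:
            start_time: "2017-01-01"
            end_time: "2020-08-01"
            account: 100000000
            benchmark: *benchmark
            exchange_kwargs:
              limit_threshold: 0.095
              deal_price: close
              open_cost: 0.0005
              close_cost: 0.0015
              min_cost: 5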

Run it:

qrun config.yaml

That’s it. One command. Qlib handles data loading, feature computation, train/valid/test splitting, model training, prediction generation, and signal analysis. The portfolio backtest itself runs only when the record list includes a PortAnaRecord entry.
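
One thing qrun won't do for you is question your date ranges. Overlapping train/valid/test segments are an easy way to leak future data into training, so I like a quick check before launching a run. A minimal sketch with a hypothetical helper, using the segment dates from the config above:

```python
from datetime import date


def check_segments(segments):
    """Raise ValueError unless train/valid/test are ordered and non-overlapping."""
    previous_end = None
    for name in ("train", "valid", "test"):
        start, end = (date.fromisoformat(day) for day in segments[name])
        if start > end:
            raise ValueError(f"{name}: start {start} is after end {end}")
        if previous_end is not None and start <= previous_end:
            raise ValueError(f"{name} starts on or before the previous segment ends")
        previous_end = end
    return True


segments = {
    "train": ["2008-01-01", "2014-12-31"],
    "valid": ["2015-01-01", "2016-12-31"],
    "test": ["2017-01-01", "2020-08-01"],
}
print(check_segments(segments))
```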

Reading the Results

After a backtest, you care about four numbers:

Metric	What It Means	Good Range
IC (Information Coefficient)	Daily cross-sectional correlation between predicted scores and realized returns, averaged over the test period	> 0.03
ICIR (IC Information Ratio)	IC stability (IC mean / IC std)	> 0.3
Annualized Excess Return	Annualized strategy return minus benchmark return	> 10%
Max Drawdown	Worst peak-to-trough decline	< 20%

IC is the most informative single metric. An IC of 0.05 is solid for daily predictions. An IC of 0.10 is exceptional and probably suspicious — check for lookahead bias.
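
For intuition, here is a minimal sketch of how IC and ICIR can be computed by hand: one Pearson correlation per day across stocks, then the mean and the mean-over-standard-deviation of those daily values. The helper names and toy numbers are mine, and this is not Qlib's own implementation.

```python
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ic_summary(days):
    """days: list of (predictions, realized_returns), one pair per trading day."""
    daily_ic = [pearson(preds, rets) for preds, rets in days]
    ic_mean = mean(daily_ic)
    return ic_mean, ic_mean / stdev(daily_ic)


# Three toy days with three stocks each.
days = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),  # perfectly aligned
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),  # perfectly inverted
    ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),  # partially aligned
]
ic, icir = ic_summary(days)
print(round(ic, 4), round(icir, 4))
```

Swapping the raw values for their ranks before correlating gives the rank IC, which is less sensitive to outliers.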

Sharpe ratio matters too, but it’s meaningless without context. A Sharpe of 2.0 in a backtest often becomes 0.4-0.7 live.
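
If you want to sanity-check the reported numbers, Sharpe and max drawdown are easy to recompute from a series of daily returns. A minimal sketch, assuming compounded returns and a zero risk-free rate; Qlib's own risk analysis may follow different conventions (for example summing rather than compounding returns), so expect small differences. The function names are hypothetical.

```python
from statistics import mean, stdev


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over standard deviation of daily returns, scaled by sqrt(periods_per_year)."""
    return mean(daily_returns) / stdev(daily_returns) * periods_per_year ** 0.5


toy_returns = [0.10, -0.50, 0.20]  # exaggerated moves so the drawdown is easy to follow
print(max_drawdown(toy_returns))
print(annualized_sharpe([0.01, -0.01, 0.02, 0.0]))
```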

The Reality Check

Here’s where most Qlib tutorials stop. I won’t.

Public factors are overcrowded. Alpha158 is open-source. Thousands of quants use the same 158 factors. When everyone trades the same signals, the alpha erodes. These factors still work as a baseline and for learning — but don’t expect to deploy Alpha158 to production and print money.

Backtest results lie. Not intentionally, but systematically. Slippage, market impact, trading costs, and execution delays all eat returns. My rule of thumb: multiply your backtest annual return by 0.2 to 0.33 for a realistic live estimate. A 30% backtest return might become 6-10% live.
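
The haircut is plain arithmetic, but writing it down keeps me honest. A tiny sketch of the rule of thumb above; the 0.2 and 0.33 multipliers are my own heuristic, not anything Qlib computes.

```python
def live_estimate(backtest_annual_return, low=0.2, high=0.33):
    """Apply a pessimistic and an optimistic haircut to a backtest return."""
    return backtest_annual_return * low, backtest_annual_return * high


low_est, high_est = live_estimate(0.30)
print(f"{low_est:.1%} to {high_est:.1%}")  # 6.0% to 9.9%
```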

Factor decay is real. A factor that worked from 2010-2020 might be dead by 2022. Markets adapt. Other participants discover the same signal and arbitrage it away. You need to monitor factor IC over rolling windows and retire factors that flatline.
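
A minimal sketch of that monitoring, assuming you already have a list of daily IC values for a factor. The 60-day window and the 0.01 threshold are illustrative choices rather than recommendations, and the function name is hypothetical.

```python
from statistics import mean


def has_flatlined(daily_ic, window=60, threshold=0.01):
    """True if the mean IC over the most recent `window` days is below `threshold`."""
    if len(daily_ic) < window:
        return False  # not enough history to judge
    return mean(daily_ic[-window:]) < threshold


healthy = [0.05] * 60               # factor still carrying signal
decayed = [0.05] * 60 + [0.0] * 60  # signal vanished over the last 60 days
print(has_flatlined(healthy), has_flatlined(decayed))
```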

Data quality matters more than model choice. I’ve seen people spend weeks tuning Transformer hyperparameters when their data had survivorship bias. Clean data with LightGBM beats dirty data with any model.

Where Claude Code Fits

Qlib’s YAML config system is perfect for AI-assisted experimentation. You can ask Claude Code to:

Generate config variations (sweep over num_leaves, learning_rate, training windows)
Parse backtest results and flag anomalies (suspiciously high IC, drawdowns exceeding thresholds)
Write custom factor expressions and plug them into the pipeline
Automate the data-download-train-evaluate loop across multiple markets
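
The first item is the easiest to picture. Here is a minimal sketch of a parameter sweep over the model block from the config above, producing one variant per combination. The grid values are arbitrary examples, the helper is hypothetical, and writing each variant back into a full YAML file is left out.

```python
import copy
import itertools

base_model = {
    "class": "LGBModel",
    "module_path": "qlib.contrib.model.gbdt",
    "kwargs": {
        "loss": "mse",
        "num_leaves": 128,
        "num_boost_round": 1000,
        "early_stopping_rounds": 50,
    },
}


def sweep(base, grid):
    """Yield one deep-copied model config per combination of kwargs in `grid`."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        config = copy.deepcopy(base)  # never mutate the shared base config
        config["kwargs"].update(dict(zip(keys, values)))
        yield config


grid = {"num_leaves": [64, 128, 256], "learning_rate": [0.05, 0.1]}
variants = list(sweep(base_model, grid))
print(len(variants))  # 6
print(variants[0]["kwargs"])
```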

The config-driven architecture means Claude Code doesn’t need to understand Qlib’s internals — it just needs to produce valid YAML. That’s a much easier problem.

Bottom Line

Qlib is the fastest path from “I want to try quant trading” to “I have backtest results.” The 158 pre-built factors, 27 models, and single-YAML workflow remove weeks of boilerplate. Start with LightGBM on Alpha158. Graduate to custom factors and neural models when you’ve exhausted what the defaults can teach you. And always, always discount your backtest results before getting excited.