Advanced Factor Engineering with Qlib

Everyone starts with Alpha158. It ships with Qlib, contains 158 pre-built factors, and generates impressive backtest numbers within an hour of setup. That’s the trap. If everyone uses the same 158 factors, nobody has an edge. You’re competing against every other quant who ran the same YAML file.

This tutorial covers three things: what’s inside Alpha158 and why it’s a starting point (not a finish line), the four model paradigms Qlib supports and when to use each, and the brutal math of turning backtests into live returns.

What’s Inside Alpha158

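Alpha158’s 158 factors break down as 9 candlestick (K-bar) features, 4 price features, and 29 rolling operators each computed over five windows (29 × 5 = 145). Grouped by what they measure, they fall roughly into seven categories. Understanding these categories matters because you’ll eventually write custom factors that extend or replace them.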

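Momentum factors: ROC (rate of change) over lookback windows of 5, 10, 20, 30, and 60 days. Alpha158 defines it as the close d days ago divided by today’s close, so values below 1 mean the price has risen. Related operators measure the share of up days and down days (CNTP, CNTN) and compute RSI-style ratios of gains to total movement (SUMP, SUMN, SUMD). These measure how fast price is moving and in which direction.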

Volatility factors: STD (the rolling standard deviation of the closing price, divided by today’s close) and VWAP0 (today’s volume-weighted average price divided by the close). High volatility signals uncertainty. The VWAP ratio tells you whether the current price is cheap or expensive relative to where the day’s volume traded.

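Volume factors: VSUMP, VSUMN, and VSUMD apply an RSI-style calculation to volume. VSUMP is the sum of day-over-day volume increases divided by the sum of absolute volume changes over the window, VSUMN does the same for decreases, and VSUMD is their difference. Together with VMA and VSTD (the rolling mean and standard deviation of volume, divided by today’s volume), they show whether participation is building or fading.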

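Price pattern factors: KMID ((close - open) / open, the signed candlestick body) and KLEN ((high - low) / open, the full bar range), plus variants such as KMID2 (body divided by range) and KUP/KLOW (upper and lower shadows). A small body inside a large range (KMID2 near zero) signals indecision. A KMID2 near +1 or -1 means the bar closed near one extreme after opening near the other: strong directional conviction within the bar.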

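Moving average factors: MA, the d-day simple moving average of the close divided by today’s close, at each window. Alpha158 has no EMA and no explicit cross-period ratios. A crossover such as the 5-day average rising above the 20-day average is only implicit in MA5 / MA20, so the model has to learn that interaction rather than read it off a single feature.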

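Correlation factors: rolling correlations between price and log volume (CORR) and between daily price changes and daily volume changes (CORD). Price rising on expanding volume is a different signal from price rising on thin volume. Alpha158 does not include return autocorrelation (a stock whose returns correlate with their own lagged values has serial momentum), which makes it a natural candidate for a custom factor.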

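Regression and rank factors: rolling linear-regression slope (BETA), R-squared (RSQR), and residual (RESI) of the close over each window (how steep is the trend, and how cleanly does price follow it?), plus RANK (where today’s close sits among the last d closes) and related quantile and max/min features.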
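To pin the definitions down, here is a minimal pure-Python sketch that recomputes a few of these factors for a single stock, following the expression strings in Qlib’s Alpha158 feature config (worth checking against your installed version). It is an illustration, not Qlib’s code path: Qlib evaluates these with its own operators on pandas data, and its warm-up and tie handling may differ. The function names kbar and rolling_factors are hypothetical.

```python
import statistics

# Pure-Python versions of a few Alpha158 expressions, for ONE stock.
# Inputs are daily lists, oldest first. Illustrative sketch only; Qlib
# computes these with its own operators and warm-up handling.

EPS = 1e-12  # Alpha158 adds a tiny constant to some denominators to avoid /0


def kbar(o, h, l, c):
    """Candlestick (K-bar) features for a single day."""
    return {
        "KMID": (c - o) / o,               # ($close-$open)/$open
        "KLEN": (h - l) / o,               # ($high-$low)/$open
        "KMID2": (c - o) / (h - l + EPS),  # ($close-$open)/($high-$low+1e-12)
    }


def rolling_factors(close, volume, d):
    """Rolling features for the most recent day over a d-day window."""
    if len(close) < d + 1 or len(volume) < d + 1:
        raise ValueError("need at least d + 1 days of history")
    today = close[-1]
    win = close[-d:]
    # The last d day-over-day changes (these need d + 1 observations)
    dc = [close[i] - close[i - 1] for i in range(len(close) - d, len(close))]
    dv = [volume[i] - volume[i - 1] for i in range(len(volume) - d, len(volume))]
    return {
        f"ROC{d}": close[-1 - d] / today,          # Ref($close, d)/$close
        f"MA{d}": statistics.mean(win) / today,    # Mean($close, d)/$close
        f"STD{d}": statistics.stdev(win) / today,  # Std($close, d)/$close, sample std
        # Rank($close, d): percentile of today's close within the window
        # (ties are treated loosely in this sketch)
        f"RANK{d}": sum(1 for x in win if x <= today) / d,
        # RSI-style share of upward price movement
        f"SUMP{d}": sum(max(x, 0) for x in dc) / (sum(abs(x) for x in dc) + EPS),
        # The same RSI-style ratio applied to volume changes
        f"VSUMP{d}": sum(max(x, 0) for x in dv) / (sum(abs(x) for x in dv) + EPS),
    }
```

Because every factor is divided by today’s close (or open, or volume), values are comparable across stocks trading at very different price levels.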

These are all public. Published in a GitHub repo with 15K+ stars. Every quant fund with a Python developer has tested them. That doesn’t make them useless — it makes them baseline. You need them the same way you need a chess opening book. But you won’t win tournaments with opening prep alone.

Four Model Paradigms

Qlib supports multiple ML architectures out of the box. Each sees the same factor data differently.

LightGBM — Decision Trees

The default choice, and honestly the best starting point. LightGBM builds gradient-boosted decision trees, each of which splits factor values at learned thresholds: “If 5-day momentum > 0.03 AND volume ratio > 1.2, predict up.” Training takes minutes, not hours. Feature importance (split counts or total gain) shows which factors the trees lean on most.

I use LightGBM for rapid iteration. Test a new factor idea? Add it to the feature config, retrain, check if it shows up in the top 20 features. If it doesn’t, throw it out. This feedback loop is fast enough to test 10 factor ideas in an afternoon.
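The screening step is simple enough to write down. The sketch below assumes you already have a mapping from factor name to importance score. In recent Qlib versions, LGBModel’s get_feature_importance() wraps LightGBM’s Booster.feature_importance() and returns a sorted pandas Series, so calling .to_dict() on it should produce such a mapping; treat that wiring as an assumption to confirm for your version. Passing importance_type="gain" ranks by total gain rather than split count. passes_screen is a hypothetical helper.

```python
def passes_screen(importance, candidate, top_k=20):
    """Return True if `candidate` ranks in the top_k factors by importance.

    `importance` maps factor name -> importance score (e.g. LightGBM gain).
    Ties are broken alphabetically so the result is deterministic.
    """
    if candidate not in importance:
        raise KeyError(f"{candidate!r} not found in importance table")
    ranked = sorted(importance, key=lambda name: (-importance[name], name))
    return candidate in ranked[:top_k]
```

With hypothetical scores such as {"KMID": 120.0, "ROC5": 95.5, "overnight_gap": 40.2, "VSUMP20": 10.0}, overnight_gap passes a top-3 screen but fails a top-2 screen.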

LSTM — Sequential Memory

Long Short-Term Memory networks process factor data as time sequences. Where LightGBM sees each day independently, LSTM remembers what happened 5, 10, 30 days ago and lets that memory influence today’s prediction.

This matters for regime detection. A stock that’s been in a steady uptrend for 60 days behaves differently from one that just reversed after 60 days of decline — even if today’s factor values are identical. LSTM captures that history. The cost: training time jumps from minutes to hours, and you need GPU access to make it practical.
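The difference shows up in what a single training sample looks like. A tree model gets one day’s feature vector; a sequence model gets the last step_len days stacked together. Qlib’s TSDatasetH builds such windows from a step_len parameter; the sketch below shows the idea without Qlib, and make_sequences is a hypothetical name.

```python
def make_sequences(rows, step_len):
    """Turn daily feature vectors (oldest first) into overlapping windows.

    Window i covers days i .. i + step_len - 1, so each sample pairs the
    last day's features with the step_len - 1 days that preceded it.
    """
    if step_len < 1:
        raise ValueError("step_len must be >= 1")
    return [rows[i : i + step_len] for i in range(len(rows) - step_len + 1)]
```

A tree model would train on rows[i] alone; a sequence model trains on make_sequences(rows, step_len), which is why the same factor values on the last day can produce different predictions depending on the path that led there.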

Transformer — Attention Mechanism

The same architecture behind language models. Transformers don’t process time sequentially like LSTM. Instead, they use attention to find which historical days matter most for today’s prediction. Maybe day -3 and day -47 both contain signals while everything in between is noise. Transformers can find that pattern. LSTM would struggle because the signal from day -47 has to survive 47 recurrent updates before it reaches today’s prediction, fading a little at each one.
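Two toy functions make the contrast concrete. recurrent_retention treats the recurrence as keeping a constant fraction of the signal at each step, a deliberate simplification since a real LSTM learns its forget gate per step; attention_weights is plain scaled dot-product attention for a single query. Both are illustrative sketches with hypothetical names, not Qlib’s LSTM or TransformerModel.

```python
import math


def recurrent_retention(forget, steps):
    """Fraction of a signal left after `steps` recurrent updates, assuming
    each update keeps a constant `forget` fraction (a simplification)."""
    return forget ** steps


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over past days.

    Each past day is scored directly against today's query, so distance in
    time does not by itself shrink a day's weight.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With a retention of 0.9 per step, recurrent_retention(0.9, 47) leaves well under 1% of the day -47 signal. In attention_weights, by contrast, days -47 and -3 get identical weight if they match today’s query equally well. Real Transformers add positional encodings, so position can still matter, but it is learned rather than imposed by the recurrence.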

Transformers need the most data and compute. They also tend to overfit on short histories. I’d only use them when you have 10+ years of daily data and a clear hypothesis about long-range dependencies.

Linear Model — Weighted Sum

A simple weighted combination of all factors: prediction = w1 × Factor1 + w2 × Factor2 + ... + wN × FactorN + b. No hidden layers, no tree splits, no memory.
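For a handful of factors, the weights can be fit with ordinary least squares. The sketch below solves the normal equations in plain Python to show what “a weighted sum” means in practice; it is not Qlib’s implementation. Qlib’s own LinearModel (qlib.contrib.model.linear) offers estimators such as ols, ridge, lasso, and nnls, and real data calls for a numerical library rather than hand-rolled elimination. fit_linear and predict are hypothetical names.

```python
def fit_linear(X, y):
    """OLS with an intercept: solve (X^T X) w = X^T y by Gauss-Jordan
    elimination with partial pivoting. Returns (weights, bias)."""
    rows = [list(x) + [1.0] for x in X]  # constant column carries the bias
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("factors are collinear; OLS has no unique solution")
        for r in range(k):
            if r != col:  # eliminate this column from every other row
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    return coef[:-1], coef[-1]


def predict(weights, bias, x):
    """prediction = w1*f1 + w2*f2 + ... + wN*fN + b"""
    return sum(w * f for w, f in zip(weights, x)) + bias
```

Fed data generated exactly as 0.5 × f1 - 0.2 × f2 + 0.01, fit_linear recovers those weights and that bias up to floating-point error.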

Here’s the uncomfortable truth: linear models are surprisingly competitive. In Qlib’s benchmarks, linear models often land within 10-15% of LightGBM’s performance. Sometimes they beat LSTM. A likely reason: most of the signal in public factors is close to linear, so the extra capacity of complex models goes into fitting noise rather than signal. When your factors are good, a simple model extracts most of the value. When your factors are bad, no model saves you.

One Config, Full Pipeline

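Every model runs through the same workflow: one YAML config plus the qrun command. A condensed config looks like this:

# config.yaml: condensed Qlib workflow; dates are illustrative and must fall inside your local data.
qlib_init:
  provider_uri: "~/.qlib/qlib_data/cn_data"
  region: cn
task:
  model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs: {loss: mse, num_leaves: 128, learning_rate: 0.05}
  dataset:
    class: DatasetH
    module_path: qlib.data.dataset
    kwargs:
      handler:
        class: Alpha158
        module_path: qlib.contrib.data.handler
        kwargs: {instruments: csi300, start_time: "2010-01-01", end_time: "2025-12-31",
                 fit_start_time: "2010-01-01", fit_end_time: "2017-12-31"}
      segments:
        train: ["2010-01-01", "2017-12-31"]
        valid: ["2018-01-01", "2019-12-31"]
        test: ["2020-01-01", "2025-12-31"]
  record:
    - class: SignalRecord
      module_path: qlib.workflow.record_temp
      kwargs: {model: <MODEL>, dataset: <DATASET>}
    - class: PortAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        config:
          strategy:
            class: TopkDropoutStrategy
            module_path: qlib.contrib.strategy
            kwargs: {signal: <PRED>, topk: 50, n_drop: 5}
          backtest: {start_time: "2020-01-01", end_time: "2025-12-31",
                     account: 1000000, benchmark: SH000300}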

qrun config.yaml

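That’s it. Data loading, feature engineering, model training, prediction, backtesting, and performance reporting, all from one command. The strategy block turns predictions into trades: TopkDropoutStrategy holds the 50 highest-scored stocks and swaps out at most 5 per trading day. To change paradigms, swap the model’s class, module_path, and kwargs: for example LSTM from qlib.contrib.model.pytorch_lstm, TransformerModel from qlib.contrib.model.pytorch_transformer, or LinearModel from qlib.contrib.model.linear. The LightGBM kwargs won’t carry over, and Qlib’s Alpha158 benchmarks for sequence models use the time-series variants (modules ending in _ts) with TSDatasetH and a step_len. The data source and backtest sections stay the same.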

The Backtest Reality Check

Let me share numbers that most tutorials skip.

Say your Alpha158 + LightGBM backtest shows a 46.2% annualized return. Exciting, right? Now apply reality:

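Slippage and fees: market impact, commissions, and the bid-ask spread. For mid-cap Chinese stocks, budget 15-30 basis points per round trip; for US large-caps, 5-10 bps. The annual drag is the cost per round trip times the number of round trips, so a high-turnover strategy paying ~20 bps on ~40 round trips a year loses about 8 percentage points. That alone cuts 46.2% down to ~38%.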

Factor crowding: Everyone running Alpha158 submits similar orders. When your model says “buy,” so do hundreds of other models trained on the same factors. The price moves before you get filled. As a rough estimate, crowding eats another 30-50% of the remaining alpha.

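The multiplier: Costs and crowding alone take 46.2% down to roughly 19-27%. Add overfitting to the backtest period and the factor decay covered below, and a practical rule is to multiply your backtest return by 0.2 to 0.33 to estimate live performance. That 46.2% becomes 9% to 15%. Still positive, but a different conversation than the backtest suggested.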
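The haircut arithmetic fits in a few lines. The sketch assumes costs scale linearly with turnover and that crowding removes a fixed share of what remains; round_trips_per_year is something you would measure from your own backtest’s turnover, and all function names are hypothetical.

```python
def after_costs(gross_pct, cost_bps, round_trips_per_year):
    """Net annual return after trading costs. cost_bps is the cost of one
    full round trip (buy + sell); 100 bps = 1 percentage point."""
    return gross_pct - cost_bps / 100 * round_trips_per_year


def implied_round_trips(gross_pct, net_pct, cost_bps):
    """Round trips per year that would explain a given gross-to-net gap."""
    return (gross_pct - net_pct) / (cost_bps / 100)


def after_crowding(net_pct, crowding_share):
    """Remove the share of the remaining alpha lost to crowded trades."""
    return net_pct * (1 - crowding_share)


def live_range(backtest_pct, low=0.2, high=0.33):
    """Rule-of-thumb live range: backtest return x 0.2 to 0.33."""
    return backtest_pct * low, backtest_pct * high
```

For example, implied_round_trips(46.2, 38.0, 20) comes out at about 41: at 20 bps per round trip, the drop from 46.2% to ~38% corresponds to roughly 41 full portfolio turnovers a year. after_crowding(38.0, 0.5) gives 19.0 and after_crowding(38.0, 0.3) about 26.6, while live_range(46.2) gives roughly 9.2 to 15.2.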

This isn’t pessimism. It’s calibration. The gap between backtest and live is where most retail quants lose money — they size positions for the backtest number and get the live number.

Factor Decay

Most factors lose predictive power within 6 to 12 months. The mechanism is simple: other quants discover the same pattern, trade on it, and arbitrage it away. Momentum factors from the 1990s that returned 15% annually now return 2-3%.

Alpha158’s factors are already well-known. Their alpha has been partially arbitraged. You’ll still see signal — factor crowding doesn’t eliminate alpha entirely — but the half-life is short.

This means factor research isn’t a one-time project. It’s a continuous process. You need a pipeline that generates, tests, and retires factors on a rolling basis.
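One way to put a number on “the half-life is short” is to track a factor’s information coefficient (IC, the correlation between factor values and subsequent returns) month by month and fit an exponential decay. The sketch assumes IC decays roughly exponentially and stays positive; the retirement threshold of 0.01 is a hypothetical placeholder, not a standard value, and both function names are illustrative.

```python
import math


def ic_half_life(monthly_ic):
    """Estimate a factor's half-life in months by a least-squares fit of
    log(IC) = a + b * t. Returns math.inf if the fitted slope is not negative."""
    if len(monthly_ic) < 2 or any(ic <= 0 for ic in monthly_ic):
        raise ValueError("need at least 2 strictly positive IC values")
    n = len(monthly_ic)
    ts = range(n)
    ys = [math.log(ic) for ic in monthly_ic]
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys)) / sum(
        (t - t_mean) ** 2 for t in ts
    )
    # IC halves when b * t = log(0.5)
    return math.log(0.5) / slope if slope < 0 else math.inf


def should_retire(recent_ic, floor=0.01):
    """Hypothetical retirement rule: drop the factor once its mean IC over
    the recent window falls below `floor`."""
    return sum(recent_ic) / len(recent_ic) < floor
```

Run monthly, the half-life estimate tells you how often a factor needs re-testing, and the retirement rule keeps dead factors from diluting the model.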

Writing Custom Factors

Qlib’s expression engine lets you define factors as formulas:

from qlib.contrib.data.handler import Alpha158

class MyFactors(Alpha158):
    def get_feature_config(self):
        # Start with all 158 base factors
        fields, names = super().get_feature_config()

        # Add custom factor: 5-day average volume relative to 20-day average
        fields.append("Div(Mean($volume, 5), Mean($volume, 20))")
        names.append("volume_surge_5_20")

        # Add custom factor: overnight gap (open vs previous close)
        fields.append("Div($open, Ref($close, 1)) - 1")
        names.append("overnight_gap")

        # Add custom factor: today's high-low range relative to its 14-day average
        # (a simplified ATR proxy that ignores overnight gaps)
        fields.append("Div(Sub($high, $low), Mean(Sub($high, $low), 14))")
        names.append("range_vs_atr14")

        return fields, names

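The DSL supports Ref (the value N days earlier), Mean, Std, Max, Min, Rank (a rolling time-series percentile, not a cross-sectional rank), Div, Sub, Add, Mul, and more. You chain these to express nearly any technical factor without writing raw numpy code. To use the new handler, point the handler’s class and module_path in config.yaml at MyFactors and the module that defines it.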
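Before running a new formula through Qlib, it can help to check its meaning on a few hand-made numbers. The functions below are hypothetical pure-Python stand-ins that mimic Ref, Mean, Div, and Sub on plain lists (None marks a missing value). They are named after Qlib’s operators for readability but are not Qlib’s implementations, which work on pandas data and may handle the warm-up period differently.

```python
# Hypothetical list-based stand-ins for a few Qlib operators (oldest value first).
# None marks "not enough history yet".


def Ref(xs, n):
    """Value n days earlier; None for the first n days."""
    return ([None] * n + list(xs))[: len(xs)]


def Mean(xs, n):
    """Rolling n-day mean; None until the window is full or holds a None."""
    out = []
    for i in range(len(xs)):
        win = xs[i - n + 1 : i + 1] if i >= n - 1 else None
        out.append(None if win is None or None in win else sum(win) / n)
    return out


def _elementwise(op, a, b):
    return [None if x is None or y is None else op(x, y) for x, y in zip(a, b)]


def Div(a, b):
    return _elementwise(lambda x, y: x / y, a, b)


def Sub(a, b):
    return _elementwise(lambda x, y: x - y, a, b)


# Example: a shrunken volume surge, Div(Mean($volume, 2), Mean($volume, 4))
volume = [100, 100, 100, 100, 200]
surge = Div(Mean(volume, 2), Mean(volume, 4))
```

The example shrinks the windows to 2 and 4 days so the arithmetic is easy to check by hand: on the last day, the 2-day mean volume is 150 and the 4-day mean is 125, so the surge ratio is 1.2.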

Where the Real Edge Lives

Public factors are the floor, not the ceiling. The quants who consistently make money combine Alpha158 with proprietary signals:

Alternative data — satellite imagery of parking lots, credit card transaction aggregates, app download trends. This data costs $10K-$100K/year per dataset, which is precisely why it retains alpha longer than free factors.

NLP signals — sentiment from earnings call transcripts, SEC filing language changes, patent filings. Claude Code can actually help here. Feed it transcripts, have it extract sentiment shifts, and convert those into daily factor values.

Cross-asset signals — bond yield movements predicting equity sector rotation. Currency flows predicting commodity stocks. These require domain knowledge that most ML pipelines don’t encode.
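Turning event-based NLP output into a daily factor has one trap worth coding carefully: look-ahead bias. The sketch assumes you already have a dated sentiment score per transcript (for example, extracted by an LLM; the input format is hypothetical), makes each score usable only from the day after its event, and carries the latest sentiment shift forward until the next transcript. It iterates over calendar days; a real pipeline would use the exchange’s trading calendar. daily_sentiment_shift is a hypothetical name.

```python
from datetime import date, timedelta


def daily_sentiment_shift(events, start, end):
    """Convert dated transcript sentiment scores into a daily factor.

    events: list of (event_date, score) pairs.
    The value on day t is (latest score known before t) minus (the score
    before that). Using only events strictly before t means a transcript
    released on day t is first usable on day t + 1. Days with fewer than
    two known scores get None.
    """
    events = sorted(events)
    out = {}
    day = start
    while day <= end:
        known = [score for d, score in events if d < day]
        out[day] = known[-1] - known[-2] if len(known) >= 2 else None
        day += timedelta(days=1)
    return out
```

With a score of 0.2 on 2024-01-10 and 0.5 on 2024-04-10, the factor stays None through April 10 and reads about 0.3 from April 11 onward.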

The pattern is clear: the harder the data is to acquire and process, the longer its alpha persists. Alpha158 is the easiest data to use. That’s why its edge is the smallest.

Start with Alpha158 to build your pipeline. Prove you can run the full loop from data to backtest to analysis. Then start replacing public factors with proprietary ones, one at a time, measuring the incremental improvement each adds. That’s factor engineering. Not a one-time build, but a continuous search for signal that others haven’t found yet.