
The strategy uses a Random Forest to predict abnormal stock returns around earnings announcements from historical financial statement data. A position is taken only when the absolute predicted abnormal return exceeds a threshold: long for positive predictions, short for negative ones.
ASSET CLASS: stocks | REGION: United States | FREQUENCY: Daily | MARKET: equities | KEYWORD: Machine Learning, Financial Statement
I. STRATEGY IN A NUTSHELL
The strategy invests in U.S. stocks from Compustat North America, with missing financial data imputed via SoftImpute. A Random Forest with 200 trees is trained on a sliding window of five quarters: four quarters provide the inputs and the fifth is the quarter being predicted. The model is fitted by minimizing mean squared error and forecasts 30-day abnormal returns around earnings announcements. A trade is placed only when the absolute predicted abnormal return exceeds 0.1: long for positive predictions, short for negative ones.
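The window scheme and the trading rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names `sliding_windows` and `position`, the quarter labels, and the ticker forecasts are all hypothetical, and the forecasting model itself is left out. In practice the forecasts would come from a Random Forest regressor with 200 trees fitted on the four input quarters.

```python
from typing import Dict, List, Sequence, Tuple

# Threshold from the text: trade only if |predicted abnormal return| > 0.1.
THRESHOLD = 0.1


def sliding_windows(
    quarters: Sequence[str], n_train: int = 4
) -> List[Tuple[List[str], str]]:
    """Return (input quarters, prediction quarter) pairs.

    Each window spans n_train + 1 consecutive quarters and the window
    advances one quarter at a time (the step size is an assumption).
    """
    return [
        (list(quarters[i:i + n_train]), quarters[i + n_train])
        for i in range(len(quarters) - n_train)
    ]


def position(predicted_abnormal_return: float, threshold: float = THRESHOLD) -> int:
    """Map a forecast to +1 (long), -1 (short) or 0 (no trade)."""
    if predicted_abnormal_return > threshold:
        return 1
    if predicted_abnormal_return < -threshold:
        return -1
    return 0


# Hypothetical quarter labels, for illustration only.
quarters = ["2019Q1", "2019Q2", "2019Q3", "2019Q4", "2020Q1", "2020Q2"]
windows = sliding_windows(quarters)

# Hypothetical model forecasts of 30-day abnormal returns per ticker.
forecasts: Dict[str, float] = {"AAA": 0.15, "BBB": -0.22, "CCC": 0.04}
signals = {ticker: position(f) for ticker, f in forecasts.items()}

for train, predict in windows:
    print(f"inputs: {train} -> predict: {predict}")
print(signals)
```

Here "exceeds" is read strictly, so a forecast of exactly 0.1 produces no trade; whether the paper treats the boundary this way is an assumption.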
II. ECONOMIC RATIONALE
In the source paper, Random Forests forecast stock reactions to earnings announcements more accurately than the alternative models tested, and ensemble averaging across trees mitigates overfitting. The prediction threshold (ε = 0.1) balances trade frequency against liquidity, which enhances profitability even when micro- and small-cap stocks are excluded. The most influential predictors are accounting-based variables, especially those related to free cash flow. This is consistent with established evidence that cash flow fundamentals are powerful return predictors, and it provides the economic foundation for the strategy.
III. SOURCE PAPER
Machine Learning-Based Financial Statement Analysis
Amir Amel-Zadeh, University of Oxford – Said Business School, Oxford-Man Institute of Quantitative Finance; Jan-Peter Calliess, University of Oxford – Oxford-Man Institute of Quantitative Finance; Daniel Kaiser, University of Oxford – Oxford-Man Institute of Quantitative Finance; Stephen Roberts, University of Oxford – Oxford-Man Institute of Quantitative Finance
<Abstract>
This paper explores the application of machine learning methods to financial statement analysis. We compare a range of models in the machine learning repertoire in their ability to predict the sign and magnitude of abnormal stock returns around earnings announcements based on past financial statement data alone. Random Forests produce the most accurate forecasts and the highest abnormal returns. (Nonlinear) neural network-based models perform relatively better for predictions of extreme market reactions, while the linear methods are relatively better in predicting moderate market reactions. Long-short portfolios based on model predictions generate sizable abnormal returns, which seem to decay over time. Abnormal returns are robust to various risk factors and load in expected ways on size, value and accruals. Analysing the underlying economic drivers of the performance of the Random Forests, we find that the models select as most important predictors financial variables required to forecast free cash flows and firm characteristics that are known cross-sectional predictors of stock returns.


IV. BACKTEST PERFORMANCE
| Metric | Value |
| --- | --- |
| Annualised Return | 47.47% |
| Volatility | 18% |
| Beta | N/A |
| Sharpe Ratio | 2.64 |
| Sortino Ratio | N/A |
| Maximum Drawdown | N/A |
| Win Rate | N/A |
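As a quick consistency check on the figures above, the reported Sharpe ratio matches the ratio of annualised return to volatility. This assumes no risk-free rate is subtracted, which the table does not state:

```latex
\[
\text{Sharpe ratio} \approx \frac{\text{annualised return}}{\text{annualised volatility}}
= \frac{47.47\%}{18\%} \approx 2.64
\]
```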