The strategy involves using LASSO regression to predict commodity futures returns based on past data, going long on the top five predicted commodities and short on the bottom five, with monthly rebalancing.

I. STRATEGY IN A NUTSHELL

The strategy trades 27 actively traded commodity futures across grains, softs, livestock, energy, and metals. Monthly excess returns are computed for each contract. Over a rolling 60-month window, each commodity's return is regressed on the one- and two-month lagged returns of all commodities: LASSO first selects the relevant predictors, and OLS then re-estimates the coefficients using only those predictors. Based on the predicted returns for the next month, the investor goes long the top five commodities and short the bottom five. The portfolio is rebalanced monthly.
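The ranking step can be sketched in a few lines of plain Python. This is only an illustration, not the backtest code: the function name `long_short_weights` and the tickers `C00` to `C11` are hypothetical, and the forecasts are made-up numbers. Equal weighting within each leg mirrors the `1 / len(portfolio)` targets used in the full listing.

```python
from typing import Dict


def long_short_weights(predicted: Dict[str, float], count: int = 5) -> Dict[str, float]:
    """Equal-weight long the `count` highest forecasts and short the `count` lowest."""
    if len(predicted) < 2 * count:
        return {}  # too few forecasts to fill both legs, so stay flat
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    weights = {symbol: 1.0 / count for symbol in ranked[:count]}
    weights.update({symbol: -1.0 / count for symbol in ranked[-count:]})
    return weights


# Hypothetical forecasts for twelve made-up tickers (C00 lowest, C11 highest)
forecasts = {f"C{i:02d}": 0.01 * (i - 6) for i in range(12)}
weights = long_short_weights(forecasts)
```

With twelve forecasts, the five highest each receive a weight of +0.2, the five lowest each receive -0.2, and the two in the middle are left out, so the portfolio is dollar-neutral.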

II. ECONOMIC RATIONALE

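Lead-lag relationships arise from economic links (e.g., lean hogs predicting wheat and soybean returns, or gasoline and corn linked through ethanol production) and from financialization effects driven by index trading, ETFs, and ETNs. Because LASSO shrinks the coefficients of uninformative predictors to zero, it reduces overfitting when many lagged returns compete as predictors, which is what makes the forecasts usable for trading.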
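To make the select-then-refit idea concrete, here is a minimal sketch on synthetic data, assuming `numpy` and `scikit-learn` are available. It is not the paper's procedure: the four assets, the planted link (asset 0 follows asset 1 with a one-month delay), and the use of `LassoCV` for the penalty are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n_months, n_assets = 61, 4
returns = rng.normal(0.0, 0.05, size=(n_months, n_assets))  # synthetic monthly returns

# Plant a lead-lag link: asset 0 follows asset 1 with a one-month delay
returns[1:, 0] = 0.8 * returns[:-1, 1] + rng.normal(0.0, 0.005, n_months - 1)

# Target: asset 0 at month t. Predictors: all assets at t-1 (columns 0-3) and t-2 (columns 4-7)
y = returns[2:, 0]
X = np.hstack([returns[1:-1], returns[:-2]])

# Step 1: LASSO picks the predictors with non-zero coefficients
lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)

# Step 2: OLS refit on the selected columns only, which removes the shrinkage bias
design = np.column_stack([np.ones(len(y)), X[:, selected]])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# One-step-ahead forecast from the two most recent months
latest = np.hstack([returns[-1], returns[-2]])
forecast = beta[0] + latest[selected] @ beta[1:]
```

Column 1 of `X` is the lag-1 return of asset 1, so it should be among the selected predictors, and its refitted OLS coefficient should land close to the planted value of 0.8.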

III. SOURCE PAPER

The Serial Dependence of the Commodity Futures Returns: A Machine Learning Approach

Han, Yufeng; Kong, Lingfei — University of North Carolina at Charlotte; Washington University in St. Louis.

<Acknowledgments>

We are grateful to Yulin Feng, Chulwoo Han, Liyan Han, Gene Lai, Christopher Kirby, Tao-Hsien Dolly
King, David Mauer, Mihail Velikov, Kiplan S. Womack, Yilei Zhang, Chengli Zheng, Xun Zhong, Guofu Zhou,
and seminar participants at the 2020 Symposium on Commodity Market Development and Risk Management,
2020 Financial Management Association Annual Meeting, 2021 Miami Herbert Business School Winter
Research Conference on Machine Learning and Business, 2021 International Futures and Derivatives
Conference, and seminar participants at the University of North Carolina at Charlotte for their helpful comments and suggestions.

IV. BACKTEST PERFORMANCE

Annualised Return: 15.15%
Volatility: 16.24%
Beta: 0.036
Sharpe Ratio: 0.93
Sortino Ratio: -0.076
Maximum Drawdown: -8.28%
Win Rate: 51%
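As a quick consistency check, the reported Sharpe ratio matches the annualised return divided by the volatility. The snippet below assumes a zero risk-free rate, which is a simplification on our part; the source does not state how the ratio was computed.

```python
annual_return = 0.1515      # reported annualised return
annual_volatility = 0.1624  # reported annualised volatility

# Sharpe ratio under the assumption of a zero risk-free rate
sharpe = annual_return / annual_volatility
print(round(sharpe, 2))  # 0.93
```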

V. FULL PYTHON CODE

from AlgorithmImports import *
# but it is a normal regression with added penalization term). The shrinkage parameter lambda for each regression is found by minimizing the AICc information criterion (equation 3.12). AICc
# is corrected AIC for small-sample bias. After the LASSO selects the important predictors, OLS is used to estimate the regression, using only the selected predictors to overcome the
# underfitting of LASSO parameter estimates. Prediction of the returns in the month t+1 is based on OLS estimated parameters of month t (using returns of month t and t-1; naturally, the OLS 
# regression to obtain parameters at time t, uses returns of months t-1 and t-2). If LASSO does not select any predictor for some commodity, the commodity is omitted. Based on the predicted
# returns, long top five commodities and short bottom five commodities. The portfolio is rebalanced monthly.
#
# QC implementation:
#   - The shrinkage parameter is found by cross-validation instead of minimizing AICc; commented-out AIC minimization code is also included.
import statsmodels.api as sm
from sklearn import linear_model
class TheSerialDependenceoftheCommodityFuturesReturns(QCAlgorithm):
    def Initialize(self):
        self.SetStartDate(2000, 1, 1)
        self.SetCash(100000)
        # the full list - table 1
        self.symbols = [
                        "CME_BO1",  # Soybean Oil Futures, Continuous Contract
                        "CME_C1",   # Corn Futures, Continuous Contract
                        "CME_KW2",  # Wheat Kansas, Continuous Contract
                        "CME_O1",   # Oats Futures, Continuous Contract
                        "CME_RR1",  # Rough Rice Futures, Continuous Contract
                        "CME_S1",   # Soybean Futures, Continuous Contract
                        "CME_SM1",  # Soybean Meal Futures, Continuous Contract
                        "CME_W1",   # Wheat Futures, Continuous Contract
                        "ICE_CC1",  # Cocoa Futures, Continuous Contract 
                        "ICE_CT1",  # Cotton No. 2 Futures, Continuous Contract
                        "CME_DA1",  # Class III Milk Futures
                        "ICE_OJ1",  # Orange Juice Futures, Continuous Contract
                        "ICE_KC1",  # Coffee C Futures, Continuous Contract
                        "CME_LB1",  # Random Length Lumber Futures, Continuous Contract
                        "ICE_SB1",  # Sugar No. 11 Futures, Continuous Contract
                        "CME_CL1",  # Crude Oil Futures, Continuous Contract
                        "ICE_O1",   # Heating Oil Futures, Continuous Contract
                        "CME_NG1",  # Natural Gas (Henry Hub) Physical Futures, Continuous Contract
                        "CME_RB2",  # Gasoline Futures, Continuous Contract
                        "CME_FC1",  # Feeder Cattle Futures, Continuous Contract
                        "CME_LC1",  # Live Cattle Futures, Continuous Contract
                        "CME_LN1",  # Lean Hog Futures, Continuous Contract
                        "CME_GC1",  # Gold Futures, Continuous Contract
                        "CME_HG1",  # Copper Futures, Continuous Contract
                        "CME_PA1",  # Palladium Futures, Continuous Contract 
                        "CME_PL1",  # Platinum Futures, Continuous Contract
                        "CME_SI1",  # Silver Futures, Continuous Contract
                        ]
        
        self.daily_period = 21
        self.period = 60 + 1
        # self.monthly_period = self.period * self.daily_period
        
        self.SetWarmUp(self.period * self.daily_period, Resolution.Daily)
        
        self.perf_data = {}
        self.roc = {}
        
        for symbol in self.symbols:
            data = self.AddData(QuantpediaFutures, symbol, Resolution.Daily)
            data.SetFeeModel(CustomFeeModel())
            data.SetLeverage(5)
            
            self.roc[symbol] = self.ROC(symbol, self.daily_period, Resolution.Daily)
            self.perf_data[symbol] = RollingWindow[float](self.period)
        self.recent_month = -1
    def OnData(self, data):
        if self.IsWarmingUp:
            return
        # rebalance once a month
        if self.recent_month == self.Time.month:
            return
        self.recent_month = self.Time.month
        
        dependent_symbols = []  # dependent variables to count with
        self.x = []
        
        # store monthly performance and construct the independent variables
        for symbol in self.symbols:
            if self.perf_data[symbol].IsReady and self.Securities[symbol].GetLastData() and self.Time.date() < QuantpediaFutures.get_last_update_date()[symbol]:
                # create 1-month and 2-months lagged returns variables
                perf_data = [x for x in self.perf_data[symbol]]
                x_s_t1 = perf_data[:-1]
                x_s_t2 = perf_data[1:]
                
                self.x.append(x_s_t1)
                self.x.append(x_s_t2)
                
                dependent_symbols.append(symbol)
            
            # store the latest 21-day return once the ROC indicator is ready
            if self.roc[symbol].IsReady:
                self.perf_data[symbol].Add(self.roc[symbol].Current.Value)
        
        predicted_return = {}
        
        # regression estimates
        if len(self.x) != 0:
            for symbol in dependent_symbols:
                y = np.array([x for x in self.perf_data[symbol]][:self.period-1])
                x = np.array(self.x)
                # model_result = self.multiple_linear_regression(self.x, y)
                
                # lasso with cross-validation
                lasso_cv_model = linear_model.LassoCV()
                lasso_cv_model.fit(x.T, y)
                lasso_best = linear_model.Lasso(alpha=lasso_cv_model.alpha_)
                lasso_best.fit(x.T, y)
                
                # find shrinkage parameter alpha by minimizing aic parameter
                # lasso_aic_model = linear_model.LassoLarsIC(criterion='aic')
                # lasso_aic_model.fit(x.T, y)
                # lasso_best = linear_model.Lasso(alpha=lasso_aic_model.alpha_)
                # lasso_best.fit(x.T, y)
                
                # select only important predictors
                relevant_predictor_indices = [i for i, coef in enumerate(lasso_best.coef_) if coef != 0]
                if len(relevant_predictor_indices) != 0:
                    # re-estimate by OLS using only the LASSO-selected predictors
                    ols_x = np.array([x[i][1:] for i in relevant_predictor_indices])
                    ols_model = self.multiple_linear_regression(ols_x, y[:-1])
                    ols_x_pred = [1] + [x[i][0] for i in relevant_predictor_indices]
                    y_pred = ols_model.predict(ols_x_pred)[0]
                    predicted_return[symbol] = y_pred
        
        long = []
        short = []
        count = 5
        if len(predicted_return) >= count*2:
            sorted_by_pred_return = sorted(predicted_return.items(), key=lambda x: x[1], reverse=True)
            long = [x[0] for x in sorted_by_pred_return[:count]]
            short = [x[0] for x in sorted_by_pred_return[-count:]]
        # trade execution
        targets: List[PortfolioTarget] = []
        for i, portfolio in enumerate([long, short]):
            for symbol in portfolio:
                if symbol in data and data[symbol]:
                    targets.append(PortfolioTarget(symbol, ((-1) ** i) / len(portfolio)))
        
        self.SetHoldings(targets, True)
    def multiple_linear_regression(self, x, y):
        x = np.array(x).T
        x = sm.add_constant(x)
        result = sm.OLS(endog=y, exog=x).fit()
        return result
# Quantpedia data.
# NOTE: IMPORTANT: Data order must be ascending (datewise)
class QuantpediaFutures(PythonData):
    _last_update_date:Dict[Symbol, datetime.date] = {}
    @staticmethod
    def get_last_update_date() -> Dict[Symbol, datetime.date]:
        return QuantpediaFutures._last_update_date
    def GetSource(self, config, date, isLiveMode):
        return SubscriptionDataSource("data.quantpedia.com/backtesting_data/futures/{0}.csv".format(config.Symbol.Value), SubscriptionTransportMedium.RemoteFile, FileFormat.Csv)
    def Reader(self, config, line, date, isLiveMode):
        data = QuantpediaFutures()
        data.Symbol = config.Symbol
        
        if not line[0].isdigit(): return None
        split = line.split(';')
        
        data.Time = datetime.strptime(split[0], "%d.%m.%Y") + timedelta(days=1)
        data['back_adjusted'] = float(split[1])
        data['spliced'] = float(split[2])
        data.Value = float(split[1])
        if config.Symbol.Value not in QuantpediaFutures._last_update_date:
            QuantpediaFutures._last_update_date[config.Symbol.Value] = datetime(1,1,1).date()
        if data.Time.date() > QuantpediaFutures._last_update_date[config.Symbol.Value]:
            QuantpediaFutures._last_update_date[config.Symbol.Value] = data.Time.date()
        return data
# Custom fee model.
class CustomFeeModel(FeeModel):
    def GetOrderFee(self, parameters):
        fee = parameters.Security.Price * parameters.Order.AbsoluteQuantity * 0.00005
        return OrderFee(CashAmount(fee, "USD"))
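
The header comment of the listing says the paper picks the shrinkage parameter by minimizing AICc, while the listing itself uses cross-validation. A rough sketch of an AICc-based choice is shown below, assuming `numpy` and `scikit-learn`. The helper name `select_alpha_aicc` is hypothetical, and the formula used (Gaussian AIC plus the usual small-sample correction, with the number of non-zero coefficients plus an intercept as the parameter count) is an assumption that may differ from the paper's equation 3.12.

```python
import numpy as np
from sklearn.linear_model import Lasso, lasso_path


def select_alpha_aicc(X, y):
    """Return the alpha on the LASSO path with the smallest AICc (assumed formula)."""
    n = len(y)
    # Centring X and y stands in for fitting an intercept
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    alphas, coefs, _ = lasso_path(Xc, yc)  # coefs has shape (n_features, n_alphas)
    best_alpha, best_score = None, np.inf
    for alpha, coef in zip(alphas, coefs.T):
        k = np.count_nonzero(coef) + 1  # non-zero slopes plus the intercept
        if n - k - 1 <= 0:
            continue  # correction term undefined for this model size
        rss = float(np.sum((yc - Xc @ coef) ** 2))
        aic = n * np.log(rss / n) + 2 * k
        score = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Synthetic example: only column 2 carries signal
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 6))
y = 2.0 * X[:, 2] + rng.normal(scale=0.1, size=60)

alpha = select_alpha_aicc(X, y)
coef = Lasso(alpha=alpha).fit(X, y).coef_
```

The selected `alpha` could be passed to `linear_model.Lasso` in place of `lasso_cv_model.alpha_`, though whether that reproduces the paper's results has not been checked here.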
