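The investment universe consists of A-share stocks listed on the Shanghai and Shenzhen stock exchanges, with data from the CSMAR database. First, a proxy for overconfidence is built with a time-series regression that yields abnormal trading volume (ATV). Each month, stocks are split into small caps and large caps at the median market capitalization, and then split again at the median ATV of the previous month. The investor goes long the overconfident portfolios in both size groups and short the low-confidence portfolios in both size groups. The strategy is value-weighted and rebalanced monthly.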

Strategy Overview

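The investment universe consists of A-share stocks listed on the Shanghai and Shenzhen stock exchanges. Data come from the China Stock Market & Accounting Research (CSMAR) database. First, a proxy for overconfidence is constructed from monthly data with a time-series regression (the paper does not specify the number of observations used, but we assume a one-year window, which the authors also use elsewhere in the study). For each stock and each month, the stock's trading volume is regressed on a constant and the total trading volume of all stocks in the market. The residual of the estimated regression is the proxy for abnormal trading volume (ATV). Each month, stocks are first split into two groups, small cap and large cap, at the median market capitalization of month t-1. Stocks are then split into two groups at the median ATV of the previous month. Go long the large-cap overconfident and small-cap overconfident portfolios, and short the large-cap low-confidence and small-cap low-confidence portfolios. The strategy is value-weighted and rebalanced monthly.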
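
The regression step can be illustrated with a minimal sketch, assuming the 12-month window mentioned above and using NumPy's least-squares solver rather than the statsmodels call in the full code. The function name `abnormal_volume` and the volume figures are hypothetical and chosen only for illustration.

```python
import numpy as np

def abnormal_volume(stock_volume, market_volume):
    """Residuals from an OLS regression of stock volume on a constant and market volume."""
    y = np.asarray(stock_volume, dtype=float)
    x = np.asarray(market_volume, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # columns: constant, market volume
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    return y - X @ beta                            # one residual per month

# Hypothetical 12 months of volumes, ordered oldest to newest (illustrative numbers only)
market = np.array([100, 110, 95, 120, 130, 125, 140, 135, 150, 145, 160, 170], dtype=float)
stock = 0.05 * market + 2.0
stock[-1] += 3.0   # unusually heavy trading in the latest month

residuals = abnormal_volume(stock, market)
atv = residuals[-1]   # latest month's residual serves as the ATV proxy in this sketch
```

Because the regression includes a constant, the residuals sum to zero over the estimation window, so the signal has to come from an individual month's residual (here the latest one) rather than from their average.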

Strategy Rationale

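It is not surprising that investors tend to overestimate their own abilities, the importance of their information, or their trading skill. Several studies show that overconfidence leads investors to trade excessively, so abnormal trading volume can serve as a proxy for overconfidence. However, in contrast to other, more developed markets, this behavioral bias does not lead to poor performance in China, at least over the one-month horizon the study focuses on. The study shows that more sophisticated investors do not exploit these market inefficiencies over the following month, so a strategy that bets on the most overconfident stocks is profitable.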

Source Paper

Behavioural Factors in China Stock Market

Abstract

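Based on three behavioral biases (overconfidence, the disposition effect, and herding), we construct three behavioral factors for the Chinese stock market. Compared with traditional factors, the behavioral factors provide incremental explanatory power and predictability for portfolio returns. In explaining prominent anomalies, our behavioral model outperforms the traditional factor models and two popular new factor models. This paper offers new ideas for both academia and industry on behavioral approaches to studying the Chinese stock market.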

Backtest Performance

Annualized return: 16.49%
Volatility: 11.78%
Beta: -0.011
Sharpe ratio: 1.4
Sortino ratio: N/A
Maximum drawdown: N/A
Win rate: 50%
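
As a quick consistency check on these figures, the short sketch below recomputes the Sharpe ratio from the reported return and volatility. It assumes a zero risk-free rate, which the source does not state; the variable names are illustrative.

```python
# Reported backtest figures, expressed as fractions
annual_return = 0.1649
annual_volatility = 0.1178

# Sharpe ratio under the assumption of a zero risk-free rate
sharpe = annual_return / annual_volatility
```

Under that assumption the ratio rounds to the reported 1.4.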

Full Python Code

# region imports
from AlgorithmImports import *
import statsmodels.api as sm
import data_tools
# endregion

class OverconfidenceFactorInChina(QCAlgorithm):

    def Initialize(self):
        self.SetStartDate(2009, 1, 1)
        self.SetCash(100000)

        self.tickers:list[str] = [
            '0001', '0002', '0004', '0006', '0011', '0012', '0016', '0017',
            '0019', '0020', '0027', '0066', '0083', '0101', '0135', '0144', '0151',
            '0175', '0241', '0267', '0268', '0270', '0288', '0291', '0316', '0322',
            '0384', '0388', '0489', '0586', '0656', '0669', '0688', '0700', '0762',
            '0788', '0836', '0868', '0881', '0883', '0916', '0960', '0968', '0981',
            '0992', '1038', '1044', '1093', '1099',

            # data error: '0003'

            # '1109', '1113', '1177', '1193',
            # '1299', '1308', '1359', '1378', '1810', '1876', '1913', '1928', '1929',
            # '1972', '1997', '2007', '2020', '2066', '2269', '2313', '2319', '2328',
            # '2331', '2382', '2388', '2638', '2688', '3311', '3323', '3690', '3692',
            # '3799', '3800', '6098', '6823', '6862'
        ]

        self.min_volumes:int = 15           # need at least n daily volume observations within a month
        self.regression_period:int = 12     # number of monthly observations used in the regression
        self.quantile:int = 2
        self.leverage:int = 5
        self.data:dict[Symbol, data_tools.SymbolData] = {}

        # NOTE: Download method is rate-limited to 100 calls (https://github.com/QuantConnect/Documentation/issues/345)
        for ticker in self.tickers:
            security = self.AddData(data_tools.ChineseStock, ticker, Resolution.Daily)
            security.SetFeeModel(CustomFeeModel())
            security.SetLeverage(self.leverage)

            stock_symbol:Symbol = security.Symbol
            income_statement_symbol:Symbol = self.AddData(data_tools.IncomeStatement, ticker, Resolution.Daily).Symbol
            self.data[stock_symbol] = data_tools.SymbolData(income_statement_symbol, self.regression_period)

        self.recent_month:int = -1

    def OnData(self, data):
        curr_date:datetime.date = self.Time.date()

        # daily data update
        # price and average shares are retrieved and stored separately, because they might not come at the same time
        for symbol, symbol_obj in self.data.items():
            if symbol in data and data[symbol]:
                if data[symbol].GetProperty('volume') != 0:
                    volume:float = data[symbol].GetProperty('volume')
                    
                    symbol_obj.update_daily_volumes(volume)
                
                if data[symbol].Value != 0: 
                    price:float = data[symbol].Value

                    symbol_obj.update_last_price(price)

            income_statement_symbol:Symbol = symbol_obj.income_statement_symbol
            if income_statement_symbol in data and data[income_statement_symbol] and \
                data[income_statement_symbol].GetProperty('weightedAverageShsOutDil') != 0:

                average_shares:float = data[income_statement_symbol].GetProperty('weightedAverageShsOutDil')

                symbol_obj.update_last_average_shares(average_shares)

        # rebalance monthly
        if self.recent_month == curr_date.month:
            return
        self.recent_month = curr_date.month

        filtered_symbols:list[Symbol] = []
        regression_x:np.array = np.zeros(self.regression_period)

        for symbol, symbol_obj in self.data.items():
            if symbol_obj.are_daily_volumes_ready(self.min_volumes):
                symbol_obj.update_monthly_volumes()
                symbol_obj.reset_daily_volumes()
            else:
                # data have to be consecutive
                symbol_obj.reset_volumes()

            if symbol_obj.are_regression_data_ready() and symbol_obj.are_data_for_market_cap_ready():
                regression_x += np.array(symbol_obj.get_monthly_volumes())

                filtered_symbols.append(symbol)

        ATV_by_symbol:dict[Symbol, float] = {}
        market_cap_by_symbol:dict[Symbol, float] = {}

        for symbol in filtered_symbols:
            market_cap:float = self.data[symbol].get_market_cap()
            regression_y:list[float] = self.data[symbol].get_monthly_volumes()

            regression_model = self.MultipleLinearRegression(regression_x, regression_y)

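            # NOTE: the mean of OLS residuals is zero by construction when the model includes a constant,
            # so the most recent month's residual is used as the ATV instead
            # (assumes get_monthly_volumes() returns values ordered oldest to newest)
            ATV:float = regression_model.resid[-1]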
            ATV_by_symbol[symbol] = ATV
            market_cap_by_symbol[symbol] = market_cap

        # large and small caps are each split into halves by ATV, so there have to be enough stocks
        if len(ATV_by_symbol) < (self.quantile * 2):
            self.Liquidate()
            return

        cap_quantile:int = int(len(ATV_by_symbol) / self.quantile)

        sorted_by_cap:list[Symbol] = [x[0] for x in sorted(market_cap_by_symbol.items(), key=lambda item: item[1])]
        large_cap:list[Symbol] = sorted_by_cap[-cap_quantile:]
        small_cap:list[Symbol] = sorted_by_cap[:cap_quantile]

        large_cap_sorted_by_ATV:list[Symbol] = sorted(large_cap, key=lambda symbol: ATV_by_symbol[symbol])
        small_cap_sorted_by_ATV:list[Symbol] = sorted(small_cap, key=lambda symbol: ATV_by_symbol[symbol])

        large_cap_quantile:int = int(len(large_cap_sorted_by_ATV) / self.quantile)
        small_cap_quantile:int = int(len(small_cap_sorted_by_ATV) / self.quantile)

        # long the big overconfident and small overconfident portfolios
        long_leg:list[Symbol] = large_cap_sorted_by_ATV[-large_cap_quantile:] + small_cap_sorted_by_ATV[-small_cap_quantile:]

        # short the big low confident and small low confident portfolios.
        short_leg:list[Symbol] = large_cap_sorted_by_ATV[:large_cap_quantile] + small_cap_sorted_by_ATV[:small_cap_quantile]

        # trade execution
        invested = [x.Key for x in self.Portfolio if x.Value.Invested]
        for symbol in invested:
            if symbol not in long_leg + short_leg:
                self.Liquidate(symbol)

        total_long_cap:float = sum(list(map(lambda symbol: market_cap_by_symbol[symbol], long_leg)))
        for symbol in long_leg:
            if symbol in data and data[symbol]:
                self.SetHoldings(symbol, market_cap_by_symbol[symbol] / total_long_cap)

        total_short_cap:float = sum(list(map(lambda symbol: market_cap_by_symbol[symbol], short_leg)))
        for symbol in short_leg:
            if symbol in data and data[symbol]:
                self.SetHoldings(symbol, -market_cap_by_symbol[symbol] / total_short_cap)

    def MultipleLinearRegression(self, x:np.array, y:list):
        x = np.array(x).T
        x = sm.add_constant(x)
        result = sm.OLS(endog=y, exog=x).fit()
        return result
        
# Custom fee model
class CustomFeeModel(FeeModel):
    def GetOrderFee(self, parameters):
        fee = parameters.Security.Price * parameters.Order.AbsoluteQuantity * 0.00005
        return OrderFee(CashAmount(fee, "USD"))
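
The portfolio construction in `OnData` can also be checked outside QuantConnect. The following is a minimal standalone sketch of the 2x2 sort and market-cap weighting, not the algorithm's own code: the function name `build_legs`, the tickers, and all numbers are hypothetical.

```python
def build_legs(market_cap, atv):
    """2x2 sort: split at the median market cap, then at the median ATV within each size group.

    Returns (long_weights, short_weights) as dicts of ticker -> portfolio weight.
    """
    if len(market_cap) < 4:
        raise ValueError("need at least four stocks for a 2x2 sort")

    by_cap = sorted(market_cap, key=market_cap.get)
    half = len(by_cap) // 2
    small, large = by_cap[:half], by_cap[len(by_cap) - half:]

    long_leg, short_leg = [], []
    for group in (small, large):
        ranked = sorted(group, key=atv.get)
        k = len(ranked) // 2
        short_leg += ranked[:k]                  # low ATV: low-confidence stocks
        long_leg += ranked[len(ranked) - k:]     # high ATV: overconfident stocks

    def cap_weights(leg, sign):
        total = sum(market_cap[t] for t in leg)
        return {t: sign * market_cap[t] / total for t in leg}

    return cap_weights(long_leg, 1.0), cap_weights(short_leg, -1.0)

# Hypothetical inputs (illustrative numbers only)
caps = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0, "G": 7.0, "H": 8.0}
atvs = {"A": 0.5, "B": -0.2, "C": 0.1, "D": -0.4, "E": -0.3, "F": 0.6, "G": 0.2, "H": -0.1}

long_w, short_w = build_legs(caps, atvs)
```

With these made-up inputs the long leg holds A, C, F, and G and the short leg holds B, D, E, and H. Long weights sum to 1 and short weights sum to -1, mirroring the `SetHoldings` calls above.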
