
Analyze NYSE, AMEX, and NASDAQ stocks by similarity (SIM) using Euclidean distance on key metrics. Trade deciles by longing the highest SIM and shorting the lowest, rebalancing monthly.
ASSET CLASS: stocks | REGION: United States | FREQUENCY: Monthly | MARKET: equities | KEYWORD: Momentum
I. STRATEGY IN A NUTSHELL
The strategy measures stock similarity as the Euclidean distance between each stock and its peers across five characteristics: Price, Book-to-Market, Size, Operating Profitability, and Investment. Each stock's SIM signal is the value-weighted average excess return of its 50 closest stocks over the past month. Stocks are sorted into deciles on SIM; the strategy buys the highest-SIM decile and shorts the lowest-SIM decile, holding value-weighted portfolios that are rebalanced monthly.
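The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's or the listing's implementation: the function names `similarity_signal` and `long_short_weights` are hypothetical, the characteristics are assumed to be already scaled to comparable units, and `past_ret` is assumed to hold last month's excess returns.

```python
import numpy as np

def similarity_signal(chars, past_ret, mcap, k):
    """SIM per stock: value-weighted past return of its k nearest peers.

    chars    : (n, m) array of characteristics, assumed already scaled
    past_ret : (n,) array of last month's (excess) returns
    mcap     : (n,) array of market capitalisations
    """
    n = chars.shape[0]
    sim = np.empty(n)
    for i in range(n):
        # Euclidean distance from stock i to every stock
        dist = np.linalg.norm(chars - chars[i], axis=1)
        dist[i] = np.inf  # a stock is not its own peer
        nearest = np.argsort(dist)[:k]
        weights = mcap[nearest] / mcap[nearest].sum()
        sim[i] = np.dot(weights, past_ret[nearest])
    return sim

def long_short_weights(sim, mcap, n_groups=10):
    """Value-weighted long (top group) and short (bottom group) weights."""
    size = len(sim) // n_groups
    order = np.argsort(sim)
    short_idx, long_idx = order[:size], order[-size:]
    w = np.zeros(len(sim))
    w[long_idx] = mcap[long_idx] / mcap[long_idx].sum()
    w[short_idx] = -mcap[short_idx] / mcap[short_idx].sum()
    return w

# Toy example: two clusters of stocks along one characteristic.
chars = np.array([[0.0], [1.0], [10.0], [11.0]])
past_ret = np.array([0.1, 0.2, -0.1, -0.2])
mcap = np.ones(4)
sim = similarity_signal(chars, past_ret, mcap, k=1)
print(sim)
print(long_short_weights(sim, mcap, n_groups=2))
```

Each stock's own return never enters its SIM value; only the peers' returns do, which is what separates this signal from the stock's own short-term return.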
II. ECONOMIC RATIONALE
The approach exploits the similarity effect: after a currently held stock earns positive returns, investors become more inclined to buy other stocks with similar characteristics. The effect remains robust after controlling for known anomalies (momentum, IVOL, skewness, coskewness) and retail order imbalance. This is consistent with similarity-driven buying being a distinct behavioral pattern in equity markets.
III. SOURCE PAPER
Similar Stocks
He, Wei, Southwestern University of Finance and Economics (SWUFE) – Institute of Chinese Financial Studies (ICFS); Wang, Huaixin, Tsinghua University – PBC School of Finance; Wang, Yuehan, Central University of Finance and Economics (CUFE) – School of Finance; Yu, Jianfeng, Tsinghua University – PBC School of Finance
<Abstract>
Similarity between two stocks is measured by the distance between their characteristics such as price, size, book-to-market, return on assets, and investment-to-assets. We find that after a stock’s most similar stocks have experienced high (low) returns in the past month, this focal stock tends to earn an abnormally high (low) return in the current month. The long-short portfolio strategy sorted on similar-stocks’ past average return earns a monthly CAPM alpha of 1.25% and a Fama-French six-factor alpha of 0.85%. This similarity effect is robust after controlling for style investing and a wide range of well-known firm-level characteristics that can predict returns in the cross section. Our result is consistent with the increased propensity for investors to buy other stocks with similar characteristics after experiencing positive returns for a currently held stock. We also explore other potential explanations for our findings.


IV. BACKTEST PERFORMANCE
| Annualised Return | 13.08% |
| Volatility | 18.86% |
| Beta | -0.08 |
| Sharpe Ratio | 0.69 |
| Sortino Ratio | -0.047 |
| Maximum Drawdown | N/A |
| Win Rate | 50% |
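As a quick consistency check on the table, the reported Sharpe ratio can be reproduced from the annualised return and volatility. The sketch below assumes a zero risk-free rate, which the source does not state; `sharpe_ratio` is a hypothetical helper name.

```python
def sharpe_ratio(annual_return, annual_volatility, risk_free_rate=0.0):
    """Annualised Sharpe ratio; the zero risk-free rate is an assumption."""
    return (annual_return - risk_free_rate) / annual_volatility

sharpe = sharpe_ratio(0.1308, 0.1886)
print(round(sharpe, 2))  # 0.69
```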
V. FULL PYTHON CODE
from AlgorithmImports import *
import numpy as np
from typing import Dict, List, Union

class SimilarStockShortTermMomentum(QCAlgorithm):

    def Initialize(self):
        self.SetStartDate(2002, 1, 1)
        self.SetCash(100000)

        self.data:Dict[Symbol, SymbolData] = {}  # rolling price data keyed by Symbol
        self.weight:Dict[Symbol, float] = {}     # target portfolio weights keyed by Symbol

        self.exchange_codes:List[str] = ['NYS', 'NAS', 'ASE']
        self.nearest_const:int = 10  # number of nearest stocks (by euclidean distance) used for SIM; the strategy description uses 50
        self.period:int = 21
        self.rebalance_month:int = 4
        self.quantile:int = 10
        self.leverage:int = 5
        self.min_share_price:float = 5.

        market:Symbol = self.AddEquity('SPY', Resolution.Daily).Symbol

        self.fundamental_count:int = 500
        self.fundamental_sorting_key = lambda x: x.DollarVolume

        self.selection_flag = False
        self.UniverseSettings.Resolution = Resolution.Daily
        self.AddUniverse(self.FundamentalSelectionFunction)
        self.Settings.MinimumOrderMarginPortfolioPercentage = 0.
        self.Schedule.On(self.DateRules.MonthStart(market), self.TimeRules.AfterMarketOpen(market), self.Selection)
        self.settings.daily_precise_end_time = False

    def OnSecuritiesChanged(self, changes: SecurityChanges) -> None:
        for security in changes.AddedSecurities:
            security.SetFeeModel(CustomFeeModel())
            security.SetLeverage(self.leverage)

    def FundamentalSelectionFunction(self, fundamental: List[Fundamental]) -> List[Symbol]:
        # daily updating of stock prices
        # (keyed by Symbol: Fundamental objects are recreated on every call, so they cannot serve as persistent keys)
        for stock in fundamental:
            symbol:Symbol = stock.Symbol
            if symbol in self.data:
                self.data[symbol].update(stock.AdjustedPrice)

        # rebalance monthly
        if not self.selection_flag:
            return Universe.Unchanged

        selected:List[Fundamental] = [
            x for x in fundamental if x.HasFundamentalData and x.Market == 'usa' and x.Price > self.min_share_price and \
            not np.isnan(x.ValuationRatios.PBRatio) and x.ValuationRatios.PBRatio != 0 and \
            not np.isnan(x.FinancialStatements.IncomeStatement.EBIT.ThreeMonths) and x.FinancialStatements.IncomeStatement.EBIT.ThreeMonths != 0 and \
            not np.isnan(x.FinancialStatements.BalanceSheet.InvestmentsAndAdvances.ThreeMonths) and x.FinancialStatements.BalanceSheet.InvestmentsAndAdvances.ThreeMonths != 0 and \
            not np.isnan(x.MarketCap) and x.MarketCap != 0 and \
            x.SecurityReference.ExchangeId in self.exchange_codes
        ]

        if len(selected) > self.fundamental_count:
            selected = sorted(selected, key=self.fundamental_sorting_key, reverse=True)[:self.fundamental_count]

        stocks_data:List[StockData] = []  # storing stocks price, book to market, market cap, operation profit and investments
        highest_values:List[float] = [0, 0, 0, 0, 0]  # storing biggest values of price, book to market, size, operation profit and investments

        # warm up stock prices
        for stock in selected:
            symbol:Symbol = stock.Symbol

            if symbol not in self.data:
                self.data[symbol] = SymbolData(self.period)
                history = self.History(symbol, self.period, Resolution.Daily)
                if history.empty:
                    continue
                closes = history.loc[symbol].close
                for time, close in closes.items():
                    self.data[symbol].update(close)

            if self.data[symbol].is_ready():
                # retrieve stock's data
                price:float = self.data[symbol]._price
                book_to_market:float = 1 / stock.ValuationRatios.PBRatio
                size:float = stock.MarketCap
                operation_profit:float = stock.FinancialStatements.IncomeStatement.EBIT.ThreeMonths
                investments:float = stock.FinancialStatements.BalanceSheet.InvestmentsAndAdvances.ThreeMonths

                # keep storing biggest values
                if highest_values[0] < price:
                    highest_values[0] = price
                if highest_values[1] < book_to_market:
                    highest_values[1] = book_to_market
                if highest_values[2] < size:
                    highest_values[2] = size
                if highest_values[3] < operation_profit:
                    highest_values[3] = operation_profit
                if highest_values[4] < investments:
                    highest_values[4] = investments

                # store stock's data in StockData object
                stocks_data.append(StockData(stock, price, book_to_market, size, operation_profit, investments))

        # make sure there are enough stocks for later decile selection and SIM calculation
        if len(stocks_data) < self.nearest_const * 2:
            return Universe.Unchanged

        # normalize stocks data by dividing each value by the biggest value of its characteristic
        for stock_data in stocks_data:
            stock_data._price = stock_data._price / highest_values[0]
            stock_data._book_to_market = stock_data._book_to_market / highest_values[1]
            stock_data._size = stock_data._size / highest_values[2]
            stock_data._operation_profit = stock_data._operation_profit / highest_values[3]
            stock_data._investments = stock_data._investments / highest_values[4]

        # Fundamental objects are safe as keys below because they are only used within this single call
        euclidean_distances:Dict[Fundamental, List] = {}

        # calculate euclidean distances between every pair of stocks
        for index in range(len(stocks_data)):
            if index + 1 >= len(stocks_data):
                break

            # retrieve stock's data based on index
            current_stock_data:StockData = stocks_data[index]
            curr_stock:Fundamental = current_stock_data._stock

            # create list for current stock euclidean distances
            if curr_stock not in euclidean_distances:
                euclidean_distances[curr_stock] = []

            # calculate euclidean distance to each stock that comes after the current one in the stocks data list
            for j in range(index + 1, len(stocks_data)):
                temp_stock_data:StockData = stocks_data[j]
                temp_stock:Fundamental = temp_stock_data._stock

                # create list for stocks euclidean distances
                if temp_stock not in euclidean_distances:
                    euclidean_distances[temp_stock] = []

                # calculate squared differences of each characteristic
                price_diff_pow = (current_stock_data._price - temp_stock_data._price) ** 2
                book_to_market_diff_pow = (current_stock_data._book_to_market - temp_stock_data._book_to_market) ** 2
                size_diff_pow = (current_stock_data._size - temp_stock_data._size) ** 2
                operation_profit_diff_pow = (current_stock_data._operation_profit - temp_stock_data._operation_profit) ** 2
                investments_diff_pow = (current_stock_data._investments - temp_stock_data._investments) ** 2

                # euclidean distance is the square root of the sum of squared differences
                euclidean_distance_value = np.sqrt(price_diff_pow + book_to_market_diff_pow + size_diff_pow +
                                                   operation_profit_diff_pow + investments_diff_pow)

                # the distance is symmetric, so store it under both stocks of the pair
                euclidean_distances[curr_stock].append([euclidean_distance_value, temp_stock])
                euclidean_distances[temp_stock].append([euclidean_distance_value, curr_stock])

        SIM:Dict[Fundamental, float] = {}  # value weighted return of the n nearest stocks keyed by stock

        # select top n nearest stocks and calculate their return
        for stock, dist_sym_list in euclidean_distances.items():
            # create list of stocks sorted by euclidean distances
            sorted_by_dist_values:List = [x[1] for x in sorted(dist_sym_list, key=lambda item: item[0])]

            # select top n nearest stocks
            nearest_stocks:List = sorted_by_dist_values[:self.nearest_const]

            total_return:float = 0.

            # calculate value weighted return of selected nearest stocks
            total_market_cap:float = sum([x.MarketCap for x in nearest_stocks])
            for temp_stock in nearest_stocks:
                total_return += (temp_stock.MarketCap / total_market_cap) * self.data[temp_stock.Symbol].performance()

            # store value weighted return of the n nearest stocks keyed by stock
            SIM[stock] = total_return

        long:List[Fundamental] = []
        short:List[Fundamental] = []

        if len(SIM) >= self.quantile:
            # sort stocks by their SIM values and perform decile selection
            quantile:int = int(len(SIM) / self.quantile)
            sorted_by_SIM:List[Fundamental] = [x[0] for x in sorted(SIM.items(), key=lambda item: item[1])]

            # portfolio goes long on the decile with the highest SIM and shorts the decile with the lowest SIM.
            long = sorted_by_SIM[-quantile:]
            short = sorted_by_SIM[:quantile]

        # calculate value weights: positive for the long leg, negative for the short leg
        for i, portfolio in enumerate([long, short]):
            mc_sum:float = sum(map(lambda x: x.MarketCap, portfolio))
            for stock in portfolio:
                self.weight[stock.Symbol] = ((-1) ** i) * stock.MarketCap / mc_sum

        return list(self.weight.keys())

    def OnData(self, data: Slice) -> None:
        # rebalance monthly
        if not self.selection_flag:
            return
        self.selection_flag = False

        # trade execution
        portfolio:List[PortfolioTarget] = [PortfolioTarget(symbol, w) for symbol, w in self.weight.items() if symbol in data and data[symbol]]
        self.SetHoldings(portfolio, True)

        self.weight.clear()

    def Selection(self) -> None:
        self.selection_flag = True

class SymbolData():
    def __init__(self, period: int):
        self._price:Union[None, float] = None
        self._closes:RollingWindow = RollingWindow[float](period)

    def update(self, close: float) -> None:
        self._price = close
        self._closes.Add(close)

    def is_ready(self) -> bool:
        return bool(self._price) and self._closes.IsReady

    def performance(self) -> float:
        # return over the window: newest close divided by oldest close, minus one
        return self._closes[0] / self._closes[self._closes.Count - 1] - 1

class StockData():
    def __init__(self, stock: Fundamental, price: float, book_to_market: float, size: float, operation_profit: float, investments: float):
        self._stock: Fundamental = stock
        self._price: float = price
        self._book_to_market: float = book_to_market
        self._size: float = size
        self._operation_profit: float = operation_profit
        self._investments: float = investments

# Custom fee model
class CustomFeeModel(FeeModel):
    def GetOrderFee(self, parameters):
        fee = parameters.Security.Price * parameters.Order.AbsoluteQuantity * 0.00005
        return OrderFee(CashAmount(fee, "USD"))
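The listing computes pairwise distances with a nested Python loop. As an alternative sketch that is not part of the original code, the same max-scaled Euclidean distances could be computed with NumPy broadcasting. The helper names `scaled_distance_matrix` and `k_nearest` are hypothetical, and the sketch assumes every characteristic column has a positive maximum.

```python
import numpy as np

def scaled_distance_matrix(features):
    """Pairwise Euclidean distances after dividing each column by its maximum.

    features: (n_stocks, n_characteristics) array of raw values.
    Assumes every column maximum is positive.
    """
    features = np.asarray(features, dtype=float)
    scaled = features / features.max(axis=0)
    # Broadcasting builds an (n, n, m) array of coordinate differences.
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def k_nearest(dist, k):
    """Indices of the k closest other stocks for each row of a distance matrix."""
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)  # exclude each stock from its own peers
    return np.argsort(masked, axis=1)[:, :k]

# Toy example with three stocks and two characteristics.
features = [[1.0, 2.0], [2.0, 4.0], [4.0, 4.0]]
dist = scaled_distance_matrix(features)
peers = k_nearest(dist, k=1)
print(dist)
print(peers)
```

The intermediate difference array holds n × n × m values, so for a 500-stock universe with five characteristics it stays small; for much larger universes a chunked computation would likely be preferable.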