Building a humble buy-and-sell decision engine for myself in Python
— python — 9 min read
My Python skill is pretty shallow. I hope this mini project can help me deepen my understanding.
Resources
Google Colab
It is convenient because Python packages are managed automatically in the notebook.
Buy when the price is low, and sell when the price is high
import pandas as pd
import yfinance as yf
import numpy as np

start_date = pd.to_datetime('2020-01-01')
end_date = pd.to_datetime('2024-01-01')

goog_data = yf.download('GOOG', start_date, end_date)

goog_data_signal = pd.DataFrame(index=goog_data.index)
goog_data_signal['price'] = goog_data['Adj Close']
# Change in price from the previous trading day
goog_data_signal['daily_difference'] = goog_data_signal['price'].diff()
# signal is 1.0 on days the price rose, 0.0 otherwise
goog_data_signal['signal'] = 0.0
goog_data_signal['signal'] = np.where(goog_data_signal['daily_difference'] > 0, 1.0, 0.0)
# positions is 1.0 when the signal flips from 0 to 1 (buy) and -1.0 when it flips from 1 to 0 (sell)
goog_data_signal['positions'] = goog_data_signal['signal'].diff()

goog_data_signal.tail(10)
Date | price | daily_difference | signal | positions |
---|---|---|---|---|
2023-12-15 00:00:00 | 133.83999633789062 | 0.6399993896484375 | 1.0 | 1.0 |
2023-12-18 00:00:00 | 137.19000244140625 | 3.350006103515625 | 1.0 | 0.0 |
2023-12-19 00:00:00 | 138.10000610351562 | 0.910003662109375 | 1.0 | 0.0 |
2023-12-20 00:00:00 | 139.66000366210938 | 1.55999755859375 | 1.0 | 0.0 |
2023-12-21 00:00:00 | 141.8000030517578 | 2.1399993896484375 | 1.0 | 0.0 |
2023-12-22 00:00:00 | 142.72000122070312 | 0.9199981689453125 | 1.0 | 0.0 |
2023-12-26 00:00:00 | 142.82000732421875 | 0.100006103515625 | 1.0 | 0.0 |
2023-12-27 00:00:00 | 141.44000244140625 | -1.3800048828125 | 0.0 | -1.0 |
2023-12-28 00:00:00 | 141.27999877929688 | -0.160003662109375 | 0.0 | 0.0 |
2023-12-29 00:00:00 | 140.92999267578125 | -0.350006103515625 | 0.0 | 0.0 |
- High: The highest price of the stock on that trading day.
- Low: The lowest price of the stock on that trading day.
- Close: The price of the stock at closing time.
- Open: The price of the stock at the beginning of the trading day, which can differ from the closing price of the previous trading day.
- Volume: How many shares were traded.
- Adj Close: The closing price adjusted for corporate actions such as stock splits and dividends.
- I used yf.download() because pd.DataReader has a compatibility issue with the Yahoo Finance API.
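To see how the signal and positions columns behave without downloading anything, here is a minimal sketch on a hand-made price series. The prices are invented for illustration and are not market data; the column names simply mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Hand-made prices (illustrative only, not market data)
frame = pd.DataFrame({"price": [10.0, 11.0, 12.0, 11.5, 11.0, 12.0]})

# Same three steps as in the GOOG code above
frame["daily_difference"] = frame["price"].diff()
frame["signal"] = np.where(frame["daily_difference"] > 0, 1.0, 0.0)
frame["positions"] = frame["signal"].diff()

# Rows where the signal flips: +1.0 is a buy, -1.0 is a sell
buys = frame.index[frame["positions"] == 1.0].tolist()
sells = frame.index[frame["positions"] == -1.0].tolist()

print(frame)
print("buy rows:", buys, "sell rows:", sells)
```

Here the buys land on rows 1 and 5 and the sell on row 3. In other words, the rule buys after the first up day and sells after the first down day, so it reacts to the previous day's move rather than picking an actual low or high.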
Visualise it
import pandas as pd
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt

start_date = pd.to_datetime('2020-01-01')
end_date = pd.to_datetime('2024-01-01')

goog_data = yf.download('GOOG', start_date, end_date)

goog_data_signal = pd.DataFrame(index=goog_data.index)
goog_data_signal['price'] = goog_data['Adj Close']
goog_data_signal['daily_difference'] = goog_data_signal['price'].diff()
goog_data_signal['signal'] = 0.0
goog_data_signal['signal'] = np.where(goog_data_signal['daily_difference'] > 0, 1.0, 0.0)
goog_data_signal['positions'] = goog_data_signal['signal'].diff()

fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Google price in $')

goog_data_signal['price'].plot(ax=ax1, color='r', lw=2.)
ax1.plot(goog_data_signal.loc[goog_data_signal.positions == 1.0].index,
         goog_data_signal.price[goog_data_signal.positions == 1.0],
         '^', markersize=5, color='m')
ax1.plot(goog_data_signal.loc[goog_data_signal.positions == -1.0].index,
         goog_data_signal.price[goog_data_signal.positions == -1.0],
         'v', markersize=5, color='k')
plt.show()
Backtesting
Backtesting is a key phase to get statistics showing how effective the trading strategy is.
- Profit and loss (P and L): The money made by the strategy without transaction fees.
- Net profit and loss (net P and L): The money made by the strategy with transaction fees.
- Exposure: The capital invested.
- Number of trades: The number of trades placed during a trading session.
- Annualised return: This is the return for a year of trading.
- Sharpe ratio: The risk-adjusted return. This is important because it compares the return of the strategy with a risk-free strategy.
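A few of these statistics can be sketched directly from a series of daily portfolio values. This is only a rough illustration: the helper names (annualised_return, sharpe_ratio, net_pnl), the flat fee per trade, the 252 trading days per year, and the sample values are all my own assumptions, not part of any library.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumed convention for daily data


def annualised_return(total: pd.Series) -> float:
    """Yearly compound growth rate implied by the first and last portfolio values."""
    n_days = len(total) - 1
    growth = total.iloc[-1] / total.iloc[0]
    return growth ** (TRADING_DAYS_PER_YEAR / n_days) - 1.0


def sharpe_ratio(total: pd.Series, risk_free_rate: float = 0.0) -> float:
    """Annualised Sharpe ratio from daily returns (risk_free_rate is a yearly rate)."""
    daily = total.pct_change().dropna()
    excess = daily - risk_free_rate / TRADING_DAYS_PER_YEAR
    return np.sqrt(TRADING_DAYS_PER_YEAR) * excess.mean() / excess.std()


def net_pnl(total: pd.Series, n_trades: int, fee_per_trade: float) -> float:
    """P and L minus a hypothetical flat fee for every trade."""
    gross = total.iloc[-1] - total.iloc[0]
    return gross - n_trades * fee_per_trade


# Made-up daily portfolio values, for illustration only
total = pd.Series([1000.0, 1010.0, 1005.0, 1020.0, 1015.0])

print("P and L:", total.iloc[-1] - total.iloc[0])
print("Net P and L:", net_pnl(total, n_trades=4, fee_per_trade=1.0))
print("Annualised return:", annualised_return(total))
print("Sharpe ratio:", sharpe_ratio(total))
```

With only five data points the annualised figures are meaningless in practice; the point is just to show how each number is derived.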
import pandas as pd
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt

start_date = pd.to_datetime('2021-01-01')
end_date = pd.to_datetime('2024-01-01')

goog_data = yf.download('GOOG', start_date, end_date)

goog_data_signal = pd.DataFrame(index=goog_data.index)
goog_data_signal['price'] = goog_data['Adj Close']
goog_data_signal['daily_difference'] = goog_data_signal['price'].diff()
goog_data_signal['signal'] = 0.0
goog_data_signal['signal'] = np.where(goog_data_signal['daily_difference'] > 0, 1.0, 0.0)
goog_data_signal['positions'] = goog_data_signal['signal'].diff()

initial_capital = float(1000.0)

positions = pd.DataFrame(index=goog_data_signal.index).fillna(0.0)
portfolio = pd.DataFrame(index=goog_data_signal.index).fillna(0.0)

positions['GOOG'] = goog_data_signal['signal']
portfolio['positions'] = (positions.multiply(goog_data_signal['price'], axis=0))
portfolio['cash'] = initial_capital - (positions.diff().multiply(goog_data_signal['price'], axis=0)).cumsum()
portfolio['total'] = portfolio['positions'] + portfolio['cash']

# portfolio.tail(100)

portfolio.plot()
plt.show()
As the plot shows, this strategy is not very profitable.
Exponential Moving Average
The exponential moving average (EMA) is similar to the simple moving average, but instead of weighting all prices in the history equally, it places more weight on the most recent price observations and less weight on older ones.
There are two different types of EMA.
- Fast EMA: converges to new price observations faster and forgets older observations faster.
- Slow EMA: converges to new price observations more slowly and forgets older observations more slowly.
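The difference between a fast and a slow EMA is easy to see on a price that jumps once and then stays flat. The sketch below uses pandas' built-in ewm() with adjust=False; the spans of 5 and 20 and the step-shaped prices are my own choices for illustration.

```python
import pandas as pd

# Synthetic step: price sits at 100, then jumps to 110 (illustrative only)
prices = pd.Series([100.0] * 5 + [110.0] * 10)

# A smaller span means a larger smoothing constant, so the EMA reacts faster
fast = prices.ewm(span=5, adjust=False).mean()
slow = prices.ewm(span=20, adjust=False).mean()

comparison = pd.DataFrame({"price": prices, "fast_ema": fast, "slow_ema": slow})
print(comparison.round(2))
```

After the jump, the fast EMA closes most of the gap to 110 within a few rows, while the slow EMA is still well below it at the end of the series.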
EMA = ( P - EMAp ) * K + EMAp
Where:
- P = the price for the current period
- EMAp = the exponential moving average for the previous period
- K = the smoothing constant, equal to 2 / (n + 1)
- n = the number of periods in a simple moving average roughly approximated by the EMA
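As a quick check of the formula by hand, here is a minimal sketch that applies it to three made-up prices with n = 3, so K = 0.5. The function name ema_series is hypothetical, and it seeds the first EMA with the first price, tracked with None rather than 0.

```python
def ema_series(prices, n):
    """Apply EMA = (P - EMAp) * K + EMAp to each price in turn."""
    k = 2 / (n + 1)  # smoothing constant
    ema_values = []
    ema_p = None  # no previous EMA yet
    for p in prices:
        # Seed with the first price, then apply the recurrence
        ema_p = p if ema_p is None else (p - ema_p) * k + ema_p
        ema_values.append(ema_p)
    return ema_values


# n = 3 gives K = 0.5, so each step moves halfway towards the new price
print(ema_series([10.0, 13.0, 16.0], n=3))  # [10.0, 11.5, 13.75]
```

Step by step: the first EMA is 10, the second is (13 - 10) * 0.5 + 10 = 11.5, and the third is (16 - 11.5) * 0.5 + 11.5 = 13.75.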
Implementation of the EMA
import pandas as pd
import yfinance as yf

start_date = pd.to_datetime('2021-01-01')
end_date = pd.to_datetime('2024-01-01')

goog_data = yf.download('GOOG', start_date, end_date)
close = goog_data['Close']

'''
EMA = ( P - EMAp ) * K + EMAp

Where:

P = Price for the current period
EMAp = the Exponential moving Average for the previous period
K = the smoothing constant, equal to 2 / (n + 1)
n = the number of periods in a simple moving average roughly approximated by the EMA
'''

num_periods = 20  # number of days over which to average
K = 2 / (num_periods + 1)  # smoothing constant
ema_p = 0

ema_values = []  # to hold computed EMA values

for close_price in close:
    if (ema_p == 0):
        ema_p = close_price
    else:
        ema_p = (close_price - ema_p) * K + ema_p

    ema_values.append(ema_p)

goog_data = goog_data.assign(ClosePrice=pd.Series(close, index=goog_data.index))
goog_data = goog_data.assign(Exponential20DayMovingAverage=pd.Series(ema_values, index=goog_data.index))

close_price = goog_data['ClosePrice']
ema = goog_data['Exponential20DayMovingAverage']

import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111, ylabel='Google price in $')
close_price.plot(ax=ax1, color='g', lw=2., legend=True)
ema.plot(ax=ax1, color='b', lw=2., legend=True)
plt.show()
Backtesting EMA
import pandas as pd
import yfinance as yf
import numpy as np

start_date = pd.to_datetime('2023-07-01')
end_date = pd.to_datetime('2024-01-01')

goog_data = yf.download('GOOG', start_date, end_date)
close = goog_data['Close']

'''
EMA = ( P - EMAp ) * K + EMAp

Where:

P = Price for the current period
EMAp = the Exponential moving Average for the previous period
K = the smoothing constant, equal to 2 / (n + 1)
n = the number of periods in a simple moving average roughly approximated by the EMA
'''

num_periods = 20  # number of days over which to average
K = 2 / (num_periods + 1)  # smoothing constant
ema_p = 0

ema_values = []  # to hold computed EMA values

for close_price in close:
    if (ema_p == 0):
        ema_p = close_price
    else:
        ema_p = (close_price - ema_p) * K + ema_p

    ema_values.append(ema_p)

goog_data = goog_data.assign(ClosePrice=pd.Series(close, index=goog_data.index))
goog_data = goog_data.assign(Ema20=pd.Series(ema_values, index=goog_data.index))
goog_data['Signal'] = np.where(goog_data['ClosePrice'] > goog_data['Ema20'], 1.0, 0.0)
goog_data['Position'] = goog_data['Signal'].diff()

initial_capital = float(1000.0)

positions = pd.DataFrame(index=goog_data.index).fillna(0.0)
portfolio = pd.DataFrame(index=goog_data.index).fillna(0.0)

positions['GOOG'] = goog_data['Signal']
portfolio['Signal'] = goog_data['Signal']
portfolio['Price'] = goog_data['ClosePrice']
portfolio['Positions'] = (positions.multiply(goog_data['ClosePrice'], axis=0))
portfolio['Cash'] = initial_capital - (positions.diff().multiply(goog_data['ClosePrice'], axis=0)).cumsum()
portfolio['Total'] = portfolio['Positions'] + portfolio['Cash']

portfolio.tail(100)
# goog_data.tail(100)
The result for the last six months is poor: the portfolio started with 1,000 in capital and ended with about 990.6.
Date | Signal | Price | Positions | Cash | Total |
---|---|---|---|---|---|
2023-12-05 00:00:00 | 0.0 | 132.38999938964844 | 0.0 | 990.6200256347656 | 990.6200256347656 |
2023-12-06 00:00:00 | 0.0 | 131.42999267578125 | 0.0 | 990.6200256347656 | 990.6200256347656 |
2023-12-07 00:00:00 | 1.0 | 138.4499969482422 | 138.4499969482422 | 852.1700286865234 | 990.6200256347656 |
2023-12-08 00:00:00 | 1.0 | 136.63999938964844 | 136.63999938964844 | 852.1700286865234 | 988.8100280761719 |
2023-12-11 00:00:00 | 0.0 | 134.6999969482422 | 0.0 | 986.8700256347656 | 986.8700256347656 |
2023-12-12 00:00:00 | 0.0 | 133.63999938964844 | 0.0 | 986.8700256347656 | 986.8700256347656 |
2023-12-13 00:00:00 | 0.0 | 133.97000122070312 | 0.0 | 986.8700256347656 | 986.8700256347656 |
2023-12-14 00:00:00 | 0.0 | 133.1999969482422 | 0.0 | 986.8700256347656 | 986.8700256347656 |
2023-12-15 00:00:00 | 0.0 | 133.83999633789062 | 0.0 | 986.8700256347656 | 986.8700256347656 |
2023-12-18 00:00:00 | 1.0 | 137.19000244140625 | 137.19000244140625 | 849.6800231933594 | 986.8700256347656 |
2023-12-19 00:00:00 | 1.0 | 138.10000610351562 | 138.10000610351562 | 849.6800231933594 | 987.780029296875 |
2023-12-20 00:00:00 | 1.0 | 139.66000366210938 | 139.66000366210938 | 849.6800231933594 | 989.3400268554688 |
2023-12-21 00:00:00 | 1.0 | 141.8000030517578 | 141.8000030517578 | 849.6800231933594 | 991.4800262451172 |
2023-12-22 00:00:00 | 1.0 | 142.72000122070312 | 142.72000122070312 | 849.6800231933594 | 992.4000244140625 |
2023-12-26 00:00:00 | 1.0 | 142.82000732421875 | 142.82000732421875 | 849.6800231933594 | 992.5000305175781 |
2023-12-27 00:00:00 | 1.0 | 141.44000244140625 | 141.44000244140625 | 849.6800231933594 | 991.1200256347656 |
2023-12-28 00:00:00 | 1.0 | 141.27999877929688 | 141.27999877929688 | 849.6800231933594 | 990.9600219726562 |
2023-12-29 00:00:00 | 1.0 | 140.92999267578125 | 140.92999267578125 | 849.6800231933594 | 990.6100158691406 |
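To put the final row of the table in perspective, here is a small arithmetic sketch using only the numbers shown above. The variable names are mine; the values are copied from the last row.

```python
initial_capital = 1000.0
final_total = 990.6100158691406       # last 'Total' value in the table
final_position = 140.92999267578125   # last 'Positions' value in the table

# Overall return on the starting capital
total_return = (final_total - initial_capital) / initial_capital

# Share of the portfolio actually invested in the stock on the last day
exposure = final_position / final_total

print(f"Total return: {total_return:.2%}")  # -0.94%
print(f"Exposure: {exposure:.1%}")          # 14.2%
```

Because the backtest only ever holds one share, roughly 14% of the capital is invested and the rest sits in cash, so both gains and losses stay small relative to the 1,000 starting amount.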
Strategy that uses MACD
Python Implementation
# MACD Strategy
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import yfinance as yf

# Fetch stock data
def fetch_stock_data(ticker, start, end):
    return yf.download(ticker, start, end)

# Calculate MACD
def calculate_macd(data, short_window=12, long_window=26, signal_window=9):
    data['EMA_short'] = data['Close'].ewm(span=short_window, adjust=False).mean()
    data['EMA_long'] = data['Close'].ewm(span=long_window, adjust=False).mean()
    data['MACD'] = data['EMA_short'] - data['EMA_long']
    data['Signal_Line'] = data['MACD'].ewm(span=signal_window, adjust=False).mean()

# Generate trading signals (buy=1 , sell=-1, do nothing=0)
def generate_signals(data):
    data['Signal'] = 0
    # Use .loc instead of chained indexing so the assignment hits the DataFrame itself
    data.loc[data['MACD'] > data['Signal_Line'], 'Signal'] = 1
    data.loc[data['MACD'] < data['Signal_Line'], 'Signal'] = -1
    data['Position'] = data['Signal'].diff()

# Backtesting the strategy
def backtest_strategy(data):
    initial_capital = float(100000.0)
    positions = pd.DataFrame(index=data.index).fillna(0.0)
    portfolio = pd.DataFrame(index=data.index).fillna(0.0)
    # Hold 100 shares while Signal is 1 (a Signal of -1 means 100 shares short)
    positions['Stock'] = 100 * data['Signal']
    portfolio['positions'] = (positions.multiply(data['Close'], axis=0))
    portfolio['cash'] = initial_capital - (positions.diff().multiply(data['Close'], axis=0)).cumsum()
    portfolio['total'] = portfolio['positions'] + portfolio['cash']

    # Calculate daily returns
    portfolio['returns'] = portfolio['total'].pct_change()

    return portfolio

# Calculate the Sharpe Ratio
def calculate_sharpe_ratio(portfolio):
    # Assuming risk-free rate = 0 for simplicity
    risk_free_rate = 0
    sharpe_ratio = (portfolio['returns'].mean() - risk_free_rate) / portfolio['returns'].std()
    # Annualize the Sharpe ratio
    sharpe_ratio_annualized = (252**0.5) * sharpe_ratio
    return sharpe_ratio_annualized

# Main execution function
def run_strategy(ticker):
    start_date = '2023-06-09'
    end_date = datetime.now().strftime('%Y-%m-%d')

    data = fetch_stock_data(ticker, start_date, end_date)
    calculate_macd(data)
    generate_signals(data)

    portfolio = backtest_strategy(data)
    sharpe_ratio = calculate_sharpe_ratio(portfolio)

    # Rows where the signal changed: a positive change is a buy, a negative change is a sell
    buys = data[data['Position'] > 0]
    sells = data[data['Position'] < 0]

    # Plot the results
    plt.figure(figsize=(10,6))
    plt.plot(data['Close'], label='Close Price', alpha=0.5)
    plt.plot(data['EMA_short'], label='12-day EMA', alpha=0.5)
    plt.plot(data['EMA_long'], label='26-day EMA', alpha=0.5)
    plt.plot(data['MACD'], label='MACD', alpha=0.5)
    plt.plot(data['Signal_Line'], label='Signal Line', alpha=0.5)
    # Draw the markers at the closing price on the days the signal changed
    plt.scatter(buys.index, buys['Close'], label='Buy Signal', marker='^', color='green', alpha=1)
    plt.scatter(sells.index, sells['Close'], label='Sell Signal', marker='v', color='red', alpha=1)
    plt.title(f'MACD Strategy: {ticker}')
    plt.legend()
    plt.show()

    plt.figure(figsize=(10,6))
    plt.plot(portfolio['total'], label='Portfolio Value')
    plt.title('Portfolio Performance')
    plt.legend()
    plt.show()

    print(f"Sharpe Ratio: {sharpe_ratio}")

# Run the strategy for a given stock
run_strategy('AAPL')
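To check the MACD signal logic without downloading data, here is a minimal sketch on a synthetic price that rises steadily and then falls steadily. The function name macd_signals and the price series are my own assumptions for illustration; the windows of 12, 26 and 9 are the same defaults used above.

```python
import numpy as np
import pandas as pd


def macd_signals(close, short_window=12, long_window=26, signal_window=9):
    """Return MACD, its signal line, and a 1 / -1 / 0 signal for a price series."""
    ema_short = close.ewm(span=short_window, adjust=False).mean()
    ema_long = close.ewm(span=long_window, adjust=False).mean()
    macd = ema_short - ema_long
    signal_line = macd.ewm(span=signal_window, adjust=False).mean()
    # 1 when MACD is above its signal line, -1 when below, 0 when equal
    signal = np.sign(macd - signal_line)
    return pd.DataFrame({
        "close": close,
        "macd": macd,
        "signal_line": signal_line,
        "signal": signal,
    })


# Synthetic prices (illustrative only): up from 100 to 129, then back down to 100
close = pd.Series(list(np.arange(100.0, 130.0)) + list(np.arange(128.0, 99.0, -1.0)))
out = macd_signals(close)

# Rows where the signal changes, i.e. where a trade would happen
print(out[out["signal"].diff() != 0])
```

On this series the signal turns to 1 as soon as the price starts rising and flips to -1 a few rows after the peak, once MACD drops below its signal line. That lag after the turning point is worth keeping in mind when reading the backtest result.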
Output and Backtesting
The result is not very impressive: the Sharpe ratio is 0.336, which is moderate. The following are typical Sharpe ratios of other strategies.
Strategy | Average Sharpe Ratio |
---|---|
Buy and Hold | 0.2-0.4 |
Moving Average Crossover | 0.3-0.5 |
MACD (this strategy) | 0.3363 |
RSI | 0.4-0.6 |
Momentum | 0.5-0.7 |