Correctly applying GARCH in Python

  • Problem: Correct usage of GARCH(1,1)
    Aim of research: Forecasting volatility/variance.
    Tools used: Python
    Instrument: SPX (specifically adjusted close prices)
    Reference material: On Estimation of GARCH Models with an Application to Nordea Stock Prices (Chao Li, 2007)

    Note: I have checked almost all the Quant.SE posts discussing GARCH, but I have not seen any of them with the approximate nature of what I'm asking.

    Description:

    I am doing a personal research of the historical volatility of stock prices when I came upon the concept of volatility forecasting. Naturally, this piqued my interest. I landed on the abovementioned reference material. After a couple of reads, I decided to see if my understanding of the application of GARCH(1,1) is correct and if I can get any insight into using it when investigating historical volatility.

    For this, I'll be using SPX prices, and the bt, pandas, and arch libraries in Python.

    My initial programming steps are as follows.

    In [1]: import pandas as pd
       ...: import bt
       ...: import arch
       ...: 
    
    In [2]: df = bt.get('SPX', start='1990-01-01')
    
    In [3]: df.head()
    Out[3]: 
                   spx
    Date              
    1990-01-02  359.69
    1990-01-03  358.76
    1990-01-04  355.67
    1990-01-05  352.20
    1990-01-08  353.79
    
    In [4]: df['pct_change'] = df['spx'].pct_change().dropna()
       ...: df['stdev21'] = pd.rolling_std(df['pct_change'], 21)
       ...: df['hvol21'] = df['stdev21']*(252**0.5) # Annualize.
       ...: df['variance'] = df['hvol21']**2
       ...: df = df.dropna() # Remove rows with blank cells.
       ...: df.head()
    
    Out[4]: 
                   spx  pct_change   stdev21    hvol21  variance
    Date                                                        
    1990-03-02  335.54    0.008415  0.007304  0.115946  0.013443
    1990-03-05  333.74   -0.005364  0.007425  0.117863  0.013892
    1990-03-06  337.93    0.012555  0.007770  0.123344  0.015214
    1990-03-07  336.95   -0.002900  0.007804  0.123889  0.015348
    1990-03-08  340.27    0.009853  0.007855  0.124689  0.015547
    

    The above is pretty simple. Now, I'll use the GARCH function provided by the arch Python module to get omega, beta, and alpha.

    In [5]: returns = df['pct_change'] * 100
       ...: am = arch.arch_model(returns)
       ...: res = am.fit(iter=5)
       ...: res.params
    Iteration:      5,   Func. Count:     39,   Neg. LLF: 8447.41751792
    Iteration:     10,   Func. Count:     74,   Neg. LLF: 8443.32521758
    Optimization terminated successfully.    (Exit mode 0)
                Current function value: 8443.31731767
                Iterations: 12
                Function evaluations: 86
                Gradient evaluations: 12
    Out[5]: 
    mu          0.058224
    omega       0.011511
    alpha[1]    0.079411
    beta[1]     0.911240
    Name: params, dtype: float64
    

    Following the formula $\sigma_t^2 = \omega + \alpha_1{a^2}_{t-1} + \beta_1{\sigma^2}_{t-1}$, I execute the following code.

    In [6]: df['C'] = res.params['omega']
       ...: df['B'] = df['variance'] * res.params['beta[1]']
       ...: df['A'] = (df['pct_change']**2) * res.params['alpha[1]']
       ...: df['forecast_var'] = df.loc[:,'C':'A'].sum(axis=1)
       ...: df['forecast_vol'] = df['forecast_var']**0.5
       ...: df.head()
       ...: 
    Out[6]: 
                   spx  pct_change   stdev21    hvol21  variance         C  \
    Date                                                                     
    1990-03-02  335.54    0.008415  0.007304  0.115946  0.013443  0.011511   
    1990-03-05  333.74   -0.005364  0.007425  0.117863  0.013892  0.011511   
    1990-03-06  337.93    0.012555  0.007770  0.123344  0.015214  0.011511   
    1990-03-07  336.95   -0.002900  0.007804  0.123889  0.015348  0.011511   
    1990-03-08  340.27    0.009853  0.007855  0.124689  0.015547  0.011511   
    
                       B         A  forecast_var  forecast_vol  
    Date                                                        
    1990-03-02  0.012250  0.000006      0.023767      0.154165  
    1990-03-05  0.012659  0.000002      0.024172      0.155474  
    1990-03-06  0.013863  0.000013      0.025387      0.159333  
    1990-03-07  0.013986  0.000001      0.025498      0.159681  
    1990-03-08  0.014167  0.000008      0.025686      0.160269
    

    Now I arrive at a forecasted volatility value. My questions are as follows:

    1. In codeblock [5], this was used: returns = df['pct_change'] * 100. I arbitrarily took this on face value as it's the way I've seen returns used in GARCH calculations. However, in the initial calculations of variance, I never did need to multiply the pct_change column by 100. Do I keep it or should I feed the pct_change column as is to the function?

    2. If I feed pct_change column to the function as is, the values of omega, beta, and alpha become considerably smaller, which brings down the forecast_var and forecast_vol values down by one order of magnitude. Obviously the shift in precision causes problematic comparison between my historical volatility values and my forecasted volatility values. How can I resolve this?

    3. Should I be using a different approach altogether?

    Kindly point out as well any glaring errors I might have missed or any missing logic

  • It doesn't matter if you use *100 or just pct_change, as long as you are consistent. However, in practice, due to underlying floating point numerical instabilities in the underlying optimization algorithms/default tolerances used in scipy/arch, having the returns expressed in %, i.e. multiplied by 100, will have a better chance of converging during the fitting of the model.

    The mistakes start at In[6]. Once the model is fitted, you can obtain the forecast conditional volatilities at res.conditional_volatility, which you need to annualize, i.e. multiply by sqrt(252).

    Note that in the GARCH formula a(t-1) is the model residual, which you can find in res.residual. It is not the pct_change**2. To arrive at the residual from pct_change, you have to work backwards in the equations. But this has already been done for you in res.resid

    Anyway, the correct formula in In[6] for forecast_vol should be:

    0.01 * np.sqrt(res.params['omega'] + res.params['alpha[1]'] * res.resid**2 + res.conditional_volatility**2 * res.params['beta[1]'])

    ^^ Here you are forecasting the instantaneous volatility for the NEXT day. Multiplying with 0.01 to take care of the earlier multiplication of daily return (pct_change) by 100

    I hope this clarifies things a bit.

    Once the model is fitted to training data, is there anyway we can then use the formula to forecast the volatility for the next *n* days into the future?

License under CC-BY-SA with attribution


Content dated before 7/24/2021 11:53 AM