### How to build a factor model?

• Factor models such as Fama-French or the other ones that are partially summarized here work on the cross-section of asset returns.

How are the factors built, how are sensitivities/coefficients estimated? In this context Fama-MacBeth regressions are usually mentioned. How does this method work intuitively? Could anyone give a step-by-step manual?

EDIT: Links to papers and manuals have been posted in the two answers - this is great. But can someone provide more intuition in the answer? Say we have a universe of stocks (say MSCI Europe) and we group them by value and size. How can we proceed? How do we construct the factors and how do we construct the sensitivities? Could someone please give a more direct explanation, without a link? thanks!

the answer to this question should fill an entire book

6 years ago

1. Determine Factors

Economically, the use of factor models can be motivated by either the ICAPM or the APT. Although there are some theoretical differences between the models, for empirical and practical work these differences are irrelevant. In the end, both models stipulate that returns and expected returns are linear functions of the factors:

$$r_{i,t} = \alpha_i + \sum_j \beta_{i,j} F_{j,t} + \epsilon_{i,t} \quad (1)$$

$$\mathbb{E}[ r_{i,t}] = \lambda_0 + \sum_j \beta_{i,j} \lambda_j \quad (2)$$

where $F_{j,t}$ is the factor surprise of factor $j$ at time $t$ and $\lambda_j$ is the factor risk premium of factor $j$.

What the factors are is fundamentally undetermined. Following the ICAPM, the factors should be proxies for future marginal consumption growth (i.e. state variables). Whatever factor you use, there should be an economic reason why returns should be related to it.

For some of the later steps, it makes a difference whether the factors are traded returns or something else (such as macroeconomic variables). Factors based on returns are usually constructed as the return on a particular portfolio or as the difference between the returns on two portfolios. The best-known examples of non-return factors are the macro factors used by Chen, Roll, and Ross (1986); the best-known return-based factors are those of Fama and French (1992, 1993, 1996, 2014). The statistical estimation is somewhat easier when the factors are returns (I’ll explain this point later).

2. Collect Data

The next step is always the data collection, both for the factors and the test assets. Sometimes, when the factors are macroeconomic time series (or something similar), their predictable component is removed so that only the factor surprises remain. In principle, only the unexpected component should explain the cross-sectional differences in returns.

When factors are constructed as portfolio returns, a key question is the rebalancing frequency. Most papers that I am aware of follow the example of Fama and French: they form the portfolios in the middle of the year (1st of July) and then keep the portfolio constituents the same for a year. A well-known counterexample is the momentum factor of Carhart (1997), which uses monthly rebalancing. When factors are constructed as the difference in returns between top and bottom portfolios according to some ranking, the question arises at which quantiles to split the assets. Common choices are splits at the median, at the 30/70 quantiles, or at the 10/90 quantiles.
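As a rough illustration of the quantile split, here is a minimal sketch of a top-minus-bottom factor return for one holding period. The function name `long_short_factor`, the equal weighting, and the toy numbers are my own assumptions for illustration; published factors are often value-weighted and use additional sorts (e.g. on size).

```python
import numpy as np

def long_short_factor(signal, next_returns, low_q=0.3, high_q=0.7):
    """Top-minus-bottom portfolio return for one holding period.

    signal       : characteristic known at the formation date (e.g. book-to-market)
    next_returns : returns of the same assets over the following period
    Equal weights inside each leg are an assumption made for simplicity.
    """
    signal = np.asarray(signal, dtype=float)
    next_returns = np.asarray(next_returns, dtype=float)
    lo, hi = np.quantile(signal, [low_q, high_q])   # breakpoints of the sort
    bottom = signal <= lo                           # e.g. "growth" leg
    top = signal >= hi                              # e.g. "value" leg
    return next_returns[top].mean() - next_returns[bottom].mean()

# Toy universe of 10 stocks (made-up numbers, not real data)
book_to_market = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1])
ret = np.array([0.01, 0.00, 0.02, 0.01, 0.03, 0.02, 0.01, 0.04, 0.03, 0.05])

hml = long_short_factor(book_to_market, ret)
print(hml)
```

Repeating this every period, with the sort refreshed at whatever rebalancing frequency you choose, gives the time series of factor returns $F_{j,t}$.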

3. Estimate regressions

The final step is to estimate the regressions to see if the factors are able to explain the cross-section of returns. There are two principal approaches to this, sometimes called time-series regressions and cross-sectional regressions (I have also heard people refer to the first procedure as the Fama-French method and the second one as the Fama-MacBeth method).

a) Time-Series Regression

When all factors are returns, you can use time-series regressions for each test asset to estimate the regression slopes $\beta_{i,j}$. In this case, you estimate model (1). You will obtain a beta for each factor and test asset. The reason you can use time-series regressions in this case is that the factor premia $\lambda_j$ can simply be estimated as the time-series mean of the factor returns. If you use excess returns as dependent variables in the regression, the factor model has one implication: all $\alpha_i$ should be zero. How to test this depends a bit on your assumptions about the temporal and cross-sectional correlation in the error terms. In any case, you will have to resort to some form of F-test (adjusted for autocorrelation, heteroscedasticity, general errors etc.) because you are testing multiple hypotheses jointly. The book by Cochrane (2001) derives these tests in detail using a GMM approach (chapters 12 and 13).
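A minimal sketch of the time-series step, assuming all factors are returns and using simulated data (the function name `time_series_regressions` and all numbers are made up for illustration). It only produces the point estimates; the joint test on the alphas is not implemented here.

```python
import numpy as np

def time_series_regressions(excess_returns, factors):
    """OLS of each asset's excess return on a constant and the factors, i.e. model (1).

    excess_returns : (T, N) array, one column per test asset
    factors        : (T, K) array of factor returns
    Returns alphas (N,), betas (N, K) and residuals (T, N).
    """
    T = factors.shape[0]
    X = np.column_stack([np.ones(T), factors])                  # (T, 1 + K)
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)   # (1 + K, N)
    alphas = coef[0]
    betas = coef[1:].T
    resid = excess_returns - X @ coef
    return alphas, betas, resid

# Simulated data in which the factor model holds exactly (true alphas are zero)
rng = np.random.default_rng(0)
T, N, K = 600, 5, 2
factors = 0.005 + 0.04 * rng.standard_normal((T, K))
true_betas = rng.uniform(0.5, 1.5, size=(N, K))
excess_returns = factors @ true_betas.T + 0.01 * rng.standard_normal((T, N))

alphas, betas, resid = time_series_regressions(excess_returns, factors)
premia = factors.mean(axis=0)   # factor premia = time-series means of the factor returns
print(alphas, premia)
```

By construction of OLS, each asset's average excess return equals its alpha plus beta times the average factor returns, which is why the alphas can be read as pricing errors.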

b) Cross-Sectional Regressions

For general factors, you will need to run cross-sectional regressions by estimating equation (2). A key problem here is that neither the $\beta$ coefficients nor the prices of risk $\lambda$ are directly observable. The usual way around this is to follow the procedure laid out by Fama and MacBeth (1973): you first run time-series regressions separately for each test asset. This gives you estimates of the $\beta$'s for each asset. These estimates are then used as independent variables in the cross-sectional regression, with the average return of each asset as the dependent variable. The coefficients estimated in this regression are the factor risk premia $\lambda$. Again, the prediction of a factor model (with excess returns as test assets) is that no return is left unexplained, which here means that the intercept $\lambda_0$ is zero. In the case of cross-sectional regressions this is a single parameter, for which the null hypothesis that it is zero in the population can be tested. This procedure is usually repeated using a rolling window; with monthly data, usually 5 years of data. The real “meat” of the Fama-MacBeth method is the statistical theory of how the standard errors of the cross-sectional regressions should account for cross-sectional correlation and for the fact that the $\beta$’s are themselves estimated coefficients from a time-series regression. Again, I would refer to Chapter 12 of Cochrane’s (2001) book for details on the test statistics.
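The two passes described above could be sketched as follows. This is only the point-estimation part on simulated data: the function name `two_pass_regression` and all numbers are hypothetical, a single full-sample window is used instead of a rolling one, and none of the standard-error corrections are included.

```python
import numpy as np

def two_pass_regression(excess_returns, factors):
    """Two-pass estimate of the factor risk premia in equation (2).

    excess_returns : (T, N) array of test-asset excess returns
    factors        : (T, K) array of factor surprises (need not be returns)
    """
    T, N = excess_returns.shape
    # Pass 1: time-series regression per asset gives the betas.
    X = np.column_stack([np.ones(T), factors])
    coef, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)
    betas = coef[1:].T                                   # (N, K)
    # Pass 2: regress average returns on a constant and the estimated betas.
    Z = np.column_stack([np.ones(N), betas])
    mean_ret = excess_returns.mean(axis=0)
    lam, *_ = np.linalg.lstsq(Z, mean_ret, rcond=None)
    pricing_errors = mean_ret - Z @ lam                  # cross-sectional residuals
    return lam[0], lam[1:], betas, pricing_errors

# Simulated data: expected returns are beta times the (made-up) true premia
rng = np.random.default_rng(1)
T, N = 600, 25
true_lambda = np.array([0.005, 0.002])
true_betas = rng.uniform(0.5, 1.5, size=(N, 2))
surprises = 0.04 * rng.standard_normal((T, 2))
surprises -= surprises.mean(axis=0)                      # surprises have zero sample mean
noise = 0.005 * rng.standard_normal((T, N))
excess_returns = (true_lambda + surprises) @ true_betas.T + noise

lambda_0, lambdas, betas, pricing_errors = two_pass_regression(excess_returns, surprises)
print(lambda_0, lambdas)
```

In the simulation the estimated premia should land close to `true_lambda` and the intercept close to zero; with real data, whether they do is exactly what is being tested.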

4. Evaluate results

After evaluating whether the pricing errors are small (testing that $\alpha_i=0$ for all $i$), the next question is whether the factors chosen in step 1 are “good factors”, meaning that they exhibit a strong relationship to expected returns. With either method (time-series or cross-sectional regressions) one should test if the factors are actually priced in the cross-section; the two approaches just go about it slightly differently. For time-series regressions, the factor risk premia are estimated as the time-series averages of the factor returns. Standard statistical tests can be used to test if these are positive. For cross-sectional regressions, the factor risk premia are the coefficients of the regressions, which can also be tested. In both cases, one should be careful about the standard errors used (autocorrelation in the time-series approach, cross-sectional dependence in the cross-sectional approach).
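For the time-series case, the simplest version of such a test is a plain t-statistic on the mean factor return. The sketch below assumes i.i.d. observations, so it deliberately ignores the autocorrelation issue mentioned above; the function name `premium_tstat` and the five observations are made up for illustration.

```python
import numpy as np

def premium_tstat(series):
    """Mean of a series and its t-statistic against zero.

    Assumes i.i.d. observations; autocorrelation or other dependence would
    call for adjusted standard errors, which are not implemented here.
    """
    x = np.asarray(series, dtype=float)
    mean = x.mean()
    std_err = x.std(ddof=1) / np.sqrt(len(x))
    return mean, mean / std_err

# Five made-up monthly factor returns
factor_returns = [0.01, 0.03, -0.01, 0.02, 0.00]
mean, t_stat = premium_tstat(factor_returns)
print(mean, t_stat)
```

The same function can be applied to a time series of period-by-period cross-sectional estimates of $\lambda_j$, again subject to the caveat about the standard errors.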

A question that often comes up is which approach is “better”. First, time-series regressions can only be used when the factors are returns. Second, even when the factors are returns, the two approaches are not necessarily equivalent. The time-series regression estimates the factor premium as the average factor return. Therefore, the factor itself receives a zero pricing error in the sample. This is equivalent to forcing the intercept in the cross-sectional regression to zero. In order to make the two methods equivalent, you will have to include the factor as a test asset as well. If you do this, then using the correct standard errors will produce the same estimates for the prices of risk.

Very nice answer! Do you have a couple of links at hand too? For the most important papers that you refer to? This would make the answer 100% complete ... thanks for your efforts!

You're asking for links after not asking for links....All the references the writer used are to classic papers or one of the most well-known finance textbooks.

John you are strict ;) but right too. Whenever I use papers in an answer then I try to find links too ... to have a complete answer ... that's my standard, but I accepted pbr142's answer anyways as I think that it is very complete already.