Independent Project: Portfolio Construction & Factor Modeling
This project demonstrates the full workflow of portfolio construction for 10 large NYSE stocks, combining Fama–French 3-factor regression analysis with Markowitz mean-variance optimization in Excel.
It leverages historical stock and factor data to:
- Estimate expected returns using factor models
- Calculate portfolio risk via covariance matrices
- Construct optimal portfolios (Minimum Variance and Tangency) using Excel Solver
- Visualize results and provide a clear, report-ready summary
- Raw Data: Yahoo Finance (Adjusted Close) — https://finance.yahoo.com/
- Fama–French Factors: Kenneth French Data Library — https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html
- Risk-Free Rate: FRED DTB3 (3-month T-bill) — https://fred.stlouisfed.org/series/DTB3
- Assets: AAPL, MSFT, JPM, KO, MCD, XOM, CAT, JNJ, WMT, DIS
- Market Proxy: S&P 500 (^GSPC)
- Timeframe: Last 7 years, monthly frequency
This setup ensures sufficient diversification, yet keeps analysis manageable and visually clear in Excel.
- Why 10 stocks: Beyond 10–15 reasonably uncorrelated stocks, incremental benefit from diversification is minimal.
- Why 7 years: Long enough to estimate covariances and betas without excessive noise, recent enough to reflect current market conditions.
- Why Excel: Transparent formulas, visible regressions, and hands-on Solver optimization demonstrate all steps clearly.
Construct and analyze an optimal portfolio of 10 large-cap NYSE stocks using factor models and mean-variance optimization.
- How do the Fama–French 3-factor model estimates compare to historical returns?
- How does covariance between assets affect portfolio risk?
- What are the optimal portfolio weights for minimum variance and maximum Sharpe ratio (tangency portfolio)?
- How do visualizations (efficient frontier, asset scatter, MVP/TP pies) help interpret results?
- Stock Prices: Yahoo Finance (monthly adjusted close)
- Factors: Fama–French 3-factor monthly data
- Risk-Free Rate: FRED 3-month T-bill (annual/12 for monthly decimals)
- RawPrices – stock + S&P 500 prices
- Factors_RF – MKT-RF, SMB, HML, RF
- Returns – arithmetic & log returns, market excess returns
- ExcessReturns – stock excess returns (R − RF)
- Regressions – Excel ToolPak regression outputs & residuals
- ExpectedReturns – monthly & annualized E[R]
- CovMatrix – 10×10 covariance of excess returns
- PortfolioCalc – weights, expected return, variance, stdev, Sharpe
- Solver_MVP / Solver_TP – Solver instructions & reference results
- Frontier – efficient frontier table for plotting
- Charts – frontier chart, asset scatter, MVP/TP pies
This step gathers all the raw data that powers the portfolio model.
To make the workflow efficient and reproducible, I used three small Python scripts — one for each data source — to pull, clean, and save monthly CSV files.
These scripts do not perform any analysis; they simply automate the collection and alignment of inputs that are later used in Excel.
| Script | Purpose | Output File | Source |
|---|---|---|---|
| fetch_prices.py | Downloads monthly adjusted close prices for 10 stocks and the S&P 500 (^GSPC) | prices_monthly.csv | Yahoo Finance |
| fetch_factors.py | Retrieves Fama–French 3-Factor data (MKT−RF, SMB, HML, RF) | ff_factors_monthly.csv | Kenneth R. French Data Library |
| fetch_rf.py | Pulls 3-month Treasury Bill (DTB3) data, converts it to monthly decimal RF | rf_dtb3_monthly.csv | FRED (Federal Reserve) |
Each script automatically sets the date range (last 7 years) and outputs a clean CSV ready for import into Excel.
- Run the scripts
Execute each.pyfile from your terminal or IDE:python fetch_prices.py python fetch_factors.py python fetch_rf.py
Each script saves a CSV file in your working directory.
-
Import CSVs into Excel
- prices_monthly.csv → paste into the RawPrices sheet
- ff_factors_monthly.csv → paste into the Factors_RF sheet
- rf_dtb3_monthly.csv → cross-check that the monthly RF values line up with the factor data
-
Align on Month-End Dates
- Use month-end alignment across all sheets (RawPrices, Factors_RF, Returns).
- Drop any months that don’t appear in all three datasets (inner join).
- This ensures that every regression and covariance calculation uses perfectly matched rows.
This step converts prices and macro inputs into the return series that power both the factor regressions and the optimizer. Accuracy and alignment here are critical—every downstream estimate depends on these returns.
What we compute and why:
- Arithmetic returns: used for factor regressions and portfolio optimization because they aggregate linearly across assets.
- Excess returns: subtract the contemporaneous monthly risk-free rate (RF) so that returns are measured relative to cash—this becomes the dependent variable for the Fama–French regressions.
- Market excess returns: same idea for the market proxy (^GSPC), used as an explanatory factor.
Implementation notes:
- Apply arithmetic return formulas to all stocks and to the market proxy (^GSPC).
- Align dates across RawPrices, Factors_RF, and Returns; drop months that don’t appear in all three datasets (inner join).
- Ensure the risk-free rate is in monthly decimal terms (e.g., 0.002 rather than 0.2%).
- Confirm dividends are included via Adjusted Close; this ensures total-return consistency across assets.
Quality checks:
- Plot quick histograms of monthly returns to spot anomalies or fat tails.
- Verify that Average(Excess Return) = Average(Return) − Average(RF) for multiple tickers.
- Remove any rows with missing or malformed values; #DIV/0! or blanks will break regressions and MMULT.
Deliverables from this step:
- Returns sheet with monthly arithmetic returns for all assets and the market.
- ExcessReturns sheet with monthly excess returns for all assets and the market.
The goal here is to estimate how each stock’s excess return loads on the Fama–French factors. We run one regression per stock, using monthly data across the aligned sample window.
For each stock i, regress monthly excess returns on the Fama–French factors:
Procedure in Excel (ToolPak):
- Data → Data Analysis → Regression
- Y Range: Stock_i_Excess_Returns
- X Range: columns for MKT−RF, SMB, HML
- Check “Labels” if headers included; include intercept
- Output residuals and summary statistics
Record for each stock:
- α, β_MKT, β_SMB, β_HML
- R², standard error
- Residuals (idiosyncratic component)
Interpretation:
- β_MKT: sensitivity to broad market risk.
- β_SMB: tilt toward small-cap vs. large-cap characteristics.
- β_HML: tilt toward value vs. growth characteristics.
- α: historical intercept; for forward-looking E[R], set α = 0 to avoid in-sample overfitting.
Good practice and diagnostics:
- Use the exact same date range across all regressions; mismatches distort betas.
- Inspect residuals for outliers or patterns that might indicate missing data or alignment issues.
- ToolPak provides OLS with classical SEs; for research-grade inference consider robust SEs (e.g., Newey–West) in a stats package.
- Remember: high R² indicates explanatory power of variation, not necessarily higher expected return.
Deliverables from this step:
- Regressions sheet with full ToolPak outputs (one per stock).
- A concise Betas summary table (α, βs, R²) covering all 10 stocks.
- Residuals saved for optional idiosyncratic risk diagnostics.
We translate factor exposures (betas) into forward-looking expected returns by combining them with estimated factor premia (historical means over the same 7-year window). This keeps regression and premia estimation consistent.
Compute expected monthly returns from the factor model by multiplying estimated betas with the historical factor premia:
Monthly Return formula:
Geometric Annualization:
Linear (approximation) Annualization:
Notes:
- RF should be the monthly average over the same sample used for factor means.
- Setting α = 0 is a conservative and standard forward-looking assumption.
- Cross-check: compare FF3-implied E[R_i] with each stock’s historical average excess return to understand model vs. realized return gaps.
Quality checks:
- Confirm factor means were computed on the identical rows used in regressions.
- Sanity-check rankings: with a positive market premium, higher β_MKT should typically raise E[R_i].
- Store both monthly and annualized E[R_i] for downstream plotting and reporting.
Deliverables from this step:
- ExpectedReturns sheet with monthly E[R_i], and geometric/linear annualized values for all 10 stocks.
The covariance matrix Σ captures how assets co-move and is the backbone of portfolio risk. Diversification works when off-diagonal covariances are low or negative, reducing overall variance.
Build a 10×10 covariance matrix (Σ) using monthly excess returns:
- Diagonal entries: VAR.P of each stock’s excess returns
- Off-diagonal entries: COV.P of each stock pair’s excess returns
- Keep Σ in monthly units for optimization; annualize only for charts
Tips:
- Use the exact same aligned ExcessReturns range used in Step 3—consistency is crucial.
- Prefer population formulas (VAR.P, COVARIANCE.P) for stability in smaller samples.
- Optional: Apply simple shrinkage toward the diagonal (e.g., 10–20%) if Σ appears unstable.
- Verify symmetry: Σ must be symmetric; if not, re-check your cell references.
Quality checks:
- Build a correlation heatmap from the same range; verify signs/magnitudes are sensible.
- Confirm no NA or text cells in the Σ range; they will break MMULT.
Deliverables from this step:
- CovMatrix sheet containing a clean, symmetric monthly covariance matrix Σ for all 10 assets.
With E[R] from Step 4 and Σ from Step 5, we compute any portfolio’s expected return and variance, then use Solver to find optimal weights under constraints.
Set up weight base (initially with equal weights) and Expected Returns table (from FF3)

Set up formulas:
- Portfolio return (monthly):
$= \text{SUMPRODUCT}(weights_range, expected_returns_range)$ - Portfolio variance:
$= \text{MMULT}(\text{TRANSPOSE}(weights_range), \text{MMULT}(\Sigma_range, weights_range))$ - Portfolio standard deviation:
$= \text{SQRT}(portfolio_variance_cell)$ - Sharpe ratio (monthly):
$= \dfrac{PortfolioReturn - RF_cell}{PortfolioStdev}$
Constraints (long-only base case):
- SUM(weights) = 1
- Each weight ≥ 0
Solver settings:
- MVP (Minimum Variance Portfolio): Set objective to minimize variance
- TP (Tangency Portfolio): Set objective to maximize Sharpe
- Engine: GRG Nonlinear
- Starting values: equal weights or MVP solution
Operational tips:
- For Tangency, ensure RF_cell references the monthly risk-free rate used throughout.
- If Solver struggles to converge, try:
- Using MVP solution as a warm start,
- Applying per-asset caps (e.g., ≤ 40–50%) to regularize,
- Switching to the Evolutionary engine as a fallback.
- After solving, round weights only for presentation; keep full-precision weights for calculations.
Validation:
- Confirm Solver converges without binding inconsistencies.
- Inspect weight concentration; long-only TP often clusters in a few high E[R]/σ names.
- Check MVP variance is below most single-name variances; if not, re-check Σ and date alignment.
Deliverables from this step:
- PortfolioCalc sheet with dynamic portfolio metrics.
- Solver_MVP and Solver_TP setups documented with final weights and performance stats.
This step translates the optimizer into a full picture of risk–return trade-offs. You’ll generate a curve of feasible portfolios by targeting return levels and solving for minimum variance at each target.
Efficient Frontier:
- Create a column of target monthly returns spanning MVP to max observed E[R]
- For each target, use Solver to minimize variance subject to:
- SUM(weights)=1
- weights≥0
- SUMPRODUCT(weights,E[R])=Target
- Record portfolio stdev and annualize for plotting
Charts to include:
- Efficient Frontier (annualized)
- Asset scatter (annualized return vs. stdev)
- Pie charts for MVP and TP
- FF Betas by Stock
How to read the visuals:
- The asset scatter shows each stock’s standalone position; the frontier traces the best achievable trade-off via diversification.
- MVP lies at the far-left of the frontier (lowest variance). Tangency (max Sharpe) is the point where a straight line from RF is just tangent to the frontier.
- Pie charts reveal portfolio intuition—defensive names dominate MVP; higher-return exposures tilt TP.
Quality checks:
- Ensure every frontier point uses identical constraints and correct references.
- Verify monotonicity: annualized stdev should generally rise with higher target returns.
Deliverables from this step:
- Frontier sheet with a clean table of target returns, monthly σ, annualized stats, and solved weights.
- Charts sheet containing the finalized frontier, asset scatter, and MVP/TP pies ready for inclusion in the report.


