Summary

The goal of the project was to automate the quarterly extraction, transformation, and loading (ETL) of holdings data from a small subset of index funds. The pipeline focuses on consolidating portfolio holdings by sector and market capitalization to give investors and analysts useful insights into what the funds hold. The data from FI is published only once per quarter, so fresh data arrives only at that pace; it is also very large and, although structured, hard to read.

Purpose

The main goal was, and still is, to simplify cross-fund holdings analysis by transforming raw, fund-level asset data into organized sector and market capitalization groupings. This structure improves visibility for analysts and investors, supporting easier comparison between funds, trend detection, and portfolio risk assessments.

Technical Overview

The ETL pipeline relies on Python scripts orchestrated to:

• Extract: Automatically download and parse FI’s quarterly XML holdings datasets.

• Transform: Clean, normalize, and categorize securities by sector and market capitalization.

• Load: Persist processed results into a PostgreSQL (or SQLite) database for querying and visualization.
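
The three stages listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the XML layout (`fund`, `holding`, `isin`, `sector`, `value`), the `sector_allocation` table, and the function names are all assumptions made for the example, since FI's real schema is not shown here. It uses SQLite from the standard library so it runs without a database server, and it groups by sector only; grouping by market capitalization would follow the same pattern.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XML layout; the real FI schema is not described in this document.
SAMPLE_XML = """
<fund name="Example Index Fund" quarter="2024Q1">
  <holding><isin>SE0000000001</isin><sector>Industrials</sector><value>600.0</value></holding>
  <holding><isin>SE0000000002</isin><sector>Financials</sector><value>300.0</value></holding>
  <holding><isin>SE0000000003</isin><sector> industrials </sector><value>100.0</value></holding>
</fund>
"""


def extract(xml_text):
    """Parse one fund file into a list of holding dictionaries."""
    root = ET.fromstring(xml_text.strip())
    fund, quarter = root.get("name"), root.get("quarter")
    return [
        {
            "fund": fund,
            "quarter": quarter,
            "isin": h.findtext("isin"),
            "sector": h.findtext("sector"),
            "value": float(h.findtext("value")),
        }
        for h in root.iter("holding")
    ]


def transform(rows):
    """Normalize sector names and compute each sector's weight per fund and quarter."""
    sector_totals = defaultdict(float)
    fund_totals = defaultdict(float)
    for r in rows:
        sector = r["sector"].strip().title()  # " industrials " -> "Industrials"
        sector_totals[(r["fund"], r["quarter"], sector)] += r["value"]
        fund_totals[(r["fund"], r["quarter"])] += r["value"]
    return [
        (fund, quarter, sector, value / fund_totals[(fund, quarter)])
        for (fund, quarter, sector), value in sorted(sector_totals.items())
    ]


def load(conn, allocations):
    """Persist fund x sector x quarter allocations (table name is an assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sector_allocation ("
        "fund TEXT, quarter TEXT, sector TEXT, weight REAL, "
        "PRIMARY KEY (fund, quarter, sector))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sector_allocation VALUES (?, ?, ?, ?)",
        allocations,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SAMPLE_XML)))
    for row in conn.execute(
        "SELECT sector, weight FROM sector_allocation ORDER BY sector"
    ):
        print(row)
```

The composite primary key together with `INSERT OR REPLACE` makes reloading the same quarter idempotent, which seems useful for a job that is rerun each time new data is published.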

The entire workflow is containerized via Docker, simplifying deployment, reproducibility, and database setup.

Results

The ETL pipeline processed 1.35 million holdings from 8,976 XML files across 892 Swedish index funds, producing 64,086 fund × sector × quarter allocations stored in PostgreSQL. The database contains 29,520 unique securities across 7 sectors. Next steps include integration with Tableau or Power BI for visualization and automated validation through CI to ensure data integrity.
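
As one illustration of what the automated validation step might check, the sketch below flags any fund and quarter whose sector weights do not sum to roughly 100%. It is only a sketch under stated assumptions: the `sector_allocation` table, its columns, the demo rows, and the function names are hypothetical and not taken from the project's schema or data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical allocations table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sector_allocation ("
        "fund TEXT, quarter TEXT, sector TEXT, weight REAL)"
    )
    conn.executemany(
        "INSERT INTO sector_allocation VALUES (?, ?, ?, ?)",
        [
            # Made-up rows for illustration only.
            ("Fund A", "2024Q1", "Industrials", 0.7),
            ("Fund A", "2024Q1", "Financials", 0.3),
            ("Fund B", "2024Q1", "Industrials", 0.5),
            ("Fund B", "2024Q1", "Financials", 0.3),  # 20% is missing
        ],
    )
    conn.commit()
    return conn


def find_unbalanced_funds(conn, tolerance=0.001):
    """Return (fund, quarter, total) rows whose weights do not sum to about 1."""
    query = """
        SELECT fund, quarter, SUM(weight) AS total
        FROM sector_allocation
        GROUP BY fund, quarter
        HAVING ABS(SUM(weight) - 1.0) > ?
        ORDER BY fund, quarter
    """
    return conn.execute(query, (tolerance,)).fetchall()


if __name__ == "__main__":
    for fund, quarter, total in find_unbalanced_funds(build_demo_db()):
        print(f"{fund} {quarter}: weights sum to {total:.3f}")
```

A CI job could run a check like this after each load and fail the build when the returned list is not empty.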

Index-funds-etl: FI & Swedish Index Funds
