Analyzing the market impact of central bank communications using natural language processing and financial sentiment analysis
Market Voice Analytics is an end-to-end data pipeline that scrapes speeches from the European Central Bank (ECB) and the Federal Reserve, performs financial sentiment analysis using FinBERT, and correlates the results with market movements across major indices, currencies, and other assets, including EUR/USD, S&P 500, Gold, US Treasuries, and Euro Stoxx 50.
The project demonstrates the intersection of natural language processing, financial analysis, and data engineering by analyzing how central bank communications influence market behavior.
Scrapers for ECB and Federal Reserve speeches, press releases, and statements with URL-based deduplication
Archive scrapers that fetch thousands of historical speeches from the ECB foedb JSON database and the Fed's yearly archives
Sentiment analysis using FinBERT with intelligent sentence-based chunking for long documents
Automatic fetching of price data around speech dates using Yahoo Finance API
Measures how speech sentiment correlates with market movements over 1-day and 1-week periods
3-page Streamlit app with sentiment distribution, speaker analysis, and market impact visualizations
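The market impact measurement listed above can be illustrated with a small, dependency-free sketch. It is not the project's actual implementation: the function names are hypothetical, prices are assumed to be a list of daily closes, each speech is assumed to be mapped to the index of its trading day, and one week is approximated as five trading days.

```python
from math import sqrt
from typing import List, Sequence

ONE_DAY = 1   # horizon in trading days
ONE_WEEK = 5  # assumption: one week approximated as five trading days


def forward_return(prices: Sequence[float], index: int, horizon: int) -> float:
    """Simple return from the close at `index` to the close `horizon` days later."""
    return prices[index + horizon] / prices[index] - 1.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def sentiment_return_correlation(
    sentiments: Sequence[float],
    speech_indices: Sequence[int],
    prices: Sequence[float],
    horizon: int,
) -> float:
    """Correlate per-speech sentiment with the forward return after each speech."""
    xs: List[float] = []
    ys: List[float] = []
    for score, idx in zip(sentiments, speech_indices):
        # Skip speeches too recent to have a full forward window.
        if idx + horizon < len(prices):
            xs.append(score)
            ys.append(forward_return(prices, idx, horizon))
    return pearson(xs, ys)
```

Calling `sentiment_return_correlation(scores, indices, closes, ONE_DAY)` and again with `ONE_WEEK` would give the two horizons mentioned above. A correlation computed this way says nothing about causality, and overlapping windows between nearby speeches are ignored in this sketch.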
The pipeline follows a modular architecture with clear separation of concerns:
Central bank speeches are typically 2,000-5,000 words, but BERT-based models such as FinBERT accept at most 512 tokens per input.
Solution: Sentence-based chunking that splits text on sentence boundaries, verifies the exact token count of each chunk, handles edge cases such as abbreviations and overlong sentences, and aggregates sentiment scores across all chunks.
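A minimal sketch of this chunking approach follows. It is an illustration under stated assumptions rather than the project's code: `count_tokens` is a whitespace stand-in for a real tokenizer, the abbreviation list is illustrative, the 510-token budget assumes two slots are reserved for special tokens, and the token-weighted mean in `aggregate_scores` is one possible aggregation, since the aggregation method is not specified here.

```python
import re
from typing import Callable, Dict, List, Sequence

MAX_TOKENS = 510  # assumption: 512 minus two special tokens
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "prof", "e.g", "i.e", "vs", "u.s"}


def count_tokens(text: str) -> int:
    """Stand-in counter (whitespace words); swap in a real tokenizer in practice."""
    return len(text.split())


def split_sentences(text: str) -> List[str]:
    """Split on sentence punctuation, re-joining pieces that end in an abbreviation."""
    sentences: List[str] = []
    buffer = ""
    for part in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not part:
            continue
        buffer = f"{buffer} {part}" if buffer else part
        last_word = buffer.split()[-1].rstrip(".").lower()
        if buffer.endswith(".") and last_word in ABBREVIATIONS:
            continue  # "Dr." or "e.g." is not a sentence end; keep accumulating
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


def chunk_text(
    text: str,
    max_tokens: int = MAX_TOKENS,
    counter: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Pack whole sentences into chunks whose counted length stays within max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        if counter(sentence) > max_tokens:
            # Overlong sentence: flush pending work, then fall back to word-level splits.
            if current:
                chunks.append(" ".join(current))
                current = []
            piece: List[str] = []
            for word in sentence.split():
                if piece and counter(" ".join(piece + [word])) > max_tokens:
                    chunks.append(" ".join(piece))
                    piece = []
                piece.append(word)
            if piece:
                chunks.append(" ".join(piece))
            continue
        # Count the joined candidate so the check reflects the actual chunk text.
        if current and counter(" ".join(current + [sentence])) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def aggregate_scores(
    chunk_scores: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Weighted mean of per-chunk label probabilities (weights: e.g. token counts)."""
    total = sum(weights)
    return {
        label: sum(s[label] * w for s, w in zip(chunk_scores, weights)) / total
        for label in chunk_scores[0]
    }
```

With the Hugging Face `transformers` library, the counter could be replaced by something like `lambda s: len(tokenizer.encode(s, add_special_tokens=False))`, where `tokenizer` is the tokenizer loaded for the FinBERT checkpoint in use. Each chunk would then be scored separately and the per-chunk probabilities combined with `aggregate_scores`.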
Ensuring no duplicate speeches are stored when running daily ingestion and historical backfills.
Solution: URL-based deduplication with database constraints and validation checks before insertion.
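The idea can be sketched with SQLite from the Python standard library. The table layout, the function names, and the URL normalization step are assumptions made for illustration; the project's actual database and schema are not described here.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create a speeches table whose UNIQUE constraint rejects repeated URLs."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS speeches (
            id    INTEGER PRIMARY KEY,
            url   TEXT NOT NULL UNIQUE,
            title TEXT NOT NULL,
            body  TEXT NOT NULL
        )
        """
    )


def normalize_url(url: str) -> str:
    """Assumed normalization: trim whitespace and any trailing slash."""
    return url.strip().rstrip("/")


def insert_speech(conn: sqlite3.Connection, url: str, title: str, body: str) -> bool:
    """Insert a speech; return True only if a new row was actually stored."""
    url = normalize_url(url)
    # Validation before insertion: reject records with no URL or no text.
    if not url or not body.strip():
        return False
    # OR IGNORE lets the UNIQUE constraint silently skip an existing URL.
    cursor = conn.execute(
        "INSERT OR IGNORE INTO speeches (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Because the constraint lives in the database, a daily ingestion run and a historical backfill can both call `insert_speech` on the same URL without coordinating with each other; whichever runs second simply gets `False` back.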
Coordinating multiple data sources and processing steps in a reliable, scheduled manner.
Solution: Apache Airflow DAGs for orchestration with proper dependency management and error handling.
The project successfully demonstrates: