FinSight API
An automated platform that scrapes, parses, and analyzes financial reports from Malaysian public companies, transforming unstructured PDFs into structured data and AI-powered intelligence.
What is FinSight API?
FinSight API bridges the gap between raw financial disclosures and tooling that developers can build on. It ingests quarterly and annual reports from Malaysian blue-chip companies, extracts structured financial tables, and exposes the data through a developer API, an interactive dashboard, and an AI-powered analysis layer.
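To make "unstructured PDFs into structured data" concrete, here is a minimal sketch of one small step: turning a text row from an extracted financial table into a typed record. The `LineItem` shape, the two-column layout (current period, prior period), and the function names are assumptions for illustration; the actual FinSight schema and parser are not described on this page.

```python
import re
from dataclasses import dataclass


# Hypothetical record shape; not the real FinSight schema.
@dataclass
class LineItem:
    label: str
    current: float
    prior: float


# Accounting-style figure: 1,234.5 or -12 or (1,234), where parentheses mean negative.
_FIGURE = re.compile(r"\(?-?\d[\d,]*(?:\.\d+)?\)?")


def parse_amount(text: str) -> float:
    """Convert an accounting-style figure to a float."""
    negative = text.startswith("(") and text.endswith(")")
    value = float(text.strip("()").replace(",", ""))
    return -value if negative else value


def parse_row(row: str) -> LineItem:
    """Split a text row into a label and its two trailing figures."""
    # Assumes the last two whitespace-separated tokens are the figures.
    *label_parts, current, prior = row.split()
    if not (_FIGURE.fullmatch(current) and _FIGURE.fullmatch(prior)):
        raise ValueError(f"row does not end in two figures: {row!r}")
    return LineItem(" ".join(label_parts), parse_amount(current), parse_amount(prior))


item = parse_row("Loss before tax (45,000) 12,500")
```

Rows that do not end in two figures, such as section captions, raise `ValueError`, so a caller can skip them or log them rather than store malformed records.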
Quick Navigation
| Section | Description | Status |
|---|---|---|
| Platform Overview | Goals, monetization, and roadmap | ✅ Current |
| Architecture | System design and component map | ✅ Current |
| Data Engineering | Scraping, parsing, and ETL pipeline | ✅ Current |
| Backend | FastAPI services and database schema | ✅ Current |
| API Reference | Endpoints, auth, and usage examples | ✅ Current |
| Frontend | Next.js dashboard and state management | ✅ Current |
| AI Systems | RAG, LLM analysis, and agents | 🚧 Planned |
| MLOps | Model training, tracking, and deployment | 🚧 Planned |
| System Design | Scaling, reliability, and data quality | 🚧 Planned |
| Development | Local setup and contribution guide | ✅ Current |
Development Phases
Phase 1 ── MVP & Data Acquisition ← Weeks 1–3
Phase 2 ── ETL Pipeline & Database ← Weeks 4–7
Phase 3 ── RAG & Backend API ← Weeks 8–11
Phase 4 ── Full-Stack Dashboard & Auth ← Weeks 12–16
Phase 5 ── Agentic Workflows ← Weeks 17–19
Phase 6 ── ML Models & Production CI/CD ← Weeks 21–24
Tech Stack at a Glance
- Next.js — React framework for the web dashboard
- Zustand — Lightweight client-side state management
- TanStack Query — Server state and data fetching
- Recharts / Chart.js — Financial data visualizations
- FastAPI — High-performance Python API framework
- PostgreSQL + pgvector — Relational store + vector embeddings
- Elasticsearch — Hybrid keyword + semantic search
- Redis — Caching and rate-limiting
- Playwright / Selenium — Dynamic web scraping
- PyMuPDF — PDF parsing and table extraction
- Airflow / Prefect — Pipeline orchestration
- Docker — Containerized pipeline execution
- RAG Pipelines — Retrieval-Augmented Generation over financial docs
- LangGraph — Multi-agent orchestration and routing
- MCP Tools — Model Context Protocol for agent tool-use
- Langfuse — AI observability, cost, and latency tracking
- MLflow — Experiment tracking and model registry
- PyTorch / XGBoost — Model training
- GitHub Actions — CI/CD pipeline
- Docker — Reproducible build and deployment
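The stack lists hybrid keyword + semantic search but this page does not say how the two result sets are combined. One common approach is reciprocal rank fusion (RRF), sketched below in plain Python as an assumption rather than FinSight's confirmed method. The document IDs are hypothetical, and `k=60` is a conventional default for RRF, not a value taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword query and a vector query.
keyword_hits = ["report_a", "report_b", "report_c"]
semantic_hits = ["report_b", "report_d", "report_a"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Because RRF uses only rank positions, it avoids having to normalize keyword relevance scores and vector similarity scores onto a common scale, and documents that appear in both lists rise to the top.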