Skip to content

FinSight API

An automated platform that scrapes, parses, and analyzes financial reports from Malaysian public companies — transforming unstructured PDFs into structured data and AI-powered intelligence.


What is FinSight API?

FinSight API bridges the gap between raw financial disclosures and actionable developer tooling. It ingests quarterly and annual reports from Malaysian Blue-Chip companies, extracts structured financial tables, and surfaces the data through a developer API, interactive dashboard, and AI-powered analysis layer.


Quick Navigation

Section Description Status
Platform Overview Goals, monetization, and roadmap ✅ Current
Architecture System design and component map ✅ Current
Data Engineering Scraping, parsing, and ETL pipeline ✅ Current
Backend FastAPI services and database schema ✅ Current
API Reference Endpoints, auth, and usage examples ✅ Current
Frontend Next.js dashboard and state management ✅ Current
AI Systems RAG, LLM analysis, and agents 🚧 Planned
MLOps Model training, tracking, and deployment 🚧 Planned
System Design Scaling, reliability, and data quality 🚧 Planned
Development Local setup and contribution guide ✅ Current

Development Phases

Phase 1 ── MVP & Data Acquisition          ← Weeks 1–3
Phase 2 ── ETL Pipeline & Database         ← Weeks 4–7
Phase 3 ── RAG & Backend API               ← Weeks 8–11
Phase 4 ── Full-Stack Dashboard & Auth     ← Weeks 12–16
Phase 5 ── Agentic Workflows               ← Weeks 17–19
Phase 6 ── ML Models & Production CI/CD   ← Weeks 21–24

Tech Stack at a Glance

  • Next.js — React framework for the web dashboard
  • Zustand — Lightweight client-side state management
  • TanStack Query — Server state and data fetching
  • Recharts / Chart.js — Financial data visualizations
  • FastAPI — High-performance Python API framework
  • PostgreSQL + pgvector — Relational store + vector embeddings
  • Elasticsearch — Hybrid keyword + semantic search
  • Redis — Caching and rate-limiting
  • Playwright / Selenium — Dynamic web scraping
  • PyMuPDF — PDF parsing and table extraction
  • Airflow / Prefect — Pipeline orchestration
  • Docker — Containerized pipeline execution
  • RAG Pipelines — Retrieval-Augmented Generation over financial docs
  • LangGraph — Multi-agent orchestration and routing
  • MCP Tools — Model Context Protocol for agent tool-use
  • Langfuse — AI observability, cost, and latency tracking
  • MLflow — Experiment tracking and model registry
  • PyTorch / XGBoost — Model training
  • GitHub Actions — CI/CD pipeline
  • Docker — Reproducible build and deployment