Intelligent Search on documents in Data Lake for Investment Research using Databricks and AWS native tools

Objective

  • Improve investment research by unifying technical, fundamental and alternative datasets (structured, semi-structured and unstructured) to generate new investment ideas and detect portfolio risks.
  • Develop an end-to-end solution that can ingest, process and store large volumes of unstructured SEC filings data in a combined data and document lake, and make it available to end users through a fast, intelligent search interface.

Our Solution

  • Developed the entire solution from scratch using AWS native tools and Databricks. It can ingest, process and store large volumes of unstructured documents such as regulatory filings, emails and PDFs, and make them available to business users through an intelligent search interface.
  • Started with an MVP-based approach, building the solution on SEC filings data in the pilot phase. The pilot will support:
      • Faceted and full-text search
      • NER-based (named entity recognition) search and semantic search
  • The solution is highly scalable and serves search results with low latency (under 1 second), even on 10M+ documents.
  • Technologies used: AWS (S3, Lambda), Airflow, Databricks, Elasticsearch, Python, FastAPI and Svelte.
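
To illustrate how faceted and full-text search can be combined in one Elasticsearch request, here is a minimal sketch in Python. It only builds the request body; it is not the solution's actual code. The field names (`content`, `form_type`, `company`) and the function name `build_filing_search` are hypothetical, and the sketch assumes `form_type` and `company` are mapped as keyword fields.

```python
from typing import Any, Dict, Optional


def build_filing_search(text: str,
                        form_type: Optional[str] = None,
                        size: int = 10) -> Dict[str, Any]:
    """Build an Elasticsearch request body: full-text match plus facet counts.

    All field names here are illustrative assumptions, not the real index schema.
    """
    # Full-text clause, scored by relevance against the hypothetical 'content' field.
    bool_query: Dict[str, Any] = {"must": [{"match": {"content": text}}]}
    if form_type is not None:
        # A selected facet goes in 'filter': it narrows results without changing scores.
        bool_query["filter"] = [{"term": {"form_type": form_type}}]
    return {
        "size": size,
        "query": {"bool": bool_query},
        # Terms aggregations return per-value document counts for the facet panel.
        "aggs": {
            "by_form_type": {"terms": {"field": "form_type"}},
            "by_company": {"terms": {"field": "company"}},
        },
    }


body = build_filing_search("supply chain risk", form_type="10-K")
```

The resulting dictionary could be sent as the body of a search request to an index of filings. A FastAPI endpoint could build it from the user's query string and selected facets.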

Impact

  • Significantly improved agility and accuracy in investment research.