Intelligent Search on Documents in a Data Lake for Investment Research Using Databricks and AWS Native Tools
Improve investment research by unifying technical, fundamental, and alternative datasets (structured, semi-structured, and unstructured) to generate ideas for new investment opportunities or to detect portfolio risks.
Develop an end-to-end solution that can ingest, process, and store large volumes of unstructured SEC filings data in a combined data and document lake, and make that data available to end users through a fast, intelligent search interface.
Developed the entire solution from scratch using AWS native tools and Databricks. It can ingest, process, and store large volumes of unstructured documents such as regulatory filings, emails, and PDFs, and it makes them available to business users through an intelligent search interface.
Started with an MVP-based approach, building the solution on SEC filings data in the pilot phase. The pilot will support:
Faceted and full-text search
NER-based (named entity recognition) search and semantic search
The solution is highly scalable and serves search results with low latency (under 1 second), even across 10M+ documents.
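
To illustrate what faceted and full-text search means in the pilot scope, here is a minimal, in-memory sketch in Python. It is not the solution's implementation: the source does not name the search engine used, and a production system at 10M+ documents would rely on a dedicated search service rather than Python dictionaries. The sample records, the field names (`form_type`, `company`), and the functions `tokenize`, `build_index`, and `search` are all hypothetical and exist only for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample records; field names and contents are illustrative only.
DOCS = [
    {"id": 1, "form_type": "10-K", "company": "Acme Corp",
     "text": "Annual report discussing liquidity risk and revenue growth."},
    {"id": 2, "form_type": "10-Q", "company": "Acme Corp",
     "text": "Quarterly report noting supply chain risk."},
    {"id": 3, "form_type": "10-K", "company": "Globex Inc",
     "text": "Annual report highlighting revenue growth in new markets."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index mapping each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc in docs:
        for term in tokenize(doc["text"]):
            index[term].add(doc["id"])
    return index


def search(docs, index, query, filters=None, facet_fields=("form_type", "company")):
    """Return ids of documents matching all query terms and all filters, plus facet counts."""
    filters = filters or {}
    terms = tokenize(query)
    if terms:
        # Full-text step: a document must contain every query term (AND semantics).
        matching = set.intersection(*(index.get(t, set()) for t in terms))
    else:
        matching = set()
    # Facet filter step: keep only hits whose metadata equals the selected values.
    hits = [
        d for d in docs
        if d["id"] in matching
        and all(d.get(field) == value for field, value in filters.items())
    ]
    # Facet counts over the filtered hits, used to render refinement options.
    facets = {f: dict(Counter(d[f] for d in hits)) for f in facet_fields}
    return [d["id"] for d in hits], facets


INDEX = build_index(DOCS)
ids, facets = search(DOCS, INDEX, "revenue growth", filters={"form_type": "10-K"})
print(ids, facets)
```

The full-text step narrows the candidate set through the inverted index, and the facet step then filters on structured metadata and counts the remaining values per field. Those counts are what a search interface would display as clickable refinements next to the results.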