Creating a centralized data repository for a leading US asset manager
The client grew its AUM from $3.5bn to $10bn over two years. There is currently no centralized data repository, and each department has rolled out its own solution. As might be expected, this leads to data trust issues and manual report collation.
A Hadoop-based data lake with:
Controlled data ingress through an ETL layer
Data quality checks with resolution workflow
Flexible, dynamic schema that can evolve through the solution lifetime
Reporting solution with canned reports as well as self-service capability
The technology stack: Hortonworks HDP 2.6 big data stack; Pentaho for ETL; Activiti workflow engine; Tableau reporting suite
Incoming data feed definitions are entirely configuration-based, using an Apache Atlas metadata store
ETL layer is powered by Pentaho
Data moves from raw to staging to production, with extensive quality checks to ensure veracity
Exception reporting and resolution (whether quality-related or otherwise). The workflow service runs standalone, so it can also serve other enterprise workflow needs
A reporting data mart to power the Tableau reporting suite
Self-service reporting is available through standard Tableau tools
Data governance (QA, Security and Lineage) built into the design
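The configuration-driven feed definitions, quality checks and exception routing described above can be illustrated with a minimal Python sketch. It is not the client's implementation: the `FeedDefinition` class, the `positions` feed, its field names and the sample records are all hypothetical, and in the actual solution the definitions would be held in the metadata store and the exceptions handed to the workflow engine.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]


@dataclass
class FeedDefinition:
    """Hypothetical feed definition; stands in for an entry in the metadata store."""
    name: str
    required_fields: list[str]
    # Per-field quality checks: field name -> predicate returning True when valid.
    checks: dict[str, Callable[[object], bool]] = field(default_factory=dict)


def validate(feed: FeedDefinition, record: Record) -> list[str]:
    """Return a list of quality problems for one record (empty if it is clean)."""
    problems = []
    for name in feed.required_fields:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")
    for name, check in feed.checks.items():
        value = record.get(name)
        if value not in (None, "") and not check(value):
            problems.append(f"invalid {name}")
    return problems


def promote(feed: FeedDefinition, raw_records: list[Record]):
    """Split raw records into those promoted to staging and exceptions to resolve."""
    staged, exceptions = [], []
    for record in raw_records:
        problems = validate(feed, record)
        if problems:
            # In the real solution this would open a case in the workflow service.
            exceptions.append({"feed": feed.name, "record": record, "problems": problems})
        else:
            staged.append(record)
    return staged, exceptions


# Illustrative feed definition and sample data (assumptions, not from the project).
positions = FeedDefinition(
    name="positions",
    required_fields=["account_id", "isin", "market_value"],
    checks={"market_value": lambda v: isinstance(v, (int, float)) and v >= 0},
)

raw = [
    {"account_id": "A1", "isin": "US0378331005", "market_value": 1250.0},
    {"account_id": "A2", "isin": "", "market_value": 300.0},
    {"account_id": "A3", "isin": "US5949181045", "market_value": -5},
]

staged, exceptions = promote(positions, raw)
for exc in exceptions:
    print(exc["record"]["account_id"], exc["problems"])
```

Because the checks are data attached to the feed definition rather than code in the pipeline, a new feed or a changed rule only requires a configuration change, which is the property the configuration-based design aims for.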
Low-cost solution: traditional warehouse/BI technology stacks carry high licensing, installation and maintenance costs. The ODA solution delivers on all the architectural needs of a modern data management solution while using open-source technologies entirely.
Opens the door to new-generation capabilities such as machine learning and data science.
Multi-account infrastructure using AWS Landing Zone to reduce the time needed to set up secure and scalable workloads, while implementing an initial security baseline through the creation of core accounts and resources.
Infrastructure-as-Code (IaC) through automation templates based on Terraform/CloudFormation to minimize the time required to roll out the basic infrastructure for application environments within individual accounts, supporting the end-to-end application lifecycle.
Configuration management using AWS Config to ensure configuration meets standard baselines and to facilitate rapid assessment of multi-account IT infrastructure components. This further simplifies compliance auditing, security analysis and change management.
Organization-wide centralized monitoring and compliance using CloudWatch and CloudTrail integrated with Splunk.
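As an illustration of the baseline assessment that AWS Config supports, the sketch below lists the Config rules currently reported as non-compliant. It is an assumption about how such a check might be scripted, not part of the delivered solution. The function expects a client exposing boto3's `describe_compliance_by_config_rule` call; the `StubConfigClient` class and the rule names in it are stand-ins so the example runs without AWS credentials.

```python
def non_compliant_rules(config_client) -> list[str]:
    """Return the names of AWS Config rules whose compliance is NON_COMPLIANT."""
    names = []
    kwargs = {}
    while True:
        page = config_client.describe_compliance_by_config_rule(**kwargs)
        for item in page.get("ComplianceByConfigRules", []):
            if item.get("Compliance", {}).get("ComplianceType") == "NON_COMPLIANT":
                names.append(item["ConfigRuleName"])
        token = page.get("NextToken")
        if not token:
            return names
        # Follow pagination until the service stops returning a token.
        kwargs = {"NextToken": token}


class StubConfigClient:
    """Offline stand-in that mimics two pages of a Config response (illustrative data)."""

    _pages = {
        None: {
            "ComplianceByConfigRules": [
                {"ConfigRuleName": "example-encryption-rule",
                 "Compliance": {"ComplianceType": "COMPLIANT"}},
                {"ConfigRuleName": "example-public-access-rule",
                 "Compliance": {"ComplianceType": "NON_COMPLIANT"}},
            ],
            "NextToken": "page-2",
        },
        "page-2": {
            "ComplianceByConfigRules": [
                {"ConfigRuleName": "example-tagging-rule",
                 "Compliance": {"ComplianceType": "NON_COMPLIANT"}},
            ],
        },
    }

    def describe_compliance_by_config_rule(self, NextToken=None):
        return self._pages[NextToken]


failing = non_compliant_rules(StubConfigClient())
print(failing)
```

Against a real account, the same function could be passed `boto3.client("config")`. In a multi-account setup it would need to be run per account or pointed at an aggregated view, which this sketch does not cover.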