Building a data consolidation platform for a leading multifamily real estate investment manager

Overview

The client sells investment strategies and advice through multiple TAMP (turnkey asset management platform) systems. In the absence of any in-house system, AUM calculations and sales commissions were driven by reports and data supplied by these TAMP platforms. The data had inherent problems: quality issues, redundancy, non-conformity to a common structure and, at times, missing records. This resulted in considerable manual work and rework, frequently incorrect commission calculations, under- or over-reported AUM, and incorrect performance measurement of strategies.

Coforge proposed a solution: develop an in-house system to facilitate the following -

  • A single place to view all Sales/AUM data
  • Resolution of all data quality issues before the actual data load, or in subsequent stages
  • Automated reconciliation of data received from multiple systems
  • Automated commission calculation
  • Automatic production of all existing manual Sales/AUM reports once the data is cleansed and transformed
  • A user interface to manage reference data, handle workflow events, apply adjustments that were not originally scoped in, and review the data before it is reported to senior stakeholders
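The automated reconciliation listed above can be sketched in a few lines. The following is a minimal, illustrative Java example (the platform's middle tier is Java-based), assuming account-level AUM figures from two feeds and an absolute tolerance; `TampRecord`, `Break` and the field names are invented for illustration, not the client's actual schema.

```java
import java.math.BigDecimal;
import java.util.*;

public class Reconciler {
    // One account's AUM as reported by a single TAMP feed
    public record TampRecord(String accountId, BigDecimal aum) {}

    // A reconciliation break: the two sides disagree beyond tolerance, or one side is missing
    public record Break(String accountId, BigDecimal left, BigDecimal right) {}

    public static List<Break> reconcile(List<TampRecord> a, List<TampRecord> b, BigDecimal tolerance) {
        Map<String, BigDecimal> byA = new HashMap<>();
        a.forEach(r -> byA.put(r.accountId(), r.aum()));
        Map<String, BigDecimal> byB = new HashMap<>();
        b.forEach(r -> byB.put(r.accountId(), r.aum()));

        // Walk the union of account ids in a stable order
        SortedSet<String> ids = new TreeSet<>(byA.keySet());
        ids.addAll(byB.keySet());

        List<Break> breaks = new ArrayList<>();
        for (String id : ids) {
            BigDecimal x = byA.get(id), y = byB.get(id);
            if (x != null && y != null && x.subtract(y).abs().compareTo(tolerance) <= 0) {
                continue; // matched within tolerance
            }
            breaks.add(new Break(id, x, y));
        }
        return breaks;
    }
}
```

In the real system the breaks would be raised as workflow exceptions rather than returned to the caller.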

Data Consolidation Platform

The Sales/AUM data is sourced mostly from multiple TAMP systems, while Salesforce is the master reference data source for advisors, investment consultants and other parties.

The data first needed to be quality checked, then conformed to a common structure for a consolidated AUM view at the account and platform levels. The reference data needed to be refreshed in real time, with users able to manage it, override any mappings, and run calculations each month to prepare commission payout reports. The daily Sales report had to reach end users automatically through Tableau web.
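The monthly commission run mentioned above can be illustrated with a hedged sketch, assuming commission is computed as AUM times a per-strategy basis-point rate. The strategy names, rates and the `Position` shape are invented for illustration; the client's actual payout rules are not described in this document.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommissionRun {
    // One advisor's AUM in one strategy for the month
    public record Position(String advisor, String strategy, BigDecimal aum) {}

    // Total payout per advisor: sum of AUM * (bps / 10,000) across positions
    public static Map<String, BigDecimal> payouts(List<Position> positions,
                                                  Map<String, Integer> bpsByStrategy) {
        Map<String, BigDecimal> result = new HashMap<>();
        for (Position p : positions) {
            int bps = bpsByStrategy.getOrDefault(p.strategy(), 0);
            BigDecimal commission = p.aum()
                    .multiply(BigDecimal.valueOf(bps))
                    .divide(BigDecimal.valueOf(10_000));
            result.merge(p.advisor(), commission, BigDecimal::add);
        }
        return result;
    }
}
```

For example, an advisor with $1,000,000 in a 50 bps strategy and $500,000 in a 20 bps strategy would earn $5,000 + $1,000 = $6,000 for the month.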

Approach

Ingestion Layer: In this stage, the data in each feed file is checked for data quality, conformance to structure (using the JSON config files) and tolerance.
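A config-driven quality check of this kind can be sketched as follows. In the real system the per-feed rules live in JSON config files; here the config is inlined as a map, and the column names and rules are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FeedValidator {
    // required: column must be present; pattern: regex the value must match
    public record ColumnRule(boolean required, String pattern) {}

    // Illustrative stand-in for a per-feed JSON config file
    public static final Map<String, ColumnRule> FEED_CONFIG = Map.of(
            "account_id", new ColumnRule(true, "[A-Z]\\d{4}"),
            "aum", new ColumnRule(true, "\\d+(\\.\\d+)?"));

    // Returns one message per failed check; an empty list means the row is clean
    public static List<String> validateRow(Map<String, String> row, Map<String, ColumnRule> config) {
        List<String> errors = new ArrayList<>();
        config.forEach((col, rule) -> {
            String value = row.get(col);
            if (value == null) {
                if (rule.required()) errors.add(col + ": missing");
            } else if (!value.matches(rule.pattern())) {
                errors.add(col + ": '" + value + "' fails pattern " + rule.pattern());
            }
        });
        return errors;
    }
}
```

Because the rules are data rather than code, a new feed can be on-boarded by adding a config file instead of changing the ingestion job.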

Exception/Workflow: Any exceptions raised during ingestion and transformation are captured and raised in the Activiti workflow for end users to take appropriate action.
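The capture side of this can be modelled generically. In the actual solution an exception would start an Activiti workflow task; the sketch below substitutes a plain in-memory queue for the engine, and all type and field names are invented.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

public class ExceptionChannel {
    // What failed, at which stage, and why
    public record DataException(String feed, String stage, String message) {}

    private final Queue<DataException> pending = new ArrayDeque<>();

    // In production this would start a workflow process instance for the end user
    public void raise(DataException e) {
        pending.add(e);
    }

    // Next exception awaiting user action, if any
    public Optional<DataException> next() {
        return Optional.ofNullable(pending.poll());
    }
}
```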

Transformation Layer: In this layer the data is transformed: it is cleansed, mapped to the respective reference data, consolidated and finally made available in S3 for production use.
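The reference-data mapping step, including the user overrides mentioned earlier, can be sketched as a simple two-level lookup: a feed-level advisor code is resolved against Salesforce master data, with any user-entered override taking precedence. The names below are illustrative, not the client's actual schema.

```java
import java.util.Map;
import java.util.Optional;

public class ReferenceMapper {
    // Overrides win over the Salesforce master mapping; empty means unmapped
    public static Optional<String> resolveAdvisor(String feedCode,
                                                  Map<String, String> salesforceMaster,
                                                  Map<String, String> userOverrides) {
        if (userOverrides.containsKey(feedCode)) {
            return Optional.of(userOverrides.get(feedCode)); // user override wins
        }
        return Optional.ofNullable(salesforceMaster.get(feedCode));
    }
}
```

An unmapped code (empty result) is exactly the kind of condition the exception/workflow stage would surface to a user.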

Reporting Layer: This layer, built in Tableau, uses Athena to query the S3 data and present the daily Sales view and the consolidated AUM view.


The solution's architecture diagram shows how different open-source components are integrated to build the system. The web UI was built in-house using Angular, with the middle tier in Spark Java. Spark-Scala is used extensively in the ingestion layer, while the rest of the transformation is done using Pentaho. Reports are served using Tableau Server and Tableau web. The infrastructure (memory management, adding new nodes and monitoring) can be managed through Ambari views. Overall, the system is easy to manage, easy to extend with new functions, and scales to handle large data volumes.

Technology Used

Big Data – Spark Scala, HDFS, HIVE

ETL - Pentaho

Workflow - Activiti

Data Visualization - Tableau Server and Web

AWS – S3, EC2, ECS, Athena, CI/CD

Benefits

Future-ready goal achieved - The client now has a completely in-house system that caters to all requirements for data quality, data consolidation, and automated sales/commission and AUM calculation.

Configurable - The system is fully configurable: new feeds can be on-boarded, quality checks added and ETL changes made with minimal or no development effort.

Low Cost – Given the complexity and data volumes involved, the whole project was developed at a fraction of the cost footprint of comparable enterprise solutions.

Open Source & Scalable Solution - The whole project was developed using open-source systems, all offered on the AWS cloud, and is scalable to cater to any data volume.