Compliance and Governance
The model development team at a top 10 bank was responsible for forecasting on a $100 billion loan portfolio, but development efforts were often stalled by data complexity and created compliance issues along the way.
We implemented a data platform that provides authoritative, high-quality data, streamlining the model development process and ensuring the bank stays compliant and well-managed.
A consumer credit team at a top 10 bank was responsible for maintaining the accuracy of annualized loss predictions for a loan portfolio exceeding $100 billion. But the cycle time to continually rebuild and deploy new loan-level models exceeded a year and required more than 20 analysts. The primary causes were scattered data, poor data quality and difficulty tracing data to its source. Additionally, new privacy laws required all customer data to be maintained in production-controlled systems with close monitoring of every data element used.
Streamline the loss forecasting model development process while ensuring that data processes and platforms are compliant and well-managed.
We began by conducting an inventory to understand data elements and sources for every model. To ensure reliability and compliance with standards, we cataloged metadata, data quality rules and lineage to source for all critical data elements.
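A catalog entry of this kind ties together metadata, quality rules and lineage for each critical data element. The sketch below is a minimal illustration of that idea; the field names, source system and rules are assumptions for the example, not the bank's actual schema.

```python
# Illustrative catalog entry for one critical data element.
# Field and source names are hypothetical.
catalog_entry = {
    "element": "outstanding_principal_balance",
    "source_system": "loan_servicing_db",  # hypothetical source name
    "metadata": {"type": "decimal(18,2)", "pii": False},
    "quality_rules": [
        {"rule": "not_null"},
        {"rule": "non_negative"},
    ],
    # Lineage traces the element back to its origin.
    "lineage": [
        "loan_servicing_db.balances",
        "staging.loans",
        "warehouse.loan_facts",
    ],
}

def passes_rules(value, entry):
    """Apply the entry's quality rules to a single value."""
    for rule in entry["quality_rules"]:
        if rule["rule"] == "not_null" and value is None:
            return False
        if rule["rule"] == "non_negative" and value is not None and value < 0:
            return False
    return True

print(passes_rules(1250.75, catalog_entry))  # True
print(passes_rules(-5.0, catalog_entry))     # False
```

Centralizing rules alongside metadata lets the same definitions drive both documentation and automated checks.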
In parallel, we assessed data processes and platforms relative to the enterprise data and modeling infrastructure standards. We collaborated with the loss forecasting team and their technology partners to collect data directly from the disparate sources and consolidate everything into a cloud-based data lake. Data was organized into optimized columnar file formats for faster read/write times, and every element was monitored for quality. Datasets were then merged into analytical warehouses to provide account-level data to analysts in a navigable way.
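The benefit of a columnar layout is that analytical queries read only the fields they need rather than scanning whole records. The pure-Python sketch below illustrates the row-to-column pivot that formats such as Parquet or ORC perform; the account fields are made up for the example.

```python
# Row-oriented records, as they might arrive from source systems.
rows = [
    {"account_id": 1, "balance": 1200.0, "days_past_due": 0},
    {"account_id": 2, "balance": 560.5, "days_past_due": 30},
    {"account_id": 3, "balance": 980.0, "days_past_due": 0},
]

# Pivot to a columnar layout: one contiguous list per field.
# Columnar file formats store data this way on disk.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate query (e.g., average balance) now touches a single
# column instead of every full record.
avg_balance = sum(columns["balance"]) / len(columns["balance"])
print(round(avg_balance, 2))  # 913.5
```

In practice the same pivot is handled by the file format itself, so analysts query columns without reorganizing anything by hand.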
We created a dynamic, distributed computing pipeline to move massive volumes of data, validate quality in flight and optimize each data element for use in its respective model(s). Dynamic cluster sizing allowed for maximum performance while minimizing costs. In total, the system became the authoritative source for retrieving, validating, consolidating and transforming all data elements needed for critical loss forecasting processes. We built a new capability that would allow users to quickly integrate new data sources into the platform. We also enabled “model-ready” views of data to keep model development teams focused on using the data, not pulling and preparing it.