A large public utility wanted to create a data mart populated with data from their customer master file, which contained information on every account and activity in the region. It was a massive, complex file that had evolved over decades to meet the needs of the corporation. The file was so large that, if flattened, it would have occupied terabytes.
This file was their primary legacy system. In the past, access to it had been provided by special-purpose extraction routines, a costly and inflexible process. In addition, the data and reports produced by the extraction routines did not reconcile well with the production reporting processes. As a result, the uses to which the data could be applied were very limited.
A number of large ETL (Extract, Transform, Load) vendors submitted proposals, but each involved a similarly complex PL/1 program. Faced with this, the corporation considered other options.
Arbutus Analyzer was selected by the utility because it could directly read their complex master file, and at a fraction of the cost of other proposals.
Working with Arbutus technology specialists, the team began the Discovery Phase, defining and profiling the data. This step turned out to be the largest and most significant, as they soon discovered that the file was rife with undocumented transaction types and unforeseen exceptions.
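The kind of profiling described above can be illustrated with a minimal sketch. It is written in plain Python, not in Arbutus Analyzer, and every name in it (the `DOCUMENTED_TYPES` set, the `type` field, the sample records) is a hypothetical stand-in rather than anything from the utility's actual file. It simply counts how often each transaction type occurs and separates out the ones that the documentation does not mention.

```python
from collections import Counter

# Hypothetical set of transaction types listed in the file's documentation.
DOCUMENTED_TYPES = {"BILL", "PAYMENT", "ADJUST"}


def profile_transaction_types(records):
    """Count each transaction type and pick out the undocumented ones."""
    counts = Counter(rec["type"] for rec in records)
    undocumented = {t: n for t, n in counts.items() if t not in DOCUMENTED_TYPES}
    return counts, undocumented


# Invented sample records, for illustration only.
sample = [
    {"type": "BILL"},
    {"type": "PAYMENT"},
    {"type": "X9"},
    {"type": "BILL"},
    {"type": "X9"},
    {"type": "ZZ"},
]

counts, undocumented = profile_transaction_types(sample)
for tx_type, n in sorted(undocumented.items()):
    print(f"undocumented type {tx_type}: {n} record(s)")
```

Listing undocumented types by frequency gives a team a natural order in which to investigate them.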
Nevertheless, a small team tackled the project by using Arbutus Analyzer to create virtual columns and data models that mirrored what they expected in the file. Then they iteratively addressed the largest differences, refining their understanding of the actual business processes and reducing the differences with their production reports. In the end, they were able to reconcile the major systems to within 1%. This was by no means ideal, but was still an order of magnitude better than their previous best efforts.
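The iterative reconciliation loop can be sketched roughly as follows. This is an illustration in plain Python under stated assumptions, not the team's actual method: the category names and totals are invented, and the source does not say how the "within 1%" figure was measured, so the overall metric used here (sum of absolute differences divided by the production total) is only one plausible choice.

```python
def reconcile(model_totals, report_totals):
    """Compare modeled totals with production report totals.

    Returns the per-category rows sorted by largest absolute difference,
    plus an overall relative difference (an assumed metric).
    """
    rows = []
    for key in sorted(set(model_totals) | set(report_totals)):
        model = model_totals.get(key, 0.0)
        report = report_totals.get(key, 0.0)
        rows.append((key, model, report, model - report))

    # Largest discrepancies first, so they can be investigated first.
    rows.sort(key=lambda row: abs(row[3]), reverse=True)

    report_sum = sum(report_totals.values())
    total_abs_diff = sum(abs(row[3]) for row in rows)
    overall = total_abs_diff / abs(report_sum) if report_sum else float("inf")
    return rows, overall


# Invented figures: totals derived from the data model vs. production reports.
model = {"residential": 990.0, "commercial": 500.0, "industrial": 260.0}
report = {"residential": 1000.0, "commercial": 500.0,
          "industrial": 250.0, "municipal": 250.0}

rows, overall = reconcile(model, report)
for key, m, r, diff in rows:
    print(f"{key:12s} model={m:8.1f} report={r:8.1f} diff={diff:+8.1f}")
print(f"overall relative difference: {overall:.1%}")
```

In this invented sample the largest gap is a category missing entirely from the model, which is the sort of finding that would send a team back to refine its understanding of the file before rerunning the comparison.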
Armed with a more complete understanding of their data, they were able to quickly create the appropriate transformations to match the model of the data mart. The entire process took three months, virtually all of which was spent in the discovery phase. When the project was complete, they declared it their first successful data warehousing project.