When talking to analytics managers, I often tease, “I wonder which of the two types of documentation you have (after a dramatic pause): wrong or nonexistent.” They laugh, but rarely disagree. Look at the reward for those tasked with keeping the documentation accurate and the hard truth emerges: “not much.” Colleagues treat the documentation as directional at best, since they assume some updates were missed along the way. There are even those people who keep meticulously organized garage pegboards. Unfortunately for them, they work with others who never start the search for a screwdriver at the toolbox. Even those with the best of intentions usually stop trying before long.
There was a time, about a decade ago, when a lot of IT shops were happy enough that someone knew the data and processes well enough to answer most questions accurately. It was usually the person with the printed data model in 6-point font acting as wallpaper for their cube. But as data volumes and the number of sources grow, so does transformation complexity. Some companies have similar data (customers, patients, etc.) coming from multiple sources and define “win rules” to determine which elements from each feed to trust above the others. Add staff turnover to the need to understand an increasingly complex end-to-end data architecture, and the challenges quickly outgrow one person’s ability to explain much of anything with reasonable precision.
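To make the “win rules” idea concrete, here is a minimal sketch in Python. The feed names, field names, and the `merge_records` helper are all hypothetical, chosen only for illustration; real rules are usually richer (recency, completeness, validation) and live inside ETL tooling rather than a script.

```python
# Hypothetical win rules: for each field, feeds listed from most to least trusted.
WIN_RULES = {
    "email": ["crm", "billing", "support"],
    "address": ["billing", "crm", "support"],
}


def merge_records(records_by_feed, win_rules):
    """Pick each field's value from the most trusted feed that supplies one.

    Returns {field: (value, winning_feed)} so the origin of every value
    stays visible after the merge.
    """
    merged = {}
    for field, feeds in win_rules.items():
        for feed in feeds:
            value = records_by_feed.get(feed, {}).get(field)
            if value:  # skip feeds where the field is missing or empty
                merged[field] = (value, feed)
                break
    return merged


# Illustrative records for one customer, as delivered by two feeds.
records = {
    "crm": {"email": "pat@example.com", "address": ""},
    "billing": {"email": "old@example.com", "address": "12 Main St"},
}

golden = merge_records(records, WIN_RULES)
for field, (value, feed) in golden.items():
    print(f"{field}: {value} (from {feed})")
```

Even in this toy version, the answer to “where did this value come from?” depends on the rule order and on which feeds happened to supply data, which is exactly the kind of detail that is hard for one person to keep in their head.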
Not all parts of a complete documentation set are equally important. While a glossary is cool to look at, the data lineage is what gets used day-to-day. Forward lineage (where does this field end up?) and backward lineage (where did this value come from?) are both needed to quickly answer customer questions, accurately estimate projects, and support active development.
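One way to picture forward and backward lineage is as two walks over the same directed graph. The sketch below assumes lineage has already been captured as simple (source, target) pairs; the object names and the `build_graph` and `trace` helpers are hypothetical and do not represent any particular tool's API.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source, target) means data flows source -> target.
EDGES = [
    ("crm.customer", "staging.customer"),
    ("billing.customer", "staging.customer"),
    ("staging.customer", "dw.dim_customer"),
    ("dw.dim_customer", "report.churn"),
    ("dw.dim_customer", "report.revenue"),
]


def build_graph(edges):
    """Index the edges in both directions."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return downstream, upstream


def trace(start, neighbors):
    """Breadth-first walk returning every object reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream, upstream = build_graph(EDGES)

# Forward lineage: everything that depends on the CRM customer table.
forward = trace("crm.customer", downstream)

# Backward lineage: everything the churn report is built from.
backward = trace("report.churn", upstream)

print("forward:", sorted(forward))
print("backward:", sorted(backward))
```

The hard part in real environments is not the traversal but producing trustworthy edges in the first place, since they are buried in stored procedures, ETL jobs, and report definitions.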
The good news is that software companies have long understood that lineage has great value and that people are not the best option for keeping it accurate. “Document as you go” automation tools have been around for years and do a good job on new projects as long as the entire development team is using them. I saw the value firsthand several years ago after building a large healthcare data warehouse using a self-documenting tool. Just before going into production, I got a call from a VP who needed to assure a concerned customer that a particular field, which the customer believed contained incorrect data, was not being used in our reporting. In minutes, using track-forward reporting, I was able to confidently confirm the new reports did not use the field. When I called the developers of the legacy system, I was told it would take days to look through stored procedures, ETL jobs, and other SQL scripts to determine the answer.
Accurate lineage is also becoming more of a requirement than a nice-to-have for companies that need regulatory certification. Some combination of GDPR, CCPA, HIPAA, and BCBS-239 is frequently part of our IT lives now.
Companies are realizing that moving to cloud databases, such as Snowflake, can be tricky without understanding what logic lives in the on-premises database and ETL code. Priority #1 for IT leaders when planning migrations is managing risk. A full understanding of data lineage should lead to better scoping decisions and risk mitigation. On the cost side, data lineage can help identify unused objects that would otherwise be migrated and add to storage needs in the cloud.
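As a rough sketch of the cost-side idea, an object can be flagged as a migration candidate to skip when nothing that people actually consume is built from it. The inventory, sizes, edges, and the `used_objects` helper below are invented for illustration, and “unused” here means only “not upstream of any listed consumer”, which is an assumption worth validating with the business before anything is dropped.

```python
# Hypothetical inventory of on-premises objects and their sizes in GB.
SIZES_GB = {
    "dw.orders": 120,
    "dw.orders_bak_2019": 115,
    "dw.customers": 40,
    "tmp.scratch": 8,
}

# Hypothetical lineage edges: (source, target) means data flows source -> target.
EDGES = [
    ("dw.orders", "report.sales"),
    ("dw.customers", "report.sales"),
    ("dw.orders", "dw.orders_bak_2019"),
]

# Assumption: these are the only outputs anyone still consumes.
CONSUMERS = {"report.sales"}


def used_objects(edges, consumers):
    """Return every object that sits upstream of at least one consumer."""
    upstream = {}
    for source, target in edges:
        upstream.setdefault(target, set()).add(source)

    used = set()
    stack = list(consumers)
    while stack:
        node = stack.pop()
        for parent in upstream.get(node, ()):
            if parent not in used:
                used.add(parent)
                stack.append(parent)
    return used


used = used_objects(EDGES, CONSUMERS)
unused = sorted(name for name in SIZES_GB if name not in used)
unused_gb = sum(SIZES_GB[name] for name in unused)

print("unused objects:", unused)
print("storage that may not need to move (GB):", unused_gb)
```

In this made-up inventory the backup table and the scratch table feed nothing downstream, so they are the ones to question before paying to store them in the cloud.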
The even better news is that current and actionable data lineage is just minutes away with Clean Data’s new software solution, Manta. Using the industry’s most complete coverage of development and reporting tools, Manta will document your data pipeline and let you visualize it like never before. A perfect time to try Manta is before a major change, such as a migration or a product consolidation after an acquisition. But you will want to keep using Manta so your documentation stays up to date and ready to support developers and end users. To support broader data governance in your organization, Manta will integrate with data governance and catalog systems.
Hopefully, in all technology careers there are a few wow moments. I can remember the first time I used an MPP database and saw billions of data rows being queried in seconds. Manta might be one of those wow moments for you, and just maybe, for your team and company.
Send us an email at info@cleandatainc.com. We’d love to show you the power of Manta and explore how it might work in your environment.