Data Quality is for Everyone!
Chapter 1: What is this “data stuff” everyone is talking about?
If the pandemic has taught us anything, it is that data touches both our personal and professional lives in many ways every day. Yet everything about “data” seems confusing, locked away, and the province of a new elite. It is easy to get scared and turn away.
Not so fast!!!
In my opinion everyone, and I do mean everyone, is affected by data in dozens of ways! And everyone can help make things better, empowering themselves and others in their personal lives and making themselves more valuable at work. Even better, almost everyone (I am sorry to say not literally everyone) reports that doing the work is fun.
This little book shows you how in ten short chapters. I can’t promise you’ll find all ten helpful, nor that you won’t have to do a little work. But I’m pretty sure that your efforts will be rewarded!
Now let’s get started.
The first “chapter” is titled “What is this ‘data stuff’ everyone is talking about?” because data is a mystery to so many people. Exacerbating this, you’re deluged with expressions like “alternative facts” and “fake news.” Even if you’ve worked with data for a while, you may not have thought much about terms such as “data” and “digital transformation.” So what are the key terms and what do they mean? And how do they relate to one another?
Let’s focus first on data.
As everyone knows, the world is a high-speed and confusing place. People have to simplify, or model, the world so they can make sense of it all. Let’s take you and your employer as an example. How does your employer model you? The basic idea is to focus on the characteristics about you that your employer finds most useful: stuff like your NAME, your DEPARTMENT, your AGE, your SALARY, and whom you report to (REPORTS TO). We call characteristics such as SALARY “attributes,” and those such as REPORTS TO “relationships.”
Others have an interest in you as well. So, while your employer is interested in you as an EMPLOYEE, a tax agency is interested in you as a TAXPAYER and your doctor is interested in you as a PATIENT. Obviously, you are the same person, so each needs to know your NAME. But each has specialized interests and so needs different attributes and relationships. For example, your doctor cares about your BLOOD PRESSURE and the tax agency about your INTEREST INCOME.
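For readers who like to see an idea written down concretely, here is a minimal Python sketch of the three models just described. It is only an illustration: the attribute names are the ones mentioned in the text, the lists are deliberately incomplete, and the variable names (`EMPLOYEE`, `TAXPAYER`, `PATIENT`, `shared`, `doctor_only`) are assumptions made for this example.

```python
# Each model is simply the set of characteristics one organization cares about.
# The names come from the text above; real models would hold many more.
EMPLOYEE = {"NAME", "DEPARTMENT", "AGE", "SALARY", "REPORTS TO"}
TAXPAYER = {"NAME", "INTEREST INCOME"}
PATIENT = {"NAME", "BLOOD PRESSURE"}

# Characteristics that all three organizations need to know about you.
shared = EMPLOYEE & TAXPAYER & PATIENT

# Characteristics that appear only in your doctor's model.
doctor_only = PATIENT - EMPLOYEE - TAXPAYER

print("Shared by all three models:", sorted(shared))
print("Only in the doctor's model:", sorted(doctor_only))
```

You are one person, yet each organization holds a different model of you. Only NAME appears in all three.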
Of course, a model alone is not enough. Organizations have to populate their models with your details. Your employer may have your DEPARTMENT as Finance and your AGE as 37 years. Here Finance and 37 years are called “data values.”
Thus a datum (the singular of data) consists of a data model and a data value. If you work with spreadsheets, you’re used to thinking of attributes and relationships as the columns, each employee as a row, and the contents of individual cells as data values. Data models and data values work hand in hand: the data model provides the structure and the data values the content. To do anything with data, you need to understand what the data really mean (in other words, the structure), and you have to be able to trust that the data values are correct.
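The spreadsheet picture above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed design: the `Datum` class, its field names, the `row_to_data` helper, and the employee name “Pat Example” are all made up for this example, while Finance and 37 are the data values used in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One datum: a piece of the data model plus one data value."""
    entity: str     # whom the datum describes (the spreadsheet row)
    attribute: str  # the model part (the spreadsheet column)
    value: object   # the data value (the contents of the cell)


def row_to_data(entity: str, row: dict) -> list:
    """Split one spreadsheet row into its individual data."""
    return [Datum(entity, attribute, value) for attribute, value in row.items()]


# One row of a hypothetical employee spreadsheet: column name -> cell content.
row = {"NAME": "Pat Example", "DEPARTMENT": "Finance", "AGE": 37}

data = row_to_data("Pat Example", row)
for datum in data:
    print(f"{datum.attribute} = {datum.value}")
```

Notice that the value 37 means nothing until it is paired with the attribute AGE. That pairing is what makes it a datum.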
A couple of final points: Notice that I didn’t say “data are facts.” That’s because “facts” can be verified; they have to be true. Not so with data, which can, of course, be wrong.
HERE’S WHY THIS MATTERS TO YOU. Every day, people say things like, “I don’t trust the data,” “that is fake news,” or “here are the alternative facts.” What do they really mean? Are they talking about different data, that is, data with a different structure? Or are they saying the data values are wrong? Sorting this out can give you a real leg up! You can be the lone voice of reason in some tough situations by understanding data in the way I described here!
Next up: Data Quality Defined!!