Only 48% of Data Professionals Trust the Accuracy of Their Data

Even though I’ve been a data management professional for 30 years, this story in Analytics Magazine was an eye-opener. Don’t get me wrong. I get calls all the time from people who don’t trust the data they’re working with. But 48% is pretty stark when one considers the large-scale implications of using information you can’t trust to make important decisions.

48% should be a number that humbles us all. For decades, those of us in the information management world have repeated “garbage in, garbage out.” But, truth be told, we can’t simply blame shoddy data input for a lack of accurate information. User input is definitely one factor, but there’s more to the problem.

We data management experts don’t get phone calls about dubious data because users don’t care. We get phone calls because there are structural problems with information management processes. What are those underlying structural problems? The story in Analytics Magazine touched on them in this excerpt (italics mine):

Disjointed, inaccessible data is a major productivity inhibitor for analysts, diverting skilled resources from contributing to valuable business intelligence.

Nearly two in five (38.7 percent) data professionals are spending more than half of their work week on tasks unrelated to actual analysis: 43.8 percent of managers reported that 51 percent or more of their team’s work week is spent collecting, integrating and preparing data rather than analyzing it, while 31.3 percent of analysts said they spend 21 or more hours a week on data housekeeping.

Many data professionals struggle with data access. Forty-three [percent] of respondents named access as one of their top two analytics challenges. Nearly three in five respondents (56.9 percent) said it takes days or weeks to access all the data they need, and nearly 10 percent (9.8 percent) say they can rarely or never access a complete range of data sources. Only a third of data professionals (33.4 percent) are immediately able to access all their data or can get it in less than a day.

These statistics point to problems at a foundational level. Users can input the most accurate information available, and the structural issues described above would still make it impossible to fully use and analyze the information. Following are some of the foundational problems I run into as a data management consultant:

Misuse of and overreliance on spreadsheets for storing (rather than analyzing) data

This is the most common issue I get called in to resolve. Excel is designed as a data analysis tool and serves exceptionally well in that niche. But Excel is not designed for efficient data storage.

Users may innocently start a spreadsheet to store some unique information that doesn’t fit into an organization’s enterprise software. However, this specialized data is also important to the organization: it is related to information in other data management systems, and it is essential to decision-making processes. If the data weren’t important, no one would ever have begun collecting it.

In time, the original spreadsheet morphs into a workbook with even more related information. As reporting periods and years go by, the spreadsheets are regularly used as templates and copied and pasted into new reporting periods. Formulas may not copy accurately. Links may get lost in translation. Data sets are separated by reporting period, making it extremely difficult to analyze data across periods. The same data is spelled differently from one data set to the next, so finding common records is all but impossible.
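To make the cross-period problem concrete, here is a minimal Python sketch, assuming each period’s sheet has already been exported as simple (customer, amount) pairs. The sheet contents, the ALIASES table, and the functions normalize, consolidate, and totals_by_customer are hypothetical illustrations, not a specific product. The idea is to stack every period into one table and clean up inconsistent spellings so the same customer can be totaled across periods.

```python
from collections import defaultdict

# Hypothetical per-period sheets, as they might look after being copied forward
# from a template each quarter: (customer name, amount) pairs.
sheets = {
    "2017-Q4": [("Acme Corp", 120), ("Globex", 80)],
    "2018-Q1": [("ACME Corp.", 135), ("Globex ", 95)],
    "2018-Q2": [("acme corp", 150), ("Globex Inc", 70)],
}

# Hypothetical alias table mapping cleaned-up variants to one canonical name.
ALIASES = {"globex inc": "globex"}


def normalize(name: str) -> str:
    """Trim whitespace, lower-case, drop trailing periods, then apply aliases."""
    key = name.strip().lower().rstrip(".")
    return ALIASES.get(key, key)


def consolidate(sheets: dict) -> list:
    """Stack every period into one table of (period, customer, amount) rows."""
    rows = []
    for period, records in sheets.items():
        for name, amount in records:
            rows.append((period, normalize(name), amount))
    return rows


def totals_by_customer(rows: list) -> dict:
    """Sum amounts per customer across all periods."""
    totals = defaultdict(int)
    for _period, customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


rows = consolidate(sheets)
print(totals_by_customer(rows))  # {'acme corp': 405, 'globex': 245}
```

Here three spellings of Acme collapse to a single key, which makes a cross-period total possible at all. An alias table like this needs ongoing maintenance, which is one more argument for keeping the data in one place from the start.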

These are just some of the problems that crop up with an over-reliance on spreadsheet applications for data management tasks. Following are some more detailed articles on this issue:

Too many data storage silos or containers

This is a very real problem, and it can include spreadsheets. But most often, the organizations I’m called into have at least one enterprise-level software application, if not more. In addition, there are several smaller departmental applications and, of course, the accompanying spreadsheets used to manage data that doesn’t fit into any of the canned software.

The consequence of multiple data storage silos is cumbersome analysis. Somehow the information from the various silos has to be integrated and prepped for analysis. This can be a very time-consuming job, and it contributes to doubts about the accuracy of the final analysis.
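As a rough illustration of that integration step, the Python sketch below joins extracts from two silos on a shared customer ID and reports records that exist in only one of them. The extracts (erp, dept_sheet), the shared key, and the integrate function are hypothetical; real integration tools layer type checking, deduplication, and scheduling on top of this basic idea.

```python
# Hypothetical extracts from two silos, keyed by a shared customer ID.
erp = {
    "C001": {"name": "Acme", "region": "East"},
    "C002": {"name": "Globex", "region": "West"},
}
dept_sheet = {
    "C001": {"score": 8},
    "C003": {"score": 6},
}


def integrate(left: dict, right: dict):
    """Full outer join of two keyed extracts.

    Returns the merged rows plus the keys found on only one side, so
    mismatches are reported instead of silently dropped. If both silos
    share a field name, the second silo's value wins (a simplifying assumption).
    """
    merged = {}
    for key in sorted(left.keys() | right.keys()):
        row = {}
        row.update(left.get(key, {}))   # fields from the first silo, if present
        row.update(right.get(key, {}))  # fields from the second silo, if present
        merged[key] = row
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return merged, only_left, only_right


merged, only_erp, only_dept = integrate(erp, dept_sheet)
print(merged["C001"])       # {'name': 'Acme', 'region': 'East', 'score': 8}
print(only_erp, only_dept)  # ['C002'] ['C003']
```

Because the join is a reusable function rather than a one-off copy and paste, the same step can be rerun each reporting period, and the unmatched lists point straight at the records that need attention.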

The problem of multiple data silos will never go away completely. Organizations will always have information that doesn’t fit easily into canned software systems. But it is possible to minimize the problem. Organizations do have the power to intentionally manage the evolution of their data management needs. The purpose of a data evolution project is to reduce the number of data storage silos.

Even after your organization intentionally evolves multiple data silos into a common core database, there will still be additional silos that cannot be merged into it. This is where intentional use of data integration techniques can help. It is not necessary to reinvent the integration process every reporting period; there are tools available to manage integration projects efficiently. Following are some articles I’ve written on data integration:

Sharing data in the office

Sharing (or not sharing) information can be a major area of contention in any office. In small offices the issue of sharing data may not come up that much. But the larger the organization, the more legitimate the concerns become. How and why data is shared can directly affect your ability to fully analyze information for reporting and decision-making purposes. This is a major issue and should not be ignored when considering how to improve data integrity in your office. You can read more about sharing data here.

When only 48% of data professionals trust the integrity of the information they’re working with, we really do need to take notice. The issues I listed above are not comprehensive, but they are the ones I run into on a regular basis. When evaluating your own data management process, the first place to look is the number of spreadsheets stored on your system. The second place to look is how many data storage silos your organization is either maintaining or interacting with. Finally, take a serious look at how your organization manages information sharing. Honestly assessing and responding to these issues will go a long way toward improving data integrity in your office.
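For the first of those checks, a rough inventory can be scripted. The Python sketch below walks a folder tree and counts spreadsheet-like files per folder, assuming the files live on a shared drive you can read. The extension list, the function name count_spreadsheets, and the example path are assumptions for illustration.

```python
from collections import Counter
from pathlib import Path

# Extensions counted as spreadsheet files (an assumption; adjust as needed).
SPREADSHEET_EXTS = {".xls", ".xlsx", ".xlsm", ".csv"}


def count_spreadsheets(root: str) -> Counter:
    """Count spreadsheet-like files under root, grouped by folder."""
    base = Path(root)
    counts = Counter()
    for path in base.rglob("*"):
        if path.is_file() and path.suffix.lower() in SPREADSHEET_EXTS:
            # Key each file by its folder relative to the root ("." = root itself).
            counts[str(path.parent.relative_to(base))] += 1
    return counts


# Example (hypothetical path): list the ten folders holding the most spreadsheets.
# for folder, n in count_spreadsheets("/shared/drive").most_common(10):
#     print(f"{n:5d}  {folder}")
```

A file count is only a starting point: it shows where spreadsheets pile up, not which ones have quietly become databases.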

Do you have questions about your own data management project? Contact Michelle.

1 Comment

Gary Cokins

8/6/2018 10:11:10 am

Michelle … Even if the source transactional data has high integrity and quality, another reason most managers mistrust the data from their accountants is this: the cost allocations.

Most accountants are using cost accounting practices from the 1960s. They continue to “misallocate” indirect expenses as reasonably accurate costs using a single “cost allocation factor” that violates costing’s cause-and-effect principle. They spread those costs like butter across bread, using factors such as the number of direct labor hours or dollars, the number of units produced, the number of employees, or sales volume. None of those factors reflects the proportionate consumption of the expenses. As a result, although the costs reconcile in total, the individual costed products are simultaneously over-costed and under-costed. Hence the reported profit margins are flawed and misleading. Activity-based costing (ABC) resolves this problem. … Gary … Gary Cokins


Michelle Meyer
