Predictive analytics, data science, artificial intelligence, bots. The waves of advances in the application of data keep on coming. You can’t read the pages of the mainstream or business media without being impressed by the opportunity. Yet, although the power of analytics is common currency, it’s spoken of far more often than it’s practiced. The biggest obstacle to using advanced data analysis isn’t skill base or technology; it’s plain old access to the data.
Every CIO I meet tells me that they are excited at the potential of analytics for their business. With one caveat — they can’t get their hands on the data in the first place. Embracing data as a competitive advantage is a necessity for today’s business, so why is it so hard to get access to the data we need?
There is a cost to using data. Behind the glamor of powerful analytical insights is a backlog of tedious data preparation. Since the popular emergence of data science as a field, its practitioners have asserted that 80% of the work involved is acquiring and preparing data. Despite efforts among software vendors to create self-service tools for data preparation, this proportion of work is likely to stay the same for the foreseeable future, for a couple of reasons.
First, you can’t cleanly separate the data from its intended use. Depending on your desired application, you need to format, filter, and manipulate the data accordingly. Every new problem has its unique aspects that usually reach back into data acquisition and preparation. Second, data confers insight and advantage. Once you have harvested the low-hanging fruit (the easy-to-prepare data), you’re falling behind if you’re not looking for the next level of insight. So you must pursue the data that is harder to find and use, which drives up the amount of time spent in preparation.
But there is a bigger and costlier demon that lurks in enterprises, one that can drive up that 80% and often makes initiatives impossible: data silos. These silos are isolated islands of data, and they make it prohibitively costly to extract data and put it to other uses. They can arise for multiple reasons.
Structural. Software applications are written at one point in time, for a particular group in the company. In a world of limited resources, applications are optimized for their main function. The incentives of individual teams are unlikely to encourage data sharing as a primary requirement. This focus on function, for instance, may result in recent sales being stored in different systems from historical sales, thus presenting an immediate barrier to boosting sales through personal product recommendation.
Political. Knowledge is power, and groups within an organization become suspicious of others wanting to use their data. And often with some justification, as the scope for misuse, even accidental, is broad. Data isn’t a neutral entity — you must interpret it with knowledge of its history and context. This sense of proprietorship can act against the interests of the organization as a whole.
Growth. Any long-lived company has grown through multiple generations of leaders, philosophies, and acquisitions, resulting in multiple incompatible systems. Even if there are no political issues in integrating data, it is costly to reconcile and integrate sets of data that embody different approaches to important business concepts.
Vendor lock-in. Software vendors are among the first to know that access to data is power, and their strategies can frustrate the desire of users to export the data contained in applications. This is particularly dangerous with software-as-a-service applications, where the vendor wants to keep you within their cloud platform. Vendors have also worked hard to create entire job functions and career paths centered around their software. Any hint of a move away from that world could threaten the livelihood of a trained and certified software professional.
Using data costs money. To move to the higher value uses and maintain a competitive edge, we need to lessen the impact of data silos on our businesses. To remove the barriers of silos, a progressive, pragmatic approach is most effective. The end goal of embracing advanced data analytics is to make a company data-driven — that is, to benefit from data in a consistent, organization-wide manner. Unfortunately, few have the luxury of building a suitable infrastructure from scratch, so companies must figure out a way to get there in an incremental way.
Don’t be dazzled by the draw of another favorite industry buzzword, the “data lake.” Things aren’t as beautifully simple as the image of clear water and mountain springs might conjure. We can’t just pour all our data into one system, expecting goodness to result. Your business is unique, and you can’t buy unique advantage off the shelf. Care, planning, and investment are required. Otherwise, you’re certain to end up with a data swamp, seething with liability, confusion, and rotting bits.
Instead, look to identify high-value opportunities. Analyze your business needs, and choose a problem where data could provide a tangible benefit, perhaps in enhancing sales or preemptive incident response. Draw in the data from around the organization and invest in these use cases first. This is not a proof of concept — you should do these earlier as a way of identifying opportunities — but a banner project that can drive subsequent investments. Tie the integration to its application, so you get value early.
Then, move with the goal of integration in mind. Each progressive step should also build toward an integrated platform for your enterprise data. You don’t want to create a whole new set of silos, albeit with advanced capabilities.
In order to do this, you’ll need support from the highest level. The cross-organizational nature of integrating data means that unless you are working with the support of executive leadership, and leaders across business and IT, you will be frustrated. As you progress in using data in operational and strategic applications, organizational changes will be inevitable.
In today’s digitized economy, the ability to use data represents a real and essential competitive advantage. To get to a future state of mature analytical competency, there’s real work to be done in integrating the data you have already. This is a strategic goal for the entire company and, when addressed properly, will lead you to develop experience and a data infrastructure that unlocks every next step.
Of course, if it were easy, it wouldn’t be important. Just as 80% of the work in any data analysis is data preparation, expect 80% of the work in becoming data-driven to be integrating your data and making it available to meet the needs of your company as a whole.