Irrespective of long-term, heavy investments in data management, data problems at many companies continue to grow. One reason is that data has traditionally been perceived as just one aspect of a technology project; it has not been treated as a corporate asset. Consequently, the belief was that traditional database and application planning efforts were sufficient to address ongoing data problems.
As our corporate data stores have grown in both size and subject area diversity, it has become very clear that a strategy to address data is required. Yet some organizations still struggle with the idea that corporate data needs a comprehensive strategy.
There’s no shortage of sky-high thinking when it comes to companies’ strategic road maps and plans. For many companies, such efforts are just a novelty. Indeed, a company’s strategic plans usually produce few tangible results, just lots of documentation and meetings. A successful plan, on the other hand, identifies realistic goals along with a road map that offers clear instructions on the best way to get the job done.
Let’s consider an example to see how this played out in real life at one company that set out to develop a data strategy.
Data Strategy: What Problem Does It Solve?
Let’s consider an example of a consulting firm helping a large bank create a data strategy. From the beginning, the project champion found it hard to get his VP to understand the importance of a data strategy and the need for one. Why?
The bank was already successful. Its costs and revenue were well managed, and the individual technology groups and business units were good at delivering against their commitments. To the bank’s credit, it wasn’t complacent. Management was always looking for new ways to reduce ongoing costs and increase staff productivity.
There were all kinds of key performance indicators (KPIs) and metrics to measure IT performance, total cost of ownership, and company benefits. The idea of creating yet another road map to address an issue that wasn’t well understood was met with pushback.
With the bank doing many things right, the VP needed to better understand how and why a data strategy would make a difference. To address these questions, it’s critical to compare how data was used and created in the past with how it’s used and created today.
Data: Past and Present
Previously, data was seen as a byproduct of a business process or activity, and it had very little value once the process was finished. While there may have been a few other applications that needed to access the content for follow-up (special reports, customer service, audits, etc.), these were mostly one-off activities.
Now, the business is very different. The value of data is widely accepted; the results of analytics and reporting have made data the secret ingredient of several new business initiatives.
While the value of data has grown tremendously during the last 20 years, and business leaders recognize it, few organizations have adjusted their strategies for capturing, sharing, and managing corporate data assets. Their behavior still reflects an underlying, outdated belief that data is simply an application byproduct.
Companies need to develop data strategies that match today’s realities. To develop such a comprehensive data strategy, they need to account for current technology and business commitments while also addressing new goals and objectives.
The Business without a Data Strategy
Going back to the bank example, the executive’s concerns were not hard to understand. He spent much of his time going through project proposals that his devoted staff felt strongly about. In many instances, his team’s proposals were about delivering perfection: turning something that already worked into something stronger, faster, or better. Meanwhile, the executive understood the world of finite resources and budgets, where any newly approved project would ultimately take resources and funding away from another request. His mantra was well known.
The problem was not related to the value or premise of any individual project. The issue was the approach that each individual activity and project took. Each activity solved its data needs independently of the others, without any awareness of the overlapping costs and efforts.
There was no data reuse, no data sharing, or any economies-of-scale activities to simplify or decrease the cost of data development and movement.
Most of the projects required access to the same data content. Unfortunately, there was no coordination to avoid overlapping, wasted work.
Users found inconsistencies across reports because the source data wasn’t documented and varied from one report to the next.
Business users accessed common data across separate applications, but data names and value formats differed between those applications.
The result was processing overlaps, duplicate data, and little awareness that individual projects were replicating work. Nothing was in place to support collaborating, communicating, or sharing data practices and methods across systems and projects.
The problem: Every project at the bank dealt with data issues as a one-off, built-from-scratch activity.
The 5 Components of a Data Strategy
Historically, IT organizations have defined a data strategy with a focus on storage. They’ve created comprehensive plans for sizing and managing their platforms, and they’ve developed sophisticated approaches for handling data retention. While this is definitely important, it only addresses the tactical aspects of content storage; it’s not planning for how to improve all of the ways you acquire, store, manage, share, and use data.
Data storage must be addressed by a data strategy, but it must also take into account the way data is accessed, identified, understood, shared, and used. In order to be successful, an effective data strategy has to include each of the different disciplines inside data management.
A data strategy has five core components that work together as building blocks to comprehensively support data management across an organization: identify, store, provision, process, and govern.
Identify data and understand its meaning regardless of origin, structure, or location.
One of the basic constructs for sharing and using data inside a company is establishing a way to represent and identify the content. Whether the content is structured or unstructured, processing and manipulating data isn’t possible unless the data has a name, a defined format, and a value representation; even unstructured data has these details. Establishing consistent data element naming and value conventions is core to sharing and using data. These details should be independent of how the data is stored (in a database, a file, etc.) or the physical system where it resides.
It’s also critical to have a means of accessing and referencing the metadata associated with your data, such as its origin, definition, domain values, and location.
Much as an accurate card catalog supports an individual’s success in using a library to retrieve a book, successful data usage depends on the existence of metadata to help retrieve specific data elements. Consolidating business meaning and terminology into a business data glossary is a common means of addressing part of the challenge.
Libraries have card catalogs because it isn’t practical to remember the exact location of every book. Metadata is important for business data usage because it isn’t possible to know the meaning and location of all of the organization’s business data: thousands of data elements across numerous data sources. Without data identification details, you would be forced to undertake an analysis effort and a data inventory every time you wanted to include new data in your analysis or processing activities.
Without metadata and a data glossary, organizations are likely to neglect some of their most prized data assets simply because they don’t know those assets exist. If data truly is a corporate asset, a data strategy has to ensure that all of the data can be identified.
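To make the card-catalog analogy concrete, here is a minimal sketch (in Python; the `GlossaryEntry` and `DataGlossary` names, fields, and sample data are all hypothetical) of a business data glossary that records a term’s meaning, expected format, and physical locations independently of any one application:

```python
# Hypothetical sketch of a business data glossary. Each entry documents a
# business term's meaning, value format, and every physical place it lives,
# so users can find data without knowing each system's internals.
from dataclasses import dataclass, field

@dataclass
class GlossaryEntry:
    term: str                 # business name, e.g. "customer_id"
    definition: str           # agreed business meaning
    value_format: str         # expected format of the values
    locations: list = field(default_factory=list)  # system.table.column homes

class DataGlossary:
    def __init__(self):
        self._entries = {}

    def register(self, entry: GlossaryEntry):
        self._entries[entry.term] = entry

    def lookup(self, term: str) -> GlossaryEntry:
        # Like a card catalog: retrieve the element's meaning and locations
        # without having to rediscover them through a new analysis effort.
        if term not in self._entries:
            raise KeyError(f"'{term}' is not in the glossary -- undocumented data")
        return self._entries[term]

glossary = DataGlossary()
glossary.register(GlossaryEntry(
    term="customer_id",
    definition="Unique identifier assigned to a retail banking customer",
    value_format="string, 10 digits, zero-padded",
    locations=["crm.customers.cust_id", "loans.accounts.customer_no"],
))

# One business term, two physical homes -- the glossary links them.
entry = glossary.lookup("customer_id")
print(entry.locations)
```

The design point is that the glossary entry, not the source application, becomes the authoritative record of what the element means and where it resides.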
Persist data in a location and structure that supports shared, easy access and processing.
Data storage, although a complex discipline, is one of the most basic capabilities in an organization’s technology portfolio.
Most IT organizations have mature methods for identifying and managing the storage needs of individual application systems; each system gets sufficient storage to support its own storage and processing requirements.
Whether dealing with analytical systems, transactional processing applications, or even general-purpose data storage (email, files, pictures, etc.), most companies use sophisticated methods to plan capacity and allocate storage to the various systems. Unfortunately, this approach only reflects a “data creation” perspective. It does not encompass data usage and sharing.
The gap in this approach is that there’s rarely a plan for efficiently managing the storage needed to move and share data between systems. The reason is quite simple: the most visible data sharing in the IT landscape is transactional in nature. Transactional details are shared and moved between applications to complete a specific business process. Bulk data sharing isn’t well understood and is usually perceived as an infrequent or one-off occurrence.
With the rising popularity of big data and the growth of information and business analytics sharing between organizations, it’s now much more common to share bulk or large volumes of data. Most of this shared content falls into two categories: externally created content (cloud applications, third-party data, syndicated content, etc.) and internally created data (customer details, purchase details, etc.). The lack of a centrally managed data sharing process usually forces each system to manage this space individually, so everyone builds their own copy of the source.
As companies have evolved and their data assets have grown, it has become clear that storing all data in a single location isn’t feasible. It’s not that we can’t develop a system large enough to hold the content. The issue is that the distributed nature and size of our companies, and the diversity of our data sources, make loading data into a single platform impractical. Not everyone needs access to all of the company’s data; they need access to specific data to support their individual requirements.
The key here is to ensure there are practical ways of storing all the data that is created in a way that enables it to be easily shared and accessed. You don’t have to store all the data in one place; you need to store the data once and then provide a method for users to find and access it.
Because data, once created, will be shared with many other systems, it’s important to address storage in a more efficient way that simplifies access. A good data strategy will ensure that any data created is available for future access without everyone creating their own copies.
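As a hypothetical sketch of the “store once, share many times” idea, the registry below (the `SharedDataStore` class, method names, and dataset names are invented for illustration) keeps a single authoritative copy of each published data set and hands consumers a reference to it, rather than letting every downstream system build its own extract:

```python
# Hypothetical sketch: one authoritative copy per data set; consumers get
# references to the shared copy instead of building private extracts.
class SharedDataStore:
    def __init__(self):
        self._datasets = {}   # name -> the single stored copy
        self._consumers = {}  # name -> set of systems reading it

    def publish(self, name, data):
        if name in self._datasets:
            raise ValueError(f"'{name}' is already published; update it, don't duplicate it")
        self._datasets[name] = data

    def access(self, name, consumer):
        # Record who depends on the data, then return the shared copy.
        self._consumers.setdefault(name, set()).add(consumer)
        return self._datasets[name]

    def consumers_of(self, name):
        return sorted(self._consumers.get(name, set()))

store = SharedDataStore()
store.publish("daily_transactions", [{"acct": "001", "amount": 250.0}])

# Three downstream systems read the same stored copy instead of
# each building and maintaining its own extract.
for system in ("fraud_detection", "reporting", "marketing"):
    transactions = store.access("daily_transactions", consumer=system)

print(store.consumers_of("daily_transactions"))
```

A side benefit of recording consumers is visibility: the store knows exactly which systems depend on a data set before it changes.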
Package data so it can be shared and reused, and provide access guidelines and rules for the data.
During the early days of IT, the majority of the application systems were created as independent, individual data processing engines that included all of the data required to perform their defined duties. There was very little or no thought given to sharing data across applications. Data was stored and organized for the convenience of the application that created, collected and stored the content.
The majority of application systems were not designed to share data. The rules and logic required to decode the data for use by others were almost never documented, or even known, outside of the application development team. Meanwhile, most IT organizations don’t allocate staff or budget to address non-transactional data sharing. Instead, it’s handled as a convenience or courtesy, usually addressed as a personal favor between staff members.
When data is shared, it’s generally packaged at the convenience of the application developer, not the data user. An approach like this may have been acceptable in the past, when only a couple of teams and just a few systems needed access. But it’s totally impractical now, when IT manages dozens of systems and depends on data from several sources to support individual business processes. Packaging and sharing data at the convenience of a single source developer, instead of the individuals managing the 10 downstream systems that need the data, makes no sense. And expecting individuals to understand the idiosyncrasies of dozens of source application systems just so they can leverage data is a huge waste of time.
Data sharing is no longer a specialized technical capability to be addressed by the programmers and architects of the application. It has become a production business need. Companies are highly dependent on data being distributed and shared to support both analytical and operational needs. Sharing data can’t be managed as a courtesy, and the method of sharing and packaging data can’t be treated as a one-off need.
If an organization’s data really is a corporate asset, then all of the data must be prepared and packaged for sharing. To treat data as a valuable asset instead of a burden of doing business, a data strategy has to address data provisioning as a standard business process.
Combine and move data residing in disparate systems, and provide a unified, consistent data view.
Data produced by applications is a treasure trove of knowledge, but at the time of its creation, data is a raw commodity. It hasn’t been corrected, transformed, or prepared to make it “ready to use.” This data strategy component addresses the activities needed to evolve data from a raw ingredient into a finished good.
Source system data is very much like a raw ingredient in a manufacturing process. At most organizations, data comes from both internal and external sources: internal data is sourced from dozens of application systems, while external data may be delivered from a range of different providers. Although that data is usually rich with information, it wasn’t packaged to be integrated with the unique set of sources that exist inside each individual organization. Making data ready to use requires a series of steps to correct, transform, and format it. The outcome of this process is a small series of homogeneous data sets that a data user can integrate or merge with data preparation tasks specific to their individual requirements.
It is quite common for organizations to establish a centralized team to address data standardization, cleansing, integration, and transformation for the data warehouse. Unfortunately, many companies have learned that this kind of processing isn’t unique to a data warehouse. Most data users require ready-to-use data, so these users end up taking on the development effort themselves. Developing logic to match records and decode identifiers across individual sources can be very complex, especially when some systems need data from 20 or more sources.
Developers invest a huge amount of time creating logic to link and match values across a multitude of sources. Unfortunately, as each new development team needs access to individual data sources, it reinvents or reconstructs the logic required to link values across the same data sources. The tragedy of data integration is that this rework occurs with each new project, as the learning of the past is never captured for reuse.
While most companies have initiatives to address code collaboration and reuse for application development, they have not focused that effort on delivering data that is ready to use and that promotes reuse and sharing. It’s impractical for data users to become developers. Making data ready to use is about establishing processes and providing tools to generate data that individuals can easily use, without any involvement from the IT department.
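The reuse problem described above can be illustrated with a small sketch: if cleansing and matching rules are captured once as shared functions (the `standardize_phone` and `match_key` names and the sample records are hypothetical), every project links records the same way instead of rebuilding the logic from scratch:

```python
# Hypothetical sketch: shared standardization and matching logic, written once
# and reused by every project that needs to link records across sources.
import re

def standardize_phone(raw: str) -> str:
    """Reduce any source system's phone format to a bare 10-digit string."""
    digits = re.sub(r"\D", "", raw)           # strip punctuation and spaces
    return digits[-10:] if len(digits) >= 10 else digits

def match_key(record: dict) -> tuple:
    """Build a comparable key so records from different systems can be linked."""
    return (record["name"].strip().upper(), standardize_phone(record["phone"]))

# The same customer, as represented by two different source applications.
crm_record  = {"name": "jane smith ", "phone": "(555) 010-2000"}
loan_record = {"name": "JANE SMITH",  "phone": "1-555-010-2000"}

# Both records reduce to the same key, so they link -- using logic that
# was written once rather than reinvented by each project team.
print(match_key(crm_record) == match_key(loan_record))  # True
```

The point is not the specific rules but where they live: in a shared library that captures past learning, rather than in each project’s private codebase.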
Manage, establish, and communicate information mechanisms and policies for effective data usage.
Because data is still seen as a byproduct of application processing, few companies have fully developed the processes and methods to effectively manage data outside the context of an application and across the entire enterprise. While many companies have started to invest in data governance initiatives, many are still in the infancy of those efforts.
Most data governance initiatives begin by addressing tactical issues such as terminology standards, business rule definition, and data accuracy, and they are confined to specific departments or project efforts. As governance grows, and as data usage and sharing issues gain visibility, governance initiatives usually widen in scope. As they widen, companies may establish a set of data rules, policies, and methods to ensure uniform data manipulation, usage, and management.
Data governance is usually seen as a rigor specific to the analytics and end-user environment. In fact, data governance applies to all systems, applications, and staff members. The top challenge with data governance is adoption, as it is an overarching set of data rules and policies that everyone must respect and follow.
The reason for creating a strong governance process is to ensure that once data is decoupled from the application that created it, the details and rules of the data are known and respected by all other data constituents. The role that governance plays within an overall data strategy is to ensure that data is managed consistently across the entire organization.
Effective data governance ensures that data is consistently manipulated, managed, and accessed, whether that means determining data correction logic, security details, and data naming standards or establishing new data rules. Decisions about how data is manipulated, processed, or shared aren’t made by an individual developer; they’re established by the policies and rules of data governance.
It shouldn’t come as a surprise that a data strategy has to include data governance. It’s impractical to establish a road map and plan to address all the ways you capture, store, manage, and use information without an integrated governance effort. Data governance provides the required rigor over the data content as changes take place in the processing, technology, and methodology areas associated with the data strategy effort.
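As an illustrative sketch (the policy table, field names, and masking rule are all invented), governance rules can be expressed as shared policies that every system consults, so decisions about masking and naming standards are made by policy rather than by each individual developer:

```python
# Hypothetical sketch: governance expressed as a shared policy table that all
# systems consult before handling a field, instead of per-developer decisions.
POLICIES = {
    "ssn":   {"masked": True,  "standard_name": "ssn"},        # security rule
    "email": {"masked": False, "standard_name": "email_addr"}, # naming standard
}

def apply_governance(field_name: str, value: str) -> str:
    policy = POLICIES.get(field_name)
    if policy is None:
        # Ungoverned data is rejected, not silently passed through.
        raise ValueError(f"no governance policy defined for '{field_name}'")
    if policy["masked"]:
        return "***-**-" + value[-4:]   # mask all but the last four characters
    return value

print(apply_governance("ssn", "123-45-6789"))   # masked per policy
print(apply_governance("email", "a@bank.com"))  # allowed through unchanged
```

Centralizing the rules this way means a policy change (say, a new masking requirement) takes effect everywhere at once, which is exactly the consistency a governance program exists to provide.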
The Power of a Data Strategy
The strength of the data strategy components is that they help you identify tangible, focused goals within each of the individual discipline areas. Every company has a unique set of skills and a different combination of strengths and weaknesses. Moving forward with a data strategy begins with identifying the strengths and weaknesses that exist within your data environment, and identifying a measurable, achievable combination of goals that will improve data sharing and access. The purpose of the components isn’t to identify every potential activity within a data strategy; the components provide visibility into the different disciplines that contribute to a data strategy.
A data strategy initiative isn’t a once-and-done effort; by its nature, a strategy is a long-term combination of goals. It’s quite common to identify a multiyear set of goals along with a shorter-term set of delivery milestones. This enables the strategy to undergo measurement and review on an ongoing basis, avoiding the types of challenges the bank executive raised. The components offer a way of categorizing activities and identifying shorter-term deliverables.
A data strategy also offers visibility into the relationships the components have with one another. If you don’t coordinate their activities, you risk delivering a series of point solutions that can’t work together.
The idea behind a data strategy isn’t to create a perfect world that can effectively address any unforeseen data need. Instead, the true power of a data strategy is that it positions you to deliver the best possible solution as your company’s needs grow and evolve. When gaps become visible and new requirements arise, the component framework provides a way to identify the changes needed across your company’s various data management capability and technology areas. Your data strategy is a road map and a means for addressing both existing and future data management needs.