Challenges in data-driven organizational transformation
According to a 2023 survey by Wavestone's NewVantage Partners, relayed by Harvard Business Review, 80% of data executives and business leaders cite cultural impediments – people, business processes, organizational alignment – as the primary barrier to data-driven organizational transformation.
Complexity and polarization in data-driven organizations
Without organizational alignment, it is hard to implement process changes, mobilize people, and make them receptive to the changes required to establish a data culture. One particularly difficult alignment to achieve is between the business and IT sides of an organization. Bill Inmon has called this issue "the great divorce between IT and Business"; its consequences are too important to ignore, and he offers some tactical actions to address it.
Before we can propose our own solution for this situation, it is important to identify some of the critical root causes.
Increased complexity in technology stacks
The architecture of monolithic applications used to be fairly simple, with 3 tiers:
- a presentation tier (the user interface);
- an application tier handling the business logic;
- a data tier storing and retrieving the application's data.
The evolution towards modern event-driven architecture patterns, microservices, serverless computing, container orchestration, and cloud architectures represents a shift towards more modular, scalable, and flexible systems that can adapt to the dynamic demands of today's technology landscape. This transformation has brought about a wealth of new possibilities and capabilities facilitated by new tools, technologies, and practices to help organizations design and manage increasingly complex applications.
However, it also introduces new challenges related to monitoring, security, and data consistency, which require careful consideration in the design and operation of these systems.
Polyglot persistence is a natural response to the diversity of data types and access patterns in modern applications. It empowers organizations to select the right database for the right job, improving performance, scalability, and data modeling choices. However, it also introduces complexity in terms of data integration, synchronization, and management, which should be carefully addressed in the application architecture to ensure data consistency and overall system reliability.
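To make the synchronization cost of polyglot persistence concrete, here is a minimal, purely illustrative Python sketch. The stores, table, and field names are hypothetical: SQLite stands in for a relational database optimized for joins and reporting, and a plain dictionary stands in for a document/key-value store optimized for single-key reads.

```python
import json
import sqlite3

# Relational store: good for transactional queries, joins, and reporting.
rel = sqlite3.connect(":memory:")
rel.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, total REAL)")

# Document store stand-in: good for fast lookups of a whole denormalized record.
doc_store = {}

def save_order(order):
    # Write 1: normalized relational row
    rel.execute("INSERT INTO orders VALUES (?, ?, ?)",
                (order["id"], order["customer"], order["total"]))
    # Write 2: denormalized JSON document keyed by order id
    doc_store[order["id"]] = json.dumps(order)
    # The dual write is the hidden cost: both copies must be kept in sync,
    # and any partial failure leaves the two stores contradicting each other.

save_order({"id": "o-1", "customer": "acme", "total": 99.5})
```

Each store serves its access pattern well in isolation; the integration and consistency burden the text describes lives entirely in the application code that must keep them aligned.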
It used to be that a single team could determine the business needs, then design the whole application, build it, then run the infrastructure. The complexity today is such that collaboration between complementary teams is the only way to run complex projects on state-of-the-art technology.
The venture capitalist Marc Andreessen wrote in the Wall Street Journal that "software is eating the world", highlighting the growing influence and ubiquity of software across industries. Encouraged also by quotes like Mark Zuckerberg's "move fast and break things", as well as by an erroneous interpretation of the principles of the Agile Manifesto, many developers have adopted a code-first approach. That approach carries inherent risks, including poor quality and substantial rework, when it turns out that a little forethought would have helped tremendously.
It used to be that data modelers ensured well-designed structures from the start of software projects, thanks to their unique ability to understand the business and translate business requirements into quality data structures. Earlier this century, data modeling started to fall out of fashion, for several reasons. Some Enterprise Data Modeling initiatives went rogue and got in the way of getting things done. Some agile developers assumed that they knew enough about the business needs to do their own data modeling, or simply started to code first, thinking that data was self-describing or that schema-on-read was sufficiently self-explanatory. NoSQL vendors pushed the concepts of schemaless databases and schema-on-read, giving a false sense that data interpretation was intuitively obvious. And some business executives preferred quick-and-dirty approaches to future-proof solutions.
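The schema-on-read trap can be shown in a few lines. In this hypothetical sketch (producers, field names, and values are all made up), two producers write to the same "schemaless" collection with different implicit conventions, and a consumer applying its own schema-on-read assumptions silently drops data instead of raising an error:

```python
import json

# Two producers, same logical record, different implicit conventions.
records = [
    json.dumps({"customerName": "Acme", "revenue": 1200}),        # producer A
    json.dumps({"customer_name": "Globex", "revenue": "1,500"}),  # producer B
]

# The consumer expects camelCase names and numeric revenue,
# and silently ignores anything that does not match.
total = 0
recognized = 0
for raw in records:
    rec = json.loads(raw)
    if "customerName" in rec and isinstance(rec.get("revenue"), (int, float)):
        total += rec["revenue"]
        recognized += 1

# Only producer A's record matched the consumer's implicit schema;
# producer B's data was silently lost rather than flagged as an error.
```

Nothing failed loudly: the data was not self-describing after all, and the divergence only surfaces downstream, in reports that do not add up.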
The ownership of data in an organization is a complex and often debated topic. In reality, it's not a matter of the business or IT owning the data exclusively. Instead, it's a shared responsibility that involves both business stakeholders and IT teams. The roles and responsibilities of these groups may differ, but they should work together to ensure the proper management, governance, and utilization of data.
According to the same report by Wavestone's NewVantage Partners, there was previously general agreement that the Chief Data Officer (CDO) function fit best within the Chief Information Officer (CIO) organization. Now, a majority – 55.6% – of CDOs (or CDAOs, Chief Data and Analytics Officers) appear to report to business rather than technology functions.
The key to successful data management is collaboration between business and IT. Ownership of data should be seen as a partnership: business stakeholders define what they need from data, while IT teams provide the tools, infrastructure, and expertise to support those needs. Effective data governance practices can help establish roles and responsibilities and ensure that data is used responsibly, securely, and in alignment with business objectives. Ultimately, the organization as a whole benefits from this shared approach to data ownership, as it ensures that data is both a valuable business asset and a well-managed technical resource.
In practice, the installation of Data Catalogs, Master Data Management (MDM), and Data Quality Management (DQM) solutions was supposed to facilitate this collaboration. Unfortunately, there is a misalignment between the business-focused ownership of metadata management solutions (typically overseen by Chief Data Officers or business stakeholders) and the technical needs of IT developers, leading to challenges and inefficiencies in data management and utilization.
In particular, these solutions lack a critical aspect of data management: robust support for schema design, definition, evolution, validation, version control, and lifecycle management. Such shortcomings explain why Metadata Management suites struggle to gain adoption among technical users in the organization.
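To illustrate what that missing schema lifecycle support involves, here is a minimal, vendor-neutral sketch. The registry structure, subject names, and fields are all hypothetical, not any product's API; it shows definition (a registered schema), validation (checking a record against it), and evolution (a new version with an added required field):

```python
# A toy registry keyed by (subject, version), mimicking schema versioning.
schema_registry = {
    ("customer", 1): {"name": str, "email": str},
    ("customer", 2): {"name": str, "email": str, "segment": str},  # evolved schema
}

def validate(record, subject, version):
    """Check that a record has every field of the schema, with the right type."""
    schema = schema_registry[(subject, version)]
    missing = [f for f in schema if f not in record]
    wrong_type = [f for f, t in schema.items()
                  if f in record and not isinstance(record[f], t)]
    return not missing and not wrong_type

record = {"name": "Ada", "email": "ada@example.com"}
ok = validate(record, "customer", 1)     # conforms to version 1
stale = validate(record, "customer", 2)  # fails: version 2 requires "segment"
```

Even this toy example surfaces the questions (which version applies? is the change backward compatible? who approves it?) that business-oriented metadata suites leave unanswered for developers.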
Absence of a single source-of-truth for metadata
A contributing factor to the lack of hand-in-hand collaboration between business and IT, or perhaps a cause of it, is the absence of a single source-of-truth (SSOT) for the meaning and context of data and data structures. Instead of a genuinely single source-of-truth, organizations have a multitude of receptacles for metadata containing often out-of-date and/or contradictory information. Such fragmentation and lack of synchronization lead to tensions, ambiguity, errors, and most likely distrust of the tools in place and a drop in adoption.
Typically, an organization owns several of these families of metadata receptacles:
- Git repositories of application source code, most likely hosted on a Git platform (GitHub, GitLab, Bitbucket, or Azure DevOps Repos), possibly with some schemas and/or descriptions of data structures embedded in application code;
- technical data catalog used to facilitate queries (AWS Glue, Databricks Unity, Hive Metastore, Oracle Data Catalog, etc.);
- schema registry for pub/sub pipelines so producers and consumers of transactions can understand each other (Confluent, RedPanda, Amazon MSK, Pulsar, Azure EventHub, etc.);
- documentation site for REST APIs (own website, SwaggerHub, ...);
- repository for data models stored in a proprietary format, from a legacy provider, and used by data architects and data modelers;
- data governance solutions used by the data citizens on the business side (Alation, Apache Atlas, Azure Purview, BigID, Collibra, DataHub, Informatica EDC, or some sort of homegrown solution).
These tactical and fragmented solutions quickly create additional silos, each owned by a different community in the organization with different requirements and objectives. They reinforce the lack of collaboration and coordination between stakeholders: each community wants information organized for its own needs and according to its own processes, without concern for the others.
Data governance and metadata management solutions promised to solve this challenge. Unfortunately, these often very expensive solutions have revealed serious shortcomings in meeting the expectations of organizations and their data leaders.
They may be fine for business metadata, but they fail to recognize the needs surrounding technical metadata. Business metadata helps business stakeholders understand and interpret data without delving into technical details, whereas technical metadata focuses on the technical attributes of data, and is used to manage and maintain data, design databases and data exchanges, and ensure data quality and performance. They are both essential for comprehensive data management, and they often complement each other in describing and maintaining data assets within an organization.
As each community goes its own way, metadata management solutions have failed to fulfill the promise of creating a single, up-to-date source-of-truth for an organization's metadata, mainly because their implementation has been biased towards one community at the expense of the others. CDOs and data citizens on the business side may initially be happy to own the platform, until they realize that developers have their own interpretation of the data and make applications evolve without consulting them.
On the business side, metadata management suites primarily focus on documenting and organizing metadata after data structures are already in place, an inherently reactive method. Such an approach greatly increases the risk of inaccuracies and omissions, in both the data itself and its metadata, that fail to reflect the complexities and nuances of the data structures, resulting in incorrect assumptions, misunderstandings, and errors. Add to that inconsistencies, operational inefficiencies, data security and compliance risks, scalability challenges, higher costs, and a lack of trust.
Data model marts by legacy data modeling providers also fail to deliver on the promise of a single source-of-truth, if only because the solution they provide is proprietary and not fully integrated with any of the other silos.
Proceed to the next page to see the solution we propose with Hackolade Studio to improve collaboration, make complexity manageable, align interests, and ultimately restore trust between business and IT. Hint: it does NOT involve the creation of yet another receptacle!