What are the four principles of data mesh and how can they lead to ‘Industry 4.0’?
By Monica McDonnell, Automotive Industry Consultant at Teradata
The importance of Artificial Intelligence (AI) and Machine Learning (ML) in factories cannot be overstated, but challenges remain. These obstacles rarely lie in the data science itself, since the code usually works; they stem from architecture and data governance. Manufacturers must find cost-effective and efficient ways to use and reuse data across multiple projects.
However, some manufacturers have started to apply the four principles of data mesh, with the aim of evolving their architecture into one that delivers “Industry 4.0” projects at scale whilst avoiding the most frequent hurdles.
The common failures
Initially, the main issue was technology scaling, a failure mode labeled ‘pilot purgatory’: processes that work in the lab but cannot be scaled to support the demands of the production environment.
However, once this issue was corrected, ‘use case purgatory’ emerged as a new failure mode, in which the challenge is resource scaling. Businesses were delivering Industry 4.0 solutions on an Industry 3.0 architecture, so even though use cases were deployed, they were deployed at a high price. The development cost of a dedicated data pipeline, along with the support cost of maintaining isolated applications, means this purgatory is sparsely populated: additional use cases remain out of reach on limited budgets.
Clearing the hurdles with the four principles of a data mesh
Better use of resources is the way out of use case purgatory. Although the amount and variety of data in a factory environment are large and continue to grow, that data is also finite and multipurpose. Take sensor data from a machine, for example: it can support maintenance use cases and process improvements, and can provide insight into the quality of the products produced. Machine data is a large data set and requires very specific operator knowledge, because operators are the ones who can provide context to measurements, alerts, and error codes. Gathering and preparing this data multiple times is a waste of data science resources, so organizations should focus on collecting, cleansing, and pre-processing it once, for all potential use cases.
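As a minimal sketch of the "prepare once, reuse many times" idea, the snippet below cleanses hypothetical machine sensor payloads a single time and then feeds the same prepared data set to two independent use cases (maintenance and quality). All field names, values, and functions here are illustrative assumptions, not part of any specific product:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Reading:
    machine_id: str
    temperature_c: float
    error_code: Optional[str]

# Hypothetical raw payloads as they might arrive from a machine gateway.
RAW = [
    {"machine": "press-01", "temp": "71.5", "err": ""},
    {"machine": "press-01", "temp": "72.1", "err": "E42"},
    {"machine": "press-02", "temp": "68.0", "err": None},
    {"machine": "press-02", "temp": "n/a", "err": ""},  # unusable reading
]

def cleanse(raw):
    """Collect, cleanse, and pre-process once: convert types,
    normalise empty error codes, drop unparsable rows."""
    cleaned = []
    for row in raw:
        try:
            temp = float(row["temp"])
        except (TypeError, ValueError):
            continue  # discard readings with unusable measurements
        cleaned.append(Reading(row["machine"], temp, row["err"] or None))
    return cleaned

# Two independent use cases consume the same prepared data set,
# instead of each team rebuilding its own pipeline from RAW.
def maintenance_alerts(readings):
    return [r.machine_id for r in readings if r.error_code]

def quality_overview(readings):
    temps = [r.temperature_c for r in readings]
    return sum(temps) / len(temps)

clean = cleanse(RAW)
print(maintenance_alerts(clean))   # machines reporting error codes
print(round(quality_overview(clean), 1))
```

The point of the sketch is that `cleanse` runs once; every additional use case is a cheap function over its output rather than another dedicated pipeline.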
This is where the four principles of a data mesh can help make better use of limited resources.
Firstly, domain-driven ownership of data means that those who understand the data's context are given the responsibility to prepare it for multiple use cases. Because operational technology (OT) and IT have traditionally been separate, ownership boundaries already exist in most plants, so this principle should be relatively easy to implement.
Secondly, all data must be treated as a product. This is a new approach for many manufacturers, but one that should be readily embraced. Data products can be as simple as data sets prepared for analytic purposes, or more complex, such as the output of ML routines.
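One way to picture "data as a product" is the metadata a domain team might publish alongside its data set so that other teams can discover and trust it. The structure below is a hypothetical sketch; the field names, registry, and product details are assumptions for illustration only:

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    """Minimal metadata a data product might publish so that
    other teams can find, trust, and reuse it."""
    name: str
    owner_domain: str          # the domain team accountable for it
    schema: dict               # column name -> type, for consumers
    refresh_sla_minutes: int   # how fresh consumers can expect it to be
    description: str = ""

REGISTRY = {}  # a stand-in for a shared data catalogue

def publish(product):
    REGISTRY[product.name] = product

publish(DataProduct(
    name="press_sensor_readings",
    owner_domain="stamping-operations",
    schema={"machine_id": "str", "temperature_c": "float",
            "error_code": "str"},
    refresh_sla_minutes=15,
    description="Cleansed sensor readings from the press line.",
))

# A downstream team discovers the product instead of rebuilding a pipeline.
product = REGISTRY["press_sensor_readings"]
print(product.owner_domain, sorted(product.schema))
```

The details matter less than the contract: a named owner, a declared schema, and a freshness promise are what turn a data set into a product another team can build on.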
These first two principles do not require specific technology; their main implication is organizational. Distributed groups must accept ownership of a specific set of data and ensure it is available to all authorized users. As a side benefit, this reduces costs and increases security by reducing data movement and redundancy.
Next, a self-serve data infrastructure as a platform must be implemented. This is critical to allow a number of teams to build on the work of others to create their own insights from the data and products.
Lastly, federated computational governance requires a blend of mesh-oriented governance practices and the ability to automate tasks such as schema or lineage creation.
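A small sketch of what "computational" governance can mean in practice: instead of a central team reviewing every data set by hand, a shared policy is enforced automatically whenever a domain publishes a product. The policy rules below (a mandatory owner and a mandated identifier field) are hypothetical examples, not prescribed by the data mesh principles:

```python
# A shared, automated policy applied at publish time. Each domain still
# owns its data; the federation agrees only on the rules below.
REQUIRED_FIELDS = {"machine_id"}  # hypothetical global identifier policy

def validate(name, owner, schema):
    """Return a list of policy violations; an empty list means compliant."""
    errors = []
    if not owner:
        errors.append("data product must declare an owning domain")
    missing = REQUIRED_FIELDS - schema.keys()
    if missing:
        errors.append(f"schema missing mandated fields: {sorted(missing)}")
    return errors

# A compliant product passes with no violations.
print(validate("press_sensor_readings", "stamping-operations",
               {"machine_id": "str", "temperature_c": "float"}))

# An ad-hoc extract with no owner and no shared identifier is flagged.
print(validate("ad_hoc_extract", "", {"temp": "float"}))
```

Because the check is code rather than a committee, it scales with the number of data products instead of becoming a bottleneck.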
The final two principles imply some technical capabilities or tools and need ongoing investment to find, understand, use, and re-use data, no matter where it is stored. Instead of having to move data at a high cost, organizations can implement an open data ecosystem and bring analytics to the data, which is perfectly aligned with implementing a data mesh.
This open ecosystem will also help manufacturers achieve the ideal trifecta of good, quick, and cost-effective in their Industry 4.0 analytics projects:
- Good: This means increased use case deployment based on trusted and secure data.
- Quick: Leads to increased speed of use case deployment and use case execution.
- Cost-effective: Allows for limited data movements and redundancies, which improves compute efficiency.
Manufacturing operations, supported by both IT and OT, are close to implementing the data mesh principles; only small adjustments are needed to unlock the full potential of AI and ML in the factory. Data governance practices should adapt to domain ownership and introduce data products, complemented by scalable compute power, to create operational efficiency in a diverse data ecosystem.