By David Meyer, SVP Product Management at Databricks
Data is a fundamental part of digital transformation because it enables automation and innovation and improves decision making. With access to the right data in real time, organisations can take actions or make decisions that best serve customer needs, transform business models and respond to rapid changes in demand.
Data underpins digital transformation
Data empowers data professionals to identify trends, opportunities and problems in business operations, which can then be addressed to improve overall performance. SEGA Europe is an excellent example of this. The worldwide leader in interactive entertainment uses a data lakehouse platform to personalise the player experience and build its own machine learning algorithm to help target and tailor games for over 30 million of its customers. During the lockdowns of 2020, more anonymised data was being collected through its analytics pipeline than ever before. The team needed a dedicated computing resource to handle the sheer volume of data, extract meaningful insights from it and enable the data science team to improve its general workflow. Less than a year on, SEGA Europe is already reaping the rewards and has succeeded in improving the player experience.
The data hurdles
However, although better data collection and analysis is a clear driver of digital transformation, many barriers remain. According to a 2021 Databricks and MIT Technology Review Insights survey, the main challenge holding organisations back from delivering on their data strategy initiatives is that data management platforms do not easily scale (cited by 44% of respondents). Other often-cited obstacles are slow processing of large data volumes and difficulties in facilitating collaboration. To have the right pillars in place, you first need to bring all data together, then provide all teams and departments with the right tools and infrastructure to draw insights and drive innovation, all while following security and privacy protocols. In fact, according to the survey, the most critical advantages of an ideal new data architecture over an existing one are stronger security and governance (49%) and open-source standards and open data formats (50%).
The success of digital transformation hinges on the ability to make critical business decisions based on actionable insights; otherwise, there is a major risk of higher costs and of the transformation taking longer than it needs to. But whilst most of today’s organisations are collecting huge volumes of data, they are often storing it in the wrong places and failing to capture the most important insights, the ones that enable cost- and time-efficient transformation and stronger long-term performance.
Creating a robust data architecture
When it comes to storing, cleaning and analysing data more efficiently, organisations have several options: the data warehouse, the data lake and the data lakehouse. The data warehouse’s analytical infrastructure offers high data reliability, strong governance and security, and high performance, but it is unable to store the huge volumes and range of data found in organisations today. In contrast, data lakes can store large amounts of data at a lower cost and in open data formats. All types of data, whether structured, semi-structured or unstructured, can be stored at scale and accessed openly. But data lakes can become disorganised and flooded with information, turning into a ‘data swamp’ with poor data quality and performance. This is why the data lakehouse is emerging: it brings the necessary data structure and data management directly on top of data lakes, so organisations can draw out the crucial insights they need by harnessing the strengths of both the data warehouse and the data lake.
What’s most important about data is that all of it is accessible, that it can be easily contextualised, and that people within an organisation can interact with it. Unfortunately, in many cases, data is isolated from most people within a company and controlled by a select few. The problem is compounded when data is fragmented or split into subsets, which can cause data drift and degrade data quality, meaning an organisation can lose its single source of truth. Having data in one place can also help automate for scale. Data access, machine learning models and other templates can be configured in an automated way so they can be deployed across departments and business units, allowing for a much healthier data culture.
Data transformation, underpinned by a robust lakehouse architecture, puts the right data in the right place, makes it accessible across an organisation, and ensures it is analysed effectively. It’s no secret that there’s a competitive edge to be had by crunching and analysing enormous amounts of data, but it must be done in an efficient and effective manner. Getting data transformation right accelerates digital transformation. It’s the most robust way for businesses to scale and compete in 2021 and beyond.