How upsertable data lakes can simplify the data lake journey

Migrating data from traditional SQL databases to a data lake presents multiple advantages for organizations.

Data lakes offer the flexibility and scalability to manage growing volumes of data, in large part because they separate storage from compute. They let organizations take advantage of disconnected structured and unstructured data streams such as customer data, IoT sensor readings, and clickstreams. Organizations can also store currently unused data more cost-effectively and draw on it later for advanced analytics as opportunities arise and new technologies become available.

At the same time, migrating to a data lake architecture can be daunting for organizations whose database administrators (DBAs) are accustomed to SQL databases and transactions characterized by atomicity, consistency, isolation, and durability (ACID). Because plain data lakes are not ACID-compliant and their files cannot be changed in place, simple DBA tasks such as updating or deleting records are less straightforward. Instead, these tasks require complex scripting and file handling.

The good news is that traditional DBAs can perform these operations with a similar level of skill and effort using what we’re calling “upsertable data lakes”: essentially ACID-compliant, modern table formats that combine the best features of data lakes and data warehouses. In this article, we’ll explore how upsertable data lakes such as Delta Lake and Hudi can ease your organization’s transition to a data lake architecture, all while leveraging the experience and knowledge of your existing DBAs.

Some challenges of data lakes

Managing a data lake is different from managing a relational database and requires different skillsets. Data lakes require data validations in multiple places, and preventing failures necessitates the handling of files and complex scripting that is unfamiliar to traditional, SQL-centric DBAs.

Then there are the challenges of cleansing or deleting data to comply with data retention and privacy laws such as GDPR. In a relational database, these tasks can be accomplished easily through SQL commands. In a data lake, they can be complex endeavors that require skillsets outside a traditional DBA’s wheelhouse. And because files are immutable, any change means replacing whole files, so the data can be left in an inconsistent state partway through a change if transactions are not ACID-compliant.

How upsertable data lakes reduce complexity

Because they fuse the best of data lakes and data warehouses, upsertable data lakes such as Delta Lake and Hudi provide a more familiar interface for DBAs. They take what DBAs love about data warehouses and apply it to a data lake setting. And because the feature gap between upsertable data lakes and relational databases is closing rapidly, the transition to a data lake architecture becomes a much more familiar journey.

To illustrate this feature familiarity, let’s say a DBA needs to delete or update a record. Performing this simple task in a traditional data lake can require the creation of complex scripts. But with an upsertable data lake, it’s a simple matter of running a delete statement or an update statement, as would happen in a relational database.
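To make the contrast concrete, here is a minimal sketch of the kind of script a plain file-based lake can call for. It assumes records are stored as JSON Lines files; the `delete_record` helper, the `customer_id` field, and the file layout are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def delete_record(data_file: Path, key: str, value) -> int:
    """Rewrite a JSON Lines file without the rows where row[key] == value.

    Returns the number of rows removed. Hypothetical helper for illustration.
    """
    removed = 0
    # The file cannot be edited in place, so surviving rows are copied
    # into a temporary file that sits next to the original.
    fd, tmp_name = tempfile.mkstemp(dir=data_file.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as out, \
            data_file.open(encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            row = json.loads(line)
            if row.get(key) == value:
                removed += 1
                continue  # drop the matching record
            out.write(json.dumps(row) + "\n")
    # Swap the rewritten file in for the old one.
    os.replace(tmp_name, data_file)
    return removed
```

This handles a single file. In a real lake the same rewrite would have to be repeated for every file that might hold a matching record, with error handling in between. On an upsertable table format, the equivalent request would be a single statement along the lines of `DELETE FROM customers WHERE customer_id = 42` (table and column names assumed).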

Furthermore, because data lake files are immutable, changing a record requires creating a new file that contains the change and deleting the old one. While this is happening, the data is often in an unstable state. This is where an upsertable data lake helps most: it brings ACID compliance to transactions to guarantee data validity. It also abstracts the file handling away under the hood, so DBAs no longer have to think about it, and readers never see the data in an unstable state.
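One way to picture how immutable files and atomic changes can coexist is a commit log that names which data files currently make up the table. The sketch below is a toy, single-writer model of that idea, not how Delta Lake or Hudi are actually implemented; the `ToyTable` class, the `_log` directory, and the file naming are all assumptions made for illustration.

```python
import json
import uuid
from pathlib import Path


class ToyTable:
    """Toy copy-on-write table: immutable data files plus an append-only log."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.log_dir = root / "_log"  # hypothetical layout
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _commits(self) -> list[Path]:
        # Zero-padded names keep commits in order when sorted.
        return sorted(self.log_dir.glob("*.json"))

    def _live_files(self) -> list[str]:
        """Replay the log to find the data files that are currently live."""
        live: list[str] = []
        for commit in self._commits():
            actions = json.loads(commit.read_text(encoding="utf-8"))
            live = [name for name in live if name not in actions["remove"]]
            live.extend(actions["add"])
        return live

    def _write_data(self, rows: list[dict]) -> str:
        """Write rows to a brand-new data file; invisible until committed."""
        name = f"part-{uuid.uuid4().hex}.json"
        (self.root / name).write_text(json.dumps(rows), encoding="utf-8")
        return name

    def _commit(self, add: list[str], remove: list[str]) -> None:
        version = len(self._commits())
        tmp = self.log_dir / f"{uuid.uuid4().hex}.tmp"
        tmp.write_text(json.dumps({"add": add, "remove": remove}),
                       encoding="utf-8")
        # Publishing the commit is one rename, so readers see either the
        # old file list or the new one. Assumes a single writer.
        tmp.replace(self.log_dir / f"{version:020d}.json")

    def read(self) -> list[dict]:
        rows: list[dict] = []
        for name in self._live_files():
            rows.extend(json.loads((self.root / name).read_text(encoding="utf-8")))
        return rows

    def append(self, rows: list[dict]) -> None:
        self._commit(add=[self._write_data(rows)], remove=[])

    def delete_where(self, predicate) -> None:
        old_files = self._live_files()
        kept = [row for row in self.read() if not predicate(row)]
        # Old files are never modified; the commit just stops referencing them.
        self._commit(add=[self._write_data(kept)], remove=old_files)
```

In this model, `table.delete_where(lambda row: row["id"] == 2)` writes a new file and publishes it with one commit. A reader that lists the log before the commit sees the old rows and one that lists it after sees the new rows, which is the behavior the paragraph above describes. Concurrent writers, cleanup of unreferenced files, and performance are deliberately ignored here.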

Contact Beyondsoft for a data architecture health check

Where are you on your data lake journey? Beyondsoft has performed hundreds of data migrations and big data projects for large enterprise customers. Our certified practitioners have hands-on, best-practice knowledge of all the major platforms, including AWS and Azure.

We can partner with you to analyze your data architecture and business requirements, help you decide whether an upsertable data lake is the right fit, and work out the best migration strategy. To learn more about how Beyondsoft can help you with your data lake journey, contact us today.
