Leading digital advertising agency advances data analytics using Vertica on AWS cloud

THE CHALLENGE

The client wanted their business intelligence tool to query 90 days of historical data in under two seconds with more than 80 concurrent users. Their existing system, which held data in Hadoop clusters and an on-premises Infobright DB cluster, could not meet these data analytics requirements.
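
A requirement of this kind can be checked with a small concurrency harness. The following is a minimal sketch, not the client's actual test tooling: `run_query` is a hypothetical callable standing in for one dashboard query, and the defaults of 80 users and a 2.0-second limit are taken from the requirement above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(run_query):
    """Run one query and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def measure_concurrent_latency(run_query, users=80):
    """Submit `users` queries to a thread pool and collect each latency.

    `run_query` is a hypothetical stand-in for a real BI query.
    """
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(timed_call, run_query) for _ in range(users)]
        return [future.result() for future in futures]


def meets_target(latencies, limit_seconds=2.0):
    """True only if the slowest query finished under the limit."""
    return max(latencies) < limit_seconds
```

Threads in a pool do not all start at exactly the same instant, so this only approximates 80 simultaneous users. Checking the slowest latency rather than the average is the stricter reading of "under two seconds".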

THE SOLUTION

Beyondsoft’s Big Data consulting team proposed a solution using Vertica, a columnar database, running on Amazon Web Services (AWS). The AWS cloud solution would provide the added scalability, elasticity, and performance that the customer wanted. The project consisted of creating Vertica clusters in a repeatable manner, adopting a pipeline-based approach for Vertica DDL, and moving large amounts of data to the Vertica cluster daily.

The project consisted of three phases:

  • Phase 1 involved creating a repeatable deployment process for the Vertica cluster through infrastructure as code (Terraform). The Vertica AMI was procured from AWS Marketplace. Beyondsoft engineers added the ability to launch different Vertica nodes as part of a cluster through tags to AWS Elastic Load Balancer (ELB). The Vertica cluster spans two Availability Zones (AZs) in an active/active configuration, with 16 nodes in total holding 90 days of data, approximately 40 TB. AWS Systems Manager (SSM) and CloudWatch Logs are used to administer the cluster. This Vertica infrastructure as code also integrates with the customer’s self-service tool for their developers.
  • Phase 2 included a pipeline-based approach for Vertica DDL. Vertica DDL is pushed through the pipeline using Liquibase, a Java framework for database change management and deployment. This ensures that production Vertica clusters are not touched manually for schema changes.
  • Phase 3 involved setting up ETL from the Hadoop cluster to Vertica using a producer/consumer pattern. The ETL code is written in Python and runs in on-demand Fargate containers, which extract data from Hadoop and store it as zipped files in S3. From there, jobs are created to load the data from S3 into Vertica. The data volume is around 120 GB per day, with around 570 million rows loaded at its peak. The customer-facing Java application has several dashboards that retrieve data from Vertica with query times under two seconds under concurrent usage.
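
The producer/consumer pattern in Phase 3 can be sketched as follows. This is an illustrative sketch only, not Beyondsoft's actual ETL code: the in-memory `store` dictionary stands in for S3, the bucket name `example-bucket`, the table name `analytics.events`, and all function names are hypothetical, and the consumer only builds Vertica `COPY` statements instead of executing them against a cluster.

```python
import csv
import gzip
import io
import queue
import threading

SENTINEL = None  # tells the consumer that no more files are coming


def producer(batches, store, work_queue, bucket="example-bucket"):
    """Compress each batch of rows to gzip CSV and announce its location."""
    for index, rows in enumerate(batches):
        text = io.StringIO()
        csv.writer(text).writerows(rows)
        key = f"staging/batch_{index:04d}.csv.gz"
        # `store` is a dict standing in for an S3 bucket (assumption).
        store[key] = gzip.compress(text.getvalue().encode("utf-8"))
        work_queue.put(f"s3://{bucket}/{key}")
    work_queue.put(SENTINEL)


def build_copy_statement(table, s3_uri):
    """Build a Vertica COPY statement for one gzip-compressed CSV file."""
    return f"COPY {table} FROM '{s3_uri}' GZIP DELIMITER ',' ABORT ON ERROR;"


def consumer(work_queue, table, statements):
    """Turn each announced file into a load statement until the sentinel."""
    while True:
        uri = work_queue.get()
        if uri is SENTINEL:
            break
        statements.append(build_copy_statement(table, uri))


def run_pipeline(batches, table="analytics.events"):
    """Run one producer and one consumer joined by a bounded queue."""
    store = {}
    work_queue = queue.Queue(maxsize=8)  # bounded, so the producer cannot run far ahead
    statements = []
    worker = threading.Thread(target=consumer, args=(work_queue, table, statements))
    worker.start()
    producer(batches, store, work_queue)
    worker.join()
    return store, statements
```

Decoupling extraction from loading through a queue lets the two sides run at different speeds. In a real deployment the consumer would submit each statement through a Vertica client connection rather than collecting strings.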

TECHNOLOGIES USED

Vertica on AWS, AWS SSM, AWS CloudWatch Logs, AWS S3, AWS ELB, AWS Fargate, AWS Parameter Store, AWS ECR, Python, Jenkins, etc.

KNOWLEDGE TRANSFER

Beyondsoft trained the client’s data analytics team on the newly created solution and on Terraform, and provided a runbook so that the team can both manage and extend the solution in the future. Beyondsoft also provided education on the AWS services used and customized training sessions on additional topics.

BENEFITS

Moving from an on-premises cluster to the cloud increased the scalability, agility, and performance of the whole solution. Taking a DevOps approach through data pipelines shortened the time to market for code changes. Infrastructure as code provided a repeatable way to create infrastructure, increasing operational consistency and reducing bugs.

