April 3rd 2023

Emerging classes of real-time analytical databases are providing a solution for companies that need to analyze petabytes of real-time event data combined with historical data and get the answer back in less than a second. Traditional data warehouses are unable to deliver the SQL query performance required by some of the largest companies worldwide. One of the leading members of this new class of databases is Apache Druid, which is gaining popularity among the most sophisticated clients like Salesforce and Confluent.

Cloud data warehouses have transformed the economics of big data storage and enabled advanced analytics. Companies that cannot afford to build their own scale-out, column-oriented analytical databases, or do not want the hassle, can rent access to a cloud data warehouse with practically unlimited scalability. Even so, these warehouses still cannot deliver the SQL query performance required by some of the largest companies on the planet.

David Wang, Vice President of Product Marketing at Imply, the company behind the Apache Druid project, describes Druid as sitting at the intersection of analytics and applications. Imply calls it a real-time analytics database, while other companies have different names for it. The new class of database provides a solution to companies that need to analyze large amounts of data quickly and efficiently, especially those that need to do so thousands of times per second.

The technical capabilities of traditional data warehouses quickly become insufficient when companies need to analyze petabytes’ worth of real-time event data combined with historical data and get the answer back in less than a second. The emergence of new classes of real-time analytical databases like Apache Druid is helping to address these challenges, enabling companies to achieve their performance requirements and deliver on their analytical and application requirements.

Companies that require real-time, high-concurrency analytics are turning to a new class of real-time analytical databases, led by Apache Druid. Traditional data warehouses are no longer able to deliver the performance required to analyze petabytes of real-time event data, which has led to the rise of real-time analytics databases.

Imply, the company behind the Apache Druid project, calls it a “real-time analytics database,” while other outfits have different names for it. Druid is finding followers among some of the most sophisticated clients, including Salesforce and Confluent. A combination of capabilities developed over years, including a highly optimized storage format that uses dictionary encoding, bitmap indexing, and other algorithms, allows Druid to minimize the amount of data analyzed, resulting in significantly better performance than traditional databases.
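To make the idea of dictionary encoding and bitmap indexing concrete, here is a minimal toy sketch in Python. It is not Druid's actual storage code; the column names, sample values, and helper functions are all hypothetical, and plain Python integers stand in for the compressed bitmaps a real engine would use. It only illustrates how a filter can be answered by combining small bitmaps instead of scanning every row.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Map each distinct string to a small integer ID.

    Returns (dictionary, encoded_column), with IDs assigned in first-seen order.
    """
    dictionary = {}
    encoded = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        encoded.append(dictionary[value])
    return dictionary, encoded


def build_bitmap_index(encoded):
    """For each ID, build an int bitmap whose bit i is set when row i holds that ID."""
    bitmaps = defaultdict(int)
    for row, value_id in enumerate(encoded):
        bitmaps[value_id] |= 1 << row
    return dict(bitmaps)


def matching_rows(bitmap):
    """Expand a bitmap back into a sorted list of row numbers."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical sample columns, one entry per row.
country = ["US", "DE", "US", "FR", "DE", "US"]
device = ["ios", "ios", "web", "web", "ios", "ios"]

country_dict, country_ids = dictionary_encode(country)
device_dict, device_ids = dictionary_encode(device)
country_index = build_bitmap_index(country_ids)
device_index = build_bitmap_index(device_ids)

# Equivalent of: WHERE country = 'US' AND device = 'ios'
# A bitwise AND of two bitmaps finds the matching rows without touching row data.
hits = country_index[country_dict["US"]] & device_index[device_dict["ios"]]
print(matching_rows(hits))  # [0, 5]
```

In this toy version the filter touches two integers rather than twelve strings, which is the kind of saving the paragraph above refers to.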

The separation of compute and storage has become standard in the cloud, but that design is not feasible for companies that require top-tier analytics performance on large, fast-moving data.

Companies that need to analyze large volumes of real-time event data and historical data at high concurrency and speed are turning to real-time analytical databases. Apache Druid is one such database that is finding adoption among advanced clients, including Salesforce and Confluent. While cloud data warehouses like Google BigQuery and Amazon Redshift have made big data storage more affordable and scalable, they cannot deliver the SQL query performance required by some of the largest companies.

According to David Wang, the Vice President of Product Marketing at Imply, Druid was purpose-built for the intersection of analytics and applications, where data must be analyzed at speed and concurrency on operational data. Druid achieves high performance through a combination of capabilities, including CPU efficiency and a highly optimized storage format that leverages unique approaches to segmentation, partitioning, and data storage.
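One way to picture why segmentation and partitioning matter: Druid stores data in segments partitioned by time, so a query with a time filter only needs the segments whose interval overlaps the query window. The sketch below is a simplified illustration under that assumption, not Druid's implementation; the Segment class, the prune_segments helper, and the row counts are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Segment:
    """A hypothetical time-partitioned chunk of data covering [start, end)."""
    start: datetime  # inclusive
    end: datetime    # exclusive
    row_count: int


def prune_segments(segments, query_start, query_end):
    """Keep only segments whose interval overlaps [query_start, query_end)."""
    return [s for s in segments if s.start < query_end and query_start < s.end]


# Three made-up daily segments.
segments = [
    Segment(datetime(2023, 4, 1), datetime(2023, 4, 2), 1_000_000),
    Segment(datetime(2023, 4, 2), datetime(2023, 4, 3), 1_200_000),
    Segment(datetime(2023, 4, 3), datetime(2023, 4, 4), 900_000),
]

# A query from noon on April 2 to 06:00 on April 3 overlaps only two segments.
selected = prune_segments(
    segments, datetime(2023, 4, 2, 12), datetime(2023, 4, 3, 6)
)
rows_to_scan = sum(s.row_count for s in selected)
print(len(selected), rows_to_scan)  # 2 2100000
```

Here the April 1 segment is skipped entirely, so the query considers 2.1 million rows instead of 3.1 million before any column-level filtering is applied.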

Druid enables users to analyze trillions of rows of data on the fly to get instant responses to ad-hoc pivot table queries. The system has been adopted by Salesforce to analyze different aspects of its cloud, including performance, bugs, and issue triage. Salesforce’s engineering team uses Druid to power its edge intelligence system, which requires a fast engine to process the scale of its event data. Druid is not intended to compete with traditional data warehouses, as its workload is different. It can support thousands of concurrent queries per second, which is significantly more than traditional data warehouses can support.

Confluent, a company that specializes in Apache Kafka, has chosen to use Druid to power an infrastructure observability offering for its cloud operations, both internally and externally. The company required an analytics database that could ingest 5 million events per second and deliver subsecond query response times across 350 channels simultaneously.

Confluent selected Druid for its ability to query both real-time and historical data. Over 1,400 companies have put Druid into action, including major names like Netflix, Target, Cisco’s ThousandEyes, and various major banks. Imply has been developing its cloud-based Druid service, Polaris, and is set to announce more new features next month.

The post Apache Druid Charms in Upper Echelons of OLAP Database Performance first appeared on Business d'Or.
