StarRocks is an open-source query engine designed for real-time analytics, offering high concurrency and low latency while enabling direct queries on data lakes. It features a cloud-native architecture with separated compute and storage, supports various open table formats, and has shown significant performance improvements over competitors in benchmark tests. The platform is backed by a strong community and aims to optimize data warehouse capabilities while providing user-friendly analytics solutions.

StarRocks Technical Overview
A Linux Foundation Project

Albert Wong, albert.wong@celerdata.com


Agenda
● State of the OLAP software landscape
● What is StarRocks
● StarRocks’ Release Timeline
● Major Features in StarRocks
State of the OLAP software landscape
Trends in OLAP databases
Online analytical processing (OLAP) databases are evolving rapidly to meet the demands of
modern data analytics. Here are some of the key trends in OLAP databases:

● Cloud Native
○ Separation of Compute and Storage
○ Containers
○ k8s operator
● Sub-second vs. Second/Minute Query Response Time
● Data Warehouse vs. Data Lake vs. Data Lakehouse
● Streaming vs. Batch Data
● Mutable vs. Immutable Data
● Remote (Object) Storage vs. Local (SSD) Storage
● Open Table Format vs. Product Native Storage Format
Trends in OLAP databases
Open Lakehouse vs. Proprietary / Hybrid Lakehouse.

[Diagram: compares three approaches (Proprietary / Hybrid, Open Storage, Open) across four layers: Compute, Data Catalog, Table Format, and Storage Format.]
StarRocks is an open-source query engine that delivers data warehouse performance on the data lake.
StarRocks Community

● 7,500+ GitHub Stars
● 350+ Contributors
● 18,000+ Community Members

As of Feb 2024
History of StarRocks and CelerData
StarRocks was designed to address the challenges of real-time analytics, including the need to support
high concurrency, low latency, and a wide range of analytical workloads. StarRocks also offers a number
of features that are not available in other real-time analytics databases, such as the ability to query data
directly from data lakes.

● 2020: Birth of StarRocks. StarRocks is created as a commercialized fork of the Apache Doris database. Over time, 90% of the original codebase has been rewritten.
● 2022: CelerData is founded. CelerData is founded as a company to develop and commercialize StarRocks.
● 2023: StarRocks moves to Linux Foundation. CelerData contributes StarRocks to the Linux Foundation and moves it to the Apache 2.0 license.
● 2023: CelerData Cloud launched. CelerData launches its managed cloud service for StarRocks.
● 2023: Benchmarks outperform competition. The latest TPC-DS and SSB benchmarks show 2x-9x faster performance than Trino, ClickHouse and Apache Druid.
StarRocks is an open-source query engine that delivers data warehouse performance on the data lake.

● Directly query data on the data lake
● Sub-second joins and aggregations on billions of rows
● Hundreds of thousands of concurrent end-user requests
● Cloud native, with separate compute and storage tiers
● JOINs are performant at scale; denormalization is optional
● MySQL protocol with Trino dialect
StarRocks Use Cases: User-facing analytics
User-facing analytics (UFA) is a rapidly growing field that is transforming the way businesses
deliver insights to their users. UFA empowers users to explore and analyze data for themselves,
without the need for technical expertise. This can lead to a number of benefits, such as:

1 Improved decision making
2 Increased user engagement
3 Reduced reliance on IT

Key trends in user-facing analytics:

● Self-service Analytics
● Embedded Analytics
● Real-Time Analytics
● Augmented Analytics
● Conversational Analytics
StarRocks Use Cases: Real-time analytics
Real-time analytics is the process of collecting, processing, and analyzing data as it is
generated, in order to gain insights into the present state of a system or process. This can lead
to a number of benefits, such as:

1 Make better decisions
2 Increased user engagement
3 Staying ahead of the competition

Key trends in real-time analytics:

● Rise of streaming data
● Growth of Edge Computing
● Increasing use of machine learning
● Democratization of real-time analytics
StarRocks Use Cases: Data Lakehouse
A data lakehouse is a revolutionary data architecture that merges the best of both data lakes
and data warehouses.

1 Democratized Data Access
2 Increased agility and insights
3 Reduced costs and complexity
4 Faster and more accurate analytics

Key trends in data lakehouse:

● Directly query data on the data lake
● Sub-second joins and aggregations on billions of rows
● Hundreds of thousands of concurrent end-user requests
● Cloud native, with separate compute and storage tiers
● JOINs are performant at scale; denormalization is optional
StarRocks Architecture Overview
Linux Foundation project with Apache 2.0 license.

● Seamless integration with the Ecosystem
● Ease of Use
● Real-world Performance
● Open Source OLAP compute engine
● Open Table Formats as the Foundation
● Support for Open Storage
● Separated compute and storage architecture
● Cloud Native with k8s Operator

More diagrams: https://github.com/StarRocks/starrocks-reference-architecture


Two Deployment Architectural Choices

More diagrams: https://github.com/StarRocks/starrocks-reference-architecture


StarRocks with Open Data Lake
StarRocks can access multiple open table formats at the same time and can even create a materialized view
across all of them.

More diagrams: https://github.com/StarRocks/starrocks-reference-architecture


StarRocks past and future
StarRocks Technical Features
1.x (2020-2021): OLAP for Real Time Analytics
● Vectorized Execution Engine
● Cost Based Optimizer
● Vectorized ingestion
● Apache Hive support
● Bitmap optimization
● TopN optimization
● Lateral JOIN
● Fast Decimal support
● Tableau compatibility
● Global runtime filter
● Primary Key Table

2.x (2022-2023): OLAP for Data Lake
● Global low-cardinality dictionary
● Pipeline Engine
● Apache Iceberg Support
● Resource Group
● Java UDF
● JSON data type support
● Partial update feature
● JDBC external catalog support
● Primary key Index
● Fully support delete and update operations
● Multi-table materialized view
● More table statistics including histogram
● Compute node on k8s
● Separation of storage and compute
● Local cache for open table formats on data lake
● Semantic cache
● Fully support RBAC
● Map/Struct data type
● Lambda function

3.0 (2023): Shared Data Arch.
● Shared Data Architecture
● New RBAC privilege system
● Spill to disk
● Full support for update
● Support more complete UPDATE and DELETE syntax in primary key tables
● Presto/Trino compatible dialect
● Broadcast JOIN and Bucket Shuffle JOIN can use query cache
● Global UDFs

3.1 (2023): Shared Data Arch. Optimization
● Primary Key table support in Shared Data
● Auto_Increment column attribute
● Automatic partition creation during load
● Support Apache Iceberg v2 tables
● Random bucketing
● FILES keyword
● Generated columns
● Support loading data into MAP and STRUCT data types
● Support nesting Fast Decimal values in ARRAY, MAP and STRUCT
● Optimized creation of async materialized view
● Optimized query rewrite with async materialized views
● Optimized refreshing of async materialized views
● Optimized caching and query logic for StarRocks table format and Apache Iceberg

3.2 (2023): Shared Data Arch. Optimization v2
● Persisting Primary Key table indexes to local disk
● Spill to Disk enabled by default for async materialized views
● Support creating, dropping database and managed tables in Apache Hive catalogs
● Unified Catalog
● Supports Information Schema for external tables
● Enhanced Files()
● Support unloading data from StarRocks to parquet
● Supports manual optimization of table structure and data distribution strategy
● Continuous data loading using PIPE
● Support HTTP SQL API
● Runtime Profile and text-based profile analysis commands
● Support access control through Apache Ranger
● Optimized open file format readers
● Added data consistency features for async materialized view
● Hot and warm storage support
● Fast Schema Evolution
● Dynamically adjusting number of tables
● Data redistribution across local disks for primary key tables
StarRocks 3.x series roadmap
The goal of the 3.x series roadmap is to 1) build and optimize more core data warehouse features, 2) reach
feature parity between the shared-nothing architecture and the shared-data architecture, and 3) be able
to query the StarRocks table format and all the popular open table formats such as Apache Iceberg,
Apache Hudi, Apache Hive, Delta Lake and Apache Paimon.

● 3.0: Initial release of Shared Data Architecture
○ Decouple compute and storage layers.
○ Further development of StarRocks tables, materialized view, JOIN performance, cache.
○ Enhancements to Iceberg, Hudi, Delta Lake, Hive support.
● 3.1: Incremental improvement to 3.x goals
○ Mirroring features from shared nothing to shared data architecture.
○ Further development of core DW features and open table format support.
● 3.2: Incremental improvement to 3.x goals
○ Mirroring features from shared nothing to shared data architecture.
○ Further development of core DW features and open table format support.
● 3.3: Incremental improvement to 3.x goals
○ To be determined.
● 3.4: Incremental improvement to 3.x goals
○ To be determined.
Major Features in StarRocks
Vectorized Query Engine with SIMD
Modern CPUs have vectorized (SIMD) instruction sets, which perform one operation on multiple data elements
simultaneously. A query engine built around them can run queries 3x to 5x faster than non-SIMD databases.
Table Type Support
Tables are units of data storage. Understanding the table structure in StarRocks and how to design an efficient table
structure helps optimize data organization and enhance query efficiency.
Table Type
● Duplicate Key: Append Only ✅
● Aggregate Key: Append Only ✅
● Primary Key: All CRUD ✅

Types of Tables supported

● Duplicate Key
○ Analyze raw data, such as raw logs and raw operation records.
○ Query data by using a variety of methods without being limited by the pre-aggregation method.
○ Load log data or time-series data. New data is written in append-only mode, and existing data is not updated.
● Aggregate Key
○ Help website or app providers analyze the amount of traffic and time that their users spend on a specific
website or app and the total number of visits to the website or app.
○ Help advertising agencies analyze the total clicks, total views, and consumption statistics of an advertisement
that they provide for their customers.
○ Help e-commerce companies analyze their annual trading data to identify the geographic bestsellers within
individual quarters or months.
● Primary Key
○ Stream data in real time from transaction processing systems into StarRocks.
○ Join multiple streams by performing update operations on individual columns.
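
The difference between the three table types can be sketched with a toy in-memory model. This is only an illustration of the semantics described above, not how StarRocks stores data; the class names, the `load`/`upsert`/`delete` methods, and the choice of SUM as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict


class DuplicateKeyTable:
    """Append-only: every loaded row is kept, even when the key repeats."""

    def __init__(self):
        self.rows = []

    def load(self, key, value):
        self.rows.append((key, value))


class AggregateKeyTable:
    """Rows sharing a key are merged at load time (SUM is assumed here)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def load(self, key, value):
        self.totals[key] += value


class PrimaryKeyTable:
    """One row per key: loads act as upserts, and rows can be deleted."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


# Hypothetical page-view events: (page, views)
events = [("page_a", 3), ("page_b", 5), ("page_a", 4)]

dup, agg, pk = DuplicateKeyTable(), AggregateKeyTable(), PrimaryKeyTable()
for key, value in events:
    dup.load(key, value)
    agg.load(key, value)
    pk.upsert(key, value)
pk.delete("page_b")

print(dup.rows)          # all three raw rows are retained
print(dict(agg.totals))  # one pre-aggregated row per key
print(pk.rows)           # latest value per key, after the delete
```

The same three events end up as three rows, two summed rows, or one current row, which is why raw logs suit Duplicate Key, traffic totals suit Aggregate Key, and change streams from transactional systems suit Primary Key.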
JOIN performance at scale
Simplify your data engineering pipeline and infrastructure by using JOINs; denormalization is optional.

● CBO will do intelligent Join reorder and Join method selection
● StarRocks can join 100 million rows of data per second using only 1 CPU. Details at https://www.starrocks.io/blog/benchmark-test

Types of JOINS supported

SQL JOINS
● Inner Join ✅
● Left Join ✅
● Right Join ✅
● Full Join ✅
● Cross Join ✅
● Semi Join ✅
● Anti Join ✅

SQL JOINS Optimization Technique
● Broadcast Join ✅
● Shuffle Join ✅
● Bucket Shuffle Join ✅
● Co-Located Join ✅
● Replicated Join ✅
● Local Join ✅
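
Two of the distribution strategies listed above, Broadcast Join and Shuffle Join, can be illustrated with a small single-process simulation. This is a simplified sketch under stated assumptions, not StarRocks' implementation: the "nodes" are plain Python lists, and the function names and sample tables are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right):
    """Inner hash join of (key, payload) rows; the right side is the build side."""
    build = defaultdict(list)
    for key, payload in right:
        build[key].append(payload)
    return [(key, lp, rp) for key, lp in left for rp in build.get(key, [])]


def partition_by_key(rows, n_nodes):
    """Hash-partition rows by join key, so equal keys land on the same node."""
    parts = [[] for _ in range(n_nodes)]
    for key, payload in rows:
        parts[hash(key) % n_nodes].append((key, payload))
    return parts


def shuffle_join(left, right, n_nodes):
    """Repartition BOTH sides by key, then join each pair of partitions locally."""
    out = []
    for lpart, rpart in zip(partition_by_key(left, n_nodes),
                            partition_by_key(right, n_nodes)):
        out.extend(hash_join(lpart, rpart))
    return out


def broadcast_join(left, right, n_nodes):
    """Leave the left side where it is and copy the whole (small) right side to every node."""
    out = []
    for node in range(n_nodes):
        lpart = left[node::n_nodes]  # arbitrary placement of left rows
        out.extend(hash_join(lpart, right))
    return out


# Hypothetical tables: orders(user_id, order) and users(user_id, name)
orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]
users = [(1, "alice"), (2, "bob")]

print(sorted(shuffle_join(orders, users, n_nodes=3)))
print(sorted(broadcast_join(orders, users, n_nodes=3)))
```

Both strategies return the same rows; they differ in what moves over the network. Broadcasting copies the small table everywhere, while shuffling moves both tables once, which is the kind of trade-off an optimizer weighs when it selects a join method.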
Materialized View
Materialized views can significantly improve query performance by pre-computing common aggregations.

Use Case: Query Acceleration
Transparent Speedup (Core Functionality)
● PROJECT ✅
● AGGREGATE ✅
● JOIN ✅
● Outer-Join ✅
● View-Delta-Join ✅
● PARTIAL-UNION ✅
● NESTED MV ✅
● View-Based ✅

Use Case: Data Modeling
Incremental Refresh (Core Functionality)
● Auto Refresh ✅
● Scheduled Refresh ✅
● Partition-Wise ✅
SQL Hybrid-Based Optimizer
Analyzes a SQL query and chooses the most efficient execution plan by estimating the cost of different potential
plans
Query Rewrite
Technique used to optimize database queries without the user needing to change their original query.

Use Case: Semantic Layer

● Targeted at Select - Projection - Join - Aggregation (SPJA) query pattern
● Up to 10x performance increase
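
The idea behind rewriting an aggregation query onto a materialized view can be sketched as follows. This is a toy model, assuming a hypothetical `sales` table and a view that stores SUM(amount) per (day, region); it does not reflect the optimizer's actual rewrite rules.

```python
from collections import defaultdict

# Hypothetical base table rows: (day, region, amount)
sales = [
    ("2024-01-01", "us", 10),
    ("2024-01-01", "eu", 7),
    ("2024-01-02", "us", 5),
    ("2024-01-02", "us", 8),
]


def build_mv(rows):
    """Pre-compute SUM(amount) GROUP BY day, region (the materialized view)."""
    mv = defaultdict(int)
    for day, region, amount in rows:
        mv[(day, region)] += amount
    return dict(mv)


def total_by_region_from_base(rows):
    """Original plan: scan every base row."""
    out = defaultdict(int)
    for _day, region, amount in rows:
        out[region] += amount
    return dict(out)


def total_by_region_from_mv(mv):
    """Rewritten plan: roll up the smaller pre-aggregated view instead."""
    out = defaultdict(int)
    for (_day, region), subtotal in mv.items():
        out[region] += subtotal
    return dict(out)


mv = build_mv(sales)
print(total_by_region_from_base(sales))
print(total_by_region_from_mv(mv))
```

The user's query (total per region) never mentions the view. The rewrite is valid because SUM can be rolled up from the finer (day, region) grouping, and it pays off when the view has far fewer rows than the base table.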
Cache System
Cache allows you to pull the data from memory instead of storage, which can improve query efficiency by 3x to 17x.

Transparent Speedup (Cache Functionality)
● Metadata ✅
● Query ✅
● Page ✅
● Data ✅
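
The effect of serving repeated reads from memory can be shown with a minimal sketch using Python's standard `functools.lru_cache`. The `read_block` function and the block IDs are hypothetical stand-ins for a slow storage read; StarRocks' own cache layers are not modeled here.

```python
from functools import lru_cache

storage_reads = 0  # counts how often the "slow" storage path is taken


@lru_cache(maxsize=2)
def read_block(block_id):
    """Stand-in for a slow read from remote storage."""
    global storage_reads
    storage_reads += 1
    return f"data-for-block-{block_id}"


# Block 1 is hot: it is requested four times out of six.
for block_id in [1, 2, 1, 1, 3, 1]:
    read_block(block_id)

info = read_block.cache_info()
print(storage_reads, info.hits, info.misses)
```

Six requests cause only three storage reads, because the hot block stays in the two-entry cache while the cold block 2 is evicted to make room for block 3.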
Separated compute and storage architecture
Design approach for databases and data platforms that decouples the processing power (compute) from the
data storage layer.
High Availability
Redundant components and data allow the database to respond even when there is a failure.

Service Availability
● FE ✅ Additional Nodes
● CN ✅ Additional Nodes
● MySQL ✅ 3rd party ProxySQL
● HTTP Services ✅ 3rd party Load Balancer
● S3 Bucket ✅ 3rd party vendor
● Data Files ✅ 3rd party vendor (S3 Bucket Vendor)
Columnar Storage
Stores data in a table by separating each column into its own continuous block instead of grouping entire rows
together.

Columnar Storage Formats
● StarRocks Table Format
● Apache Iceberg
● Apache Hudi
● Apache Hive
● Delta Lake
● Apache Paimon
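
A minimal sketch of the row-versus-column layout described above. The sample rows and the `to_columns` helper are assumptions for illustration; real columnar formats add encoding, compression, and indexes on top of this idea.

```python
# Hypothetical table, first in row-oriented form: one record per row.
rows = [
    {"user_id": 1, "country": "US", "amount": 30},
    {"user_id": 2, "country": "DE", "amount": 12},
    {"user_id": 3, "country": "US", "amount": 8},
]


def to_columns(rows):
    """Pivot row records into one contiguous list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: each whole record is touched just to read one field.
row_total = sum(row["amount"] for row in rows)

# Column layout: only the "amount" block is read; other columns are skipped.
col_total = sum(columns["amount"])

print(columns["amount"], row_total, col_total)
```

Analytical queries usually touch a few columns of many rows, so reading one contiguous block per needed column avoids loading the rest of each record.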
Support for Open Table Formats
Open Table Formats allow users to extract more value from their data while maintaining flexibility and control.

Open Table Formats
● StarRocks Table Format ✅ (Read/Write)
● Apache Iceberg ✅ (Read/Write)
● Apache Hudi ✅ (Read)
● Apache Hive ✅ (Read/Write)
● Delta Lake ✅ (Read)
● Apache Paimon ✅ (Read)



SQL Connectivity through MySQL wire protocol support with Trino dialect
Communicate with StarRocks through MySQL statements and utilities. StarRocks also understands the Trino SQL
dialect.

[Diagram: Client and Server]
Benchmarks and Community References
Benchmark: StarRocks Offers 2.2x Performance over ClickHouse and 8.9x Performance over Apache Druid® in Wide-table Scenarios Out of the Box using product native table format.
Benchmark: StarRocks Delivers 5.54x Query Performance over Trino in Multi-table Scenarios using Apache Iceberg table format with Parquet files.
Use Case: User Analytics at LeetCode
LeetCode's current data warehouse, built on an OLTP database, was struggling under the
weight of terabytes of user activity data. Using this OLTP database, queries took ages,
impacting user experience and hindering LeetCode's ability to analyze trends and optimize
the platform. Scaling up the existing system proved costly and unsustainable.

StarRocks Solution:
● Queries 100x Faster: Complex analytics that previously took hours now finished in
seconds, empowering LeetCode to gain real-time insights into user behavior and
platform performance. Additionally, some queries that couldn't run in the OLTP
system were able to run successfully in StarRocks.
● Unlimited Scalability: StarRocks' horizontal scaling effortlessly accommodated
LeetCode's growing data volume, eliminating concerns about future bottlenecks.
● Cost Savings of 80%: Compared to a similar managed OLAP solution on GCP,
StarRocks delivered significant cost savings, allowing LeetCode to reinvest in
platform development and user experience.
Use Case: Tableau Dashboard at Airbnb
The Airbnb Tableau Dashboard project is designed to serve both
internal and external users by providing interactive dashboards, so it
requires a quick response to user queries. However, the query
latency of previous solutions was over 10 minutes, which was not
acceptable, and the project was suspended until StarRocks was
adopted.

StarRocks Solution:
● StarRocks connects directly to Tableau and works very well with it.
● A query over 3 tables (0.5B rows, 6B rows, 100M rows) with 4 joins, 3
distinct counts, JSON functions and regex at the same time responds
in just 3.6s.
● Query response time drops from the minutes level to the
sub-second level.
Use Case: Game and User Behavior Analytics at Tencent IEG
● Game data analysis and user behavior analysis for 400+ games.
● Operation reports need to be real-time.
● Previously used ClickHouse for real-time analysis and Trino for
ad-hoc queries, but wanted to consolidate them.
● Uses Iceberg + COS storage and needs better performance.
● Needs elasticity for ad-hoc queries to reduce cost.

StarRocks Solution:
● Uses StarRocks Primary Key tables to solve the update problem.
● Uses compute nodes on k8s for auto-scaling.
● Gets much better performance on ad-hoc queries.
Use Case: Trust Analytics at Airbnb
To enhance security, Airbnb needs a real-time fraud detection
system (Trust Analytics) to identify various attacks and take
actions ASAP. This system must support Ad-Hoc query and
real-time update.

StarRocks Solution:
● StarRocks hosts real-time updated datasets via Primary
Key.
● Dataset import from Kafka has a sub-minute delay.
● StarRocks provides second-level query latency for
complex joins.
● Alerting can be achieved by just running a SQL query
regularly.
Thank you.
● Community starrocks.io
● Enterprise celerdata.com
● Managed Service cloud.celerdata.com
Credits
● This presentation is using images from Flaticon.com
Architectural Patterns and Best Practices
Kappa and Lambda Architecture with StarRocks + Apache Kafka
Kappa and Lambda Architecture with Open Lakehouse
Open Data Lakehouse with Apache xTable
Best Practices
