Overview
A Linux Foundation Project
Trends in OLAP databases
1 Cloud Native
* Separation of Compute and Storage
* Containers
* k8s operator
2A Sub-second vs. Second/Minute Query Response Time
2B Streaming vs. Batch Data
2C Mutable vs. Immutable Data
2D Remote (Object) Storage vs. Local (SSD) Storage
2E Open Table Format vs. Product Native Storage Format
3 Data Warehouse vs. Data Lake vs. Data Lakehouse
Open Lakehouse vs Proprietary / Hybrid Lakehouse.
Compute
Data Catalog
Table Format
Storage Format
StarRocks is an open-source query engine that delivers data warehouse performance on the data lake.
StarRocks Community
As of Feb 2024
History of StarRocks and CelerData
StarRocks was designed to address the challenges of real-time analytics, including the need to support
high concurrency, low latency, and a wide range of analytical workloads. StarRocks also offers a number
of features that are not available in other real-time analytics databases, such as the ability to query data
directly from data lakes.
● Birth of StarRocks: StarRocks is created as a commercialized fork of the Apache Doris database. Over time, 90% of the original codebase has been re-written.
● StarRocks moves to Linux Foundation: CelerData contributes StarRocks to the Linux Foundation and moves to Apache 2.0 license.
● Benchmarks outperform competition: Latest TPC-DS and SSB benchmarks show 2x-9x speed performance over Trino, ClickHouse and Apache Druid.
2022 2023
● CelerData is founded as a company to develop and commercialize StarRocks.
● CelerData launches its managed cloud service for StarRocks.
StarRocks is an open-source query engine that delivers data warehouse performance on the data lake.
● Cloud Native w/ separating Compute and Storage tiers
● JOIN performant at scale; denormalization optional
● MySQL protocol with Trino dialect
StarRocks Use Cases: User-facing analytics
User-facing analytics (UFA) is a rapidly growing field that is transforming the way businesses
deliver insights to their users. UFA empowers users to explore and analyze data for themselves,
without the need for technical expertise. This can lead to a number of benefits, such as:
1 Improved decision making
2 Increased user engagement
3 Reduced reliance on IT
● Self-service Analytics
● Embedded Analytics
● Real-Time Analytics
● Augmented Analytics
● Conversational Analytics
StarRocks Use Cases: Real-time analytics
Real-time analytics is the process of collecting, processing, and analyzing data as it is
generated, in order to gain insights into the present state of a system or process. This can lead
to a number of benefits, such as:
● Rise of streaming data
● Growth of Edge Computing
● Increasing use of machine learning
● Democratization of real-time analytics
StarRocks Use Cases: Data Lakehouse
A data lakehouse is a revolutionary data architecture that merges the best of both data lakes
and data warehouses.
● Reduced costs and complexity
● Faster and more accurate analytics
● Directly query data on data lake
● Sub-second joins and aggregations on billions of rows
● Hundreds of thousands of concurrent end-user requests
● Cloud Native w/ separating Compute and Storage tiers
● JOIN performant at scale; denormalization optional
Seamless integration with the
StarRocks Architecture Overview
Linux Foundation project with Apache 2.0 license.
Ecosystem
Ease of Use
Real-world Performance
● Vectorized Execution Engine
● Cost Based Optimizer
● Vectorized ingestion
● Apache Hive support
● Bitmap optimization
● TopN optimization
● Lateral JOIN
● Fast Decimal support
● Tableau compatibility
● Global runtime filter
● Primary Key Table

● Global low-cardinality dictionary
● Pipeline Engine
● Apache Iceberg Support
● Resource Group
● Java UDF
● JSON data type support
● Partial update feature
● JDBC external catalog support
● Primary key Index
● Fully support delete and update operations
● Multi-table materialized view
● More table statistics including histogram
● Compute node on k8s
● Separation of storage and compute
● Local cache for open table formats on data lake
● Semantic cache
● Fully support RBAC
● Map/Struct data type
● Lambda function

● Shared Data Architecture
● New RBAC privilege system
● Spill to disk
● Fully support for update
● Support more complete UPDATE and DELETE syntax in primary key tables
● Presto/Trino compatible dialect
● Broadcast JOIN and Bucket Shuffle JOIN can use query cache
● Global UDFs

● Primary Key table support in Shared Data
● Auto_Increment column attribute
● Automatic partition creation during load
● Support Apache Iceberg v2 tables
● Random bucketing
● FILES keyword
● Generated columns
● Support loading data into MAP and STRUCT data types
● Support nesting Fast Decimal values in ARRAY, MAP and STRUCT
● Optimized creation of async materialized view
● Optimized query rewrite with async materialized views
● Optimized refreshing of async materialized views
● Optimized caching, and query logic for StarRocks table format and Apache Iceberg

● Persisting Primary Key table indexes to local disk
● Spill to Disk enabled by default for async materialized views
● Support creating, dropping database and managed tables in Apache Hive catalogs
● Unified Catalog
● Supports Information Schema for external tables
● Enhanced Files()
● Support unloading data from StarRocks to parquet
● Supports manual optimization of table structure and data distribution strategy
● Continuous data loading using PIPE
● Support HTTP SQL API
● Runtime Profile and text-based profile analysis commands
● Support access control through Apache Ranger
● Optimized open file format readers
● Added data consistency features for async materialized view
● Hot and warm storage support
● Fast Schema Evolution
● Dynamically adjusting number of tables
● Data redistribution across local disks for primary key tables
StarRocks 3.x series roadmap
The goals of the 3.x series roadmap are to 1) build out and optimize core data warehouse features, 2) reach
feature parity between the shared-nothing architecture and the shared-data architecture, and 3) be able
to query the StarRocks table format and all the popular open table formats such as Apache Iceberg,
Apache Hudi, Apache Hive, Delta Lake and Apache Paimon.
● Initial release of Shared Data Architecture: Decouple compute and storage layers. Further development of StarRocks tables, materialized view, JOIN performance, cache. Enhancements to Iceberg, Hudi, Delta Lake, Hive support.
● Incremental improvement to 3.x goals: Mirroring features from shared nothing to shared data architecture. Further development of core DW features and open table format support.
● Incremental improvement to 3.x goals: To be determined.
3.1 3.3
Replicated Join ✅
Local Join ✅
Materialized View
Materialized views can significantly improve query performance by pre-computing common aggregations.
PROJECT ✅
AGGREGATE ✅
PARTIAL-UNION ✅
NESTED MV ✅
View-Based ✅
Partition-Wise ✅
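The pre-computation idea behind materialized views can be sketched in a few lines. This is a minimal illustration, not StarRocks syntax: it uses Python's built-in SQLite as a stand-in engine, and the table names `orders` and `orders_by_region_mv`, the columns, and the sample rows are all hypothetical. The "materialized view" here is simply a table that stores the aggregation result once, so later lookups avoid re-scanning the base rows.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a base table and a pre-aggregated table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10), ("east", 15), ("west", 7), ("west", 3), ("west", 20)],
    )
    # Stand-in for a materialized view: the aggregation is computed once and stored.
    conn.execute(
        "CREATE TABLE orders_by_region_mv AS "
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
    )
    return conn


def total_from_base(conn, region):
    """Aggregate over the raw rows at query time."""
    row = conn.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


def total_from_mv(conn, region):
    """Read the pre-computed aggregate instead of scanning the raw rows."""
    row = conn.execute(
        "SELECT total FROM orders_by_region_mv WHERE region = ?", (region,)
    ).fetchone()
    return row[0]
```

Both functions return the same totals; the second one reads a single pre-aggregated row per region. A real engine additionally keeps the stored result refreshed as the base table changes, which this sketch does not attempt.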
SQL Hybrid-Based Optimizer
Analyzes a SQL query and chooses the most efficient execution plan by estimating the cost of different potential plans.
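As a rough illustration of cost-based plan selection, the sketch below enumerates join orders and picks the one with the lowest estimated cost. Everything in it is an assumption made for illustration: the cost model (sum of estimated intermediate result sizes for a left-deep join), the single uniform join selectivity, and the table names and row counts. A production optimizer such as the one in StarRocks uses far richer statistics and search strategies.

```python
from itertools import permutations


def plan_cost(order, rows, selectivity):
    """Estimated cost of a left-deep join order: sum of intermediate result sizes."""
    size = rows[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Toy estimate: each join keeps `selectivity` of the cross product.
        size = size * rows[table] * selectivity
        cost += size
    return cost


def cheapest_plan(rows, selectivity):
    """Enumerate every join order and return the one with the lowest estimated cost."""
    return min(permutations(rows), key=lambda order: plan_cost(order, rows, selectivity))


# Hypothetical table sizes (row counts) used only for this example.
ROWS = {"facts": 1_000_000, "users": 10_000, "regions": 10}
BEST = cheapest_plan(ROWS, selectivity=0.001)
```

Under this toy model the chosen order joins the two small tables first and the large `facts` table last, because that keeps the intermediate results small.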
Query Rewrite
Technique used to optimize database queries without the user needing to change their original query.
● Targeted at Select-Projection-Join-Aggregation (SPJA) query pattern
● Up to 10x performance increase
Cache System
Cache allows you to pull the data from memory instead of storage, which can improve query efficiency by 3x to 17x.
Metadata ✅
Data ✅
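The caching idea can be sketched as a small read-through cache with least-recently-used eviction. This is a generic illustration, not the StarRocks cache implementation: the class name `ReadThroughCache`, the `fetch` callback standing in for a storage read, and the capacity value are all hypothetical.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to `fetch` (storage) on a miss."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # hypothetical callback that reads from storage
        self.capacity = capacity    # maximum number of entries kept in memory
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Cache miss: read from storage and remember the result.
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value
```

Wrapping a slow storage read in `ReadThroughCache(fetch, capacity)` means a repeated `get` for the same key touches storage only once while the entry stays cached. The actual speedup depends on the hit rate and on how slow the underlying storage is.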
Separated compute and storage architecture
Design approach for databases and data platforms that decouples the processing power (compute) from the data storage layer.
High Availability
Redundant components and data allow the database to respond even when there is a failure.
FE ✅ Additional Nodes
CN ✅ Additional Nodes
Support for Open Table Formats
Open Table Formats allow users to extract more value from their data while maintaining flexibility and control.
● Apache Iceberg
● Delta Lake
● Apache Paimon
Client Server
Benchmarks and Community References
Benchmark: StarRocks Offers 2.2x Performance over ClickHouse and 8.9x Performance over Apache Druid® in Wide-table Scenarios Out of the Box using product native table format.
Benchmark: StarRocks Delivers 5.54x Query Performance over Trino in Multi-table Scenarios using Apache Iceberg table format with Parquet files.
Use Case: User Analytics at LeetCode
LeetCode's data warehouse, built on an OLTP database, was struggling under the
weight of terabytes of user activity data. Queries on this OLTP database took ages,
impacting user experience and hindering LeetCode's ability to analyze trends and optimize
the platform. Scaling up the existing system proved costly and unsustainable.
StarRocks Solution:
● Queries 100x Faster: Complex analytics that previously took hours now finished in
seconds, empowering LeetCode to gain real-time insights into user behavior and
platform performance. Additionally, some queries that couldn't run in the OLTP
system were able to run successfully in StarRocks.
● Unlimited Scalability: StarRocks' horizontal scaling effortlessly accommodated
LeetCode's growing data volume, eliminating concerns about future bottlenecks.
● Cost Savings of 80%: Compared to a similar managed OLAP solution on GCP,
StarRocks delivered significant cost savings, allowing LeetCode to reinvest in
platform development and user experience.
Use Case: Tableau Dashboard at Airbnb
The Airbnb Tableau Dashboard project is designed to serve both
internal and external users by providing interactive dashboards, so it
requires quick responses to user queries. However, query latency
with previous solutions was over 10 minutes, which was not
acceptable, and the project was suspended until StarRocks was
adopted.
StarRocks Solution:
● StarRocks connects directly to Tableau and works very well with it.
● A query over 3 tables (0.5B rows, 6B rows, 100M rows) with 4 joins, 3
distinct counts, JSON functions and regex at the same time
responds in just 3.6s.
● Query response time is reduced from the minutes level to the
sub-second level.
Use Case: Game and User Behavior Analytics at Tencent IEG
● 400+ game data analysis and user behavior analysis.
● Operation reports need to be real-time.
● Previously used ClickHouse for real-time analysis and Trino for
ad-hoc queries, but wanted to unify them.
● Uses Iceberg + COS storage and needs better performance.
● Needs elasticity for ad-hoc queries to reduce cost.
StarRocks Solution:
● Primary Key tables solve the update problem.
● Compute nodes on k8s provide auto-scaling.
● Much better performance for ad-hoc queries.
Use Case: Trust Analytics at Airbnb
To enhance security, Airbnb needs a real-time fraud detection
system (Trust Analytics) to identify various attacks and take
action as soon as possible. This system must support ad-hoc queries and
real-time updates.
StarRocks Solution:
● StarRocks hosts real-time updated datasets via Primary
Key tables.
● Dataset import from Kafka has a sub-minute delay.
● StarRocks provides second-level query latency for
complex joins.
● Alerting can be achieved by simply running a SQL query
on a regular schedule.
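The "alerting by running a SQL query regularly" pattern can be sketched as follows. This is an assumption-laden illustration, not Airbnb's system: it uses Python's built-in SQLite in place of StarRocks, and the `login_events` table, its columns, the sample rows, and the failure threshold are all hypothetical.

```python
import sqlite3

# Hypothetical alert rule: users with at least `threshold` failed logins.
ALERT_SQL = """
SELECT user_id, COUNT(*) AS failed
FROM login_events
WHERE success = 0
GROUP BY user_id
HAVING COUNT(*) >= ?
ORDER BY user_id
"""


def demo_connection():
    """Build an in-memory table with a few sample events (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE login_events (user_id TEXT, success INTEGER)")
    conn.executemany(
        "INSERT INTO login_events VALUES (?, ?)",
        [("u1", 0), ("u1", 0), ("u1", 0), ("u2", 1), ("u2", 0)],
    )
    return conn


def check_alerts(conn, threshold):
    """Run the alert query once and return the rows that should raise an alert."""
    return conn.execute(ALERT_SQL, (threshold,)).fetchall()
```

A scheduler (for example cron, or a loop that sleeps between runs) would call `check_alerts` periodically and notify someone whenever it returns rows. Because the datasets are updated in real time, each run sees fresh data.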
Thank you.
● Community starrocks.io
● Enterprise celerdata.com
● Managed Service cloud.celerdata.com
Credits
● This presentation is using images from Flaticon.com
Architectural Patterns and Best Practices
Kappa and Lambda Architecture with StarRocks + Apache Kafka
Kappa and Lambda Architecture with Open Lakehouse
Open Data Lakehouse with Apache XTable
Best Practices