
Introduction To Google Cloud Big Data Platform: Lecturer: Phd. Tran Minh Quang Data Engineering - Group 12

This document provides an introduction and overview of Google BigQuery. It defines BigQuery as Google's cloud-based interactive query service for massive datasets. It describes why BigQuery is useful, including its ability to query billions of rows quickly, reliability, scalability, and speed. The document also outlines BigQuery's organization into projects, datasets and tables, and how it can be accessed via the web UI, command line, APIs and other tools. It provides details on BigQuery's Dremel architecture including its columnar data storage and tree-based query execution across Google's infrastructure.

Uploaded by

Nhan Nguyen

Introduction to Google Cloud

Big Data Platform


Lecturer: PhD. Tran Minh Quang
Data Engineering - Group 12
Google BigQuery
Agenda
● What is BigQuery?
● Why BigQuery?
● BigQuery Organization
● Accessing BigQuery
● BigQuery Architecture: Dremel
● References
What is BigQuery?
● BigQuery is a service provided by Google Cloud Platform,
a suite of products & services that includes application
hosting, cloud computing, database services, … on
Google’s scalable infrastructure
● BigQuery is Google’s solution for companies that need a
fully managed, cloud-based interactive query service
for massive datasets
Why BigQuery?
● Service for interactive analysis of massive datasets (TBs)
○ Query billions of rows: seconds to write, seconds to return
○ Uses a SQL-style query syntax
○ It’s a service that can be accessed through an API
Why BigQuery? (cont’d)
● Reliable and Secure
○ Replicated across multiple machines
○ Secured through Access Control Lists
Why BigQuery? (cont’d)
● Scalable
○ Store hundreds of terabytes
○ Pay only for what you use
● Fast
○ Run ad hoc queries on multi-terabyte datasets in seconds
BigQuery Organization
BigQuery is structured as a hierarchy with 4 levels:

● Projects: Top-level containers in the Google Cloud Platform that store the data
● Datasets: Within projects, datasets hold one or more tables of data
● Tables: Within datasets, tables are row-column structures that hold actual data
● Jobs: The tasks you are performing on the data, such as running queries, loading data,
and exporting data
Example: BigQuery, Datasets, and Tables
● Here is an example of the left-pane
navigation within BigQuery
● Projects are identified by the project name, for
example ‘bigquery-public-data’
● You can expand projects to see the
corresponding datasets, for example ‘github_repos’
● Tables are referenced by their project and
dataset as: <project>.<dataset>.<table>
○ for example ‘bigquery-public-data.github_repos.contents’
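The <project>.<dataset>.<table> naming scheme can be illustrated with a small helper that splits a table reference into its three parts. This is only a sketch: the names TableRef and parse_table_ref are hypothetical, and it assumes the project ID itself contains no dots.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three levels that identify a table (hypothetical helper type)."""
    project: str
    dataset: str
    table: str


def parse_table_ref(ref: str) -> TableRef:
    """Split '<project>.<dataset>.<table>' into its parts.

    Assumption: none of the three parts contains a dot.
    """
    parts = ref.strip("`").split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <project>.<dataset>.<table>, got {ref!r}")
    return TableRef(*parts)


ref = parse_table_ref("bigquery-public-data.github_repos.contents")
print(ref.project, ref.dataset, ref.table)
```

The example reference is the one from the slide; any other string with exactly two dots is handled the same way.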
Accessing BigQuery
● Web UI (bigquery.cloud.google.com)
● Console/command line (bq, part of the gcloud SDK)
● Third party Tools
○ Tableau
○ QlikView
○ R
○ Excel
○ …
● RESTful API
RESTful API
Method   HTTP Request
delete   DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}
get      GET /bigquery/v2/projects/{projectId}/datasets/{datasetId}
insert   POST /bigquery/v2/projects/{projectId}/datasets
list     GET /bigquery/v2/projects/{projectId}/datasets
patch    PATCH /bigquery/v2/projects/{projectId}/datasets/{datasetId}
update   PUT /bigquery/v2/projects/{projectId}/datasets/{datasetId}

For Dataset
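As a rough illustration of how the dataset methods map to HTTP requests, the sketch below turns a method name and IDs into a verb and URL. It only builds strings and sends nothing. The base URL, the helper name dataset_request, and the IDs my-project and my_dataset are assumptions for illustration; the path templates are copied from the table.

```python
from typing import Tuple

# Assumed service endpoint; the slides give only the request paths.
BASE_URL = "https://bigquery.googleapis.com"

# Method name -> (HTTP verb, path template), as listed for datasets.
DATASET_METHODS = {
    "delete": ("DELETE", "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
    "get": ("GET", "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST", "/bigquery/v2/projects/{projectId}/datasets"),
    "list": ("GET", "/bigquery/v2/projects/{projectId}/datasets"),
    "patch": ("PATCH", "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT", "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
}


def dataset_request(method: str, **ids: str) -> Tuple[str, str]:
    """Return (HTTP verb, full URL) for a dataset method (hypothetical helper)."""
    verb, template = DATASET_METHODS[method]
    return verb, BASE_URL + template.format(**ids)


# Hypothetical project and dataset IDs.
print(dataset_request("list", projectId="my-project"))
print(dataset_request("get", projectId="my-project", datasetId="my_dataset"))
```

A real call would also need an OAuth access token in the request headers, which is left out here.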
RESTful API
Method           HTTP Request
cancel           POST /bigquery/v2/projects/{projectId}/jobs/{jobId}/cancel
get              GET /bigquery/v2/projects/{projectId}/jobs/{jobId}
getQueryResults  GET /bigquery/v2/projects/{projectId}/queries/{jobId}
insert           POST /bigquery/v2/projects/{projectId}/jobs
                 POST /upload/bigquery/v2/projects/{projectId}/jobs
list             GET /bigquery/v2/projects/{projectId}/jobs
query            POST /bigquery/v2/projects/{projectId}/queries

For Jobs
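To make the query method more concrete, here is a minimal sketch of the request it expects: a POST to the queries path with a JSON body holding the SQL text. The helper name build_query_request and the project ID my-project are hypothetical, and the body shows only a few request fields (query, useLegacySql, maxResults). Nothing is sent over the network.

```python
import json


def build_query_request(project_id: str, sql: str, max_results: int = 100) -> dict:
    """Describe a jobs.query call as plain data (hypothetical helper)."""
    body = {
        "query": sql,            # the SQL text to run
        "useLegacySql": False,   # ask for standard SQL
        "maxResults": max_results,
    }
    return {
        "method": "POST",
        "path": f"/bigquery/v2/projects/{project_id}/queries",
        "body": json.dumps(body),
    }


def results_path(project_id: str, job_id: str) -> str:
    """Path for getQueryResults, used to fetch results of a started job."""
    return f"/bigquery/v2/projects/{project_id}/queries/{job_id}"


request = build_query_request(
    "my-project",  # hypothetical project ID
    "SELECT COUNT(*) FROM `bigquery-public-data.github_repos.contents`",
)
print(request["method"], request["path"])
print(request["body"])
```

If the query does not finish within the first call, the job ID from the response can be passed to getQueryResults to fetch the rows later.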
BigQuery Architecture: Dremel
● Data model/Storage
● Query execution
Data model/Storage
● Columnar Storage
● Nested/Repeated Fields
● No indexing => Single full table
scan from disk
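The effect of columnar storage can be sketched in a few lines: the same rows are stored once per record and once per column, and an aggregate over one field then reads a single list instead of every record. The rows and field names are illustrative only, and the code says nothing about how BigQuery lays out data on disk.

```python
from typing import Dict, List

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"title": "The Three Musketeers", "usd": 20, "eur": 19},
    {"title": "Get Fit, Stay Fit", "usd": None, "eur": 12},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot records into one list of values per field."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# A query over one field scans only that column; with no index,
# every value in the column is read once.
total_eur = sum(v for v in columns["eur"] if v is not None)
print(total_eur)
```

With a row layout the same sum would have to read the title and usd values as well, which is the cost that columnar storage avoids.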
BOOK 1:
  AUTHOR: Dumas
  TITLE: The Three Musketeers
  PRICE:
    DISCOUNT: 0
    USD: 20
    EUR: 19
BOOK 2:
  AUTHOR: Yrsa Sigurdardottir
  AUTHOR: Tina Flecken
  AUTHOR: Elma Klein
  TITLE: Feuernacht
BOOK 3:
  TITLE: Get Fit, Stay Fit
  PRICE:
    DISCOUNT: 0
    EUR: 12
  PRICE:
    DISCOUNT: 1
    EUR: 11
Columnar Representation

Each value is followed by its (R, D) pair.

AUTHOR
  Dumas (0, 1)
  Yrsa Sigurdardottir (0, 1)
  Tina Flecken (1, 1)
  Elma Klein (1, 1)
  NULL (0, 0)

TITLE
  The Three Musketeers (0, 1)
  Feuernacht (0, 1)
  Get Fit, Stay Fit (0, 1)

PRICE.DISCOUNT
  0 (0, 2)
  NULL (0, 0)
  0 (0, 2)
  1 (1, 2)

PRICE.USD
  20 (0, 2)
  NULL (0, 0)
  NULL (0, 1)
  NULL (1, 1)

PRICE.EUR
  19 (0, 2)
  NULL (0, 0)
  12 (0, 2)
  11 (1, 2)
Record                          Field path          R  D
BOOK 1:
  AUTHOR: Dumas                 AUTHOR              0  1
  TITLE: The Three Musketeers   TITLE               0  1
  PRICE:
    DISCOUNT: 0                 PRICE.DISCOUNT      0  2
    USD: 20                     PRICE.USD           0  2
    EUR: 19                     PRICE.EUR           0  2
BOOK 2:
  AUTHOR: Yrsa Sigurdardottir   AUTHOR              0  1
  AUTHOR: Tina Flecken          AUTHOR[1]           1  1
  AUTHOR: Elma Klein            AUTHOR[2]           1  1
  TITLE: Feuernacht             TITLE               0  1
  (PRICE)
    (DISCOUNT): NULL            (PRICE).(DISCOUNT)  0  0
    (EUR): NULL                 (PRICE).(EUR)       0  0
    (USD): NULL                 (PRICE).(USD)       0  0
BOOK 3:
  (AUTHOR): NULL                (AUTHOR)            0  0
  TITLE: Get Fit, Stay Fit      TITLE               0  1
  PRICE:
    DISCOUNT: 0                 PRICE.DISCOUNT      0  2
    EUR: 12                     PRICE.EUR           0  2
    (USD): NULL                 PRICE.(USD)         0  1
  PRICE:
    DISCOUNT: 1                 PRICE[1].DISCOUNT   1  2
    EUR: 11                     PRICE[1].EUR        1  2
    (USD): NULL                 PRICE[1].(USD)      1  1

R = In the path to the field, what is the last repeated field?
D = In the path to the field, how many fields are defined?
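The R and D values above can be reproduced with a short sketch that walks each book and emits one (value, R, D) triple per leaf field. It is a simplified illustration, not Dremel's actual implementation. The SCHEMA encoding and the names stripe and emit_nulls are assumptions made for this example, and every field is treated as optional or repeated so that each defined field on the path counts toward D, as on the slide.

```python
from typing import Dict, List, Tuple

# Field name -> (is_repeated, child schema or None for a leaf). Assumed encoding.
SCHEMA = {
    "AUTHOR": (True, None),
    "TITLE": (False, None),
    "PRICE": (True, {"DISCOUNT": (False, None),
                     "USD": (False, None),
                     "EUR": (False, None)}),
}

BOOKS = [
    {"AUTHOR": ["Dumas"], "TITLE": "The Three Musketeers",
     "PRICE": [{"DISCOUNT": 0, "USD": 20, "EUR": 19}]},
    {"AUTHOR": ["Yrsa Sigurdardottir", "Tina Flecken", "Elma Klein"],
     "TITLE": "Feuernacht"},
    {"TITLE": "Get Fit, Stay Fit",
     "PRICE": [{"DISCOUNT": 0, "EUR": 12}, {"DISCOUNT": 1, "EUR": 11}]},
]

Columns = Dict[str, List[Tuple[object, int, int]]]  # column -> [(value, R, D)]


def emit_nulls(path, children, columns: Columns, r: int, d: int) -> None:
    """Record a NULL in every leaf column below a missing field."""
    if children is None:
        columns.setdefault(".".join(path), []).append((None, r, d))
        return
    for name, (_, grandchildren) in children.items():
        emit_nulls(path + (name,), grandchildren, columns, r, d)


def stripe(record, schema, columns: Columns, path=(), r=0, d=0, rep_depth=0):
    """Split one record into columns with repetition (R) and definition (D) levels."""
    for name, (repeated, children) in schema.items():
        field_path = path + (name,)
        depth = rep_depth + 1 if repeated else rep_depth
        value = record.get(name)
        if repeated:
            values = value or []
        else:
            values = [] if value is None else [value]
        if not values:
            # Missing field: D stays at the number of fields defined so far.
            emit_nulls(field_path, children, columns, r, d)
            continue
        for i, item in enumerate(values):
            # The first occurrence inherits R; later ones repeat at this field.
            cur_r = r if i == 0 else depth
            if children is None:
                columns.setdefault(".".join(field_path), []).append((item, cur_r, d + 1))
            else:
                stripe(item, children, columns, field_path, cur_r, d + 1, depth)


columns: Columns = {}
for book in BOOKS:
    stripe(book, SCHEMA, columns)

for name, values in columns.items():
    print(name, values)
```

Each new record starts again at R = 0, which is how a reader of a column can tell where one book ends and the next begins.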
Query execution
● Tree architecture
● Uses tens of thousands of
machines over Google’s petabit
network (more than 1 Pbit/s)
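The tree architecture can be pictured as leaves that each scan one shard of the data and intermediate nodes that merge partial results on the way up to the root. The sketch below runs that idea in a single process for a count and a sum. The shard contents, the fanout, and the function names are illustrative assumptions; the real system distributes this work across many machines.

```python
from typing import Dict, List


def leaf_scan(shard: List[int]) -> Dict[str, int]:
    """Leaf node: scan one shard and return a partial aggregate."""
    return {"count": len(shard), "sum": sum(shard)}


def merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Intermediate node: combine the partial aggregates of its children."""
    return {
        "count": sum(p["count"] for p in partials),
        "sum": sum(p["sum"] for p in partials),
    }


def run_tree(shards: List[List[int]], fanout: int = 2) -> Dict[str, int]:
    """Aggregate bottom-up until a single root result remains."""
    if not shards or fanout < 2:
        raise ValueError("need at least one shard and a fanout of 2 or more")
    level = [leaf_scan(shard) for shard in shards]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0]


# Illustrative shards of a single numeric column.
shards = [[19, 12], [11], [20, 5], [7]]
result = run_tree(shards)
print(result, result["sum"] / result["count"])
```

Because count and sum can be merged in any grouping, the result does not depend on the fanout, which is what lets the work be spread over many leaves.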
DEMO
References
● https://www.oreilly.com/library/view/google-bigquery-the/9781492044451/
● https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● https://cloud.google.com/bigquery/docs/
● https://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_tue_1415_Ryan
Boyd.pdf
