Introduction to Google Cloud Big Data Platform

Lecturer: PhD. Tran Minh Quang
Data Engineering - Group 12
Google BigQuery
Agenda
● What is BigQuery?
● Why BigQuery?
● BigQuery Organization
● Accessing BigQuery
● BigQuery Architecture: Dremel
● References
What is BigQuery?
● BigQuery is a service of Google Cloud Platform, a suite of products and services
(application hosting, cloud computing, database services, …) that runs on
Google’s scalable infrastructure
● BigQuery is Google’s solution for companies that need a
fully managed, cloud-based interactive query service
for massive datasets
Why BigQuery?
● Service for interactive analysis of massive datasets (TBs)
○ Query billions of rows: seconds to write, seconds to return
○ Uses a SQL-style query syntax
○ Offered as a service, so it can be accessed through an API
Why BigQuery? (cont’d)
● Reliable and Secure
○ Replicated across multiple machines
○ Secured through Access Control Lists
Why BigQuery? (cont’d)
● Scalable
○ Store hundreds of terabytes
○ Pay only for what you use
● Fast
○ Run ad hoc queries on multi-terabyte datasets in seconds
BigQuery Organization
BigQuery is structured as a hierarchy with 4 levels:

● Projects: Top-level containers in the Google Cloud Platform that store the data
● Datasets: Within projects, datasets hold one or more tables of data
● Tables: Within datasets, tables are row-column structures that hold actual data
● Jobs: The tasks you are performing on the data, such as running queries, loading data,
and exporting data
Example: BigQuery, Datasets, and Tables
● Here is an example of the left-pane
navigation within BigQuery
● Projects are identified by their project ID, for
example ‘bigquery-public-data’
● You can expand projects to see the
corresponding datasets, for example ‘github_repos’
● Tables are referenced by their project and
dataset as: <project>.<dataset>.<table>
○ for example ‘bigquery-public-data.github_repos.contents’
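A fully qualified table reference is just three dot-separated parts. The sketch below splits one and re-quotes it for use in a query; GoogleSQL needs the backticks because the project ID contains hyphens. `TableRef` is a hypothetical helper, not part of any Google client library, and it assumes the project ID itself contains no dots (legacy domain-scoped IDs are not handled).

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """A fully qualified table reference: <project>.<dataset>.<table> (hypothetical helper)."""
    project: str
    dataset: str
    table: str

    @classmethod
    def parse(cls, ref: str) -> "TableRef":
        # Drop optional surrounding backticks, then split into the three levels.
        # Assumes the project ID has no dots (domain-scoped IDs are not handled).
        parts = ref.strip("`").split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected <project>.<dataset>.<table>, got {ref!r}")
        return cls(*parts)

    def sql(self) -> str:
        # Backticks let GoogleSQL accept the hyphens in the project ID.
        return f"`{self.project}.{self.dataset}.{self.table}`"


ref = TableRef.parse("bigquery-public-data.github_repos.contents")
print(ref.dataset)                          # github_repos
print(f"SELECT COUNT(*) FROM {ref.sql()}")  # SELECT COUNT(*) FROM `bigquery-public-data.github_repos.contents`
```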
Accessing BigQuery
● Web UI (bigquery.cloud.google.com)
● Command line (the bq tool, installed with the Google Cloud SDK / gcloud)
● Third party Tools
○ Tableau
○ QlikView
○ R
○ Excel
○ …
● RESTful API
RESTful API: Datasets
Method    HTTP request
delete    DELETE /bigquery/v2/projects/{projectId}/datasets/{datasetId}
get       GET    /bigquery/v2/projects/{projectId}/datasets/{datasetId}
insert    POST   /bigquery/v2/projects/{projectId}/datasets
list      GET    /bigquery/v2/projects/{projectId}/datasets
patch     PATCH  /bigquery/v2/projects/{projectId}/datasets/{datasetId}
update    PUT    /bigquery/v2/projects/{projectId}/datasets/{datasetId}
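Each dataset method is an HTTP verb plus a path template. As a minimal sketch, assuming the service root `https://bigquery.googleapis.com` and hypothetical IDs `my-project` / `my_dataset`, the helper below turns a method name into the verb and full URL without sending anything. `dataset_request` is an illustrative name, not a library function.

```python
from typing import Optional, Tuple
from urllib.parse import quote

# Assumed service root for the BigQuery v2 REST API.
BASE_URL = "https://bigquery.googleapis.com"

# Dataset methods from the table: method name -> (HTTP verb, path template).
DATASET_METHODS = {
    "delete": ("DELETE", "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
    "get":    ("GET",    "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
    "insert": ("POST",   "/bigquery/v2/projects/{projectId}/datasets"),
    "list":   ("GET",    "/bigquery/v2/projects/{projectId}/datasets"),
    "patch":  ("PATCH",  "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
    "update": ("PUT",    "/bigquery/v2/projects/{projectId}/datasets/{datasetId}"),
}


def dataset_request(method: str, project_id: str,
                    dataset_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (HTTP verb, full URL) for a datasets.<method> call; nothing is sent."""
    verb, template = DATASET_METHODS[method]
    if "{datasetId}" in template and dataset_id is None:
        raise ValueError(f"datasets.{method} needs a dataset ID")
    # Percent-encode the IDs; str.format ignores datasetId when the template lacks it.
    path = template.format(
        projectId=quote(project_id, safe=""),
        datasetId=quote(dataset_id or "", safe=""),
    )
    return verb, BASE_URL + path


print(dataset_request("get", "my-project", "my_dataset"))
```

A real call would also carry an OAuth 2.0 access token in the `Authorization` header and, for `insert`, `patch` and `update`, a JSON dataset resource as the request body.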
RESTful API: Jobs
Method            HTTP request
cancel            POST /bigquery/v2/projects/{projectId}/jobs/{jobId}/cancel
get               GET  /bigquery/v2/projects/{projectId}/jobs/{jobId}
getQueryResults   GET  /bigquery/v2/projects/{projectId}/queries/{jobId}
insert            POST /bigquery/v2/projects/{projectId}/jobs
                  POST /upload/bigquery/v2/projects/{projectId}/jobs
list              GET  /bigquery/v2/projects/{projectId}/jobs
query             POST /bigquery/v2/projects/{projectId}/queries
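The `query` method is the simplest way to run SQL over REST: one POST with a JSON body. The sketch below only builds the request with the Python standard library. The project ID and token are hypothetical placeholders, and the body uses the `query`, `useLegacySql` and `maxResults` request fields.

```python
import json
import urllib.request

PROJECT_ID = "my-project"          # hypothetical project ID
ACCESS_TOKEN = "ya29.placeholder"  # hypothetical; a real token could come from `gcloud auth print-access-token`


def build_query_request(project_id: str, sql: str, token: str) -> urllib.request.Request:
    """Prepare, but do not send, a jobs.query call: POST /bigquery/v2/projects/{projectId}/queries."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
    body = {
        "query": sql,
        "useLegacySql": False,  # run the text as GoogleSQL (standard SQL), not legacy SQL
        "maxResults": 10,       # cap the number of rows in the first page of results
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request(
    PROJECT_ID,
    "SELECT COUNT(*) AS n FROM `bigquery-public-data.github_repos.contents`",
    ACCESS_TOKEN,
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it (needs a valid token and network access).
```

If the response reports `jobComplete` as false, the rows can be fetched later with `getQueryResults` from the table, using the job ID in the response's `jobReference`.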
BigQuery Architecture: Dremel
● Data model/Storage
● Query execution
Data model/Storage
● Columnar Storage
● Nested/Repeated Fields
● No indexes: every query is a full scan from disk,
but only of the columns it references
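To make the storage bullets concrete, here is a toy comparison on hypothetical data, with "cells read" standing in for bytes read from disk. Without an index, both layouts scan every row. The columnar layout, however, touches only the column the query references, which is also why BigQuery's on-demand pricing is based on the bytes in the columns a query reads.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical flat table: four columns, 1,000 rows.
rows: List[Dict[str, Any]] = [
    {"id": i, "repo": f"repo-{i % 7}", "size": i * 10, "binary": i % 2 == 0}
    for i in range(1000)
]

# The same data in a column-oriented layout: one list per field.
columns: Dict[str, List[Any]] = {name: [r[name] for r in rows] for name in rows[0]}


def sum_row_store(rows: List[Dict[str, Any]], field: str) -> Tuple[int, int]:
    """SUM(field) on a row store with no index: every row is read in full."""
    total, cells_read = 0, 0
    for r in rows:
        cells_read += len(r)  # the whole record is read to reach one field
        total += r[field]
    return total, cells_read


def sum_column_store(columns: Dict[str, List[Any]], field: str) -> Tuple[int, int]:
    """SUM(field) on a column store: still a full scan, but of one column only."""
    col = columns[field]
    return sum(col), len(col)


print(sum_row_store(rows, "size"))        # (4995000, 4000)
print(sum_column_store(columns, "size"))  # (4995000, 1000)
```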
BOOK 1:
  AUTHOR: Dumas
  TITLE: The Three Musketeers
  PRICE:
    DISCOUNT: 0
    USD: 20
    EUR: 19
BOOK 2:
  AUTHOR: Yrsa Sigurdardottir
  AUTHOR: Tina Flecken
  AUTHOR: Elma Klein
  TITLE: Feuernacht

BOOK 3:
  TITLE: Get Fit, Stay Fit
  PRICE:
    DISCOUNT: 0
    EUR: 12
  PRICE:
    DISCOUNT: 1
    EUR: 11
Columnar Representation (each entry is value (R, D))
● AUTHOR: Dumas (0, 1); Yrsa Sigurdardottir (0, 1); Tina Flecken (1, 1); Elma Klein (1, 1); NULL (0, 0)
● TITLE: The Three Musketeers (0, 1); Feuernacht (0, 1); Get Fit, Stay Fit (0, 1)
● PRICE.DISCOUNT: 0 (0, 2); NULL (0, 0); 0 (0, 2); 1 (1, 2)
● PRICE.USD: 20 (0, 2); NULL (0, 0); NULL (0, 1); NULL (1, 1)
● PRICE.EUR: 19 (0, 2); NULL (0, 0); 12 (0, 2); 11 (1, 2)
Record                              Column path          R  D
BOOK 1:
  AUTHOR: Dumas                     AUTHOR               0  1
  TITLE: The Three Musketeers       TITLE                0  1
  PRICE:
    DISCOUNT: 0                     PRICE.DISCOUNT       0  2
    USD: 20                         PRICE.USD            0  2
    EUR: 19                         PRICE.EUR            0  2
BOOK 2:
  AUTHOR: Yrsa Sigurdardottir       AUTHOR               0  1
  AUTHOR: Tina Flecken              AUTHOR[1]            1  1
  AUTHOR: Elma Klein                AUTHOR[2]            1  1
  TITLE: Feuernacht                 TITLE                0  1
  (PRICE)
    (DISCOUNT): NULL                (PRICE).(DISCOUNT)   0  0
    (EUR): NULL                     (PRICE).(EUR)        0  0
    (USD): NULL                     (PRICE).(USD)        0  0
BOOK 3:
  (AUTHOR): NULL                    (AUTHOR)             0  0
  TITLE: Get Fit, Stay Fit          TITLE                0  1
  PRICE:
    DISCOUNT: 0                     PRICE.DISCOUNT       0  2
    EUR: 12                         PRICE.EUR            0  2
    (USD): NULL                     PRICE.(USD)          0  1
  PRICE:
    DISCOUNT: 1                     PRICE[1].DISCOUNT    1  2
    EUR: 11                         PRICE[1].EUR         1  2
    (USD): NULL                     PRICE[1].(USD)       1  1

R (repetition level) = in the path to the field, at which repeated field did the value repeat? (0 = first value of a new record)
D (definition level) = in the path to the field, how many optional or repeated fields are actually defined?
Names in parentheses are missing fields, stored as NULL.
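The (R, D) pairs above can be computed mechanically. Below is a simplified recursive version of Dremel's column-striping idea, written for this example. It assumes a dict-based record format and a schema (AUTHOR repeated, TITLE optional, PRICE a repeated group of optional fields) read off the levels in the table. It reproduces the slide's columns but is not BigQuery's implementation, and it does not cover record reassembly.

```python
from typing import Any, Dict, List, Optional, Tuple

# A schema node is (name, mode, children): mode is "required", "optional" or
# "repeated"; children is None for a leaf column, else a list of nodes.
Node = Tuple[str, str, Optional[list]]
Entry = Tuple[Any, int, int]  # (value or None, repetition level R, definition level D)

# Schema assumed from the BOOK records: AUTHOR repeated, TITLE optional,
# PRICE a repeated group of optional DISCOUNT, USD and EUR values.
BOOK_SCHEMA: List[Node] = [
    ("AUTHOR", "repeated", None),
    ("TITLE", "optional", None),
    ("PRICE", "repeated", [
        ("DISCOUNT", "optional", None),
        ("USD", "optional", None),
        ("EUR", "optional", None),
    ]),
]


def leaf_paths(node: Node, prefix: str) -> List[str]:
    """Every column path at or below a schema node, e.g. PRICE -> PRICE.DISCOUNT, ..."""
    name, _, children = node
    if children is None:
        return [prefix + name]
    return [p for child in children for p in leaf_paths(child, prefix + name + ".")]


def shred_fields(fields: List[Node], value: Dict[str, Any], prefix: str,
                 r: int, d: int, rep_depth: int, out: Dict[str, List[Entry]]) -> None:
    """Append (value, R, D) entries for one (sub)record to the column lists in `out`.

    r: repetition level for the first value emitted here
    d: number of optional/repeated fields defined on the path so far
    rep_depth: number of repeated fields on the path so far
    Required fields are assumed to be present.
    """
    for node in fields:
        name, mode, children = node
        f_rep = rep_depth + int(mode == "repeated")  # repeated fields add an R level
        f_def = d + int(mode != "required")          # optional/repeated fields add a D level
        raw = value.get(name)
        items = [] if raw is None else (raw if mode == "repeated" else [raw])
        if not items:
            # Field absent: a NULL goes into every column below it, with D = levels defined so far.
            for path in leaf_paths(node, prefix):
                out.setdefault(path, []).append((None, r, d))
            continue
        for i, item in enumerate(items):
            # The first occurrence inherits the incoming R; later ones repeat at this field's level.
            item_r = r if i == 0 else f_rep
            if children is None:
                out.setdefault(prefix + name, []).append((item, item_r, f_def))
            else:
                shred_fields(children, item, prefix + name + ".", item_r, f_def, f_rep, out)


def shred(records: List[Dict[str, Any]], schema: List[Node]) -> Dict[str, List[Entry]]:
    out: Dict[str, List[Entry]] = {}
    for record in records:
        shred_fields(schema, record, "", 0, 0, 0, out)  # each record starts at R = 0, D = 0
    return out


books = [
    {"AUTHOR": ["Dumas"], "TITLE": "The Three Musketeers",
     "PRICE": [{"DISCOUNT": 0, "USD": 20, "EUR": 19}]},
    {"AUTHOR": ["Yrsa Sigurdardottir", "Tina Flecken", "Elma Klein"], "TITLE": "Feuernacht"},
    {"TITLE": "Get Fit, Stay Fit",
     "PRICE": [{"DISCOUNT": 0, "EUR": 12}, {"DISCOUNT": 1, "EUR": 11}]},
]

for path, entries in shred(books, BOOK_SCHEMA).items():
    print(path, entries)
```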
Query execution
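● Tree architecture: a root server receives the query and fans it out
through intermediate servers to leaf servers, which scan the data;
partial results are aggregated on the way back up
● Runs on tens of thousands of machines over Google’s petabit
network (more than 1 Pbit/s)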
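The serving tree can be simulated in a single process. In this sketch the data, the split into 12 tablets and the fan-out are all hypothetical. Leaf servers fully scan one tablet each, and every server above them adds up its children's partial counts. Summing works here because partial COUNTs combine by addition.

```python
from typing import List

# Hypothetical column split into 12 tablets (horizontal shards) of 100 values each: 0..1199.
tablets: List[List[int]] = [[t * 100 + i for i in range(100)] for t in range(12)]


def leaf(tablet: List[int]) -> int:
    """Leaf server: full scan of one tablet, returning a partial COUNT for v % 3 = 0."""
    return sum(1 for v in tablet if v % 3 == 0)


def serve(tablets: List[List[int]], fan_out: int) -> int:
    """Root or intermediate server: split the tablets among at most `fan_out`
    children and add up their partial counts."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    if len(tablets) <= fan_out:
        return sum(leaf(t) for t in tablets)  # children are leaf servers
    chunk = -(-len(tablets) // fan_out)      # ceiling division
    return sum(serve(tablets[i:i + chunk], fan_out)
               for i in range(0, len(tablets), chunk))


# SELECT COUNT(*) FROM T WHERE v % 3 = 0, answered through a tree with fan-out 3.
print(serve(tablets, fan_out=3))  # 400
```

With fan-out 3, the root splits the 12 tablets into three groups of 4. Each intermediate server then splits its group into two pairs handled by lower servers, whose leaves scan one tablet each.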
DEMO
References
● https://www.oreilly.com/library/view/google-bigquery-the/9781492044451/
● https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● https://cloud.google.com/bigquery/docs/
● https://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_tue_1415_RyanBoyd.pdf
