
M1 DS21 - Introduction to Data Science and Big Data Analysis

The document provides an overview of data science including definitions of data science and data scientists from NIST. It discusses the multi-disciplinary nature of data science and the typical skills and roles of data scientists. The document also lists profiles of data science graduates including managers, professionals, technicians and support workers along with their typical job descriptions.
Copyright © All Rights Reserved


Team Teaching
INTRODUCTION TO
DATA SCIENCE AND BIG DATA ANALYSIS

UNIVERSITAS GUNADARMA
Agenda
1) OVERVIEW OF DATA SCIENCE
2) DATA SCIENCE GRADUATE PROFILES & DATA SCIENCE TEAMS
3) THE RELATIONSHIP BETWEEN DATA SCIENCE, BIG DATA, AI, MACHINE LEARNING & DEEP LEARNING
NIST DEFINITION OF DATA SCIENCE

NIST definition of data science (2018):

Data science is the extraction of useful knowledge directly from data through a process of discovery, or of hypothesis formulation and hypothesis testing.
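To make "hypothesis formulation and hypothesis testing" concrete, here is a minimal sketch of an exact two-sample permutation test in plain Python. The permutation test is just one common way to test a hypothesis and is not taken from the NIST text; the function name `permutation_test` and the measurement values are hypothetical. The null hypothesis is that the group labels do not matter, so the test counts how often a random relabelling produces a difference in means at least as large as the observed one.

```python
import itertools
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Hypothetical helper for illustration: it enumerates every way of
    splitting the pooled data into two groups of the original sizes.
    """
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    total = 0
    # Under the null hypothesis the labels are exchangeable, so every
    # split of the pooled values is equally likely.
    for idx in itertools.combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(a) - mean(b)) >= observed:
            extreme += 1
    return extreme / total  # the p-value


# Hypothetical measurements from a control group and a treatment group.
control = [12, 14, 11, 13]
treatment = [18, 17, 19, 16]
p = permutation_test(control, treatment)
print(f"p-value = {p:.4f}")  # 0.0286 (2 of the 70 possible splits)
```

A small p-value means the observed difference would rarely arise by chance relabelling, which is evidence against the null hypothesis. Enumerating all splits is only feasible for small samples; larger studies typically sample random permutations instead.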
OVERVIEW OF DATA SCIENCE

SESSION 1
WHAT IS DATA SCIENCE

[Figure: Programmer, Statistician, Business Analyst and Data Scientist]
DATA SCIENCE: A MULTIDISCIPLINARY FIELD
THE DATA SCIENCE LIFE CYCLE
THE COMPONENTS OF DATA SCIENCE
DATA SCIENTIST SKILL SET AND ROLES
MAIN APPLICATIONS OF DATA SCIENCE
THE DATA SCIENCE PROCESS
PROSES SAINS DATA
DEFINISI DATA SCIENTIST DARI NIST

Definitions by NIST Big Data WG (NIST SP1500 - 2015)


• A Data Scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.
• Data science is the empirical synthesis of actionable knowledge and
technologies required to handle data from raw data through the
complete data lifecycle process.
ROLES OF A DATA SCIENTIST
CHARACTERISTICS OF A DATA SCIENTIST
MODERN DATA SCIENTIST
DATA SCIENTIST CAREER OPTIONS
TYPICAL DATA SCIENTIST PROJECTS
CAREER PATH
DATA SCIENTIST VS DATA ANALYST
DATA SCIENTIST VS STATISTICIAN
DATA SCIENCE GRADUATE PROFILES
AND
DATA SCIENCE TEAMS

SESSION 2
LIST OF DATA SCIENCE STUDY PROGRAM GRADUATE PROFILES

The Data Science Professional profiles belong to the family of data-related jobs (occupations). These profiles are defined as an extension of the ESCO (European Skills, Competences, Qualifications and Occupations) occupation taxonomy.
The proposed new occupations are placed in the four top-level classification groups:
1) Managers, for managerial roles
2) Professionals, for application developers and infrastructure engineers
3) Technicians and associate professionals, for operators and technicians
4) Clerical support workers, for data curators and data stewards
The degree level expected for each role is shown in brackets: D3 (three-year diploma), S1 (bachelor's degree) or S2 (master's degree).

1. Managers (S2): role/task description

A. Data science (group) manager or data analytics department manager: Proposes, plans and manages functional and technical evolutions of the data science operations within the relevant domain (technical, research, business).
B. Data science infrastructure manager or research infrastructure data storage facilities manager: Proposes, plans and manages functional and technical evolutions of the big data infrastructure within the relevant domain (technical, research, business).
C. Research infrastructure manager or research infrastructure data storage facilities manager: Proposes, plans and manages functional and technical evolutions of the research infrastructure within the relevant scientific domain.

2. Professionals (Data science professionals): role/task description

A. Data scientist (S2): Data scientists find and interpret rich data sources, manage large amounts of data, merge data sources, ensure consistency of datasets and create visualizations to aid in understanding data. They build mathematical models, present and communicate data insights and findings to specialists and scientists, and recommend ways to apply the data.

B. Data science researcher (S2): Applies the scientific discovery research process, including hypothesis formulation and hypothesis testing, to obtain actionable knowledge related to a scientific problem or business process, or to reveal hidden relations between multiple processes.

C. Data science architect, system architect or applications architect (S1 or S2): Designs and maintains the architecture of data science applications and facilities. Creates relevant data models and process workflows.
D. Data science (application) programmer/engineer, scientific programmer or data engineer (S1 or S2): Designs, develops and codes large data analytics applications to support scientific or enterprise/business processes.
E. (Big) Data analyst (S1 or S2): Analyses a large variety of data to extract information about system, service or organization performance and presents it in usable/actionable form.
F. Business analyst (S1): Analyses a large variety of data and information systems to improve business performance.

2. Professionals (Data science technology professionals): role/task description

A. Data steward (S1): Plans, implements and manages (research) data input, storage, search and presentation; creates data models for domain-specific data; supports and advises domain scientists/researchers during the whole research cycle and data management life cycle.
B. Digital data curator, digital curator, digital archivist or digital librarian (S1): Finds, selects, organizes and shares (exhibits) digital data collections, and maintains their integrity, up-to-date status, freshness and discoverability.
C. Data librarian (S1): Data librarians perform or support one or more of the following: acquisition (collection development), organization (cataloguing and metadata) and the implementation of appropriate user services. Data librarians apply traditional librarianship principles and practices to data management, including data citation, digital object identifiers (DOIs), ethics and metadata.
D. Data archivist or digital archivist (S1): Maintains historically significant collections of datasets, documents, records and other electronic data, and seeks out new items for archiving.

2. Professionals (Database and network professionals): role/task description

Large-scale (cloud) data storage designers and administrators
A. Large-scale (cloud) database designer (data engineer, data architect) (S1): Designs, develops and codes large-scale databases and their use in domain/subject-specific applications according to customer needs.
B. Large-scale (cloud) database administrator: Designs and implements, or monitors and maintains, large-scale cloud databases.
C. Scientific database administrator (S1): Designs and implements, or monitors and maintains, large-scale scientific databases.

3. Technicians and associate professionals: role/task description

Data infrastructure engineers and technicians
A. Big data facilities operators (D3 or S1): Manage the daily operation of facilities and resources and respond to customer requests. This includes all operations related to data management and the data life cycle.
B. Large-scale (cloud) data storage operators (D3 or S1): Manage the daily operation of cloud storage, including tasks related to the data life cycle, and respond to requests from storage users.
C. Scientific database operator (D3 to S1): Manages the daily operation of scientific databases, including tasks related to the data life cycle, and responds to requests from database users.

4. Clerical support workers: role/task description

Data and information entry and access
A. Data entry/access desk/terminal workers (D3): Enter data into data management systems, reading it directly from sources or documents or obtaining it from people/users.
B. Data entry field workers (D3): Do the same work in the field, collecting data from disconnected sensors or by direct counting or reading.
C. User support data services (D3): Support users in entering their data into government services and user-facing applications.
DATA SCIENCE PROFESSIONS FAMILY (EDISON DATA SCIENCE FRAMEWORK)
EDISON: Education for Data Intensive Science to Open New science frontiers

Managers: Chief Data Officer (CDO), Data Science (group/dept) manager, Data Science infrastructure manager, Research Infrastructure manager
Professionals: Data Scientist, Data Science Researcher, Data Science Architect, Data Science (applications) programmer/engineer, Data Analyst, Business Analyst, etc.
Professional (database): Large-scale (cloud) database designers and administrators, scientific database designers and administrators
Professional and clerical (data handling/management): Data Stewards, Digital Data Curators, Digital Librarians, Data Archivists
Technicians and associate professionals: Big Data facilities operators, scientific database/infrastructure operators

Icons used: credit to https://www.datacamp.com/community/tutorials/data-science-industry-infographic


BUILDING A DATA SCIENCE TEAM
THE RELATIONSHIP BETWEEN DATA SCIENCE,
BIG DATA, AI, MACHINE LEARNING
AND DEEP LEARNING

SESSION 3
THE RELATIONSHIP BETWEEN DS, BD, AI, ML AND DL TODAY

Source: adapted from Ian Goodfellow et al., 2016 and Matthew Mayo, 2016
MACHINE LEARNING TECHNIQUES

Machine learning has three main types of learning techniques:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
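The difference between the first two techniques is whether the training data carry labels. Below is a minimal sketch in plain Python, assuming tiny hypothetical one-dimensional data: a nearest-centroid classifier learns from labelled examples (supervised), while k-means groups unlabelled values (unsupervised). The function names `fit_nearest_centroid`, `predict` and `kmeans_1d` are hypothetical. Reinforcement learning, which learns from rewards obtained by interacting with an environment, is not shown.

```python
from statistics import mean


# --- Supervised learning: each training value comes with a label ---
def fit_nearest_centroid(xs, labels):
    """Return one centroid (the mean value) per class label."""
    classes = sorted(set(labels))
    return {c: mean(x for x, y in zip(xs, labels) if y == c) for c in classes}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


# --- Unsupervised learning: no labels, groups are discovered ---
def kmeans_1d(xs, k, iterations=10):
    """Plain k-means on 1-D data, initialised with the first k points."""
    centers = list(xs[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


heights = [150, 152, 155, 180, 182, 185]            # hypothetical data (cm)
labels = ["short", "short", "short", "tall", "tall", "tall"]

model = fit_nearest_centroid(heights, labels)
print(predict(model, 160))       # short
print(kmeans_1d(heights, k=2))   # two centers found without any labels
```

On this data, k-means recovers roughly the same two groups that the labels describe, which illustrates why unsupervised methods are used to discover structure when no labels are available.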
MACHINE LEARNING TASKS CATEGORIES

1. Classification
2. Regression
3. Clustering
4. Anomaly detection
5. Association
6. Recommendation
7. Dimensionality reduction
8. Computer Vision
9. Text Analytics
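As an illustration of one task category from the list, regression, the sketch below fits a straight line y = intercept + slope * x by ordinary least squares using the closed-form formulas: slope = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2) and intercept = y_bar - slope * x_bar. The data points and the function name `fit_line` are hypothetical; in practice a library would normally be used instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-style sum over variance-style sum gives the slope.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)          # 1.0 2.0
print(intercept + slope * 10)    # 21.0 (prediction for x = 10)
```

Regression predicts a continuous value, whereas classification (task 1) predicts a discrete label; both are typically trained with supervised learning.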
THE MACHINE LEARNING PROCESS
IMPLEMENTATION TOOLS: MATLAB

• Matlab https://www.mathworks.com/products/matlab.html
• Commercial; latest version at the time of writing: R2020a
• Available toolboxes for AI, Data Science, and Statistics:
• Statistics and Machine Learning Toolbox
• Deep Learning Toolbox
• Reinforcement Learning Toolbox
• Text Analytics Toolbox
• Predictive Maintenance Toolbox
• Matlab book link:
https://drive.google.com/drive/folders/1qHLqc2kYrI7REC2UClijIZhrzICmm8AF?usp=sharing
• Deep Learning with Matlab book link:
https://drive.google.com/drive/folders/1QuU9tAMPF-XPwM4WmSBRiSYQoj8aA9Wg?usp=sharing
IMPLEMENTATION TOOLS: RAPIDMINER

• RapidMiner https://rapidminer.com/
• A data science software platform, developed by the company of the same name, that provides an integrated environment for data preparation, machine learning, deep learning, text mining and predictive analytics.
• Used for business and commercial applications as well as for research, education, training, rapid prototyping and application development; it supports all steps of the machine learning process, including data preparation, results visualization, model validation and optimization.
• RapidMiner is developed on an open core model. RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. RapidMiner Studio 9.7 can be downloaded from https://my.rapidminer.com/nexus/account/index.html#downloads. Commercial pricing starts at $2,500 and is available from the developer.
• RapidMiner book link: https://drive.google.com/drive/folders/1ln2R4ryr2qj_Iwbk-ZZT_T9wTyvpuhaN?usp=sharing
IMPLEMENTATION TOOLS: RSTUDIO
WHY USE THE R LANGUAGE?

• R is a free, open-source software environment and programming language developed in 1995 at the University of Auckland as an environment for statistical computing and graphics (Ihaka and Gentleman, 1996).
• Since then, R has become one of the dominant software environments for data analysis and is used in a variety of scientific disciplines, including soil science, ecology and geoinformatics (Environmetrics CRAN Task View; Spatial CRAN Task View).
• R is particularly popular for its graphical capabilities, but it is also prized for its GIS capabilities, which make it relatively easy to generate raster-based models.
• More recently, R has also gained several packages designed specifically for analyzing soil data.
R LANGUAGE BOOKS
IMPLEMENTATION TOOLS: PYTHON, JUPYTER, ANACONDA
• Python
• Version 3.8.x
• Available IDE: Spyder https://www.spyder-ide.org/
• Interactive tool: Jupyter (Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages.) https://jupyter.org/
• Toolkit: Anaconda (the open-source Individual Edition (Distribution) is the easiest way to perform Python/R data science and machine learning on a single machine. Developed for solo practitioners, it is the toolkit that equips you to work with thousands of open-source packages and libraries.) https://www.anaconda.com/
• Google Colab: Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser, with
• Zero configuration required
• Free access to GPUs
• Easy sharing
• Whether you're a student, a data scientist or an AI researcher, Colab can make your work easier. https://colab.research.google.com/notebooks/intro.ipynb
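When working in any of these environments (Spyder, Jupyter, Anaconda or Colab), it can help to confirm which Python version is running and which packages are installed before starting an analysis. The sketch below uses only the standard library; the function name `check_environment` is hypothetical, and the default package list (`numpy`, `pandas`, `sklearn`) is an assumption about a typical data-science setup, not a requirement stated in this course.

```python
import importlib.util
import sys


def check_environment(packages=("numpy", "pandas", "sklearn")):
    """Report the Python version and whether each package can be imported.

    The default package names are illustrative assumptions.
    """
    info = sys.version_info
    version = f"{info.major}.{info.minor}.{info.micro}"
    # find_spec returns None when a top-level module is not installed.
    available = {name: importlib.util.find_spec(name) is not None
                 for name in packages}
    return version, available


version, available = check_environment()
print("Python", version)
for name, ok in available.items():
    print(f"{name:10s} {'installed' if ok else 'missing'}")
```

Running this in the first cell of a notebook makes missing packages visible early; in Anaconda and Colab, the common scientific packages are usually preinstalled.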
IMPLEMENTATION TOOLS: ANACONDA
IMPLEMENTATION TOOLS: JUPYTER
BOOK LINKS

• Big Data and Data Science:
https://drive.google.com/drive/folders/18jbNHjUWsRor8W64oNDxggOd_yWqMzHs?usp=sharing
• Deep Learning and Machine Learning:
https://drive.google.com/drive/folders/1hJ-E5OJhg35R7LC7_bHy99ccoY7CJ3nO?usp=sharing
• Python:
https://drive.google.com/drive/folders/1zqr5GPjQhP96XqKcWMxcmeWiMAAZ1iVx?usp=sharing
• scikit-learn user guide, Mar 01, 2019:
https://drive.google.com/drive/folders/1rRsU6WdnPUlT3d9NcsTuk2f6N92PLZkc?usp=sharing
Thank You
