
Analysing Big Data

The document discusses the challenges of analyzing big data for decision-making in organizations, emphasizing the need for specialized software and technologies like OLAP for efficient data analysis. It highlights the importance of data quality characteristics such as accuracy, completeness, reliability, relevance, and timeliness. Additionally, it explains the role of OLAP technology in business intelligence and the multidimensional nature of business problems.
Analysing big data

Teacher: Hojiyev Muhammad
Challenges of analysing big data (large data sets) to help businesses and organisations make future decisions:
• the need for a high level of skill in using specialist software tools, e.g. JMP, and applications for predictive analytics, data mining, text mining, forecasting and data optimisation, in order to analyse large volumes of data
• online analytical processing (OLAP) technology for analysing multidimensional data quickly (managing the volume, velocity and consistency of data sets, access to all data), e.g. completing complex calculations, running 'what if' scenarios, producing reports in different formats
• ensuring that data is valid, accurate, current, relevant and sufficient
What is business intelligence?
A business analyst often wants a big-picture view of the business: to see broader trends based on aggregated data, and to break those trends down by any number of variables. Business intelligence is the process of extracting data from an OLAP database and then analysing that data for information you can use to make informed business decisions and take action. For example, OLAP and business intelligence help answer the following types of questions about business data:
• How do the total sales of all products for 2007 compare with the total sales for 2006?
• How does our profitability to date compare with the same period in each of the past five years?
• How much money did customers over the age of 35 spend last year, and how has that behaviour changed over time?
• How many products were sold in two specific countries/regions this month compared with the same month last year?
• For each customer age group, what is the breakdown of profitability (both margin percentage and total) by product category?
• Who are the top and bottom salespeople, distributors, vendors and customers?
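
As a minimal sketch of how the first question above could be answered, the following uses Python's built-in sqlite3 module with an aggregate query. The `sales` table, its columns and the figures are hypothetical and stand in for a real OLAP source.

```python
import sqlite3

# Hypothetical in-memory sales table; names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (2006, "Bikes", 1200.0),
        (2006, "Helmets", 300.0),
        (2007, "Bikes", 1500.0),
        (2007, "Helmets", 450.0),
    ],
)

def total_sales_by_year(connection):
    """Return {year: total sales amount} aggregated over all products."""
    rows = connection.execute(
        "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
    )
    return {year: total for year, total in rows}

totals = total_sales_by_year(conn)
change = totals[2007] - totals[2006]
print(f"2006: {totals[2006]}, 2007: {totals[2007]}, change: {change}")
```

The other questions follow the same pattern: filter on one or more dimensions (age group, region, month) and aggregate a measure such as sales or margin.
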
Online analytical processing
OLAP technology is used for:
• Strengthening financial planning and analysis (FP&A) teams
• Centralizing corporate data
• Limitless report viewing
• Complex analytical calculations
• Data discovery
• Security and governance
• Predictive and scenario planning
• Fast access to data for analysis
HOW OLAP TECHNOLOGY IS USED
End users perform ad hoc data analysis in multiple dimensions. This provides insight and understanding for better decision-making and responsiveness to management.
BUSINESS PROBLEMS ARE MULTIDIMENSIONAL
Example: say you are reporting on sales. You might need six dimensions:
• Salesperson
• Sales Amount
• Region
• Product
• Month
• Year
An OLAP cube is a data structure that allows fast analysis across the multiple dimensions that define a business problem.
You can also access OLAP cubes from your spreadsheets, which makes spreadsheet analysis considerably more powerful.
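
To make the cube idea concrete, here is a minimal sketch in plain Python that pre-aggregates a total for every combination of dimension values, so that later lookups are simple dictionary reads. It is an illustration only, not how any particular OLAP product stores its cubes. The fact rows are hypothetical, and the sketch assumes Sales Amount is the value being summed while the other five items act as dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (salesperson, region, product, month, year, sales_amount).
DIMENSIONS = ("salesperson", "region", "product", "month", "year")
FACTS = [
    ("Ann", "North", "Bikes", 1, 2007, 100.0),
    ("Ann", "North", "Helmets", 2, 2007, 40.0),
    ("Bob", "South", "Bikes", 1, 2007, 70.0),
    ("Bob", "North", "Bikes", 2, 2007, 30.0),
]

def build_cube(facts, dimensions):
    """Pre-aggregate the sales amount for every combination of dimensions."""
    cube = defaultdict(float)
    for *coords, amount in facts:
        # For each subset of dimensions, add the amount to the matching cell.
        for size in range(len(dimensions) + 1):
            for subset in combinations(range(len(dimensions)), size):
                key = tuple((dimensions[i], coords[i]) for i in subset)
                cube[key] += amount
    return cube

def query(cube, **filters):
    """Look up a pre-computed total, e.g. query(cube, region="North")."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, 0.0)

cube = build_cube(FACTS, DIMENSIONS)
print(query(cube))                                   # grand total
print(query(cube, region="North"))                   # one dimension fixed
print(query(cube, region="North", product="Bikes"))  # two dimensions fixed
```

The trade-off is the one the slides describe: reads are fast because the totals already exist, but adding or changing a fact means recomputing the affected cells.
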
An OLAP solution can resolve the challenges facing corporate budgeting and forecasting teams with fast, flexible analysis and reporting.
Curious? You can have an agile FP&A team that delivers quickly for the CFO.
Typical traits of semantic modeling
Requirement | Description
Schema | Schema on write, strongly enforced
Uses transactions | No
Locking strategy | None
Updateable | No (typically requires recomputing cube)
Appendable | No (typically requires recomputing cube)
Workload | Heavy reads, read-only
Indexing | Multidimensional indexing
Datum size | Small to medium sized
Model | Multidimensional
Data shape | Cube or star/snowflake schema
Query flexibility | Highly flexible
Capability | Azure Analysis Services | SQL Server Analysis Services | SQL Server with Columnstore Indexes | Azure SQL Database with Columnstore Indexes
Is managed service | Yes | No | No | Yes
Supports multidimensional cubes | No | Yes | No | No
Supports tabular semantic models | Yes | Yes | No | No
Easily integrate multiple data sources | Yes | Yes | No [1] | No [1]
Supports real-time analytics | No | No | Yes | Yes
Requires process to copy data from source(s) | Yes | Yes | No | No
Capability | Azure Analysis Services | SQL Server Analysis Services | SQL Server with Columnstore Indexes | Azure SQL Database with Columnstore Indexes
Redundant regional servers for high availability | Yes | No | Yes | Yes
Supports query scale out | Yes | No | Yes | Yes
Dynamic scalability (scale up) | Yes | No | Yes | Yes
Typical traits of transactional data
Requirement | Description
Normalization | Highly normalized
Schema | Schema on write, strongly enforced
Consistency | Strong consistency, ACID guarantees
Integrity | High integrity
Uses transactions | Yes
Locking strategy | Pessimistic or optimistic
Updateable | Yes
Appendable | Yes
Workload | Heavy writes, moderate reads
Indexing | Primary and secondary indexes
Datum size | Small to medium sized
Model | Relational
Data shape | Tabular
Query flexibility | Highly flexible
Scale | Small (MBs) to large (a few TBs)


When to use this solution
• Choose OLTP (online transaction processing) when you need to efficiently process and store business transactions and immediately make them available to client applications in a consistent way. Use this architecture when any tangible delay in processing would have a negative impact on the day-to-day operations of the business.
• OLTP systems are designed to efficiently process and store transactions, as well as query transactional data. Efficient processing and storage of individual transactions is partly accomplished by data normalization, that is, breaking the data up into smaller chunks that are less redundant. This supports efficiency because it enables the OLTP system to process large numbers of transactions independently, and it avoids the extra processing needed to maintain data integrity in the presence of redundant data.
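
The normalization and transaction behaviour described above can be sketched with Python's built-in sqlite3 module. This is a small illustration under assumptions: the `customers` and `orders` tables and the `place_order` helper are hypothetical, and SQLite stands in for a production OLTP database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
# Normalized layout: customer details are stored once and referenced by id.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL CHECK (amount > 0))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ann')")
conn.commit()

def place_order(connection, customer_id, amount):
    """Insert one order atomically; return True on commit, False on rollback."""
    try:
        with connection:  # commits on success, rolls back on exception
            connection.execute(
                "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = place_order(conn, 1, 25.0)         # valid customer
rejected = place_order(conn, 99, 10.0)  # unknown customer violates the foreign key
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(ok, rejected, order_count)
```

Because the customer name lives in one place, each order is a small independent write, and the database rather than the application rejects rows that would break integrity.
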
Challenges
Implementing and using an OLTP system can create a few challenges:
• OLTP systems are not always good for handling aggregates over large amounts of data, although there are exceptions, such as a well-planned SQL Server-based solution. Analytics that rely on aggregate calculations over millions of individual transactions are very resource intensive for an OLTP system. They can be slow to execute and can cause a slow-down by blocking other transactions in the database.
• When conducting analytics and reporting on data that is highly normalized, the queries tend to be complex, because most queries need to de-normalize the data by using joins. Also, naming conventions for database objects in OLTP systems tend to be terse and succinct. The increased normalization coupled with terse naming conventions makes OLTP systems difficult for business users to query without the help of a DBA or data developer.
• Storing the history of transactions indefinitely and storing too much data in any one table can lead to slow query performance, depending on the number of transactions stored. The common solution is to maintain a relevant window of time (such as the current fiscal year) in the OLTP system and offload historical data to other systems, such as a data mart or data warehouse.
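
The "relevant window of time" approach in the last point can be sketched as follows. This is an illustration under assumptions: the table names and the `archive_before` helper are hypothetical, and a second SQLite table stands in for the separate data mart or data warehouse a real system would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, fiscal_year INTEGER, amount REAL)"
)
conn.execute(
    "CREATE TABLE transactions_archive (id INTEGER PRIMARY KEY, fiscal_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions (fiscal_year, amount) VALUES (?, ?)",
    [(2022, 10.0), (2023, 20.0), (2024, 30.0), (2024, 40.0)],
)
conn.commit()

def archive_before(connection, current_fiscal_year):
    """Move rows older than the current fiscal year into the archive table."""
    with connection:  # copy and delete inside one transaction
        connection.execute(
            "INSERT INTO transactions_archive "
            "SELECT id, fiscal_year, amount FROM transactions WHERE fiscal_year < ?",
            (current_fiscal_year,),
        )
        moved = connection.execute(
            "DELETE FROM transactions WHERE fiscal_year < ?",
            (current_fiscal_year,),
        ).rowcount
    return moved

moved = archive_before(conn, 2024)
live = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM transactions_archive").fetchone()[0]
print(moved, live, archived)
```

Running the copy and the delete in one transaction means a failure part-way through cannot lose or duplicate history.
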
General capabilities
Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL
Is managed service | Yes | No | Yes | Yes
Runs on platform | N/A | Windows, Linux, Docker | N/A | N/A
Programmability [1] | T-SQL, .NET, R | T-SQL, .NET, R, Python | SQL | SQL, PL/pgSQL, PL/JavaScript (v8)
Scalability capabilities
Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL
Maximum database instance size | 4 TB | 256 TB | 16 TB | 16 TB
Supports capacity pools | Yes | Yes | No | No
Supports clusters scale out | No | Yes | No | No
Dynamic scalability (scale up) | Yes | No | Yes | Yes
Analytic workload capabilities
Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL
Temporal tables | Yes | Yes | No | No
In-memory (memory-optimized) tables | Yes | Yes | No | No
Columnstore support | Yes | Yes | No | No
Adaptive query processing | Yes | Yes | No | No
Availability capabilities
Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL
Readable secondaries | Yes | Yes | Yes | Yes
Geographic replication | Yes | Yes | Yes | Yes
Automatic failover to secondary | Yes | No | No | No
Point-in-time restore | Yes | Yes | Yes | Yes
Security capabilities
Capability | Azure SQL Database | SQL Server in an Azure virtual machine | Azure Database for MySQL | Azure Database for PostgreSQL
Row level security | Yes | Yes | Yes | Yes
Data masking | Yes | Yes | No | No
Transparent data encryption | Yes | Yes | Yes | Yes
Restrict access to specific IP addresses | Yes | Yes | Yes | Yes
Restrict access to allow VNet access only | Yes | Yes | Yes | Yes
Microsoft Entra authentication | Yes | No | Yes | Yes
Active Directory authentication | No | Yes | No | No
Multi-factor authentication | Yes | No | Yes | Yes
Supports Always Encrypted | Yes | Yes | No | No
Private IP | No | Yes | No | No
5 characteristics of data quality
• Data quality is crucial: it assesses whether information can serve its purpose in a particular context (such as data analysis). So how do you determine the quality of a given data set? There are data quality characteristics you should be aware of.
• Data quality has five characteristics: accuracy, completeness, reliability, relevance and timeliness. Read on to learn more.
• Accuracy
• Completeness
• Reliability
• Relevance
• Timeliness
Characteristic | How it is measured
Accuracy | Is the information correct in every detail?
Completeness | How comprehensive is the information?
Reliability | Does the information contradict other trusted sources?
Relevance | Do you really need this information?
Timeliness | How up to date is the information? Can it be used for real-time reporting?
Accuracy
• As the name implies, this data quality characteristic means that the information is correct. To determine whether data is accurate, ask yourself whether it reflects the real-world situation. For example, in financial services, does a customer really have $1 million in their bank account?
• Accuracy is a crucial data quality characteristic because inaccurate information can cause significant problems with severe consequences. Using the example above, if there is an error in a customer's bank account, it could be because someone accessed it without the customer's knowledge.
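
One way to check the bank-balance example is to reconcile the stored balance against an independent record of the account's activity. The sketch below assumes a hypothetical ledger of deposits and withdrawals is available; `balance_is_accurate` is an illustrative helper, not part of any banking system.

```python
from decimal import Decimal

def balance_is_accurate(stored_balance, ledger_entries):
    """Return True when the stored balance equals the sum of the ledger entries."""
    return stored_balance == sum(ledger_entries, Decimal("0"))

# Hypothetical account: deposits are positive, withdrawals negative.
ledger = [Decimal("1000000.00"), Decimal("-250.50"), Decimal("250.50")]
print(balance_is_accurate(Decimal("1000000.00"), ledger))
print(balance_is_accurate(Decimal("999000.00"), ledger))
```

`Decimal` is used instead of floating point so that currency amounts compare exactly.
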
Completeness
• "Completeness" refers to how comprehensive the information is. When looking at data completeness, think about whether all of the data you need is available; you might need a customer's first and last name, but the middle initial may be optional.
• Why does completeness matter as a data quality characteristic? If information is incomplete, it may be unusable. Say you are sending out a mailing. You need the customer's last name to make sure the mail goes to the right address; without it, the data is incomplete.
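
A completeness check along these lines can be sketched as follows. The field names and the choice of which fields are mandatory are assumptions made for illustration.

```python
# Assumed mandatory fields; a middle initial is treated as optional and not checked.
REQUIRED_FIELDS = ("first_name", "last_name")

def missing_required(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]

def completeness_ratio(records):
    """Share of records that have every required field filled in."""
    if not records:
        return 0.0
    complete = sum(1 for r in records if not missing_required(r))
    return complete / len(records)

# Hypothetical customer records.
customers = [
    {"first_name": "Ann", "last_name": "Lee", "middle_initial": ""},
    {"first_name": "Bob", "last_name": ""},
    {"first_name": "Cy", "last_name": None},
    {"first_name": "Di", "last_name": "Oz", "middle_initial": "K"},
]
print(missing_required(customers[1]))
print(completeness_ratio(customers))
```
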
Reliability
• In the context of data quality characteristics, reliability means that a piece of information does not contradict another piece of information in a different source or system. Take an example from healthcare: if a patient's date of birth is 1 January 1970 in one system but 13 June 1973 in another, the information is unreliable.
• Reliability is a vital data quality characteristic. When pieces of information contradict each other, you cannot trust the data. You could make a mistake that costs your company money and damages its reputation.
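
The date-of-birth example can be turned into a simple cross-system check. In this sketch the two systems are hypothetical dictionaries keyed by patient id, and `find_conflicts` is an illustrative helper.

```python
from datetime import date

def find_conflicts(system_a, system_b, field):
    """Return the ids whose value for `field` differs between two systems."""
    shared_ids = system_a.keys() & system_b.keys()
    return sorted(
        pid for pid in shared_ids
        if system_a[pid][field] != system_b[pid][field]
    )

# Hypothetical patient records held in two separate systems.
clinic = {
    "p1": {"birth_date": date(1970, 1, 1)},
    "p2": {"birth_date": date(1985, 3, 9)},
}
billing = {
    "p1": {"birth_date": date(1973, 6, 13)},
    "p2": {"birth_date": date(1985, 3, 9)},
}
print(find_conflicts(clinic, billing, "birth_date"))
```

The check only reports that the two sources disagree; deciding which value is correct still needs a trusted reference.
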
Relevance
• When you look at data quality characteristics, relevance comes into play because there has to be a good reason for collecting the information in the first place. You must consider whether you really need this information or whether you are collecting it just for the sake of it.
• Why does relevance matter as a data quality characteristic? If you are gathering irrelevant information, you are wasting time and money, and your analyses will not be as valuable.
Timeliness
Timeliness, as the name implies, refers to how up to date the information is.
If it was gathered in the past hour, it is timely, unless new information has come in that renders the previous information useless.
The timeliness of information is an important data quality characteristic, because information that is not timely can lead people to make the wrong decisions. In turn, that costs the organization time, money and reputation.
"Timeliness is an important data quality characteristic: out-of-date information costs companies time and money"
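
The rule described above (recent enough, and not superseded by newer information) can be sketched as a small function. The one-hour threshold comes from the example in the text; the `superseded` flag and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

def is_timely(collected_at, now, max_age=timedelta(hours=1), superseded=False):
    """Data is timely if it is recent enough and no newer data has replaced it."""
    return not superseded and (now - collected_at) <= max_age

# Hypothetical timestamps.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
stale = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(is_timely(fresh, now))
print(is_timely(stale, now))
print(is_timely(fresh, now, superseded=True))
```

In practice the acceptable age depends on the use: real-time reporting needs a much shorter window than a quarterly review.
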
In today's business environment, data quality characteristics ensure that you get the most out of your data. When your data does not meet these standards, it is not valuable. Data quality solutions ensure your data's accuracy, completeness, ...
