Big Data Answers All Sets

The document provides a comprehensive overview of big data analytics, covering various analytic processes, characteristics of big data applications, and the significance of intelligent data analysis. It discusses stream processing, HDFS architecture, and the differences between conventional and intelligent computing, along with tools like PIG and HiveQL. Additionally, it addresses predictive analytics, regression vs classification, and the importance of statistical significance in model evaluation.


BIG DATA ANALYTICS - EXAM QUESTIONS ANSWERED (SETS 1 TO 4)

UNIT I

Set 1

1a) Analytic Processes:

- Descriptive Analytics: Summarizes past data.

- Diagnostic Analytics: Examines why something happened.

- Predictive Analytics: Forecasts future outcomes.

- Prescriptive Analytics: Recommends actions.

- Cognitive Analytics: Uses AI for decision-making.

1b) Characteristics of Big Data Applications:

- Volume, Velocity, Variety, Veracity, Value (the five Vs).

- Real-time processing, scalability, fault tolerance, and distributed architecture.

Set 2

1a) Intelligent Data Analysis: Uses AI techniques to find patterns and trends automatically.

1b) Sources & Significance of Big Data:

- Sources: Social media, IoT devices, sensors, transactions.

- Significance: Helps in real-time decision-making, customer insights, operational efficiency.

Set 3

1a) Nature of Data:

- Can be structured, semi-structured, or unstructured.

- Applications: social media, banking, healthcare.


1b) Challenges of Conventional Systems:

- Cannot handle large-scale data.

- Lack of scalability, real-time capability.

Set 4

1a) Conventional vs Intelligent Computing:

- Conventional: Rule-based.

- Intelligent: Learns and adapts (AI-based).

1b) Big Data Framework Features:

- Open-source, distributed, scalable, fault-tolerant (e.g., Hadoop, Spark).

UNIT II

Set 1

3a) Filtering a Stream: Retains only the elements that satisfy a given condition, discarding the rest as data arrives.
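Stream filtering can be sketched as a Python generator; the predicate and the sample readings below are illustrative, not part of any particular framework:

```python
def filter_stream(stream, predicate):
    """Lazily yield only the items that satisfy the condition."""
    for item in stream:
        if predicate(item):
            yield item

# Illustrative: keep only sensor readings above 50.
readings = [12, 55, 3, 78, 41]
high = list(filter_stream(readings, lambda r: r > 50))
print(high)  # -> [55, 78]
```

Because the function is a generator, it can be chained over an unbounded stream without buffering it all in memory.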

3b) Stream Data Model and Architecture: Continuous input is ingested by a processing engine (e.g., Apache Storm), and results flow to storage or downstream consumers.

Set 2

3a) Stream Processing & Distinct Counting:

- Processes real-time data.

- Uses hashing/sketching (e.g., Flajolet-Martin) to estimate the number of unique items in one pass.
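A minimal single-hash sketch of Flajolet-Martin-style distinct counting. This is an illustration only: real systems average many hash functions (as in HyperLogLog) to reduce the variance of the estimate.

```python
import hashlib

def trailing_zeros(n):
    """Count trailing zero bits of a positive integer (0 for n == 0)."""
    count = 0
    while n > 0 and n % 2 == 0:
        n //= 2
        count += 1
    return count

def fm_estimate(stream):
    """Estimate distinct count as 2**R, where R is the maximum number
    of trailing zero bits observed among the item hashes."""
    max_r = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** max_r
```

The intuition: a hash with r trailing zeros occurs with probability about 2^-r, so seeing one suggests roughly 2^r distinct items have passed.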

3b) Mining Data Streams & Filters:


- Finds patterns and trends.

- Filters remove unwanted data.

Set 3

3a) Stream Model with Diagram: Data flows from source -> processor -> sink.

3b) Real-Time Applications: Fraud detection, social media analysis, stock trading.

Set 4

3a) Data Streaming Concept: Real-time flow of data.

3b) Decaying Window Algorithm: Gives exponentially smaller weight to older elements, so recent data dominates the aggregate.
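The decaying-window idea can be sketched as an exponentially decayed running total; the decay rate of 0.01 below is an arbitrary illustrative choice:

```python
def decaying_count(stream, decay=0.01):
    """Exponentially decaying aggregate: each new element multiplies the
    running total by (1 - decay) before adding its own weight, so older
    contributions fade geometrically."""
    total = 0.0
    for x in stream:
        total = total * (1 - decay) + x
    return total
```

For a stream of constant 1s the total converges toward 1/decay (here 100), which is why this aggregate behaves like a soft window of roughly 1/decay recent elements.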

UNIT III

Set 1

5a) HDFS Architecture:

- NameNode (metadata) and DataNodes (store data).

5b) Hadoop Streaming for Text Processing:

- Lets any executable (e.g., a Python or shell script) act as mapper or reducer, reading records from stdin and writing key-value pairs to stdout. Works well for logs or natural language text.
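The mapper/reducer pair of a streaming word count can be sketched as plain Python functions; in an actual Hadoop Streaming job each would be a separate script printing tab-separated lines to stdout, but the core logic is the same:

```python
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every token, the way a streaming mapper
    would print 'word<TAB>1' lines."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Sum counts per word. Hadoop delivers pairs sorted by key to the
    reducer; sorting here simulates the shuffle phase."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(c for _, c in group)

counts = dict(reducer(mapper(["the cat sat", "the mat"])))
print(counts)  # -> {'cat': 1, 'mat': 1, 'sat': 1, 'the': 2}
```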

Set 2

5a) HDFS Overview: Same as above.

5b) MapReduce Application Development: Define the map and reduce logic, compile and package the job, and submit it to the cluster.

Set 3

5a) Hadoop Features: Fault-tolerant, scalable, open-source.

5b) Old vs New API: The new API (org.apache.hadoop.mapreduce) passes a Context object to map and reduce for emitting output and reporting status, and favors abstract classes over interfaces, making it easier to evolve.

Set 4

5a) HDFS Write Operation: Client -> NameNode -> DataNodes, data is replicated.

5b) MapReduce Flow:

- Single reducer: All data to one node.

- Multiple reducers: Parallel processing.
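How keys are routed when there are multiple reducers can be sketched with a hash partitioner. Hadoop's default partitioner uses the key's hashCode modulo the reducer count; crc32 stands in here only because it is deterministic across Python runs:

```python
import zlib

def partition(key, num_reducers):
    """Route a key to a reducer by hashing it modulo the reducer count,
    so all values for the same key reach the same reducer."""
    return zlib.crc32(key.encode()) % num_reducers

pairs = [("apple", 1), ("banana", 1), ("apple", 1)]
buckets = {r: [] for r in range(2)}
for k, v in pairs:
    buckets[partition(k, 2)].append((k, v))
```

The invariant that matters: every occurrence of a key lands in the same bucket, so each reducer can aggregate its keys independently and in parallel.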

UNIT IV

Set 1

7a) PIG Architecture Components: Parser, optimizer, execution engine.

7b) i) HBase: NoSQL DB on Hadoop.

ii) Zookeeper: Coordination service for distributed systems.

Set 2

7a) HBase Note: Column-oriented NoSQL, real-time access.

7b) PIG Architecture: Includes Pig Latin scripts, parser, optimizer.

Set 3

7a) PIG Data Processing Operators: LOAD, FILTER, FOREACH, GROUP, JOIN.

7b) PIG Modes: Local and MapReduce mode.

Set 4

7a) HiveQL Features: SQL-like, used for querying big data.

7b) Zookeeper: Manages config and sync across nodes.

UNIT V

Set 1

9a) Regression vs Classification:

- Regression = continuous output.

- Classification = categories.

9b) Predictive Analytics for Business:

- Increases efficiency, forecasts trends, improves decisions.

Set 2

9a) Predictive Analysis: Uses data to forecast outcomes.

9b) Simple Linear Regression: One independent and one dependent variable.
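Simple linear regression fits y = b0 + b1*x by ordinary least squares; a pure-Python sketch with illustrative data (any statistics package would give the same coefficients):

```python
def fit_simple_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data generated from y = 2 + 3x.
b0, b1 = fit_simple_regression([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # -> 2.0 3.0
```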

Set 3

9a) Interpret Coefficients: Each coefficient gives the expected change in the output for a one-unit increase in that predictor, holding the other predictors constant.

9b) Statistical Significance: A low p-value indicates a coefficient is unlikely to differ from zero by chance alone; conventionally p < 0.05 is treated as significant.


Set 4

9a) Interpret p-values and Coefficients:

- Low p-value = strong evidence against the null hypothesis that the coefficient is zero.

- Coefficient = effect size.

9b) Cross-Validation: Tests model on unseen data to check reliability.
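K-fold cross-validation splits the data into k folds, trains on k-1 of them, and tests on the held-out fold, rotating through all folds. A minimal index-splitting sketch (libraries such as scikit-learn provide this, but the mechanics are simple):

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for each of k folds over
    n samples; the last fold absorbs any remainder."""
    fold = n // k
    idx = list(range(n))
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold] if i < k - 1 else idx[i * fold:]
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        yield train, test
```

Averaging the model's score across the k test folds gives an estimate of performance on unseen data, which is exactly the reliability check the answer describes.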
