Data Science My Notes
Q: Differentiate between Structured and Unstructured Data. (8–10 marks)
Introduction
In Data Science, data is the foundation for analysis, modeling, and decision-making. Data can be
broadly classified into structured and unstructured forms based on how it is stored, managed, and
analyzed. Understanding this difference is crucial for selecting appropriate tools, storage systems,
and processing methods.
Structured data is organized in a predefined format (like rows and columns) and is stored in
relational databases.
Characteristics:
• Clearly defined schema
Examples:
• Customer details in an Excel sheet (Name, Age, Email)
Unstructured data has no predefined format and is stored in its native form (text, images, audio, video).
Characteristics:
• No fixed schema
Examples:
• Emails, social media posts
• Images, videos, audio files
• Chat transcripts
| Aspect | Structured Data | Unstructured Data |
|---|---|---|
| Storage | Relational databases (e.g., MySQL, Oracle) | Data lakes, NoSQL DBs (e.g., MongoDB, Hadoop) |
| Processing Tools | SQL, Excel, BI tools | NLP, Machine Learning, Deep Learning |
| Ease of Analysis | Easy to query and analyze | Requires complex preprocessing |
Conclusion
Structured and unstructured data are two fundamental categories in data science, each requiring
different storage, processing, and analytical techniques. While structured data is easier to handle
using traditional tools, unstructured data provides richer, more complex information that demands
advanced technologies for meaningful insights.
Q: Explain various stages in the Data Science process
briefly.
Introduction
Data Science is an interdisciplinary field that involves extracting meaningful insights from data using
scientific methods, algorithms, and tools. The data science process is a structured workflow that
guides how raw data is turned into actionable knowledge. It involves multiple stages, each
contributing to accurate and impactful decision-making.
1. Problem Definition
• Purpose: Clearly define the objective or business question.
• Activities: Understand the domain, define KPIs (Key Performance Indicators), and set success
metrics.
2. Data Collection
• Purpose: Gather relevant raw data from various sources.
5. Feature Engineering
• Purpose: Create, modify, or select relevant features that improve model performance.
• Activities: Encoding categorical data, normalization, feature selection (see the sketch below).
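For instance, a minimal sketch of these activities using Pandas and scikit-learn (the sample data and column names are assumed for illustration):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw data for illustration
df = pd.DataFrame({
    "city": ["Raipur", "Delhi", "Raipur"],   # categorical feature
    "age": [21, 35, 28],                     # numerical feature
})

# Encoding categorical data: one-hot encode the 'city' column
df = pd.get_dummies(df, columns=["city"])

# Normalization: rescale 'age' into the range [0, 1]
df["age"] = MinMaxScaler().fit_transform(df[["age"]]).ravel()
print(df)
```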
6. Model Building
• Purpose: Train machine learning models to learn patterns from data.
7. Model Evaluation
• Purpose: Assess model performance using metrics such as accuracy, precision, recall, or RMSE.
8. Deployment
• Purpose: Integrate the model into a production environment.
9. Monitoring and Maintenance
• Activities: Update models with new data, monitor drift or accuracy drops.
Conclusion
The data science process is a systematic approach to solving real-world problems using data. Each
stage—from problem identification to deployment and maintenance—plays a critical role in building
reliable, effective, and impactful data-driven solutions.
Q: What is an API? Explain with an example.
An API (Application Programming Interface) is a set of rules, protocols, and tools that allows one
software application to interact with another. APIs act as intermediaries that enable communication
between two programs without exposing the internal logic or code.
In simple terms, APIs allow different software systems to talk to each other in a standardized way.
Definition
API is a software interface that provides access to functionality or data of a software application,
service, or platform, through requests and responses.
Example:
Instead of building your own weather service, you can use a Weather API (like OpenWeatherMap).
How it works:
```
GET https://api.openweathermap.org/data/2.5/weather?q=Delhi&appid=your_api_key
```
• The API returns the current weather conditions for Delhi as a JSON response.
• Your app uses this response to display weather info to users.
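A minimal sketch of such a request in Python, assuming the requests library (the exact response fields depend on the API's documentation):

```python
import requests

# Hypothetical call to the endpoint shown above;
# 'your_api_key' must be replaced with a real key
url = "https://api.openweathermap.org/data/2.5/weather"
params = {"q": "Delhi", "appid": "your_api_key"}

response = requests.get(url, params=params)
data = response.json()   # the API replies with JSON
print(data)              # weather details for Delhi
```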
Benefits of APIs
• Code reusability
• Faster development
• Promotes modularity
Conclusion
APIs play a crucial role in modern software development by enabling interoperability, scalability, and
efficiency. Whether you're building web apps, mobile apps, or integrating systems, APIs allow
developers to access external functionalities without reinventing the wheel.
Q: What is Web Scraping? Explain its working and applications.
Definition:
Web Scraping is an automated method used to extract large amounts of data from websites using
computer programs or scripts.
It allows users to collect structured data from unstructured or semi-structured web content (like
HTML pages).
Key Concepts:
• Target: Webpages containing useful information (e.g., product prices, news articles, job
listings).
• Tools: Common Python libraries include:
o BeautifulSoup
o Requests
o Selenium
o Scrapy
Basic Steps:
1. Send HTTP Request to fetch the webpage
2. Parse the HTML content
3. Extract the required data elements
4. Store Data (e.g., in a CSV file or database); see the sketch below
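A minimal sketch of these four steps, assuming the requests and BeautifulSoup libraries and a placeholder URL:

```python
import requests
from bs4 import BeautifulSoup

# 1. Send HTTP request (example.com is a placeholder)
html = requests.get("https://example.com").text

# 2. Parse the HTML content
soup = BeautifulSoup(html, "html.parser")

# 3. Extract the required data (here: page title and all link targets)
title = soup.title.string
links = [a.get("href") for a in soup.find_all("a")]

# 4. Store the data (here: just print; could also be written to CSV)
print(title)
print(links)
```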
Applications:
• Price tracking (e.g., Flipkart, Amazon)
• News monitoring
• Competitor analysis
Limitations:
• Web structure may change frequently
Conclusion:
Web scraping is a powerful data extraction technique widely used in data science, business
intelligence, and automation. It requires responsible usage to balance technical capability with
ethical and legal compliance.
Q: What is a Relational Database? Explain with an example.
Definition:
A Relational Database (RDB) is a type of database that stores data in the form of tables (also called
relations) consisting of rows and columns.
Each table represents an entity, and relationships between tables are maintained using keys.
Key Concepts:
| Term | Description |
|---|---|
| Primary Key | A field that uniquely identifies each record in a table |
| Foreign Key | A field in one table that refers to the primary key of another |
• Ensures data integrity using constraints (e.g., primary key, foreign key)
• Supports ACID properties for transaction reliability:
o Atomicity
o Consistency
o Isolation
o Durability
Advantages:
• Easy to organize and retrieve data
Popular RDBMS:
• MySQL
• Oracle
• PostgreSQL
• SQLite
Example (Student–Course relationship):
```
[STUDENT]                    [COURSE]
RollNo (PK) ←── FK ──→ CourseID (PK)
Name                         CourseName
```
Conclusion:
Relational databases provide a systematic and efficient way to store and manage data. They are
widely used in business, finance, education, and many other fields where structured data and
relationships are key.
Q: Differentiate between Qualitative and Quantitative data.
1. Qualitative Data:
Qualitative data is descriptive, non-numerical data that represents qualities, categories, or characteristics.
Characteristics:
• Descriptive in nature
• Cannot be measured in numbers
Examples:
• Gender: Male, Female, Other
Types:
• Nominal Data – No natural order (e.g., hair color)
• Ordinal Data – Has a logical order (e.g., satisfaction level: low, medium, high)
2. Quantitative Data:
Quantitative data is numerical data that represents measurable quantities and can be analyzed
mathematically.
Characteristics:
• Expressed in numbers
• Suitable for statistical operations
Examples:
• Age: 21 years
• Height: 165 cm
• Marks: 85 out of 100
Types:
• Discrete Data – Countable whole-number values (e.g., number of students)
• Continuous Data – Measurable values on a continuous scale (e.g., height, weight)
Comparison Table:
| Aspect | Qualitative Data | Quantitative Data |
|---|---|---|
| Nature | Descriptive, categorical | Numerical, measurable |
| Analysis | Contextual interpretation | Statistical operations |
| Examples | Gender, satisfaction level | Age, height, marks |
Conclusion:
Both qualitative and quantitative data are essential in data science and analytics. While qualitative
data helps understand the context and meaning, quantitative data supports measurable analysis
and predictions.
Q: How to be aware of and recognize the
challenges that arise in Data Science?
Introduction
Data Science, despite being a powerful tool for extracting insights from data, presents numerous
challenges at every stage of its process. Being aware of these challenges allows data scientists to
prepare in advance, make informed decisions, and improve the accuracy and ethical standards of
their models.
• Anticipate issues at each stage, e.g., missing data during cleaning or overfitting during
modeling.
• Follow industry blogs, research papers, and attend webinars to stay technologically aware.
• Recognize common data quality issues, such as:
o Missing values
o Inconsistent formats
o Duplicate entries
Conclusion
Being aware and proactive about data science challenges is essential for success. Through
continuous learning, ethical mindfulness, and hands-on experience, data scientists can effectively
recognize, mitigate, and overcome obstacles, leading to more robust and trustworthy solutions.
Q: How to understand the process of Data Science by
identifying the problem to be solved?
Introduction
The first and most critical step in any data science project is to identify and define the problem to
be solved. Without a clear understanding of the problem, even the most advanced models will fail to
deliver meaningful results.
Example:
Problem: “Sales are declining.”
Data Science Translation: “Can we build a model to predict future sales or identify customer
segments?”
```
[Real-world Problem]
        ↓
[Understand Business Goal]
        ↓
[Frame as a Data Science Problem]
```
Conclusion
Identifying the right problem is the foundation of a successful data science project. A well-defined,
measurable, and domain-aware problem ensures that all further steps—data collection, modeling,
and evaluation—are aligned toward solving a meaningful issue with actionable insights.
Q: Differentiate between Data Analysis and Data Reporting.
Introduction
In the field of Data Science, both analysis and reporting play critical roles in converting raw data into
meaningful insights. Though often used interchangeably, they serve different purposes in the
decision-making process.
1. Data Analysis
Data Analysis is the process of inspecting, cleaning, transforming, and modeling data to discover
useful patterns, trends, and relationships.
Characteristics:
• Exploratory and investigative in nature
• Answers "why it happened" and "what is likely to happen"
• Uses statistical techniques and predictive models
2. Data Reporting
Data Reporting refers to the presentation of collected and processed data in the form of charts,
tables, and dashboards for business understanding.
Characteristics:
• Involves data summarization
Example:
Generating a weekly report showing total sales by region and product.
Comparison Table:
| Aspect | Data Analysis | Data Reporting |
|---|---|---|
| Purpose | Extract insights and patterns | Present data in a structured form |
| Question answered | Why did it happen? What will happen? | What happened? |
| Output | Insights, models, predictions | Charts, tables, dashboards |
Conclusion
While reporting helps in understanding past and current performance through structured data
presentation, analysis goes a step further to extract deeper insights and build predictive models.
Both are essential in the data science pipeline to support strategic decisions.
Q: What is Big Data? Explain its characteristics.
Big Data refers to datasets so large, fast-growing, and varied that traditional tools cannot store or process them efficiently. These datasets are generated from various sources such as social media, sensors, e-commerce, mobile devices, and IoT.
Characteristics (the V's of Big Data):
| V | Description |
|---|---|
| Volume | Massive amounts of data generated every second |
| Velocity | The speed at which data is generated and processed |
| Variety | Different forms of data (text, images, video, logs) |
| Veracity | The uncertainty and quality of the data |
| Value | The useful insights that can be extracted from the data |
Examples of sources:
• Web clickstreams
Technologies Used:
• Hadoop, Spark – for distributed processing
Conclusion:
Big Data plays a vital role in modern data-driven decision-making. By leveraging the power of big
data analytics, organizations can gain valuable insights, improve efficiency, and create competitive
advantages.
Unit 2
Q: What is JSON Format? Explain with Example.
1. Introduction:
JSON (JavaScript Object Notation) is a text-based format used to store and exchange data.
• Human-readable and easy to write
• Machine-parsable (easy for programs to generate and parse)
• Popular in web APIs and data transfer between frontend–backend or app–server
Example:
```json
{
  "name": "Komendra",
  "age": 22,
  "isStudent": true,
  "address": {
    "city": "Raipur",
    "state": "Chhattisgarh"
  }
}
```
Key–value types in this example:
| Key | Value | Data Type |
|---|---|---|
| "name" | "Komendra" | String |
| "age" | 22 | Number |
| "isStudent" | true | Boolean |
| "address" | { ... } | Object |
Mnemonic:
J – JavaScript Syntax
A – Array & Object supported
S – Structured as key-value
O – Object in curly braces
N – Nested possible
S – Syntax rules (comma, colon)
T – Text-based
A – APIs & Apps use it
R – Readable
S – Supported by many languages
Q: Demonstrate splitting a NumPy array into three equal parts using np.split().
Explanation:
• The original array has 6 rows, which we divide into 3 equal parts (each with 2 rows).
• We use np.split() with axis=0 to split along rows.
• If you wanted to split columns, use axis=1.
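A minimal sketch consistent with the explanation above (the array contents are assumed):

```python
import numpy as np

# A 6-row array (values assumed for illustration)
arr = np.arange(12).reshape(6, 2)

# Split along rows (axis=0) into 3 equal parts of 2 rows each
parts = np.split(arr, 3, axis=0)
for p in parts:
    print(p)
# Prints three 2x2 sub-arrays:
# [[0 1] [2 3]], [[4 5] [6 7]], [[8 9] [10 11]]

# To split columns instead: np.split(arr, 2, axis=1)
```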
Q: Write a Python program to demonstrate the following operations (assume data for each):
1. Inserting a new element in a list
2. Deleting an element from a dictionary
3. Accessing the 3rd to 5th elements from a list
4. Displaying the last four characters of the string "ILovepython"
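A minimal program covering all four operations (the sample list and dictionary values are assumptions):

```python
# 1. Inserting a new element in a list
numbers = [10, 20, 30, 40, 50, 60]      # assumed sample data
numbers.insert(2, 25)                   # insert 25 at index 2
print(numbers)                          # [10, 20, 25, 30, 40, 50, 60]

# 2. Deleting an element from a dictionary
student = {"name": "Komendra", "age": 22, "city": "Raipur"}  # assumed sample data
del student["city"]
print(student)                          # {'name': 'Komendra', 'age': 22}

# 3. Accessing the 3rd to 5th elements from the list (indices 2 to 4)
print(numbers[2:5])                     # [25, 30, 40]

# 4. Displaying the last four characters of "ILovepython"
text = "ILovepython"
print(text[-4:])                        # thon
```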
Q: Explain important Python libraries used in Data Science.
1. NumPy – numerical computing with fast multi-dimensional arrays.
2. Pandas – data manipulation and analysis using DataFrames.
3. Matplotlib – plotting and data visualization.
4. Scikit-learn – classical machine learning (classification, regression, clustering).
5. Tkinter – building desktop GUI applications.
6. BeautifulSoup – parsing HTML/XML pages for web scraping.
7. Flask / Django
• Flask: Lightweight web framework for creating APIs and web apps
• Django: Full-stack web framework with built-in ORM and admin panel
8. OpenCV – computer vision and image processing.
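A combined sketch showing Pandas and Matplotlib together (the sales figures are assumed for illustration):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample sales data (assumed for illustration)
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "sales": [120, 150, 90, 180],
})

print(df.describe())   # Pandas: quick statistical summary

df.plot(x="month", y="sales", kind="bar")   # Matplotlib (via the Pandas plotting API)
plt.title("Monthly Sales")
plt.show()
```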
Part 2: Step-by-Step Process to Generate a Scatter Plot
A scatter plot displays points that show the relationship between two numerical variables.
Step-by-Step Procedure:
1. Import matplotlib.pyplot.
2. Prepare the x and y data points.
3. Call plt.scatter() with the desired options:
• x, y: Data points
• color: Dot color
• marker: Style of the dot ('o' is a circle)
4. Add axis labels and call plt.show() to display the plot.
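Putting these steps together, a minimal sketch (the data points are assumptions):

```python
import matplotlib.pyplot as plt

# Sample data points (assumed)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

plt.scatter(x, y, color="blue", marker="o")  # x, y data; blue circular dots
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Scatter Plot")
plt.show()
```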
Output:
A graph where each (x, y) pair is shown as a dot on the 2D plane, useful for observing correlations or
distributions.
Conclusion
Scatter plots make it easy to spot correlations, clusters, and outliers between two numerical variables.
Q: Explain the role of Web Scraping in Data Science.
Web scraping is the process of extracting data from websites using automated tools. In the context of
data science, it is a vital method for gathering large volumes of data when APIs are not available.
• Market research
• Sentiment analysis
• Price tracking
• News aggregation
• Social media mining
Example:
A data scientist scrapes product prices from e-commerce websites (like Flipkart or Amazon) and uses
the data for price trend analysis or market basket prediction.
Best Practices:
• Respect the website's robots.txt rules and terms of service
• Limit request rates to avoid overloading servers
• Prefer official APIs when they are available
Conclusion
Web scraping is an essential technique in data science to collect raw data from the internet. With
tools like BeautifulSoup, Requests, and Selenium, data scientists can build data pipelines that power
analytics, models, and dashboards.
Q: Write short notes on: (i) Data Manipulation (ii) Rescaling
1. Data Manipulation
Definition:
Data manipulation refers to the process of adjusting, organizing, or transforming raw data to make it
suitable for analysis.
Key Points:
• Involves operations like filtering, sorting, grouping, merging, aggregating, and cleaning data.
• Helps in removing errors, handling missing values, and reshaping datasets.
• Commonly done using tools like Pandas in Python (.filter(), .groupby(), .merge(), .dropna()).
• Essential for preparing data before applying statistical or machine learning models.
Example:
Converting raw sales data into monthly summaries by grouping and aggregating.
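For instance, a minimal Pandas sketch of this example (the raw sales rows are assumed):

```python
import pandas as pd

# Raw sales data (assumed sample)
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18"]),
    "amount": [250, 400, 300, 150],
})

# Monthly summary via grouping and aggregating
monthly = sales.groupby(sales["date"].dt.to_period("M"))["amount"].sum()
print(monthly)   # total sales for 2024-01 and 2024-02
```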
2. Rescaling
Definition:
Rescaling is the process of transforming data values to a common scale without distorting differences
in the ranges of values.
Key Points:
• Common techniques include min-max scaling (mapping values to [0, 1]) and standardization (zero mean, unit variance).
• Important for machine learning algorithms sensitive to the magnitude of data (e.g., KNN, SVM).
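A minimal sketch of the two common rescaling techniques using scikit-learn (the ages are assumed sample values):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One numerical feature with a wide range (values assumed)
ages = np.array([[18], [25], [40], [60]])

print(MinMaxScaler().fit_transform(ages))    # min-max scaling: values mapped to [0, 1]
print(StandardScaler().fit_transform(ages))  # standardization: mean 0, std 1
```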
Unit 4
Question:
(a) Explain different steps for data preprocessing.
Answer:
Introduction:
Data Preprocessing is the process of cleaning, transforming, and organizing raw data into a suitable
format for analysis or model building.
Raw data often contains missing values, errors, duplicates, or irrelevant information, which can
negatively affect model performance.
Preprocessing ensures higher accuracy, efficiency, and reliability of results.
1. Data Cleaning:
• Handle Missing Values: fill with mean/median values or remove incomplete rows.
• Correct Errors: fix inconsistent formats and invalid entries.
• Remove Duplicates: drop repeated records.
2. Data Integration:
• Combine data from multiple sources into a single consistent dataset.
3. Data Transformation:
• Aggregation: summarize data (e.g., totals or averages).
• Normalization and encoding of values into model-ready formats.
4. Data Reduction:
• Reduce data volume (e.g., feature selection, sampling) while preserving information.
6. Data Splitting:
• Divide the dataset into training and testing sets (see the sketch below).
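A minimal end-to-end sketch of cleaning and splitting with Pandas and scikit-learn (the sample rows are assumptions):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Raw data with missing values and a duplicate row (assumed sample)
df = pd.DataFrame({
    "age": [21, None, 35, 35, 28],
    "salary": [30000, 42000, None, None, 38000],
    "bought": [0, 1, 1, 1, 0],
})

df = df.drop_duplicates()                    # remove duplicate records
df = df.fillna(df.mean(numeric_only=True))   # handle missing values with column means

# Split into training and testing sets
X, y = df[["age", "salary"]], df["bought"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print(len(X_train), "training rows,", len(X_test), "testing rows")
```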
(iii) Quartiles:
Quartiles divide sorted data into four equal parts: Q1 (25th percentile), Q2 (the median), and Q3 (75th percentile). The interquartile range IQR = Q3 − Q1 measures spread.
(iv) Variance and Standard Deviation:
Variance is the average squared deviation from the mean, σ² = (1/n) Σ(xᵢ − μ)², and standard deviation σ is its square root. For example, for the data 2, 4, 6: μ = 4, σ² = (4 + 0 + 4)/3 ≈ 2.67, so σ ≈ 1.63.
Question:
Explain different types of probability with example.
Answer:
Introduction:
Probability measures how likely an event is to occur, expressed as a number between 0 (impossible) and 1 (certain).
Types of Probability:
1. Classical (Theoretical) Probability:
• Based on equally likely outcomes known in advance.
• Formula: P(E) = Number of favourable outcomes / Total number of possible outcomes
• Example:
Probability of getting a 3 when rolling a fair die = 1/6
2. Empirical (Experimental) Probability:
• Based on the observed frequency of an event in repeated trials.
• Formula: P(E) = Number of times the event occurred / Total number of trials
• Example:
If you toss a coin 100 times and get heads 55 times,
probability of getting heads = 55/100 = 0.55
3. Subjective Probability:
• Based on personal judgment, intuition, or experience rather than exact calculation.
• Example:
A doctor may believe there is a 90% chance a patient will recover based on their experience,
even without exact medical statistics.
4. Conditional Probability:
• The probability of an event given that another event has already occurred.
• Formula: P(A|B) = P(A ∩ B) / P(B)
• Example:
Probability that a student passed Math, given they passed Science. For instance, if P(passed both) = 0.45 and P(passed Science) = 0.6, then P(Math | Science) = 0.45 / 0.6 = 0.75.
5. Joint Probability:
• The probability of two events occurring together.
• Formula: P(A ∩ B) = P(A) × P(B|A); for independent events, P(A ∩ B) = P(A) × P(B)
• Example:
Probability that a person is a female and likes sports.
Conclusion:
Understanding different types of probability is essential for correctly modeling uncertainty in real-
world scenarios.
Each type is applied based on the nature of the data and the problem being solved.
Question
Difference Between Supervised, Unsupervised, and Reinforcement Learning:
| Aspect | Supervised Learning | Unsupervised Learning | Reinforcement Learning |
|---|---|---|---|
| Examples | Classification (e.g., spam detection), Regression (e.g., house price prediction) | Clustering (e.g., customer segmentation), Association (e.g., market basket analysis) | Game playing (e.g., chess, Go), Robotics, Self-learning systems |
Question:
Define Machine Learning. What are the different
applications of machine learning?
Answer:
1. Healthcare:
• Disease Diagnosis: ML models help in diagnosing diseases like cancer, diabetes, and heart
disease by analyzing medical images (X-rays, MRIs) and patient records.
• Drug Discovery: ML accelerates the process of discovering new drugs by analyzing chemical
properties and predicting their effectiveness.
• Personalized Treatment: ML algorithms recommend personalized treatment plans based on
a patient’s genetic data.
2. Finance:
• Algorithmic Trading: ML models predict stock market trends and assist in automatic trading
based on real-time data.
3. Marketing:
• Customer Segmentation and Recommendations: ML groups customers by behavior and recommends relevant products or advertisements.
4. Autonomous Vehicles:
• Route Optimization: ML is used to suggest the most efficient driving route based on real-
time traffic data.
5. Natural Language Processing (NLP):
• Text Translation: ML models like Google Translate are used to translate text between
different languages.
• Chatbots: Customer service bots use ML to understand and respond to queries in natural
language.
7. Manufacturing:
• Quality Control: ML algorithms inspect products in manufacturing lines to detect defects and
ensure quality standards are met.
8. Cybersecurity:
• Threat Detection: ML models identify abnormal network behavior to detect and prevent
cyberattacks like malware and phishing.
• Spam Filtering: ML algorithms classify emails as spam or not based on content and user
behavior patterns.
9. Agriculture:
• Crop Prediction: ML models predict crop yields based on weather patterns, soil quality, and
other environmental factors.
10. Entertainment:
• Recommendation Systems: Streaming platforms use ML to suggest movies, shows, and music based on viewing history.
Question:
Explain the k-Nearest Neighbors (k-NN) algorithm with a suitable example.
Answer:
k-Nearest Neighbors (k-NN) is a simple, supervised machine learning algorithm used for
both classification and regression tasks.
It is based on the idea that similar data points are close to each other in the feature space.
In k-NN, the output for a new data point is determined by looking at the "k" nearest points
(neighbors) from the training data and making predictions based on their values.
Example: Consider the following training data:
| Height (cm) | Weight (kg) | Class |
|---|---|---|
| 150 | 50 | Student |
| 160 | 55 | Student |
| 170 | 65 | Worker |
| 180 | 80 | Worker |
A new person has:
• Height = 165 cm
• Weight = 58 kg
• We want to predict whether this person is a Student or a Worker.
Steps:
1. Choose the number of neighbors, e.g., k = 3.
2. Calculate the distance (e.g., Euclidean) between the new person and all existing data points.
3. Take the 3 nearest neighbors. Here they turn out to be:
o Student (2 neighbors)
o Worker (1 neighbor)
4. By majority vote, the new person is classified as a Student (see the sketch below).
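A minimal scikit-learn sketch of exactly this example:

```python
from sklearn.neighbors import KNeighborsClassifier

# Training data from the table above: [height_cm, weight_kg]
X = [[150, 50], [160, 55], [170, 65], [180, 80]]
y = ["Student", "Student", "Worker", "Worker"]

model = KNeighborsClassifier(n_neighbors=3)  # k = 3
model.fit(X, y)

# Classify the new person (165 cm, 58 kg)
print(model.predict([[165, 58]]))  # ['Student'] by majority vote
```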
4. Advantages of k-NN:
• Simple to understand and implement.
• No explicit training phase (a "lazy learner").
• Works for both classification and regression.
Question:
Explain different learning models. Compare
supervised and unsupervised learning methods.
Answer:
1. Supervised Learning:
• Learns from labeled data (input-output pairs).
• The model tries to learn the mapping between input and output.
Examples:
• Spam detection (classification), house price prediction (regression)
Algorithms:
• Linear Regression, Decision Trees, SVM, k-NN, Random Forest, Neural Networks
2. Unsupervised Learning:
• Learns from unlabeled data (inputs only, no known outputs).
• The model discovers hidden patterns, groupings, or structure.
Examples:
• Customer segmentation
Algorithms:
• k-Means, Hierarchical Clustering, PCA
3. Reinforcement Learning:
• An agent learns by interacting with an environment, receiving rewards or penalties for its actions.
Examples:
• Self-driving cars
• Robotics
Algorithms:
• Q-Learning, Deep Q-Networks (DQN)
Comparison:
| Aspect | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Data | Uses labeled data (input with known output). | Uses unlabeled data (only inputs, no output labels). |
Question:
Write short notes on:
(i) Support Vector Machine (SVM)
(ii) Decision Tree
(i) Support Vector Machine (SVM)
Definition:
Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification
and regression tasks. It works by finding the optimal hyperplane that best separates the data points
of different classes.
Key Concepts:
• Hyperplane: A decision boundary that separates data into different classes.
• Support Vectors: Data points that are closest to the hyperplane and influence its position
and orientation.
• Margin: The distance between the hyperplane and the support vectors. SVM maximizes this
margin.
Advantages:
• Effective in high-dimensional feature spaces.
• Robust to overfitting because it maximizes the margin.
• Can handle non-linear boundaries using kernel functions.
Example:
In a binary classification problem (e.g., spam vs. not spam), SVM finds the best line (in 2D) or plane
(in higher dimensions) that separates the two classes.
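A minimal scikit-learn sketch of a linear SVM on toy 2-D data (the points are assumed):

```python
from sklearn.svm import SVC

# Toy 2-D points from two separable classes (assumed)
X = [[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear")   # find the maximum-margin hyperplane
clf.fit(X, y)

print(clf.support_vectors_)  # the support vectors closest to the hyperplane
print(clf.predict([[4, 4]])) # classify a new point
```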
(ii) Decision Tree
Definition:
A Decision Tree is a tree-based supervised learning algorithm used for both classification and
regression.
It splits the dataset into smaller subsets based on feature values, forming a tree structure with
decision nodes and leaf nodes.
Structure:
• Root node: the first split applied to the full dataset.
• Decision nodes: test a feature and branch on its value.
• Leaf nodes: hold the final class label or predicted value.
How It Works:
• The algorithm selects the best feature to split on using criteria like:
o Gini Index
o Information Gain (Entropy)
• Splits continue until a stopping condition is met (e.g., max depth, pure leaf).
Advantages:
• Easy to interpret and visualize.
• Handles both numerical and categorical features.
• Requires little data preparation.
Example:
A decision tree could classify whether a person will buy a product based on features like age, income,
and browsing history.
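A minimal scikit-learn sketch of such a tree (the features and labels are assumed toy values):

```python
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumed): [age, income in thousands]; target: bought the product?
X = [[22, 25], [35, 60], [45, 80], [28, 40], [50, 95], [23, 30]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(criterion="gini", max_depth=2)  # split on Gini Index
tree.fit(X, y)
print(tree.predict([[30, 55]]))  # predict for a new person
```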
Unit 5
Question 1:
Discuss the application of Data Science in Weather Forecasting. Explain the techniques used and
challenges faced.
Answer:
Introduction
Weather forecasting involves predicting atmospheric conditions based on the analysis of large-scale
meteorological data. Data Science techniques have revolutionized forecasting by leveraging machine
learning, statistical models, and big data processing.
3. Time-Series Analysis:
o Models such as ARIMA and LSTM learn from sequences of historical observations to forecast temperature and rainfall.
Challenges:
2. Model Uncertainty:
o Inherent unpredictability of weather systems.
3. Computational Resources:
o Processing large-scale meteorological data and running models demands significant computing power.
4. Data Quality:
o Incomplete or noisy data can affect predictions.
Conclusion:
Data Science has enhanced the accuracy and reliability of weather forecasting. Despite challenges,
continuous advancements in algorithms and computing power are improving predictions, helping in
disaster management and planning.
Question 2:
Explain the role of Data Science in Stock Market Prediction. Discuss popular models and ethical
concerns.
Answer:
Introduction
Stock market prediction aims to forecast stock prices and trends using historical data. Data Science
provides tools to analyze large datasets and identify patterns for informed trading decisions.
1. Statistical Models:
o ARIMA (Auto-Regressive Integrated Moving Average):
▪ Captures trends and seasonality in price series.
o GARCH:
▪ Models volatility.
o Reinforcement Learning:
▪ Learns trading strategies through reward-based feedback.
2. Feature Engineering:
o Creating predictive inputs such as moving averages and technical indicators.
Ethical Concerns:
1. Market Manipulation:
o Powerful algorithms could be misused to influence prices unfairly.
2. Insider Trading:
o Models must not exploit non-public information.
3. Economic Inequality:
o Data Science tools might benefit large institutions over retail investors.
Conclusion:
Data Science has transformed stock market prediction, making it more data-driven and algorithmic.
However, ethical considerations and market risks must be carefully managed.
Question 3:
Describe the process of Real-Time Sentiment Analysis. Explain its applications and challenges.
Answer:
Introduction
Real-time sentiment analysis involves analyzing text data (e.g., social media posts) to determine the
sentiment (positive, negative, neutral) as it happens. It is widely used in marketing, finance, and
public opinion monitoring.
1. Data Collection:
o Streaming text data (e.g., tweets, reviews) from social media platforms or APIs.
2. Preprocessing:
o Cleaning text: removing stop words, punctuation, and special characters.
3. Feature Extraction:
o Converting text into numerical features (e.g., TF-IDF, word embeddings).
4. Model Selection:
o Classifiers such as Naive Bayes, LSTM, or transformer-based models.
5. Real-Time Processing:
o Streaming frameworks (e.g., Apache Kafka, Spark Streaming) score sentiment as data arrives.
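A minimal sketch of steps 3–4 with a tiny training corpus (real systems use far larger labeled datasets):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny labeled corpus (assumed for illustration)
texts = ["I love this product", "Great service", "Terrible experience", "I hate the delay"]
labels = ["positive", "positive", "negative", "negative"]

# Feature extraction (TF-IDF) + classifier (Naive Bayes) in one pipeline
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["The service was great"]))  # expected: ['positive']
```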
Applications:
1. Brand Monitoring:
o Tracking public opinion about products and brands as it evolves.
2. Financial Markets:
o Gauging investor mood from news and social media posts.
3. Political Analysis:
o Measuring public sentiment about candidates, parties, and policies.
Challenges:
2. Language Diversity:
o Slang, sarcasm, and multiple languages are difficult to interpret correctly.
3. Scalability:
o Handling high-velocity data streams requires robust infrastructure.
Conclusion:
Real-time sentiment analysis provides actionable insights but requires robust models and
infrastructure to handle linguistic complexities and data flow.
Question 4:
Explain the concept of Object Recognition in Data Science. Discuss its techniques, applications, and
challenges.
Answer:
Introduction
Object Recognition is a computer vision task where a system identifies and classifies objects within
an image or video. It is a core technology behind many AI applications, combining techniques from
machine learning, deep learning, and image processing.
Techniques:
1. Traditional Methods:
▪ Use hand-crafted features like SIFT (Scale-Invariant Feature Transform) and HOG (Histogram of Oriented Gradients).
2. Deep Learning Methods:
o Object Detection:
▪ R-CNN, Fast R-CNN, Faster R-CNN: Propose regions and classify them.
o Instance Segmentation:
▪ Mask R-CNN extends object detection by also generating segmentation masks for each object.
Example Workflow:
1. Data Collection:
o Images with labeled objects (e.g., COCO dataset).
2. Data Preprocessing:
o Resizing, normalization, and augmentation (rotation, flipping).
3. Model Training:
o CNNs trained with annotated data.
Applications:
1. Autonomous Vehicles:
o Detecting pedestrians, vehicles, and traffic signs in real time.
Challenges:
1. Variations in Objects:
o Changes in scale, lighting, viewpoint, and occlusion make recognition difficult.
2. Real-Time Performance:
o Requires fast and efficient models for applications like self-driving cars.
3. Data Requirements:
o Deep models need large volumes of accurately labeled images.
4. False Positives/Negatives:
o Misclassifications can have serious consequences in critical applications.
Conclusion:
Object recognition is a critical area in AI and Data Science with powerful applications across
industries. While deep learning has made major advancements, challenges like real-time accuracy
and robustness remain areas of active research.
Question:
Differentiate between Linear Regression and Logistic Regression.
| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Equation | y = mx + c | P(y = 1) = 1 / (1 + e^(−(mx + c))) |
| Loss Function Used | Mean Squared Error (MSE) | Log Loss / Cross-Entropy Loss |
| Output Interpretation | Predicts actual (continuous) values. | Predicts the probability of a class. |
Question:
What is predictive analysis? Write the steps for weather forecasting analysis using data science.
Answer:
Predictive Analysis:
Predictive Analysis refers to the use of data, statistical algorithms, and machine learning techniques
to identify the likelihood of future outcomes based on historical data.
It focuses on forecasting what might happen under specific conditions by finding patterns and trends.
Key points:
• It is not just descriptive (what happened) but predictive (what is likely to happen).
• Commonly used in fields like marketing, finance, healthcare, and weather forecasting.
1. Data Collection:
• Gather historical weather data from sources like satellites, weather stations, and sensors.
2. Data Preprocessing:
• Normalization: Scale the features to a standard range for better model performance.
3. Feature Engineering:
• Derive informative features such as moving averages of temperature, humidity, and pressure changes.
4. Model Selection:
• Choose appropriate models for prediction: regression models, Random Forests, or LSTM networks for time-series data.
5. Model Training and Evaluation:
• Use evaluation metrics like MAE (Mean Absolute Error), RMSE (Root Mean Squared Error).
6. Prediction:
• Use the trained model to forecast future weather conditions.
7. Visualization:
• Present forecasts through charts, maps, and dashboards.
8. Deployment and Monitoring:
• Deploy the predictive model into real-world systems (e.g., weather apps, news platforms).
• Continuously monitor and update the model as new data becomes available.
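As a toy illustration of the idea, a trend-based forecast sketch (the temperatures are assumed; real systems use far richer features and models):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical daily mean temperatures for the past week (°C)
temps = np.array([30.1, 30.4, 31.0, 30.8, 31.5, 31.9, 32.2])

# Use the day index as a simple trend feature
X = np.arange(len(temps)).reshape(-1, 1)
model = LinearRegression().fit(X, temps)

# Forecast the next day's temperature from the fitted trend
print("Day 8 forecast:", model.predict([[7]])[0])
```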
Conclusion:
Predictive analysis enables accurate and timely weather forecasting, which is vital for agriculture,
disaster management, and daily life planning. Using data science methods ensures more reliable
predictions by leveraging vast amounts of historical and real-time data.
Question:
How can stock market analysis be performed using machine learning? Explain with an example.
Answer:
Introduction:
Stock market analysis using Machine Learning (ML) involves applying algorithms to historical and
real-time stock data to predict future stock prices or trends.
Machine learning models can find hidden patterns, correlations, and trends that human analysis
might miss.
1. Data Collection:
o Historical stock prices (open, high, low, close).
o Volume traded.
o Macroeconomic indicators (e.g., interest rates, GDP).
2. Data Preprocessing:
• Handle missing values and normalize price and volume data.
• Feature Engineering:
o Create technical indicators (e.g., Moving Average, RSI).
3. Model Selection:
o Time-Series Models:
▪ LSTM (Long Short-Term Memory networks) for sequential stock data.
4. Training and Testing:
• Split the dataset into training and testing sets (e.g., 80% training, 20% testing).
5. Prediction:
• Use the trained model to predict the next day's price or movement direction.
Example:
Suppose we want to predict if the stock price of Company X will rise tomorrow.
• Data Used:
o Last 5 years' daily stock prices (open, high, low, close, volume).
• Steps:
o Prepare features and target variable (1 if tomorrow's closing > today's, else 0).
o Train a classifier on the historical features and evaluate it on held-out data.
o Use the trained model to predict tomorrow's movement (see the sketch below).
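A compact sketch of these steps (the prices are hypothetical; a real pipeline would load 5 years of data instead):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical daily closing prices
df = pd.DataFrame({"close": [100, 102, 101, 105, 107, 106, 110, 108, 112, 115]})

df["ma3"] = df["close"].rolling(3).mean()                          # 3-day moving average feature
df["target"] = (df["close"].shift(-1) > df["close"]).astype(int)   # 1 if tomorrow's close is higher
df = df.dropna()

X, y = df[["close", "ma3"]], df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```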
Conclusion:
Machine Learning enables efficient, data-driven stock market analysis by predicting price trends or
movement directions. However, stock markets are highly volatile, so even the best models should be
used cautiously along with financial knowledge and risk management strategies.
Question:
Define case study of data science application.
Answer:
Definition:
A case study of data science application refers to a detailed examination of how data science
techniques are applied to solve real-world problems in different domains like healthcare, finance,
weather forecasting, retail, and more.
It shows the end-to-end process — from collecting and processing data, applying machine learning
or statistical models, to making decisions or predictions based on insights.
A typical case study includes:
1. Problem Definition:
o Identifying the real-world problem to be solved.
2. Data Collection:
o Gathering relevant and sufficient data from different sources.
3. Data Preprocessing:
o Cleaning and preparing data (handling missing values, normalization).
4. Model Building:
o Applying machine learning or statistical models to the prepared data.
5. Evaluation:
o Measuring model performance using suitable metrics.
6. Decision Making:
o Using the model's results to make decisions or take action in real-world systems.
Example:
• Weather Forecasting:
Using historical weather data and machine learning models like LSTM to predict temperature
and rainfall.
Conclusion:
Case studies help demonstrate the practical use of data science methods and show how they add
value by solving complex real-world problems. They are important for learning and improving the
application of data-driven technologies across industries.
Question:
Illustrate Stock Market Prediction.
Answer:
Introduction:
Stock Market Prediction is the process of forecasting future stock prices or stock market trends using
historical data and various statistical, machine learning, or deep learning techniques.
The goal is to help investors make informed decisions about buying, selling, or holding stocks.
1. Problem Definition:
• Objective: Predict whether the price of a stock (e.g., TCS stock) will rise or fall tomorrow.
2. Data Collection:
o Daily stock prices (open, high, low, close).
o Volume traded.
3. Data Preprocessing:
• Handle missing values.
• Normalize the price and volume data.
4. Feature Selection:
o Moving averages.
o Trading volume.
5. Model Building:
• Train a classification model (e.g., logistic regression or Random Forest) to predict whether the price will rise or fall.
Example Illustration:
Suppose:
• You collected 5 years of daily closing prices for Reliance Industries.
• Created a feature: 10-day moving average.
• After training, your model predicts with 80% accuracy whether the stock will rise or fall
tomorrow.
• Based on the model's output, you decide whether to buy, sell, or hold the stock.
Conclusion:
Stock market prediction using data science methods like machine learning helps in making better
investment decisions.
However, because the market is influenced by unpredictable external factors (like political events),
no prediction model can be 100% accurate.
Thus, stock prediction models should be used carefully along with expert advice.