IDA - Sample Questions FA1
1. Which of the following is a popular Python library used for data manipulation?
a) NumPy
b) Pandas
c) Matplotlib
d) Seaborn
2. In Exploratory Data Analysis (EDA), which plot is most commonly used to
visualize the frequency distribution of a single numerical variable?
a) Scatter plot
b) Histogram
c) Box plot
d) Heatmap
3. Which of the following is an example of a supervised learning algorithm?
a) K-means clustering
b) Linear Regression
c) Principal Component Analysis (PCA)
d) DBSCAN
4. Which Pandas function is used to load a CSV file into a DataFrame?
a) read_file()
b) load_csv()
c) read_csv()
d) import_csv()
5. Which of the following metrics is used to evaluate the performance of a
classification model?
a) Mean Squared Error (MSE)
b) Accuracy
c) R-squared
d) Adjusted R-squared
1. You have been given a dataset data.csv containing information about customer
purchases. Write a Python script using Pandas to perform the following tasks:
a) Load the dataset into a DataFrame.
b) Display the first 5 rows of the DataFrame.
c) Handle missing values in the numerical columns by filling them with the mean
of the respective column.
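One possible answer sketch for question 1 follows. It assumes nothing about the real contents of data.csv: the column names customer_id, amount and quantity and the in-memory sample are hypothetical stand-ins so that the script runs on its own. In the actual task, the file name "data.csv" would be passed to read_csv directly.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (assumed columns, made-up rows).
# For the real task, use: df = pd.read_csv("data.csv")
sample_csv = io.StringIO(
    "customer_id,amount,quantity\n"
    "1,20.0,2\n"
    "2,,1\n"
    "3,40.0,\n"
    "4,30.0,3\n"
)

# a) Load the dataset into a DataFrame.
df = pd.read_csv(sample_csv)

# b) Display the first 5 rows.
print(df.head())

# c) Fill missing values with the mean of each numeric column.
#    numeric_only=True skips text columns, which have no mean.
df = df.fillna(df.mean(numeric_only=True))
print(df)
```

Non-numeric columns are left untouched by this approach; they would need a different strategy, such as filling with the most frequent value.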
2. Given a DataFrame df with a categorical column Category and a numerical
column Sales, write a Python script to create a bar plot showing the total sales
for each category.
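A minimal sketch of one way to answer question 2 follows. The rows in df are made up for illustration, the output file name sales_by_category.png is hypothetical, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data with the two columns named in the question.
df = pd.DataFrame(
    {
        "Category": ["A", "B", "A", "C", "B"],
        "Sales": [100, 200, 150, 50, 300],
    }
)

# Sum Sales within each Category, then draw one bar per category.
totals = df.groupby("Category")["Sales"].sum()
ax = totals.plot(kind="bar")
ax.set_xlabel("Category")
ax.set_ylabel("Total sales")
ax.set_title("Total sales by category")

plt.tight_layout()
plt.savefig("sales_by_category.png")  # hypothetical output file name
```

In an interactive session, plt.show() could replace the savefig call.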
3. Using the Python library Scikit-learn, write a script to split the dataset df into
training and testing sets with 80% of the data for training and 20% for testing.
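For question 3, a possible sketch using train_test_split from sklearn.model_selection is shown below. The ten-row df and its column names are invented for illustration, and random_state=42 is an arbitrary choice made only so the split is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up ten-row dataset; the column names are assumptions.
df = pd.DataFrame(
    {
        "feature": range(10),
        "target": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    }
)

# test_size=0.2 keeps 20% of the rows for testing and 80% for training.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

print(len(train_df), "training rows")
print(len(test_df), "testing rows")
```

When features and labels are held separately, the same function accepts both, as in train_test_split(X, y, test_size=0.2), and returns four pieces.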
4. Explain the working principle of a Decision Tree algorithm. How does a Decision
Tree make decisions at each node, and what are the criteria used to split the
data?
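To make the splitting criterion in question 4 concrete, the sketch below computes Gini impurity, one commonly used criterion (entropy-based information gain is another), and searches a single numeric feature for the threshold with the lowest weighted impurity. The helper names gini and best_threshold and the ages/bought data are hypothetical. Library implementations typically test midpoints between sorted values and repeat the search over every feature at every node; this sketch simplifies both.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a split must send samples to both children
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Made-up data: customer age and whether a purchase was made.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)
```

Here the split at age <= 30 separates the two classes completely, so both child nodes are pure and the weighted impurity is zero.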
5. Evaluate the performance of a Decision Tree model by explaining how the
Accuracy score and Confusion Matrix can be used to assess its effectiveness.
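As an illustration for question 5, the sketch below computes both measures with accuracy_score and confusion_matrix from sklearn.metrics. The label lists are made up. In practice y_pred would come from calling predict on a fitted Decision Tree with the test features.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # stand-in for model.predict(X_test)

# Accuracy: fraction of predictions that match the true labels.
accuracy = accuracy_score(y_true, y_pred)

# Confusion matrix: rows are true classes, columns are predicted classes.
# With labels=[0, 1] the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print("Accuracy:", accuracy)
print(cm)
```

Six of the eight predictions are correct, giving an accuracy of 0.75, while the matrix shows that the two errors are one false positive and one false negative. Accuracy alone cannot reveal that breakdown, which matters most when the classes are imbalanced.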