AAM 1st Unit QB
Define supervised learning and list types of supervised learning algorithms.
Supervised learning is a type of machine learning where the model is trained on a labelled dataset,
meaning that each input data point is associated with a corresponding target label. The goal is to
learn a mapping from inputs to outputs, based on the labelled examples provided during training, so
that the model can make predictions on unseen data.
1. Linear Regression
2. Logistic Regression
3. Decision Trees
4. Random Forest
5. Naive Bayes
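As a quick illustration of the supervised setup described above, here is a minimal sketch, assuming scikit-learn is installed. It trains one of the listed algorithms (logistic regression) on a labelled dataset and predicts labels for unseen inputs.

```python
# Minimal supervised-learning sketch (assumes scikit-learn is installed):
# fit a classifier on labelled data, then predict on unseen inputs.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # labelled examples: features X, targets y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=200)   # one of the listed algorithms
model.fit(X_train, y_train)                # learn the mapping from inputs to outputs
print(model.predict(X_test[:5]))           # predictions on unseen data
```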
List out feature engineering techniques.
Feature engineering techniques:
1. Feature Selection
2. Feature Transformation
3. Feature Creation
4. Feature Scaling
5. Feature Extraction
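To make a few of these concrete, the following sketch (assuming scikit-learn) applies feature scaling, feature selection, and feature extraction to the same dataset; the specific choices (z-score scaling, SelectKBest, PCA) are illustrative, not the only options.

```python
# Illustrative sketch (assumes scikit-learn) of three techniques above:
# feature scaling, feature selection, and feature extraction.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

X_scaled = StandardScaler().fit_transform(X)                          # feature scaling (z-score)
X_selected = SelectKBest(f_classif, k=2).fit_transform(X_scaled, y)   # feature selection
X_extracted = PCA(n_components=2).fit_transform(X_scaled)             # feature extraction
print(X_selected.shape, X_extracted.shape)
```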
5. List out feature engineering techniques for text data.
Feature engineering techniques for text data:
1. Tokenization
2. Stopwords Removal
3. Stemming
4. Lemmatization
5. N-grams
6. Word Embeddings
7. Topic Modeling
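The sketch below (assuming scikit-learn) illustrates three of these techniques, tokenization, stopword removal, and n-grams, in one pass with CountVectorizer; the toy corpus is made up for illustration.

```python
# Sketch (assumes scikit-learn) of tokenization, stopword removal,
# and n-gram features on a hypothetical toy corpus.
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["the cat sat on the mat", "the dog chased the cat"]

# Tokenizes each document, drops English stopwords, and builds
# unigram + bigram (n-gram) count features.
vectorizer = CountVectorizer(stop_words="english", ngram_range=(1, 2))
X = vectorizer.fit_transform(corpus)
print(vectorizer.get_feature_names_out())  # vocabulary of tokens and bigrams
print(X.toarray())                         # document-term count matrix
```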
Explain the steps involved in training a supervised learning model.
1. **Data Collection**: Gather the data needed to train the model. This data should include both
input features and corresponding target labels.
2. **Data Preprocessing**:
- **Cleaning**: Handle missing values, outliers, or errors in the dataset.
- **Feature Engineering**: Transform, select, or create features to represent the data effectively.
- **Normalization/Scaling**: Scale the features to ensure they are on a similar scale, which can
improve the performance of some algorithms.
- **Encoding**: Encode categorical variables into numerical format if necessary.
3. **Splitting the Data**: Divide the dataset into two or three subsets:
- **Training Set**: The portion of data used to train the model.
- **Validation Set**: (Optional) Used to tune hyperparameters and evaluate model performance
during training.
- **Test Set**: Used to evaluate the final model's performance after training.
4. **Model Selection**: Choose a suitable algorithm for the task, such as one of the supervised
learning algorithms listed earlier.
5. **Model Training**: Fit the chosen model to the training set so that it learns the mapping from
input features to target labels.
6. **Model Evaluation**:
- **Validation**: Evaluate the model's performance on the validation set. Adjust hyperparameters
if necessary to improve performance.
- **Test**: Evaluate the final trained model on the test set to assess its performance on unseen
data and ensure generalization.
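An end-to-end sketch of these steps, assuming scikit-learn: data collection, splitting, scaling (fit on the training set only, to avoid leaking test information into preprocessing), model training, and evaluation on the held-out test set.

```python
# End-to-end sketch (assumes scikit-learn) of the training workflow above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)                  # data collection
X_train, X_test, y_train, y_test = train_test_split(        # splitting the data
    X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)                      # scaling: fit on train only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # model training
print(accuracy_score(y_test, model.predict(X_test)))             # model evaluation
```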
Explain the steps involved in feature extraction.
1. **Understanding the Data**: Begin by understanding the nature of the data and the problem at
hand. This involves analyzing the data's structure, distributions, relationships between variables, and
the specific requirements of the machine learning task.
2. **Feature Selection**:
- Identify relevant features that are likely to have predictive power for the target variable.
- Remove irrelevant features that do not contribute to the predictive performance or may introduce
noise into the model.
3. **Feature Transformation**:
- Scale or normalize numerical features to ensure they have similar ranges and distributions.
Common techniques include min-max scaling or z-score normalization.
- Transform skewed distributions using techniques like logarithmic or Box-Cox transformations to
make them more symmetrical.
4. **Feature Scaling**:
- Scale the features to ensure that they have similar magnitudes and do not dominate the model
training process. This is particularly important for algorithms sensitive to feature scales, such as
gradient descent-based optimization algorithms.
By following these steps, feature extraction can effectively transform raw data into informative
features that enable machine learning models to learn patterns and make accurate predictions on
new, unseen data.
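As a small worked example of the transformation and scaling steps above, assuming NumPy and scikit-learn: a log transform compresses a right-skewed feature, and min-max scaling then brings it to a common [0, 1] range. The feature values are made up for illustration.

```python
# Sketch (assumes NumPy and scikit-learn) of feature transformation
# and feature scaling on a hypothetical right-skewed feature.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

skewed = np.array([[1.0], [10.0], [100.0], [1000.0]])   # right-skewed feature

log_transformed = np.log1p(skewed)                      # log transform: compress the long tail
scaled = MinMaxScaler().fit_transform(log_transformed)  # min-max scaling to [0, 1]
print(scaled.ravel())
```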