ML in a Nutshell
1. Supervised Learning
Supervised learning learns a mapping from labeled input-output pairs.
A. Linear Regression
• Pros:
o Simple and easy to interpret
o Fast and computationally inexpensive
• Cons:
o Assumes linearity
o Sensitive to outliers and multicollinearity
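As a quick illustration, simple (one-feature) linear regression can be fit in closed form; the sketch below uses hypothetical toy data and plain Python:

```python
# Minimal ordinary least squares for one feature, fit in closed form.
# Toy data (hypothetical): y is exactly 2*x + 1, so the fit is exact.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x); intercept from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

predict = lambda x: intercept + slope * x
```

With noisy real data the recovered slope and intercept would only approximate the generating values, and outliers would pull the fit, which is the sensitivity noted above.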
B. Logistic Regression
• Pros:
o Efficient and interpretable
o Works well with linearly separable data
• Cons:
o Poor performance with non-linear data
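Logistic regression is typically fit by minimizing the log-loss; a minimal batch gradient-descent sketch on hypothetical, linearly separable 1-D data:

```python
import math

# Toy 1-D data: class 1 when x > 0 (linearly separable by construction).
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]

w, b = 0.0, 0.0          # weight and bias
lr = 0.5                 # learning rate

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Batch gradient descent on the log-loss.
for _ in range(500):
    gw = gb = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        gw += (p - y) * x
        gb += (p - y)
    w -= lr * gw / len(data)
    b -= lr * gb / len(data)

predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
```

On separable data like this the decision boundary is recovered easily; on data that is not linearly separable, this model underperforms, as noted in the cons.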
C. Decision Trees
• Pros:
o Easy to understand and visualize
o Can capture non-linear relationships
• Cons:
o Prone to overfitting
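The core operation of a decision tree is choosing a split; a minimal depth-1 tree (a "decision stump") selected by Gini impurity, on hypothetical data:

```python
# A depth-1 decision tree ("stump"): pick the threshold on one feature
# that minimises weighted Gini impurity. Toy data is hypothetical.
def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)       # fraction of class 1
    return 2 * p * (1 - p)

def best_split(xs, ys):
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(xs, ys)
```

A full tree applies this split search recursively to each side, which is where the overfitting risk comes from: deep trees can memorize noise.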
D. Support Vector Machines (SVM)
• Pros:
o Effective in high-dimensional spaces
o Can handle non-linear data using kernel trick
• Cons:
o Choosing the right kernel is challenging
o Requires feature scaling
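A linear SVM can be trained with sub-gradient descent on the hinge loss; the sketch below is a from-scratch illustration on hypothetical 1-D data (real use would rely on a library implementation, and the kernel trick is not shown):

```python
import random

# Linear SVM via stochastic sub-gradient descent on the hinge loss.
# Labels are +1 / -1. Toy 1-D data, hypothetical values.
data = [(-3.0, -1), (-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1), (3.0, 1)]

w, b = 0.0, 0.0
lr, lam = 0.01, 0.01     # learning rate and regularisation strength
random.seed(0)

for _ in range(2000):
    x, y = random.choice(data)
    margin = y * (w * x + b)
    # sub-gradient of  lam * w**2 / 2 + max(0, 1 - margin)
    if margin < 1:
        w -= lr * (lam * w - y * x)
        b -= lr * (-y)
    else:
        w -= lr * (lam * w)

predict = lambda x: 1 if w * x + b >= 0 else -1
```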
E. K-Nearest Neighbors (KNN)
• Pros:
o Simple and effective
o No training phase
• Cons:
o Slow for large datasets
o Sensitive to irrelevant features and feature scaling
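KNN has no training phase: prediction is just a vote among the k closest stored points, which is also why queries are slow on large datasets. A minimal sketch on hypothetical 2-D data:

```python
from collections import Counter

# k-nearest-neighbours classifier: store the data and, at query time,
# vote among the k closest points. Toy 2-D data, hypothetical values.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

def predict(point, k=3):
    dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    neighbours = sorted(train, key=lambda item: dist(item[0], point))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]
```

Because the distance mixes all features equally, unscaled or irrelevant features distort the neighbourhoods, matching the cons above.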
F. Naïve Bayes
• Pros:
o Works well with high-dimensional data
o Fast and requires little training data
• Cons:
o Assumes feature independence, which rarely holds
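Gaussian naive Bayes fits one independent normal distribution per feature per class, then picks the class with the highest log-posterior; a from-scratch sketch on hypothetical 1-feature data:

```python
import math
from collections import defaultdict

# Gaussian naive Bayes: per class, fit a normal distribution to each
# feature; predict the class with the highest log-posterior.
# Toy 1-feature data with hypothetical labels.
samples = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
           (4.0, "high"), (4.3, "high"), (3.7, "high")]

by_class = defaultdict(list)
for x, label in samples:
    by_class[label].append(x)

stats = {}                               # class -> (mean, variance, prior)
for label, xs in by_class.items():
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    stats[label] = (mu, var, len(xs) / len(samples))

def log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(x):
    return max(stats, key=lambda c: math.log(stats[c][2])
               + log_gauss(x, stats[c][0], stats[c][1]))
```

The "naive" part is exactly the per-feature independence assumption criticised above; with more than one feature the log-densities would simply be summed.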
G. Linear Discriminant Analysis (LDA)
• Pros:
o Effective in finding linear combinations for class separation
• Cons:
o Assumes Gaussian distribution and equal covariance
H. Ensemble Methods
i. Random Forest
• Pros:
o Reduces overfitting compared to individual trees
o Handles large datasets well
• Cons:
o Can be slow and memory-intensive
ii. Bagging
• Pros:
o Reduces variance
• Cons:
o Less interpretable
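The variance-reduction effect of bagging can be seen even without trees: an average of many bootstrap estimates fluctuates less than a single one. A minimal illustration with hypothetical data:

```python
import random
import statistics

# Bagging in miniature: average many bootstrap-resampled estimates.
# The aggregated estimate varies less across repetitions than a single
# resampled estimate, illustrating variance reduction.
random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(200)]

def bootstrap_mean():
    resample = [random.choice(data) for _ in range(len(data))]
    return statistics.mean(resample)

single_estimates = [bootstrap_mean() for _ in range(50)]
bagged_estimates = [statistics.mean(bootstrap_mean() for _ in range(25))
                    for _ in range(50)]

var_single = statistics.pvariance(single_estimates)
var_bagged = statistics.pvariance(bagged_estimates)
```

In a real bagging ensemble the base estimator would be a model (usually a decision tree) rather than a sample mean, but the averaging principle is the same.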
iii. Boosting (e.g., XGBoost, LightGBM)
• Pros:
o High predictive power
o Handles missing data and categorical features (LightGBM)
• Cons:
o Prone to overfitting if not tuned properly
2. Unsupervised Learning
Unsupervised learning finds patterns in unlabeled data.
A. K-Means Clustering
• Pros:
o Simple and scalable
o Efficient for large datasets
• Cons:
o Requires predefined number of clusters
o Sensitive to outliers and scaling
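K-means alternates between assigning points to the nearest centroid and moving each centroid to its cluster mean; a from-scratch sketch on hypothetical 1-D data:

```python
import random

# K-means on 1-D data: alternate between assigning each point to its
# nearest centroid and moving centroids to their cluster means.
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

points = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
centroids = kmeans(points, k=2)
```

Note that k must be chosen up front, and a single far-away outlier would drag its centroid toward it, illustrating both cons above.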
B. Hierarchical Clustering
• Pros:
o Dendrogram provides visual intuition
o No need to predefine number of clusters
• Cons:
o Computationally intensive
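Agglomerative (bottom-up) hierarchical clustering starts with singleton clusters and repeatedly merges the closest pair; a single-linkage sketch on hypothetical 1-D data (the all-pairs search is what makes the method computationally intensive):

```python
# Agglomerative clustering (single linkage) on 1-D points: start with
# every point in its own cluster and merge the two closest clusters
# until the desired number remains.
def agglomerate(points, target_clusters):
    clusters = [[p] for p in points]

    def linkage(a, b):
        # single linkage: distance between the closest pair of members
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > target_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

clusters = agglomerate([1.0, 1.2, 5.0, 5.1, 9.9, 10.0], 3)
```

Recording the sequence of merges (rather than stopping at a target count) is what produces the dendrogram mentioned in the pros.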
C. Principal Component Analysis (PCA)
• Pros:
o Helps in visualization and removing multicollinearity
• Cons:
o May lose interpretability
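For 2-D data the first principal component has a closed form: the angle theta = 0.5 * atan2(2*c_xy, c_xx - c_yy) of the covariance matrix gives its direction. A sketch on hypothetical data lying near the line y = x, where the component should point along (1, 1):

```python
import math

# First principal component of 2-D data via the closed-form angle of
# the 2x2 covariance matrix. Toy data is hypothetical and lies almost
# on the line y = x.
pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.0)]

n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
cxx = sum((p[0] - mx) ** 2 for p in pts) / n
cyy = sum((p[1] - my) ** 2 for p in pts) / n
cxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n

theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
component = (math.cos(theta), math.sin(theta))
```

In higher dimensions one would use an eigendecomposition or SVD of the covariance matrix instead; either way the resulting components are linear mixtures of the original features, which is the interpretability cost noted above.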
3. Recommender Systems
A. Collaborative Filtering
• Pros:
o Personalized recommendations
• Cons:
o Suffers from cold start and sparsity
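A user-based collaborative-filtering sketch: score an unseen item for a user by similarity-weighted ratings from other users. The ratings dictionary below is hypothetical:

```python
import math

# User-based collaborative filtering: predict a rating as the
# similarity-weighted average of other users' ratings for that item.
ratings = {
    "alice": {"film1": 5, "film2": 4, "film3": 1},
    "bob":   {"film1": 5, "film2": 5, "film3": 1, "film4": 5},
    "carol": {"film1": 1, "film2": 1, "film3": 5, "film4": 1},
}

def cosine(u, v):
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(v[i] ** 2 for i in common)))
    return num / den

def predict(user, item):
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else 0.0

score = predict("alice", "film4")
```

A brand-new user with no ratings gets similarity 0 to everyone, which is the cold-start problem in miniature.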
B. Matrix Factorization
• Pros:
o Effective for large, sparse matrices
• Cons:
o Requires careful tuning of the decomposition (e.g., rank and regularization)
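The idea behind matrix factorization for recommenders (as in FunkSVD-style methods) can be sketched with stochastic gradient descent on the observed entries; the ratings matrix and hyperparameters below are hypothetical:

```python
import random

# Matrix factorisation by SGD: approximate a ratings matrix R as the
# product of user factors P and item factors Q with k latent factors.
# Zero entries mark missing ratings. Toy matrix, hypothetical values.
R = [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [1, 0, 5, 4],
     [0, 1, 4, 5]]

k, lr, lam = 2, 0.02, 0.02
rng = random.Random(0)
P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(len(R))]
Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(len(R[0]))]

observed = [(i, j) for i, row in enumerate(R)
            for j, r in enumerate(row) if r > 0]

for _ in range(5000):
    i, j = rng.choice(observed)
    pred = sum(P[i][f] * Q[j][f] for f in range(k))
    err = R[i][j] - pred
    for f in range(k):
        pif, qjf = P[i][f], Q[j][f]       # update simultaneously
        P[i][f] += lr * (err * qjf - lam * pif)
        Q[j][f] += lr * (err * pif - lam * qjf)

def predict(i, j):
    return sum(P[i][f] * Q[j][f] for f in range(k))
```

The predictions for zero (unobserved) entries are the recommendations; the rank k, learning rate, and regularization strength all need tuning, per the con above.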
C. Implicit vs. Explicit Feedback
Explicit feedback (e.g., star ratings) is direct but sparse; implicit feedback (e.g., clicks, views) is abundant but noisier.
4. Model Evaluation and Tuning
B. Cross-Validation
• Pros:
o More reliable model evaluation
• Cons:
o Slower training process
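In k-fold cross-validation every sample is used for validation exactly once, giving k evaluation scores instead of one; the index-splitting logic is a few lines:

```python
# k-fold cross-validation split: partition indices 0..n-1 into k folds;
# each fold serves as the validation set once, the rest as training.
def k_fold_indices(n, k):
    folds = []
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        val_set = set(val)
        train = [i for i in range(n) if i not in val_set]
        folds.append((train, val))
        start += size
    return folds

splits = k_fold_indices(10, 3)
```

The "slower training" con follows directly: the model is retrained k times, once per split.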
5. Optimization
A. Gradient Descent
• Pros:
o General optimization algorithm
• Cons:
o May converge to local minima
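Gradient descent in one dimension, minimizing the convex function f(x) = (x - 3)^2 with gradient f'(x) = 2(x - 3):

```python
# Gradient descent on f(x) = (x - 3)^2: repeatedly step against the
# gradient. For this convex function it converges to the minimum x = 3.
x = 0.0                  # starting point
lr = 0.1                 # learning rate (step size)
for _ in range(100):
    grad = 2 * (x - 3)
    x -= lr * grad
```

For non-convex functions the same procedure can stall in a local minimum, which is the con listed above; the learning rate also matters, since too large a step diverges and too small a step converges slowly.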
6. General Concepts
• Supervised vs. Unsupervised Learning – Learning from labeled data vs. discovering patterns in unlabeled data
• Bias-Variance Tradeoff – Underfitting vs. overfitting
• Occam’s Razor – Prefer simpler models
• Data Filtering – Preprocessing step for cleaning data