Fam 2023 Winter Micro
### (c) History and Evolution of ML
1. **1940s-50s**: Foundations (McCulloch-Pitts neuron, Turing's learning machines)
2. **1950s-60s**: Early algorithms (perceptron, nearest neighbor)
3. **1960s-70s**: AI winter due to limitations
4. **1980s**: Revival with backpropagation, decision trees
5. **1990s**: Practical applications (SVM, boosting, web applications)
6. **2000s**: Big data era (ensemble methods, deep learning emerges)
7. **2010s**: Deep learning dominance (AlexNet, transformers)
8. **2020s**: Large language models, multimodal learning
Key milestones: Turing Test (1950), Perceptron (1957), Backpropagation (1986), ImageNet (2012), GPT (2018+)
**Bayes' Theorem**: P(A|B) = P(B|A) · P(A) / P(B)
Where:
- P(A|B) = Posterior probability (of A given B)
- P(B|A) = Likelihood (of B given A)
- P(A) = Prior probability of A
- P(B) = Marginal probability of B
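The formula can be checked numerically. The sketch below uses illustrative numbers (a medical test with assumed sensitivity, false-positive rate, and prevalence — not values from the question), computing the marginal P(B) via the law of total probability:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative assumptions: 99% sensitivity, 5% false-positive rate,
# 1% prevalence of condition A.
p_b_given_a = 0.99       # likelihood P(B|A)
p_a = 0.01               # prior P(A)
p_b_given_not_a = 0.05   # false-positive rate

# Marginal P(B) by total probability: P(B) = P(B|A)P(A) + P(B|¬A)P(¬A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
posterior = p_b_given_a * p_a / p_b
print(f"P(A|B) = {posterior:.3f}")  # → P(A|B) = 0.167
```

Note that despite the high sensitivity, the low prior keeps the posterior modest — the classic base-rate effect.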
### (b) Data Analytics vs. Data Science
| Aspect | Data Analytics | Data Science |
|--------|----------------|--------------|
| Focus | Analyzing data to answer specific questions | Extracting insights from data using scientific methods |
| Scope | Narrower, focused on business metrics | Broader, includes analytics plus more |
| Techniques | Descriptive statistics, visualization | Machine learning, predictive modeling |
| Goal | Find patterns, support decision-making | Build data products, predictive models |
| Tools | Excel, Tableau, SQL | Python/R, TensorFlow, big data tools |
| Output | Reports, dashboards | Models, algorithms, systems |
| Skills | Business intelligence, SQL | Programming, statistics, ML |
| Data Size | Typically smaller datasets | Often big data |
| Question Types | What happened? Why? | What will happen? How can we make it happen? |
Key points about K-Means clustering:
- Requires predefined K (number of clusters)
- Uses Euclidean distance metric
- Sensitive to initial centroids (solution: k-means++)
- Convergence guaranteed but may be to local optimum
- Applications: Customer segmentation, image compression, anomaly detection
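The points above can be illustrated with scikit-learn's `KMeans`; the two synthetic blobs below are assumed data, and `init="k-means++"` is the mitigation for centroid sensitivity mentioned above:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs (illustrative data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

# k-means++ initialization reduces sensitivity to initial centroids;
# n_init restarts the algorithm and keeps the best local optimum found
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.cluster_centers_)
```

Note that K (here `n_clusters=2`) must still be chosen in advance, typically via the elbow method or silhouette scores.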
## 4. Attempt any THREE of the following:
Types of architectures:
- **Reactive**: Direct percept→action mapping
- **Deliberative**: Maintains internal state and plans
- **Hybrid**: Combines reactive and deliberative approaches
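The reactive case can be made concrete with a minimal sketch; the vacuum-world percepts and actions below are hypothetical examples, not part of the question:

```python
# Reactive agent: a direct percept -> action mapping, no internal state.
# Percepts are (location, status) pairs from an assumed vacuum world.
REACTIVE_RULES = {
    ("A", "dirty"): "suck",
    ("A", "clean"): "move_right",
    ("B", "dirty"): "suck",
    ("B", "clean"): "move_left",
}

def reactive_agent(percept):
    return REACTIVE_RULES[percept]

# A deliberative agent would instead update an internal world model
# and plan a sequence of actions toward a goal before acting.
print(reactive_agent(("A", "dirty")))  # suck
```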
### (d) Machine Learning Life Cycle
1. **Problem Definition**: Understand business problem, define objectives
2. **Data Collection**: Gather relevant data from various sources
3. **Data Preparation**:
- Cleaning (missing values, outliers)
- Transformation (normalization, encoding)
- Feature engineering
4. **Exploratory Data Analysis (EDA)**: Understand data through statistics and visualization
5. **Model Selection**: Choose appropriate algorithms based on problem type
6. **Model Training**: Fit models to training data
7. **Model Evaluation**: Assess performance using metrics and validation techniques
8. **Model Tuning**: Optimize hyperparameters for better performance
9. **Model Deployment**: Integrate model into production environment
10. **Monitoring & Maintenance**: Track performance, retrain as needed
11. **Feedback Loop**: Incorporate new data and insights
**Best-First Search**:
- Expands most promising node based on evaluation function f(n)
- Uses priority queue ordered by f(n)
- Greedy best-first uses f(n) = h(n) (heuristic only)
- A* uses f(n) = g(n) + h(n) (path cost + heuristic)
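The points above can be sketched with a priority queue (`heapq`); the toy graph and heuristic values below are assumptions for illustration:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: expand the node with the lowest f(n) = g(n) + h(n).
    Using f(n) = h(n) alone would give greedy best-first search."""
    frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nxt, step_cost in neighbors(node):
            ng = g + step_cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None, float("inf")

# Toy weighted graph with an assumed admissible heuristic
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
h_vals = {"S": 4, "A": 5, "B": 1, "G": 0}
path, cost = a_star("S", "G", lambda n: graph[n], lambda n: h_vals[n])
print(path, cost)  # ['S', 'B', 'G'] 5
```

With an admissible heuristic (never overestimating the remaining cost), A* is guaranteed to return an optimal path.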
### (b) Knowledge-Based Agent Architecture
**Architecture**:
1. **Knowledge Base (KB)**: Repository of facts and rules
2. **Inference Engine**: Applies logical rules to KB to deduce new information
3. **Perception**: Translates percepts into KB updates
4. **Action Selection**: Decides actions based on KB state
**Techniques**:
1. **Knowledge Representation**:
- Propositional logic
- First-order logic
- Semantic networks
- Frames
- Ontologies
2. **Reasoning Methods**:
- Forward chaining (data-driven)
- Backward chaining (goal-driven)
- Resolution
- Unification
3. **Learning**:
- Knowledge acquisition
- Rule induction
- Explanation-based learning
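Forward chaining (data-driven reasoning, listed above) can be sketched in a few lines; the animal-classification rules and facts below are illustrative assumptions:

```python
# Minimal forward chaining: repeatedly fire any rule whose premises are
# all known facts, adding its conclusion, until nothing new is derived.
rules = [
    ({"has_fur", "gives_milk"}, "mammal"),
    ({"mammal", "eats_meat"}, "carnivore"),
]
facts = {"has_fur", "gives_milk", "eats_meat"}

changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(sorted(facts))  # "mammal" and "carnivore" have been derived
```

Backward chaining would instead start from a goal (e.g. "carnivore") and recursively try to prove each premise.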
**Implementation Steps**:
1. **Data Preparation**:
- Handle missing values
- Normalize/standardize features
- Split into training/test sets
2. **Model Training**:
- Normal Equation: β = (XᵀX)⁻¹Xᵀy (for small datasets)
- Gradient Descent (for large datasets):
  - Initialize β randomly
  - Repeat until convergence:
    - Compute predictions ŷ = Xβ
    - Compute error e = ŷ − y
    - Update βⱼ = βⱼ − α(1/m)Σᵢ(eᵢ·xᵢⱼ) for all j
3. **Evaluation**:
- R² score (coefficient of determination)
- Adjusted R²
- MSE, RMSE
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Prepare data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"MSE: {mse}")
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")
```
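The gradient-descent update listed under Model Training can also be sketched directly with NumPy; the synthetic data, learning rate α = 0.1, and iteration count are assumptions:

```python
import numpy as np

# Synthetic data: y ≈ 3 + 2x plus small Gaussian noise
rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.uniform(0, 2, m)])  # intercept column
y = X @ np.array([3.0, 2.0]) + rng.normal(0, 0.1, m)

beta = np.zeros(2)
alpha = 0.1  # learning rate (assumed)
for _ in range(2000):
    e = X @ beta - y               # error e = ŷ − y
    beta -= alpha * (X.T @ e) / m  # βⱼ ← βⱼ − α(1/m)Σᵢ eᵢ·xᵢⱼ
print(beta)  # ≈ [3, 2]
```

For this small problem the Normal Equation β = (XᵀX)⁻¹Xᵀy gives the same answer in one step; gradient descent avoids the O(n³) matrix inversion when features are numerous.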
Comparison:
- MSE/RMSE penalize large errors more heavily
- MAE is more interpretable (same units as the target) but less sensitive to outliers
- RMSE is preferred when large errors are particularly undesirable
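The comparison can be demonstrated numerically. In the sketch below (the target values are assumed), two prediction sets have identical MAE, yet the one with a single large error has four times the MSE:

```python
import numpy as np

y_true    = np.array([3.0, 5.0, 7.0, 9.0])
small_err = np.array([3.5, 5.5, 7.5, 9.5])   # four errors of 0.5
one_big   = np.array([3.0, 5.0, 7.0, 11.0])  # one error of 2.0

def mae(y, p): return np.mean(np.abs(y - p))
def mse(y, p): return np.mean((y - p) ** 2)

print(mae(y_true, small_err), mse(y_true, small_err))  # 0.5 0.25
print(mae(y_true, one_big), mse(y_true, one_big))      # 0.5 1.0
```

Both prediction sets look equally good under MAE, but MSE (and hence RMSE) flags the single large miss — exactly why squared-error metrics are preferred when big errors are costly.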
### (b) Forms of Data
1. **Structured Data**:
- Tabular data (rows and columns)
- Relational databases
- Spreadsheets
- Example: Customer transaction records
2. **Unstructured Data**:
- No predefined format
- Text documents
- Images, videos
- Audio recordings
- Example: Social media posts
3. **Semi-Structured Data**:
- Not fully relational but has some structure
- JSON, XML files
- Email (structured headers, unstructured body)
- Example: Web logs
4. **Temporal Data**:
- Time-series data
- Timestamped events
- Example: Stock prices over time
5. **Spatial Data**:
- Geographic information
- Maps, GPS coordinates
- Example: Delivery routes
6. **Graph Data**:
- Nodes and edges
- Social networks
- Knowledge graphs
- Example: Facebook friend connections
### (c) Beyond Classical Search
Techniques that extend or go beyond traditional search algorithms:
3. **Adversarial Search**:
- Game playing with opponents
- Minimax algorithm with alpha-beta pruning
- Examples: Chess, checkers AI
4. **Online Search**:
- For environments where states are discovered during execution
- Examples: Robot exploration, real-time systems
5. **Nondeterministic Search**:
- For environments with uncertainty
- Markov decision processes (MDPs)
- Reinforcement learning approaches
6. **Metaheuristics**:
- High-level strategies for guiding search
- Examples: Tabu search, ant colony optimization, particle swarm optimization
7. **Evolutionary Algorithms**:
- Inspired by biological evolution
- Maintain population of candidate solutions
- Apply selection, crossover, mutation operators
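The minimax-with-alpha-beta idea from the Adversarial Search item above can be sketched over a generic game tree; the `children`/`value` callables and the toy tree are assumptions for illustration:

```python
import math

def minimax(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning. `children(node)` returns the
    node's successors; `value(node)` evaluates leaf positions."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        best = -math.inf
        for child in kids:
            best = max(best, minimax(child, depth - 1, alpha, beta,
                                     False, children, value))
            alpha = max(alpha, best)
            if beta <= alpha:
                break  # prune: the minimizer will never allow this branch
        return best
    best = math.inf
    for child in kids:
        best = min(best, minimax(child, depth - 1, alpha, beta,
                                 True, children, value))
        beta = min(beta, best)
        if beta <= alpha:
            break
    return best

# Toy two-ply tree: MAX chooses between subtrees L and R
tree = {"root": ["L", "R"], "L": ["l1", "l2"], "R": ["r1", "r2"]}
leaves = {"l1": 3, "l2": 5, "r1": 2, "r2": 9}
result = minimax("root", 2, -math.inf, math.inf, True,
                 lambda n: tree.get(n, []), lambda n: leaves.get(n, 0))
print(result)  # 3
```

Here leaf `r2` is pruned: once the minimizer at R can force a value of 2, which is below the 3 already guaranteed on the left, the rest of R is irrelevant.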