Capstone Project

At a Glance

• A capstone project is a comprehensive, independent, and final project undertaken as [...] designed to assess the skills, knowledge, and expertise a student has acquired.
• A successful problem-defining process requires a basic analysis and evaluation of the problems, their reasons, and [...] methods.
• Design Thinking methodology provides a solution-based approach to solving problems.
• During coding, we follow a problem decomposition methodology that can be applied to real-life problems as well.
• Once the business problem is clearly stated, the data scientist can define an analytical approach to solve the problem.
• The analytical approach chosen characterises the requirements for the data.
• During the initial data collection phase, data scientists identify available data sources (structured, unstructured, and semi-structured) relevant to the problem area.
• The modelling stage, which begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models based on the previously stated analytic approach.
• The data scientist reviews the model during development and before deployment to determine its quality and ensures that it correctly and completely answers the business problem.
• The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that were not used to train the model.
• The training dataset is used to fine-tune the machine learning model and train the algorithm.
• With the test dataset, the algorithm makes predictions from input elements that were held out of the training data.
• Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data.
• A loss function determines how well a certain algorithm models the data.
• The loss function learns to lower prediction error over time with the help of some "optimisation/objective function".
• Loss functions can be divided into two categories: regression losses and classification losses.
• MSE is sensitive to outliers.
• The Root Mean Square Error (RMSE) is a metric for determining how well a regression line fits the data points.
• Hyperparameters are parameters whose values govern the learning process.

Touchpad Artificial Intelligence (Ver. 2.0)-XII

Exercise

Solved Questions

SECTION A (Objective Type Questions)

A. Tick (✓) the correct option.

1. Which of the following is not a part of the Design Thinking Process?
a. Empathise  b. Sympathise  c. Prototype  d. Define

2. A ______ is a project where students must research a topic independently to find a deep understanding of the subject matter.
a. AI model  b. Culminating [...]  c. Senior report  d. Capstone

3. Mean Square Error (MSE) is the most commonly used regression loss function. MSE is the [...] between our target variable and predicted values. Identify one feature of MSE.
a. It is sensitive to outliers
b. It is used on data, conditioned on the output variables
c. It is good to use if the target data is normally distributed around a median value
d. It should be compared with Mean Absolute Error, where the optimal prediction is the [...]

4. In problem decomposition:
(i) Understand the problem and then restate the problem in your own words
(ii) Gather all simple facts to create a complicated piece
(iii) Break larger units into simpler ones
(iv) Code one small unit at a time
Which of the following is true?
a. (i) and (ii)  b. (i) and (iv)  c. (ii) and (iv)  d. (i), (iii) and (iv)

5. For AI techniques to be applied to a dataset, the data must have a ______.
a. association  b. relationship  c. pattern  d. Either a or b

6. An optimum AI model should have a ______ value less than 160.
a. Mean Square Error  b. Mean Absolute Error  c. Quantile Loss  d. Root Mean Square Error

7. Determining whether a credit card transaction is fraudulent or not involves the ______ technique.
a. Decision Trees  b. Classification  c. Regression  d. Clustering

8. The train test split is a technique for evaluating the performance of a machine learning algorithm. Which machine learning algorithms can it be used for?
(i) Regression (ii) Clustering (iii) Classification (iv) Deep Learning
a. Only [...]  b. Only (ii)  c. Both (i) and (ii)  d. Both (ii) and (iv)

9. Which of the following is NOT a Classification Loss function?
a. Log Loss  b. Hinge Loss  c. Exponential Loss  d. Mean Absolute Error

10. After the initial data collection, techniques such as descriptive statistics and ______ can be applied to datasets to evaluate the content and quality of the data.
a. visualisations  b. analysis  c. validation  d. machine learning

11. A researcher wants to study the association between gender and usage of mobile phone. Data collected for this study will be ______.
a. Qualitative data  b. Quantitative data  c. Continuous data  d. Classified data

12. Primary way to collect data (Data Gathering [...]):
a. Experiment  b. Survey  c. Observation  d. Interviews

13. The data scientist will use ______ for predictive modelling.
a. Artificial Intelligence  b. Machine Learning  c. Training Set  d. Deep Learning

14. Which one does NOT belong to Classification loss?
a. Log Loss  b. Mean Absolute Error  c. Exponential Loss  d. Hinge Loss

15. Which process does NOT come under Capstone Project?
a. AI Model  b. AI Project Cycle  c. Deployment  d. Data Gathering

16. Which one does NOT belong to Regression loss?
a. Log Loss  b. Mean Absolute Error  c. Log cosh Loss  d. Quantile Loss

17. Adding a non-important feature to a linear regression model may result in
i. Increase in R-square
ii. Decrease in R-square
a. Only i is correct  b. Only ii is correct  c. Either i or ii  d. Neither i nor ii

18. Which of the following options is/are true for K-fold cross-validation?
[CBSE Sample Paper]
1. Increase in K will result in higher time required to cross-validate the result.
2. Higher values of K will result in higher confidence in the cross-validation result as compared to a lower value of K.
3. If K = N, then it is called Leave-one-out cross-validation, where N is the number of observations.
a. 1 and 2  b. 2 and 3  c. 1 and 3  d. 1, 2 and 3

Fill in the blanks.

1. The ______ technique is used for evaluating an AI model and splits the dataset into two sets.
2. ______ AI model is used to forecast trends for a product.

State whether the following statements are true or false.

1. Hyperparameters are internal to an AI model.
2. There is no such thing as an ideal split percentage.
3. Cross-validation is used for evaluating machine learning models on a large sample of data.
4. Every project starts with a business understanding.
5. The data collection stage, that begins with the initial version of the prepared data set, focuses on constructing predictive or descriptive models.
6. Regression predicts a real number as the last component of the result.
7. The decision tree categorises the dataset into groups based on several criteria.

Match the following.

Items to match: MSE; Test dataset; Real life problems; Regression functions predict; Ideate in DT
Options: Problem Decomposition; sensitive to outliers; brainstorming; quantity; Evaluation stage

SECTION B (Subjective Type Questions)

A. Short answer type questions.

1. What is a loss function? Name any two Regression Loss functions.
Ans. A loss function is used by machines to learn. It is a way of determining how well a certain algorithm models the data. If the forecasts are too far off from the actual findings, the loss function will return a very large number. The loss function learns to lower prediction error over time with the help of some "optimisation/objective function". Two Regression Loss functions are RMSE and MSE.

2. Can MSE be a negative value? Why/Why not? Give the equation to calculate MSE.
Ans. MSE cannot be a negative value.
The difference between the predicted and actual values can be negative. However, these differences are squared. Hence, all results are either positive or zero.

$$\mathrm{MSE} = \frac{\sum_{i=1}^{n} \left(y_i - y_i^{p}\right)^2}{n}$$

Here $y_i$ is the actual value, $y_i^{p}$ is the predicted value, and $n$ is the number of data points.

3. What is meant by the iterative nature of the problem-solving methodology?
Ans. As data scientists gain a better understanding of the data and models, they typically return to a prior stage to make changes. Models aren't built once, deployed, and forgotten about; instead, they're constantly refined and adapted to changing situations through feedback, refinement, and redeployment. As a result, both the model and the work that goes into it can continue to add value to the business for as long as the solution is required. Hence the problem-solving methodology is iterative in nature.

4. "The lower the value of MSE, the better is the model". Do you agree? Why/Why not?
Ans. MSE is calculated for a regression line as the average of the squared errors over all data points. MSE is used to see how close an estimate or forecast is to an actual value. The lower the MSE, the closer the forecast is to the actual value. So, smaller values indicate a better model.

5. What is meant by feedback of model effectiveness?
Ans. [...] the model's accuracy and utility by analysing it. They can automate any or all of the feedback-gathering, model assessment, refining, and redeployment phases to speed up the model [...].

6. What do you mean by time series decomposition?
Ans. Time series decomposition is a fundamental step in time series analysis because it helps in [...] data and can aid in making forecasts or predictions. It considers different contributing factors in the combination of level, trend, seasonality and noise components.

7. Name the components of the time series decomposition.
Ans. The components of the time series decomposition are as follows:
• Level: The average of the series.
• Trend: Any increasing or decreasing value in the series.
• Seasonality: Any repeating short-term cycle in the series.
• Noise: Any random variation in the series.

8. What is the K-NN algorithm and how does it determine the category for a new instance?
Ans. The K-Nearest Neighbour algorithm is one of the simplest Supervised Learning-based Machine Learning algorithms. The K-NN algorithm assumes similarity between the new case and the existing cases and assigns the new case to the category that matches the existing cases the most closely. This algorithm can be applied to both classification and regression.

B. Long answer type questions.

1. Explain the cross-validation procedure.
Ans. Cross-validation is a resampling technique for evaluating machine learning models on a small sample of data. The process includes only one parameter, k, that specifies the number of groups into which a given data sample should be divided. It is a popular strategy since it is straightforward to grasp and produces a less biased or less optimistic estimate of model competence than other approaches, such as a simple train/test split. The following is the procedure:
i. Randomly shuffle the dataset.
ii. Organise the data into k groups.
iii. For each distinct group:
  • Take the group as the test data set.
  • Use the remaining groups as the training data set.
  • Fit the model to the training set and evaluate it against the test set.
  • Keep the evaluation score but discard the model.
iv. Using the sample of model evaluation scores, summarise the model's ability.

2. What is Train Test Split Evaluation? State the reasons for choosing this technique.
Ans. The train test procedure measures the performance of machine learning algorithms when they need to make predictions on data that were not used to train the model. The technique divides the provided dataset into two subsets: the training dataset and the test dataset. The reasons for choosing this technique are:
• Large dataset
• To estimate the machine learning model's performance on new data that was not used to train the model
• Better computational efficiency
• A quick overview of model performance

3. Explain the 3 stages of data preparation.
Ans. [...] data collection may be required to fill the gap. [...] all the activities to build the dataset used in the subsequent modelling stage. Activities to prepare data include:
• data cleansing (handling missing or invalid values, removing [...]),
• joining data from multiple sources (files, tables, platforms), and
• the conversion of data to more useful variables.
Data preparation is usually the most time-consuming procedure in a data science project. Automating certain data [...] can speed up the process by minimising ad hoc preparation time.

4. What are hyperparameters? What is their purpose? Give examples of a few hyperparameters.
Ans. Hyperparameters are parameters whose values govern the learning process. They are 'top-level' parameters that [...] parameters that come from it, as the prefix 'hyper' suggests. Since the model cannot modify their values during learning [...], hyperparameters are said to be external to the model. Some examples of hyperparameters are:
• The ratio of the train-test split
• Optimisation algorithms' learning rate (e.g., gradient descent)
• The loss function that the model will employ
• A neural network's number of hidden layers
• A clustering task's number of clusters

5. Explain the purpose of the evaluation and deployment stages.
Ans. Evaluation Stage
The data scientist
• utilises a testing set for predictive models (that is separate from the training set but follows the same probability distribution and has a known outcome). The testing set is used to assess the model and adjust it as necessary.
• For a final assessment, the final model is sometimes applied to a validation set as well.
• In addition, data scientists can use statistical significance tests to verify the model's accuracy.

Deployment Stage
The model is deployed into the production environment or an equivalent test environment once it has been built and authorised by the business sponsor. It is usually used in a restricted capacity until its effectiveness has been thoroughly assessed. Deploying a model into a live business process frequently necessitates the involvement of additional internal teams, skills, and technology.

C. Competency-based/Application-based questions

1. Consider the following statements containing an assertion and a reason. Select the appropriate option for the statements given below:
a. Both A and R are true and R is the correct explanation of A
b. Both A and R are true and R is not the correct explanation of A
c. A is true but R is false
d. A is false but R is true

(i) Assertion (A): At the core of every AI model is finding patterns in data.
Reason (R): Finding the right pattern is usually an iterative process.
Ans. b

(ii) Assertion (A): Consider that the goal of an AI model is to predict an answer such as "yes" or "no".
Reason (R): In such a case, predictive modelling can be used.
Ans. [...]

(iii) Assertion (A): Nowadays, predictive models are not able to better predict rare events [...] failure.
Reason (R): Today's high-performance database analytics enable data scientists [...] contain large or even all of the available data.
Ans. d

(iv) Assertion (A): We follow problem decomposition methodology that can be applied to real life [...].
Reason (R): Real-life problem solving is complicated.
Ans. a

(v) Assertion (A): The data scientist reviews the model during development and before deployment to determine its quality and ensures that it correctly and completely answers the business problem.
Reason (R): The data scientist uses statistical significance tests to verify the model's accuracy.
Ans. b

(vi) Assertion (A): After the initial data collection, techniques such as descriptive statistics and visualisation can be applied to datasets.
Reason (R): This step is carried out to evaluate the content, quality, and initial insights of the data.
Ans. a

(vii) Assertion (A): Loss function learns to increase prediction error over time with the help of some "optimisation/objective function".
Reason (R): A loss function is used by machines to learn.
Ans. d

(viii) Assertion (A): Real-world data can be strange and deceptive.
Reason (R): Data can be gathered from a variety of sources, including Government websites.
Ans. b

(ix) Assertion (A): The Root Mean Square Error (RMSE) is a metric for determining how well a regression line fits the data points.
Reason (R): RMSE places a higher weight on the errors due to the squaring element of the function.
Ans. c

(x) Assertion (A): A split technique divides the provided dataset into two subsets: training and test subsets.
Reason (R): As a result, the process is frequently referred to as k-fold cross-validation.
Ans. c

2. Which of these is NOT an analytic approach based on the type of question? [CBSE Sample Paper]
a. Descriptive  b. Statistical Analysis  c. Forecasting  d. Data evaluation
Ans. d

3. Which of the following statements is/are INCORRECT?
i) Different transforms of the data used to train the same machine learning model.
ii) Different machine learning models cannot be trained on the same data.
iii) Different configurations for a machine learning model trained on the same data.
a. i)  b. ii)  c. Both ii) & iii)  d. Both i) & ii)
Ans. b

4. If the problem is based on probabilities of an action, then which analytic approach can be used?
a. Predictive Model  b. Prescriptive  c. Diagnostic  d. Descriptive
Ans. a
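
The short answers on MSE (it cannot be negative, and it is sensitive to outliers) can be checked numerically. The following is a minimal Python sketch, not taken from the textbook: the function names `mse`, `rmse`, and `mae` and the sample values are illustrative assumptions. It computes MSE as the average of squared differences between actual and predicted values, and compares it with Mean Absolute Error when one prediction is far off.

```python
import math


def mse(actual, predicted):
    """Mean Square Error: the average of the squared differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean Absolute Error: the average of the absolute differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


# Hypothetical sample values, chosen only for illustration.
actual = [10.0, 12.0, 14.0, 16.0]
close_predictions = [11.0, 11.0, 15.0, 15.0]    # every prediction is off by 1
outlier_predictions = [11.0, 11.0, 15.0, 26.0]  # the last prediction is off by 10

print(mse(actual, close_predictions))    # 1.0
print(mse(actual, outlier_predictions))  # 25.75
print(mae(actual, outlier_predictions))  # 3.25
print(rmse(actual, outlier_predictions))
```

Some of the individual differences are negative, but every squared term is zero or positive, so the MSE is never negative. A single large error raises the MSE from 1.0 to 25.75, while the MAE rises only from 1.0 to 3.25, which is what "sensitive to outliers" means in practice.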
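
The cross-validation procedure described in the long answers (shuffle, split into k groups, hold out each group in turn, keep the score, discard the model) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names `k_fold_indices` and `cross_validate` are hypothetical, and the "model" is a deliberately trivial stand-in that predicts the mean of the training values, so that the example stays self-contained.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and organise them into k groups (folds)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # step i: random shuffle
    return [indices[i::k] for i in range(k)]     # step ii: k groups


def cross_validate(values, k, seed=0):
    """Return one MSE score per fold for a mean-predicting stand-in model."""
    scores = []
    for test_idx in k_fold_indices(len(values), k, seed):
        held_out = set(test_idx)
        # Step iii: one group is the test set, the remaining groups train.
        train = [v for i, v in enumerate(values) if i not in held_out]
        test = [values[i] for i in test_idx]
        prediction = mean(train)                 # "fit" the stand-in model
        fold_mse = mean([(y - prediction) ** 2 for y in test])
        scores.append(fold_mse)                  # keep the score, drop the model
    return scores


# Hypothetical data, chosen only for illustration.
values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
scores = cross_validate(values, k=3)
print(scores)
print(mean(scores))  # step iv: summarise the scores
```

Calling `k_fold_indices(len(values), len(values))` puts a single observation in each fold, which corresponds to the leave-one-out case (K = N) mentioned in the objective questions.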
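
The K-NN answer says a new case is assigned to the category that matches the existing cases most closely. One common way to do this, sketched below as an assumption rather than as the textbook's method, is to measure the Euclidean distance from the new case to every labelled case and take a majority vote among the k nearest. The function name `knn_predict` and the sample points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labelled points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each Euclidean distance with its label and sort by distance.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most frequent label among the k nearest neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-dimensional data with two categories, "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))  # A
print(knn_predict(points, labels, (6, 5)))  # B
```

Here k is a hyperparameter in the sense used earlier: it is chosen before prediction and is not learned from the data.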
