Industrial Internship: Monday
Summarize your thoughts regarding your internship this week. Include duties you have performed,
facts and procedures you have learned, skills you have mastered, and observations you have made.
Monday:
Feature Engineering:-
One-Hot Encoding of the most frequent labels using get_dummies. Advantages: straightforward to
implement; does not require hours of variable exploration; does not massively expand the feature
space (the number of columns in the dataset). Disadvantages: does not add any information that may
make the variable more predictive; does not keep the information of the ignored labels.
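The note above mentions ignored labels and a feature space that stays small, which matches one-hot encoding restricted to the most frequent labels. A minimal sketch of that idea with pandas follows; the column name `fuel`, the toy values, and the cutoff `k` are assumptions made for illustration, not taken from the internship dataset.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are illustrative only.
df = pd.DataFrame(
    {"fuel": ["Petrol", "Diesel", "Petrol", "CNG", "Petrol", "Diesel", "LPG"]}
)

# Keep only the k most frequent labels; every other label is ignored.
k = 2
top_labels = df["fuel"].value_counts().index[:k].tolist()

# Replace the ignored labels with NaN so get_dummies creates no column for them.
kept = df["fuel"].where(df["fuel"].isin(top_labels))

# Rows holding an ignored label end up with zeros in every dummy column.
dummies = pd.get_dummies(kept, prefix="fuel", dtype=int)

encoded = pd.concat([df, dummies], axis=1)
print(encoded)
```

Only `k` new columns are created however many distinct labels exist, which is the stated advantage; the rows for CNG and LPG become indistinguishable, which is the stated disadvantage.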
Different types of feature engineering techniques: Imputing, Handling Outliers, Binning, Log
transform, one-hot encoding, grouping operations, scaling.
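Several of the techniques listed above can be sketched on a single numeric column. This is only an illustration: the column name `price`, the toy values, the median imputation, the 1.5 * IQR capping rule, and the three equal-width bins are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column; the values are made up for illustration.
df = pd.DataFrame({"price": [2.0, 3.0, np.nan, 4.0, 5.0, 100.0]})

# Imputing: fill the missing value with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# Handling outliers: cap values lying more than 1.5 * IQR beyond the quartiles.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df["price_capped"] = df["price"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# Binning: split the capped values into three equal-width bins.
df["price_bin"] = pd.cut(df["price_capped"], bins=3, labels=["low", "mid", "high"])

# Log transform: log(1 + x) compresses the long right tail.
df["price_log"] = np.log1p(df["price"])

# Scaling: min-max scaling of the capped values to the range [0, 1].
col = df["price_capped"]
df["price_scaled"] = (col - col.min()) / (col.max() - col.min())

print(df)
```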
Tuesday:
Krish Naik’s Explanation On:-
Thursday:
Completed task assigned by mentor Pranshu Sharma:
Exploratory Data Analysis (EDA) on Vehicle Price Prediction (CarPriceDekho)
Steps are given below:
1) Importing the dependencies
2) Reading the dataset
3) Checking the shape of dataset
4) Checking for numerical and categorical values
5) Checking for unique values in the Seller Type, Fuel Type, Transmission, and Owner columns
6) Checking for Null Values
7) Insights of the data
8) The columns in the data frame
9) Creating a new feature to replace the Year column
10) Encoding: One-hot encoding
11) Checking for correlation among dependent and independent variables
12) Plotting a Heatmap of correlation matrix
13) Splitting the data into dependent and independent variables
14) Importing the dependencies
15) Getting the important features
16) Plotting a graph of important features
17) Splitting the data into train and test data
18) Importing the Random Forest Regressor
19) Hyper Parameter Tuning
20) Building the model
21) Checking the performance
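The steps above can be sketched end to end as follows. This is a rough outline under stated assumptions, not the notebook that was submitted: the data is synthetic, every column name is hypothetical, the reference year 2021 for the age feature is assumed, ExtraTreesRegressor is an assumed choice for ranking features, and RandomizedSearchCV is an assumed choice for the tuning step. Plotting (steps 12 and 16) is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the dataset; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Year": rng.integers(2005, 2021, n),
    "Kms_Driven": rng.integers(1_000, 150_000, n),
    "Fuel_Type": rng.choice(["Petrol", "Diesel", "CNG"], n),
    "Transmission": rng.choice(["Manual", "Automatic"], n),
})
df["Selling_Price"] = (
    0.5 * (df["Year"] - 2000)
    - 0.00002 * df["Kms_Driven"]
    + 2.0 * (df["Fuel_Type"] == "Diesel")
    + rng.normal(0, 0.5, n)
)

# Steps 3-6: shape, column types, unique values, null counts.
print(df.shape)
print(df.dtypes)
print(df["Fuel_Type"].unique())
print(df.isnull().sum())

# Step 9: derive an age feature, then drop the year column (reference year assumed).
df["Car_Age"] = 2021 - df["Year"]
df = df.drop(columns="Year")

# Step 10: one-hot encode the categorical columns.
df = pd.get_dummies(df, drop_first=True, dtype=int)

# Steps 11-12: correlation matrix (a heatmap could be drawn from it).
corr = df.corr()

# Step 13: split into independent (X) and dependent (y) variables.
X = df.drop(columns="Selling_Price")
y = df["Selling_Price"]

# Steps 14-16: rank features by importance (ExtraTreesRegressor is an assumption).
ranker = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = pd.Series(ranker.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)

# Step 17: train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Steps 18-20: random forest with a small randomized hyperparameter search.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [50, 100],
        "max_depth": [5, 10, None],
        "min_samples_split": [2, 5],
    },
    n_iter=4,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)

# Step 21: check performance on the held-out test set.
r2 = r2_score(y_test, search.predict(X_test))
print(search.best_params_, round(r2, 3))
```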
Friday:
5 components of a data stack:-
1. Collection: Collect the data you need to understand how your product or service is being
used.
2. Integration: Moving all of your data into the data warehouse in a timely and reliable manner.
3. Data Warehouse: All the data you collect should be stored in a data warehouse to ensure
you can effectively use it.
4. Transformation: Raw data is not always suitable for analytics. We need a way to clean it up
before doing analysis, e.g. by removing duplicates and incorrect data.
5. Visualization: Representing the data visually to help us understand it and spot patterns.
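The transformation step in the list above (removing duplicates and incorrect data) could look roughly like this in pandas. The table, its column names, and the rule that a negative shipping time counts as incorrect are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical raw data: one exact duplicate row and one impossible value.
raw = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 4],
    "shipping_days": [2, 2, -1, 5, 3],
})

clean = (
    raw.drop_duplicates()                       # remove exact duplicate rows
       .loc[lambda d: d["shipping_days"] >= 0]  # drop rows with a negative duration
       .reset_index(drop=True)
)
print(clean)
```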
MixPanel Example.
Initial Traction:
Constraints:
More questions than you have time to answer
All-in-one tools can’t support the custom analysis you need
Typical questions you want to answer with data:
How does shipping time impact likelihood of a second order?
Does website loading time vary by user? Does it affect activation rate?
What activities in a first visit increase the likelihood of retention?
Data Collection
All-in-one products handle data collection, but now you need that data in a data warehouse, so it’s
time to move to a dedicated data collection solution. Segment is the most popular choice; it lets
you track events in your product relatively easily using its open-source APIs.
Data Integration
By this stage you likely have many sources of data: event data, marketing data, customer support
data, financial data, etc. Data integration tools help you move these data sources into your data
warehouse.
Data Warehouse
Data warehouses are optimised for performing analytics on large amounts of data. Your data team
can use SQL to do custom analytics, combining all of your data sources to spot patterns and trends.
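As a rough illustration of such a query, the sketch below answers one of the typical questions listed earlier (how shipping time relates to the likelihood of a second order). An in-memory SQLite database stands in for the data warehouse; the `orders` table, its columns, and the rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a real data warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id INTEGER, order_no INTEGER, shipping_days INTEGER);
INSERT INTO orders VALUES
  (1, 1, 2), (1, 2, 3),
  (2, 1, 2),
  (3, 1, 7),
  (4, 1, 7), (4, 2, 2),
  (5, 1, 2), (5, 2, 2),
  (6, 1, 7);
""")

# For each first-order shipping time, compute the share of customers
# who went on to place a second order.
query = """
SELECT f.shipping_days,
       COUNT(*) AS customers,
       AVG(CASE WHEN s.customer_id IS NOT NULL THEN 1.0 ELSE 0.0 END) AS repeat_rate
FROM orders AS f
LEFT JOIN orders AS s
  ON s.customer_id = f.customer_id AND s.order_no = 2
WHERE f.order_no = 1
GROUP BY f.shipping_days
ORDER BY f.shipping_days;
"""
rows = conn.execute(query).fetchall()
for shipping_days, customers, repeat_rate in rows:
    print(shipping_days, customers, round(repeat_rate, 2))
```

With a real warehouse the same SQL pattern would run against the integrated order data instead of a toy table.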
Visualization / BI
A BI tool connects directly to the data warehouse and helps you visualize the data you are
collecting there. There is a lot of choice in this area.
Optimising:
Constraints:
● Low-hanging-fruit product insights are gone, more complex analytics is required
● Your data team is growing, they need to be able to collaborate effectively
Typical questions you want to answer with data:
● How long after someone is on our site should we show them a customer support chat?
● If someone wants to cancel their subscription, should we offer them a discount? How much?
● What is the most efficient way to pick parcels in the warehouse to reduce shipping times?
By this stage the data stack is almost complete - you have collection, integration, BI and a
warehouse. There are two changes you might want to make to help your data team to be successful:
● Switch to a more developer-friendly BI tool like Looker
● Start using a dedicated transformation platform
By this stage your data team will be spending a lot of time transforming your various data sources
to an analytics-ready format. BI tools provide some functionality in this area, but it probably makes
sense to start using a dedicated “data modelling” platform.
Tool | Price | Notes
Dataform | Free | BigQuery users only, transitioning to Google Cloud
dbt Cloud | From $50/month/user |
Example data stack progression:
Pre-product: N/A
Early usage: Amplitude
Initial traction: Segment + Fivetran + BigQuery + DataStudio (+Amplitude)
Instructions: After the completed report has been signed by both the student and Head-
coordinator, the head-coordinator shall scan the form to a pdf format and email it to the Director-
1 (bpmishra435@gmail.com) of the company. Specific problems, concerns or suggestions from
either the student/head-coordinator should be emailed separately to the C. E. O. (info@cureya.in)
of the company.