Data set selection and evaluation_FINAL

The document outlines the schema and structure of a dataset related to startups, detailing various data types and their meanings. It also discusses exploratory analysis, identifying trends in startup funding, geographical attractiveness for investments, and market preferences. Additionally, it highlights data quality issues and proposes a plan for data cleansing in preparation for visualization using Tableau.

Uploaded by

tohongnhan
Data set selection and evaluation

To Hong Nhan / Huynh Hai Ngan / Bui Hong Anh


Dataset
A. Schema (structure) of the data set
No. | Name | Data type | Meaning/Purpose
1 | permalink | String | The domain name of the organization's website
2 | name | String | Name of the organization
3 | homepage_url | String | URL of the organization's website
4 | category_list | String | List of market categories that the startup/organization works in or belongs to
5 | market | String | The main market that the startup/organization works in
6 | funding_total_usd | Number (whole) | The total amount of funding that the startup/organization has received, in USD
7 | status | String | The operating status of the startup/organization
8 | country_code | String/Geographic (Area Code) | The code of the country where the startup/organization was located at establishment
9 | state_code | String/Geographic (Area Code) | The code of the state where the startup/organization was located at establishment
10 | region | String/Geographic (Country/Region) | The region (part of the state) where the startup/organization was located at establishment
11 | city | String/Geographic (City) | The city where the startup/organization was located at establishment
12 | funding_rounds | Number (whole) | The number of funding rounds that the startup/organization participated in
13 | founded_at | Date | Specific date of establishment
14 | founded_month | String | Specific month of the year in which the startup/organization was established
15 | founded_quarter | String | Specific quarter and year in which the startup/organization was established
16 | founded_year | Number (whole) | Year of establishment
17 | first_funding_at | Date | The first time the startup/organization received funding
18 | last_funding_at | Date | The last time the startup/organization received funding
19 | seed | Number (whole) | The amount of funding received in the initial stage, when the startup is in its early development phase and requires funding to get started
20 | venture | Number (whole) | The amount of funding received in the stage where the startup has begun to establish itself and is looking for funding to expand its operations
21 | equity_crowdfunding | Number (whole) | The amount of funding received by raising funds from many investors, usually through an online platform, in exchange for equity in the company
22 | undisclosed | Number (whole) | The amount of funding that is not publicly disclosed
23 | convertible_note | Number (whole) | The amount of debt financing that can be converted into equity at a later stage of the startup
24 | debt_financing | Number (whole) | The amount of money borrowed from investors or financial institutions
25 | angel | Number (whole) | The amount of funding received from individual investors who fund startups in exchange for equity
26 | grant | Number (whole) | The amount of funding received that does not require repayment
27 | private_equity | Number (whole) | The amount of funding received, in exchange for equity, by a startup that is not publicly traded
28 | post_ipo_equity | Number (whole) | The amount of funding received by a startup that has gone public and is trading on a stock exchange
29 | post_ipo_debt | Number (whole) | The amount of debt received by a startup that has gone public and is trading on a stock exchange
30 | product_crowdfunding | Number (whole) | The amount of funding received from several investors to develop or produce a specific product
31 | round_A | Number (whole) | The amount of funding received in the first funding round
32 | round_B | Number (whole) | The amount of funding received in the second funding round
33 | round_C | Number (whole) | The amount of funding received in the third funding round
34 | round_D | Number (whole) | The amount of funding received in the fourth funding round
35 | round_E | Number (whole) | The amount of funding received in the fifth funding round
36 | round_F | Number (whole) | The amount of funding received in the sixth funding round
37 | round_G | Number (whole) | The amount of funding received in the seventh funding round
38 | round_H | Number (whole) | The amount of funding received in the eighth funding round
B. Exploratory analysis
1. What potential areas of interest can you identify within the data set by considering any trends,
patterns, or relationships that may be worth investigating?

• The overall investment amount for startups and the number of businesses formed appear to
have risen over the years. It would be worth exploring when this trend began, whether
recently or decades ago, and how far it has progressed. This can help us better understand
the startup ecosystem's present position and future potential in comparison to big
corporations or state-owned companies.
• It is commonly assumed that America is the ideal place for people to establish their own
businesses. However, we must evaluate this idea by examining overall financing amounts for
startups in different places, such as states in the United States and other parts of the world.
This allows us to determine which locations are most appealing for startup investment and
which regions are today's global startup hubs. This can assist us in determining where the
most innovation and entrepreneurial activity is occurring.
• Startups can enter many different markets. It would be useful to study which markets
attract the most funding and whether there are any patterns or discrepancies in startup
investment preferences across regions. This can help us determine which markets are the
most promising and where entrepreneurs are concentrating their efforts.
• We wonder if the average amount of capital received by firms that are still in operation is the
same as that received by those that have closed. This can help us determine if investors are
prudent enough not to squander money on firms that are unlikely to prosper. It would be
fascinating to examine this pattern and uncover any aspects that may impact a startup's
success or failure.
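
The first area of interest above, the yearly trend in startups founded and total funding, could be explored with a small aggregation before building any Tableau view. The following is only a sketch using Python's standard library: the sample rows are hypothetical, and it assumes founded_year and funding_total_usd have already been cleaned into plain numbers.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names follow the schema in section A.
SAMPLE = """name,founded_year,funding_total_usd
Alpha,2010,1000000
Beta,2010,500000
Gamma,2012,2000000
Delta,,750000
"""

def yearly_trend(rows):
    """Return {year: (startup_count, total_funding_usd)} for rows with a known year."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        year_text = row["founded_year"].strip()
        if not year_text:
            continue  # skip startups with no recorded founding year
        year = int(float(year_text))  # tolerate values such as "2012.0"
        totals[year][0] += 1
        totals[year][1] += int(row["funding_total_usd"])
    return {year: tuple(values) for year, values in sorted(totals.items())}

trend = yearly_trend(csv.DictReader(io.StringIO(SAMPLE)))
for year, (count, total) in trend.items():
    print(year, count, total)
```

Plotting the count and the total per year side by side would show whether both series start rising at the same point.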

2. Formulate at least three specific questions you would like to answer using the data set. These
questions should be relevant to the group's interests and inspire further exploration and
visualization using Tableau.

Q1: When did the trend of increasing overall startup investment and the number of startups
founded begin? Is this trend continuing?
Q2: Is America still the most attractive region for startup funding or have other regions surpassed
it? What are the most attractive regions for startup funding globally and which one is the current
global startup hub?
Q3: Which markets are attracting the most funding for startups? Are there any regional variations
in the investment preferences of startups?
Q4: Is there a difference in the average funding amount between companies that are still in
operation and those that have closed? Does the funding amount contribute to the success or
failure of a startup?
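
As an illustration of Q4, the comparison of average funding by operating status can be sketched as below. The rows and the status labels ("operating", "closed", "acquired") are hypothetical examples rather than values confirmed from the data set, and funding_total_usd is assumed to be numeric already.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the status labels are assumed for illustration.
SAMPLE = """name,status,funding_total_usd
Alpha,operating,1000000
Beta,closed,200000
Gamma,operating,3000000
Delta,closed,400000
Epsilon,acquired,5000000
"""

def mean_funding_by_status(rows):
    """Return {status: mean funding_total_usd} across all rows."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        status = row["status"]
        sums[status] += int(row["funding_total_usd"])
        counts[status] += 1
    return {status: sums[status] / counts[status] for status in sums}

averages = mean_funding_by_status(csv.DictReader(io.StringIO(SAMPLE)))
for status, mean in sorted(averages.items()):
    print(status, mean)
```

Because a few very large funding totals can dominate a mean, it may be worth comparing the median per status as well before drawing conclusions.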

C. Plan for data cleansing


1. How is the overall quality of the data set? Can you identify any issues related to missing
values, inconsistent data entry, outliers, or other data quality concerns?
Kaggle gives this data set a usability score of 8.85, with 100% completeness. This suggests
that the data set is sufficient for use in Tableau. However, a closer look at the data set
reveals several problems:
- The null rate per column ranges from 9% to nearly 95%. In particular, the null rate of
columns 21 (equity_crowdfunding) through 38 (round_H) mostly exceeds 80%.
- Some columns are unnecessary, for varying reasons:
▪ columns 1 (permalink) & 3 (homepage_url): unrelated to the team's data visualization
goals
▪ column 9 (state_code): data is recorded only for startups located in the US.
▪ columns 14 (founded_month) & 15 (founded_quarter): the data is duplicated & unnecessary
- column 6 (funding_total_usd): the data type is miscategorized as string instead of
number, because null values are recorded as “-” instead of “0”.
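
The null rates quoted above could be checked with a short script along these lines. This is a sketch, not the method the team used: the sample rows are hypothetical, and it assumes that both an empty cell and “-” count as null.

```python
import csv
import io

# Hypothetical sample rows; column names follow the schema in section A.
SAMPLE = """name,market,state_code,round_H
Alpha,Software,CA,
Beta,,NY,
Gamma,Games,,
Delta,Health,,100000
"""

def null_rates(rows, null_tokens=("", "-")):
    """Return {column: fraction of rows whose value is empty or a null token}."""
    rows = list(rows)
    if not rows:
        return {}
    rates = {}
    for column in rows[0].keys():
        missing = sum(
            1 for row in rows if (row[column] or "").strip() in null_tokens
        )
        rates[column] = missing / len(rows)
    return rates

rates = null_rates(csv.DictReader(io.StringIO(SAMPLE)))
for column, rate in rates.items():
    print(f"{column}: {rate:.0%}")
```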

2. What steps, methods, or tools will you do/use to address these issues in preparation for data
visualization? Note: you do not need to actually clean the data at this stage.

- Null values: missing data will be ignored, as the data set contains nearly 50,000 data
points, which should be sufficient to produce meaningful charts & conclusions. However, data
from columns 21 to 38 will be eliminated.
- Unnecessary data: columns 1, 3, 9, 14 and 15 will also be eliminated because they are
unrelated to the team's goals.
- Wrong data type: we will use a calculated field to change the data points with “-” into
“0”. After that, the data type will be changed to Number (whole).
- Final step: the Excel file will be loaded into Tableau with Data Interpreter to clean up
any extraneous formatting.
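
The plan above relies on Tableau's calculated fields and Data Interpreter. For comparison, the same column removal and the “-” to 0 conversion can be sketched in Python; this is an alternative illustration, not the team's workflow. The sample rows are hypothetical, and handling thousands separators in funding_total_usd is an assumption about how the values might be formatted.

```python
import csv
import io

# Columns 1, 3, 9, 14 and 15 from the schema; columns 21 to 38 could be added here too.
DROP_COLUMNS = {"permalink", "homepage_url", "state_code",
                "founded_month", "founded_quarter"}

# Hypothetical sample rows for illustration only.
SAMPLE = """permalink,name,homepage_url,funding_total_usd,state_code
/org/alpha,Alpha,http://alpha.example,"1,750,000",CA
/org/beta,Beta,http://beta.example,-,
"""

def parse_funding(raw):
    """Convert a funding string to a whole number, treating "-" or blank as 0."""
    text = raw.strip().replace(",", "")  # assumed thousands separators
    return 0 if text in ("", "-") else int(text)

def clean(rows):
    """Drop unneeded columns and make funding_total_usd numeric."""
    cleaned = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
        kept["funding_total_usd"] = parse_funding(kept["funding_total_usd"])
        cleaned.append(kept)
    return cleaned

cleaned = clean(csv.DictReader(io.StringIO(SAMPLE)))
for row in cleaned:
    print(row)
```

One caveat with this conversion, in either tool: replacing “-” with 0 makes an unknown funding amount indistinguishable from a true zero, which can pull averages down.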