Data set selection and evaluation_FINAL
• Total startup investment and the number of businesses founded appear to have risen over
the years. We would like to find out when this trend began, whether recently or decades ago,
and how far it has progressed. This can help us better understand the startup ecosystem's
current position and future potential compared with large corporations or state-owned
companies.
• It is commonly assumed that America is the ideal place for people to start their own
businesses. We want to test this assumption by comparing total startup funding across
locations, such as U.S. states and other parts of the world. This lets us determine which
locations are most attractive for startup investment, which regions are today's global startup
hubs, and where the most innovation and entrepreneurial activity is occurring.
• Startups can enter many different markets. It would be useful to study which markets are
attracting the most funding and whether investment preferences show any patterns or
differences across regions. This can help us determine which markets are the most
promising and where entrepreneurs are concentrating their efforts.
• We wonder whether the average amount of capital received by firms that are still operating
is the same as that received by firms that have closed. This can help us judge whether
investors are careful enough to avoid putting money into firms that are unlikely to succeed.
We would also like to examine this pattern for factors that may affect a startup's success or
failure.
2. Formulate at least three specific questions you would like to answer using the data set. These
questions should be relevant to the group's interests and inspire further exploration and
visualization using Tableau.
Q1: When did the trend of increasing overall startup investment and the number of startups
founded begin? Is this trend continuing?
Q2: Is America still the most attractive region for startup funding or have other regions surpassed
it? What are the most attractive regions for startup funding globally and which one is the current
global startup hub?
Q3: Which markets are attracting the most funding for startups? Are there any regional variations
in the investment preferences of startups?
Q4: Is there a difference in the average funding amount between companies that are still in
operation and those that have closed? Does the funding amount contribute to the success or
failure of a startup?
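The comparison behind Q4 can be sketched in a few lines of Python. This is only an illustration of the calculation we intend to build in Tableau, not part of the actual analysis: the rows are made-up values, and the status labels "operating" and "closed" are assumed names, not taken from the data set.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only: (status, total funding in USD).
# The labels "operating" and "closed" are assumed, not read from the data set.
sample = [
    ("operating", 1_000_000),
    ("operating", 3_000_000),
    ("closed", 500_000),
    ("closed", 1_500_000),
]


def average_funding_by_status(rows):
    """Group funding amounts by status and return the mean of each group."""
    groups = defaultdict(list)
    for status, amount in rows:
        groups[status].append(amount)
    return {status: mean(amounts) for status, amounts in groups.items()}


averages = average_funding_by_status(sample)
print(averages)
```

A gap between the two averages would only show an association. It would not show by itself that the funding amount causes a startup to succeed or fail.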
2. What steps, methods, or tools will you do/use to address these issues in preparation for data
visualization? Note: you do not need to actually clean the data at this stage.
- Null values: missing data will be ignored, as the data set contains nearly 50,000 data points,
which should be enough to produce meaningful charts and conclusions. However, columns 21 to
39 will be removed.
- Unnecessary data: columns 1, 3, 9, 14, and 15 will also be removed because they are not relevant to our questions.
- Wrong data type: we will use a calculated field to change data points containing “-” to “0”.
The data type will then be changed to Number (whole).
- Final step: the Excel file will be loaded into Tableau with Data Interpreter turned on to strip
out unrelated formatting.
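The column removal and the “-” to “0” conversion described above can be sketched in Python for readers who want to see the logic outside Tableau. This is a minimal sketch under stated assumptions, not the calculated field we will actually write: the function names are hypothetical, the dummy row assumes the file has 39 columns, and stripping thousands separators is an assumption about how the funding values are formatted.

```python
# Columns to drop, 1-based as in the text: 1, 3, 9, 14, 15 and 21 through 39.
DROP_COLUMNS = {1, 3, 9, 14, 15} | set(range(21, 40))


def drop_columns(row, drop=DROP_COLUMNS):
    """Return the row without the cells whose 1-based position is in `drop`."""
    return [cell for pos, cell in enumerate(row, start=1) if pos not in drop]


def funding_to_int(raw):
    """Turn a funding cell into a whole number, mapping "-" to 0.

    Blank cells return None so that missing data stays missing.
    Removing commas assumes values use thousands separators (an assumption).
    """
    text = raw.strip().replace(",", "")
    if text == "":
        return None
    if text == "-":
        return 0
    return int(text)


# Dummy 39-column row (assumed width) to show which positions survive.
row = [f"c{i}" for i in range(1, 40)]
kept = drop_columns(row)
print(kept)
print(funding_to_int(" 1,750,000 "), funding_to_int(" - "))
```

Note that mapping “-” to 0 treats an unreported amount as zero funding, which lowers any average computed over that column. Keeping blanks as missing values avoids adding to that effect.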