Data Analytics
WEEK 1:
- Gap analysis: focuses on comparing the current state with an ideal or desired state.
- Data ecosystems are made up of various elements that interact with one another in order to
produce, manage, store, organize, analyze, and share data.
WEEK 2:
- Analytical skills: The qualities and characteristics associated with solving problems using facts
- A technical mindset: The analytical skill that involves breaking processes down into smaller
steps and working with them in an orderly, logical way
- Data design: The analytical skill that involves how you organize information
- Understanding context: The analytical skill that has to do with how you group things into
categories
- Data strategy: The analytical skill that involves managing the processes and tools used in data
analysis
- Analytical thinking involves identifying and defining a problem and then solving it by using data
in an organized, step-by-step manner.
+ The five key aspects of analytical thinking are visualization, strategy, problem
orientation, correlation, and big-picture and detail-oriented thinking.
Visualization is the graphical representation of information
With so much data available, having a strategic mindset is key to staying focused and on
track. Strategizing helps data analysts see what they want to achieve with the data and
how they can get there. Strategy also helps improve the quality and usefulness of the
data we collect. By strategizing, we know all our data is valuable and can help us
accomplish our goals.
Problem orientation: It's all about keeping the problem top of mind throughout the
entire project
Correlation does not equal causation. In other words, just because two pieces of data
are trending in the same direction doesn't necessarily mean one causes the other.
Big-picture thinking is like looking at a complete puzzle. Detail-oriented thinking is all
about figuring out all of the aspects that will help you execute a plan; in other words, the
pieces that make up your puzzle.
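To see why a strong correlation alone says nothing about cause, here is a minimal Python sketch that computes Pearson's correlation coefficient by hand. The two series are hypothetical monthly sales of ice cream and sunscreen: both rise with warm weather, so they correlate strongly even though neither causes the other.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the spread of each series
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: both series are driven by a third factor (warm weather)
ice_cream_sales = [10, 12, 15, 20, 26, 30]
sunscreen_sales = [5, 6, 8, 10, 13, 15]

r = pearson(ice_cream_sales, sunscreen_sales)
print(f"r = {r:.3f}")  # close to 1, yet neither causes the other
```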
- Gap analysis is used to examine and evaluate how a process currently works with the goal of
getting to where you want to be in the future.
WEEK 3:
- The data analysis life cycle:
+ Ask: Define the problem and confirm stakeholder expectations
Defining a problem means you look at the current state and identify how it's different
from the ideal state. For instance, a sports arena might want to reduce the time fans
spend waiting in the ticket line. The obstacle is figuring out how to get the customers to
their seats more quickly
Understand stakeholder expectations. For instance, if your manager assigns you a
data analysis project related to business risk, it would be smart to confirm whether they
want to include all types of risks that could affect the company, or just risks related
to weather such as hurricanes and tornadoes.
+ Prepare: Collect and store data for analysis
+ Process: Clean and transform data to ensure integrity (data analysts find and eliminate
any errors and inaccuracies that can get in the way of results. This usually means cleaning data,
transforming it into a more useful format, combining two or more datasets to make information
more complete and removing outliers, which are any data points that could skew the
information)
+ Analyze: Use data analysis tools to draw conclusions
+ Share: Interpret and communicate results to others to make data-driven decisions
+ Act: Put your insights to work in order to solve the original problem
- The life cycle of data is plan, capture, manage, analyze, archive and destroy.
+ During planning, a business decides what kind of data it needs, how it will be managed
throughout its life cycle, who will be responsible for it, and the optimal outcomes.
For example, let's say an electricity provider wanted to gain insights into how to save people
energy. In the planning phase, they might decide to capture information on how much
electricity its customers use each year, what types of buildings are being powered, and what
types of devices are being powered inside of them. The electricity company would also decide
which team members will be responsible for collecting, storing, and sharing that data.
+ Capture data: data is collected from a variety of different sources and brought into the
organization. Common methods for collecting data:
Getting data from outside sources. For example, if you were doing data analysis on
weather patterns, you'd probably get data from a publicly available dataset like the
National Climatic Data Center.
Using the company's own documents and files, such as a database. A database is a collection
of data stored in a computer system. When you maintain a database of customer information,
ensuring data integrity, credibility, and privacy are all important concerns.
+ Analyze data: In this phase, the data is used to solve problems, make great decisions, and
support business goals.
+ Archive: Archiving means storing data in a place where it's still available but may not be
used again.
NOTE: Be careful not to mix up or confuse the six stages of the data life cycle (Plan, Capture,
Manage, Analyze, Archive, and Destroy) with the six phases of the data analysis life cycle (Ask,
Prepare, Process, Analyze, Share, and Act). They shouldn't be used or referred to
interchangeably.
- While the data analysis process will drive your projects and help you reach your business goals,
you must understand the life cycle of your data in order to use that process. To analyze your
data well, you need to have a thorough understanding of it. Similarly, you can collect all the data
you want, but the data is only useful to you if you have a plan for analyzing it.
- The Plan and Ask phases both involve planning and asking questions, but they tackle different
subjects. The Ask phase in the data analysis process focuses on big-picture strategic thinking
about business goals. However, the Plan phase focuses on the fundamentals of the project, such
as what data you have access to, what data you need, and where you’re going to get it.
WEEK 4:
- A query is a request for data or information from a database. When you query databases,
you use SQL to communicate your question or request. You and the database can always
exchange information as long as you speak the same language.
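As a minimal sketch of a query, the Python snippet below uses the standard-library sqlite3 module with a hypothetical in-memory customers table. The SQL string is the request, and the rows that come back are the database's answer. SQL dialects differ slightly between tools; the table, columns, and values here are made up for illustration.

```python
import sqlite3

# In-memory database with a small, hypothetical customers table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Austin", 3), ("Ben", "Boston", 7), ("Chi", "Austin", 5)],
)

# The question: "Which customers in Austin made more than 2 purchases?"
rows = conn.execute(
    "SELECT name, purchases FROM customers "
    "WHERE city = 'Austin' AND purchases > 2 "
    "ORDER BY name"
).fetchall()
print(rows)  # [('Ana', 3), ('Chi', 5)]
```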
Course 2:
WEEK 1:
- Structured thinking is the process of recognizing the current problem or situation, organizing
available information, revealing gaps and opportunities, and identifying the options.
- Operators are symbols used in formulas, including + (addition), – (subtraction), *
(multiplication), and / (division).
- In data analytics, qualitative data measures qualities and characteristics and is subjective.
- Dashboards monitor live, incoming data from multiple datasets and organize the
information into one central location. Reports are static collections of data.
- Small data is effective for analyzing day-to-day decisions. Big data is effective for analyzing
more substantial decisions.
+ Small data involves datasets concerned with a small number of specific metrics. Big data
involves datasets that are larger and less specific.
+ Small data focuses on short, well-defined time periods. Big data focuses on change over a long
period of time.
- Small data involves a small number of specific metrics over a shorter period of time. It's
effective for analyzing day-to-day decisions. Big data involves larger and less specific datasets
and focuses on change over a long period of time. It's effective for analyzing more substantial
decisions.
- Data analysts ask thoughtful questions to help them reach solid conclusions, consider how
to share data with others, and help team members make effective decisions.
- Stakeholders included the owner, the vice president of communications, and the director of
marketing and finance
- Data analysts work with a variety of problems. These include: making predictions,
categorizing things, spotting something unusual, identifying themes, discovering
connections, and finding patterns.
+ Making predictions: This problem type involves using data to make an informed decision
about how things may be in the future.
+ Categorizing things: This means assigning information to different groups or clusters based
on common features
+ Spotting unusual things: data analysts identify data that is different from the norm
+ Identifying themes: Identifying themes takes categorization a step further by grouping
information into broader concepts
+ Discovering connections: enables data analysts to find similar challenges faced by different
entities, and then combine data and insights to address them. For example, a scooter company is
experiencing an issue with the wheels it gets from its wheel supplier, so it would have to stop
production until it could get safe, quality wheels back in stock. Meanwhile, the wheel company is
encountering a problem with the rubber it uses to make wheels; it turns out its rubber supplier
could not find the right materials either. If all of these entities could talk about the problems
they're facing and share data openly, they would find a lot of similar challenges and, better yet,
be able to collaborate to find a solution.
+ Finding patterns: Data analysts use data to find patterns by using historical data to
understand what happened in the past and is therefore likely to happen again
- Effective questions follow SMART methodology:
+ Specific: Specific questions are simple, significant, and focused on a single topic or a few
closely related ideas. For example, instead of asking a closed-ended question like "Are kids
getting enough physical activity these days?", ask "What percentage of kids achieve the
recommended 60 minutes of physical activity at least five days a week?"
+ Measurable: Measurable questions can be quantified and assessed. An example of an
unmeasurable question would be "Why did a recent video go viral?" Instead, you could ask "How
many times was our video shared on social channels the first week it was posted?"
+ Action-oriented: Action-oriented questions encourage change. So rather than asking "How can
we get customers to recycle our product packaging?", you could ask "What design features will
make our packaging easier to recycle?"
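+ Relevant: Relevant questions matter, are important, and have significance to the problem
you're trying to solve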
+ Time-bound: Time-bound questions specify the time to be studied
- Fairness means asking questions that make sense to everyone. Even if a data analyst suspects
people will understand abbreviations, slang, or other jargon, it’s important to write questions
with simple wording.
WEEK 2:
- Data-inspired decision-making explores different data sources to find out what they have in
common
- An algorithm is a process or set of rules to be followed for a specific task
- Quantitative data is all about the specific and objective measures of numerical facts (how
often, how many,…)
- Qualitative data describes subjective or explanatory measures of qualities and
characteristics, or things that can't be measured with numerical data (it answers "why" questions)
Reports are usually created and updated periodically, while dashboards are usually updated
continuously or in real time.
- Data and metrics:
+ A metric is a single, quantifiable type of data that can be used for measurement (e.g., the data
may include information such as quantity and price, but to compare revenue between
salespeople you need a metric: sales = quantity * price)
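The revenue example can be sketched in a few lines of Python: raw data (quantity and price per order) only becomes comparable across salespeople once it is turned into a metric. The names and numbers below are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw data: (salesperson, quantity sold, unit price)
orders = [
    ("An", 3, 10.0),
    ("Binh", 5, 4.0),
    ("An", 2, 10.0),
]

# Metric: sales = quantity * price, totaled per salesperson
sales = defaultdict(float)
for person, quantity, price in orders:
    sales[person] += quantity * price

print(dict(sales))  # {'An': 50.0, 'Binh': 20.0}
```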
- Metric goal is a measurable goal set by a company and evaluated using metrics
- Types of dashboard:
+ Strategic: focuses on long term goals and strategies at the highest level of metrics
+ Operational: short-term performance tracking and intermediate goals
+ Analytical: consists of the datasets and the mathematics used in these sets
- Dashboards are visualizations: Visualizing data can be enormously useful for understanding
and demonstrating what the data really means.
- Dashboards identify metrics: Relevant metrics may help analysts assess company performance.
Some differences include the timeframe described in each dashboard. The operational
dashboard has a timeframe of days and weeks, while the strategic dashboard displays the entire
year. The analytic dashboard skips a specific timeframe. Instead, it identifies and tracks the
various KPIs that may be used to assess strategic and operational goals.
- Dashboards can help companies perform many helpful tasks, such as:
+ Track historical and current performance.
+ Establish both long-term and/or short-term goals.
+ Define key performance indicators or metrics.
+ Identify potential issues or points of inefficiency.
While almost every company can benefit in some way from using a dashboard, larger
companies and companies with a wider range of products or services will likely benefit more.
Companies operating in volatile, or swiftly changing markets like marketing, sales, and tech also
tend to more quickly gain insights and make data-informed decisions.
- Dashboards can provide convenient access to information and analytics and are easy to use in
collaboration. Moreover, they may be tailored to the specific needs of the businesses, like
tracking performance towards a milestone.
Using a previous example of the ice cream store, the store owner might use an operational
dashboard to track their day-to-day sales. Meanwhile, they might use a strategic dashboard to
decide whether they have enough capacity to expand their business.
- Mathematical approach: It means looking at a problem and logically breaking it down step-by-
step, so you can see the relationship of patterns in your data, and use that to analyze your
problem.
- Small data can be useful for making day-to-day decisions
- Big data: has larger, less specific datasets covering a longer period of time
WEEK 3:
* Working with spreadsheets
- Operators: symbols that name the type of operation or calculation to be performed
(signs such as + or *)
- The #DIV/0! error happens when a formula is trying to divide a value in a cell by zero or by an
empty cell
- The #ERROR! error tells us the formula can't be interpreted as it is input (the formula is
written incorrectly). For example, =SUM(B2:B4 C6:C8) is missing the comma between the two ranges.
- The #N/A error tells you that the data in your formula can't be found by the spreadsheet.
For example, =VLOOKUP("abc", …) returns #N/A when the table contains "abcd" but no entry
"abc".
- The #NAME? error can happen when a formula's name isn't recognized or understood (the function
name is misspelled). For example, VLOOOKUP has an extra "o".
- The #NUM! error tells us that a formula's calculation can't be performed as specified by the data.
- The #VALUE! error can indicate a problem with a formula or referenced cells (for example, a value
of the wrong type is used in the formula)
- The #REF! error often comes up when cells being referenced in a formula have been deleted
- The problem domain: the specific area of analysis that encompasses every activity affecting or
affected by the problem.
- A statement of work is a document that clearly identifies the products and services a vendor or
contractor will provide to an organization. It includes objectives, guidelines, deliverables,
schedule, and costs.
- A scope of work is project-based and sets the expectations and boundaries of a project. A
scope of work may be included in a statement of work to help define project outcomes (an
agreed-upon outline of the work)
- Deliverables are items or tasks you will complete before you can finish the project.
- Reports notify everyone as you finalize deliverables and meet milestones.
- Milestones are significant tasks you will confirm along your timeline to help everyone know
the project is on track.
- Timelines include due dates for when deliverables, milestones, and/or reports are due.
WEEK 4:
- Three common stakeholder groups:
+ Executive team: provides strategic and operational leadership to the company. They set goals,
develop strategy, and make sure that strategy is executed effectively. These stakeholders think
about decisions at a very high level and they are looking for the headline news about your
project first. They are less interested in the details.
+ Customer-facing team: The customer-facing team includes anyone in an organization who has
some level of interaction with customers and potential customers.
+ Data science team: Organizing data within a company takes teamwork.
- Working effectively with stakeholders:
+ Discuss goals
+ Feel empowered to say no
+ Plan for the unexpected
+ Know your project
+ Start with words and visuals
+ Communicate often
PREPARING DATA
WEEK 1:
- Data is collected through interviews, observations (the most commonly used method), forms,
questionnaires, surveys, and cookies (small files stored on computers that contain information about users)
* Data sources
- First-party data: This is data collected by an individual or group using their own resources.
Collecting first-party data is typically the preferred method because you know exactly where it
came from
- Second-party data: data collected by a group directly from its audience and then
sold
- Third-party data: data collected from outside sources who did not collect it directly
- Remember to consider time frame when collecting data
* Different types of data (formats)
- Nominal data: a type of qualitative data that's categorized without a set order, so the
categories don't have a sequence (e.g., yes, no, not sure)
- Ordinal data: a type of qualitative data with a set order or scale (e.g., rank the movie from 1 to 5)
- Internal data: data that lives within a company's own systems
- External data: data that lives and is generated outside of an organization (It’s useful when
analysis depends on as many data sources as possible)
- Structured data: data that's organized in a certain format, such as rows and columns
- Unstructured data: data that is not organized in any easily identifiable manner (audio, video
files, emails, social media,…)
- Data model: a model that is used for organizing data elements and how they relate to one
another
+ Data elements: pieces of information, such as people's names, account numbers, and
addresses
* Data types:
- A data type is a specific kind of data attribute that tells what kind of value the data is. Data
type can be number, text or string and boolean
+ Text or string: a sequence of characters and punctuation that contains textual information
(can include numbers that aren't used in calculations, such as house numbers, …)
+ Boolean: a data type with only two possible values: true or false
- Wide data: every data subject has a single row with multiple columns to hold the values of
various attributes of the subject
- Long data: data in which each row is one time point per subject, so each subject will have
data in multiple rows.
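The difference between wide and long data is easiest to see by converting one into the other. This minimal Python sketch reshapes a hypothetical wide table (one row per country, one column per year) into long form (one row per country per year); the column names and values are assumptions for illustration.

```python
# Hypothetical wide data: one row per subject, one column per year
wide = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]

def wide_to_long(rows, id_col, var_name="year", value_name="value"):
    """Turn each non-id column into its own (id, variable, value) row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return long_rows

long_data = wide_to_long(wide, "country")
for r in long_data:
    print(r)
```

Each subject now appears on several rows, one per time point, which matches the definition of long data above.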
- Using a metadata repository, a data analyst can find it easier to bring together multiple sources
of data, confirm how or when data was collected, and verify that data from an outside source is
being used appropriately.
- Metadata is stored in a single, central location and it gives the company standardized
information about all of its data
- Data governance is a process to ensure the formal management of a company’s data assets
- CSV files use plain text and are delineated by characters, such as a comma. A delineator
indicates a boundary or separation between two things. (A CSV file stores data in a table format.)
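A minimal sketch of reading CSV text with Python's standard csv module: commas act as the delineator between fields, and quotes let a value contain a comma without being split. The data is hypothetical.

```python
import csv
import io

# Hypothetical CSV text: commas separate fields; quotes protect a comma inside a value
text = 'name,city,score\nAna,"Austin, TX",90\nBen,Boston,85\n'

rows = list(csv.reader(io.StringIO(text)))  # delimiter="," is the default
print(rows)
# [['name', 'city', 'score'], ['Ana', 'Austin, TX', '90'], ['Ben', 'Boston', '85']]
```

Note that every value comes back as text, including the scores, because a CSV file is plain text.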
* Sorting and filtering
- Sorting involves arranging data into a meaningful order to make it easier to understand,
analyze, and visualize
- Filtering means showing only the data that meets a specific criteria while hiding the rest
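Sorting and filtering can be sketched in plain Python on a few hypothetical records: sorting reorders whole rows, while filtering returns only the rows that meet a criterion and leaves the original list untouched.

```python
# Hypothetical records
movies = [
    {"title": "Alpha", "year": 2019, "rating": 4.1},
    {"title": "Beta", "year": 2021, "rating": 3.5},
    {"title": "Gamma", "year": 2020, "rating": 4.7},
]

# Sorting: arrange rows by rating, highest first (each row stays intact)
by_rating = sorted(movies, key=lambda m: m["rating"], reverse=True)

# Filtering: keep only rows meeting a criterion; the rest are hidden, not deleted
recent = [m for m in movies if m["year"] >= 2020]

print([m["title"] for m in by_rating])  # ['Gamma', 'Alpha', 'Beta']
print([m["title"] for m in recent])     # ['Beta', 'Gamma']
```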
* Working with SQL
- SQL keywords are not case-sensitive, so select, SELECT, and SeLect all work.
- You can use '…' or "…" when writing a condition. If the condition itself contains an apostrophe,
such as Shepherd's pie, write the condition as:
WHERE Favorite_food = "Shepherd's pie"
because if you write WHERE Favorite_food = 'Shepherd's pie', SQL will read the condition as just
Favorite_food = 'Shepherd'.
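Whether double quotes are accepted around text values depends on the SQL dialect (standard SQL reserves them for identifiers such as column names). A more portable sketch, using SQLite through Python's built-in sqlite3 module and a hypothetical favorites table, either doubles the apostrophe inside single quotes or binds the value as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (name TEXT, favorite_food TEXT)")
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [("Ana", "Shepherd's pie"), ("Ben", "Pizza")],
)

# Option 1: escape the apostrophe by doubling it inside single quotes
q1 = conn.execute(
    "SELECT name FROM favorites WHERE favorite_food = 'Shepherd''s pie'"
).fetchall()

# Option 2: pass the value as a parameter so no manual quoting is needed
q2 = conn.execute(
    "SELECT name FROM favorites WHERE favorite_food = ?", ("Shepherd's pie",)
).fetchall()

print(q1, q2)  # [('Ana',)] [('Ana',)]
```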
- Comments in SQL: write the comment after --, for example: -- this line is a comment
- Naming conventions: consistent guidelines that describe the content, date, or version of a
file in its name
WEEK 4:
PROCESS DATA FROM DIRTY TO CLEAN
WEEK 1: Data integrity
- A strong analysis depends on the integrity of the data
- Data integrity is the accuracy, completeness, consistency and trustworthiness of data
throughout its lifecycle
- Data replication is the process of storing data in multiple locations
- Data transfer is the process of copying data from a storage device to memory or from one
computer to another
- Data manipulation is the process of changing data to make it more organized and easier to
read
- Threats to data integrity: human error, viruses, malware, hacking, and system failures
* Dealing with insufficient data
- Types of insufficient data:
+ Data from only one source
+ Data that keeps updating (the data is still coming in and is not complete)
+ Outdated data
+ Geographically-limited data
- Ways to address insufficient data:
+ Identify trends with the available data
+ Wait for more data if time allows
+ Talk with stakeholders and adjust your objective
+ Look for a new dataset
* The importance of sample size:
- Random sampling is a way of selecting a sample from a population so that every possible
type of the sample has an equal chance of being chosen
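Random sampling can be sketched with Python's standard random module: random.sample draws without replacement, so every member of the population has the same chance of being picked and none is picked twice. The population of customer IDs is hypothetical, and the fixed seed only makes the example reproducible.

```python
import random

# Hypothetical population of 1,000 customer IDs
population = list(range(1, 1001))

rng = random.Random(42)              # fixed seed so the sample is reproducible
sample = rng.sample(population, 50)  # each ID has an equal chance of being chosen

print(len(sample), len(set(sample)))  # 50 50 (no repeats)
```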
- Increase the sample size to meet specific needs of your project:
+ For a higher confidence level, use a larger sample size
+ To decrease the margin of error, use a larger sample size
+ For greater statistical significance, use a larger sample size
NOTE: You could probably accept a larger margin of error surveying how residents feel about
the new library versus surveying residents about how they would vote to fund it. For that
reason, you would most likely use a larger sample size for the voter survey.
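The link between sample size and margin of error can be made concrete with the common textbook approximation for a proportion from a simple random sample: MOE = z * sqrt(p * (1 - p) / n), where z = 1.96 for a 95% confidence level. This formula is an assumption brought in for illustration, not something stated in the notes; p = 0.5 gives the widest (most conservative) margin.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion (simple random sample).

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 4))
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error, which is why a survey that needs a small margin (such as the vote on funding) needs a much larger sample.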
* Testing data
- Statistical power is the probability of getting meaningful results from a test
- Hypothesis testing is a way to see if a survey or experiment has meaningful results
- If a test is statistically significant, it means the results of the test are real and not an error
caused by random chance. Statistical power is usually shown as a value out of 1: a statistical power
of 0.6 means a 60% chance of getting a statistically significant result. A statistical power of 0.8
(80%) is usually considered the minimum for results to count as statistically significant.
- We need to consider all the factors before deciding the sample size to ensure high
statistical power
* Proxy data
+ Outdated data
+ Incomplete data
+ Incorrect/inaccurate data
+ Inconsistent data
- Selecting data that meets a certain condition: normally a WHERE clause is enough, but if, for
example, the country column contains both USA and US, use the SUBSTR function to select the
customers from the US
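A minimal sketch of the SUBSTR approach, run in SQLite through Python's sqlite3 module with a hypothetical customers table. SUBSTR(country, 1, 2) compares only the first two characters, so both 'US' and 'USA' match; be aware that any other value starting with "US" would match too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "USA"), (2, "US"), (3, "Canada"), (4, "UK")],
)

# SUBSTR(column, start, length): start is 1-based
rows = conn.execute(
    "SELECT customer_id, country FROM customers "
    "WHERE SUBSTR(country, 1, 2) = 'US' ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'USA'), (2, 'US')]
```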
- TRIM function: removes extra leading and trailing spaces for consistency
- CONCAT: adds strings together to create new text strings that can be used as unique keys
- COALESCE: can be used to return the first non-null value in a list. Null values are missing values.
For example, check the product column first; if product is null, use the value from product_code instead.
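The three cleaning functions can be tried together in SQLite through Python's sqlite3 module, using a hypothetical products table. One assumption to flag: SQLite concatenates strings with the || operator, which plays the role of CONCAT here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, product_code TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("  Lamp ", "L-01", "red"), (None, "C-02", "blue")],
)

rows = conn.execute(
    """
    SELECT
      TRIM(product) AS clean_product,                 -- strip leading/trailing spaces
      product_code || '-' || color AS product_key,    -- concatenate into a key
      COALESCE(TRIM(product), product_code) AS label  -- first non-null value
    FROM products
    ORDER BY product_code
    """
).fetchall()
print(rows)  # [(None, 'C-02-blue', 'C-02'), ('Lamp', 'L-01-red', 'Lamp')]
```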
WEEK 4: DOCUMENTING RESULTS AND THE CLEANING PROCESS
* Verifying and reporting results
- Verification is a process to confirm that a data cleaning effort was well-executed and the
resulting data is accurate and reliable.
- A changelog: a file containing a chronologically ordered list of modifications made to a project
- Verification process:
+ Going back to your original unclean data set and comparing it to what you have now. Review
the dirty data and try to identify any common problems.
- Big picture when verifying data-cleaning:
+ Consider the business problem
+ Consider the goal
+ Consider the data
- The CASE statement goes through one or more conditions and returns a value as soon as a
condition is met
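A minimal sketch of a CASE statement used for cleaning, run in SQLite through Python's sqlite3 module. The customers table and the misspelled name 'Tnoy' are hypothetical; CASE returns 'Tony' as soon as the first condition matches and falls back to the original value otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Tnoy"), (2, "Tony"), (3, "Mia")],
)

rows = conn.execute(
    """
    SELECT customer_id,
           CASE
             WHEN first_name = 'Tnoy' THEN 'Tony'  -- fix a known misspelling
             ELSE first_name                       -- otherwise keep the value
           END AS cleaned_name
    FROM customers
    ORDER BY customer_id
    """
).fetchall()
print(rows)  # [(1, 'Tony'), (2, 'Tony'), (3, 'Mia')]
```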
* Documenting results
- Documentation: the process of tracking changes, additions, deletions, and errors
involved in your data cleaning effort
- To see what has been changed in the data set, we can use the query history in SQL and Show
edit history in spreadsheets
- Common data errors:
+ Human error in data entry
+ Flawed processes
+ System issues
ANALYZE DATA TO ANSWER QUESTIONS
WEEK 1: DATA ANALYTICS BASICS
* Analysis process
- Analysis is the process used to make sense of the data collected
- The goal of analysis is to identify trends and relationships within the data
- 4 phases of analysis:
+ Organize data
+ Format and adjust data
+ Get input from others
+ Transform data
- Sorting is when you arrange data into a meaningful order to make it easier to understand,
analyze, and visualize
- Filtering is showing only the data that meets a specific criteria while hiding the rest
* Sorting in spreadsheets
- Sort sheet: all of the data in a spreadsheet is sorted by the conditions of a single column, but
the related information across each row stays together.
- Sort range: only the specified cells are sorted; the rest of the spreadsheet stays unchanged
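- Sort function: =SORT(range, sort_column, is_ascending) returns a sorted copy of the range,
leaving the original data in place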
- Inner JOIN is a function that returns records with matching values in both tables
+ In the INNER JOIN example, departments represents the other table that we want to combine. We
specify which column in each table contains the matching join key by writing ON …
- LEFT JOIN is a function that will return all the records from the left table and only the
matching records from the right table
- RIGHT JOIN will return all records from the right table and only the matching records from
the left table
- The importance of aliases: Aliases are used in SQL queries to create temporary names for a
column or table (using AS), for example a shorter name when the existing name is too long.
Aliases can be used in both the SELECT and FROM clauses
- For example, in the LEFT JOIN and RIGHT JOIN queries, in customers.XXX and sales.XXX in the
SELECT clause, customers and sales are the table aliases and XXX is a column name in each table
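Joins and aliases can be sketched together in SQLite through Python's sqlite3 module, using hypothetical employees and departments tables; e and d are table aliases. RIGHT JOIN is left out because only recent SQLite versions support it; swapping the table order in a LEFT JOIN gives the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, department_id INTEGER);
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    INSERT INTO employees VALUES ('Ana', 1), ('Ben', 2), ('Chi', NULL);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT'), (3, 'HR');
    """
)

# INNER JOIN: only employees whose department_id matches a department
inner = conn.execute(
    """
    SELECT e.name, d.department_name
    FROM employees AS e
    INNER JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.name
    """
).fetchall()

# LEFT JOIN: every employee; department_name is NULL (None) when there is no match
left = conn.execute(
    """
    SELECT e.name, d.department_name
    FROM employees AS e
    LEFT JOIN departments AS d ON e.department_id = d.department_id
    ORDER BY e.name
    """
).fetchall()

print(inner)  # [('Ana', 'Sales'), ('Ben', 'IT')]
print(left)   # [('Ana', 'Sales'), ('Ben', 'IT'), ('Chi', None)]
```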
* Count and count distinct
- COUNT is a query that returns the number of rows in a specified range
- COUNT DISTINCT is a query that returns only the number of distinct values in that range. This
means COUNT DISTINCT doesn't count repeating values
- GROUP BY: groups rows that share the same values in the specified columns
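COUNT, COUNT DISTINCT, and GROUP BY side by side, as a minimal sketch in SQLite through Python's sqlite3 module with a hypothetical orders table: COUNT counts every row, COUNT(DISTINCT ...) counts each value once, and GROUP BY produces one result row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ana", "Austin"), ("Ana", "Austin"), ("Ben", "Boston"), ("Chi", "Austin")],
)

# COUNT counts every row; COUNT(DISTINCT ...) ignores repeated values
total, unique_customers = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customer) FROM orders"
).fetchone()

# GROUP BY: one result row per city
per_city = conn.execute(
    "SELECT city, COUNT(*) AS num_orders FROM orders GROUP BY city ORDER BY city"
).fetchall()

print(total, unique_customers)  # 4 3
print(per_city)                 # [('Austin', 3), ('Boston', 1)]
```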
* Subquery
- A subquery is a SQL query that is nested inside of a larger query
- The inner query executes first so that the results can be passed on to the outer query to use
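A minimal sketch of a (non-correlated) subquery in SQLite through Python's sqlite3 module, with a hypothetical stations table: the inner query computes the average number of bikes, and the outer query keeps only stations above that average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stations (name TEXT, num_bikes INTEGER)")
conn.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [("North", 10), ("South", 4), ("East", 7)],
)

# Inner query: average bikes per station (7.0 here); outer query uses that value
rows = conn.execute(
    """
    SELECT name, num_bikes
    FROM stations
    WHERE num_bikes > (SELECT AVG(num_bikes) FROM stations)
    ORDER BY name
    """
).fetchall()
print(rows)  # [('North', 10)]
```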
- The modulo operator is represented by the percent symbol (%). It returns the remainder when one
number is divided by another, equivalent to the MOD function
- Division: /
+ A WHERE clause with total bags <> 0 (not equal to zero) tells SQL to keep only rows where total
bags is nonzero, since we cannot divide by 0
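Division, modulo, and the divide-by-zero filter can be sketched in SQLite through Python's sqlite3 module with a hypothetical bag_sales table. Two SQLite-specific details to keep in mind: dividing two integers truncates (so the query multiplies by 100.0 first), and dividing by zero returns NULL rather than an error, whereas many other databases raise an error, which is why the WHERE filter matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bag_sales (region TEXT, small_bags INTEGER, total_bags INTEGER)")
conn.executemany(
    "INSERT INTO bag_sales VALUES (?, ?, ?)",
    [("West", 25, 100), ("East", 0, 0), ("South", 30, 120)],
)

rows = conn.execute(
    """
    SELECT region,
           small_bags * 100.0 / total_bags AS small_bags_pct,  -- 100.0 avoids integer division
           total_bags % 12 AS leftover_bags                    -- remainder after full cases of 12
    FROM bag_sales
    WHERE total_bags <> 0   -- skip rows that would divide by zero
    ORDER BY region
    """
).fetchall()
print(rows)  # [('South', 25.0, 0), ('West', 25.0, 4)]
```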
- EXTRACT lets us pull one part of a given date (such as the year or month) to use
+ Select into: This statement copies data from one table into a new table but it doesn't add
the new table to the database
+ If lots of people will be using the same table, then the CREATE TABLE statement might be the
better option
SHARE DATA THROUGH THE ART OF VISUALIZATION
WEEK 1: VISUALIZING DATA
- Accessible visualizations:
+ Alternative text provides a textual alternative to non-text content. It allows the content and
function of the image to be accessible to those with visual or certain cognitive disabilities
+ Avoid relying solely on color to convey information; instead, distinguish elements with
different textures and shapes
TABLEAU
* Basic information about Tableau
- Tableau is a business intelligence and analytics platform that you can use online to help
people see, understand, and make decisions with data
- A diverging color palette displays two ranges of values using color intensity to show the
magnitude of the number and the actual color to show which range the numbers are from
- A dashboard is a tool that organizes information, typically from multiple data sets, into one
central location for tracking, analysis, and simple visualization through charts, graphs, and
maps
- 3 data storytelling steps:
+ Engaging audiences: capturing and holding someone’s interest and attention
+ Create compelling visuals: Visuals should take your audience on a journey of how the data
changed over time or highlight the meaning behind the numbers.
+ Tell the story in an interesting narrative: For example, using word clouds. These words are
presented in different sizes based on how often they appear in your data set (words that appear
more often are shown in larger text, and words that appear less often in smaller text)
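The idea behind a word cloud can be sketched in Python: count how often each word appears, then map counts to font sizes. The comments text, the 10 to 40 point range, and the linear scaling are assumptions for illustration; real word-cloud tools use their own layout and scaling rules.

```python
import re
from collections import Counter

# Hypothetical survey comments
text = "fast delivery, friendly staff, fast checkout, slow delivery, fast app"

words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

def font_size(count, lo, hi, min_pt=10, max_pt=40):
    """Scale a word's count linearly between min_pt and max_pt."""
    if hi == lo:
        return max_pt
    return min_pt + (count - lo) * (max_pt - min_pt) / (hi - lo)

lo, hi = min(counts.values()), max(counts.values())
for word, count in counts.most_common():
    print(word, count, font_size(count, lo, hi))
```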
- Storytelling:
+ Setting is what’s happening and other background info.
+ The big reveal, or resolution, is how the data shows the way to solve the conflict or problem
you face.
+ The aha moment means sharing data-driven recommendations for success. To identify it,
ask: “What’s the fix moving forward?”
+ Characters are people affected by your story.
+ The plot is what creates the conflict that compels the characters to act—and what the data
analysis seeks to resolve.
PROGRAMMING
- To find out more about a function, type ? followed by the function name (e.g., ?print) in the
console. The Help pane then shows information about that function
- Variable: a representation of a value in R that can be stored for use later during programming
- A variable name should start with a letter and can also contain numbers and underscores
- If you want to add a comment to explain what you're doing in R, start the line with # and then
write the comment
- Create variables:
Length: length()
+ Name vectors:
- Creating lists: Lists are different from atomic vectors because their elements can be of any
type—like dates, data frames, vectors, matrices, and more. Lists can even contain other lists.
- Aesthetic: A visual property of an object in your plot (the size, shape or color of your data
points)
- Geom: the geometric object used to represent your data (e.g., points, bars, or lines)
- Facets: display smaller groups or subsets of your data
- Labels and annotations: customize your plot
NOTE: don't move the + sign down to a new line; in ggplot2 the + must end the line before the next layer
Alpha: transparency (how light or dark the points appear)
Smooth: geom_smooth() adds a smoothed trend line
The geom_jitter() function creates a scatter plot and then adds a small amount of
random noise to each point in the plot. Jittering helps us deal with over-plotting, which
happens when the data points in a plot overlap with each other. Jittering makes the points
easier to find.
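The idea behind jittering can be sketched in Python: add a small random offset to each value so identical points no longer sit exactly on top of each other. This only illustrates the concept; the uniform noise and the 0.2 amount are assumptions, not how geom_jitter() is implemented.

```python
import random

def jitter(values, amount=0.2, seed=0):
    """Add uniform random noise in [-amount, amount] to each value."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]

# Hypothetical ratings: the repeated x values would overlap exactly on a plot
x = [1, 1, 1, 2, 2, 3]
x_jittered = jitter(x)
print([round(v, 2) for v in x_jittered])
```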
For bar charts, R (geom_bar) automatically counts the number of times each x value appears
* Facet functions
- Facet functions let you display smaller groups or subsets of your data
- R Markdown files can be converted into HTML, PDF and Word, slideshow presentations, or
dashboards.
- YAML: a language for data that translates it so it’s readable
+ Syntax for YAML: the header sits between two lines of --- and contains information such as
title, author, date, and output. R Markdown generates this information automatically, but we can also write it ourselves
- Inline code: A data analyst inserts some code directly into their R Markdown file so that they
can refer to it directly in their write-up
- Embedding links in an R Markdown file: links often have long names, so we can embed them behind
display text, written as [display text](link)
- Embed image: ![alt text](image path)
- Adding bullet points: start each item with *
- Code chunk: code added directly to an .Rmd file
- Delimiter: a character that indicates the beginning and ending of an item