Top 88 Data Modeling Interview Questions and Answers
Conceptual: The conceptual data model defines what the system should contain. This model is
typically created by business stakeholders and data architects. Its purpose is to organize,
scope, and define business concepts and rules.
Logical: Defines how the system should be implemented regardless of the DBMS. This model is
typically created by data architects and business analysts. Its purpose is to develop a
technical map of rules and data structures.
Physical: This data model describes how the system will be implemented using a specific DBMS.
This model is typically created by DBAs and developers. Its purpose is the actual
implementation of the database.
A fact represents quantitative data, for example, the net amount due. A fact table
contains numerical measures as well as foreign keys to dimension tables.
There are two different types of data modelling schemas: 1) Star Schema, and 2)
Snowflake Schema.
Denormalization is used when data retrieval involves heavy use of table joins; it trades some
redundancy for faster reads. It is commonly used to construct a data warehouse.
Dimensions represent qualitative data. For example, product, class, plan, etc. A dimension
table has textual or descriptive attributes. For example, the product category and product
name are two attributes of the product dimension table.
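To make the fact/dimension split concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and column names (dim_product, fact_sales, net_amount, and so on) are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Dimension table: qualitative, descriptive attributes of a product.
cur.execute("""
    CREATE TABLE dim_product (
        product_key   INTEGER PRIMARY KEY,
        product_name  TEXT,
        category      TEXT
    )
""")

# Fact table: numeric measures plus foreign keys to dimension tables.
cur.execute("""
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        date_key     INTEGER,
        net_amount   REAL,      -- quantitative measure, e.g. the amount due
        quantity     INTEGER
    )
""")
conn.commit()
```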
A factless fact table is a fact table that has no fact measurements. It contains only the dimension keys.
OLTP is an online transactional processing system, while OLAP is an online analytical processing
system used for data analysis and retrieval.
Tables in an OLTP database are normalized, whereas the tables in OLAP are not normalized.
OLTP is designed for real-time business operations; OLAP is designed for the analysis of business
measures by category and attributes.
A collection of rows and columns is called a table. Each column has a datatype, and a
table contains related data in a tabular format.
Data sparsity is a term for how much data is available for a given entity or dimension of the model.
A composite primary key is a primary key in which more than one table column is used as
part of the primary key.
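A short sqlite3 sketch of a composite primary key; the order_items table and its columns are hypothetical examples.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# The primary key spans two columns: neither order_id nor line_number is
# unique on its own, but the pair uniquely identifies a row.
cur.execute("""
    CREATE TABLE order_items (
        order_id     INTEGER,
        line_number  INTEGER,
        product_id   INTEGER,
        quantity     INTEGER,
        PRIMARY KEY (order_id, line_number)
    )
""")
```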
A primary key is a column or group of columns that uniquely identifies each and every row in the
table. The value of a primary key must not be null, and a table can have only one primary key.
Metadata is data about data: it shows what type of data is actually stored in the
database system.
A data mart is a condensed version of a data warehouse and is designed for use by a specific
department, unit, or set of users in an organization, e.g., marketing, sales, HR, or finance.
Forward engineering is a technical term used to describe the process of automatically translating a
logical model into a physical implementation.
PDAP is a data cube that stores data as a summary. It helps the user to analyse data quickly. The
data in a PDAP is stored in a way that reporting can be done with ease.
A snowflake schema is an arrangement of a fact table and dimension tables in which the
dimension tables are further broken down into additional dimension tables.
Analysis service gives a combined view of the data that is used in data mining or OLAP.
The sequence clustering algorithm groups together sequences of data, such as paths or series of
events, that are similar or related to each other.
The time series algorithm is a method for predicting continuous values of data in a table. For
example, an employee's past performance can be used to forecast profit or influence.
BI (Business Intelligence) is a set of processes, architectures, and technologies that convert raw
data into meaningful information that drives profitable business actions. It is a suite of
software and services to transform data into actionable intelligence and knowledge.
Bitmap indexes are a special type of database index that uses bitmaps (bit arrays) to answer
queries by executing bitwise operations.
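The bitwise idea behind bitmap indexes can be sketched in plain Python; this is only a toy model of the concept, not how any particular database engine implements it.

```python
# Toy bitmap index: one bit per row, per distinct value of a column.
rows = [
    {"id": 1, "status": "active",   "region": "EU"},
    {"id": 2, "status": "inactive", "region": "US"},
    {"id": 3, "status": "active",   "region": "US"},
    {"id": 4, "status": "active",   "region": "EU"},
]

def bitmap(rows, column, value):
    """Build an integer whose i-th bit is set when row i matches the value."""
    bits = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bits |= 1 << i
    return bits

# Answer "status = 'active' AND region = 'EU'" with a single bitwise AND.
matches = bitmap(rows, "status", "active") & bitmap(rows, "region", "EU")
matching_ids = [rows[i]["id"] for i in range(len(rows)) if matches >> i & 1]
print(matching_ids)  # [1, 4]
```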
Data warehousing is a process for collecting and managing data from varied sources. It
provides meaningful business enterprise insights. Data warehousing is typically used to
connect and analyse data from heterogeneous sources. It is the core of the BI system, which is
built for data analysis and reporting.
A junk dimension combines two or more low-cardinality attributes into one dimension. These
attributes are usually Boolean or flag values.
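A junk dimension can be sketched as a small table that pre-enumerates every combination of a few flags; the sqlite3 sketch below uses hypothetical names (junk_order_flags, is_gift, etc.).

```python
import itertools
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("""
    CREATE TABLE junk_order_flags (
        junk_key     INTEGER PRIMARY KEY,
        is_gift      INTEGER,   -- Boolean flags folded into one dimension
        is_expedited INTEGER,
        is_returned  INTEGER
    )
""")

# Pre-populate all 2**3 flag combinations; fact rows then store one junk_key
# instead of three separate low-cardinality columns.
for key, combo in enumerate(itertools.product([0, 1], repeat=3), start=1):
    cur.execute("INSERT INTO junk_order_flags VALUES (?, ?, ?, ?)", (key, *combo))
```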
Cardinality is a numerical attribute of the relationship between two entities or entity sets.
One-to-One Relationships
One-to-Many Relationships
Many-to-One Relationships
Many-to-Many Relationships
37) Define Critical Success Factor and list its four types
A Critical Success Factor is a favorable result of any activity needed for an organization to reach
its goal.
Industry CSFs
Strategy CSFs
Environmental CSFs
Temporal CSFs
Data mining is a multi-disciplinary skill that uses machine learning, statistics, AI, and database
technology. It is all about discovering unsuspected / previously unknown relationships
amongst the data.
39) What is the difference between Star schema and Snowflake schema?
In a star schema, a single join creates the relationship between the fact table and any dimension
table, whereas a snowflake schema requires many joins to fetch the data.
The star schema offers higher-performing queries using star-join query optimization, and its fact
table is connected directly with multiple dimension tables. The snowflake schema is represented by
a centralized fact table whose dimensions are normalized into further tables, so the fact table is
not directly connected to every dimension.
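To make the difference concrete, here is a hedged sqlite3 sketch: the star version keeps one denormalized product dimension, while the snowflake version normalizes the category into its own table. All table names are invented for illustration.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Star schema: the category is kept denormalized inside the dimension table,
# so a query needs only one join from the fact table to reach it.
cur.execute("CREATE TABLE dim_product_star (product_key INTEGER PRIMARY KEY, "
            "product_name TEXT, category_name TEXT)")

# Snowflake schema: the category is split into its own table, so queries need
# an extra join (fact -> dim_product_snow -> dim_category).
cur.execute("CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, "
            "category_name TEXT)")
cur.execute("CREATE TABLE dim_product_snow (product_key INTEGER PRIMARY KEY, "
            "product_name TEXT, "
            "category_key INTEGER REFERENCES dim_category(category_key))")
```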
An identifying relationship in a DBMS is used to identify a relationship between two entities:
1) a strong entity, and 2) a weak entity that depends on the strong entity for its identification.
A recursive relationship exists when a column in a table is connected to the primary key
of the same table.
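A common example of a recursive relationship is an employee table whose manager_id column refers back to the table's own primary key; the sqlite3 sketch below uses invented names.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# manager_id points back to employee_id in the same table: a recursive
# (self-referencing) relationship.
cur.execute("""
    CREATE TABLE employee (
        employee_id  INTEGER PRIMARY KEY,
        name         TEXT,
        manager_id   INTEGER REFERENCES employee(employee_id)
    )
""")
cur.execute("INSERT INTO employee VALUES (1, 'Asha', NULL)")  # top of the tree
cur.execute("INSERT INTO employee VALUES (2, 'Ben', 1)")      # reports to Asha
```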
This is the process of validating or testing a model that would be used to predict and validate
outcomes. It can be used for machine learning, artificial intelligence, as well as statistics.
44) What is the difference between logical data model and physical data model?
Logical data model: Describes the data independently of how it will be physically implemented. It
contains entities, primary key attributes, inversion keys, alternate keys, rules, business
relations, definitions, etc.
Physical data model: Is responsible for the actual implementation of the data stored in the
database. It helps you create a new database model from an existing one and apply referential
integrity constraints, and it contains tables, columns, key constraints, unique keys, foreign
keys, indexes, default values, etc.
Different types of constraints include unique, not null, foreign key, composite key, and check
constraints, etc.
A data modelling tool is software that helps in constructing data flows and the relationships between
data. Examples of such tools are Borland Together, Altova DatabaseSpy, Casewise, Case Studio
2, etc.
In the hierarchical database model, data is organized in a tree-like structure. Data is stored in a
hierarchical format and represented using parent-child relationships. In a hierarchical DBMS, a
parent may have many children, but each child has only one parent.
It is not flexible as it takes time to adapt to the changing needs of the business.
The structure poses issues for inter-departmental communication, vertical
communication, as well as inter-agency communication.
The hierarchical data model can create problems of data disunity.
The process-driven approach used in data modelling follows a step-by-step method that maps the
relationship between the entity-relationship model and the organizational process.
50) What are the advantages of using data modelling?
It helps you to manage business data by normalizing it and defining its attributes.
Data modelling integrates the data of various systems to reduce data redundancy.
It enables you to create an efficient database design.
Data modelling helps the organization department to function as a team.
It facilitates easy access to data.
Describes data needs for a single project but could integrate with other logical data models
based on the scope of the project.
Designed and developed independently from the DBMS.
Data attributes will have datatypes with exact precisions and length.
Normalization processes are applied to the model, typically up to 3NF.
The physical data model describes the data needs of a single project or application. It may be
integrated with other physical data models based on project scope.
Data model contains relationships between tables that address cardinality and nullability of
the relationships.
Developed for a specific version of a DBMS, location, data storage, or technology to be used
in the project.
Columns should have exact datatypes, lengths assigned, and default values.
Primary and foreign keys, views, indexes, access profiles, and authorizations, etc. are
defined.
Two types of data modelling techniques are: 1) entity-relationship (E-R) Model, and 2) UML
(Unified Modelling Language).
The object-oriented database model is a collection of objects. These objects can have
associated features as well as methods.
This model is built on the hierarchical model. It allows more than one relationship to link
records, which means a record can have multiple parent records. It is possible to construct a set
of parent records and child records, and each record can belong to multiple sets, which enables
you to model complex table relationships.
Hashing is a technique used to search index values and retrieve the desired data. It
helps calculate the direct location of a data record on disk without using an index
structure.
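The idea of computing a record's location directly from a key, instead of walking an index, can be illustrated with a toy hash table in plain Python; real database hashing schemes are considerably more involved.

```python
BUCKETS = 8  # toy number of disk "pages"

def bucket_for(key):
    """Map a key straight to a bucket number; no index traversal needed."""
    return hash(key) % BUCKETS

# Simulated storage: a list of buckets, each holding (key, record) pairs.
pages = [[] for _ in range(BUCKETS)]

def insert(key, record):
    pages[bucket_for(key)].append((key, record))

def lookup(key):
    # Only the one computed bucket is scanned, not the whole table.
    for k, record in pages[bucket_for(key)]:
        if k == key:
            return record
    return None

insert("client-42", {"name": "Acme"})
print(lookup("client-42"))  # {'name': 'Acme'}
```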
A business or natural key is a field that uniquely identifies an entity. For example, client ID,
employee number, email, etc.
61) What is compound key?
When more than one field is used to represent a key, it is referred to as a compound key.
63) What is the difference between primary key and foreign key?
Primary key: Helps you to uniquely identify a record in the table. It never accepts null values.
The primary key is a clustered index, and data in the DBMS table is physically organized in the
sequence of the clustered index. You can have only a single primary key in a table.
Foreign key: A field in the table that is the primary key of another table. It may accept multiple
null values. A foreign key does not automatically create an index, clustered or non-clustered;
however, you can manually create an index on the foreign key. You can have multiple foreign keys
in a table.
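A minimal sqlite3 sketch of the primary key / foreign key relationship described above; department and employee are invented example tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
cur = conn.cursor()

cur.execute("""
    CREATE TABLE department (
        dept_id  INTEGER PRIMARY KEY,   -- primary key: unique, never NULL
        name     TEXT
    )
""")
cur.execute("""
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        dept_id  INTEGER REFERENCES department(dept_id)  -- foreign key, may be NULL
    )
""")
```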
Keys help you to identify any row of data in a table. In a real-world application, a table could
contain thousands of records, and keys ensure that you can still uniquely identify each record.
Keys allow you to establish and identify relationships between tables.
Keys help you to enforce identity and integrity in the relationship.
An artificial key which aims to uniquely identify each record is called a surrogate key. This kind
of key is used when you don't have a natural primary key. Surrogate keys do not lend any meaning
to the data in the table and are usually integers.
Alternate key is a column or group of columns in a table that uniquely identifies every row in
that table. A table can have multiple choices for a primary key, but only one can be set as the
primary key. All the keys which are not primary key are called an Alternate Key.
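The sketch below contrasts a surrogate key with a natural/alternate key: client_id is a meaningless auto-generated integer chosen as the primary key, while email is a candidate key kept unique but not chosen as primary. The table and column names are hypothetical.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

cur.execute("""
    CREATE TABLE client (
        client_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email      TEXT NOT NULL UNIQUE,               -- alternate (candidate) key
        name       TEXT
    )
""")
cur.execute("INSERT INTO client (email, name) VALUES ('a@example.com', 'Ann')")
cur.execute("INSERT INTO client (email, name) VALUES ('b@example.com', 'Bob')")
print(cur.execute("SELECT client_id, email FROM client").fetchall())
# [(1, 'a@example.com'), (2, 'b@example.com')]
```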
Fourth normal form (4NF) is a level of database normalization in which a table must not have any
non-trivial multivalued dependencies other than on a candidate key.
Database management system or DBMS is a software for storing and retrieving user data. It
consists of a group of programs which manipulate the database.
A table is in 5th normal form only if it is in 4th normal form, and it cannot be decomposed into
any number of smaller tables without loss of data.
Normalization is a database design technique that organizes tables in a manner that reduces
redundancy and dependency of data. It divides larger tables into smaller tables and links them
using relationships.
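As a hedged illustration of normalization, the single wide table below is split into two linked tables so the customer's details are stored only once; all names are made up for the example.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# Unnormalized: customer details repeated on every order row.
cur.execute("CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, "
            "customer_name TEXT, customer_city TEXT, amount REAL)")

# Normalized: customer facts live in one place; orders reference them by key.
cur.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, "
            "name TEXT, city TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customer(customer_id), amount REAL)")
```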
73) List some popular database management system (DBMS) software
MySQL
Microsoft Access
Oracle
PostgreSQL
dbase
FoxPro
SQLite
IBM DB2
Microsoft SQL Server.
Relational Database Management System is a software which is used to store data in the form
of tables. In this kind of system, data is managed and stored in rows and columns, which are
known as tuples and attributes, respectively. RDBMS is a powerful data management system and is widely
used across the world.
The main goal of designing a data model is to make sure that the data objects offered by the
functional team are represented accurately.
The data model should be detailed enough to be used for building the physical database.
The information in the data model can be used for defining the relationship between tables,
primary and foreign keys, and stored procedures.
Data Model helps businesses to communicate within and across organizations.
The data model helps to document data mappings in the ETL process.
It helps to recognize the correct sources of data to populate the model.
To develop a data model, one should know the physical characteristics of how the data is stored.
It is a navigational system, which makes application development and management complex;
thus, it requires detailed knowledge of the underlying data structures.
Even small changes made to the structure require modification of the entire application.
There is no set data manipulation language in a DBMS.
The aggregate table contains aggregated data that can be calculated using functions such as: 1)
Average, 2) MAX, 3) Count, 4) SUM, and 5) MIN.
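A small sketch of populating an aggregate table from a detail-level fact table using GROUP BY aggregates (SUM, COUNT, AVG, MIN, MAX); the table names are invented.

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("CREATE TABLE fact_sales (product_key INTEGER, net_amount REAL)")
cur.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 10.0), (1, 20.0), (2, 5.0)])

# Aggregate table: one pre-computed row per product instead of one per sale.
cur.execute("""
    CREATE TABLE agg_sales_by_product AS
    SELECT product_key,
           SUM(net_amount) AS total_amount,
           COUNT(*)        AS sale_count,
           AVG(net_amount) AS avg_amount,
           MIN(net_amount) AS min_amount,
           MAX(net_amount) AS max_amount
    FROM fact_sales
    GROUP BY product_key
""")
print(cur.execute("SELECT * FROM agg_sales_by_product").fetchall())
# e.g. [(1, 30.0, 2, 15.0, 10.0, 20.0), (2, 5.0, 1, 5.0, 5.0, 5.0)]
```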
A conformed dimension is a dimension which is designed in a way that can be used across
many fact tables in various areas of a data warehouse.
There are two types of Hierarchies: 1) Level based hierarchies and 2) Parent-child hierarchies.
82) What is the difference between a data mart and data warehouse?
Data mart: Focuses on a single subject area of the business. It is used to make tactical decisions
for business growth. It follows the bottom-up model, and its data comes from a single data source.
Data warehouse: Focuses on multiple areas of the business. It helps business owners take strategic
decisions. It follows a top-down model, and its data comes from more than one heterogeneous data
source.
XMLA (XML for Analysis) is considered the standard for accessing data in Online Analytical
Processing (OLAP).
A junk dimension helps to store data, such as miscellaneous flags, that is not appropriate to store elsewhere in the schema.
Chained data replication is the situation in which a secondary node selects its target based on
ping time, or when the closest node is another secondary.
A virtual data warehouse gives a collective view of the complete data. A virtual data
warehouse does not have historical data. It is considered a logical data model containing
metadata.
Snapshot is a complete visualization of data at the time when data extraction process begins.
The ability of a system to extract, cleanse, and transfer data in two directions is called a
bi-directional extract.