SAP HANA Cloud - Foundation - Unit 4
The raw data is sent from the database directly to the application. The application then begins to process the data by combining it,
aggregating it, and performing calculations in order to generate meaningful output.
We can find ourselves moving a lot of raw data between the database and the application. Unfortunately, when we move raw data
to the application layer, we make the application code very complex. This is because the code has to deal with the data processing
tasks as well as managing all of the other parts of the application, including process flow control, business logic, user interface (UI)
operations, integrating data from multiple sources, and so on.
With SAP HANA Cloud, this approach is reversed. The application code is simplified, as it does not have to deal with many data
processing tasks. These tasks are pushed down to the SAP HANA Cloud database, where in-memory processing takes place.
The processing on the data is carried out where the data resides, so we do not have to move raw data from the
database to the application. We only move the results of the data processing to the application.
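For illustration, a query like the following pushes the aggregation down to the database, so only the summarized result travels to the application. The SALES table and its columns are hypothetical examples, not objects from this course.

  -- Aggregate in the database; only the small result set is sent to the application.
  SELECT region,
         SUM(net_amount) AS total_revenue
    FROM sales
   GROUP BY region
   ORDER BY total_revenue DESC;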
The two main types of calculation views are Dimension and Cube.
Calculation views are created using a graphical editor in SAP Business Application Studio. To develop a
calculation view, you should be familiar with basic data modeling concepts.
Modeling Dimensions
The purpose of a dimension type of calculation view is to define a group of related attributes. For example, a material dimension
might include attributes such as material description, material color, material weight, and material price. Although this type of
calculation view can be directly consumed by an application, it is most likely to be found as a consumable view in a calculation
view of the type Cube with star schema to provide the cube's dimensions (see later for details).
It might be helpful to think of calculation views of type dimension as central master data views. You define them once and reuse
them many times. To get started with calculation views of type DIMENSION, you need to set the data category to DIMENSION.
You then proceed to define the source tables and joins if they are needed. You define filters, and then identify the output columns
that are to be exposed by the view. It is also possible to define additional derived attributes. An example of this could be a new
column to generate a weight category based on a range of weights. You could use an IF expression to check each weight and
assign a category.
A dimension calculation view can consume other dimension calculation views. For example, you could create a new dimension
called business partners that joins a customer dimension and a supplier dimension.
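To make the derived-attribute example concrete, here is a minimal sketch of the weight-category logic in plain SQL; in the graphical editor you would instead define this as a calculated column using an IF expression. The MATERIALS table and the weight ranges are hypothetical.

  -- Derive a weight category from the raw weight of each material.
  SELECT material_id,
         material_description,
         weight,
         CASE
           WHEN weight < 1   THEN 'LIGHT'
           WHEN weight < 100 THEN 'MEDIUM'
           ELSE 'HEAVY'
         END AS weight_category
    FROM materials;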
Modeling Cubes
The next type of calculation view is the type Cube. This type of calculation view is used to define a data set comprised of attributes
and measures. This means they are ideal for transactional data modeling. For each measure you are able to define its aggregation
behavior so that business users can slice and dice in an ad-hoc manner and all measures are aggregated according to their behavior.
A Cube calculation view can consume other Cube calculation views or dimension calculation views. A Cube calculation view is a
relational model, not a dimensional model.
Applications can directly access this type of calculation view, and it can also be queried using SQL statements.
To create a Cube calculation view, you set the data category to CUBE. You then select the table, or tables, to be included in the model. Typically, you choose a transaction table so that you
have columns from which you can define attributes and measures. It is possible to include more than one transaction table. For
example, you may need to include a header and a line item table to form the complete picture of a sales transaction. In this case,
you simply join the tables. You can also merge transaction tables by implementing a union. Then, select the columns from the
tables that are to be exposed. You can optionally set filters and define additional calculated columns. Finally, rename any columns
to provide meaningful names to the business user.
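The SQL below sketches the equivalent logic of such a cube: a hypothetical sales header table joined to its line items, with the measures aggregated and the columns renamed for the business user. In practice this would be modeled graphically as a Cube calculation view rather than written by hand.

  -- Join header and line item tables, expose attributes, and aggregate the measures.
  SELECT h.order_id        AS "Order Number",
         h.customer_id     AS "Customer",
         h.order_date      AS "Order Date",
         SUM(i.quantity)   AS "Total Quantity",
         SUM(i.net_amount) AS "Net Revenue"
    FROM sales_header AS h
    JOIN sales_item   AS i
      ON i.order_id = h.order_id
   GROUP BY h.order_id, h.customer_id, h.order_date;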
With Business Application Studio you create all data modeling artifacts including:
Calculation Views
Procedures
Table and Scalar Functions
Flowgraphs
Analytic Privileges
Replication Tasks
The artifacts are created using either graphical editors or text editors. You are able to swap between the types of editors. The
artifacts are stored in simple text files which makes them very portable. Apart from SQL, the format of the files is usually JSON or
XML.
For version control of the database artifact files, the open-source Git framework is used. Git is fully integrated in Business
Application Studio. This means as you develop artifacts you can commit them to the team repository. You can also go back to older
versions of any artifact. Git allows developers to clone a project and work in parallel with other developers on sections of the project
without affecting others. When development is complete, the artifacts from different branches of the project can be merged together
using Git.
Level 1 (access role): access to the objects we have developed in the data model, that is, the table functions, tables, functions, and
other objects used inside a calculation view (CV). Access to these objects is granted through database access permissions, which
means we have to grant SELECT on them. If this privilege is granted to a role, it applies to every user assigned to that role. At this
level we can access the database objects used inside the CV, but we still cannot see the data.
Level 2 (analytic privileges): you now need data access at the row level (authorization is required to view or access data at the row
and/or column level). Access to the data is granted through an analytic privilege. It is sufficient to assign the analytic privilege to a
user or to a role.
Analytic privilege = an object created to describe, for each CV, which data a role is allowed to access (these are role-like objects
created to control data access per calculation view). For example, a privilege could be created to allow viewing only the data of
company code 1000, or only the 2017-2018 data of company code 1000. A user can be assigned multiple analytic privileges.
Level 3 (masking role): some users can view the data in a given row or column while other users cannot, because masking has been
applied for them.
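As a minimal sketch of the first two levels, the statements below grant SELECT on the underlying objects to a role and then restrict the visible rows with a SQL-based analytic privilege (structured privilege). All schema, view, role, and user names are hypothetical, and in an HDI-based project these would normally be defined as design-time artifacts rather than issued directly.

  -- Level 1: object access - allow the role to read the objects behind the calculation view.
  CREATE ROLE sales_reader;
  GRANT SELECT ON SCHEMA sales_schema TO sales_reader;
  GRANT sales_reader TO report_user;

  -- Level 2: data access - restrict the visible rows to company code 1000.
  -- Assumption: the view is set to be checked against SQL analytic privileges.
  CREATE STRUCTURED PRIVILEGE cc1000_only
    FOR SELECT ON sales_schema.sales_cube
    WHERE company_code = '1000';
  GRANT STRUCTURED PRIVILEGE cc1000_only TO report_user;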
Distance — what is the longest distance a high-value customer has to travel to reach my sales outlet?
SAP HANA Spatial also provides algorithms that can determine clusters. This helps an organization to locate precise locations that
might be lucrative based on income data and other interesting attributes associated with consumers.
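A minimal sketch of a distance query of this kind, assuming hypothetical CUSTOMERS and OUTLETS tables whose LOCATION columns hold spatial points (with a geographic spatial reference system, ST_Distance returns the distance in meters):

  -- For each customer, find the distance to the nearest sales outlet,
  -- then report the customer who has to travel the farthest.
  SELECT c.customer_id,
         MIN(c.location.ST_Distance(o.location)) AS nearest_outlet_m
    FROM customers c
   CROSS JOIN outlets o
   GROUP BY c.customer_id
   ORDER BY nearest_outlet_m DESC
   LIMIT 1;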
Predictive Analysis
Sampling — select a few records from large data sets (for example, we need 1000 people from each country).
Binning — grouping records into basic categories (for example, age ranges); see the sketch after this list.
Partitioning — creating sets of data for training, testing, and validation used to train models and check their
predictive accuracy.
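As a minimal illustration of binning, the query below groups a hypothetical CUSTOMERS table into age ranges using a plain CASE expression; PAL also offers dedicated binning procedures, whose exact names and parameters are not shown here.

  -- Assign each customer to an age bin and count the members of each bin.
  SELECT age_bin,
         COUNT(*) AS customer_count
    FROM ( SELECT CASE
                    WHEN age < 20 THEN '< 20'
                    WHEN age < 40 THEN '20-39'
                    WHEN age < 60 THEN '40-59'
                    ELSE '60+'
                  END AS age_bin
             FROM customers )
   GROUP BY age_bin;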
The majority of the algorithms are used for scoring or predictive modeling. There are many algorithms provided for all
major data mining categories including:
Association
Classification
Clustering
Regression
Time Series
Neural Networks
PAL algorithms can be called directly from procedures in SQLScript, or they can be integrated into an SAP HANA
flowgraph, which is built using a graphical editor in the Web IDE. A flowgraph defines the data inputs, data processing,
outputs, and parameters used in the predictive model.
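To give a feel for the SQLScript calling pattern, here is a sketch of a direct PAL call. The table names, the parameter values, and in particular the exact procedure name and output parameter list are assumptions for illustration; they differ per algorithm and release, so check the PAL reference for the algorithm you actually use.

  -- Parameter table in the standard PAL layout (name, int, double, string).
  CREATE LOCAL TEMPORARY COLUMN TABLE #pal_params (
    param_name   NVARCHAR(256),
    int_value    INTEGER,
    double_value DOUBLE,
    string_value NVARCHAR(1000)
  );
  INSERT INTO #pal_params VALUES ('GROUP_NUMBER', 3, NULL, NULL);  -- e.g. number of clusters

  -- Direct call of a PAL procedure; the name and output list here are illustrative only.
  CALL _SYS_AFL.PAL_KMEANS (customer_features, #pal_params, ?, ?);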
Using PAL requires knowledge of statistical methods and data mining techniques. This is because the choice of which algorithm to
use must be made by the developer. So it is important that the developer initially understands the differences between the
algorithms used for data preparation, scoring, and prediction. They must also know how to fine-tune the algorithms to reach the
desired outcome. For example, a developer would need to decide when to use the time series algorithm for double exponential
smoothing versus triple exponential smoothing and then how to adjust the parameters to consider trend or to eliminate outliers.
Developers who work with SAP HANA PAL are typically already working in predictive analysis projects or have a reasonable
understanding of the topic.
Graph Modeling
Graphs are used to model data that is best represented using a network. Examples include supply chain networks, transportation
networks, utility networks, and social networks. The basic idea behind graph modeling is that it allows a modeler to define a series
of entities (nodes) and link them with lines (edges) to create a network. This network represents how each node relates to all other
nodes.
Graph models can also indicate the direction of flow between entities, and any number of attributes can be added to the nodes or to
the edges that connect them. This means that additional meaning can be added to the network, and queries can be executed to ask
questions relating to the network.
Imagine a complex supply chain mapped using a graph, where all manufacturers, suppliers, distributors, customers, and consumers
are represented with information stored along the connections. The benefit to this form of modeling is that it makes it easy to
develop applications that can traverse huge graphs at speed. As a result you can ask questions such as the following:
How many hours has the product traveled between two specified points in the network?
Describe the entire journey of a product by listing all of the stop-off points in the path.
Graph processing allows you to discover hidden patterns and relationships in your network data, all in real time.
SAP HANA Graph provides tools for graph definition and an additional language for graph querying, so that model development is
more natural and simplified. Processing is flexible and, of course, optimized for in-memory execution using a dedicated graph
engine right inside the database.
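As a minimal sketch of graph definition, the statement below declares a graph workspace over two hypothetical tables: SUPPLY_NODES (the entities) and SUPPLY_EDGES (the connections, each referencing a source and a target node). Once the workspace exists, it can be queried with the graph query capabilities, for example from GraphScript procedures or the built-in graph algorithms, which are not shown here.

  -- Define a graph workspace on top of existing node and edge tables.
  CREATE GRAPH WORKSPACE supply_chain_graph
    EDGE TABLE supply_edges
      SOURCE COLUMN from_node
      TARGET COLUMN to_node
      KEY COLUMN edge_id
    VERTEX TABLE supply_nodes
      KEY COLUMN node_id;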