Bank Fraud Detection Project
CHAPTER 1
1. INTRODUCTION
In banking, fraud can involve using stolen credit cards, forging checks,
misleading accounting practices, and so on. In insurance, 25% of claims contain some
form of fraud, accounting for approximately 10% of insurance payout dollars. Fraud
can range from exaggerated losses to deliberately causing an accident for the
payout. With so many different methods of fraud, detecting it becomes ever harder.
Transaction banking can be defined as the set of instruments and services that a
bank offers to its customers and their partners to financially support their
reciprocal exchange of cash. In the banking sector, millions of transactions occur
annually. Most of these transactions are legitimate, but some are fraudulent or
criminal attempts. Because of such transactions, banks lose the trust of their
customers, and customers feel that their money is in bad hands. Since the data at
hand are multidimensional, financial fraud detection is a complex task. Bank
transaction data are multivariate and time-variant. Due to this complexity, the data
model must be clearly understood by banking professionals in order to stop fraudulent
transaction events such as money laundering and unauthorized transactions.
In many application scenarios, heterogeneous data sources need to be
integrated before visual or automatic analysis methods can be applied. Therefore,
the first step is often to preprocess and transform the data to derive different
representations for further exploration (as indicated by the Transformation arrow in
the figure). Other typical preprocessing tasks include data cleaning,
normalization, grouping, or integration of heterogeneous data sources.
After the transformation, the analyst may choose between applying visual or
automatic analysis methods. If an automated analysis is used first, data mining
methods are applied to generate models of the original data. Once a model is created
the analyst has to evaluate and refine the models, which can best be done by
interacting with the data. Visualizations allow the analysts to interact with the
automatic methods by modifying parameters or selecting other analysis algorithms.
Model visualization can then be used to evaluate the findings of the generated
models. Alternating between visual and automatic methods is characteristic of the
Visual Analytics process and leads to a continuous refinement and verification of
preliminary results. Misleading results in an intermediate step can thus be discovered
at an early stage, leading to better results and a higher confidence.
If a visual data exploration is performed first, the user has to confirm the
generated hypotheses by an automated analysis. User interaction with the
visualization is needed to reveal insightful information, for instance by zooming in
on different data areas or by considering different visual views on the data. Findings
in the visualizations can be used to steer model building in the automatic analysis.
In summary, in the Visual Analytics Process knowledge can be gained from
visualization, automatic analysis, as well as the preceding interactions between
visualizations, models, and the human analysts.
Figure 1.2 Visual Analytics (VA)
1.1.2 Analysis and Modification
The existing system is not time-variant. As the number of transactions in banks is
increasing tremendously, a fraud detection system is needed. Due to the various
crimes happening in the banking sector, fraud must be stopped or detected before it
happens. Nowadays, data theft in the banking sector occurs at frequent intervals and
many loan application frauds take place. To reduce this, we can apply a VA approach
to customer transactions.
1.1.3 Establish Project Statement
Millions of transactions happen daily in a bank, and some people make fraudulent
transactions. This system detects fraudulent transactions and customers and
visualizes them based on the previous bank transactions. By considering the
transactions of a group of bank customers, a new system is implemented using the
decision tree algorithm. The main part is designed using R programming.
1.2 LITERATURE REVIEW
There are a number of surveys that focus on fraud detection. In 2002,
Bolton and Hand [32] published a review about fraud detection approaches. They
described the available tools for statistical fraud detection and identified the most
used technologies in four areas: credit card
fraud, money laundering, telecommunication fraud, and computer intrusion. Kou et
al. [20] presented a survey of techniques for identifying the
same types of fraud as described in [32]. The different approaches are
broadly classified into two categories: misuse and anomaly detection.
Both categories present techniques such as: outlier detection, neural
networks, expert systems, model-based reasoning, data mining, state
transition analysis, and information visualization. These works helped
us to understand diverse fraud domains and how they are normally
tackled.
A visual analytics based approach to financial data flow analysis is presented in [34]. In this
approach, data are aggregated in order to allow users to draw analytical conclusions
and make transaction decisions. EventFlow [28] was designed to facilitate analysis,
query, and data transformation of temporal event datasets. The goal of this work is
to create aggregated data representations to track entities and the events related to
them.
When looking at approaches for event monitoring in general, Huang et al. [10]
presented a VA framework for stock market security. In order to reduce the
number of false alarms produced by traditional AI techniques, this work presents a
visualization approach combining a 3D tree map for market performance analysis
and a node-link diagram for network analysis.
Dilla et al. [6] presented the current needs in FFD and proposed a theoretical
framework to predict when and how investigators should apply VA techniques. They
evaluated various visualization techniques and derived which visualizations support
different cognitive processes. In addition, the authors suggest future challenges in
this research area and discuss the efficacy of interactive data visualization for
fraud detection, which we used as a starting point for our approach.
Carminati et al. [3] presented a semi-supervised online banking fraud
analysis and decision support based on profile generation and analysis.
While this approach provides no visual support for fraud analysis, it is
directly related to our approach since we are also focusing on profile
analysis. However, we believe that VA methods have great potential
to foster the investigation of the data and enable the analyst to better
fine-tune the scoring system. In the health domain, Rind et al. [33] conducted a
survey study focusing on information visualization systems for exploring and
querying electronic health records.
Moreover, Wagner et al. [36] presented a systematic overview and categorization of
malware visualization systems from a VA perspective. The domains of both studies are
similar to FFD, since they both involve multivariate and temporal aspects. However,
the FFD domain demands special consideration due to the complexity of the tasks
involved.
1.3 OBJECTIVE
The main objective of this project is to evaluate and visualize fraudulent
transactions by considering previous customer transaction data. To attain this, the
decision tree and random forest algorithms are used. The work is done using R
programming, with Tableau for visualizing fraudulent transactions, and the system can
suggest whether a transaction is liable (suspicious) or not.
CHAPTER 2
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
Financial institutions handle millions of transactions from clients per year.
Although the majority of these transactions are legitimate, a small number of them
are criminal attempts, which may cause serious harm to customers or to the financial
institutions themselves. Thus, the trustability of each transaction has to be
assessed by the institution. However, due to the complex and multidimensional data
at hand, financial fraud detection (FFD) is a difficult task.
2.1.1 Drawbacks
• No time-oriented analysis
• Large volumes of data to handle
• Difficult to identify and classify frauds.
2.2 PROPOSED SYSTEM
There are millions of transactions happening in a bank, and bankers face difficulties
in identifying fraud. The proposed system detects fraudulent transactions and
visualizes them based on the previous transactions of customers.
2.2.1 Advantages
• Detect fraudulent transactions.
• Identification and Classification of frauds.
2.3 FEASIBILITY STUDY
A feasibility study is a high-level capsule version of the entire system analysis
and design process. In this phase, the feasibility of the proposed system is analyzed
by clarifying the problem definition. Feasibility is determined by whether the system
is worth doing and is not a burden to the company. Once the problem definition has
been approved, a logical model of the system can be developed. The search for
alternative solutions has to be analyzed carefully.
Three major considerations involved in the feasibility study are,
1. Economical Feasibility
2. Operational Feasibility
3. Technical Feasibility
2.3.1 Economical Feasibility
Economic feasibility is carried out to check the economic impact that the system
will have on the organization. It weighs the cost of developing and implementing the
new system against the benefits that should be obtained by having the new system in
place.
A simple economic analysis that gives an actual comparison of costs and benefits is
much more meaningful in this case. In addition, it proves useful as the project
progresses. The study conducted to determine economic feasibility showed that the
developed system is within the budget. Some of the benefits include the absence of a
database and independence from a restricted background, which make the system easy
to work with.
2.3.2 Operational Feasibility
The operational feasibility study is carried out to check whether the system provides
a product apt to the requirements. The proposed system is beneficial only if it can
be turned into an information system that meets the organization's operating
requirements. This test of feasibility asks whether the system will work when it is
developed and installed. Factors affecting the operational feasibility of the project
include sufficient support for the project from the users; the developed system is
well accepted by the users, who were involved in the planning and development of the
project.
2.3.3 Technical Feasibility
Determining the technical feasibility is the trickiest part of the feasibility
study, because at this stage there is not a very detailed study of the design of the
system, only of its implementation. Its focus is mainly on the technical requirements
of the system; hence it should be noted that there are no high demands on the
available technical resources.
The different technologies involved should be analyzed properly before the
commencement of a project. One has to be very clear about the technologies required
for the development of the new system. The developed system must have modest
requirements, such that only minimal or no changes are required for implementing the
system.
CHAPTER 3
3. SYSTEM SPECIFICATION
3.1 HARDWARE SPECIFICATION
System: Pentium IV, 2.4 GHz
Hard Disk: 40 GB
Monitor: 15-inch VGA color
RAM: 4 GB
CHAPTER 4
4. SOFTWARE DESCRIPTION
4.1 FRONT END
4.1.1 Introduction to R Studio
RStudio is a free and open-source integrated development environment
(IDE) for R, a programming language for statistical computing and graphics.
RStudio was founded by JJ Allaire, creator of the programming language
ColdFusion. Hadley Wickham is the Chief Scientist at RStudio.
RStudio is available in two editions: RStudio Desktop, where the program is
run locally as a regular desktop application; and RStudio Server, which allows
accessing RStudio using a web browser while it is running on a remote Linux server.
Prepackaged distributions of RStudio Desktop are available for Windows, macOS,
and Linux.
RStudio is available in open source and commercial editions and runs on the
desktop (Windows, macOS, and Linux) or in a browser connected to RStudio Server
or RStudio Server Pro (Debian, Ubuntu, Red Hat Linux, CentOS, openSUSE and
SLES).
RStudio is written in the C++ programming language and uses the Qt
framework for its graphical user interface. Work on RStudio started at around
December 2010, and the first public beta version (v0.92) was officially announced
in February 2011. Version 1.0 was released on 1 November 2016. Version 1.1 was
released on 9 October 2017.
Figure 4.1 R Studio
4.1.2 Introduction to Tableau
Tableau is a Business Intelligence tool for visually analyzing data. Users can create
and distribute interactive and shareable dashboards, which depict the trends,
variations, and density of the data in the form of graphs and charts. Tableau can
connect to files, relational databases, and Big Data sources to acquire and process
data. The software allows data blending and real-time collaboration, which makes it
very unique. It is used by businesses, academic researchers, and many government
organizations for visual data analysis. It is also positioned as a Leader in the
Gartner Magic Quadrant for Business Intelligence and Analytics Platforms. Using
Rserve, we can run R scripts in Tableau.
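As a minimal sketch of this integration (assuming default Rserve settings; the exact Tableau connection dialog depends on the Tableau version), the following R commands start an Rserve instance that Tableau can connect to, after which SCRIPT_REAL, SCRIPT_INT, or SCRIPT_STR calculated fields in Tableau forward the selected data to R:

# Minimal sketch: start Rserve so that Tableau's external service
# connection can send R scripts to this machine.
install.packages("Rserve")   # one-time installation
library(Rserve)
Rserve()                     # starts the Rserve daemon on the default port 6311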
CHAPTER 5
5. PROJECT DESCRIPTION
5.1 PROBLEM DEFINITION
Financial institutions handle millions of transactions from clients per year.
Although the majority of these transactions are legitimate, a small number of them
are criminal attempts, which may cause serious harm to customers or to the financial
institutions themselves. Thus, the trustability of each transaction has to be
assessed by the institution. However, due to the complex and multidimensional data
at hand, financial fraud detection (FFD) is a difficult task.
5.2 OVERVIEW OF THE PROJECT
We developed a prototype using a visual analytics approach in order to enhance FFD
techniques. We used R for the automated analysis and Tableau for the data
visualization.
5.3 MODULES
• Data collection and pre-processing
• Data transformation
• Predictive analysis
• Evaluation and visualization
5.4 MODULE DESCRIPTION
5.4.1 Data Collection and Pre-Processing
The private bank data contains 42 variables, including the key variables "Deposit",
"Withdrawal", "Profession", and "Gender", which relate to the three main pieces of
information:
• Balance
• Withdrawal
• Customer information
The data are collected and pre-processed (unwanted attributes are removed and
missing values are filled). The source dataset contains a header and double quotes,
so the first task is to remove the header and the double quotes.
We split the data into month-wise transactions with parameters such as withdrawal,
balance, deposits, and funds transferred (transactions).
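A minimal R sketch of this pre-processing step is given below; the dropped columns and the zero-imputation are illustrative assumptions, while the file path and column names follow the appendix listing.

# read.csv() consumes the header line as column names and strips the
# surrounding double quotes from every field while parsing.
dataset <- read.csv("C:/Users/silamparasan/Desktop/dataset.csv",
                    stringsAsFactors = FALSE)

# Remove attributes that are not needed for mining (illustrative choice).
dataset$Name    <- NULL
dataset$Surname <- NULL

# Fill missing numeric values with 0 (one simple imputation choice).
num_cols <- sapply(dataset, is.numeric)
dataset[num_cols] <- lapply(dataset[num_cols],
                            function(x) ifelse(is.na(x), 0, x))

# Split into month-wise slices, e.g. the January transactions
# (column names as printed by colnames(dataset) in the appendix).
january <- dataset[, c("Account.nmuber", "X01.Jan.18", "Withdraw..1.", "Balance.1.")]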
5.4.2 Data Transformation
Data transformation is the process of transforming the data into appropriate
forms for mining process. In this process the data is consolidated so that the resulting
mining process may be more efficient and the patterns found may be easier to
understand. It includes the tasks like Normalization, Attribute Construction,
Aggregation, Attribute subset selection, Discretization, Generalization.
Normalization- the attribute data are scaled so as to fall within a small specified
range, such as -2.0 to 2.0 or 0.0 to 2.0.
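A minimal sketch of such min-max scaling, assuming the Withdraw..1. column from the appendix listing and the 0.0 to 2.0 target range mentioned above:

# Min-max normalization: rescale a numeric attribute into [new_min, new_max].
normalize <- function(x, new_min = 0, new_max = 2) {
  rng <- range(x, na.rm = TRUE)
  new_min + (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min)
}

# Example: scale the January withdrawals into 0.0 .. 2.0.
dataset$Withdraw..1. <- normalize(dataset$Withdraw..1.)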
If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with anomalies is
next to impossible.
Update anomalies:
If data items are scattered and are not linked to each other properly, then it
could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly
while a few others are left with old values. Such instances leave the database in an
inconsistent state.
Deletion anomalies:
We try to delete a record, but parts of it are left undeleted because, without our
knowledge, the data is also saved somewhere else.
Insert anomalies:
We try to insert data into a record that does not exist at all. Normalization is a
method to remove all these anomalies and bring the database to a consistent state.
Attribute construction- In attribute construction, new attributes are constructed
and added from the given set of attributes to help the mining process.
Aggregation- Aggregation is an operation applied to the data. It helps to compute or
combine the previous data.
Discretization- Discretization helps to divide the range of a continuous attribute
into intervals.
Generalization- Generalization is the process of replacing the low level or
primitive data with high level concepts through the use of concept hierarchies.
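To make the aggregation and discretization steps concrete, here is a minimal R sketch; the grep pattern and the three withdrawal levels are illustrative assumptions based on the monthly Withdraw columns listed in the appendix:

# Aggregation: total yearly withdrawal per customer from the monthly columns.
withdraw_cols <- grep("^Withdraw", names(dataset), value = TRUE)
dataset$Total.Withdrawal <- rowSums(dataset[, withdraw_cols], na.rm = TRUE)

# Discretization: bin the continuous total into a small number of intervals.
dataset$Withdrawal.Level <- cut(dataset$Total.Withdrawal,
                                breaks = 3,
                                labels = c("low", "medium", "high"))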
5.4.3 Predictive Model Building
Decision trees are powerful algorithms for both prediction and classification. A
decision tree represents a set of rules that can be understood by humans. It is a
classifier in the form of a tree structure, consisting of the following elements (a
minimal R sketch follows the list):
• Decision node- specifies a test on a single attribute.
• Leaf node- indicates the target attribute value.
• Edge- a split on one attribute.
• Path- a disjunction of tests that leads to the final decision.
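A condensed, commented sketch of this model-building step, mirroring the appendix code (the loan column as target and an approximately 70/30 split are taken from there):

library(rpart)

set.seed(3)
# Split the customer transaction data into training and test sets (about 70/30).
id            <- sample(2, nrow(dataset), prob = c(0.7, 0.3), replace = TRUE)
dataset_train <- dataset[id == 1, ]
dataset_test  <- dataset[id == 2, ]

# Grow the tree: each decision node tests a single attribute,
# each leaf predicts the target class (the "loan" flag, as in the appendix).
dataset_model <- rpart(loan ~ ., data = dataset_train, method = "class")

# Inspect the rules and plot the tree.
print(dataset_model)
plot(dataset_model, margin = 0.1)
text(dataset_model, use.n = TRUE, cex = 0.8)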
5.4.4 Evaluation and Visualization
After analyzing the transactions of the customers for the past one year, a visual
representation is made showing which transactions are fraudulent, the types of fraud,
and which customers are fraudulent. This is done using R programming.
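A minimal evaluation and plotting sketch, assuming the model and test set from the sketch in the previous module; the confusionMatrix() call follows the appendix, while the barplot is only a simple stand-in for the richer Tableau views:

library(caret)

# Score the held-out customers with the fitted tree.
pred_data <- predict(dataset_model, newdata = dataset_test, type = "class")

# Compare predictions with the known labels.
print(confusionMatrix(table(pred_data, dataset_test$loan)))

# Quick visual summary of flagged vs. legitimate customers.
barplot(table(pred_data),
        main = "Predicted classes on the test set",
        ylab = "Number of customers")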
Figure 5.1 Visualization of data
5.5 DATAFLOW DIAGRAM
5.6 SYSTEM DESIGN
5.6.1 Use case Diagram
5.6.2 System Architecture
5.7 INPUT DESIGN
The input design consists of CSV files with 42 attributes. Each CSV file has 500
records covering the past one year. The input of the project is spreadsheet content
in the English language, where complete information about the bank transaction
dataset is given. The data are then transformed, and only the needed data are taken
for evaluation.
5.8 OUTPUT DESIGN
Output generally refers to the evaluation and visualization of the fraudulent
transactions. In this project, the output is a graphical representation of the
fraudulent transaction data and of which customers are making liable (suspicious)
transactions.
CHAPTER 6
6. SYSTEM TESTING
6.1 Architecture Testing
In this stage, the tester verifies how fast the system can consume data from
various data sources. Testing involves identifying the number of messages that the
queue can process in a given time frame. It also includes how quickly data can be
inserted into the underlying data store, for example the insertion rate into a
MongoDB or Cassandra database.
It involves verifying the speed with which the queries or MapReduce jobs are
executed. It also includes testing the data processing in isolation when the
underlying data store is populated with the data sets, for example by running
MapReduce jobs on the underlying HDFS.
Parallel testing means testing multiple applications or subcomponents of one
application concurrently to reduce the test time. In parallel testing, the tester
runs two different versions of the software concurrently with the same input. The
aim is to find out whether the legacy system and the new system behave the same or
differently. It ensures that the new system is capable enough to run the software
efficiently.
Parallel testing is done to make sure the new version of the application performs
correctly, to make sure the results are consistent between the new and old versions,
to check whether the data format between the two versions has changed, and to check
the integrity of the new application.
In such cases, testers need to do parallel testing in order to evaluate whether the
data migration was done successfully and to check that the changes in the new version
do not affect the system's functions. The tester must verify that changes are
executed properly and that the user is getting the desired output as per the
requirements. Parallel testing has two levels of criteria:
Parallel test entry criteria define the tasks that must be satisfied before parallel
testing can be efficiently executed.
Parallel test exit criteria define the successful conclusion of the parallel testing
stage.
6.5 Loop Testing
Loop testing is a white box testing technique. It is used to test the following kinds
of loops in a program:
• Simple loop
• Nested loop
• Concatenated loop
• Unstructured loop
CHAPTER 7
7. SYSTEM IMPLEMENTATION
System implementation is the stage of the project where the theoretical design is
turned into a working system. Thus it can be considered the most critical stage in
achieving a successful new system and in giving the user confidence that the new
system will work properly and effectively.
7.1 DATA COLLECTION AND PRE-PROCESSING
The private bank data contains 42 variables, including the key variables "Deposit",
"Withdrawal", "Profession", and "Gender", which relate to the three main pieces of
information:
• Balance
• Withdrawal
• Customer information
The data are collected and pre-processed (unwanted attributes are removed and missing
values are filled). The source dataset contains a header and double quotes, so the
first task is to remove the header and the double quotes.
We split the data into month-wise transactions with parameters such as withdrawal,
balance, deposits, and funds transferred (transactions).
7.2 DATA TRANSFORMATION
Data transformation is the process of transforming the data into appropriate
forms for mining process. In this process the data is consolidated so that the resulting
mining process may be more efficient and the patterns found may be easier to
understand. It includes the tasks like Normalization, Attribute Construction,
Aggregation, Attribute subset selection, Discretization, Generalization.
Normalization- the attribute data are scaled so as to fall within a small specified
range, such as -2.0 to 2.0 or 0.0 to 2.0.
If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with anomalies is
next to impossible.
Update anomalies: If data items are scattered and are not linked to each other
properly, then it could lead to strange situations. For example, when we try to update
one data item having its copies scattered over several places, a few instances get
updated properly while a few others are left with old values. Such instances leave
the database in an inconsistent state.
Deletion anomalies: We try to delete a record, but parts of it are left undeleted
because, without our knowledge, the data is also saved somewhere else.
Insert anomalies: We try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a
consistent state.
7.3 PREDICTIVE MODEL BUILDING
Decision trees are powerful algorithms for both prediction and classification. A
decision tree represents a set of rules that can be understood by humans. It is a
classifier in the form of a tree structure, consisting of the following elements:
• Decision node- specifies a test on a single attribute.
• Leaf node- indicates the target attribute value.
• Edge- a split on one attribute.
• Path- a disjunction of tests that leads to the final decision.
Figure 7.1 Decision tree
7.4 EVALUATION AND VISUALIZATION
After analyzing the transactions of a customer for the past one year, a visual
representation is made showing which transactions are fraudulent, the types of fraud,
and which customers are fraudulent. This is done using R programming.
CHAPTER 8
8. CONCLUSION AND FUTURE ENHANCEMENTS
8.1 CONCLUSION
This concludes our contributions to the bank FFD system and the identification of
drawbacks during the predictive model building and evaluation process. This project
aims to detect fraudulent transactions in a bank customer transaction dataset.
Millions of transactions occur in a day, and the task is to find which of them are
fraudulent and who the fraudster is. The project meets these objectives by
considering algorithms such as the decision tree and random forest algorithms.
APPENDIX 1
RANDOM FOREST:
> dataset <- read.csv("C:/Users/silamparasan/Desktop/dataset.csv")
> View(dataset)
> set.seed(2)
> id<-sample(2,nrow(dataset),prob = c(0.7,0.3),replace = TRUE)
> dataset_train<-dataset[id==1,]
> dataset_test<-dataset[id==2,]
> install.packages("randomForest")
> library(randomForest)
> dataset$Balance.12.<-as.factor(dataset$Balance.12.)
> dataset_train$Balance.12.<-as.factor(dataset_train$Balance.12.)
> dataset_test$Balance.12.<-as.factor(dataset_test$Balance.12.)
> bestmtry<-tuneRF(dataset_train,dataset_train$Balance.12.,stepFactor = 1.2,improve = 0.01,trace = T,plot = T)
mtry = 3 OOB error = 0%
Searching left ...
Searching right ...
> library(randomForest)
> data_for<-randomForest(Balance.12.~.,data=dataset_train)
> data_for
Call:
randomForest(formula = Balance.12. ~ ., data = dataset_train)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 26.5%
Confusion matrix:
NO YES class.error
NO 294 54 0.1551724
YES 87 97 0.4728261
> importance(data_for)
> varImpPlot(data_for)
> pred1_data<-predict(data_for,newdata = dataset_test,type = "class")
> pred1_data
> library(caret)
> confusionMatrix(table(pred1_data,dataset_test$Balance.12.))
Confusion Matrix and Statistics
pred1_data NO YES
NO 25 0
YES 0 43
Accuracy : 0.9855
95% CI : (0.9931, 1)
No Information Rate : 0.6541
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 1
Mcnemar's Test P-Value : NA
Sensitivity : 1.0000
Specificity : 1.0000
Pos Pred Value : 1.0000
Neg Pred Value : 1.0000
Prevalence : 0.6541
Detection Rate : 0.6541
Detection Prevalence : 0.6541
Balanced Accuracy : 1.00
'Positive' Class : NO
DECISION TREE:
> dataset <- read.csv("C:/Users/silamparasan/Desktop/dataset.csv")
> View(dataset)
> set.seed(3)
> id<-sample(2,nrow(dataset),prob = c(0.7,0.3),replace = TRUE)
> dataset_train<-dataset[id==1,]
> dataset_test<-dataset[id==2,]
> library(rpart)
> nrow(dataset)
[1] 100
> nrow(dataset_test)
[1] 69
> nrow(dataset_train)
[1] 69
> colnames(dataset)
[1] "Account.nmuber" "Name" "Surname" "Gender"
[5] "Job.Classification" "Opening.Balance" "X01.Jan.18" "Withdraw..1."
[9] "Balance.1." "X01.Feb.18" "Withdraw.2." "Balance.2."
[13] "X01.Mar.18" "Withdraw.3." "Balance.3." "X01.Apr.18"
[17] "Withdraw.4." "Balance.4." "X01.May.18" "Withdraw.5."
[21] "Balance.5." "X01.Jun.18" "Withdraw.6." "Balance.6."
[25] "X01.Jul.18" "Withdraw.7." "Balance.7." "X01.Aug.18"
[29] "Withdraw.8." "Balance.8." "X01.Sep.18" "Withdraw.9."
[33] "Balance.9." "X01.Oct.18" "Withdraw.10." "Balance.10."
[37] "X01.Nov.18" "Withdraw.11." "Balance.11." "X01.Dec.18"
[41] "Withdraw.12." "Balance.12." "loan"
> dataset_model<-rpart(loan~.,data=dataset_train)
> plot(dataset_model,margin = 0.1)
> text(dataset_model,use.n = TRUE)
Warning message:
In labels.rpart(x, minlength = minlength) :
more than 52 levels in a predicting factor, truncated for printout
> text(dataset_model,use.n = TRUE,pretty = TRUE,cex = 0.8)
> library(caret)
Loading required package: lattice
Loading required package: ggplot2
Warning message:
package ‘ggplot2’ was built under R version 3.4.4
> library(lattice)
> pred_data<-predict(dataset_model,newdata = dataset_test,type = "class")
> pred_data
1 3 4 5 6 7 8 9 10 11 12 13 14 17 20 21 22 23 24 25 27 29 31
yes yes yes no no no yes yes yes yes yes yes yes yes no no no no no no no no no
32 33 34 35 36 38 39 40 41 43 44 45 46 47 48 49 51 52 57 58 59 60 62
yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes no no no no no no yes
64 66 68 69 70 75 76 77 78 82 83 84 87 88 89 91 92 93 96 97 98 99 100
yes yes no yes no yes yes yes yes no no no no no no yes yes yes yes yes yes yes yes
Levels: no yes
> library(caret)
> confusionMatrix(table(pred_data,dataset_test$loan))
Confusion Matrix and Statistics
pred_data no yes
no 25 1
yes 0 43
Accuracy : 0.9855
95% CI : (0.9219, 0.9996)
No Information Rate : 0.6377
P-Value [Acc > NIR] : 1.324e-12
Kappa : 0.9689
Mcnemar's Test P-Value : 1
Sensitivity : 1.0000
Specificity : 0.9773
Pos Pred Value : 0.9615
Neg Pred Value : 1.0000
Prevalence : 0.3623
Detection Rate : 0.3623
Detection Prevalence : 0.3768
Balanced Accuracy : 0.9886
'Positive' Class : no
APPENDIX 2
SCREENSHOTS
DATASET
R STUDIO DESKTOP
Figure A2.2 RStudio desktop
IMPORT DATA
Figure A2.3 Import data
EXECUTION PART
Figure A2.4 Execution part
Figure A2.5 Generate decision tree
STATISTICAL REPORT
Figure A2.6 Statistical report
Figure A2.7 Tableau-import data
EXTRACTING DATA
Figure A2.8 Extracting data
ANALYTICS PART
Figure A2.9 Analytics part
Figure A2.10 Detection of fraudulent events
JOB CLASSIFICATION
Figure A2.11 Job classification
REFERENCES
[1] Aigner, W., Miksch, S., Schumann, H., and Tominski, C. Visualization of
Time-Oriented Data. Springer Science & Business Media, 2011.
(EuroRV3). The Eurographics Association, 2015. doi: 10.2312/eurorv3.20151146