Fuzzy Computing Applications for Anti-Money Laundering and Distributed Storage System Load Monitoring

Yu-To Chen, Johan Mathe
Google, Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
{ytchen, johmathe}@google.com

Abstract—Fuzzy computing (FC) has made a great impact in capturing human domain knowledge and modeling non-linear mappings of input-output space. In this paper, we describe the design and implementation of FC systems for detecting money laundering behaviors in financial transactions and for monitoring distributed storage system load. Our objective is to demonstrate the power of FC for real-world applications that are characterized by imprecise, uncertain data and incomplete domain knowledge. For both applications, we designed fuzzy rules based on experts' domain knowledge, depending on money laundering scenarios in transactions or the "health" of a distributed storage system. In addition, we developed a generic fuzzy inference engine and contributed it to the open source community.

I. INTRODUCTION

A. Motivation

There is a wide variety of industrial and financial problems which require the analysis of uncertain and incomplete information. To make matters worse, the data used for the analysis are often imprecise. These problems present a great opportunity for the application of fuzzy computing technologies.

In a distributed storage system, it is imperative to understand usage patterns such as memory and CPU, so as to better balance load and avoid bottlenecks. This can be accomplished by using a tool which measures the state of the system and reports the overall "healthiness" of the system. Such a tool ought to have a high level of sophistication, incorporating real-time monitoring and decision-making. In a financial institution, we can still find labor-intensive tasks such as the review of suspicious account activities based on alerts triggered by a rule-based system with hard-coded cutoffs. Due to the complexity of these tasks, artificial intelligence (AI), and in particular fuzzy computing, has been called upon in support of monitoring a distributed storage system and detecting money laundering transactions. This paper focuses on the use of fuzzy computing in these two areas: monitoring and detection. To begin, we give a brief overview of fuzzy computing in the next subsection.

B. Fuzzy Computing

The works of Post, Kleene and Lukasiewicz were among the first treatments of imprecision and vagueness in multiple-valued logic systems, as opposed to classical Boolean logic [1], [2]. In 1937, Max Black proposed the use of a consistency profile to represent vague concepts [3]. Most importantly, Zadeh offered a complete theory of fuzzy sets and fuzzy logic in 1965 [4], which enabled us to represent and manipulate ill-defined concepts. In addition, Zadeh defined fuzzy logic's four facets [5], which provided us a language with syntax and semantics for computation. In particular, fuzzy logic allows us to use linguistic variables to model dynamic systems by a set of fuzzy rules. Each rule consists of a set of linguistic variables. These variables take fuzzy values, which are characterized by fuzzy membership functions. In addition, there is a reasoning mechanism, the fuzzy inference engine, which operates on the fuzzy rules based on the generalized modus ponens [6]. A comprehensive review of fuzzy logic and fuzzy computing can be found in [7].

C. Paper Structure

In the next section we focus on FC and its applications in monitoring and detection. After a brief discussion of the problem of monitoring and detection in Section II, we illustrate two applications of FC techniques. The first application, described in Section III, consists of the adaptation of fuzzy rules to the monitoring of distributed storage systems. The second application, illustrated in Section IV, covers the use of fuzzy logic inference to detect anomalies related to money laundering. In Section V we summarize the advantages of using FC and discuss some potential extensions of these technologies.

II. FUZZY COMPUTING APPLICATIONS FOR MONITORING AND DETECTION

In this paper, we study two fuzzy computing applications in the areas of monitoring and detection. They are fuzzy load monitoring for a distributed storage system and fuzzy anti-money laundering for a financial institution. In general, monitoring is the first step in understanding any complex real-world system. If certain undesired cases or scenarios are discovered during the monitoring process, the next logical step is to detect the recurring patterns, and subsequently to try to isolate the issues. Finally, a control strategy can be formalized to manage the problems.

III. FUZZY LOAD MONITORING FOR DISTRIBUTED STORAGE SYSTEMS

A. Problem Description

Google File System [8] is a distributed file system. A GFS cluster consists of a single master and multiple chunkservers (nodes) and is accessed by multiple clients. Different GFS clusters can have very different usage patterns, and these usage patterns are not always correlated with the byte usage of the cluster. For instance, a cluster used as backup storage will have very high byte usage with very low traffic, whereas a GFS cluster serving live traffic will see a very large throughput without necessarily having a large amount of bytes used.

Even though GFS has been designed to avoid the master being a bottleneck [8], the CPU load and memory usage of the master still increase as the traffic and the number of chunkservers in the cluster increase.

Our goal is to use a fuzzy inference system that gives us a good idea of the state of the GFS masters for multiple purposes:
• Provisioning: deciding whether the GFS master should be upgraded with better hardware, where possible
• Rerouting users: deciding whether some large users can be moved to other clusters

B. Approach

We noticed that during the on-call rotation of a system administrator, there is usually a lot of qualitative thinking involved. Fuzzy logic here mimics this qualitative thinking to allow faster incident response or capacity planning regarding the GFS masters.

After gathering the expert knowledge of various system administrators on the topic, we identified the following fuzzy sets:
• Memory usage high, memory usage low, memory usage average
• CPU usage high, CPU usage low, CPU usage average
• GFS master latency is high
• Health is very good (HVG), health is good (HG), health is bad (HB)

Rules regarding CPU and memory are gathered in Table I.

Table I
FUZZY RULES FOR LOAD MONITORING
Memory\CPU   LOW   AVG   HIGH
LOW          HVG   HG    HB
AVG          HG    HG    HB
HIGH         HB    HB    HB

We then use one more rule related to the system latency:
• If RPC latency is high then health is bad

C. Prior Work

Most network monitoring systems [9] today work by comparing measurements from hosts (such as latency, CPU usage, error rates, etc.) to particular thresholds, and alerting via email or pager, depending on the severity of the event. This approach, when applied to massive distributed systems, requires an extremely deep understanding of the underlying system and, in the case of correlated events, leaves all interpretation to the operator. The fuzzy load inference system is able to give a higher-level view of the distributed system.

D. Solution Description

In order to integrate a fuzzy inference system into Google's monitoring infrastructure, we had to write a production-quality inference system in Python [17], which we called GFuzzy. The rules are defined via protocol buffers, Google's data interchange format [10].

The Python inference engine provides different fuzzification and defuzzification functions [17]. We used triangular fuzzy sets for the fuzzification and the centroid method for defuzzification.

The workflow is the following, as also described in Figure 1:
• A monitoring system collects data from multiple GFS clusters
• The fuzzy inference system polls the data from the monitoring system, applies the fuzzy rules, and produces an output
• This output is then itself collected by another monitoring system
• Alerts are sent to system administrators and capacity planners when values go over a certain threshold

Figure 1. Monitoring system data flow

1) Inputs pre-processing: Before converting monitoring data into fuzzy sets, we average the data using a Gaussian window over the last N days of data. This smooths data spikes and filters out high-frequency noise.

E. Results

The system has been running for one year and initially showed some false negatives and false positives. Our methodology for finding these was to compare the decisions made by the on-call engineer with the results of the fuzzy inference system. We made the following improvements in the first iterations:
• The initial implementation only contained RAM- and CPU-related fuzzy sets. We added the latency set because latency is very relevant to the health of the GFS masters.
• We increased the size of the window and switched from a rectangular window to a Gaussian window because the rectangular window was too sensitive to high-frequency noise.
• We tweaked some of our membership functions over time by reviewing our false positives and negatives, each time asking the on-call engineer which thresholds had led them to decide which data pointed to the root cause of the problem with the GFS masters.

F. Conclusions on Fuzzy Load Monitoring

We successfully designed, implemented and deployed a fuzzy inference engine for monitoring the masters of a distributed file system. Future work involves measuring the nodes (chunkservers [8]) of the file system, and using the health of the master and the nodes to automatically determine the available capacity and availability of the GFS cluster.

IV. FUZZY DETECTION FOR ANTI-MONEY LAUNDERING

A. Problem Description

Anti-money laundering (AML) was implemented in the U.S. by the Bank Secrecy Act of 1970. AML refers to the legal controls that require financial institutions and other regulated entities to prevent or report money laundering activities [11]. According to Wikipedia [12], AML is a term "mainly used in the financial and legal industries to describe the legal controls that require financial institutions and other regulated entities to prevent or report money laundering activities." In US law, money laundering includes all financial transactions generating an asset or a value as the result of an illegal act, for example tax evasion and false accounting. All financial institutions are required to identify transactions of a suspicious nature and report them to the financial intelligence unit in their country [13].

One popular approach [14], [15] is to apply scenarios and risk factors to transactions to detect potentially suspicious activity. Transactional events that meet the rule parameters become alerts. Finally, alerts are subject to additional workflow processes, such as suppression, risk scoring and routing. In essence, it is a rule-based system and depends heavily on human domain knowledge. As is well documented in the literature, a rule-based system suffers from rigid, brittle and inflexible rule conditions. While it benefits from superior human cognitive capability, it also inherits the imprecise nature of human linguistic expressions.

B. Approach

Fuzzy logic comes to the rescue. It was invented precisely to deal with imprecision in human natural language. For instance, we intuitively understand statements like "the food is delicious" and "the service is excellent." In addition, we can even make up rules such as "if the food is delicious and the service is excellent, then the tip would be handsome." All the linguistic terms such as "delicious", "excellent" and "handsome" are fuzzy in nature. Your deliciousness is possibly not the same as mine. However, fuzzy logic defines degrees of "deliciousness", so that an inference engine can later be developed based on these linguistic terms.

We began the fuzzy AML work by assuming a hypothetical financial corporation, XCorp, whose business involves servicing clients for sending and receiving money via the Internet. A significant body of knowledge in money laundering (ML) was amassed via knowledge engineering. In practice, this was done through systematic identification of suspicious transactions by customer service representatives, labor-intensive case studies by analysts, and finally, generalization of ML indicators by both aforementioned parties. The main data sources were as follows:
• Existing fraud models, which isolate fraudsters who might be an ML risk as well
• Customer service representatives who view accounts and are in contact with customers
• Analysts who conduct web searches looking for questionable websites accepting/offering XCorp payments
• Customers or third parties who become aware of suspicious activity, such as when customers report account takeover

The following case study highlights a real-world example:
• A suspect using a UK postal address developed a pattern of creating XCorp accounts, receiving funds, and sending funds to several French accounts; the funds were then withdrawn to bank accounts. Receiving and sending accounts were closed after the transactions were made. 25 accounts were identified, with a total transaction amount of $32K. Sending and recipient accounts shared IPs and machine fingerprints.

From the case study, a number of ML indicators were generalized:
• Multiple accounts controlled by one party
• Lots of account activity within a short time window, such as opening - sending - closing and opening - receiving - withdrawing - closing
• No viable business reasons

C. Prior Work

There is no significant prior work on fuzzy anti-money laundering (AML) in academic journals and proceedings. However, there are a number of commercial software packages for AML, such as SAS, Actimize and tellemetrix, to name just a few. As discussed, one popular approach is to apply risk factors to transactions to detect potentially suspicious activity via scenarios and case studies. Alerts are issued if certain rate-limiting criteria are met.

D. Solution Description

Our objective was to build a fuzzy inference system for AML. For demonstration, we describe an AML scenario and then translate it into fuzzy rules. After that, we outline the fuzzy system's membership functions. Finally, the results of fuzzy inference are summarized.
1) Fuzzy rules: The AML scenario is as follows. An account is suspicious if
• a large value is received (>= $10,000)
• it is withdrawn immediately (within 1-3 days)

Let's label this scenario as AML1. We therefore aimed to design a fuzzy AML scoring function, which takes the amount received and a match score (the degree of match between $ received and $ withdrawn) as input, and gives an AML1 score as output.
• AML1 <- fuzzy.inference(amt.received, match.score)
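
As a rough illustration of what a scoring function with this signature could look like, the following Python sketch wires the nine rules that the paper tabulates in Table II into a small inference routine. It is not the GFuzzy API. The function name `fuzzy_inference`, the evenly spaced triangular input sets, the reading of the Table II column labels S/M/B as low/medium/high match score, the output peaks (0, 0.25, 0.5, 0.75, 1) and the weighted-average defuzzification (used here in place of a centroid, for brevity) are all assumptions made for this example. The paper's actual membership functions are given only in its figures.

```python
# Hypothetical sketch, not GFuzzy: all set shapes and peaks below are assumed.

def triangle(x, left, peak, right):
    """Triangular membership function rising from `left` to `peak`, then falling."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Assumed input sets on the normalized [0, 1] domain.
AMOUNT_SETS = {"small": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "big": (0.5, 1.0, 1.5)}
MATCH_SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}

# Assumed representative value for each output label.
OUTPUT_PEAKS = {"VL": 0.0, "L": 0.25, "M": 0.5, "H": 0.75, "VH": 1.0}

# The nine rules, transcribed from Table II: (amount, match score) -> AML1 label.
RULES = {
    ("big", "low"): "M", ("big", "medium"): "H", ("big", "high"): "VH",
    ("medium", "low"): "L", ("medium", "medium"): "M", ("medium", "high"): "H",
    ("small", "low"): "VL", ("small", "medium"): "L", ("small", "high"): "M",
}

def fuzzy_inference(amt_received, match_score):
    """Return an AML1 score in [0, 1] from two inputs normalized to [0, 1]."""
    amt = min(max(amt_received, 0.0), 1.0)
    ms = min(max(match_score, 0.0), 1.0)
    weighted_sum = 0.0
    total_weight = 0.0
    for (amt_label, ms_label), out_label in RULES.items():
        # Rule firing strength: AND of the two antecedents, taken as min.
        strength = min(triangle(amt, *AMOUNT_SETS[amt_label]),
                       triangle(ms, *MATCH_SETS[ms_label]))
        weighted_sum += strength * OUTPUT_PEAKS[out_label]
        total_weight += strength
    # With clamped inputs at least one rule always fires, so total_weight > 0.
    return weighted_sum / total_weight
```

Under these assumptions, the score varies smoothly between the rule regions instead of jumping at hard cutoffs, which is the interpolation behavior the fuzzy approach is meant to provide.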

The idea was to come up with a score from the amount received and the match score. For instance:
• AML1 score is very high, if amount received is big and match score is high
• AML1 score is very low, if amount received is small and match score is low

Therefore, the AML1 score is a real number bounded by [0,1], which represents the possibility that the account is an ML violation.

Note that the match score represents the degree of match between $ received and $ withdrawn in a time window. In essence, the score is a function of three parameters: amount received, amount withdrawn and time window. For instance:
• Match score is high, if $ withdrawn is within [80%, 120%] of $ received
• Match score is high, if $ is withdrawn immediately after $ is received
• Match score is zero, if $ is withdrawn more than 3 days after $ is received
• Match score is moderate, if $ is withdrawn within 1-3 days after $ is received

In essence, the match score is a real number bounded by [0,1]. The higher the score, the closer the match of the two dollar amounts within 1-3 days. Let's define a difference measure as $\left|\frac{a-b}{b}\right|$, and label it as "absolute % change," where a and b are the $ withdrawn and received in a time window, respectively. Three levels of absolute % change scores were calculated, as there were three different time windows: 1-, 2- and 3-day. In addition, a "match factor" was defined, taking integer values in {1,2,3} that represent the three levels of time windows.
• $\text{Match.score} = 0.9^{(\text{match.factor}-1)} \left(1 - \frac{\text{abs.\%.change}}{0.2}\right)$, if $\text{abs.\%.change} \le 0.2$
• $\text{Match.score} = 0$, if $\text{abs.\%.change} > 0.2$

Essentially, the score is discounted by 10% and 19% if the $ is withdrawn within 2 and 3 days, respectively. Refer to Figure 2 for details.

Figure 2. Absolute change percentage vs match score

Assume that the AML1 score is a function taking two arguments: amount received and match score. Further, the domain of both arguments is the real interval [0,1]. The idea of fuzzy scoring is to first segment the space into disjoint sub-regions with descending average AML1 scores, then use fuzzy inference to smooth out, or interpolate, the scores from region to region. For AML1, nine fuzzy rules were constructed for the inference, as shown in Table II. For instance:
• If amt received is big and match score is high, then AML1 is very high
• If amt received is medium and match score is medium, then AML1 is medium
• If amt received is small and match score is low, then AML1 is very low

Table II
AML SCORE FOR EACH OF THE 9 FUZZY RULES
Amount Rcvd\Match score   S    M    B
L                         M    H    VH
M                         L    M    H
S                         VL   L    M
(Scores: VL = very low, L = low, M = medium, H = high, VH = very high.)

2) Fuzzy membership functions: As described in the last section, there were three fuzzy variables: amt received, match score and AML1, where amt received and match score were the antecedents and AML1 was the consequent. Their membership functions were defined as shown in Figures 3, 4 and 5, respectively.

Figure 3. Amount received membership functions

E. Experimental Results

We tested the fuzzy AML1 scoring on some sampled accounts and their transactions. Of the 710 accounts scored, the median AML1 score was 0.55. The 73 accounts whose AML1 score was greater than 0.8 were routed to an AML queue for human review. The queue would be worked in descending order of AML1 score.
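
The match-score arithmetic and the review-queue routing can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code. It reads the match-score formula as 0.9^(match.factor - 1) × (1 - abs.%.change / 0.2), which is the reading consistent with the stated 10% and 19% discounts and with the two accounts reported in Table III. The function names and the dictionary layout are hypothetical.

```python
# Hypothetical sketch of the match score and queue routing; names are assumed.

def abs_pct_change(withdrawn, received):
    """Absolute % change |(a - b) / b|, with a = $ withdrawn and b = $ received."""
    return abs((withdrawn - received) / received)

def match_score(withdrawn, received, match_factor):
    """Match score in [0, 1]; match_factor is 1, 2 or 3 (1-, 2- or 3-day window)."""
    change = abs_pct_change(withdrawn, received)
    if change > 0.2:
        return 0.0
    # Each extra day in the window discounts the score by a factor of 0.9.
    return 0.9 ** (match_factor - 1) * (1.0 - change / 0.2)

def review_queue(aml1_scores, threshold=0.8):
    """Accounts scoring above the threshold, in descending order of AML1 score."""
    flagged = [(acct, score) for acct, score in aml1_scores.items() if score > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# The two accounts from Table III (amounts and match factors as reported there).
initial = match_score(25326, 25713, 1)   # before the time-window discount
final = match_score(25326, 25713, 2)     # withdrawals logged within 2 days
print(round(initial, 2), round(final, 2))
print(round(match_score(35306, 35306, 1), 2))
```

Rounded to two decimals, these calls give 0.92 and 0.83 for account 8 and 1.0 for account 1, matching the MScore1 and MScore2 columns of Table III.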
Figure 4. Match score membership functions

Figure 5. AML1 membership functions

For demonstration, two suspicious accounts are described as follows.

For row 1 in Table III, its $ received > $10K (Big) and final match score = 1 (Large), so its AML1 score = 1 (Very Large). In this example, its initial MatchScore = 1 since its absolute % change = 0 ($ received = $ withdrawn). In addition, MatchFactor = 1 since all the withdrawals were logged within one day after the $ was received. Therefore there is no discount on the initial MatchScore. As a result, the final MatchScore is the same as the initial one.

For row 2 in Table III, its $ received > $10K (Big) and final match score = 0.83 (Large), so its AML1 score = 0.92 (Large). In this example, its absolute percentage change = 1.5% ($ received > $ withdrawn), hence its initial MatchScore = 0.92. However, all the withdrawals were logged within 2 days after the $ was received, hence its MatchFactor = 2. This implies a 10% discount on the initial MatchScore. As a result, the final MatchScore = 0.83 (= 0.92 × 0.9).

Table III
EXPERIMENTAL RESULTS FOR AML
Act   $ recv    #wtxn   $ wtx     match   %change   MScore1   MScore2   AML1Score
1     $35,306   2       $35,306   1       0.0%      1.00      1.00      1.00
8     $25,713   3       $25,326   2       1.5%      0.92      0.83      0.92

F. Conclusions on Fuzzy Detection

We have presented an approach that uses fuzzy computing to detect money laundering (ML) patterns in complex financial transactions. We showed the process of knowledge engineering for intelligence gathering and understanding of patterns of ML. After that, we described the design and development of a fuzzy inference system (FIS), which takes the transactions as input and gives scores of suspicious ML behavior as output. The FIS provides an intuitive and robust way to combat ML, while fulfilling the obligations set for financial institutions by the authorities. Future work will focus on automatic tuning of fuzzy rules and membership functions based on the misclassification rates and human agents' feedback.

V. FINAL REMARKS

Fuzzy computing (FC) is having an impact on many industrial and financial operations, from monitoring and predictive modeling to diagnostics and control [16]. It provides us with alternative approaches to traditional knowledge-driven reasoning systems and it overcomes their main flaw, the rigidity of the rule structure. We have demonstrated two successful real-world deployments of FC applications in monitoring and detection. In particular, we described how to monitor the healthiness of a distributed file system and how to detect suspicious transactions in money laundering. Both systems leverage the tolerance for imprecision, uncertainty and incompleteness, which are the hallmarks of the problems to be solved. In addition, we developed a generic fuzzy inference engine and contributed it to the open source community [17]. In the future, we expect that the combination of fuzzy computing with advances in probabilistic reasoning, voice recognition, text processing, computer vision, etc., will further improve and expand our problem-solving capability for a large spectrum of industrial and financial problems.

REFERENCES

[1] N. Rescher, Many-valued Logic, McGraw-Hill, New York, NY, 1969.
[2] J. Lukasiewicz, Elementy Logiki Matematycznej (Elements of Mathematical Logic), Warsaw, Poland: Panstwowe Wydawnictwo Naukowe, 1929.
[3] M. Black, Vagueness: an Exercise in Logical Analysis, Phil. Sci., vol. 4, pp. 427-455, 1937.
[4] L.A. Zadeh, Fuzzy sets, Information and Control, vol. 8, pp. 338-353, 1965.
[5] L.A. Zadeh, Foreword, in Handbook of Fuzzy Computation, E.H. Ruspini, P.P. Bonissone, and W. Pedrycz, Eds., Bristol, UK: Institute of Physics, 1998.
[6] Y-M. Pok and J-X. Xu, Why is Fuzzy Control Robust, in Proc. Third IEEE Intl. Conf. on Fuzzy Systems (FUZZ-IEEE'94), pp. 1018-1022, Orlando, FL, 1994.
[7] E.H. Ruspini, P.P. Bonissone, and W. Pedrycz, Handbook of Fuzzy Computation, Bristol, UK: Institute of Physics, 1998.
[8] S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file system, in Proc. of the 19th ACM SOSP (Dec. 2003), pp. 29-43.
[9] Wikipedia, Network monitoring, http://en.wikipedia.org/wiki/Network_monitoring.
[10] Google, Protocol Buffers - Google's data interchange format, http://code.google.com/p/protobuf/.
[11] Paul Allan Schott, Reference Guide to Anti-Money Laundering and Combating the Financing of Terrorism, World Bank, 2006.
[12] Wikipedia, Anti Money Laundering, http://en.wikipedia.org/wiki/Anti-money_laundering
[13] Jackie Harvey, An evaluation of money laundering policies, Journal of Money Laundering Control, vol. 8, iss. 4, pp. 339-345, 2005.
[14] SAS Anti Money Laundering, http://www.sas.com/industry/financial-services/banking/anti-money-laundering
[15] J. Kingdon, AI fights money laundering, Intelligent Systems, IEEE, vol. 19, iss. 3, pp. 87-89, May-Jun 2004.
[16] Bonissone et al., Hybrid Soft Computing Systems: Industrial and Commercial Applications, Proceedings of the IEEE, 1999.
[17] GFuzzy, http://code.google.com/p/gfuzzy/
