GodSight FYP Final Report
By
Group GodSight
190094B - Abegunawardhana U. K. K. P.
190093T - Bodaragama D. B.
190478E - Pulle D. M. P.
CONTENTS

1 INTRODUCTION
2 PROBLEM STATEMENT
  2.1 Phase 1: On-Chain Analytics Platform
  2.2 Phase 2: Anomaly Detection System
3 MOTIVATION
4 LITERATURE REVIEW
  4.1 Data Extraction from Blockchain Networks
  4.2 On-Chain Analysis Metrics
  4.3 On-chain Analysis Platforms/Applications
  4.4 Blockchain Anomaly Analysis
    4.4.1 ANOMALY DETECTION WITH UNSUPERVISED LEARNING
    4.4.2 GRAPH-BASED APPROACHES FOR ANOMALY DETECTION
    4.4.3 STATISTICAL APPROACHES
    4.4.4 TREE-BASED MODELS
    4.4.5 NEURAL NETWORKS IN TABULAR MODELS
5 RESEARCH OBJECTIVES
  5.1 Development of an Extendable Open-Source On-Chain Analysis Framework for Blockchain Networks
  5.2 Creation of an Accurate Anomaly Detection Model for Blockchain Transactions
6 METHODOLOGY
  6.1 Open-Source SaaS Platform
    6.1.1 Metrics Recognition
      6.1.1.1 Metric Categorization
      6.1.1.2 Metrics Formulation
    6.1.2 System Design & Architecture
      6.1.2.1 Framework Architecture
      6.1.2.2 Extendibility
      6.1.2.3 Customizability
  6.2 Bitcoin Anomaly Analysis
    6.2.1 BABD-13: DATASET OVERVIEW
    6.2.2 TABULAR-BASED BITCOIN ANOMALY DATASET CREATION
      6.2.2.1 Tabular-Based Dataset Approach
      6.2.2.2 Statistical Feature Extraction
      6.2.2.3 Importance of Transaction Time and Value
      6.2.2.4 Differentiation Between Input and Output Transaction Features
      6.2.2.5 Cross-Featuring Between Input and Output Transactions
      6.2.2.6 Overview of Newly Created Features
    6.2.3 MACHINE LEARNING WORKFLOW
      6.2.3.1 Data Cleaning
      6.2.3.2 Feature Selection
      6.2.3.3 Handling Data Imbalance
      6.2.3.4 Data Standardization
      6.2.3.5 Principal Component Analysis (PCA)
    6.2.4 MODEL TRAINING AND ARCHITECTURE
      6.2.4.1 Tree-Based Models and Tabular NN Models
      6.2.4.2 Modified TabNet Model Architecture
      6.2.4.3 Introduction of 1D CNN Feature Extraction
    6.2.5 EVALUATION METRICS
      6.2.5.1 Imbalanced Test Set Description
      6.2.5.2 Metrics
7 RESULTS AND DISCUSSION
    7.2.1 Created Dataset Details
    7.2.2 Feature Selection
    7.2.3 Data Pre-Processing
    7.2.4 Model Training and Results Analysis
      7.2.4.1 Results Overview
      7.2.4.2 Analysis and Interpretation
8 CONCLUSION
LIST OF FIGURES

7.3 Null data description

LIST OF TABLES

6.1 Table of the general metrics
Chapter 1
INTRODUCTION
a transparent view of its inner workings, thereby fostering a collaborative ecosystem for on-chain analysis. For the predictive analysis aspect, we delve into anomaly detection within blockchain data, particularly emphasizing Bitcoin due to its global recognition and established nature. Traditional methods have employed unsupervised learning techniques for detecting irregularities in Bitcoin transactions [12]. However, our approach is anchored on the BABD-13 dataset, which provides 13 types of labels for Bitcoin addresses [13]. Our strategy aims to harness statistical methodologies, focusing on temporal patterns and transactional values for multi-class classification. In essence, our project is a testament to our commitment to advancing the blockchain ecosystem comprehensively. As networks like Avalanche and Bitcoin continue to grow, our tools aim to evolve alongside, offering users a full spectrum of insights into on-chain data and anomalies.
Chapter 2
PROBLEM STATEMENT
As the Avalanche and Bitcoin networks swell in transactional volume, they inevitably become targets for malicious actors. Traditional financial systems have established anomaly detection mechanisms; however, the decentralized, global nature of blockchains brings unique challenges. While analytics can offer insights, they don't inherently guarantee the authenticity of transactions.

Identifying irregularities or anomalous patterns amid the sea of legitimate transactions is paramount. However, without specialized tools, this task is akin to navigating a maze blindfolded. Current mechanisms either generate too many false positives or miss subtle, sophisticated anomalies, both scenarios being undesirable. The need is for an intelligent, learning system that evolves with the network, ensuring threats are identified and addressed promptly.
Chapter 3
MOTIVATION
The blockchain landscape, led by pioneers like Avalanche and Bitcoin, is akin to a vast
ocean, teeming with data that holds the promise of transformative insights. As businesses,
researchers, and enthusiasts sail these waters, they often seek navigational aids - tools
that can help them decipher this data, transforming it into actionable intelligence.
At the forefront of this quest is the need for a robust on-chain analysis platform. However, the dynamic nature of the blockchain world demands more than just a static tool. It calls for a framework - an adaptable foundation upon which diverse analytical tools can be built, modified, and enhanced. Such a platform's power isn't just in its current capabilities but in its potential to evolve, grow, and adapt. This is where extendability emerges as a pivotal feature.
Consider the burgeoning realm of carbon trading, where blockchain can play a pivotal
role in tracking, verifying, and trading carbon credits. As this domain expands, new
metrics, insights, and data sources will emerge. An extendable on-chain analysis platform
can seamlessly integrate these new elements, ensuring it remains relevant and valuable.
But the power of extendability is truly unlocked when coupled with the principle
of open-source. By being open-source, the platform invites collaboration, innovation,
and enhancement from the global community. It’s not just a tool but a living entity,
continuously refining itself through collective wisdom.
Beyond the Avalanche ecosystem, the vast Bitcoin network holds its own set of challenges. With millions of addresses conducting transactions, there's a pressing need to understand the nature of these addresses. This is not just about transactional data but about classifying addresses based on their behavior patterns. The aim? To develop an advanced Bitcoin address classification model. Such a model can offer insights into address categories, their transactional behaviors, and more, adding another layer of depth to on-chain analysis.
In essence, our motivation is dual-pronged: to craft an extendable, open-source on-chain analysis framework tailored for Avalanche and to delve deep into the Bitcoin network, classifying its myriad addresses. Both endeavors, though distinct, resonate with a shared vision - harnessing the power of blockchain data, making it accessible, understandable, and above all, insightful.
Chapter 4
LITERATURE REVIEW
Data extraction from blockchain networks, especially from leading cryptocurrencies such as Bitcoin, Ethereum, and Avalanche, has become a pivotal research area. The transparent and immutable nature of blockchain ledgers offers a plethora of data. When effectively extracted, this data can provide invaluable insights into transaction patterns, anomalies, and potential illicit activities. In the study titled "DataEther: Data Exploration Framework For Ethereum," the researchers introduced a systematic and high-fidelity data exploration framework for Ethereum. They exploited its internal mechanisms and instrumented an Ethereum full node [14]. The proposed method, DataEther, acquires all blocks, transactions, execution traces, and smart contracts from Ethereum. This approach overcomes the limitations of existing methods, such as incomplete data, confusing information, and inefficiency. According to the authors, several recent studies have made intriguing observations by examining Ethereum data, but those studies have certain limitations in their methodologies for data acquisition. These methods can be broadly categorized into four types, each with its own set of constraints, and none may provide a comprehensive view of the Ethereum ecosystem.

Another study, "Understanding Ethereum via Graph Analysis," focused on collecting all accounts and transactions from the launch of Ethereum until November 1, 2018 [15]. The authors emphasized the complexities involved in extracting multifaceted transaction data from blockchain networks. They utilized a dual-pronged data extraction strategy, synchronizing all historical transaction data using the Ethereum client and concurrently tapping into Etherscan APIs.

In "Predicting Bitcoin Returns Using High-Dimensional Technical Indicators," the authors used a BTC-USD dataset from investing.com, which included daily open, high, low, and close prices of Bitcoin from January 1st, 2012 to December 29th, 2017 [16]. They divided the dataset into training and test samples, emphasizing the importance of robust data extraction methodologies.

The paper "An On-Chain Analysis-Based Approach to Predict Ethereum Prices" gathered data from the public Ethereum blockchain and online resources' APIs from 2016 through 2021 [17]. The authors analyzed metrics against the Ethereum price using on-chain data, aiming to provide a broader overview by incorporating on-chain metrics relating to miners, users, and exchange activity and their possible impact on Ethereum pricing.

The study "Anomaly Detection Model Over Blockchain Electronic Transactions" revolved around the extraction of Bitcoin transaction data from https://www.blockchain.com/charts [18]. The research "Transaction-based classification and detection approach for Ethereum smart contract" showcased a dual-pronged data extraction strategy [19]: historical transaction data was synchronized using the Ethereum client while Etherscan APIs were tapped concurrently. This dual method ensured the capture of a vast spectrum of data, revealing the multifaceted nature of Ethereum's smart contract transactions.

In conclusion, data extraction from blockchain networks, especially from Bitcoin, Ethereum, and Avalanche, has seen varied methodologies. From using public APIs to setting up dedicated nodes, researchers have adopted multiple strategies to ensure comprehensive and accurate data retrieval.
transaction data, block data, and smart contract code. The historical significance of on-chain analysis, tracing its roots back to 2011, was also highlighted. Several key indicators, such as the Network Value-to-Transaction (NVT) ratio and UTXOs (unspent transaction outputs), were introduced; these have been instrumental in gauging the value and movement of cryptocurrencies. In conclusion, on-chain analysis metrics have proven to be invaluable tools in the cryptocurrency research landscape. By offering a comprehensive view of transactions and patterns within blockchain networks, these metrics provide researchers and investors with insights that can guide decision-making and predict future market movements.
4.4.2 GRAPH-BASED APPROACHES FOR ANOMALY DETECTION
Graph-based models are particularly adept at analyzing the complex relationships between entities in blockchain networks. Xiang et al. [13] introduced the Bitcoin Address Behavior Dataset (BABD-13), which classifies Bitcoin addresses as either illicit or non-illicit based on transaction patterns. They created a graph representation of Bitcoin transactions, extracting features like node degree and clustering coefficient to distinguish different types of behavior. This dataset includes 13 categories, each representing a particular type of crime, offering a comprehensive resource to understand the behavioral traits of different addresses.

Akcora et al. [29] developed heuristic graph-based models to pinpoint high-risk Bitcoin network transactions by combining structural and temporal features. They utilized temporal patterns to track how ransomware-related activities occur on the blockchain. Their innovative method, aptly called "Bitcoin Heist," distinguished ransomware transactions from ordinary ones, providing a unique framework for detecting anomalies.

Weber et al. [30] explored graph embeddings to classify Bitcoin transactions, testing graph convolutional networks to capture the structural properties of transaction graphs. Their method used embedding techniques to develop features representing the graph structure, providing a more sophisticated analysis of transaction networks. Their research significantly improved financial forensics by enhancing anti-money laundering efforts through the identification of suspicious Bitcoin activities.
Tree-based models like Random Forest and XGBoost are highly effective classification
tools for anomaly detection. Bahnsen et al. [34] utilized Random Forest models to detect
fraud in Bitcoin transactions. Gai et al. [35] employed XGBoost models, using historical
transaction data to identify anomalies in blockchain networks. Conti et al. [36] showcased
the efficacy of ensemble models in identifying fraudulent behavior, demonstrating how
these methods can enhance anomaly detection.
Neural networks like Convolutional Neural Networks (CNNs) are highly effective in detecting anomalies in tabular data, particularly in blockchain applications. Baosenguo's 1D CNN model [37] performed impressively in a Kaggle competition. The model first used a fully connected layer to impose locality patterns on the tabular features, followed by several 1D convolutional layers with shortcut-like connections. Despite tabular datasets typically lacking locality features, the model excelled in classification by efficiently using convolutional filters.
Arik and Pfister [38] designed TabNet, a groundbreaking attention-based neural network model tailored specifically for tabular data. TabNet employs feature selection mechanisms via sequential attention steps, enabling the model to identify and focus on the most relevant features while maintaining interpretability. This feature selection improves predictive accuracy and provides transparency into which features matter most.

Joseph and Raj [39] introduced GATE (Gated Additive Tree Ensemble), which blends decision trees and neural networks for superior classification of tabular data. GATE combines tree ensemble learning and neural networks within a gated framework, boosting classification accuracy by using a gating mechanism to filter out less informative features. This hybrid approach enhances performance by learning intricate patterns often missed by traditional tree models.
Popov et al. [40] proposed the Neural Oblivious Decision Ensemble (NODE), a deep
learning model crafted for tabular data. NODE integrates decision trees and neural
networks to capture hierarchical feature interactions while retaining the interpretability
of decision trees. The model employs a series of differentiable oblivious decision trees
that efficiently detect complex patterns in tabular datasets, resulting in high anomaly
detection performance.
Chapter 5
RESEARCH OBJECTIVES
5.1 Development of an Extendable Open-Source On-Chain Analysis Framework for Blockchain Networks

2. Design a user-centric platform that offers a suite of tools and features to facilitate detailed on-chain analysis of blockchains, enhancing users' ability to derive meaningful interpretations and make informed decisions.

3. Adapt and extend the established analytical framework to seamlessly cater to another blockchain network (the Bitcoin blockchain network), ensuring that it captures, processes, and visualizes Bitcoin-specific on-chain data with the same depth and clarity.

5.2 Creation of an Accurate Anomaly Detection Model for Blockchain Transactions

1. To design and train a precise anomaly detection model for identifying irregularities in blockchain transactions.
Chapter 6
METHODOLOGY
The analytical exploration of blockchain technology through on-chain metrics unveils the operational dynamics, economic activities, security vulnerabilities, and user behaviors embedded within. This analysis is paramount for a spectrum of stakeholders, from developers to investors, empowering them with the data to make informed decisions. Recognizing the intricate and diverse nature of blockchain networks, our team embarked on categorizing on-chain metrics into distinct areas. This decision was inspired by a comprehensive review of existing research and methodologies that highlighted the importance of a structured analytical framework for nuanced blockchain exploration. Such an approach not only facilitates targeted analysis but also enhances interpretability and supports the extensibility of our on-chain analysis tool, ensuring its relevance and utility in an ever-evolving digital asset landscape.

The rationale behind this structured categorization stems from the need to distill actionable insights from the vast, complex data inherent in blockchain networks. As we delve into the specifics of each category, it is important to understand that this methodological choice is not arbitrary but rather a strategic effort to align our analysis with the multifaceted nature of blockchain data. By doing so, we aim to equip users with a comprehensive toolkit for dissecting blockchain activities, fostering a deeper understanding of the underlying trends and patterns that govern these digital ecosystems.
We organized on-chain data into distinct groups because blockchain networks are complex systems with many interacting parts. Blockchains expose many different types of data, each of which can carry important signals, aligning with methodologies seen in foundational research such as [41] [42]. Grouping these data types into categories makes them easier to understand and analyze, supports more rigorous study, and positions our on-chain analysis tool for continued improvement.
Given this backdrop, the formulation of metric categories becomes a pivotal element in our analytical arsenal. It is through this lens that we can sift through the blockchain's data-rich environment, identifying and classifying metrics that are most indicative of the network's health, performance, and intricacies. The following discussion provides a detailed justification for each metric category we've established, elucidating the pivotal role these categories play in enhancing our understanding of blockchain networks.
1. Transactional Metrics: The heart of any blockchain network beats with the
rhythm of its transactions. Recognizing the fundamental role that transactions
play in reflecting the network’s economic pulse, we prioritize transactional metrics
as a key category. Drawing inspiration from studies like [43], which illuminate the
economic significance of transaction patterns, our focus on transactional metrics
aims to uncover the liquidity flows, monetary dynamics, and economic vitality of the
blockchain. These metrics serve as a critical barometer for assessing the network’s
financial health and activity levels, providing essential insights for economic analysis
and strategic decision-making.
The categorical classification of metrics not only facilitates a comprehensive analysis
of blockchain networks but also lays a foundation for the extendibility of our analysis
platform. By structuring metrics into well-defined categories, we enable the platform to
adapt to emerging trends and innovations within the blockchain space. This modular
approach allows for the seamless integration of new metrics and categories, ensuring the
platform remains relevant and valuable to users as blockchain technology evolves [46].
Moreover, the categorization supports customized analysis tailored to specific user
needs and interests. Whether focusing on economic analysis, security assessment, or
network growth, users can leverage relevant metric categories to derive targeted insights.
This flexibility underscores our platform’s capability to serve a diverse user base, from
investors and developers to researchers and regulatory bodies [47].
In conclusion, the strategic categorization of on-chain metrics into distinct areas is a deliberate and methodologically sound approach designed to enhance the depth, clarity, and utility of blockchain analysis. Drawing upon existing literature and industry best practices [48], we justify each category's inclusion and articulate its significance in providing a rich, multidimensional view of blockchain ecosystems. This framework not only facilitates current analytical needs but also ensures our platform's adaptability and relevance in the face of future blockchain developments.
Now that we have established the metric categorization thoroughly, let's walk through the metric formulation. It is important to note that we have organized our metric formulation under two categories as well. They are:

1. General Metrics: Metrics that are compatible with any blockchain network.
2. Blockchain-Specific Metrics: Metrics tailored to the features of a particular blockchain network, such as Avalanche.

The choice of the general metrics has been influenced by the literature in that the features on which those general metrics are built are found in any blockchain network. For instance, [41] and [43] underscore the importance of on-chain metrics for predictive analysis and reinforcement learning systems. Based on those, we have identified the following general metrics. Moreover, we have been careful to place those metrics under the categories identified in the Metric Categorization section. The following table presents the general metrics, where each general metric is given with its definition, calculation method, significance, and category.
Table 6.1: Table of the general metrics (columns: Category, Metric Name, Metric Definition, Calculation Method)
Apart from the generalization, the beauty of our framework, GodSight, lies in its customization. Any individual, consortium, or organization that uses GodSight can tailor their tool to represent any of the metrics that they have identified and found useful. Building on the foundation of customization and adaptability that characterizes our framework, GodSight extends the capability of personalized metric development beyond predefined categories. This unique feature empowers individuals, consortiums, or organizations leveraging GodSight not only to tailor metrics within established categories but also to pioneer new categories that cater to evolving needs and novel insights specific to their blockchain analysis objectives. This flexibility ensures that GodSight remains at the forefront of on-chain analysis, accommodating the dynamic landscape of blockchain technologies and the diverse analytical requirements of its users.

As a testament to the framework's versatility and forward-thinking design, we introduce the concepts of Economic Indicators and Cross-Chain Metrics as exemplary new categories, born out of the need to highlight the importance of financial metrics in understanding market dynamics and investment potential, and to understand and quantify the complexities of interoperability and asset flow across different blockchain networks. These categories emerge in response to the growing importance of interconnected blockchain ecosystems, highlighting GodSight's ability to adapt and innovate in alignment with the latest trends and technological advancements in the blockchain space.
Cross-Chain Metrics, for instance, capture the interoperability between blockchains, offering insights into the liquidity movement, transactional coherence, and overall impact of these interactions on the broader digital asset market. By enabling the creation of such tailored categories and metrics, GodSight not only enhances the depth and breadth of blockchain analysis but also empowers users to navigate and exploit the intricacies of a multi-chain world.
Category: Network Health and Activity
Metric Name: Total Staked Amount
Metric Definition: The sum of all tokens staked on a specified date within a blockchain subchain.
Calculation Method: The total staked amount is obtained by summing the value of all tokens staked within the subchain on the designated day.

Category: Economic Indicators
Metric Name: Total Burned Amount
Metric Definition: The sum of all tokens intentionally destroyed or removed from circulation on a specified date within a blockchain subchain.
Calculation Method: Totals the amount of tokens that have been burned or removed from circulation on the specified day.
Concluding the Metrics Recognition section, it is clear that the development and implementation of a structured framework for on-chain metric analysis significantly enhance our ability to understand and interpret the complex dynamics of blockchain networks. Through the meticulous categorization and formulation of metrics, our framework — GodSight — embodies a holistic approach to blockchain analysis, merging the rigor of scientific analysis with the adaptability required to navigate the evolving digital asset ecosystem.
By distinguishing between general metrics and those tailored specifically to the Avalanche blockchain, we underscore the versatility and depth of our analysis capabilities. The introduction of categories such as Economic Indicators and Cross-Chain Metrics further exemplifies our commitment to innovation, ensuring that GodSight remains at the forefront of on-chain analysis by addressing emerging trends and the growing need for interoperability among disparate blockchain systems.
Our framework's emphasis on customization and extensibility not only caters to the immediate analytical needs of various stakeholders, from developers to investors, but also anticipates the future demands of the blockchain community. By enabling users to define and integrate new metrics and categories, GodSight fosters a collaborative and forward-looking approach to blockchain analysis, empowering users to uncover insights that are both profound and actionable.
In conclusion, the Metrics Recognition section of our research delineates a foundational aspect of our work, laying the groundwork for advanced on-chain analysis. The structured categorization and thoughtful formulation of metrics serve as the cornerstone of our framework, enabling a nuanced exploration of blockchain networks that is both comprehensive and adaptable. As the blockchain landscape continues to evolve, so too will our framework, ensuring its relevance and utility for years to come. This adaptability, rooted in a deep understanding of blockchain metrics and their implications, positions GodSight as an indispensable tool for navigating the future of blockchain analysis.
6.1.2 System Design & Architecture

In the rapidly evolving domain of blockchain technology, the ability to adapt and interpret extensive data efficiently is paramount. The GodSight framework, designed with this challenge in mind, offers a robust solution for on-chain analysis through its sophisticated system architecture and design. This chapter delves into the intricate components and methodologies that make up the GodSight framework, showcasing how it seamlessly integrates data extraction, computation, and extensibility. By leveraging a modular approach, the framework ensures that users can not only keep pace with current blockchain technologies but also have the capacity to incorporate future advancements. This introduction sets the stage for a detailed exploration of the system's architecture, highlighting its extendibility to cater to a diverse set of blockchain environments.
Below is a breakdown of the primary components:

• Extraction Component: Fetches raw transaction data (inputs, outputs, and transactions) from the configured data sources.

• Computation Component: Processes the extracted data according to the defined metrics and generates the metric outputs.

• Utils Component: Provides the functionalities and scripts needed to add new blockchains and metric types, along with deployment helpers.

• Dashboard: A React.js application that visualizes the computed metrics.
Except for the Dashboard application, all components of the GodSight framework are developed using Python, ensuring high interoperability and ease of integration with other Python-based tools and libraries. Furthermore, through the features provided by the Utils component, users can generate Docker images for the Extraction and Computation components. This capability facilitates the deployment of these components as AWS Lambda functions, allowing for scalable, cloud-based operations that can efficiently handle varying loads and data volumes.

This architecture not only supports the operational demands of on-chain analysis but also provides a robust, flexible platform for future expansions and enhancements.
6.1.2.2 Extendibility
The extendibility of the GodSight framework is a core feature, designed to ensure that the
system can adapt and evolve in response to new requirements and blockchain technologies.
This section describes how extendibility is implemented, the components involved, and
the processes for integrating new features.
• Role of the Utils Component: The Utils Component is crucial in the extendibility process. It contains all the necessary functionalities and scripts required to add new blockchains and metric types to the system. This component ensures that new integrations are both seamless and standardized.
• Required Files and Scripts: Users looking to extend the framework to support
a new blockchain must provide specific files and scripts, including:
1. Extraction Function Script: A Python script that defines how to extract data
from the new blockchain.
2. Mapper Scripts: These scripts specify how to map the raw data from the
blockchain into the general format used by the framework.
3. Metric Scripts: These scripts allow users to define the custom metrics in a
detailed manner.
• Formatting and Examples: Each script must adhere to a specific format to ensure compatibility with the framework. For example, when integrating the Avalanche blockchain, users would define an extraction function in Python that outputs data in three lists: inputs, outputs, and transactions. The mapper scripts would then format these lists to align with the predefined database schema. A minimal sketch of such an extraction function appears below, after this list.
• Validation Process: The validation process involves checking the syntax and
logic of the new scripts. The Utils Component automatically tests these scripts to
ensure they execute without errors and that the data mappings correctly reflect the
general model expectations.
• Database Schema and Storage Process: Each new blockchain integrated into the framework requires a corresponding set of database tables that adhere to a generalized schema optimized for on-chain analysis. This schema includes fields common to most blockchains, such as transaction IDs, dates, and values, but can also accommodate unique features specific to each blockchain; extracted data is then stored according to this schema.
By following these structured steps, the GodSight framework ensures that extendibility is not only feasible but also practical and efficient, allowing users to continually adapt the system to meet the ever-changing landscape of blockchain technology. This approach provides a robust foundation for the ongoing expansion and customization of the framework's analytical capabilities.
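The following is a minimal sketch of an extraction function in the expected three-list format. The function and helper names here are illustrative assumptions, not the framework's exact contract; the helper stands in for whatever data source (node RPC or public API) the user chooses.

from datetime import date

def fetch_transactions_for_date(day: date) -> list[dict]:
    """Hypothetical data-source call (blockchain node RPC or public API).
    Returns raw transaction records for the given day."""
    return []  # placeholder; a real implementation would query the source

def extract(day: date):
    """Extraction function sketch: returns the three lists (inputs,
    outputs, transactions) that the GodSight framework expects."""
    inputs, outputs, transactions = [], [], []
    for tx in fetch_transactions_for_date(day):
        transactions.append(tx["transaction"])
        inputs.extend(tx["inputs"])
        outputs.extend(tx["outputs"])
    return inputs, outputs, transactions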
6.1.2.3 Customizability
The customizability within the GodSight framework empowers users to define custom metrics tailored to their specific analytical needs. This system is designed to accommodate the diversity of blockchain technologies by utilizing a common set of features found across most blockchains. Here's how the rule-based system operates:
• Metric Definition: Users are required to define their metrics using these standardized features. Metric functions are articulated through a JSON format, the specifics of which are detailed in the GodSight documentation. This documentation provides comprehensive guidelines and examples to assist users in formatting their metric definitions correctly.
• Validation: Each submitted metric definition undergoes two checks:

  1. Format Validation: Ensures that the metric definition adheres to the JSON format as specified in the framework documentation.

  2. Logic Validation: Assesses the logical structure and feasibility of the defined metric to guarantee that it can be computed accurately and efficiently.
• Database Storage and Metric Execution: After successful validation, the metric is stored in the database under a unique name provided by the user. The actual computation of these metrics is not automated within the framework; instead, users must manually trigger the Computation Component. This component then processes the data according to the defined metrics and generates the desired outputs.
This flexibility of the GodSight framework not only allows data analysis to be tailored to various blockchain types but also ensures that users can effectively measure and interpret blockchain activities through a user-defined metric system.
The system design & architecture of the GodSight framework are foundational to its capability to deliver precise and scalable on-chain analysis. Through its component-based structure, the framework ensures comprehensive data handling from extraction to visualization, all while maintaining flexibility in the integration of new blockchain technologies. The system's reliance on a customized approach for metric definition and the strategic use of databases for extendibility underscore its innovative design. As blockchain technology continues to grow in complexity and application, the adaptability and robustness of the GodSight framework equip users with the tools necessary for effective analysis and decision-making.
This methodology not only outlines the technical underpinnings of the system but
also reflects on the future potential of the GodSight framework to transform blockchain
analysis through continuous refinement and expansion.
6.2.1 BABD-13: DATASET OVERVIEW

The BABD-13 dataset provides comprehensive, labeled Bitcoin transaction data, enabling researchers to analyze the behavior of various Bitcoin addresses. Each transaction is tagged according to its specific nature, with the original dataset consisting of 13 distinct labels that describe the activities associated with Bitcoin addresses. These labels range from illicit categories like "Blackmail," "Darknet Market," and "Money Laundering" to non-illicit categories like "Cyber-Security Service," "Centralized Exchange," and "Individual Wallet."
The dataset groups labels into two primary categories: illicit and non-illicit. Illicit labels represent activities such as extortion, money laundering, and other forms of financial fraud, while non-illicit labels include various legitimate uses, such as financial services and cryptocurrency mining pools.
Illicit Types
• Blackmail: Involves various scams where victims are coerced or tricked into paying
a certain amount of cryptocurrency to specific addresses.
• Darknet Market: Markets operating on the darknet to trade illegal items using
cryptocurrencies.
• Government Criminal Blacklist: Addresses believed to be involved in criminal
activities.
• Ponzi Scheme: Fraudulent schemes that reward early investors using newer participants' investments.
Non-Illicit Types
• Cyber-Security Service: Providers offering services like VPNs and payment gateways, accepting only cryptocurrency as payment.
Given the nature of the dataset, a significant challenge lies in the imbalance between
illicit and non-illicit categories. The majority of the dataset is dominated by non-illicit
addresses (over 93% of the data), while illicit activities like government blacklists, Ponzi
schemes, and money laundering make up only a small fraction (∼ 0.003% of the data).
Due to the low representation of some illicit types, those types were excluded from
the final selection. The finalized selected labels are:
• Blackmail - Class 1
• Tumbler - Class 3
• Non-illicit - Class 0
6.2.2 TABULAR-BASED BITCOIN ANOMALY DATASET CREATION
6.2.2.1 Tabular-Based Dataset Approach

In this research, the goal is to create a simple, tabular-based Bitcoin anomaly dataset, distinct from the graph-based BABD-13 dataset. The tabular format simplifies the representation of Bitcoin transaction data, making it more accessible to on-chain analysis, where transactional data is typically maintained in tables. By using a statistical approach to feature extraction, patterns can be discerned directly from transactional attributes, facilitating anomaly detection across different classes.
6.2.2.2 Statistical Feature Extraction

To populate the dataset, statistical features are extracted for each address based on its transaction data. The extracted features include standard measures such as the minimum, maximum, mean, median, and percentile values of transaction amounts and timings.
6.2.2.3 Importance of Transaction Time and Value

Transaction time and value are critical components when analyzing Bitcoin transaction data. Temporal analysis reveals patterns such as frequent transactions within short intervals, possibly indicative of automated or suspicious activity. Transaction value analysis highlights deviations that could signal illicit behavior, like sudden, large transfers.
6.2.2.4 Differentiation Between Input and Output Transaction Features

Input-specific features capture measures such as the number of inputs and the average input value. Similarly, output-specific features cover metrics like the number of outputs and the average output value.
6.2.2.5 Cross-Featuring Between Input and Output Transactions

In addition to separate input and output features, cross-featuring involves combining both aspects to create new feature sets. For instance, features like the ratio of input to output value, or the time difference between consecutive transactions regardless of direction, can offer deeper insights into transactional behavior.
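A small pandas sketch of this cross-featuring step follows. The derived column names reuse the ratio-style feature names that appear in the feature-selection results later; the input frame and its column names are illustrative stand-ins for the per-address statistics described above.

import pandas as pd

# Illustrative per-address aggregates (stand-ins for real features).
df = pd.DataFrame({
    "input_spending_value_usd_mean": [120.0, 5400.0],
    "output_value_usd_mean": [110.0, 60.0],
    "input_spending_value_usd_maximum": [300.0, 9000.0],
    "output_value_usd_maximum": [250.0, 75.0],
})

# Cross-features: ratios that combine the input and output sides.
df["input_output_usd_mean_ratio"] = (
    df["input_spending_value_usd_mean"] / df["output_value_usd_mean"]
)
df["input_output_usd_max_ratio"] = (
    df["input_spending_value_usd_maximum"] / df["output_value_usd_maximum"]
)
print(df.filter(like="ratio"))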
6.2.2.6 Overview of Newly Created Features

This methodology yields a total of 76 features, each contributing unique insights into the behavior patterns of Bitcoin addresses. The features cover a broad range of statistics derived from the values and timing of input and output transactions.

This new dataset allows machine learning models to differentiate between different Bitcoin address classes (illicit vs. non-illicit) with statistical precision, providing a practical alternative to graph-based anomaly detection approaches.
6.2.3 MACHINE LEARNING WORKFLOW

The machine learning workflow involves several crucial steps to transform the raw Bitcoin transaction data into an optimized, balanced dataset ready for model training. This ensures that the subsequent classification models are well-prepared to accurately detect Bitcoin anomalies.
6.2.3.1 Data Cleaning

The first step is to clean the dataset by removing features with excessive null values. Columns where a significant proportion of data is missing can lead to bias or inaccuracies during analysis. By dropping these columns, the dataset becomes more reliable and representative of relevant features. Any remaining null values are either imputed or handled based on the specific needs of the analysis.
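A minimal pandas sketch of this cleaning step; the 50% null threshold and median imputation are illustrative choices, not values reported in this study, and the toy frame stands in for the engineered feature table.

import numpy as np
import pandas as pd

# Toy frame standing in for the engineered feature table.
df = pd.DataFrame({
    "output_value_usd_median": [10.0, np.nan, 25.0, 40.0],
    "input_value_skewness": [np.nan, np.nan, np.nan, 0.7],  # mostly null
})

# Drop columns with excessive nulls (here: more than half missing) ...
df = df.dropna(axis=1, thresh=int(0.5 * len(df)) + 1)
# ... then impute remaining gaps with each column's median.
df = df.fillna(df.median(numeric_only=True))
print(df)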
6.2.3.2 Feature Selection

• Recursive Feature Elimination (RFE): This method recursively eliminates features by training models with subsets of features and ranking them by importance. The goal is to retain only the most relevant features.
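A sketch of RFE with a Random Forest ranker via scikit-learn; the estimator choice, synthetic data, and the number of features to keep are assumptions for illustration.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the engineered address-feature dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

selector = RFE(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    n_features_to_select=10,  # retain only the most relevant features
)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # (500, 10)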
6.2.3.3 Handling Data Imbalance

Due to the overwhelming number of non-illicit transaction records in the dataset, handling data imbalance becomes critical to ensure the model isn't biased. Oversampling techniques are applied to the training data to boost the representation of the minority (illicit) classes.
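As a representative sketch, SMOTE from the imbalanced-learn package is one common oversampling technique; its use here is an assumption for illustration, and the synthetic data stands in for the imbalanced training split.

from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic imbalanced data standing in for the training split.
X_train, y_train = make_classification(
    n_samples=2000, weights=[0.97, 0.02, 0.01], n_classes=3,
    n_informative=5, random_state=42,
)

# Oversample minority classes on the training data only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_res))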
6.2.3.4 Data Standardization

To ensure that all features are on a comparable scale and that none disproportionately influence the models, standardization is crucial. Each feature is normalized to have a mean of zero and a standard deviation of one. This makes training more consistent and ensures that the model can interpret the influence of each feature accurately.
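A StandardScaler sketch of this step; fitting on the training split and reusing its statistics on the test split is standard practice, and the random data here is only a stand-in.

import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train, X_test = rng.normal(size=(100, 5)), rng.normal(size=(20, 5))

# Fit on training data only; reuse its mean/std for the test set so the
# test split never influences the training statistics.
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
print(X_train_std.mean(axis=0).round(2), X_train_std.std(axis=0).round(2))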
6.2.4 MODEL TRAINING AND ARCHITECTURE

In this phase of the methodology, different models are explored to identify the most suitable architecture for Bitcoin anomaly detection, given the nature of the tabular dataset and the specific features extracted.
6.2.4.1 Tree-Based Models and Tabular NN Models
Tree-based models and tabular neural networks (NNs) have proven highly effective in
classification tasks involving structured data.
• Tree-Based Models: Decision tree models, particularly Random Forests and Gradient Boosting Machines, are well-suited for handling tabular datasets. They can identify complex patterns and interactions between features, offering insight into the importance of each feature for classification.
6.2.4.2 Modified TabNet Model Architecture

In this study, a modified TabNet model is used to exploit its feature selection and classification capabilities. The TabNet architecture combines sequential attention-based feature selection with step-wise processing of the selected features. The attention mechanism ensures that only relevant features are processed, reducing unnecessary computation and enhancing the model's interpretability.
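The report does not name a specific TabNet implementation; as a sketch, the open-source pytorch-tabnet package exposes a classifier whose sequential attention steps (n_steps) and feature dimensions (n_d, n_a) mirror the architecture described above. The data and hyperparameters below are illustrative assumptions.

import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(512, 61)).astype(np.float32)  # 61 features
y_train = rng.integers(0, 4, size=512)                    # 4 classes

clf = TabNetClassifier(n_d=16, n_a=16, n_steps=3)  # illustrative sizes
clf.fit(X_train, y_train, max_epochs=10, patience=5, batch_size=128)
preds = clf.predict(X_train[:5])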
6.2.4.3 Introduction of 1D CNN Feature Extraction
• 1D CNN Feature Extractor: A one-dimensional convolutional block with shortcut-like connections processes the feature vector to surface local patterns before feature selection (a sketch follows below).

• Classifier: The classifier receives these important features and processes them to predict the class probabilities. The modular design allows efficient adjustments and training to improve classification accuracy.

• Additional Layers: Additional layers were integrated into the model to refine feature selection and classification.

These architectural adjustments improve the model's ability to detect anomalies across the highly imbalanced classes in the Bitcoin transaction dataset.
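A minimal PyTorch sketch of such a 1D CNN front end, assuming the PReLU activation and shortcut-like connection described in this report; the layer sizes and exact wiring are illustrative, not the trained model's configuration.

import torch
import torch.nn as nn

class CNN1DFeatureExtractor(nn.Module):
    """Sketch of a 1D CNN front end for tabular features."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1),
            nn.PReLU(),  # the report switches activations to PReLU
            nn.Conv1d(channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> add a channel axis for Conv1d
        out = self.conv(x.unsqueeze(1)).squeeze(1)
        return out + x  # shortcut-like connection

# The extracted features would then feed the TabNet-style classifier.
x = torch.randn(32, 61)  # 61 features, as in the created dataset
print(CNN1DFeatureExtractor()(x).shape)  # torch.Size([32, 61])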
6.2.5 EVALUATION METRICS

To assess the performance of the anomaly detection models, it's important to use metrics that effectively capture their ability to handle the extreme class imbalance.

6.2.5.1 Imbalanced Test Set Description

The evaluation process involves a highly imbalanced test set, where approximately 99% of the data consists of non-illicit addresses and only 1% comprises illicit addresses.
6.2.5.2 Metrics
• Accuracy: The proportion of correctly classified addresses:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
• F1 Score: The F1 score combines precision and recall into a single metric, providing a balanced measure for classification performance, especially in imbalanced datasets:

$$\text{F1 Score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

where $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \frac{TP}{TP + FN}$.
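For concreteness, both metrics can be computed with scikit-learn. The macro averaging shown here is one reasonable choice for the multi-class setting; the report does not restate its averaging mode, so treat it as an assumption.

from sklearn.metrics import accuracy_score, f1_score

# Toy labels for illustration; in practice these are the imbalanced
# test-set labels and the model's predictions.
y_true = [0, 0, 0, 1, 3, 0]
y_pred = [0, 0, 1, 1, 3, 0]

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))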
Chapter 7
RESULTS AND DISCUSSION
The GodSight framework exhibits notable efficiency and reliability in handling the extraction, computation, and visualization of on-chain data, offering a comprehensive platform for blockchain analysis.
GodSight simplifies data processing by mapping all extracted transaction data into a unified format using a set of general features. This streamlined structure allows the extracted data to be maintained in a single database table, while categorizing the information into three key types: inputs, outputs, and transactions. This categorization ensures rich, detailed data is available for on-chain analysis, enabling the framework to support a diverse array of metrics.
The accuracy of extracted data depends on the APIs used to obtain transaction information. By default, the framework relies on open-source APIs, which are convenient and sufficient for general purposes. However, for those seeking higher precision, users can extract transaction data directly from a blockchain node. This flexibility allows users to fine-tune their data sources according to their requirements, trading off between convenience and accuracy.
The computation accuracy within GodSight is directly linked to the quality of the extracted transaction data and the computation logic used. The framework provides a well-documented set of equations for each metric and adheres to the general features model to maintain consistent calculations. However, since the user is responsible for mapping features during the blockchain integration process, data accuracy can be influenced by how well this mapping is executed. Additionally, as an open-source tool, users have the liberty to refine or update the logic of the metrics to align with evolving requirements or to address issues.
7.1.1.3 Responsiveness of the Dashboard
meta.json: The meta.json file is the starting point for adding any new blockchain to the framework. It includes critical metadata such as the blockchain name, start date (useful for historical analysis), subchain names, and basic metrics available for each subchain. This ensures that whether integrating a multi-chain blockchain like Avalanche or a single-chain blockchain like Bitcoin, the framework accommodates the different structural and data nuances. For instance, Avalanche's subchains (X, P, and C) are clearly distinguished in the meta.json file, while Bitcoin, a single-chain blockchain, defaults to one subchain called 'default.'
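A hypothetical meta.json along these lines, written from Python; the key names and values are illustrative assumptions to be checked against the GodSight documentation, not its documented schema.

import json

# Hypothetical meta.json content; key names are illustrative assumptions.
meta = {
    "name": "Avalanche",
    "start_date": "2020-09-21",      # useful for historical analysis
    "subchains": ["x", "p", "c"],    # Bitcoin would use ["default"]
    "basic_metrics": {
        "x": ["daily_transaction_count", "total_fees"],
        "p": ["total_staked_amount"],
        "c": ["daily_transaction_count"],
    },
}

with open("meta.json", "w") as f:
    json.dump(meta, f, indent=2)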
Extract File: The Extract file contains the logic for fetching transaction data from
the source. This ensures users can define custom extraction functions that align with their
preferred data source. For Avalanche, open-source APIs are used to collect transaction
data. By standardizing the input (a specific date) and output (lists of inputs, outputs,
and transactions), the Extract file maintains consistency in data retrieval, making it
easier for users to integrate new blockchains into the framework.
Mapper File: Since different blockchains have varying feature names and structures, the Mapper file enables mapping from each blockchain's native features to a general model format. This ensures that all extracted data is consistent and usable within the framework. For instance, users can directly map or derive feature values from the extracted data. By providing this mapping flexibility, GodSight can handle the data complexity across various blockchains.
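A minimal sketch of a mapper function, assuming hypothetical raw field names from an API; the general-model field names on the right are also illustrative, not the framework's documented schema.

def map_transaction(raw: dict) -> dict:
    """Map a raw API transaction record to the general model format.
    Field names on both sides are illustrative assumptions."""
    return {
        "tx_id": raw["txHash"],
        "date": raw["timestamp"][:10],       # ISO date portion
        "value": float(raw.get("value", 0)),
        "fee": float(raw.get("txFee", 0)),
    }

# Example usage with a fabricated record:
raw = {"txHash": "0xabc", "timestamp": "2021-05-01T12:00:00Z",
       "value": "10.5", "txFee": "0.001"}
print(map_transaction(raw))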
Metric File: In addition to the meta.json, Extract, and Mapper files, the Metric file
allows users to define custom metrics tailored to their specific on-chain analysis needs.
These custom metrics leverage the general feature set established through the mapping
process, ensuring consistency and compatibility across various blockchains.
class CustomMetric:
    def __init__(self, blockchain, chain, name, display_name,
                 transaction_type, category, description):
        self.blockchain = blockchain
        self.chain = chain
        self.name = name
        self.display_name = display_name
        # Options: "transaction", "emitted_utxo", "consumed_utxo"
        self.transaction_type = transaction_type
        self.category = category
        self.description = description
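For illustration, a custom metric could be registered by instantiating this class; the constructor arguments below are hypothetical values, not metrics shipped with the framework.

metric = CustomMetric(
    blockchain="Bitcoin",
    chain="default",                 # single-chain blockchains use 'default'
    name="daily_total_fees",
    display_name="Daily Total Fees",
    transaction_type="transaction",
    category="Economic Indicators",
    description="Sum of all transaction fees paid on a given day.",
)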
The GodSight dashboard is designed with user experience in mind, providing a comprehensive and intuitive interface for analyzing on-chain data. The dashboard achieves this by offering a clean, responsive layout that allows users to quickly access and interpret metrics.
7.1.3.1 Usability and Presentation
The dashboard uses React.js to deliver dynamic, highly interactive visualizations. The
charts and graphs effectively present metrics across different blockchains and subchains,
allowing for an intuitive exploration of blockchain activity. Users can apply filters to
customize their views, making it easy to focus on specific metrics or periods.
The target users of the GodSight framework typically include blockchain researchers,
analysts, and developers who have a solid understanding of blockchain structures and
features. These users should also possess basic coding skills to efficiently work with the
framework. The design of the framework aligns with these requirements, ensuring that the
core functionality is both comprehensive and accessible to those who meet the knowledge
prerequisites.
Blockchain Knowledge Requirement: Given the specialized nature of on-chain analysis, the framework provides advanced tools that require users to have a thorough understanding of blockchain networks, transaction structures, and subchain features. This understanding helps users accurately map and interpret data, particularly when adding new blockchains to the framework or defining custom metrics.
Coding Skills: To work effectively with the framework, users need a basic grasp of
Python programming. This skill is necessary for creating complex metrics through the
code-level add-blockchain option and for modifying the computation logic as needed. For
simple metrics, however, the JSON-based API makes the process straightforward.
7.1.3.3 Ease of Creating Simple Metrics through the API
Creating simple metrics through the API is a powerful feature of the framework, contributing to user engagement and productivity. Users can define metrics by providing a JSON-based formula in one of two supported formats:

1. Format 1: This simpler format allows users to define metrics through basic aggregation functions like sum, as seen in the "total_fees" example. The aggregation functions are straightforward and provide an easy entry point for defining metrics. Users specify the blockchain, subchain, aggregation column, and the aggregation function itself.
formula = {
"aggregations": [
{
"name": "total_fees",
"column": "fee",
"function": "sum"
}
],
"final_answer": "total_fees"
}
print(formula)
2. Format 2: This format offers greater complexity for more nuanced metric calculations. Users can specify multiple aggregations, arithmetic operations, and conditions on the data, such as filtering by transaction type. For instance, in "Normalized Adjusted Send Fee Impact," aggregations include functions like sum and avg, while arithmetic operations like subtraction and division are applied to intermediate results.
formula = {
    "aggregations": [
        {"name": "sum_amount_send", "column": "amount", "function": "sum",
         "condition": {"column": "tx_type", "value": "send"}},
        {"name": "avg_fee_send", "column": "fee", "function": "avg",
         "condition": {"column": "tx_type", "value": "send"}},
        {"name": "count_receive", "column": "tx_type", "function": "count",
         "condition": {"column": "tx_type", "value": "receive"}},
        {"name": "min_fee", "column": "fee", "function": "min"}
    ],
    "arithmetic": [
        {
            "name": "adjusted_avg_fee",
            "operation": "subtraction",
            "operands": ["avg_fee_send", "min_fee"]
        },
        {
            "name": "final_metric",
            "operation": "division",
            "operands": ["adjusted_avg_fee", "count_receive"]
        }
    ],
    "final_answer": "final_metric"
}
These API-based metrics are suitable for relatively simple calculations and help users
quickly define new metrics without needing to modify the framework codebase. If more
sophisticated metrics are required, the user can still opt for the add-blockchain option
and define metrics at the code level.
In comparing GodSight with existing on-chain analysis frameworks like Nansen, Dune
Analytics, Glassnode, and IntoTheBlock, our solution stands out for its unique
features and improvements. These established platforms offer comprehensive on-chain
data, market analytics, and wallet tracking, but they are closed-source. This ’black box’
approach restricts transparency and openness, contrary to the core blockchain ethos.
Furthermore, their extendibility is limited, as integrating a new blockchain can lead to
significant delays, leaving users awaiting desired chains.
GodSight addresses these issues through an open-source, modular approach. Our Visualization Pane allows users to interact with and analyze blockchain data while accessing the underlying code that powers these insights. This transparency empowers developers to customize and enhance the framework to suit their needs. Its modular structure ensures that integrating new blockchain technologies is straightforward and user-friendly, enabling rapid adoption. Users can seamlessly add custom metrics and refine existing ones, creating a personalized analysis environment.
However, data extraction remains a challenging area. In GodSight, we faced challenges, particularly with the Glacier API, which has strict rate limits that prevented parallel data extraction. This challenge required careful scheduling and optimization within the Data Pane to align with API limitations while still delivering accurate and consistent results. To address such limitations in the future, we will continue improving data extraction workflows, integrating multiple strategies like instrumenting nodes directly or utilizing new APIs to enhance data completeness and accuracy.
The GodSight framework, while offering a solid foundation for extensibility and customization, has ample potential for growth and enhancement. Several opportunities lie ahead to bolster the framework's technical capabilities, expand its feature set, and provide broader support for blockchain analysis. One such opportunity is predictive analytics: forward-looking metrics that forecast network activity, economic trends, and potential market anomalies. This capability would be particularly valuable for researchers and developers looking to anticipate changes in blockchain ecosystems.
7.2.1 CREATED DATASET DETAILS

The finalized dataset comprises Bitcoin addresses labeled according to four specific classes, with a dominant proportion of non-illicit transactions:

• Class 0: Non-illicit

• Class 1: Blackmail

• Class 3: Tumbler
The dataset contains transaction data from 2018 to 2021, sourced from the BABD-13
dataset. Most addresses are involved in only a few transactions (typically 1 or 2), which
created challenges in computing statistical features like skewness, kurtosis, and variance.
These features require a minimum number of data points to produce meaningful values,
so those columns were removed due to high null values. The resulting dataset contains
61 features, along with a label column.
Figure 7.3: Null data description
To evaluate model performance, the dataset was split into training and test sets. The test dataset was designed to mimic real-world conditions, where the vast majority of transactions are non-illicit; it therefore comprises 99% non-illicit data and only 1% from the anomaly classes. The training dataset also reflects the imbalance present in the original data.

This division ensures the models are evaluated on a challenging dataset with a realistic imbalance between illicit and non-illicit transactions, enabling practical assessments of model effectiveness in detecting anomalies.
7.2.2 FEATURE SELECTION
Feature selection plays a vital role in identifying the most informative attributes for
anomaly detection. By reducing the feature space to the most relevant columns, models
can learn more efficiently and yield better predictions. In this research, tree-based models
like Random Forest were used to rank feature importance for both multi-class (four
classes) and binary (illicit vs. non-illicit) classification tasks.
For the multi-class dataset, the following features emerged as the most important:
1. input_spending_value_usd_75th_percentile
2. output_value_usd_median
3. output_value_usd_25th_percentile
4. output_value_usd_minimum
5. input_spending_value_usd_25th_percentile
6. output_value_usd_75th_percentile
7. output_value_usd_maximum
8. input_spending_value_usd_median
9. input_spending_value_usd_minimum
10. output_value_minimum
These features encompass key statistical measures such as percentiles and medians for
transaction values in both input and output transactions. Percentile-based values help
distinguish subtle differences in transaction behaviors across the different classes.
When grouping all illicit classes together and comparing them against non-illicit transactions, these ten features stood out:
1. input_spending_value_usd_75th_percentile
2. output_value_usd_median
3. output_value_usd_25th_percentile
4. input_spending_value_usd_25th_percentile
5. input_spending_value_usd_median
6. output_value_usd_75th_percentile
7. input_output_usd_max_ratio
8. input_output_usd_min_ratio
9. input_output_usd_mean_ratio
10. output_value_usd_maximum
The ratios between input and output values offer valuable insights into patterns that
distinguish illicit behavior, while the percentile and median-based features provide crucial
statistical measurements.
In both classification approaches, input and output transaction values, especially at
various percentiles, play a significant role in identifying illicit behaviors. The presence
of input-output ratios among the top features in the binary dataset highlights their ef-
fectiveness in distinguishing between illicit and non-illicit addresses. Overall, focusing
on these top features allows machine learning models to detect patterns indicative of
different types of behavior, simplifying the complex task of identifying Bitcoin anomalies.
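A minimal sketch of this ranking step follows, assuming the training split from the
earlier sketch; the Random Forest hyperparameters are illustrative rather than the
project's actual configuration:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    X = train_df.drop(columns=["label"])
    y = train_df["label"]  # four-class labels; use (y != 0) for the binary task

    rf = RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1)
    rf.fit(X, y)

    # Rank features by impurity-based importance and inspect the top ten.
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(10))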
Principal Component Analysis (PCA) was applied to the selected features for several
reasons:
• Noise Reduction: By projecting data onto fewer dimensions, PCA filters out
the noise inherent in high-dimensional data.
• Efficiency: Reducing the number of features speeds up model training and
simplifies the model’s complexity.
• Multicollinearity Mitigation: PCA orthogonally transforms the features, re-
ducing multicollinearity between them.
Different numbers of PCA components were tested (10, 15, 20, and 30). After
experimentation, using 30 components proved to be the most effective, as it retained
enough variance to capture the essential structure of the dataset while reducing
dimensionality significantly.
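A minimal sketch of the standardize-then-project step, assuming X holds the selected
feature matrix from the previous sketch; the component count of 30 follows the
experiments above:

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # PCA expects zero-mean, unit-variance inputs, so standardize first.
    X_scaled = StandardScaler().fit_transform(X)

    pca = PCA(n_components=30)
    X_pca = pca.fit_transform(X_scaled)
    print(f"Variance retained by 30 components: {pca.explained_variance_ratio_.sum():.3f}")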
The training phase encompassed a variety of models, including tree-based methods like
Random Forest and XGBoost, alongside tabular neural networks like TabNet, GATE,
NODE, and a modified version of TabNet. Each model was evaluated on the same
imbalanced test dataset to ensure consistency and comparability.
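As an illustration of this consistent-evaluation setup, the sketch below fits two of the
candidate models on the same training split and scores them on the same imbalanced test
set; X_train, y_train, X_test, and y_test are assumed to be the processed splits from
the steps above, and the hyperparameters are placeholders:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from xgboost import XGBClassifier

    models = {
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
        "xgboost": XGBClassifier(n_estimators=300, eval_metric="mlogloss"),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)    # identical training data for every model
        preds = model.predict(X_test)  # identical imbalanced test set
        print(name, f1_score(y_test, preds, average="macro"))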
The results reveal interesting patterns and trends among the models:
• Overall Performance: The modified TabNet model achieved the highest total
accuracy (0.712) of all evaluated models. This improvement underscores the
effectiveness of the modifications, particularly the addition of a 1D CNN feature-
extraction layer and the switch to the PReLU activation function. These enhance-
ments helped the model distinguish the nuanced patterns of non-illicit and illicit
addresses, leading to improved classification performance.
• Comparing TabNet Models: The original TabNet model exhibited slightly lower
performance than the modified version. By incorporating the 1D CNN feature
extraction layer and the more adaptive PReLU activation function, the modified
TabNet model benefited from a more efficient feature selection process and greater
discriminatory power. This allowed the model to reduce the gap between illicit and
non-illicit classifications, leading to better overall accuracy.
The improved performance of the modified TabNet model demonstrates the value
of tailored model architectures for detecting Bitcoin anomalies. The model effectively
captured important patterns across all classes, particularly in the non-illicit class, which
made up the vast majority of the test set. This improvement supports the hypothesis
that enhanced feature extraction and activation functions can significantly contribute to
better classification outcomes in tabular neural networks.
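The model code itself is not reproduced in this report, but the following PyTorch sketch
illustrates the two modifications in isolation: a 1D CNN front-end over the flat feature
vector and PReLU activations. The TabNet encoder is replaced here by a plain MLP head
purely for illustration, and the 30-dimensional input (the PCA components) and layer
sizes are assumptions:

    import torch
    import torch.nn as nn

    class CNNFeatureExtractor(nn.Module):
        """Treats the tabular feature vector as a 1D signal and learns
        local interactions between adjacent components."""
        def __init__(self, n_features: int, channels: int = 16):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(1, channels, kernel_size=3, padding=1),
                nn.PReLU(),  # adaptive activation used in the modified model
                nn.Conv1d(channels, 1, kernel_size=3, padding=1),
                nn.PReLU(),
            )

        def forward(self, x):                # x: (batch, n_features)
            z = self.conv(x.unsqueeze(1))    # add a channel dimension
            return z.squeeze(1)              # back to (batch, n_features)

    class ModifiedTabularNet(nn.Module):
        def __init__(self, n_features: int, n_classes: int = 4):
            super().__init__()
            self.extractor = CNNFeatureExtractor(n_features)
            self.head = nn.Sequential(       # stand-in for the TabNet encoder
                nn.Linear(n_features, 64), nn.PReLU(), nn.Linear(64, n_classes)
            )

        def forward(self, x):
            return self.head(self.extractor(x))

    model = ModifiedTabularNet(n_features=30)  # 30 PCA components as input
    logits = model(torch.randn(8, 30))         # (8, 4) class logits

In the actual architecture, the extracted features would feed TabNet's attentive
transformer blocks rather than this placeholder head.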
Chapter 8
CONCLUSION
This research focused on two key developments: the GodSight on-chain analysis frame-
work and a Bitcoin anomaly detection model, both designed to advance blockchain data
analysis through comprehensive insights and anomaly detection. GodSight, an open-
source and modular framework, provides efficient data extraction and analysis with ac-
curate, real-time metrics through a responsive dashboard. Its user-defined metric cus-
tomization and multi-blockchain support give it a high degree of adaptability. Compared to
existing platforms like Nansen and Glassnode, GodSight stands out for its versatility,
responsiveness, and efficient data extraction.
The Bitcoin anomaly detection model leveraged a newly created tabular-based dataset
built upon statistical features derived from transactional data. This dataset, which simpli-
fies the representation of Bitcoin address behaviors compared to the graph-based BABD-
13 dataset, includes 76 engineered features such as medians, quantiles, and ratios, of
which 61 remained after sparse columns were removed. By differentiating
transactional patterns between input and output activities, it offers an alternative ap-
proach to uncovering behavioral patterns. The modified TabNet model, incorporating a
1D CNN layer and PReLU activation, achieved high accuracy in detecting illicit activi-
ties, highlighting the potential of this tabular dataset in distinguishing between legitimate
and suspicious Bitcoin addresses.
Moving forward, the optimization of GodSight’s data extraction processes and schema
structure will further improve its analysis capabilities. The integration of predictive anal-
ysis metrics via machine learning models will enhance GodSight’s insights. Expanding
cross-chain metrics, refining specialized neural network architectures, and improving pre-
dictive analysis will ensure the framework remains a robust tool for comprehensive on-
chain analysis. These advancements will position GodSight as a transformative solution
that not only identifies anomalies with precision but also delivers actionable intelligence
for blockchain data analysis.
REFERENCES
[1] K. Wüst and A. Gervais, “Do you need a blockchain?,” tech. rep., Department of
Computer Science, ETH Zurich; Department of Computing, Imperial College Lon-
don, 2017. [Online]. Available: https://eprint.iacr.org/2017/375.pdf.
[2] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” tech. rep., 2008.
[Online]. Available: https://bitcoin.org/bitcoin.pdf.
[4] J. Strebko and A. Romanovs, “The advantages and disadvantages of blockchain tech-
nology,” 2018.
[5] Z. Zheng, S. Xie, H.-N. Dai, X. Chen, and H. Wang, “Blockchain challenges and
opportunities: A survey,” International Journal of Web and Grid Services, vol. 14,
p. 352, 2018.
[6] A.-F. Ţicău Suditu, “Blockchain technology and electronic wills,” tech. rep. [Online].
Available: https://www.academia.edu/77104514/Blockchain_Technology_and_
Electronic_Wills.
[7] V. M. Araujo, J. A. Vázquez, and M. P. Cota, “A framework for the evaluation of saas
impact,” International Journal in Foundations of Computer Science & Technology
(IJFCST), vol. 4, May 2014. [Online]. Available: https://arxiv.org/pdf/1406.
2822.pdf.
[8] Özdemir and Yılmaz, “The effect of using digital storytelling in geography lessons
on students’ academic achievement and their attitudes towards geography lesson,”
Journal of Education and Training Studies, vol. 6, no. 4, pp. 15–27, 2018.
[9] M. Godse and S. Mulik, “An approach for selecting software-as-a-service (saas) prod-
uct,” in CLOUD 2009 - 2009 IEEE International Conference on Cloud Computing,
pp. 155–158, 2009.
[10] . . . “Management, Design and Development of e-Courses: Standards of Excellence and
Creativity”, pp. 22–26, IEEE, 2013.
[11] F. Aslam, “The benefits and challenges of customization within saas cloud solutions,”
American Journal of Data, Information and Knowledge Management, vol. 4, no. 1,
pp. 14–22, 2023.
[13] Y. Xiang et al., “Babd: A bitcoin address behavior dataset for address behavior pattern
analysis,” arXiv preprint arXiv:2204.05746, 2022.
[14] T. Chen, Z. Li, Y. Zhang, X. Zhang, et al., “Dataether: Data exploration framework
for ethereum,” in 2019 IEEE 39th International Conference on Distributed Computing
Systems (ICDCS), 2019.
[15] T. Chen, Z. Li, Y. Zhu, X. Zhang, et al., “Understanding ethereum via graph anal-
ysis,” ACM Transactions on Internet Technology, vol. 20, no. 2, 2020.
[16] J. Huang, W. Huang, and J. Ni, “Predicting bitcoin returns using high-dimensional
technical indicators,” The Journal of Finance and Data Science, vol. 5, no. 3, 2018.
[19] T. Hu, X. Liu, T. Chen, X. Zhang, X. Huang, W. Niu, J. Lu, K. Zhou, and Y. Liu,
“Transaction-based classification and detection approach for ethereum smart con-
tract,” Information Processing & Management, vol. 58, 2021.
[20] “An on-chain analysis-based approach to predict ethereum prices,” IEEE Access,
December 2021.
[21] “Predicting cryptocurrencies market phases through on-chain data long-term fore-
casting,” in International Conference on Blockchain and Cryptocurrencies, (Dubai),
July 2023.
[22] “Blockchain-based cryptocurrency price prediction with chaos theory, onchain anal-
ysis, sentiment analysis and fundamental-technical analysis,” Chaos Theory and Ap-
plications, November 2022.
[23] Nansen, “On-chain insights for crypto investors & teams,” tech. rep., Nansen
| On-chain Insights for Crypto Investors & Teams, 2023. [Online]. Available:
https://www.nansen.ai/. [Accessed: 08-Oct-2023].
[24] Dune Analytics, “Blockchain ecosystem analytics by and for the community. Explore and
share data from ethereum, bitcoin, polygon, bnb chain, solana, arbitrum, avalanche,
optimism, fantom and gnosis chain for free,” tech. rep., Dune Analytics, 2023. [On-
line]. Available: https://dune.com/browse/dashboards. [Accessed: 08-Oct-2023].
[25] Glassnode, “Glassnode studio - on-chain market intelligence,” tech. rep., Glassnode
Studio, 2023. [Online]. Available: https://studio.glassnode.com/home. [Accessed:
08-Oct-2023].
[26] IntoTheBlock, “Powering the intelligence layer of the crypto markets,” tech. rep.,
2023. [Online]. Available: https://www.intotheblock.com/. [Accessed: 08-Oct-2023].
[27] A. Aung, H. Aung, and N. Khaing, “Bitcoin transaction analysis using autoencoder
for anomaly detection,” in Proceedings of the 4th International Conference on Big
Data and Internet of Things (BDIoT 2021), pp. 109–113, 2021.
[28] H. Le, C. Strufe, and B. Meinel, “Ethereum smart contract detection using cluster-
ing,” in Lecture Notes in Computer Science, vol. 11401, pp. 32–42, 2019.
[29] C. G. Akcora, Y. Li, and M. Kantarcioglu, “Bitcoin heist: Topological data analysis
for ransomware detection on the bitcoin blockchain,” in Proceedings of the 2019 IEEE
International Conference on Data Mining (ICDM), pp. 1–8, 2019.
[31] R. M. et al., “Illicit activity detection in bitcoin transactions using timeseries analy-
sis,” International Journal of Advanced Computer Science and Applications, vol. 14,
no. 3, pp. 13–18, 2023.
[34] D. P. J. Bahnsen and B. Edwards, “Bitcoin fraud detection using random forest
models,” Expert Systems with Applications, vol. 137, pp. 156–163, 2018.
[35] H. T. Y. Gai and J. Sun, “Anomaly detection using xgboost model,” IEEE Transac-
tions on Blockchain, vol. 22, no. 6, pp. 1–8, 2021.
[37] B. Guo, “1d-cnn: Best single model in kaggle competition,” 2021. [Online]. Available:
https://github.com/baosenguo/Kaggle-MoA-2nd-Place-Solution.
[38] S. O. Arik and T. Pfister, “Tabnet: Attentive interpretable tabular learning,” arXiv
preprint arXiv:1908.07442, 2019.
[39] M. Joseph and H. Raj, “Gate: Gated additive tree ensemble for tabular classification
and regression,” arXiv preprint arXiv:2207.08548, 2022.
[40] S. Popov, S. Morozov, and A. Babenko, “Neural oblivious decision ensembles for deep learn-
ing on tabular data,” arXiv preprint arXiv:1909.06312, 2019.
[42] Z. Huang and F. Tanaka, “A scalable reinforcement learning-based system using on-
chain data for cryptocurrency portfolio management,” ArXiv, vol. abs/2307.01599,
2023.
[43] Z. Zheng, S. Xie, H. Dai, X. Chen, and H. Wang, “An overview of blockchain technol-
ogy: Architecture, consensus, and future trends,” in IEEE 6th International Congress
on Big Data, pp. 557–564, 2017.
[45] L. Cong and Z. He, “Blockchain disruption and smart contracts,” Review of Financial
Studies, vol. 32, no. 5, pp. 1754–1797, 2019.