Building A Malware Analysis Platform At Home
Jared Stroud (@DLL_Cool_J)

April 2023

2023

Approved for Public Release; Distribution Unlimited. Public Release Case Number 23-1151
Disclaimer

The author’s affiliation with The MITRE Corporation is provided for identification purposes
only, and is not intended to convey or imply MITRE’s concurrence with or support for the
positions, opinions or viewpoints expressed by the author. 2023 The MITRE Corporation.
ALL RIGHTS RESERVED
www.archcloudlabs.com

Agenda & Goal
This is a DIY Malware Analysis Platform course! You will be building a process to download,
extract, and analyze metadata from malware! This two hour course is going to try to expose
you to a bunch of different things and leave you with the ability to dive into deeper subjects
independently.

1 Identifying Our Goals
2 Malware & where to find it
3 Automating Collection
4 Data Extraction
5 Data Analysis
6 Recap & Survey

$> whoami
● Currently:
○ Lead Security Engineer at The MITRE Corporation
○ Adjunct Lecturer at Rochester Institute of Technology
● Previously:
○ Malware Analyst/Security Researcher
● Presented at:
○ ATT&CKCON
○ BSides Roc
○ DEF CON 29 Packet Hacking Village
○ DFRWS-EU
○ ShmooCon Firetalk
The Lab Environment!

● The entire lab environment has been instrumented via Vagrant!


○ Get it here: https://developer.hashicorp.com/vagrant/downloads

● This is to ensure we all have the same environment, tools, and tool versions
when poking at the labs.
○ If you want to use your own distro/VM, that’s fine too.

● We are working with live malware; you are responsible for your machine.
○ Take snapshots/backups/etc…

Configuring Networks for Analysis
● Dynamic analysis can influence static analysis and vice versa.
○ Capturing traffic in a safe manner is critical.
○ Nothing ruins a day faster than accidentally executing malware in an insecure way.
○ Have a known good state w/ snapshots and do a dry run prior to malware execution.

virt-manager, VirtualBox, VMware Desktop

Configuring Networks for Analysis - “Smoke Test”

What do YOU want to achieve?

● No one has the same 24 hours in a day.
● How you invest them is critical to what you receive from said investment.

Malware Analysis/Homelab Malware Platforms help build a variety of skills.
● For example:
○ Analyzing the Malware? - Reverse Engineering/DFIR skills
○ Building software to collect and analyze malware? - Software Engineering
○ Automating the downloads and passing data to different microservices? -
DevOps
○ Blogging - Writing/Presentation/Soft Skills

What are we Building?

Malware and Where to Find It

● Well curated samples
● Historic APT Campaigns
● Source Code
● Papers

● Daily dumps of IoCs
● Search for hashes
● YARA Rule Hunting

● IP telemetry
● Domain telemetry
Note: Not an exhaustive list

VX-Underground - Case Study 3CX VoIP

● Alleged DPRK threat actors
● Samples were available via VX-Underground in less than 24 hours

Note: Not an exhaustive list

Malware and Where to Find It

● Malware Dumps (theZoo)
● IoT Botnet Source Code
● Honeypots
● Web Server logs?
● …

Note: Not an exhaustive list

Malware and Where to Find It - Notable Mentions

● Sophos/ReversingLabs “SoReL-20M”: 20 million samples
● Dataset for ML purposes, but also great for RE

Provisioning Our Malware Machine!

● Vagrant is a utility that allows you to quickly spin up Virtual Machines.
○ These Virtual Machines can be run via VirtualBox, libvirt, or VMware*
■ VMware requires a paid plugin

● Provisioning these machines can be achieved via shell commands, Chef, or Ansible.
○ Our demo will have us completing the provisioning with Ansible

Daily Malware Dumps!

● Every day there’s a new article on a new malware variant/technique/etc… being exploited in the
wild.
● Some of these services provide daily dumps of ongoing campaigns, but not all let you download.
○ Malshare/Hybrid-Analysis/Abuse.ch allow you to download samples for free!
■ Today, we’ll focus on Malshare
● Interacting with these services requires an account to get an API key to then download the daily
dumps.
● How do we identify what these binaries are?
○ YARA!
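
The collection step described above can be sketched in Python. This is a minimal sketch, not the course's lab script: the endpoint and the `getlist` action are assumed from Malshare's public API documentation and should be verified against the current docs, and `build_url`, `extract_hashes` and `fetch_daily_hashes` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

MALSHARE_API = "https://malshare.com/api.php"  # assumed endpoint; verify in the Malshare docs


def build_url(api_key: str, action: str, **params: str) -> str:
    """Build a Malshare API URL for the given action (for example getlist)."""
    query = {"api_key": api_key, "action": action, **params}
    return f"{MALSHARE_API}?{urllib.parse.urlencode(query)}"


def extract_hashes(listing: list, key: str = "sha256") -> list:
    """Pull one hash type out of the parsed daily listing, skipping malformed entries."""
    return [entry[key] for entry in listing if isinstance(entry, dict) and key in entry]


def fetch_daily_hashes(api_key: str) -> list:
    """Download the most recent hash listing (network call; never run on import)."""
    with urllib.request.urlopen(build_url(api_key, "getlist"), timeout=30) as resp:
        return extract_hashes(json.load(resp))
```

Keeping the URL building and the parsing separate from the network call makes both easy to test without an API key.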

YARA Rules
● Why YARA?
○ Industry standard rule format for identifying
malicious families of binaries.
○ Open Source (BSD License)
○ Significant number of examples on the internet to
use for scanning

● Combining daily dumps of malware w/ YARA rules
can help us identify what a given sample is or what
features the malware contains.

Example from https://github.com/VirusTotal/yara

YARA Rules - Python Documentation

Example Python yara Detection
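
The slide's screenshot did not survive extraction, so here is a minimal sketch of what detection with the `yara-python` package can look like. The rule `ContainsUPXMarker` and the helper `scan_bytes` are made up for illustration and are not the slide's original example; `yara.compile(source=...)` and `rules.match(data=...)` are the package's documented calls.

```python
# Hypothetical rule for illustration: flags any buffer containing the string "UPX!".
RULE_SOURCE = r'''
rule ContainsUPXMarker
{
    strings:
        $upx = "UPX!"
    condition:
        $upx
}
'''


def scan_bytes(data: bytes) -> list:
    """Return the names of the rules in RULE_SOURCE that match data."""
    import yara  # third-party: pip install yara-python

    rules = yara.compile(source=RULE_SOURCE)
    return [match.rule for match in rules.match(data=data)]
```

For a directory of daily dumps, the same compiled rules can be reused across files with `rules.match(filepath=...)`.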

Some Tools of the Trade

● IDA Pro (hex-rays.com)
● Ghidra (ghidra-sre.org)
● Cutter (cutter.re)
● Radare2 (radare2.org)

The labs use radare2, Why?

● Free
● Cross Platform
● Lets you rapidly explore binaries all via the console.
● Allows you to explore and understand file formats.
● Excellent for automation
● GUI components (cutter) integrate some of the best parts
of both IDA & Ghidra
○ Ghidra’s Decompiler
○ IDA’s Graph

● Cons: steep learning curve (think vim, but for RE)

What is a PE/ELF?

● Executable files contain a structure that the underlying Operating System loader understands how
to parse in order to allocate memory, copy sections into memory, load additional files and ultimately
begin execution of the process.

● A file is broken down into headers, sections and segments (program headers).
○ Think of headers like the table of contents in a book.
■ It tells you where to look for a given topic.
○ Sections: used by the linker to build an executable
○ Segments contain data for runtime.
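
To make the "table of contents" idea concrete, the sketch below decodes a few fields from the start of an ELF file using only the standard library. The field offsets follow the layout in `man 5 elf`; the `MACHINES` table is a small illustrative subset and `parse_elf_header` is a hypothetical helper name.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# Small illustrative subset of e_machine values.
MACHINES = {0x03: "x86", 0x28: "ARM", 0x3E: "x86-64", 0xB7: "AArch64"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the class, type and machine fields from the start of an ELF file."""
    if len(data) < 20 or data[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(data[4])        # EI_CLASS
    endian = {1: "<", 2: ">"}.get(data[5])    # EI_DATA: little or big endian
    if bits is None or endian is None:
        raise ValueError("unexpected EI_CLASS/EI_DATA value")
    # e_type and e_machine are 2-byte fields directly after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 0x10)
    return {"bits": bits, "type": e_type,
            "machine": MACHINES.get(e_machine, hex(e_machine))}
```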

Preventing Accidental Execution

● Ensure binary is NOT executable on download.


○ During file creation we can make binaries read only.

● How does the OS execute a binary?


● TL;DR: The OS loader parses the file format
(PE/ELF) to then allocate memory regions to
copy the given regions of a binary into memory
and start execution.
○ If the header is broken this process stops.
○ A fake header can prevent accidental
execution of binaries in our environment.
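
One way to make a download non-executable at creation time is to pass the permission bits to `os.open` directly, so the file never exists with an execute bit set. This is a sketch for POSIX systems; `save_sample` and `is_executable` are hypothetical helper names.

```python
import os
import stat


def save_sample(path: str, data: bytes) -> None:
    """Write a sample so that it is owner-read-only from the moment it exists."""
    # O_EXCL refuses to overwrite an existing file; mode 0o400 sets no execute bits.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)


def is_executable(path: str) -> bool:
    """True if any execute bit (user, group or other) is set on path."""
    return bool(os.stat(path).st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
```

Permissions alone do not stop an interpreter or a later `chmod`, which is why the slides pair this with breaking the header.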

Source: https://wiki.osdev.org/ELF

Checking and Breaking Headers for ELFs

Machine: 0x3e (62) → Machine: 0x00

● Note: this will change the underlying hash
● man 5 elf to see expected header values
● Automating with Radare2 (open the file in write mode: r2 -w):
● aaa: analyze
● s 0x12: seek to offset 0x12 in the binary (e_machine, the Machine ID in the ELF header)
● w0 2: write 2 zero bytes at this location
● q: quit
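
The same header break can be sketched without radare2. In the layout from `man 5 elf`, `e_machine` is the 2-byte field at offset 0x12 (after the 16-byte `e_ident` and the 2-byte `e_type`). The helper below, a hypothetical `defang_elf`, zeroes it in place and returns the SHA-256 before and after, which shows the hash change noted above.

```python
import hashlib

E_MACHINE_OFFSET = 0x12  # 16-byte e_ident + 2-byte e_type precede e_machine


def defang_elf(path: str) -> tuple:
    """Zero e_machine in place; return the SHA-256 before and after the change."""
    with open(path, "r+b") as handle:
        original = handle.read()
        if original[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        handle.seek(E_MACHINE_OFFSET)
        handle.write(b"\x00\x00")
    patched = original[:E_MACHINE_OFFSET] + b"\x00\x00" + original[E_MACHINE_OFFSET + 2:]
    return hashlib.sha256(original).hexdigest(), hashlib.sha256(patched).hexdigest()
```

Recording the original hash before patching keeps the sample searchable on public services afterwards.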

Simple but effective! Case study: SoRel Data Set

● Case example: the Sophos/ReversingLabs SoReL data set.
● 20 million malware samples (Windows) labeled for
machine learning/data analysis purposes.
● FileHeader.Machine and OptionalHeader.Subsystem
are modified to prevent accidental execution.
● Modifying the file itself changes the file hash!
○ SoRel has the file name of a binary as the original
file hash.
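
A comparable approach for PE files can be sketched as follows. This is not SoReL's actual tooling: `defang_pe` is a hypothetical helper that zeroes the two fields named above and returns the original SHA-256 so it can be used as the file name. The offsets come from the PE layout: `e_lfanew` at 0x3C, `Machine` 4 bytes after the PE signature, and `Subsystem` 68 bytes into the optional header, which starts 24 bytes after the signature.

```python
import hashlib
import struct


def defang_pe(data: bytes) -> tuple:
    """Return (original SHA-256, copy of data with Machine and Subsystem zeroed)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)      # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    patched = bytearray(data)
    struct.pack_into("<H", patched, pe_offset + 4, 0)        # FileHeader.Machine
    struct.pack_into("<H", patched, pe_offset + 24 + 68, 0)  # OptionalHeader.Subsystem
    return hashlib.sha256(data).hexdigest(), bytes(patched)
```

Writing the patched bytes to a file named after the returned hash mirrors the naming convention described above.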

Lab - Automating Collection from Malshare

Beyond The Course - Scaling Malware Collection
● Single points of failure are Bad!
○ If the Python script fails, well there goes the data…
○ If we restart the Python script as is, we’ll re-ingest all of the previously seen data as well.
○ How do we plan for this?
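
One possible answer to the re-ingestion problem is to persist the hashes already processed, so a restarted script can skip them. The sketch below uses SQLite from the standard library; the file name `seen.db`, the table layout and the helper names are assumptions for illustration.

```python
import sqlite3


def open_state(path: str = "seen.db") -> sqlite3.Connection:
    """Open (or create) the table of hashes that were already ingested."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS seen (sha256 TEXT PRIMARY KEY)")
    return conn


def mark_new(conn: sqlite3.Connection, sha256: str) -> bool:
    """Record a hash; return True only the first time it is seen."""
    cursor = conn.execute("INSERT OR IGNORE INTO seen (sha256) VALUES (?)", (sha256,))
    conn.commit()
    return cursor.rowcount == 1
```

A collector would call `mark_new` for each hash in the daily listing and download only when it returns True.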

Break!
Grouping Related files Together

● NOTE: ELF Binary


● PDB: Debug string containing path to project on Host machine.
○ “The data is stored in a separate file from the executable to help limit the size of the executable,
saving disk storage space and reducing the time it takes to load the data. This methodology also
allows the executable to be distributed without disclosing this significant information which could
make the program easier to reverse engineer.” - MSDN Documentation

● IMPHash (Import Hashing): a hash of the libraries and functions a binary imports, in import order

● Fuzzy Hashing (ssdeep): hashes the file piecewise, N bytes at a time, so that similar files produce similar hashes.
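
To show the idea behind import hashing, here is a simplified sketch that takes an already-extracted import list. It follows the published imphash convention (lower-case `library.function` pairs, library extension dropped, joined with commas, then MD5) but skips ordinal resolution, so its output may differ from real tooling such as `pefile`'s `get_imphash()`. `simple_imphash` is a hypothetical name.

```python
import hashlib


def simple_imphash(imports: list) -> str:
    """MD5 over 'library.function' pairs, lower-cased and kept in import order."""
    parts = []
    for library, function in imports:
        name = library.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if name.endswith(ext):
                name = name[: -len(ext)]
                break
        parts.append(f"{name}.{function.lower()}")
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```

Because order is preserved, two binaries built from the same project tend to share a hash, while reordering the imports changes it.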

The Art of Recreating Analysis

● Why recreate analysis of other blogs?


○ Better understanding of the DFIR/RE process.
■ Opportunity to automate a process
○ Potentially identify things another analyst missed.
○ Start identifying common patterns/mechanisms in Malware
■ Ex: LoadLibrary/RegQueryA/etc…
● Find interesting patterns in your own samples!

Recreating Analysis Case Study: Mandiant Blog

● Definitive Dossier of Devilish Debug Details – Part One: PDB Paths and Malware
○ Steve Miller, Mandiant Blog 2019
● TL;DR linking together threat actors based on shared PDB file paths.
● Radare2 provides an easy way to gather this data from the command line

Shared PDB Paths Among APT-1 Threat Actors
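
As a rough stand-in for the radare2 approach, PDB paths can also be pulled out of a binary with a string search. The sketch below is a heuristic, not a parse of the PE debug directory: it looks for runs of printable ASCII ending in `.pdb`. `find_pdb_paths` is a hypothetical helper name.

```python
import re

# Printable ASCII run ending in ".pdb"; a heuristic, not a parse of the debug directory.
PDB_PATTERN = re.compile(rb"[\x20-\x7e]{1,260}\.pdb", re.IGNORECASE)


def find_pdb_paths(data: bytes) -> list:
    """Return candidate PDB paths embedded in a binary, without duplicates."""
    found = []
    for match in PDB_PATTERN.finditer(data):
        path = match.group().decode("ascii")
        if path not in found:
            found.append(path)
    return found
```

Grouping samples by the directory part of these paths is one way to reproduce the kind of clustering shown in the Mandiant post.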

PDBs Can Also Be Very Telling

PDBs from VX Underground Malware Samples

R2ELK - Radare2-to-ELK

● https://www.github.com/archcloudlabs/r2elk
● Automatically extract metadata from Executables and import them into
Elasticsearch
● Useful for bulk analysis to then upload into Elasticsearch

IMPHash & PDB Analysis of APT1 Samples

Analyzing our Data in Kibana/Elasticsearch

● First, execute the following from the home folder: $> docker compose up -d
● Kibana: Front end to Elasticsearch.
● Elasticsearch: popular Open Source database for logs.
○ Observability/Security/Analytics/etc…
● Our provisioning scripts already have this setup and running.
● Browse to your Vagrant IP on port 5601 in a web browser to access Kibana

Analyzing our Data in Kibana/Elasticsearch

Analyzing our Data in Kibana/Elasticsearch

https://github.com/Yara-Rules/rules/blob/85cb1fad9a58efedc71f696eb334e0226a166ba0/malware/APT_APT1.yar#L950

Lab - Data Extraction & Labeling

Lab - Data Extraction & Labeling

Data Set: https://tinyurl.com/yc6j7cmr

Beyond The Class - Automating Ingestion

● We’ve been creating JSON/CSV files and uploading them manually via Kibana.
● What if we just ingested them directly into Elastic with our tool?
● Investigate how to leverage the Python API for Elastic to ingest data directly from our
parsing utility into Elasticsearch.
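
A starting point for that investigation, assuming the official `elasticsearch` Python client and a JSON-lines export (one document per line). The index name `malware-metadata`, the `sha256` field and the URL are placeholders, and `to_actions` and `ingest` are hypothetical helper names; `helpers.bulk` is the client's bulk-indexing helper.

```python
import json
from typing import Iterable, Iterator


def to_actions(lines: Iterable[str], index: str = "malware-metadata") -> Iterator[dict]:
    """Turn JSON lines (one document per line) into bulk-index actions."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        action = {"_index": index, "_source": doc}
        if "sha256" in doc:  # reuse the hash as document ID so re-runs overwrite
            action["_id"] = doc["sha256"]
        yield action


def ingest(path: str, url: str = "http://localhost:9200") -> int:
    """Send every document in a JSON-lines file to Elasticsearch."""
    from elasticsearch import Elasticsearch, helpers  # third-party: pip install elasticsearch

    with open(path, encoding="utf-8") as handle:
        indexed, _errors = helpers.bulk(Elasticsearch(url), to_actions(handle))
    return indexed
```

Using the sample hash as the document ID means a repeated run updates existing documents instead of duplicating them.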

Break!
Identifying Interesting Samples to Reverse
● Statically analyzing any single sample takes time.
● When choosing what to analyze consider what you’re looking to get out of it:
○ Just to have fun
○ More knowledge about a given TTP
○ Getting better at a tool
○ Demonstrating to an employer a specific skill set

[Figures: Google Trends - “Ransomware”; Google Trends - “Cobalt Strike”]

Standing on The Shoulders of Giants: Public Feeds

Why integrate with public feeds?
● These services are widely used across industry.
● This is a data enrichment/software development
activity that can help you analyze malware
trends.
○ Malware Analysis/Threat Intel++
■ See the malware trends as reported by
other organizations.
○ Software Development++
■ How do we build services to go forth and
fetch this data?
○ DevOps++
■ How do we automate, update and deploy
these services?

Hybrid Analysis Public Feed

● https://www.hybrid-analysis.com/feed?json
● Public feed of JSON results from sandbox execution
● Data includes substantial artifacts from execution:
○ Registry keys
○ “maliciousness score”
○ IP addresses/Domains
○ File hashes
○ Processes spawned
○ Files extracted
○ File size
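
Once a feed like this has been downloaded and parsed, a small filter can surface the entries worth a closer look. The sketch below works on an already-parsed dictionary; the key names `data` and `threatscore` are assumptions about the feed's layout and should be checked against a real response. `top_scored` is a hypothetical helper name.

```python
def top_scored(feed: dict, minimum: int = 80,
               items_key: str = "data", score_key: str = "threatscore") -> list:
    """Keep feed entries whose score is at least minimum, highest first."""
    entries = [
        entry for entry in feed.get(items_key, [])
        if isinstance(entry, dict)
        and isinstance(entry.get(score_key), (int, float))
        and entry[score_key] >= minimum
    ]
    return sorted(entries, key=lambda entry: entry[score_key], reverse=True)
```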

Joe Sandbox - Community Edition

VirusTotal - Graphs

VirusTotal - Graphs

● Community edition accounts get to use Graph.
○ These can be created for free.
○ API access to further enrich data.
● Browse others’ graphs for inspiration
● However, your graphs are made public

https://support.virustotal.com/hc/en-us/articles/360002138677-Does-VT-Graph-consume-quota-How-is-it-measured-
Break!
Lab - Obtaining Data from Public Feeds

Would you like to know more?

● Free workshops/classes
○ https://malwareunicorn.org/#/
○ https://p.ost2.fyi/courses

● Looking for network forensic challenges?


○ https://www.malware-traffic-analysis.net/
■ LIVE malware traffic, be careful!
○ https://www.netresec.com/?page=PcapFiles
■ ISTS /CCDC Competition

Finished!
Survey: tinyurl.com/43htcbst

Thank You

archcloudlabs@gmail.com www.archcloudlabs.com
Bonus - Building a “Pew Pew” Map

● What is IPInfo?
○ API to give Geolocation based on IP Address
○ $> curl ipinfo.io/<IPv4_HERE>?token=<TOKEN_HERE>
○ https://ipinfo.io/products/free-ip-database

● What is Cobalt Strike?
○ Prevalent adversary emulation tool.
■ Very configurable via Malleable C2
■ Beacon Object Files
○ Widely used by threat actors and red teams alike.
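
For the IPInfo lookup shown in the curl command above, a Python equivalent might look like the sketch below. It assumes the JSON response carries a `loc` field of the form "latitude,longitude", which is the pair a map needs; `ipinfo_url`, `parse_location` and `locate` are hypothetical helper names.

```python
import json
import urllib.request


def ipinfo_url(ip: str, token: str) -> str:
    """Mirror the curl command: ipinfo.io/<ip>?token=<token>."""
    return f"https://ipinfo.io/{ip}?token={token}"


def parse_location(record: dict) -> tuple:
    """Split the 'loc' field ("lat,lon") into two floats."""
    latitude, longitude = record["loc"].split(",")
    return float(latitude), float(longitude)


def locate(ip: str, token: str) -> tuple:
    """Look up one address (network call; never run on import)."""
    with urllib.request.urlopen(ipinfo_url(ip, token), timeout=10) as resp:
        return parse_location(json.load(resp))
```

Feeding the coordinates of observed C2 addresses into a map layer gives the "pew pew" view.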
