🚀 Pipeline | 📚 Benchmark | 🤗 Models | 🖥️ Evaluation | 📊 Experiment | 📎 Quick Start
- Edit Similarity
HumanEval | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 46.09 | 33.83 | 34.75 | 29.66 | 36.08 |
IDA | 25.47 | 21.01 | 20.18 | 17.92 | 21.15 |
LLM4Decompile-End-1.3B | 43.37 | 36.91 | 36.76 | 36.30 | 38.34 |
Idioms-1.3B | 48.84 | 38.08 | 35.93 | 34.66 | 39.35 |
LLM4Decompile-DCBench-1.3B | 54.36 | 43.54 | 44.21 | 42.78 | 46.22 |
LLM4Decompile-DCBench-6.7B | 62.32 | 51.91 | 51.66 | 52.99 | 54.72 |
Claude-Sonnet-4-reasoning | 60.75 | 48.45 | 47.12 | 46.40 | 50.68 |
MBPP | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 47.52 | 37.34 | 39.15 | 32.63 | 39.16 |
IDA | 27.66 | 23.63 | 22.01 | 19.43 | 23.18 |
LLM4Decompile-End-1.3B | 44.82 | 39.67 | 39.01 | 38.13 | 40.41 |
Idioms-1.3B | 49.35 | 38.13 | 35.91 | 34.36 | 39.44 |
LLM4Decompile-DCBench-1.3B | 56.38 | 48.14 | 46.76 | 45.79 | 49.28 |
LLM4Decompile-DCBench-6.7B | 64.12 | 55.40 | 53.87 | 53.39 | 56.70 |
Claude-Sonnet-4-reasoning | 64.64 | 54.27 | 53.10 | 51.98 | 55.99 |
GitHub2025 | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 21.15 | 18.64 | 19.38 | 18.43 | 19.40 |
IDA | 22.17 | 18.21 | 19.69 | 18.93 | 19.75 |
LLM4Decompile-End-1.3B | 23.09 | 20.61 | 21.77 | 20.81 | 21.57 |
Idioms-1.3B | 30.27 | 24.04 | 25.09 | 24.18 | 25.90 |
LLM4Decompile-DCBench-1.3B | 30.99 | 29.21 | 30.23 | 27.59 | 29.51 |
LLM4Decompile-DCBench-6.7B | 34.29 | 32.74 | 34.18 | 29.96 | 32.79 |
Claude-Sonnet-4-reasoning | 36.29 | 32.80 | 33.12 | 31.32 | 33.38 |
ProRec | Edit Sim |
---|---|
GPT-4.1-mini | 34.74 |
IDA | 27.24 |
LLM4Decompile-End-1.3B | 34.26 |
Idioms-1.3B | 36.03 |
LLM4Decompile-DCBench-1.3B | 38.85 |
LLM4Decompile-DCBench-6.7B | 45.23 |
Claude-Sonnet-4-reasoning | 41.99 |
- Re-executability
HumanEval | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 21.95 | 11.58 | 10.07 | 10.06 | 13.42 |
IDA | 18.60 | 19.81 | 17.69 | 16.77 | 18.22 |
LLM4Decompile-End-1.3B | 26.22 | 12.81 | 14.03 | 13.42 | 16.22 |
Idioms-1.3B | 30.56 | 16.10 | 12.63 | 12.36 | 17.91 |
LLM4Decompile-DCBench-1.3B | 33.23 | 18.60 | 16.47 | 15.24 | 20.89 |
LLM4Decompile-DCBench-6.7B | 61.59 | 30.18 | 34.15 | 32.01 | 39.48 |
Claude-Sonnet-4-reasoning | 65.85 | 42.68 | 39.63 | 39.02 | 46.79 |
MBPP | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 31.37 | 16.74 | 16.64 | 14.79 | 19.89 |
IDA | 25.62 | 25.05 | 23.72 | 23.57 | 24.49 |
LLM4Decompile-End-1.3B | 29.16 | 16.99 | 17.92 | 18.07 | 20.54 |
Idioms-1.3B | 33.97 | 20.47 | 18.13 | 17.30 | 22.47 |
LLM4Decompile-DCBench-1.3B | 35.06 | 21.56 | 22.80 | 20.28 | 24.93 |
LLM4Decompile-DCBench-6.7B | 58.32 | 39.58 | 39.73 | 37.06 | 43.67 |
Claude-Sonnet-4-reasoning | 67.76 | 51.69 | 53.02 | 50.25 | 55.68 |
- R2I
HumanEval | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 62.38 | 52.63 | 55.68 | 53.90 | 56.14 |
IDA | 41.49 | 36.29 | 35.85 | 35.32 | 37.23 |
LLM4Decompile-End-1.3B | 65.69 | 60.48 | 60.66 | 59.37 | 61.55 |
Idioms-1.3B | 68.18 | 66.92 | 67.46 | 65.48 | 67.01 |
LLM4Decompile-DCBench-1.3B | 68.93 | 68.74 | 69.03 | 67.76 | 68.62 |
LLM4Decompile-DCBench-6.7B | 69.35 | 68.91 | 69.79 | 68.42 | 69.12 |
Claude-Sonnet-4-reasoning | 61.09 | 54.94 | 55.65 | 55.28 | 56.74 |
MBPP | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 61.79 | 55.34 | 57.05 | 55.83 | 57.50 |
IDA | 41.82 | 34.87 | 35.16 | 36.21 | 37.02 |
LLM4Decompile-End-1.3B | 67.93 | 63.47 | 65.69 | 63.01 | 65.03 |
Idioms-1.3B | 69.12 | 67.01 | 63.91 | 62.35 | 65.60 |
LLM4Decompile-DCBench-1.3B | 69.13 | 70.97 | 68.03 | 67.79 | 68.98 |
LLM4Decompile-DCBench-6.7B | 72.30 | 71.99 | 72.25 | 70.67 | 71.80 |
Claude-Sonnet-4-reasoning | 64.78 | 60.62 | 61.53 | 61.71 | 62.16 |
GitHub2025 | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 51.65 | 39.64 | 46.62 | 55.83 | 48.43 |
IDA | 45.87 | 38.85 | 36.99 | 36.20 | 39.48 |
LLM4Decompile-End-1.3B | 54.26 | 51.73 | 53.42 | 50.56 | 52.49 |
Idioms-1.3B | 61.76 | 58.06 | 53.26 | 51.19 | 56.07 |
LLM4Decompile-DCBench-1.3B | 64.40 | 65.72 | 61.74 | 63.31 | 63.79 |
LLM4Decompile-DCBench-6.7B | 72.67 | 70.23 | 66.55 | 67.76 | 69.30 |
Claude-Sonnet-4-reasoning | 55.70 | 43.88 | 45.04 | 51.71 | 49.08 |
ProRec | R2I |
---|---|
GPT-4.1-mini | 55.01 |
IDA | 38.35 |
LLM4Decompile-End-1.3B | 57.49 |
Idioms-1.3B | 64.86 |
LLM4Decompile-DCBench-1.3B | 65.73 |
LLM4Decompile-DCBench-6.7B | 66.15 |
Claude-Sonnet-4-reasoning | 57.38 |
- Variable naming
GitHub2025 | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 48.99 | 42.24 | 43.07 | 39.98 | 43.57 |
IDA | 33.66 | 27.16 | 29.49 | 28.99 | 29.83 |
LLM4Decompile-End | 64.15 | 63.48 | 62.39 | 63.84 | 63.47 |
LLM4Decompile-DCBench | 76.38 | 77.18 | 77.53 | 76.69 | 76.95 |
- Control flow
GitHub2025 | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 63.25 | 50.09 | 50.41 | 50.10 | 53.46 |
IDA | 63.28 | 59.42 | 60.35 | 60.62 | 60.92 |
LLM4Decompile-End | 73.75 | 73.49 | 73.61 | 74.65 | 73.88 |
LLM4Decompile-DCBench | 83.61 | 85.13 | 85.56 | 84.87 | 84.79 |
- Type recovery
GitHub2025 | O0 | O1 | O2 | O3 | AVG |
---|---|---|---|---|---|
GPT-4.1-mini | 55.69 | 45.18 | 46.75 | 44.93 | 48.14 |
IDA | 63.53 | 60.37 | 62.29 | 61.67 | 61.97 |
LLM4Decompile-End | 76.47 | 77.82 | 78.98 | 77.42 | 77.67 |
LLM4Decompile-DCBench | 80.13 | 82.26 | 82.22 | 81.55 | 81.54 |
You are an expert in reverse-engineering and decompiler evaluation. I will give you a decompiled code snippet; your job is to evaluate it on three criteria:
1. variable_naming: How well the decompiler recovered meaningful variable names.
2. control_flow: How faithfully complex control-flow constructs (loops, branches, gotos) have been reconstructed.
3. type_recovery: How accurately types (primitives, structs, pointers, arrays, etc.) were inferred.
For each criterion:
• Assign an integer score from 1 (very poor) to 100 (excellent).
• Provide a one- or two-sentence rationale.
Produce only a single JSON object, with exactly these fields:
{
  "variable_naming": {
    "score": <int>,
    "rationale": "<string>"
  },
  "control_flow": {
    "score": <int>,
    "rationale": "<string>"
  },
  "type_recovery": {
    "score": <int>,
    "rationale": "<string>"
  }
}
Do not include any extraneous keys, and output the JSON object directly without any additional explanation.
The source code: {source code}.
Now evaluate this snippet: {decompiled code}
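To show how this prompt can be wired up in practice, here is a minimal sketch that sends it through the OpenAI chat-completions API and parses the returned JSON. The client, model name, and defensive regex are assumptions for illustration, not part of the released evaluation code.

import json
import re
from openai import OpenAI  # assumes the official openai Python package is installed

JUDGE_TEMPLATE = "..."  # the full prompt above, keeping the {source code} and {decompiled code} placeholders

def judge_snippet(client, source_code, decompiled_code, model="gpt-4.1-mini"):
    # Fill the two placeholders from the prompt template above.
    prompt = JUDGE_TEMPLATE.replace("{source code}", source_code).replace("{decompiled code}", decompiled_code)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring
    )
    text = resp.choices[0].message.content
    # Models occasionally wrap the JSON in code fences; extract the object defensively.
    return json.loads(re.search(r"\{.*\}", text, re.DOTALL).group(0))

# Example: scores = judge_snippet(OpenAI(), src, pseudo)
#          print(scores["variable_naming"]["score"], scores["control_flow"]["score"])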
- [2025-05-21]: Released LLM4Decompile-DCBench, a 1.3-billion-parameter model trained on 10% of Decompile-Bench and specifically designed to decompile C/C++ code.
- [2025-05-20]: Released Decompile-Bench, which contains two million binary-source function pairs for training and 70K function pairs for evaluation.
- Decompile-Bench is the first open-source dataset comprising two million binary-source function pairs, condensed from 100 million collected function pairs (450 GB of binaries) compiled from permissively licensed GitHub projects.
- Decompile-Bench-Eval includes manually crafted binaries from the well-established HumanEval and MBPP benchmarks, alongside compiled GitHub repositories released in 2025 or later, to mitigate data-leakage issues.
The Compile-Trace-Filter framework automates project compilation, precisely traces function-level binary-source mappings, and applies robust filters to retain only high-quality pairs.
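As a rough illustration of what such filtering can look like, here is a hypothetical sketch of a per-pair quality gate. The thresholds and deduplication strategy are illustrative assumptions, not the framework's actual implementation.

import hashlib

def keep_pair(func_source: str, func_asm: str, seen_hashes: set,
              min_asm_lines: int = 5, max_asm_lines: int = 2000) -> bool:
    # Drop trivial or oversized functions (thresholds are illustrative).
    n_lines = func_asm.count("\n") + 1
    if not (min_asm_lines <= n_lines <= max_asm_lines):
        return False
    # Drop exact duplicates so popular snippets do not dominate the corpus.
    digest = hashlib.sha256(func_source.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return False
    seen_hashes.add(digest)
    return True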
Decompile-Bench contains the following columns:
{
  "name": "demangled name for the function",
  "code": "source code",
  "asm": "assembly",
  "file": "source code path"
}
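Assuming the corpus is hosted on the Hugging Face Hub (the dataset id below is an assumption; check the 🤗 links above for the published one), the pairs can be streamed without downloading the full corpus at once:

from datasets import load_dataset

# Dataset id is an assumption; see the benchmark links above for the published one.
ds = load_dataset("LLM4Binary/decompile-bench", split="train", streaming=True)
sample = next(iter(ds))
print(sample["name"])        # demangled function name
print(sample["file"])        # source code path
print(sample["code"][:200])  # source code
print(sample["asm"][:200])   # assembly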
Decompile-Bench-Eval contains three splits: humaneval, mbpp, and github2025. We also provide a JSON version of the data. The splits contain the following columns:
{
  "index": "index of the function",
  "func_name": "demangled name for the function",
  "func_dep": "function dependencies (includes, helper functions), or the path to the source code",
  "func": "source code",
  "test": "unit tests for the function (empty for GitHub data)",
  "opt": "optimization level: O0, O1, O2, or O3",
  "language": "language, c or cpp",
  "asm": "assembly",
  "ida_asm": "assembly from IDA Pro",
  "ida_pseudo": "decompiled results (pseudo code) from IDA Pro",
  "ghidra_asm": "assembly from Ghidra",
  "ghidra_pseudo": "decompiled results (pseudo code) from Ghidra"
}
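For instance, assuming the JSON version is a list of records with the fields above, a quick sanity check that groups the IDA pseudo code by optimization level looks like this (the file path matches the evaluation example below):

import json

with open("./data/humaneval-decompile.json", "r", encoding="utf-8") as f:
    records = json.load(f)  # assumed to be a list of records with the fields above

by_opt = {}
for rec in records:
    by_opt.setdefault(rec["opt"], []).append(rec["ida_pseudo"])
print({opt: len(v) for opt, v in by_opt.items()})  # counts per O0..O3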
Model | Checkpoint | Size | Re-executability (HumanEval-Decompile) | Alias |
---|---|---|---|---|
llm4decompile-1.3b-v1.5 | 🤗 HF Link | 1.3B | 16.22% | LLM4Decompile-End |
llm4decompile-1.3b-v1.6 | 🤗 HF Link | 1.3B | 20.89% | LLM4Decompile-DCBench |
- Re-executability evaluates whether the decompiled code executes properly and passes all the predefined test cases.
- Edit Similarity is based on Levenshtein distance: it captures the minimum number of insertions, deletions, or substitutions needed to turn the generated code into the reference (see the sketch after this list).
For R2I, please refer to the source project.
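For reference, edit similarity can be computed with the editdistance package installed below. The normalization shown here (one minus the distance divided by the longer string's length) is a common convention; the repository's cal_edit_sim.py may differ in detail.

import editdistance

def edit_similarity(generated: str, reference: str) -> float:
    # Normalized similarity in [0, 1]: 1 - Levenshtein distance / longer length.
    if not generated and not reference:
        return 1.0
    dist = editdistance.eval(generated, reference)
    return 1.0 - dist / max(len(generated), len(reference))

print(edit_similarity("int add(int a,int b){return a+b;}",
                      "int add(int x,int y){return x+y;}"))  # 4 substitutions -> ~0.88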
- vllm >= 0.5.2 (see the sketch after this setup block)
https://docs.vllm.ai/en/v0.5.2/getting_started/installation.html
IMPORTANT: the following libraries are required for compilation; otherwise, the compilation will fail.
apt-get update
apt-get install -y libboost-dev libssl-dev
pip install editdistance
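Given the vllm requirement above, a minimal batched-decompilation sketch could look like the following. The model id comes from the table above; the prompt file is produced by the Quick Start preprocessing below, and the sampling values are illustrative.

from vllm import LLM, SamplingParams

llm = LLM(model="LLM4Binary/llm4decompile-1.3b-v1.6", max_model_len=4096)
params = SamplingParams(temperature=0, max_tokens=2048)  # greedy decoding

with open("samples/sample_O0.asm", "r", encoding="utf-8") as f:  # prompt built in the Quick Start below
    prompts = [f.read()]  # batch as many prompts as you like

for out in llm.generate(prompts, params):
    print(out.outputs[0].text)  # decompiled C for each prompt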
- Re-executability
python3 run_exe_rate.py \
--model_path LLM4Binary/llm4decompile-1.3b-v1.6 \
--dataset_path ./data/humaneval-decompile.json \
--output_path ./data/humaneval
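Conceptually, the re-executability check compiles each decompiled function together with its unit tests and counts a pass when the resulting binary exits cleanly. A simplified, self-contained sketch of that check (the file layout and pass criterion are assumptions) is:

import os
import subprocess
import tempfile

def reexecutes(decompiled_func: str, test_harness: str) -> bool:
    # Concatenate the decompiled function with its unit-test harness,
    # compile with GCC, and treat a zero exit code from the binary as a pass.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "candidate.c")
        exe = os.path.join(tmp, "candidate")
        with open(src, "w") as f:
            f.write(decompiled_func + "\n" + test_harness)
        try:
            build = subprocess.run(["gcc", src, "-o", exe, "-lm"],
                                   capture_output=True, timeout=60)
            if build.returncode != 0:
                return False
            run = subprocess.run([exe], capture_output=True, timeout=10)
        except subprocess.TimeoutExpired:
            return False
        return run.returncode == 0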
- Edit Similarity
# Note: we assume the decompiled results are stored in ./data/humaneval
python3 ./metrics/cal_edit_sim.py
Setup: please use the script below to set up the necessary environment.
git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n 'llm4decompile' python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt
Here is an example of how to use our model (for previous models, please check the corresponding model page on HF). Note: replace "func0" with the name of the function you want to decompile.
Preprocessing: Compile the C code into binary, and disassemble the binary into assembly instructions.
import subprocess
import os

func_name = 'func0'  # name of the function to decompile
OPT = ["O0", "O1", "O2", "O3"]  # optimization levels
fileName = 'samples/sample'  # 'path/to/file'

for opt_state in OPT:
    output_file = fileName + '_' + opt_state
    input_file = fileName + '.c'
    # Compile the code with GCC on Linux.
    compile_command = f'gcc -o {output_file}.o {input_file} -{opt_state} -lm'
    subprocess.run(compile_command, shell=True, check=True)
    # Disassemble the binary file into assembly instructions.
    compile_command = f'objdump -d {output_file}.o > {output_file}.s'
    subprocess.run(compile_command, shell=True, check=True)

    input_asm = ''
    with open(output_file + '.s') as f:  # asm file
        asm = f.read()
    if '<' + func_name + '>:' not in asm:  # IMPORTANT: replace func0 with your function name
        raise ValueError("compilation failed")
    # Keep only the target function's section of the disassembly.
    asm = func_name + ':' + asm.split('<' + func_name + '>:')[-1].split('\n\n')[0]
    asm_clean = ""
    asm_sp = asm.split("\n")
    for tmp in asm_sp:
        if len(tmp.split("\t")) < 3 and '00' in tmp:
            continue
        idx = min(len(tmp.split("\t")) - 1, 2)
        tmp_asm = "\t".join(tmp.split("\t")[idx:])  # remove the binary code
        tmp_asm = tmp_asm.split("#")[0].strip()  # remove the comments
        asm_clean += tmp_asm + "\n"
    input_asm = asm_clean.strip()
    before = "# This is the assembly code:\n"  # prompt prefix
    after = "\n# What is the source code?\n"  # prompt suffix
    input_asm_prompt = before + input_asm.strip() + after
    with open(fileName + '_' + opt_state + '.asm', 'w', encoding='utf-8') as f:
        f.write(input_asm_prompt)
Assembly instructions should be in the format:
FUNCTION_NAME:
OPERATIONS
OPERATIONS
Typical assembly instructions may look like this:
func0:
endbr64
lea (%rdi,%rsi,1),%eax
retq
Decompilation: Use LLM4Decompile to translate the assembly instructions into C:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = 'LLM4Binary/llm4decompile-1.3b-v1.6'  # V1.6 model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).cuda()

with open(fileName + '_' + OPT[0] + '.asm', 'r') as f:  # optimization level O0
    asm_func = f.read()
inputs = tokenizer(asm_func, return_tensors="pt").to(model.device)
with torch.no_grad():
    # The model's context length is 4096 tokens, so max_new_tokens must leave room for the prompt.
    outputs = model.generate(**inputs, max_new_tokens=2048)
c_func_decompile = tokenizer.decode(outputs[0][len(inputs[0]):-1])

with open(fileName + '.c', 'r') as f:  # original file
    func = f.read()

# Note: we only decompile one function, while the original file may contain multiple functions.
print(f'original function:\n{func}')
print(f'decompiled function:\n{c_func_decompile}')