Improving auto-generated dictionary of Cmplog #2493
base: dev
thanks for looking into this!
This is the result that I extracted from the fuzzbench run data:
so overall it looks like it gives a marginal improvement - which is already good!
@am009 ping
I have been quite busy with other things these days. I will get back to this on Sunday.
if you want another fuzzbench run, tell me
Yes, I would like to run it again. I am still hoping to have a publicly available HTML report.
Background: We observed that on FuzzBench, Honggfuzz performs better than AFL++ on the proj4 benchmark (for example, here). Over the past month, we have been investigating the reasons and attempting to further improve AFL++.
For proj4, we noticed that Honggfuzz’s auto-generated dictionary is of much higher quality than AFL++’s. Honggfuzz employs a simple strategy when collecting string constants for its dictionary: during `memcmp`/`strcmp` operations, it checks whether one of the pointers originates from the ELF file’s mapped memory (typically writable or non-writable data sections), excluding pointers from the heap. For example, in a comparison against a string constant, the data being compared usually resides on the heap, while the string constant pointer points into the ELF’s read-only data section. When porting this to AFL++, we marked this information in an unused field of the cmplog `rtn` entry.

We initially ported Honggfuzz’s strategy (Version 1, `aflplusplus_hfdictv1`), but the generated dictionary still contained many low-quality bytes. We modified AFL++ to log auto-dictionary additions per code location (here), revealing that `location3` and `location4` (link) added numerous suboptimal dictionary entries, typically 32 bytes in size.

AFL++’s cmplog also instruments functions whose signatures merely resemble `memcmp`/`strcmp`, ignoring the size given in the third argument (see this code). When it encounters such a function, it directly records up to 32 bytes from the memory pointed to by each of the first two arguments as a cmplog `rtn` entry. The related condition checks (link) are quite loose, and over 90% of auto-dictionary entries originate there. After further filtering these entries, we achieved better results (Version 2, `aflplusplus_hfdictv2`).

Local FuzzBench Results
Fuzzers:
- `aflplusplus_hfdictv1`: First version with Honggfuzz’s auto-dictionary logic.
- `aflplusplus_hfdictv2`: Filtered out 32-byte dictionary entries from cmplog `rtn` entries.
- `aflplusplus_proj4dict`: Auto-dictionary entries extracted from a 23-hour Honggfuzz instance (converted and fed to AFL++).
- `honggfuzz_orig`: Original Honggfuzz from FuzzBench.
- `aflplusplus_recent`: Recent stable AFL++ version.

On the `proj4_proj_crs_to_crs_fuzzer` benchmark, AFL++ now performs as well as Honggfuzz:
hfdict-base-aflpphfdictv2-23h.zip
Testing across 4 other benchmarks showed improvements on some of them as well:
report-5-benchmarks.zip
We are also requesting public FuzzBench experiments to further validate the results across more benchmarks (experiments request PR link).
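To make the dictionary strategy described above concrete, here is a minimal Python sketch, not the actual AFL++ or Honggfuzz code. It classifies an operand pointer by the memory mapping it falls into and keeps only operands that look like constants from the binary. The names `parse_maps`, `is_file_backed_data`, and `dictionary_candidates` are hypothetical, the sample memory map uses made-up addresses and paths, and treating "file-backed and non-executable" as "ELF data section" is a simplifying assumption. The `skip_full_length` flag approximates the Version 2 filter by dropping operands that fill the whole 32-byte capture.

```python
from dataclasses import dataclass

# Hypothetical /proc/<pid>/maps excerpt; addresses and paths are made up.
SAMPLE_MAPS = """\
555555554000-555555556000 r--p 00000000 08:01 1234 /out/fuzz_target
555555556000-555555560000 r-xp 00002000 08:01 1234 /out/fuzz_target
555555560000-555555564000 r--p 0000c000 08:01 1234 /out/fuzz_target
555555565000-555555566000 rw-p 00010000 08:01 1234 /out/fuzz_target
555555566000-555555587000 rw-p 00000000 00:00 0 [heap]
7ffff7dd0000-7ffff7df2000 r--p 00000000 08:01 5678 /usr/lib/libc.so.6
"""

MAX_RTN_LEN = 32  # cmplog captures at most 32 bytes per operand


@dataclass(frozen=True)
class Mapping:
    start: int  # inclusive
    end: int    # exclusive
    perms: str  # e.g. "r--p"
    path: str   # "" for anonymous mappings, "[heap]" for the heap


def parse_maps(text):
    """Parse maps-formatted text into a list of Mapping records."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5] if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def is_file_backed_data(addr, mappings):
    """True if addr lies in a non-executable mapping backed by a file.

    Assumption: this approximates "pointer into an ELF data section".
    Heap, stack, and anonymous mappings are rejected.
    """
    for m in mappings:
        if m.start <= addr < m.end:
            return m.path.startswith("/") and "x" not in m.perms
    return False


def dictionary_candidates(ptr0, data0, ptr1, data1, mappings,
                          skip_full_length=True):
    """Return the operand bytes worth adding to the auto-dictionary."""
    out = []
    for ptr, data in ((ptr0, data0), (ptr1, data1)):
        if not is_file_backed_data(ptr, mappings):
            continue  # likely input-derived data, not a constant
        if skip_full_length and len(data) >= MAX_RTN_LEN:
            continue  # full 32-byte capture: likely a blind read, not a token
        out.append(data)
    return out


if __name__ == "__main__":
    maps = parse_maps(SAMPLE_MAPS)
    # Heap buffer compared against a constant in the read-only data mapping.
    print(dictionary_candidates(0x555555566100, b"+proj=latlong",
                                0x555555560010, b"+proj=", maps))
```

In this sketch only the operand that points into the binary's data mapping is kept, so input-derived bytes never reach the dictionary. Running with `skip_full_length=False` roughly corresponds to the Version 1 behaviour, which makes it easy to compare the two filters on the same recorded entries.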
`unused` field in `cmpfn_operands`