Random-Crypto Benchmark

Leaderboard

Name	Model	Algorithm	Baseline Pass@8	Baseline Maj@8	After Training Pass@8	After Training Maj@8	Improvement Pass@8	Improvement Maj@8	Link
HackSynth-GRPO	Llama-3.1-8B	GRPO	0.10	0.02	0.90	0.14	0.80	0.12	Paper
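
The leaderboard columns can be read with a small scoring sketch. It assumes, since the definitions are not stated here, that Pass@8 counts a challenge as solved when at least one of 8 sampled answers matches the flag, and that Maj@8 counts it as solved when the most frequent answer matches. The function names and the sample data (4 samples per challenge for brevity) are hypothetical.

```python
from collections import Counter

def pass_at_k(samples, truth):
    """True if at least one sampled answer equals the expected flag."""
    return any(s == truth for s in samples)

def maj_at_k(samples, truth):
    """True if the most frequent non-empty answer equals the expected flag."""
    answers = [s for s in samples if s is not None]
    if not answers:
        return False
    top, _count = Counter(answers).most_common(1)[0]
    return top == truth

def score(runs):
    """runs: list of (samples, truth) pairs, one pair per challenge."""
    n = len(runs)
    pass_rate = sum(pass_at_k(s, t) for s, t in runs) / n
    maj_rate = sum(maj_at_k(s, t) for s, t in runs) / n
    return pass_rate, maj_rate

# Hypothetical sample data; None marks a run that produced no answer.
runs = [
    (["flag{a}", None, "flag{x}", "flag{x}"], "flag{a}"),
    (["flag{b}", "flag{b}", None, "flag{y}"], "flag{b}"),
]
pass_rate, maj_rate = score(runs)
```

Under these assumptions a challenge can count toward Pass@8 without counting toward Maj@8, which is consistent with the gap between the two columns in the table.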

Challenge Generation Pipeline

Cryptographic Algorithms

The following categories summarize the types of cryptographic algorithms implemented in the benchmark:

Archetypes

Classical

Caesar
Vigenere
Playfair
Hill
Rail Fence
Substitution
Transposition
Autokey
Morse Code
Fibonacci Encoding
XOR
Base64
Base64 Layered
Base85
Base85 Layered
Substitution Direct
Atbash
Hex
ASCII Shift
Split Flag
Reversed Flag
Chunked Flag
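
As an illustration of what a randomized classical challenge might look like, here is a minimal sketch for the Caesar archetype. It is not the benchmark's generator: the `flag{...}` format, the function names, and the dictionary layout are all assumptions made for this example.

```python
import random
import string

def caesar(text, shift):
    """Shift ASCII letters by `shift` positions; other characters pass through."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - 97 + shift) % 26 + 97))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)

def make_caesar_challenge(rng):
    """Build one challenge with a random flag body and a random shift (assumed format)."""
    body = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    flag = "flag{" + body + "}"
    shift = rng.randint(1, 25)
    return {"ciphertext": caesar(flag, shift), "flag": flag, "shift": shift}

def solve_caesar(ciphertext):
    """Try every shift and keep the candidate that starts with the known prefix."""
    for shift in range(1, 26):
        candidate = caesar(ciphertext, -shift)
        if candidate.startswith("flag{"):
            return candidate
    return None

challenge = make_caesar_challenge(random.Random(0))
recovered = solve_caesar(challenge["ciphertext"])
```

Passing in a seeded `random.Random` keeps generation reproducible while still giving every challenge its own flag and key.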

RSA

Small Primes
Repeated Prime Usage
Partial Key Exposure
Common Factors
Shared Prime
Blum Integers
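
The Shared Prime and Common Factors archetypes rest on a standard observation: if two RSA moduli share a prime, `gcd` reveals it and both keys fall. The sketch below uses toy textbook primes and a hypothetical function name; it shows the general attack, not the benchmark's challenge code.

```python
from math import gcd

def recover_from_shared_prime(n1, n2, e, c1):
    """Factor n1 via gcd(n1, n2), then decrypt c1. Returns None if nothing is shared."""
    p = gcd(n1, n2)
    if p in (1, n1):
        return None
    q = n1 // p
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return pow(c1, d, n1)

# Toy parameters (far too small for real use): both moduli reuse the prime 61.
shared, q1, q2 = 61, 53, 59
n1, n2, e = shared * q1, shared * q2, 17
message = 65
ciphertext = pow(message, e, n1)
plaintext = recover_from_shared_prime(n1, n2, e, ciphertext)
```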

AES

AES-GCM
AES-CCM
AES-XTS
AES-CFB

Hash

MD5 Reverse
Poor Random Salt
Iterated Hash Challenge
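
For the MD5 Reverse archetype, one plausible solving strategy is a dictionary search: hash each candidate and compare against the target digest. The wordlist and function name below are hypothetical, and the benchmark may expect a different approach.

```python
import hashlib

def md5_reverse(target_hex, candidates):
    """Return the first candidate whose MD5 hex digest equals target_hex, else None."""
    for word in candidates:
        if hashlib.md5(word.encode()).hexdigest() == target_hex:
            return word
    return None

# Hypothetical target and wordlist for illustration.
target = hashlib.md5(b"dragon").hexdigest()
found = md5_reverse(target, ["password", "letmein", "dragon"])
```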

ECC

Small-order curves
Faulty curve parameters
Reused nonce (ECDSA)

PRNG

Predictable seed
Time-based seed
Low-entropy generator
LFSR weakness
Congruential generator flaw
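
The Time-based seed weakness can be sketched as a brute-force search over plausible timestamps. The token format, the one-hour search window, and the function names are assumptions for illustration; Python's `random` module stands in for whatever generator a given challenge uses.

```python
import random

def make_token(seed):
    """Toy token: 16 hex characters drawn from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return "%016x" % rng.getrandbits(64)

def recover_seed(token, approx_time, window=3600):
    """Search every seed within +/- `window` seconds of an approximate timestamp."""
    for seed in range(approx_time - window, approx_time + window + 1):
        if make_token(seed) == token:
            return seed
    return None

# Hypothetical scenario: the token was created near a known time.
true_seed = 1_700_000_123
token = make_token(true_seed)
guessed = recover_seed(token, 1_700_000_000)
```

Once the seed is known, every later output of the generator can be reproduced, which is what makes the seed choice the weak point.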

Web Crypto

JWT 'none' algorithm
Weak cookie encryption
Broken key exchange
Insecure session token
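
The JWT 'none' algorithm flaw can be shown with the standard library alone. The sketch forges an unsigned token with edited claims and passes it to a deliberately vulnerable verifier. Both functions and the claim names are hypothetical and written only for this illustration; they are not taken from the benchmark or from any JWT library.

```python
import base64
import json

def b64url(data):
    """Base64url-encode bytes without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def forge_none_token(token, **claims):
    """Update claims, set alg to 'none', and drop the signature."""
    _header, payload_b64, _sig = token.split(".")
    payload = json.loads(b64url_decode(payload_b64))
    payload.update(claims)
    header = {"alg": "none", "typ": "JWT"}
    parts = [b64url(json.dumps(header).encode()), b64url(json.dumps(payload).encode())]
    return ".".join(parts) + "."

def naive_verify(token):
    """Deliberately vulnerable verifier: accepts any token that declares alg=none."""
    header_b64, payload_b64, _sig = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") == "none":
        return json.loads(b64url_decode(payload_b64))
    raise ValueError("signature check not implemented in this sketch")

# Hypothetical original token with a placeholder signature.
original = ".".join([
    b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode()),
    b64url(json.dumps({"user": "guest", "admin": False}).encode()),
    "placeholder-signature",
])
forged = forge_none_token(original, admin=True)
claims = naive_verify(forged)
```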

Signature Schemes

ECDSA nonce reuse
RSA sign with low public exponent
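
ECDSA nonce reuse comes down to modular algebra. With s = k^-1 (z + r d) mod n, two signatures that share the nonce k (and therefore r) give k = (z1 - z2) / (s1 - s2) and d = (s1 k - z1) / r, all mod n. The sketch below checks that algebra against simulated signatures. The function name is hypothetical, and `r` is an arbitrary stand-in rather than a real curve x-coordinate, because the recovery step does not depend on how `r` was derived.

```python
# Order of the secp256k1 group, used here only as a realistic prime modulus.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def recover_ecdsa_key(r, s1, z1, s2, z2, n=N):
    """Recover the nonce k and private key d from two signatures that share r."""
    k = (z1 - z2) * pow((s1 - s2) % n, -1, n) % n
    d = (s1 * k - z1) * pow(r, -1, n) % n
    return k, d

def toy_sign(z, d, k, r, n=N):
    """Compute s = k^-1 * (z + r*d) mod n for a given stand-in r."""
    return pow(k, -1, n) * (z + r * d) % n

# Simulated values for illustration only.
d_true, k_true, r = 0x1234567890ABCDEF, 0xCAFEBABE, 0xDEADBEEF
z1, z2 = 111, 222
s1 = toy_sign(z1, d_true, k_true, r)
s2 = toy_sign(z2, d_true, k_true, r)
k_found, d_found = recover_ecdsa_key(r, s1, z1, s2, z2)
```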

Training Reward Progression

Figure: Reinforcement learning reward progression across training styles. The curves show reward over the course of RL training for three different training styles, and how training stability and final performance vary by learning strategy. See our paper for detailed analysis and discussion.

How to Cite

@article{muzsai2025improving,
  title={Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges},
  author={Muzsai, Lajos and Imolai, David and Luk{\'a}cs, Andr{\'a}s},
  journal={arXiv preprint arXiv:2506.02048},
  year={2025}
}