Leaderboard
| Name | Model | Algorithm | Baseline Pass@8 | Baseline Maj@8 | After Training Pass@8 | After Training Maj@8 | Improvement Pass@8 | Improvement Maj@8 | Link |
|---|---|---|---|---|---|---|---|---|---|
| HackSynth-GRPO | Llama-3.1-8B | GRPO | 0.10 | 0.02 | 0.90 | 0.14 | 0.80 | 0.12 | Paper |
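The leaderboard does not define its metrics, so the following is only a minimal sketch under an assumed reading: Pass@8 counts a challenge as solved if any of 8 sampled attempts returns the correct flag, and Maj@8 counts it as solved if the most frequent answer among the 8 attempts is correct. The function names and the toy data are hypothetical and are not taken from the benchmark code.

```python
from collections import Counter

def pass_at_k(attempts, truth):
    """Fraction of challenges where at least one attempt equals the true flag."""
    solved = sum(
        any(answer == truth[cid] for answer in answers)
        for cid, answers in attempts.items()
    )
    return solved / len(attempts)

def maj_at_k(attempts, truth):
    """Fraction of challenges where the most frequent answer equals the true flag."""
    solved = 0
    for cid, answers in attempts.items():
        top_answer, _ = Counter(answers).most_common(1)[0]
        solved += top_answer == truth[cid]
    return solved / len(attempts)

# Toy data (hypothetical): 3 challenges, 4 attempts each instead of 8.
truth = {"c1": "flag{a}", "c2": "flag{b}", "c3": "flag{c}"}
attempts = {
    "c1": ["flag{a}", "x", "flag{a}", "flag{a}"],  # solved under both metrics
    "c2": ["x", "y", "flag{b}", "x"],              # solved under Pass only
    "c3": ["x", "x", "y", "z"],                    # never solved
}

# The Improvement columns are After Training minus Baseline.
improvement_pass = round(0.90 - 0.10, 2)
improvement_maj = round(0.14 - 0.02, 2)
```

Under this reading, Maj@8 can never exceed Pass@8, which is consistent with the values in the table.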
Challenge Generation Pipeline
Cryptographic Algorithms
The following categories summarize the types of cryptographic algorithms implemented in the benchmark:
Archetypes
Classical
- Caesar
- Vigenere
- Playfair
- Hill
- Rail Fence
- Substitution
- Transposition
- Autokey
- Morse Code
- Fibonacci Encoding
- XOR
- Base64
- Base64 Layered
- Base85
- Base85 Layered
- Substitution Direct
- Atbash
- Hex
- ASCII Shift
- Split Flag
- Reversed Flag
- Chunked Flag
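As an illustration of the simplest classical archetype, here is a minimal sketch of brute-forcing a Caesar cipher. It assumes the plaintext contains a known prefix such as `flag{`; that flag format and the helper names are assumptions for this example, not details confirmed by the benchmark.

```python
def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters untouched."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

def brute_force_caesar(ciphertext, crib="flag{"):
    """Try all 26 shifts and return (shift, plaintext) for the first one containing the crib."""
    for shift in range(26):
        candidate = caesar_shift(ciphertext, -shift)
        if crib in candidate:
            return shift, candidate
    return None

# Hypothetical challenge: encrypt with an unknown shift, then recover it.
ciphertext = caesar_shift("flag{hello_world}", 3)
result = brute_force_caesar(ciphertext)
```

Because the key space has only 26 entries, exhaustive search with a known-plaintext check is enough; the other classical archetypes generally need more targeted analysis.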
RSA
- Small Primes
- Repeated Prime Usage
- Partial Key Exposure
- Common Factors
- Shared Prime
- Blum Integers
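The "Shared Prime" and "Common Factors" archetypes can be sketched as follows: if two RSA moduli share a prime factor, a single GCD computation factors both. The toy primes, the exponent, and the function name are assumptions chosen for illustration; real challenges would use much larger numbers. The modular inverse via `pow(e, -1, phi)` requires Python 3.8 or later.

```python
from math import gcd

def recover_private_exponent(n1, n2, e):
    """Factor n1 using a prime it shares with n2, then derive the private exponent for n1."""
    p = gcd(n1, n2)
    if p == 1 or p == n1:
        raise ValueError("moduli do not share a usable prime factor")
    q = n1 // p
    phi = (p - 1) * (q - 1)
    return pow(e, -1, phi)  # modular inverse of e modulo phi(n1)

# Toy parameters (hypothetical): both moduli reuse the prime 1009.
shared_p, q1, q2 = 1009, 1013, 1019
n1, n2 = shared_p * q1, shared_p * q2
e = 65537

message = 42
ciphertext = pow(message, e, n1)

d = recover_private_exponent(n1, n2, e)
recovered = pow(ciphertext, d, n1)
```

The attack never needs to factor either modulus directly, which is why key generators that reuse primes are exploitable even at large key sizes.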
AES
- AES-GCM
- AES-CCM
- AES-XTS
- AES-CFB
Hash
- MD5 Reverse
- Poor Random Salt
- Iterated Hash Challenge
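A minimal sketch of what an "MD5 Reverse" challenge might involve, assuming the preimage comes from a small candidate list. The word list and function name are hypothetical; the benchmark's actual challenge format is not specified here.

```python
import hashlib

def reverse_md5(target_hex, candidates):
    """Return the first candidate whose MD5 hex digest matches target_hex, else None."""
    for word in candidates:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target_hex:
            return word
    return None

# Hypothetical challenge: the digest of an unknown word drawn from a small dictionary.
wordlist = ["password", "hunter2", "letmein"]
target = hashlib.md5(b"hunter2").hexdigest()
found = reverse_md5(target, wordlist)
```

This is a dictionary attack, not an inversion of MD5 itself: it works only when the input space is small enough to enumerate.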
ECC
- Small-order curves
- Faulty curve parameters
- Reused nonce (ECDSA)
PRNG
- Predictable seed
- Time-based seed
- Low-entropy generator
- LFSR weakness
- Congruential generator flaw
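The "Time-based seed" archetype can be illustrated with a short sketch: when a token is generated from a PRNG seeded with a Unix timestamp, an attacker who knows the approximate time can enumerate seeds until the output matches. The token format, the window size, and the function names are assumptions for this example.

```python
import random

def make_token(seed, nbytes=8):
    """Generate a token from Python's Mersenne Twister seeded with an integer."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(nbytes))

def recover_seed(token, approx_time, window=3600):
    """Search seeds within +/- window seconds of approx_time for one that reproduces token."""
    for seed in range(approx_time - window, approx_time + window + 1):
        if make_token(seed, len(token)) == token:
            return seed
    return None

# Hypothetical scenario: the server seeded its PRNG with a timestamp near 1_700_000_000.
secret_seed = 1_700_000_123
token = make_token(secret_seed)
guessed_seed = recover_seed(token, approx_time=1_700_000_000, window=600)
```

Once the seed is known, every later output of the generator is predictable, which is why security tokens should come from a source such as Python's `secrets` module instead.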
Web Crypto
- JWT 'none' algorithm
- Weak cookie encryption
- Broken key exchange
- Insecure session token
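To make the "JWT 'none' algorithm" archetype concrete, the sketch below forges an unsigned token and passes it to a deliberately flawed verifier that trusts the `alg` field in the token header. `naive_verify` is a hypothetical stand-in for a vulnerable server, not code from the benchmark or from any real JWT library.

```python
import base64
import hashlib
import hmac
import json

def b64url_encode(data):
    """Base64url-encode bytes without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text):
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def forge_none_token(claims):
    """Build a token with alg 'none' and an empty signature segment."""
    header = {"alg": "none", "typ": "JWT"}
    segments = [
        b64url_encode(json.dumps(header, separators=(",", ":")).encode()),
        b64url_encode(json.dumps(claims, separators=(",", ":")).encode()),
    ]
    return ".".join(segments) + "."

def naive_verify(token, key):
    """Flawed verifier (hypothetical): accepts whatever algorithm the token declares."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header["alg"] == "none":  # the flaw: signature check is skipped entirely
        return json.loads(b64url_decode(payload_b64))
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return json.loads(b64url_decode(payload_b64))
    raise ValueError("invalid signature")

forged = forge_none_token({"user": "admin"})
claims = naive_verify(forged, key=b"server-secret")
```

The fix on the verifier side is to pin the accepted algorithm in server configuration and reject any token whose header disagrees.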
Signature Schemes
- ECDSA nonce reuse
- RSA sign with low public exponent
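The algebra behind the "ECDSA nonce reuse" archetype can be sketched without any curve arithmetic. An ECDSA signature satisfies s = k⁻¹(z + r·d) mod n, so two signatures that share the nonce k (and therefore r) leak both k and the private key d. In the sketch below, the modulus is a toy prime standing in for the curve's group order and `r` is an arbitrary stand-in value; both are assumptions made to keep the example self-contained. The modular inverse via `pow(x, -1, N)` requires Python 3.8 or later.

```python
# Toy prime modulus standing in for the curve group order n (assumption, not a real curve).
N = 2**127 - 1

def sign(z, k, d, r):
    """Compute the s component of an ECDSA-style signature for message hash z."""
    return pow(k, -1, N) * (z + r * d) % N

def recover_key(z1, s1, z2, s2, r):
    """Recover the nonce k and private key d from two signatures that reused k."""
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    d = (s1 * k - z1) * pow(r, -1, N) % N
    return k, d

# Hypothetical values: a private key, a reused nonce, and two message hashes.
private_key = 123456789
nonce = 987654321
r = 1122334455  # stand-in for the x-coordinate of nonce * G, reduced mod n
z1, z2 = 1111, 2222

s1 = sign(z1, nonce, private_key, r)
s2 = sign(z2, nonce, private_key, r)
recovered_nonce, recovered_key = recover_key(z1, s1, z2, s2, r)
```

In a real challenge, `r` and the group order come from the curve and the published signatures; the recovery step itself is unchanged.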
Training Reward Progression
How to Cite
@article{muzsai2025improving,
title={Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges},
author={Muzsai, Lajos and Imolai, David and Luk{\'a}cs, Andr{\'a}s},
journal={arXiv preprint arXiv:2506.02048},
year={2025}
}