Lazy Bigrams

LACTFby smothy

lazy-bigrams - LA CTF 2025 Crypto Writeup

Category: Crypto Difficulty: Medium Flag: lactf{n0t_r34lly_4_b1gr4m_su8st1tu7ion_bu7_1_w1ll_tak3_1t_f0r_n0w} Solved by: Smothy @ 0xN1umb


Hackerman

"When you NATO phonetic alphabet so hard you NATO phonetic alphabet your NATO phonetic alphabet"


Challenge Description

A simple bigram substitution cipher... with a twist.

We get chall.py (the encryption script) and ct.txt (2158 characters of uppercase ciphertext). The encryption pipeline is delightfully cursed: take the flag, convert each character to its NATO phonetic word, then do it AGAIN on the result, then apply a random bigram substitution cipher. Double phonetic encoding go brrr.

TL;DR

Double phonetic mapping + bigram substitution. The known flag prefix lactf{ expands to 215 characters of known plaintext through the double-phonetic pipeline, giving us 61 substitution table entries for free. Backtracking through possible flag characters with constraint propagation cracks the rest in under a second. Crypto? More like known-plaintext-attack speedrun.

Initial Recon

Two files: the encryption script and the ciphertext output.

The ciphertext is 2158 uppercase characters - all A-Z, no spaces, no numbers. Clean bigram substitution output. Let's read the source.

Step 1: Understanding the Encryption Pipeline

The encryption has three stages:

python
def phonetic_mapping(ptext):
    cleanptext = re.sub(r'[^a-zA-Z0-9_{}]', '', ptext).upper()
    mapped = "".join([phonetic_map[c] for c in cleanptext])
    if (len(mapped) % 2 == 1):
        mapped += "X"  # Pad to even length
    return mapped

pt = phonetic_mapping(phonetic_mapping(flag))  # DOUBLE phonetic!
ct = encryption(pt)  # Random bigram substitution

So each flag character goes through:

  1. First phonetic: a -> ALPHA, _ -> UNDERSCORE, { -> OPENCURLYBRACE, etc.
  2. Second phonetic: Each letter of the first result gets phonetic'd again: A -> ALPHA, L -> LIMA, etc.
  3. Bigram substitution: Pairs of characters are randomly permuted (676 possible bigrams AA-ZZ, shuffled).

The key observation: this is a substitution cipher with massive known plaintext from the flag format.

Step 2: Quantifying the Known Plaintext

Each flag character expands to a LOT of characters through double-phonetic:

The expansion sizes range from 15 chars (6 -> SIX -> 15) to 56 chars (_ -> UNDERSCORE -> 56). For the homies who haven't done the math: a 6-character prefix lactf{ blows up to 215 characters of known plaintext. That's nearly 10% of the entire ciphertext, for free!

From just the prefix, we recover 61 unique bigram substitution entries. Out of 676 total possible bigrams, that's already a solid foundation.

Step 3: The Carry Problem

Here's where it gets spicy. The prefix expansion is 215 characters - that's odd. Bigram substitution works on pairs (positions 0-1, 2-3, 4-5...), so the 215th character of the prefix is the first half of a bigram whose second half comes from the first unknown flag character.

PT: ...ECHO | ??? (first unknown char expansion) ^---^ This bigram crosses the boundary!

We need to track this "carry" character as we try different flag characters. If a character's expansion starts with A and our carry is O, then the boundary bigram is OA, which must be consistent with the ciphertext at that position.

Step 4: Padding Cases

The phonetic_mapping function pads with X if the result is odd length. This happens at both levels:

  • First-level padding: Adds X at end of first phonetic result, which then expands to XRAY -> XRAYROMEOALPHAYANKEE (20 chars) at second level
  • Second-level padding: Adds a single X at the very end

Four combinations to try. The winning case turned out to be no first-level padding, yes second-level padding, giving us a middle target of 1861 characters.

Step 5: Backtracking Solver

The solver is elegant in its brutality:

python
def bt_with_carry(ct, pt_pos_start, remaining, carry, sub_map, rev_sub, path, depth):
    if remaining == 0:
        return path
    if remaining < 0 or remaining < 15:  # Min expansion is 15 (for '6')
        return None

    for c in "abcdefghijklmnopqrstuvwxyz0123456789_":
        expansion = char_to_double[c]
        full_seq = carry + expansion

        # Check every bigram pair against known substitution entries
        # Both forward (PT->CT) and reverse (CT->PT) must be consistent
        consistent = check_consistency(full_seq, ct, pt_pos_start, sub_map, rev_sub)

        if consistent:
            # Add new substitution entries, recurse
            result = bt_with_carry(ct, pt_pos_start + len(expansion),
                                   remaining - len(expansion), new_carry, ...)
            if result:
                return [c] + result
            # Rollback on failure

The magic is constraint propagation: each correctly placed flag character reveals new substitution table entries that constrain future characters. By the time we're a few characters in, most wrong guesses are eliminated instantly. The whole thing runs in under a second.

The Flag

lactf{n0t_r34lly_4_b1gr4m_su8st1tu7ion_bu7_1_w1ll_tak3_1t_f0r_n0w}

Decoded: "not really a bigram substitution but I will take it for now"

Fair point, challenge author. A bigram substitution on double-phonetic-encoded text is indeed not really a traditional bigram cipher. But we'll take the flag, thanks.

The Solve Script

python
"""
lazy-bigrams solver - LA CTF 2025
Smothy @ 0xN1umb
"""
import re

phonetic_map = {
    "A":"ALPHA","B":"BRAVO","C":"CHARLIE","D":"DELTA","E":"ECHO",
    "F":"FOXTROT","G":"GOLF","H":"HOTEL","I":"INDIA","J":"JULIETT",
    "K":"KILO","L":"LIMA","M":"MIKE","N":"NOVEMBER","O":"OSCAR",
    "P":"PAPA","Q":"QUEBEC","R":"ROMEO","S":"SIERRA","T":"TANGO",
    "U":"UNIFORM","V":"VICTOR","W":"WHISKEY","X":"XRAY","Y":"YANKEE",
    "Z":"ZULU","_":"UNDERSCORE","{":"OPENCURLYBRACE","}":"CLOSECURLYBRACE",
    "0":"ZERO","1":"ONE","2":"TWO","3":"THREE","4":"FOUR","5":"FIVE",
    "6":"SIX","7":"SEVEN","8":"EIGHT","9":"NINE"
}

# Precompute double-phonetic expansion for each flag character
possible = "abcdefghijklmnopqrstuvwxyz0123456789_"
char_to_double = {}
for c in list(possible) + ["{", "}"]:
    first = phonetic_map[c.upper()]
    second = "".join(phonetic_map[ch] for ch in first)
    char_to_double[c] = second

with open("ct.txt") as f:
    ct = f.read().strip()

# Build substitution table from known prefix "lactf{"
prefix_exp = "".join(char_to_double[c] for c in "lactf{")
suffix_exp = char_to_double["}"]
sub_map, rev_sub = {}, {}

for i in range(0, len(prefix_exp) - 1, 2):
    pb, cb = prefix_exp[i:i+2], ct[i:i+2]
    sub_map[pb] = cb
    rev_sub[cb] = pb

# Add suffix mappings (second-level padding case)
suffix_start = len(ct) - len(suffix_exp) - 1  # -1 for second-level X pad
for i in range(0, len(suffix_exp) - 1, 2):
    pb, cb = suffix_exp[i:i+2], ct[suffix_start+i:suffix_start+i+2]
    sub_map.setdefault(pb, cb)
    rev_sub.setdefault(cb, pb)

def bt(pt_pos, remaining, carry, path):
    if remaining == 0: return path
    if remaining < 15: return None
    for c in possible:
        exp = char_to_double[c]
        if len(exp) > remaining: continue
        if remaining - len(exp) > 0 and remaining - len(exp) < 15: continue

        full = carry + exp
        ok, pairs = True, []
        for j in range(0, len(full)-1, 2):
            pb, cb = full[j:j+2], ct[pt_pos-len(carry)+j:pt_pos-len(carry)+j+2]
            if pb in sub_map and sub_map[pb] != cb: ok = False; break
            if cb in rev_sub and rev_sub[cb] != pb: ok = False; break
            if pb not in sub_map and cb not in rev_sub: pairs.append((pb,cb))
        if not ok: continue

        added = []
        for pb, cb in pairs:
            if pb in sub_map or cb in rev_sub: continue
            sub_map[pb]=cb; rev_sub[cb]=pb; added.append((pb,cb))

        nc = full[-1] if len(full)%2 else ""
        r = bt(pt_pos+len(exp), remaining-len(exp), nc, path+[c])
        if r: return r
        for pb,cb in added: del sub_map[pb]; del rev_sub[cb]
    return None

middle = len(ct) - len(prefix_exp) - len(suffix_exp) - 1
carry = prefix_exp[-1] if len(prefix_exp)%2 else ""
result = bt(len(prefix_exp), middle, carry, [])
print(f"lactf{{''.join(result)}}")

The Graveyard of Failed Attempts

  1. First attempt: forgot about carry characters. The prefix expansion is 215 chars (odd!), so the boundary bigram between prefix and middle crosses the known/unknown boundary. Assertion errors everywhere until I realized I needed to track partial bigrams.

  2. Wrong padding assumption. Initially tried "no padding at all" - the suffix mapping came back with conflicts. Had to systematically try all 4 padding combinations (first-level pad x second-level pad).

  3. Greedy solver that tried to parse phonetic words. Tried to parse the partially-decrypted text into phonetic words and use structure to fill gaps. Way overengineered. The backtracking approach was cleaner and faster.

Key Takeaways

  1. Known plaintext is king. Even 6 known characters of flag format gave us 215 chars (9.96%) of known plaintext. Always compute how much you can derive from format strings.

  2. Double encoding amplifies known plaintext. Each layer of phonetic mapping multiplies the known prefix length by roughly 5x. Two layers = 25x amplification. Beautiful.

  3. Bigram substitution is NOT strong. With ~60 known bigram pairs out of 676, constraint propagation eliminates most possibilities. The search space collapses fast.

  4. Handle boundary conditions carefully. The odd-length prefix creating a "carry" character was the trickiest part. Always check for off-by-one issues in cipher implementations.

  5. Try all padding combinations. When there are multiple possible states (padding/no padding), enumerate them systematically rather than guessing.

Tools Used

  • Python 3 (solver + matplotlib for screenshots)
  • Brain cells (like 3 of them)
  • Way too much caffeine

Writeup by Smothy from 0xN1umb team. Double phonetic? More like double trouble. GG.