Let The Penguin Live - Writeup

EH4X CTF 2026 | Forensics | 500 pts

"In a colony of many, one penguin's path is an anomaly. Silence the crowd to hear the individual."

author - mahekfr

First Look

Alright, so we download the challenge file and get challenge.mkv. Cool, a video file. Let's see what we're working with.

Solve Flow

bash

$ ffprobe challenge.mkv

What we get:

Stream	Type	Codec	Details
0	Video	H.264	576x320, 24fps, ~63 sec
1	Audio	FLAC	"English (Stereo)", 44.1kHz, 2ch
2	Audio	FLAC	"English (5.1 Surround)", 44.1kHz, 2ch

And in the metadata:

TAG:title=Penguin
TAG:COMMENT=EH4X{k33p_try1ng}

The video is literally just penguins walking on ice lol. Cute but not helpful... yet.

Wait... Two Audio Tracks?

Okay hold up. Track 2 says "5.1 Surround" but it's actually stereo (2 channels). That's sus af.

Stream 2: flac, 44100 Hz, stereo    <-- says 5.1 but is stereo???

me: "why would you lie about your channel count bro"
the mkv file:

trust nobody

The Decoy Flag

That EH4X{k33p_try1ng} in the metadata? Yeah that's bait. Classic CTF move. The name literally tells you to keep trying. Don't fall for it.

me: submits EH4X{k33p_try1ng}
ctfd: "Incorrect"
me: *surprised pikachu*

Reading The Hint Properly

"In a colony of many, one penguin's path is an anomaly."

Colony of many = the two audio tracks that sound the same (the crowd)

"Silence the crowd to hear the individual."

Silence the crowd = subtract one track from the other to cancel out the common audio and isolate the hidden part.

This is basically the same technique used in vocal isolation / karaoke track creation. If two signals are almost identical, subtracting them cancels out the common parts and leaves only the difference -- the anomaly.

Audio Subtraction (The Solve)

Step 1: Extract both audio tracks

bash

ffmpeg -i challenge.mkv -map 0:1 -c copy track1_stereo.flac
ffmpeg -i challenge.mkv -map 0:2 -c copy track2_surround.flac

Quick hash check confirms they ARE different files:

bash

$ md5sum track1_stereo.flac track2_surround.flac
5228be96e4f8094f37c6c7baf7ae3c5e  track1_stereo.flac
cf1a2fcbda3c3ffb555b61004db29155  track2_surround.flac
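Differing MD5s only prove the files differ as byte streams. Two FLAC files can differ byte-for-byte (compression level, padding, metadata blocks) and still decode to identical samples, so a sample-level check is the stronger test. Here's a minimal stdlib-only Python sketch. It assumes both tracks were first decoded to 16-bit PCM WAV (e.g. `ffmpeg -i track1_stereo.flac -c:a pcm_s16le track1.wav`), and the function names are just illustrative.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Return ((channels, rate), interleaved int16 samples) for a 16-bit PCM WAV."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM")
        params = (w.getnchannels(), w.getframerate())
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return params, samples


def count_differing_samples(path_a, path_b):
    """Count interleaved samples that differ between two WAVs of the same format."""
    params_a, a = read_pcm16(path_a)
    params_b, b = read_pcm16(path_b)
    if params_a != params_b:
        raise ValueError(f"format mismatch: {params_a} vs {params_b}")
    n = min(len(a), len(b))
    differing = sum(1 for x, y in zip(a[:n], b[:n]) if x != y)
    # Extra samples in the longer file also count as differences.
    return differing + abs(len(a) - len(b))
```

Call it as `count_differing_samples("track1.wav", "track2.wav")`. A result of 0 would mean the hashes differed only for container or encoder reasons.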

Step 2: Phase-invert one track and mix (subtract)

We invert track 2 and add it to track 1. This cancels everything that's the same and leaves only the difference.

bash

ffmpeg -i track1_stereo.flac -i track2_surround.flac \
  -filter_complex "[1:a]aeval=-val(0)|-val(1)[inv];[0:a][inv]amix=inputs=2:duration=longest:normalize=0[out]" \
  -map "[out]" difference.wav

How this works (for the homies who wanna learn):

Track 1:     ████████████████████████████████
Track 2:     ████████████████HIDDEN██████████  (almost same but has extra data)

Track 1 - Track 2 = ________________HIDDEN__________  (common parts cancel out!)

It's basically destructive interference. Same principle as noise-cancelling headphones.
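If you'd rather skip the ffmpeg filter graph, the same invert-and-add step can be sketched in plain Python. This assumes both tracks were decoded to 16-bit PCM WAV with matching channel count and sample rate. `subtract_wavs` and the file names are hypothetical. Unlike `amix=duration=longest`, `zip` stops at the shorter track. Results are also clamped to the int16 range, which is harmless here because the residual is tiny.

```python
import array
import sys
import wave


def subtract_wavs(path_a, path_b, out_path):
    """Write out_path = A - B sample by sample (16-bit PCM, same format); return sample count."""
    with wave.open(path_a, "rb") as wa, wave.open(path_b, "rb") as wb:
        if (wa.getsampwidth(), wb.getsampwidth()) != (2, 2):
            raise ValueError("expected 16-bit PCM input")
        if (wa.getnchannels(), wa.getframerate()) != (wb.getnchannels(), wb.getframerate()):
            raise ValueError("channel count / sample rate mismatch")
        channels, rate = wa.getnchannels(), wa.getframerate()
        a = array.array("h", wa.readframes(wa.getnframes()))
        b = array.array("h", wb.readframes(wb.getnframes()))
    if sys.byteorder == "big":  # WAV samples are little-endian
        a.byteswap()
        b.byteswap()
    # Phase-invert B and add it to A: common content cancels, the difference remains.
    # zip() stops at the shorter track; values are clamped to the int16 range.
    diff = array.array("h", (max(-32768, min(32767, x - y)) for x, y in zip(a, b)))
    if sys.byteorder == "big":
        diff.byteswap()
    with wave.open(out_path, "wb") as wo:
        wo.setnchannels(channels)
        wo.setsampwidth(2)
        wo.setframerate(rate)
        wo.writeframes(diff.tobytes())
    return len(diff)
```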

Step 3: Find where the hidden data is

Ran a quick energy analysis on the difference signal, one RMS value per 1-second window:

text

# RMS energy per 1-second window
 0.0s |  616.5  (background noise)
 ...
25.0s | 1875.9  <<<< SPIKE!
26.0s | 2213.2  <<<< SPIKE!
27.0s |  938.3
 ...

There's a clear burst of energy at 25-27 seconds. Everything else is just noise floor from tiny quantization differences.
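The analysis script itself isn't in the writeup. Here's a minimal stdlib sketch of the same idea, assuming the difference was saved as 16-bit PCM WAV. It pools all channels into one RMS value per window. The "2x the median" spike threshold is an arbitrary heuristic, not necessarily what was used here.

```python
import array
import math
import statistics
import sys
import wave


def rms_per_window(path, window_s=1.0):
    """Return [(start_time_s, rms)] over consecutive windows of a 16-bit WAV."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels, rate = w.getnchannels(), w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()
    step = max(1, int(rate * window_s)) * channels  # interleaved samples per window
    result = []
    for i in range(0, len(samples), step):
        chunk = samples[i:i + step]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        result.append((i // channels / rate, rms))  # frame index -> seconds
    return result


def find_spikes(windows, factor=2.0):
    """Return start times of windows whose RMS exceeds factor * median RMS (heuristic)."""
    median = statistics.median(rms for _, rms in windows)
    return [t for t, rms in windows if rms > factor * median]
```

`for t, rms in rms_per_window("difference.wav"): print(f"{t:5.1f}s | {rms:7.1f}")` prints a table in roughly the same shape as the one above.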

Step 4: Spectrogram the difference

bash

# Assumes difference.wav was first normalized to full scale as difference_normalized.wav
# (the diff is tiny, so it needs a big boost before anything shows up)

ffmpeg -i difference_normalized.wav \
  -lavfi "showspectrumpic=s=3000x1500:legend=0:scale=log" \
  spectrogram.png
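The writeup doesn't show how difference_normalized.wav was made. One simple option is peak normalization: scale every sample by the same gain so the loudest one lands just under full scale. This sketch assumes 16-bit PCM WAV in and out. The 0.99 target is an arbitrary headroom choice, and ffmpeg's `volume` filter with a large fixed gain would be a reasonable alternative.

```python
import array
import sys
import wave


def peak_normalize(in_path, out_path, target=0.99):
    """Scale a 16-bit WAV so its largest |sample| becomes about target * 32767; return the gain."""
    with wave.open(in_path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        params = w.getparams()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        raise ValueError("input is pure silence; nothing to normalize")
    gain = target * 32767 / peak
    # round() then clamp keeps every value inside the int16 range.
    out = array.array("h", (max(-32768, min(32767, round(s * gain))) for s in samples))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(out_path, "wb") as w:
        w.setparams(params)  # same channels, sample width, and rate as the input
        w.writeframes(out.tobytes())
    return gain
```

`peak_normalize("difference.wav", "difference_normalized.wav")` returns the gain it applied, which also gives a rough idea of how quiet the residual was.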

And BOOM:

 _______________________________________________
|                                               |
|  Frequency                                    |
|  ^                                            |
|  |     EH4X{0n3_tr4ck_m1nd                    |
|  |       _tw0_tr4ck_f1les}                    |
|  |                                            |
|  +-----------------------------> Time         |
|_______________________________________________|

THE FLAG IS LITERALLY WRITTEN IN THE SPECTROGRAM. Text hidden in audio frequencies. Beautiful.

Flag

EH4X{0n3_tr4ck_m1nd_tw0_tr4ck_f1les}

Which translates to: "one track mind, two track files" -- a cheeky reference to the solve technique itself.

What We Learned (Educational Breakdown)

1. MKV Container Forensics

MKV (Matroska) files can contain multiple streams -- video, multiple audio tracks, subtitles, attachments, fonts, etc. Always enumerate ALL streams:

bash

ffprobe -v quiet -show_streams file.mkv
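You can make the "title says 5.1, stream says 2 channels" check systematic. ffprobe can emit JSON (`-print_format json -show_streams`), and a few lines of Python can compare each audio stream's title tag with its real channel count. This is a sketch. The layout-name table only covers a few common names and is an assumption, as are the function names.

```python
import json
import subprocess

# Channel counts implied by common layout names (illustrative, not exhaustive).
EXPECTED_CHANNELS = {"mono": 1, "stereo": 2, "5.1": 6, "7.1": 8}


def probe_streams(path):
    """Run ffprobe and return its parsed stream list (requires ffprobe on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out).get("streams", [])


def mislabeled_audio(streams):
    """Yield (index, title, channels) for audio streams whose title disagrees with their channels."""
    for s in streams:
        if s.get("codec_type") != "audio":
            continue
        title = s.get("tags", {}).get("title", "")
        channels = s.get("channels")
        for label, expected in EXPECTED_CHANNELS.items():
            if label in title.lower() and channels != expected:
                yield s.get("index"), title, channels
                break
```

Assuming stream 2's title tag reads "English (5.1 Surround)" as in the table at the top, `list(mislabeled_audio(probe_streams("challenge.mkv")))` should flag exactly that stream.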

2. Audio Steganography via Dual Tracks

The technique here is:

Take original audio
Add a hidden signal (the spectrogram text) to a copy
Embed both as separate tracks in the same container

The hidden signal is so quiet compared to the main audio that if you just listen, both tracks sound identical. But mathematically, they're different.
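Here's a toy version of that construction. A loud synthetic tone stands in for the original audio, and a quiet higher tone stands in for the painted text (the signals, sample rate, and amplitudes are all made up). Because the samples are integers, subtracting the clean copy gives back the hidden signal exactly, even though it sits tens of dB below the original.

```python
import math

RATE = 8000  # samples per second (toy value)


def tone(freq_hz, amplitude, seconds, rate=RATE):
    """Integer samples of a sine tone."""
    n = int(seconds * rate)
    return [round(amplitude * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n)]


def level_db(signal, reference):
    """Peak level of `signal` relative to `reference`, in dB."""
    return 20 * math.log10(max(map(abs, signal)) / max(map(abs, reference)))


original = tone(440, 20000, 0.5)                       # loud "crowd"
hidden = tone(3000, 100, 0.5)                          # very quiet "individual"
track_a = [o + h for o, h in zip(original, hidden)]    # copy carrying the payload
track_b = original                                     # clean copy

recovered = [a - b for a, b in zip(track_a, track_b)]  # A - B
assert recovered == hidden
print(f"hidden peak relative to original: {level_db(hidden, original):.1f} dB")
```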

3. Audio Phase Cancellation / Subtraction

Signal A = Original + Hidden
Signal B = Original

A - B = Hidden

This is the same math behind:

Noise-cancelling headphones (mic picks up ambient noise, inverts it, adds to audio)
Vocal removal for karaoke (center-panned vocals cancel when you subtract L-R)
Audio forensics (comparing recordings to find tampering)

4. Spectrogram Steganography

Audio spectrograms display frequency content over time. You can paint arbitrary images (including text) into the frequency domain using tools like:

Coagula (Windows) - converts images to audio
Sonic Visualiser - great for viewing spectrograms
Audacity - Analyze > Spectrogram view
sox - CLI spectrogram generation

bash

# Quick spectrogram with sox
sox audio.wav -n spectrogram -o output.png

# Or with ffmpeg
ffmpeg -i audio.wav -lavfi showspectrumpic=s=1920x1080 spectrogram.png
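To see why text can show up in a spectrogram at all, here's a toy painter in the spirit of image-to-audio tools like Coagula (not their actual algorithm). Each bitmap row gets its own sine frequency, and each column is a short time slice. A row's tone only sounds where its pixel is lit. The bitmap, frequency range, and slice length are arbitrary. A single-bin DFT probe then checks which tones are present in one slice.

```python
import math

RATE = 44100


def paint(bitmap, f_low=4000.0, f_high=8000.0, col_s=0.05, rate=RATE, amp=0.1):
    """Render a bitmap (list of '#'/'.' strings, top row = highest pitch) as float samples."""
    rows = len(bitmap)
    freqs = [f_high - r * (f_high - f_low) / max(1, rows - 1) for r in range(rows)]
    n = int(col_s * rate)  # samples per column (time slice)
    out = []
    for c in range(len(bitmap[0])):
        lit = [freqs[r] for r in range(rows) if bitmap[r][c] == "#"]
        for i in range(n):
            t = (c * n + i) / rate
            out.append(sum(amp * math.sin(2 * math.pi * f * t) for f in lit))
    return out, freqs, n


def bin_magnitude(frame, freq, rate=RATE):
    """Amplitude estimate of a sinusoid at `freq` via a single DFT bin."""
    re = sum(x * math.cos(2 * math.pi * freq * i / rate) for i, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * freq * i / rate) for i, x in enumerate(frame))
    return 2 * math.hypot(re, im) / len(frame)


BITMAP = ["#.#",
          "###",
          "#.#"]  # a tiny 'H'
samples, freqs, n = paint(BITMAP)
middle_column = samples[n:2 * n]  # only the middle row is lit in this slice
print([round(bin_magnitude(middle_column, f), 3) for f in freqs])
```

The middle column of the tiny "H" lights only the middle row, so only the 6000 Hz probe should come back near the 0.1 amplitude. Scaled to int16 and written out as a WAV, the samples should show the letter in any of the spectrogram tools above.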

5. Decoy Flags

Always be suspicious of metadata flags that are too easy to find. Real CTF challenges don't hand you the answer in plaintext metadata... unless the challenge is literally about reading metadata. When the comment says k33p_try1ng, take the hint.

Tools Used

Tool	Purpose
`ffprobe`	Stream enumeration and metadata
`ffmpeg`	Audio extraction, phase inversion, spectrogram
`python3 + numpy`	Sample-level comparison, energy analysis
`md5sum`	Quick file comparison

TL;DR Solve Flow

challenge.mkv
    |
    v
ffprobe --> spot 2 audio tracks (one is lying about being 5.1)
    |
    v
extract both tracks
    |
    v
subtract Track1 - Track2 (phase inversion)
    |
    v
normalize & amplify the tiny difference
    |
    v
generate spectrogram --> flag is written in the frequencies
    |
    v
EH4X{0n3_tr4ck_m1nd_tw0_tr4ck_f1les}

First blood on this one btw, zero solves before us. felt good ngl.

GG mahekfr, clean challenge.
