Epstein Files

EH4X CTF, by smothy

Epstein Files - 366 pts (Web/ML)

Author: skjeks
CTF: EH4X CTF
URL: http://chall.ehax.in:4529/

Challenge Description

> Have you ever wondered that if there can be a model which can predict that if a person was in epstein files or not? For that we are conducting an epstein file ai model competition but your model to be actually epstein worthy you need to get accuracy of 0.69. Proof that your model is epstein worthy.

Recon

The challenge presents a Kaggle-style ML competition page. It provides:

  • train.csv (1516 rows) — Names with features (Category, Bio, Aliases, Flights, Documents, Connections, Nationality) and a binary target In Black Book
  • test.csv (2276 rows) — Same features, no target column
  • sample_sub.csv — Example submission format (just the In Black Book column)

The submission form accepts .csv or .pkl files.
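To get a feel for the format, here is a minimal sketch of a valid CSV submission, assuming the `sample_sub.csv` layout described above (a single `In Black Book` column with one 0/1 prediction per `test.csv` row):

```python
import pandas as pd

# Minimal sketch of the submission format: one 0/1 prediction per
# test.csv row (2276 rows) in a single "In Black Book" column.
preds = [0] * 2276  # placeholder all-zero predictions
sub = pd.DataFrame({"In Black Book": preds})
sub.to_csv("sub.csv", index=False)
print(sub.shape)  # (2276, 1)
```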

The Trap: Being Too Good

The natural instinct is to build the best model possible. A simple Random Forest with basic feature engineering achieves 94% accuracy on the test set. But the server responds:

> STATUS // SUB-OPTIMAL
> Directive incomplete. The submitted model fails to meet the strict unredaction threshold.

94% is "sub-optimal"? The key realization: the challenge doesn't want accuracy >= 0.69, it wants accuracy == 0.69. The entire challenge is themed around the number 69 (the challenge ID on CTFd was literally 69).

Solution

Step 1: Train a strong baseline model

```python
from sklearn.ensemble import RandomForestClassifier

# Feature engineering: numeric cols, category encoding, bio keywords, etc.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)  # 94% accuracy
```
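The feature engineering elided in the comment above might look something like this. This is a sketch only: the writeup just says "numeric cols, category encoding, bio keywords", so the exact column handling and the keyword list are my assumptions, based on the columns listed in Recon:

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical feature recipe; the author's exact one isn't shown.
    out = pd.DataFrame(index=df.index)
    # Numeric columns pass through, with missing values filled as 0
    for col in ["Flights", "Documents", "Connections"]:
        out[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)
    # Integer-encode the categorical columns
    for col in ["Category", "Nationality"]:
        out[col] = df[col].astype("category").cat.codes
    # Simple keyword flags from the Bio text (keywords are guesses)
    bio = df["Bio"].fillna("").str.lower()
    for kw in ["island", "flight", "donor"]:
        out[f"bio_{kw}"] = bio.str.contains(kw).astype(int)
    return out
```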

Step 2: Degrade accuracy to exactly 0.69

With 94% accuracy (~2139/2276 correct), we need ~1570/2276 correct (69%). The strategy: randomly flip predictions to introduce controlled errors.

The math for random flipping:

  • Each flip has ~94% chance of turning a correct prediction wrong
  • Each flip has ~6% chance of turning a wrong prediction correct
  • Net change per flip: -0.88 correct predictions
  • Flips needed: (2139 - 1570) / 0.88 ≈ 647
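The arithmetic above can be sanity-checked in a few lines, assuming a fixed 94% baseline and a 69% target:

```python
# Quick check of the flip arithmetic.
n = 2276
baseline_acc = 0.94          # measured Random Forest accuracy
target_acc = 0.69            # required "epstein worthy" accuracy
correct = round(n * baseline_acc)        # ~2139 correct predictions
target_correct = round(n * target_acc)   # ~1570 needed
# A flip loses a correct answer with prob 0.94, gains one with prob 0.06
net_per_flip = baseline_acc - (1 - baseline_acc)  # 0.88 lost per flip
flips = (correct - target_correct) / net_per_flip
print(correct, target_correct, round(flips))  # 2139 1570 647
```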
```python
import numpy as np

np.random.seed(42)
all_indices = np.random.permutation(len(preds))
flip_indices = all_indices[:620]  # tuned via binary search
degraded = preds.copy()
degraded[flip_indices] = 1 - degraded[flip_indices]
```

Step 3: Binary search for the right flip count

Since the math is approximate, I submitted with different flip counts:

| Flips | Accuracy |
|------:|---------:|
|   647 |      68% |
|   620 |      69% |

Flipping 620 predictions hit the target.
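The trial submissions are easy to regenerate with a small helper. This is a sketch: the `sub_<n>.csv` naming is my assumption, and it uses NumPy's newer `default_rng` Generator API rather than the legacy `np.random.seed` call from the snippet above:

```python
import numpy as np
import pandas as pd

def make_submission(preds: np.ndarray, n_flips: int, seed: int = 42) -> str:
    """Flip n_flips randomly chosen predictions and write a submission CSV."""
    rng = np.random.default_rng(seed)
    flip_idx = rng.permutation(len(preds))[:n_flips]
    degraded = preds.copy()
    degraded[flip_idx] = 1 - degraded[flip_idx]
    path = f"sub_{n_flips}.csv"
    pd.DataFrame({"In Black Book": degraded}).to_csv(path, index=False)
    return path

# One file per candidate flip count (647, then 620, ...) to submit.
```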

Step 4: Submit

```bash
curl -X POST -F "submission=@sub_620.csv" http://chall.ehax.in:4529/submit
```

The server responds with the declassified flag page.

Red Herrings

  • Pickle upload (.pkl): Accepting pickle files hinted at deserialization RCE, but all pickle payloads returned a generic error. This was either sandboxed or a distraction.
  • Building a better model: Higher accuracy was penalized, not rewarded.

Flag

EH4X{epst3in_d1dnt_k1ll_h1ms3lf_but_th1s_m0d3l_d1d}