Epstein Files

EH4X CTF, by smothy

Epstein Files - 366 pts (Web/ML)

Author: skjeks
CTF: EH4X CTF
URL: http://chall.ehax.in:4529/

Challenge Description

> Have you ever wondered that if there can be a model which can predict that if a person was in epstein files or not? For that we are conducting an epstein file ai model competition but your model to be actually epstein worthy you need to get accuracy of 0.69. Proof that your model is epstein worthy.

Recon

The challenge presents a Kaggle-style ML competition page. It provides:

  • train.csv (1516 rows) — Names with features (Category, Bio, Aliases, Flights, Documents, Connections, Nationality) and a binary target In Black Book
  • test.csv (2276 rows) — Same features, no target column
  • sample_sub.csv — Example submission format (just the In Black Book column)

The submission form accepts .csv or .pkl files.
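To get a feel for the format, here is a minimal sketch of a valid CSV submission, assuming the `sample_sub.csv` layout described above (a single `In Black Book` column with one 0/1 prediction per `test.csv` row):

```python
import pandas as pd

# Minimal sketch of the submission format: one 0/1 prediction per
# test.csv row (2276 rows) in a single "In Black Book" column.
preds = [0] * 2276  # placeholder all-zero predictions
sub = pd.DataFrame({"In Black Book": preds})
sub.to_csv("sub.csv", index=False)
print(sub.shape)  # (2276, 1)
```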

The Trap: Being Too Good

The natural instinct is to build the best model possible. A simple Random Forest with basic feature engineering achieves 94% accuracy on the test set. But the server responds:

> STATUS // SUB-OPTIMAL
> Directive incomplete. The submitted model fails to meet the strict unredaction threshold.

94% is "sub-optimal"? The key realization: the challenge doesn't want accuracy >= 0.69, it wants accuracy == 0.69. The entire challenge is themed around the number 69 (the challenge ID on CTFd was literally 69).

Solution

Step 1: Train a strong baseline model

```python
from sklearn.ensemble import RandomForestClassifier

# Feature engineering: numeric cols, category encoding, bio keywords, etc.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)  # 94% accuracy
```
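The feature engineering elided in the comment above might look something like this. This is a sketch only: the writeup just says "numeric cols, category encoding, bio keywords", so the exact column handling and the keyword list are my assumptions, based on the columns listed in Recon:

```python
import pandas as pd

def build_features(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical feature recipe; the author's exact one isn't shown.
    out = pd.DataFrame(index=df.index)
    # Numeric columns pass through, with missing values filled as 0
    for col in ["Flights", "Documents", "Connections"]:
        out[col] = pd.to_numeric(df[col], errors="coerce").fillna(0)
    # Integer-encode the categorical columns
    for col in ["Category", "Nationality"]:
        out[col] = df[col].astype("category").cat.codes
    # Simple keyword flags from the Bio text (keywords are guesses)
    bio = df["Bio"].fillna("").str.lower()
    for kw in ["island", "flight", "donor"]:
        out[f"bio_{kw}"] = bio.str.contains(kw).astype(int)
    return out
```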

Step 2: Degrade accuracy to exactly 0.69

With 94% accuracy (~2139/2276 correct), we need ~1570/2276 correct (69%). The strategy: randomly flip predictions to introduce controlled errors.

The math for random flipping:

  • Each flip has ~94% chance of turning a correct prediction wrong
  • Each flip has ~6% chance of turning a wrong prediction correct
  • Net change per flip: -0.88 correct predictions
  • Flips needed: (2139 - 1570) / 0.88 ≈ 647
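The arithmetic above can be sanity-checked in a few lines, assuming a fixed 94% baseline and a 69% target:

```python
# Quick check of the flip arithmetic.
n = 2276
baseline_acc = 0.94          # measured Random Forest accuracy
target_acc = 0.69            # required "epstein worthy" accuracy
correct = round(n * baseline_acc)        # ~2139 correct predictions
target_correct = round(n * target_acc)   # ~1570 needed
# A flip loses a correct answer with prob 0.94, gains one with prob 0.06
net_per_flip = baseline_acc - (1 - baseline_acc)  # 0.88 lost per flip
flips = (correct - target_correct) / net_per_flip
print(correct, target_correct, round(flips))  # 2139 1570 647
```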
```python
import numpy as np

np.random.seed(42)
all_indices = np.random.permutation(len(preds))
flip_indices = all_indices[:620]  # tuned via binary search
degraded = preds.copy()
degraded[flip_indices] = 1 - degraded[flip_indices]
```

Step 3: Binary search for the right flip count

Since the math is approximate, I submitted with different flip counts:

| Flips | Accuracy |
|------:|---------:|
|   647 |      68% |
|   620 |      69% |

Flipping 620 predictions hit the target.
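The trial submissions are easy to regenerate with a small helper. This is a sketch: the `sub_<n>.csv` naming is my assumption, and it uses NumPy's newer `default_rng` Generator API rather than the legacy `np.random.seed` call from the snippet above:

```python
import numpy as np
import pandas as pd

def make_submission(preds: np.ndarray, n_flips: int, seed: int = 42) -> str:
    """Flip n_flips randomly chosen predictions and write a submission CSV."""
    rng = np.random.default_rng(seed)
    flip_idx = rng.permutation(len(preds))[:n_flips]
    degraded = preds.copy()
    degraded[flip_idx] = 1 - degraded[flip_idx]
    path = f"sub_{n_flips}.csv"
    pd.DataFrame({"In Black Book": degraded}).to_csv(path, index=False)
    return path

# One file per candidate flip count (647, then 620, ...) to submit.
```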

Step 4: Submit

```bash
curl -X POST -F "submission=@sub_620.csv" http://chall.ehax.in:4529/submit
```

The server responds with the declassified flag page.

Red Herrings

  • Pickle upload (.pkl): Accepting pickle files hinted at deserialization RCE, but all pickle payloads returned a generic error. This was either sandboxed or a distraction.
  • Building a better model: Higher accuracy was penalized, not rewarded.

Flag

EH4X{epst3in_d1dnt_k1ll_h1ms3lf_but_th1s_m0d3l_d1d}