Epstein Files - 366 pts (Web/ML)
Author: skjeks
CTF: EH4X CTF
URL: http://chall.ehax.in:4529/
Challenge Description
Have you ever wondered that if there can be a model which can predict that if a person was in epstein files or not? For that we are conducting an epstein file ai model competition but your model to be actually epstein worthy you need to get accuracy of 0.69. Proof that your model is epstein worthy.
Recon
The challenge presents a Kaggle-style ML competition page. It provides:
- train.csv (1516 rows) — Names with features (Category, Bio, Aliases, Flights, Documents, Connections, Nationality) and a binary target, `In Black Book`
- test.csv (2276 rows) — Same features, no target column
- sample_sub.csv — Example submission format (just the `In Black Book` column)
The submission form accepts .csv or .pkl files.
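A minimal .csv submission can be written with the standard csv module. This is a sketch: the single `In Black Book` column follows the sample_sub.csv description, while the filename and the placeholder predictions are assumptions.

```python
import csv

# Sketch of a submission writer. The single "In Black Book" column follows
# the sample_sub.csv description; the 0/1 values here are placeholders.
preds = [1, 0, 1]  # stand-in for the 2276 test-set predictions
with open("submission.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["In Black Book"])
    writer.writerows([p] for p in preds)
```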
The Trap: Being Too Good
The natural instinct is to build the best model possible. A simple Random Forest with basic feature engineering achieves 94% accuracy on the test set. But the server responds:
STATUS // SUB-OPTIMAL
Directive incomplete. The submitted model fails to meet the strict unredaction threshold.
94% is "sub-optimal"? The key realization: the challenge doesn't want accuracy >= 0.69, it wants accuracy == 0.69. The entire challenge is themed around the number 69 (the challenge ID on CTFd was literally 69).
Solution
Step 1: Train a strong baseline model
from sklearn.ensemble import RandomForestClassifier
# Feature engineering: numeric cols, category encoding, bio keywords, etc.
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)  # ~94% accuracy
Step 2: Degrade accuracy to exactly 0.69
With 94% accuracy (~2139/2276 correct), we need ~1570/2276 correct (69%). The strategy: randomly flip predictions to introduce controlled errors.
The math for random flipping:
- Each flip has a ~94% chance of turning a correct prediction wrong
- Each flip has a ~6% chance of turning a wrong prediction correct
- Net change per flip: 0.94 − 0.06 = −0.88 correct predictions
- Flips needed: (2139 − 1570) / 0.88 ≈ 647
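The arithmetic above can be checked directly; the counts (2139 correct out of 2276 at the 94% baseline) are taken from the writeup.

```python
# Worked check of the flip arithmetic (counts taken from the writeup).
total = 2276
correct = 2139                         # ~94% baseline accuracy
target_correct = round(0.69 * total)   # exactly 69% of the test set

# A flip hits a correct prediction with p ~= 0.94 and a wrong one with
# p ~= 0.06, so each flip costs about 0.94 - 0.06 = 0.88 correct answers.
net_loss_per_flip = 0.94 - 0.06
flips_needed = (correct - target_correct) / net_loss_per_flip
print(target_correct, round(flips_needed))  # -> 1570 647
```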
import numpy as np

np.random.seed(42)
all_indices = np.random.permutation(len(preds))
flip_indices = all_indices[:620]  # tuned via binary search
degraded = preds.copy()
degraded[flip_indices] = 1 - degraded[flip_indices]
Step 3: Binary search for the right flip count
Since the math is approximate, I submitted with different flip counts:
| Flips | Accuracy |
|---|---|
| 647 | 68% |
| 620 | 69% |
Flipping 620 predictions hit the target.
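The probing loop can be packaged as a small helper. `degrade` mirrors the flipping code above; the stand-in predictions are an assumption (the real ones come from the Random Forest), and in practice each candidate count is written out and uploaded, with the server's reported accuracy narrowing the range.

```python
import numpy as np

def degrade(preds: np.ndarray, n_flips: int, seed: int = 42) -> np.ndarray:
    """Flip n_flips randomly chosen binary predictions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(preds))[:n_flips]
    out = preds.copy()
    out[idx] = 1 - out[idx]
    return out

# One degraded prediction set per candidate flip count; the server's
# reported accuracy after each upload narrows the search range.
preds = np.random.default_rng(0).integers(0, 2, 2276)  # stand-in predictions
for n in (647, 620):
    degraded = degrade(preds, n)
    print(n, (degraded != preds).sum())  # every flip changes a 0/1 value
```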
Step 4: Submit
curl -X POST -F "submission=@sub_620.csv" http://chall.ehax.in:4529/submit
The server responds with the declassified flag page.
Red Herrings
- Pickle upload (.pkl): Accepting pickle files hinted at deserialization RCE, but all pickle payloads returned a generic error. This was either sandboxed or a distraction.
- Building a better model: Higher accuracy was penalized, not rewarded.
Flag
EH4X{epst3in_d1dnt_k1ll_h1ms3lf_but_th1s_m0d3l_d1d}