# baby_discriminator

## Challenge Overview

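After passing a proof of work, we have to distinguish between two kinds of random vectors 200 times in a row. A single wrong answer ends the session, and 200 correct answers give the flag.

Bit 1: The vector (length 140) is completely random. Every element is an independent, unseeded call to choose_one().
Bit 0: The vector has a hidden structure. Each element v[j] (for j >= 5) is generated by choose_one() with a PRNG seed derived from the previous 5 elements: md5(str(v[j-5:j])). The first 5 elements are unseeded. To make it harder, between 0 and 64 positions (drawn with replacement, so some may repeat) are then overwritten with fresh unseeded values (noise).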

We are provided with a baby-discriminator.py file:

```python
import random
from hashlib import md5, sha256
import secrets
import string
import numpy as np
import sys

try:
    from secret import flag
except ImportError:
    flag = "0ops{this_is_a_test_flag}"

window = 5
total_nums = 20000
vector_size = 140


def proof_of_work():
    challenge = ''.join(secrets.choice(string.ascii_letters + string.digits) for _ in range(8))
    difficulty = 6
    print(f"Proof of Work challenge:")
    print(f"sha256({challenge} + ???) starts with {'0' * difficulty}")

    sys.stdout.write("Enter your answer: ")
    sys.stdout.flush()
    answer = sys.stdin.readline().strip()

    hash_res = sha256((challenge + answer).encode()).hexdigest()
    if hash_res.startswith('0' * difficulty):
        return True
    return False


def choose_one(seed = None):
    p_v = 10 ** np.random.uniform(0, 13, size=total_nums)
    if seed is not None:
        seed_int = int(seed, 16)
        rng = np.random.default_rng(seed_int)
    else:
        rng = np.random.default_rng()

    us = rng.random(total_nums)
    return int(np.argmax(np.log(us) / p_v))


def get_vector(bit):
    if bit == 0:
        v = []
        for _ in range(vector_size):
            seed = md5(str(v[-window:]).encode()).hexdigest() if len(v) >= window else None
            v.append(choose_one(seed))

        to_change = secrets.randbelow(65)
        pos = random.choices(range(vector_size), k=to_change)
        for p in pos:
            v[p] = choose_one()
        return v
    else:
        return [choose_one() for _ in range(vector_size)]

if not proof_of_work():
    print("PoW verification failed!")
    exit()

banner = """
 █████   ███   █████          ████  ████                                        █████                   █████      █████████  ███████████ ███████████
░░███   ░███  ░░███          ░░███ ░░███                                       ░░███                  ███░░░███   ███░░░░░███░█░░░███░░░█░░███░░░░░░█
 ░███   ░███   ░███   ██████  ░███  ░███   ██████   ██████  █████████████      ███████    ██████     ███   ░░███ ███     ░░░ ░   ░███  ░  ░███   █ ░ 
 ░███   ░███   ░███  ███░░███ ░███  ░███  ███░░███ ███░░███░░███░░███░░███    ░░░███░    ███░░███   ░███    ░███░███             ░███     ░███████   
 ░░███  █████  ███  ░███████  ░███  ░███ ░███ ░░░ ░███ ░███ ░███ ░███ ░███      ░███    ░███ ░███   ░███    ░███░███             ░███     ░███░░░█   
  ░░░█████░█████░   ░███░░░   ░███  ░███ ░███  ███░███ ░███ ░███ ░███ ░███      ░███ ███░███ ░███   ░░███   ███ ░░███     ███    ░███     ░███  ░    
    ░░███ ░░███     ░░██████  █████ █████░░██████ ░░██████  █████░███ █████     ░░█████ ░░██████     ░░░█████░   ░░█████████     █████    █████      
     ░░░   ░░░       ░░░░░░  ░░░░░ ░░░░░  ░░░░░░   ░░░░░░  ░░░░░ ░░░ ░░░░░       ░░░░░   ░░░░░░        ░░░░░░     ░░░░░░░░░     ░░░░░    ░░░░░       
"""

print(banner)
print("Are u ready to play the game")

play_times = 200
for i in range(play_times):
    bit = secrets.randbelow(2)
    v = get_vector(bit)
    print("Vector: ", v)
    print("Please tell me the bit of the vector")
    try:
        user_bit = int(input())
    except ValueError:
        print("Invalid input")
        exit()

    if user_bit != bit:
        print("Wrong answer")
        exit()

print("You are a good guesser, the flag is ", flag)
```

## The Generation Process

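The core function is choose_one(seed):

- It draws 20000 weights p_v, log-uniformly distributed between $10^0$ and $10^{13}$.
- It draws 20000 uniform values us in [0, 1).
- It returns the index k that maximizes log(us[k]) / p_v[k].

This argmax is the exponential-race (Efraimidis-Spirakis) trick for weighted sampling: index k is returned with probability p_v[k] / sum(p_v), so in practice the output is one of the heavily weighted indices.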
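To see that the argmax rule really is weighted sampling, here is a small standalone sketch that applies the same rule to a tiny weight vector. The helper weighted_pick and the weights [1, 2, 7] are illustrative assumptions, not part of the challenge.

```python
import numpy as np


def weighted_pick(rng, weights, trials):
    """Apply choose_one()'s rule `trials` times: argmax over k of log(u[k]) / weights[k]."""
    u = rng.random((trials, len(weights)))  # one uniform per (trial, index), like `us`
    return np.argmax(np.log(u) / weights, axis=1)


rng = np.random.default_rng(1)
weights = np.array([1.0, 2.0, 7.0])
picks = weighted_pick(rng, weights, 200_000)
freq = np.bincount(picks, minlength=len(weights)) / len(picks)
print(freq)  # should be close to weights / weights.sum() = [0.1, 0.2, 0.7]
```

With 200000 trials the empirical frequencies should land within about 0.01 of [0.1, 0.2, 0.7]. The challenge's weights span 13 orders of magnitude instead, so its picks concentrate on the largest p_v entries.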

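Crucially:

- p_v is always generated from NumPy's global RNG (np.random.uniform), whose state we don't know.
- us is generated from a local generator, np.random.default_rng(seed). If a seed is provided, us is deterministic and we can reproduce it exactly.

For Bit 0, v[j] is generated with seed = int(md5(str(v[j-5:j]).encode()).hexdigest(), 16). This means that if the previous 5 numbers we see are the ones the server actually hashed (none of them was overwritten by noise), we can reconstruct the exact us array that was used to generate v[j].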
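A minimal sketch of the reconstruction, assuming the values we pass in are the 5 elements preceding v[j] as plain Python ints. The names seed_for and regen_us are hypothetical helpers; the hashing and generator calls mirror the challenge code.

```python
from hashlib import md5

import numpy as np

TOTAL_NUMS = 20000  # total_nums in the challenge


def seed_for(prev):
    """Integer seed the challenge derives from the 5 preceding elements `prev`."""
    # str() of a list of Python ints is "[a, b, c, d, e]", the exact string the server hashes
    return int(md5(str(prev).encode()).hexdigest(), 16)


def regen_us(prev):
    """Recreate the `us` array that choose_one() drew for the element following `prev`."""
    return np.random.default_rng(seed_for(prev)).random(TOTAL_NUMS)


us = regen_us([3, 1, 4, 1, 5])
print(us[:3])
```

Keep the parsed vector as Python ints: str() of a list of NumPy integers does not produce the same string in recent NumPy versions (NumPy 2 prints elements as np.int64(3)), which would silently give a different seed.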

## The Statistical Flaw

Even though we don't know p_v, we know that the winning index v[j] maximizes log(us[k]) / p_v[k].
Since log(us) is negative and p_v is positive, the maximum is the ratio closest to 0. This happens when us[k] is large (close to 1) and p_v[k] is large.

Therefore, the winning index v[j] has an unusually large us[v[j]]: not merely above average, but typically among the very largest values of the whole 20000-element array.
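A rough estimate of how small q gets, under the simplifying assumptions that the $E_i = -\log u_i$ are i.i.d. $\mathrm{Exp}(1)$ and the weights $w_i$ = p_v[i] are fixed. This back-of-the-envelope derivation is an addition, not part of the original solve:

$$
\begin{aligned}
E_i &= -\log u_i \sim \mathrm{Exp}(1), \qquad
k = \arg\max_i \frac{\log u_i}{w_i} = \arg\min_i \frac{E_i}{w_i},\\
T &= \min_i \frac{E_i}{w_i} \sim \mathrm{Exp}(W), \quad W = \sum_i w_i,
\quad \text{so given winner } k:\ E_k = w_k T,\ \ \mathbb{E}[E_k] = \frac{w_k}{W},\\
q &\approx \Pr[E < E_k] = 1 - e^{-E_k} \approx E_k,\\
\mathbb{E}[w_i] &= \frac{10^{13} - 1}{13 \ln 10} \approx 3.3 \times 10^{11},
\qquad W \approx 20000 \cdot 3.3 \times 10^{11} \approx 6.7 \times 10^{15},\\
\mathbb{E}[q] &\lesssim \frac{\max_i w_i}{W} \approx \frac{10^{13}}{6.7 \times 10^{15}} \approx 1.5 \times 10^{-3}.
\end{aligned}
$$

So a genuine v[j] typically lands in roughly the top 0.1% of us, while an unrelated index has q uniform on [0, 1), which is consistent with the 0.001 threshold used below.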

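For each position j, we check whether v[j] looks like it was generated from the seed md5(str(v[j-5:j])):

- We regenerate the us array from that seed.
- We look at us[v[j]]. If the guess is right, it should be one of the largest values in us.
- We compute the rank q: the fraction of us values that are larger than us[v[j]].
  - If v[j] is the true generated value, q will be small (close to 0).
  - If v[j] is random (or was replaced by noise), it is independent of the us array we regenerate, so q is uniformly distributed between 0 and 1. The same holds when one of v[j-5:j] was overwritten: we then hash the wrong window and regenerate an unrelated us. A single noisy position therefore spoils up to 6 consecutive q values (its own and those of the next 5 positions).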
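The following sketch checks this empirically. It copies the challenge's seeded choose_one() into challenge_choose_one(), with a local Generator standing in for the unknown global RNG that produces p_v, and compares q for honest elements against q for unrelated indices. All helper names are hypothetical.

```python
from hashlib import md5

import numpy as np

TOTAL_NUMS = 20000
WINDOW = 5


def regen_us(prev):
    """`us` array the challenge draws for the element following `prev` (5 Python ints)."""
    seed = int(md5(str(prev).encode()).hexdigest(), 16)
    return np.random.default_rng(seed).random(TOTAL_NUMS)


def rank_q(v, j):
    """q for position j: fraction of regenerated us values strictly larger than us[v[j]]."""
    us = regen_us(list(v[j - WINDOW:j]))
    return np.count_nonzero(us > us[v[j]]) / TOTAL_NUMS


def challenge_choose_one(prev, hidden_rng):
    """Mirror of the challenge's seeded choose_one(); hidden_rng plays the unknown global RNG."""
    p_v = 10 ** hidden_rng.uniform(0, 13, size=TOTAL_NUMS)
    us = regen_us(prev)
    return int(np.argmax(np.log(us) / p_v))


hidden = np.random.default_rng(2024)
true_qs, fake_qs = [], []
for _ in range(20):
    prev = [int(x) for x in hidden.integers(TOTAL_NUMS, size=WINDOW)]
    true_v = prev + [challenge_choose_one(prev, hidden)]  # honest Bit 0 element
    fake_v = prev + [int(hidden.integers(TOTAL_NUMS))]     # index independent of this `us`
    true_qs.append(rank_q(true_v, WINDOW))
    fake_qs.append(rank_q(fake_v, WINDOW))
print(np.mean(true_qs), np.mean(fake_qs))
```

The honest q values should cluster near 0 while the unrelated ones spread over [0, 1); any index that is independent of the regenerated us (a Bit 1 element, a noise value, or a value after a corrupted window) behaves like fake_v here.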

## The Solution

We can build a distinguisher based on this q value.

- Calculate q for every position. For each index j from 5 to 139:
  - Calculate the seed: md5(str(v[j-5:j])).
  - Generate us using this seed.
  - Compute q = (count of us > us[v[j]]) / total_count.
- Analyze the distribution of q:
  - Bit 1 (Random): The q values look uniformly distributed on [0, 1).
  - Bit 0 (Structured): Many q values are very small (e.g., < 0.001), corresponding to the positions whose value and window were not touched by noise. Because noise wipes out blocks of consecutive positions, the surviving small q values tend to sit next to each other.
- Classify. We can use a simple heuristic or a trained classifier.
  - Heuristic: If we see a "run" of very small q values (e.g., 2 consecutive q < 0.001), it's almost certainly Bit 0. For a random vector the chance of this at a given pair of positions is $10^{-6}$, so even over all 134 adjacent pairs it stays around $10^{-4}$.
  - Classifier: The provided solver uses a logistic regression model trained on features such as "mean of the smallest 5 qs" and "count of q < 0.001".
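A condensed sketch of the whole distinguisher. q_values, has_run, features, guess_bit and simulate_bit0 are hypothetical names; the run heuristic follows the text, but the fallback threshold (count_small >= 3) is an untuned placeholder, not the logistic regression model the solver actually uses. simulate_bit0 builds a noise-free Bit 0 vector for testing only.

```python
from hashlib import md5

import numpy as np

TOTAL_NUMS, WINDOW, SMALL = 20000, 5, 1e-3


def regen_us(prev):
    """`us` array the challenge draws for the element following `prev` (Python ints)."""
    seed = int(md5(str(list(prev)).encode()).hexdigest(), 16)
    return np.random.default_rng(seed).random(TOTAL_NUMS)


def q_values(v):
    """q for every position j >= WINDOW, assuming the observed window is what was hashed."""
    qs = []
    for j in range(WINDOW, len(v)):
        us = regen_us(v[j - WINDOW:j])
        qs.append(np.count_nonzero(us > us[v[j]]) / TOTAL_NUMS)
    return np.array(qs)


def has_run(qs, length=2):
    """True if at least `length` consecutive q values are below SMALL."""
    streak = 0
    for small in qs < SMALL:
        streak = streak + 1 if small else 0
        if streak >= length:
            return True
    return False


def features(qs):
    """Two of the summary statistics named above (illustrative, not the solver's full set)."""
    s = np.sort(qs)
    return {"mean_smallest_5": float(s[:5].mean()),
            "count_small": int(np.count_nonzero(qs < SMALL))}


def guess_bit(v):
    """Run heuristic first; otherwise an untuned placeholder instead of the trained model."""
    qs = q_values(v)
    if has_run(qs):
        return 0
    # Placeholder rule (assumption): several isolated tiny q values still suggest Bit 0
    return 0 if features(qs)["count_small"] >= 3 else 1


def simulate_bit0(rng, size=140):
    """Noise-free Bit 0 vector in the style of the challenge, for testing only."""
    v = [int(x) for x in rng.integers(TOTAL_NUMS, size=WINDOW)]  # unseeded prefix (simplified)
    while len(v) < size:
        p_v = 10 ** rng.uniform(0, 13, size=TOTAL_NUMS)
        v.append(int(np.argmax(np.log(regen_us(v[-WINDOW:])) / p_v)))
    return v


rng = np.random.default_rng(7)
bit0_vec = simulate_bit0(rng)
bit1_vec = [int(x) for x in rng.integers(TOTAL_NUMS, size=140)]
print(guess_bit(bit0_vec), guess_bit(bit1_vec))
```

The hard cases are heavily noised Bit 0 vectors where few or no windows survive; that is where a trained model on the sorted q statistics, rather than this placeholder threshold, earns its keep.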

## Implementation Details

The solver script solve.py implements this:

- Solves the PoW.
- For each round:
  - Computes q values for the vector.
  - Extracts features (sorted q statistics).
  - Checks for a "run" of small qs (strong signal).
  - If there is no run, falls back to the logistic regression score.
  - Sends the guess.
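The two I/O pieces around the distinguisher can be sketched as follows, assuming the server output format from the source above (print("Vector: ", v) yields a line like Vector:  [a, b, ...] with two spaces). solve_pow and parse_vector are hypothetical helpers, and the alphanumeric alphabet for the PoW answer is an arbitrary choice, since the server accepts any string that survives strip().

```python
import ast
import itertools
import string
from hashlib import sha256

ALPHABET = string.ascii_letters + string.digits


def solve_pow(challenge, difficulty=6):
    """Brute-force a suffix so that sha256(challenge + suffix) starts with `difficulty` hex zeros."""
    target = "0" * difficulty
    for length in itertools.count(1):
        for combo in itertools.product(ALPHABET, repeat=length):
            answer = "".join(combo)
            if sha256((challenge + answer).encode()).hexdigest().startswith(target):
                return answer


def parse_vector(line):
    """Turn the server line 'Vector:  [a, b, ...]' back into a list of Python ints."""
    return [int(x) for x in ast.literal_eval(line.split(":", 1)[1].strip())]


print(parse_vector("Vector:  [1, 2, 3]"))
```

With difficulty 6 the search needs about $16^6 \approx 1.7 \times 10^7$ hashes on average, which is slow in pure Python but only happens once per connection. Returning Python ints from parse_vector matters, because the seed is computed from str() of the window.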

Flag: 0ops{34sy_st4tistics_g@me_Thou_@rt_more_lovely_and_more_temperate}

0CTF - baby discriminator