Box game example

In this section we describe an experiment in which the player must make two different, though interrelated, decisions at each round. It corresponds to some as yet unpublished work from the DDMLab, a highly stylized example of a cyber defense task.

Description

In this game the player is given a choice of opening one of two boxes, 1 or 2, one of which contains a reward. An automated defender is defending one of the boxes, but it can defend only one. The player first chooses the box they are considering opening, but does not yet open it. The defender then gives the player feedback, possibly warning that the box is defended; this warning, however, may or may not be truthful. The player then chooses whether or not to open the box. Opening an undefended box that contains the reward earns the player 100 points. Opening a box that is actually defended costs the player 50 points. Not opening a box, or opening an undefended box that does not contain the reward, has no effect on the player’s score.
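
To make the scoring concrete, these rules can be written as a small function. This is just an illustrative restatement of the description above, not code from the model below; the names opened, defended, and has_reward are hypothetical:

    def payoff(opened, defended, has_reward):
        # Scoring rules of the box game as described above (illustrative only).
        if not opened:
            return 0                     # withdrawing never costs anything
        if defended:
            return -50                   # opening a truly defended box
        return 100 if has_reward else 0  # undefended box: reward or nothing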

We consider four conditions for this task. In each of these conditions there is a probability of 0.5 that the reward will be in box 1, and otherwise it will be in box 2. In the first three conditions there is also a probability of 0.5 that a particular box is defended. These three conditions differ in the probability of the player being told that the box is defended.

  • In the “1 way” condition if the box is defended a warning signal is always given (this parameter is referred to as p, and in this case is thus 1); and if the box is not defended there is a 0.5 probability (this parameter is referred to as q, and in this case is 0.5) that the warning signal is nonetheless given.

  • In the “2 way (0.75)” condition if the box is defended the warning signal is issued 75% of the time (p = 0.75); and if the box is not defended it is issued 37.5% of the time (q = 0.375).

  • In the “2 way (0.5)” condition if the box is defended the warning signal is issued 50% of the time (p = 0.5); and if the box is not defended it is issued 25% of the time (q = 0.25).

The fourth condition is a degenerate “control” condition in which no warning signal is ever issued (referred to as the “no signaling” condition).
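
Note that q = p/2 in all three signaling conditions, which has a useful consequence: since a box is defended with prior probability 0.5, Bayes’ rule gives P(defended | warning) = p/(p + q) = 2/3 in every signaling condition; where the conditions differ is in how strongly the absence of a warning indicates safety. A quick stand-alone check of this (not part of the model code):

    # Posterior probability that a box is defended, given the warning signal.
    for name, p, q in [("1 way", 1, 0.5),
                       ("2 way (0.75)", 0.75, 0.375),
                       ("2 way (0.5)", 0.5, 0.25)]:
        # The 0.5 prior cancels out of Bayes' rule here.
        print(name,
              "P(defended | warning) =", round(p / (p + q), 3),
              "P(defended | no warning) =", round((1 - p) / ((1 - p) + (1 - q)), 3))

This prints 2/3 for every condition given a warning, while the posterior given no warning is 0, 0.286, and 0.4 for the three conditions, respectively.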

Code

The source file for this experiment and model can be downloaded. To run it, install the dependencies from the requirements.txt file and then simply call Python on the box_game.py source file. It will take a few minutes to complete, and will then display a graph of the results, assuming matplotlib is correctly configured for your machine.
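
For example (exact commands may vary with your platform and Python installation):

    pip install -r requirements.txt
    python box_game.py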

# Copyright 2019–2025 Carnegie Mellon University
# Example of a two stage decision task modeled using PyIBL.

import click
import csv
from itertools import count
import matplotlib.pyplot as plt
import numpy as np
import pyibl
import random
from tqdm import tqdm

DEFAULT_ROUNDS = 50
DEFAULT_PARTICIPANTS = 1000
DEFAULT_NOISE = 0.25
DEFAULT_DECAY = 0.5
DEFAULT_TEMPERATURE = 1
LOGFILE = "box-game-log.csv"

# Each condition is described by its name, the probability, p, that a warning
# is issued when a box is defended, and the probability, q, that one is issued
# when it is not; p and q are None in the no signaling condition.
CONDITIONS = [{"name": n, "p": p, "q": q} for n, p, q in [("1 way",        1,    0.5),
                                                          ("2 way (0.75)", 0.75, 0.375),
                                                          ("2 way (0.5)",  0.5,  0.25),
                                                          ("no signaling", None, None)]]

def run(condition, rounds=DEFAULT_ROUNDS, participants=DEFAULT_PARTICIPANTS,
        noise=DEFAULT_NOISE, decay=DEFAULT_DECAY, temperature=DEFAULT_TEMPERATURE,
        logwriter=None, progress=None):
    for c in CONDITIONS:
        if c["name"] == condition:
            cond_p = c["p"]
            cond_q = c["q"]
            break
    else:
        raise ValueError(f"Unknown condition {condition}")
    # One agent selects the box, and a second decides whether or not to attack it.
    selection_agent = pyibl.Agent(noise=noise, decay=decay, temperature=temperature)
    attack_agent = pyibl.Agent(attributes=(["attack"] + ([] if cond_p is None else ["warning"])),
                               noise=noise, decay=decay, temperature=temperature)
    successful_attacks = 0
    failed_attacks = 0
    withdrawals = 0
    # Prepopulate withdrawing at its known value of zero, and attacking and box
    # selection with one each of the best and worst possible payoffs.
    if cond_p is None:
        attack_agent.populate([{"attack": False}], 0)
    else:
        attack_agent.populate([{"attack": False, "warning": 0},
                               {"attack": False, "warning": 1}],
                              0)
    for v in [100, -50]:
        selection_agent.populate([0, 1], v)
        if cond_p is None:
            attack_agent.populate([{"attack": True}], v)
        else:
            attack_agent.populate([{"attack": True, "warning": 0},
                                   {"attack": True, "warning": 1}],
                                  v)
    for p in range(participants):
        total = 0
        selection_agent.reset(True)    # True preserves the prepopulated instances
        attack_agent.reset(True)
        for r in range(rounds):
            selected = selection_agent.choose((0, 1))
            covered = random.random() < 0.5
            if cond_p is None:
                attack = attack_agent.choose([{"attack": True},
                                              {"attack": False}])["attack"]
            else:
                if covered:
                    warned = int(random.random() < cond_p)
                else:
                    warned = int(random.random() < cond_q)
                attack = attack_agent.choose([{"attack": True, "warning": warned},
                                              {"attack": False, "warning": warned}])["attack"]
            if not attack:
                withdrawals += 1
                payoff = 0
            elif covered:
                failed_attacks += 1
                payoff = -50
            else:
                successful_attacks += 1
                payoff = 100
            total += payoff
            # Both agents learn from the same payoff.
            attack_agent.respond(payoff)
            selection_agent.respond(payoff)
            logwriter.writerow([condition, p + 1, r + 1, selected,
                                (int(warned) if cond_p is not None else None),
                                int(covered), int(attack), payoff, total])
        if progress:
            progress.update()
    return [n / (participants * rounds)
            for n in [successful_attacks, failed_attacks, withdrawals]]

@click.command()
@click.option("--rounds", "-r", default=DEFAULT_ROUNDS,
              help="number of rounds to play")
@click.option("--participants", "-p", default=DEFAULT_PARTICIPANTS,
              help="number of virtual participants to simulate")
@click.option("--noise", "-n", default=DEFAULT_NOISE,
              help="noise for the two agents")
@click.option("--decay", "-d", default=DEFAULT_DECAY,
              help="decay parameter for the two agents")
@click.option("--temperature", "-t", default=DEFAULT_TEMPERATURE,
              help="blending temperature for the two agents")
def main(rounds, participants, noise, decay, temperature):
    results = {"successful attack": [], "failed attack": [], "withdrew": []}
    colors = ("red", "green", "blue")
    with tqdm(total=(participants * len(CONDITIONS))) as p:
        with open(LOGFILE, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow("Condition,Subject,Trial,Selected,Warning,Covered,Action,Outcome,Cum_Outcome".split(","))
            for c in CONDITIONS:
                cname = c["name"]
                r = run(cname, rounds=rounds, participants=participants,
                        noise=noise, decay=decay, temperature=temperature,
                        logwriter=w, progress=p)
                for k, v in zip(results.keys(), r):
                    results[k].append(round(v, 2))
    # Draw a grouped bar chart of the three outcome fractions for each condition.
    fig, ax = plt.subplots(layout='constrained')
    x = np.arange(len(CONDITIONS))
    wid = 0.25
    for (kind, vals), mult, c in zip(results.items(), count(), colors):
        offset = wid * mult
        rects = ax.bar(x + offset, vals, wid, label=kind, color=c)
        ax.bar_label(rects, padding=3)
    ax.set_xticks(x + wid, [c["name"] for c in CONDITIONS])
    ax.legend(loc="upper left", ncols=3)
    ax.set_ylim(0, 0.6)
    ax.set_title(f"{participants} participants, {rounds} rounds\n"
                 f"noise={noise}, decay={decay}, temperature={temperature}")
    plt.show()


if __name__ == "__main__":
    main()

The heart of the model is the run function, which runs the model for one condition, using a specified number of rounds and virtual participants, as well as the usual IBL parameters. After looking up the p and q values that implement the given condition, it allocates two PyIBL Agents, one for selecting the box to possibly be attacked, and the other for deciding whether or not to attack it. Note that the attributes of the second, attack, Agent differ slightly in the “no signaling” condition, since in that case there is no warning to record. Both agents are prepopulated with instances covering their possible choices, for the attack Agent in combination with whether or not a warning has been issued; the prepopulated values for box selection and for attacking are one each of the best and worst possible results, while withdrawing is prepopulated at its known value of zero.
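
Here is the prepopulate-then-learn cycle that run uses for both of its Agents, distilled into a stand-alone sketch; the option names are invented for illustration, and only Agent, populate, choose, and respond calls that appear in the listing above are used:

    import random
    import pyibl

    agent = pyibl.Agent(noise=0.25, decay=0.5, temperature=1)
    # Seed each option with one best and one worst possible outcome, as run() does.
    for v in [100, -50]:
        agent.populate(["safe", "risky"], v)
    for _ in range(10):
        choice = agent.choose(["safe", "risky"])   # selection by blended values
        payoff = 0 if choice == "safe" else random.choice([100, -50])
        agent.respond(payoff)                      # record the experienced outcome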

At each round the model first selects which box it might attack and then, having seen whether or not a warning was issued, decides whether or not to attack it. Once the resulting payoff is known, both Agents are updated to reflect that payoff.
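
Distilled into a minimal, runnable fragment (shown without warnings, as in the “no signaling” condition, with the agent setup abbreviated from the listing above), one round looks like this:

    import random
    import pyibl

    selection_agent = pyibl.Agent(noise=0.25, decay=0.5, temperature=1)
    attack_agent = pyibl.Agent(attributes=["attack"], noise=0.25, decay=0.5, temperature=1)
    attack_agent.populate([{"attack": False}], 0)
    for v in [100, -50]:
        selection_agent.populate([0, 1], v)
        attack_agent.populate([{"attack": True}], v)

    selected = selection_agent.choose((0, 1))                      # first decision: which box
    attack = attack_agent.choose([{"attack": True},
                                  {"attack": False}])["attack"]    # second decision: attack or not
    covered = random.random() < 0.5
    payoff = 0 if not attack else (-50 if covered else 100)
    attack_agent.respond(payoff)        # both agents are updated with
    selection_agent.respond(payoff)     # the same final payoff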

The main function calls run for each of the conditions, collects their results, and displays a bar graph comparing them. The click module is used to allow running the model with different numbers of rounds and participants, as well as with different IBL parameters, though the default values of all of these are sensible.
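
For instance, to simulate 500 virtual participants playing 100 rounds each with a slightly higher noise value (illustrative values, not a recommendation):

    python box_game.py --rounds 100 --participants 500 --noise 0.3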

Note that additional conditions can be easily added by amending the definition of the CONDITIONS constant.
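
For example, a hypothetical condition in which warnings are always truthful and never spurious could be added by appending one more tuple to the list from which CONDITIONS is built:

    ("always truthful", 1, 0),

The run function needs no other changes, since it derives everything else from the condition’s p and q values, and main will automatically include the new condition in the graph.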

Results

Here is a graph of the results when run with 1,000 participants and 50 rounds each (the default values). When running this yourself the results may differ slightly, since the model is, of course, stochastic, but they should be similar:

_images/box-game-model-results.png

The DDMLab has also run this task, under the same conditions, with an ensemble of human participants, with the following results:

_images/box-game-human-data.png