
Zero Agent


The MCTS and neural network powered (Zero) agent is inspired by the work of Google's DeepMind and their AlphaZero board-game playing AI. It uses a Monte Carlo Tree Search algorithm to produce quality actions that are not biased by Actor/Critic estimation errors.
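
To make that idea concrete, here is a minimal, self-contained sketch (not Mathy's implementation; the function and values are illustrative) of how an AlphaZero-style agent turns MCTS visit counts into an action policy, so the chosen action reflects search statistics rather than a single network prediction:

from typing import Dict, List


def mcts_policy(visit_counts: Dict[int, int], temperature: float = 1.0) -> List[float]:
    """Turn per-action MCTS visit counts into action probabilities.

    Because these probabilities come from search statistics rather than the
    raw policy head, a single bad value/policy estimate has less influence.
    """
    actions = sorted(visit_counts)
    counts = [float(visit_counts[a]) for a in actions]
    if temperature == 0.0:
        # Greedy: put all probability mass on the most-visited action.
        best = counts.index(max(counts))
        return [1.0 if i == best else 0.0 for i in range(len(counts))]
    weights = [c ** (1.0 / temperature) for c in counts]
    total = sum(weights)
    return [w / total for w in weights]


# Example: after 100 simulations, three actions were visited 10/75/15 times.
print(mcts_policy({0: 10, 1: 75, 2: 15}))  # -> [0.10, 0.75, 0.15]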

Multiple Process Training

Mathy's Zero agent uses the Python multiprocessing module to train with many copies of the agent at the same time.

For long training runs, this multi-worker approach speeds up example gathering considerably.
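
As a rough illustration of why more workers help (a conceptual sketch only, not Mathy's internals; the gather_episode function below is a made-up stand-in), each worker process plays episodes independently and the resulting examples are merged for training:

import multiprocessing


def gather_episode(seed: int) -> list:
    # Stand-in for one self-play episode that returns training examples.
    return [("observation", "policy_target", "value_target", seed)]


if __name__ == "__main__":
    # Each worker plays episodes independently, so the wall-clock cost of
    # gathering examples drops roughly in proportion to the worker count.
    with multiprocessing.Pool(processes=4) as pool:
        batches = pool.map(gather_episode, range(8))
    examples = [example for batch in batches for example in batch]
    print(f"Gathered {len(examples)} training examples")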

Let's see what this looks like with self-play.

We'll train a zero agent using the Policy/Value model:

Open Example In Colab

#!pip install gym
from mathy.cli import setup_tf_env

setup_tf_env(use_mp=True)

from mathy.agents.zero import SelfPlayConfig, self_play_runner

import shutil
import tempfile

model_folder = tempfile.mkdtemp()
self_play_cfg = SelfPlayConfig(
    # This option is set to allow the script to run quickly.
    # You'll probably want a much larger value during training. 10000?
    max_eps=1,
    # This is set to allow training after the first set of problems.
    # You'll probably want this to be more like 128
    batch_size=1,
    # This is set to only do 2 problems before training. You guessed
    # it, in order to keep things snappy. Try 100.
    self_play_problems=2,
    # This is set to 1 in order to exit after the first gather/training loop.
    training_iterations=1,
    # This is normally larger, try 10
    epochs=1,
    # This is a tiny model, designed to be fast for testing.
    units=32,
    embedding_units=16,
    # The number of MCTS sims directly correlates with finding quality
    # actions. Normally I would set this to something like 100, 250, 500
    # depending on the problem difficulty.
    mcts_sims=2,
    # This can be scaled to however many CPUs you have available. Going
    # higher than your CPU count does not produce good performance usually.
    num_workers=2,
    verbose=True,
    difficulty="easy",
    topics=["poly"],
    model_dir=model_folder,
    print_training=True,
)

self_play_runner(self_play_cfg)
# Comment this out to keep your model
shutil.rmtree(model_folder)

Single Process Training

Multi-process training does not work well with some modern debuggers, such as the one in Visual Studio Code.

Because of this, the Zero agent will use the Python threading module if it is configured to use only one worker.

In this mode you can set breakpoints in the debugger to help diagnose errors.

Open Example In Colab

#!pip install gym
from mathy.cli import setup_tf_env
from mathy.agents.zero import SelfPlayConfig, self_play_runner

import shutil
import tempfile

model_folder = tempfile.mkdtemp()
setup_tf_env()
self_play_cfg = SelfPlayConfig(
    # Setting to 1 worker uses single-threaded implementation
    num_workers=1,
    mcts_sims=3,
    max_eps=1,
    self_play_problems=1,
    training_iterations=1,
    verbose=True,
    topics=["poly-combine"],
    model_dir=model_folder,
    print_training=True,
)

self_play_runner(self_play_cfg)
# Comment this out to keep your model
shutil.rmtree(model_folder)

Performance Profiling

The CLI Zero agent can optionally output performance profiles.

From the CLI, you do this with --profile --num-workers=1.

In Python, you pass profile=True and num_workers=1 to the agent configuration.

Open Example In Colab

#!pip install gym
import os
import shutil
import tempfile

from mathy.agents.zero import SelfPlayConfig, self_play_runner
from mathy.cli import setup_tf_env

model_folder = tempfile.mkdtemp()
args = SelfPlayConfig(
    num_workers=1,
    profile=True,
    model_dir=model_folder,
    # All options below here can be deleted if you're actually training
    max_eps=1,
    self_play_problems=1,
    training_iterations=1,
    epochs=1,
    mcts_sims=3,
)

self_play_runner(args)


assert os.path.isfile(os.path.join(args.model_dir, "worker_0.profile"))

# Comment this out to keep your model
shutil.rmtree(model_folder)

Learn how to view output profiles on the debugging page.
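
For a quick look from Python, and assuming the .profile file is a standard cProfile dump (an assumption here; the debugging page covers the supported workflow), the standard library's pstats module can print a summary:

import pstats

# Path to a profile written by the run above; keep the model folder around
# (skip the shutil.rmtree call) so the file still exists when you read it.
stats = pstats.Stats("worker_0.profile")
stats.sort_stats("cumulative").print_stats(10)  # ten entries, sorted by cumulative time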

