The MCTS and neural network powered (Zero) agent is inspired by Google DeepMind's AlphaZero board-game-playing AI. It uses a Monte Carlo Tree Search algorithm to produce quality actions that are unbiased by things like Actor/Critic errors.
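To give a feel for why the number of search simulations matters, here is a minimal, root-only sketch of the selection step used in tree search (UCB1). It is not Mathy's implementation: the function name `ucb1_search`, the fixed `action_values`, and the `exploration` constant are all assumptions made for illustration, and a real agent would expand a full tree and use a neural network to evaluate positions.

```python
import math


def ucb1_search(action_values, num_sims, exploration=1.4):
    """Run a root-only search and return the visit count of each action.

    `action_values` are hypothetical, fixed rollout rewards in [0, 1]; a real
    agent would obtain them from simulated play or a value network.
    """
    visits = [0] * len(action_values)
    totals = [0.0] * len(action_values)
    for sim in range(1, num_sims + 1):

        def score(action):
            if visits[action] == 0:
                return math.inf  # try every action at least once
            mean = totals[action] / visits[action]
            bonus = exploration * math.sqrt(math.log(sim) / visits[action])
            return mean + bonus

        chosen = max(range(len(action_values)), key=score)
        visits[chosen] += 1
        totals[chosen] += action_values[chosen]
    return visits


visits = ucb1_search([0.2, 0.5, 0.8], num_sims=200)
best_action = max(range(len(visits)), key=visits.__getitem__)
print(visits, best_action)
```

With only two simulations this sketch never tries the third (strongest) action at all, while with 200 simulations most visits concentrate on it. That is the intuition behind using a larger simulation count for harder problems.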
Multiple Process Training
Mathy's Zero agent uses the Python multiprocessing module to train with many copies of the agent at the same time.
For long training runs, this multi-worker approach speeds up example gathering considerably.
Let's see what this looks like with self-play.
We'll train a Zero agent using the Policy/Value model:
    #!pip install gym
    from mathy.cli import setup_tf_env

    setup_tf_env(use_mp=True)

    from mathy.agents.zero import SelfPlayConfig, self_play_runner
    import shutil
    import tempfile

    model_folder = tempfile.mkdtemp()
    self_play_cfg = SelfPlayConfig(
        # This option is set to allow the script to run quickly.
        # You'll probably want a much larger value during training. 10000?
        max_eps=1,
        # This is set to allow training after the first set of problems.
        # You'll probably want this to be more like 128
        batch_size=1,
        # This is set to only do 2 problems before training. You guessed
        # it, in order to keep things snappy. Try 100.
        self_play_problems=2,
        # This is set to 1 in order to exit after the first gather/training loop.
        training_iterations=1,
        # This is normally larger, try 10
        epochs=1,
        # This is a tiny model, designed to be fast for testing.
        units=32,
        embedding_units=16,
        # The number of MCTS sims directly correlates with finding quality
        # actions. Normally I would set this to something like 100, 250, 500
        # depending on the problem difficulty.
        mcts_sims=2,
        # This can be scaled to however many CPUs you have available. Going
        # higher than your CPU count usually does not improve performance.
        num_workers=2,
        verbose=True,
        difficulty="easy",
        topics=["poly-combine"],
        model_dir=model_folder,
        print_training=True,
    )
    self_play_runner(self_play_cfg)
    # Comment this out to keep your model
    shutil.rmtree(model_folder)
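The general idea of gathering examples with several worker processes can be sketched with the standard library alone. This is a generic illustration, not Mathy's code: `gather_examples`, `flatten`, and the fake "episodes" made of random numbers are hypothetical stand-ins for real self-play.

```python
import multiprocessing
import random


def gather_examples(job):
    """Worker body: pretend to play some problems and return the examples.

    Each "example" is a stand-in tuple of (worker_id, problem, random value).
    """
    worker_id, num_problems = job
    rng = random.Random(worker_id)  # per-worker seed so workers differ
    return [(worker_id, problem, rng.random()) for problem in range(num_problems)]


def flatten(per_worker_results):
    """Merge the per-worker lists into one flat list of examples."""
    return [example for examples in per_worker_results for example in examples]


if __name__ == "__main__":
    num_workers = 2
    jobs = [(worker_id, 3) for worker_id in range(num_workers)]
    # Each job runs in its own process; map() returns results in job order.
    with multiprocessing.Pool(processes=num_workers) as pool:
        results = pool.map(gather_examples, jobs)
    print(len(flatten(results)))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module. Without it, each worker would try to start its own pool.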
Single Process Training
Multi-process training does not work well with some modern debuggers, such as the one in Visual Studio Code.
Because of this, the Zero agent will use the Python threading module if it is configured to use only one worker.
In this mode you can set breakpoints in the debugger to help diagnose errors.
    #!pip install gym
    from mathy.cli import setup_tf_env
    from mathy.agents.zero import SelfPlayConfig, self_play_runner
    import shutil
    import tempfile

    model_folder = tempfile.mkdtemp()
    setup_tf_env()

    self_play_cfg = SelfPlayConfig(
        # Setting to 1 worker uses single-threaded implementation
        num_workers=1,
        mcts_sims=3,
        max_eps=1,
        self_play_problems=1,
        training_iterations=1,
        verbose=True,
        topics=["poly-combine"],
        model_dir=model_folder,
        print_training=True,
    )
    self_play_runner(self_play_cfg)
    # Comment this out to keep your model
    shutil.rmtree(model_folder)
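Why a thread-based worker is friendlier to debuggers can be seen in a small standalone sketch. It assumes nothing about Mathy's internals: `worker` and `run_single_worker` are hypothetical names, and the loop body stands in for real episode gathering. Because the thread lives in the same process as the caller, a debugger attached to that process can stop inside the worker.

```python
import queue
import threading


def worker(worker_id, num_problems, results):
    """Stand-in for an episode-gathering loop; puts one item per problem."""
    for problem in range(num_problems):
        # A breakpoint set here is hit by a debugger attached to this process.
        results.put((worker_id, problem))


def run_single_worker(num_problems):
    """Run one worker on a thread of the current process and collect its output."""
    results = queue.Queue()
    thread = threading.Thread(target=worker, args=(0, num_problems, results))
    thread.start()
    thread.join()  # wait for the worker to finish before draining the queue
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


print(run_single_worker(3))
```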
The CLI Zero agent can optionally output performance profiles.
For the CLI you do this with
In Python you pass profile=True and num_workers=1 to the agent configuration.
    #!pip install gym
    import os
    import shutil
    import tempfile

    from mathy.agents.zero import SelfPlayConfig, self_play_runner
    from mathy.cli import setup_tf_env

    model_folder = tempfile.mkdtemp()

    args = SelfPlayConfig(
        num_workers=1,
        profile=True,
        model_dir=model_folder,
        # All options below here can be deleted if you're actually training
        max_eps=1,
        self_play_problems=1,
        training_iterations=1,
        epochs=1,
        mcts_sims=3,
    )

    self_play_runner(args)

    assert os.path.isfile(os.path.join(args.model_dir, "worker_0.profile"))

    # Comment this out to keep your model
    shutil.rmtree(model_folder)
Learn how to view output profiles on the debugging page.
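As a rough illustration of inspecting a profile file from Python, the sketch below writes a profile with the standard library's cProfile module and reads it back with pstats. It assumes the file is in the standard cProfile dump format, which this page does not state. The `busy_work` function and the temporary file are stand-ins, so the snippet runs without Mathy installed.

```python
import cProfile
import os
import pstats
import shutil
import tempfile


def busy_work(n):
    """Stand-in workload so there is something to profile."""
    return sum(i * i for i in range(n))


# Write a profile file, standing in for a worker_0.profile output.
profile_dir = tempfile.mkdtemp()
profile_path = os.path.join(profile_dir, "worker_0.profile")
profiler = cProfile.Profile()
profiler.enable()
busy_work(10_000)
profiler.disable()
profiler.dump_stats(profile_path)

# Load the file and print the ten most expensive calls by cumulative time.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(10)

profile_written = os.path.isfile(profile_path)
shutil.rmtree(profile_dir)  # remove the temporary profile
```

To point this at a real run, replace `profile_path` with the path to the profile in your model folder and skip the profiling and cleanup steps.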