Mathy provides an on-policy learning agent (A3C) that can be trained on a modern desktop CPU using Python's threading APIs.
For some tasks the A3C agent trains quickly, but more complex tasks can require long training periods.
## Asynchronous Advantage Actor-Critic
Asynchronous Advantage Actor-Critic (A3C) is an algorithm that uses multiple workers to train a shared model. It roughly breaks down like this (a minimal sketch follows the list):
- create a global model that workers will update
- create `n` worker threads and pass the global model to them
- worker loop:
    - create a local model by copying the global model weights
    - take up to `update_interval` actions or complete an episode
    - apply updates to the local model from the gathered data
    - merge the local model changes back into the global model
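To make the loop concrete, here is a minimal, runnable sketch of the coordinator/worker pattern using Python threads. The model, gradients, and environment are stand-ins (plain floats and random numbers), not Mathy's actual classes:

```python
import random
import threading

UPDATE_INTERVAL = 16  # actions gathered before each update
NUM_WORKERS = 2

# The "global model" that all workers update; a single float
# stands in for real network weights.
global_weights = {"w": 0.0}
global_lock = threading.Lock()


def worker(worker_id: int, episodes: int) -> None:
    for _ in range(episodes):
        # 1. Create a local model by copying the global weights.
        with global_lock:
            local_weights = dict(global_weights)
        # 2. Take up to UPDATE_INTERVAL actions; a random number
        #    stands in for the gradient computed from that data.
        gradient = sum(random.uniform(-0.1, 0.1) for _ in range(UPDATE_INTERVAL))
        # 3. Apply the update to the local model.
        local_weights["w"] += gradient
        # 4. Merge the local changes back into the global model
        #    (here, by pushing the gradient to the shared weights).
        with global_lock:
            global_weights["w"] += gradient


threads = [threading.Thread(target=worker, args=(i, 3)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(global_weights)
```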
The coordinator/worker architecture used by A3C has a few features that stabilize training and allow it to quickly find solutions to some challenging tasks.
Using many workers increases the diversity of the training data, which forces the model to make predictions from a more varied set of inputs.
The A3C agent can be used from the CLI or directly via the Python API.
You can import the required bits and train an A3C agent from your own custom Python code:
```python
#!pip install gym
from mathy.cli import setup_tf_env
from mathy.agent import A3CAgent, AgentConfig
import shutil
import tempfile

model_folder = tempfile.mkdtemp()
setup_tf_env()
args = AgentConfig(
    max_eps=3,
    verbose=True,
    topics=["poly-combine"],
    model_dir=model_folder,
    num_workers=2,
    print_training=True,
)
A3CAgent(args).train()

# Comment this out to keep your model
shutil.rmtree(model_folder)
```
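Judging by their names (check `AgentConfig` for the authoritative definitions), `topics` selects the kinds of problems to generate, `num_workers` controls how many worker threads are spawned, `max_eps` bounds how many episodes are played, and `print_training` echoes training progress to the console. `max_eps` is kept tiny here so the example finishes quickly.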
## Training with the CLI
Once mathy is installed on your system, you can train an agent using the CLI:

```bash
mathy train a3c poly output/my_agent --show
```
**Viewing the agent training**

You can view the agent's in-episode actions by providing the `--show` argument when using the CLI.
The A3C agent accepts a `--profile` option from the CLI, or the `profile` config option when the API is used:
```python
#!pip install gym
import os
from mathy.cli import setup_tf_env
from mathy.agent import A3CAgent, AgentConfig
import shutil
import tempfile

model_folder = tempfile.mkdtemp()
setup_tf_env()
args = AgentConfig(
    profile=True,
    max_eps=2,
    verbose=True,
    topics=["poly-grouping"],
    model_dir=model_folder,
    num_workers=2,
    print_training=True,
)
A3CAgent(args).train()
assert os.path.isfile(os.path.join(args.model_dir, "worker_0.profile"))
assert os.path.isfile(os.path.join(args.model_dir, "worker_1.profile"))

# Comment this out to keep your model
shutil.rmtree(model_folder)
```
Learn how to view the output profiles on the debugging page.
One challenging thing about A3C is that all of the workers need to push their gradients to a shared "global" model. TensorFlow doesn't make this easy to do across process boundaries, so the A3C implementation is strictly multi-threaded and has a limited ability to scale.
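To see why threads sidestep this problem while processes do not, here is a minimal illustration (not Mathy code): each child process gets its own copy of the parent's memory, so in-place model updates never reach the coordinator unless weights or gradients are explicitly serialized and shipped back.

```python
import multiprocessing as mp

# A toy "global model"; in the parent process only.
model = {"w": 0.0}


def worker_update() -> None:
    # Each process receives a *copy* of the parent's memory, so this
    # write never reaches the parent's model object.
    model["w"] += 1.0


if __name__ == "__main__":
    p = mp.Process(target=worker_update)
    p.start()
    p.join()
    print(model["w"])  # still 0.0: the child's update was lost
```

A thread performing the same `model["w"] += 1.0` would mutate the one shared object directly, which is why the current implementation stays multi-threaded.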
**Help wanted - parallelizing A3C updates**

If you would like to help make the A3C implementation scale using multiprocessing, open an issue here.
As a workaround for the inability to use multiprocessing, the agent configuration defines a "worker_wait" hyperparameter: each worker other than the main one (worker 0) waits that many milliseconds between each action it takes. This allows you to run more workers than you have cores. The total number of examples gathered may not be greater with this trick, but the diversity of the gathered data should be.
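Following the pattern of the earlier examples, a sketch of oversubscribing the CPU might look like the following. Whether `worker_wait` can be passed directly to `AgentConfig` like this, and a sensible value for it, are assumptions based on the description above:

```python
#!pip install gym
from mathy.cli import setup_tf_env
from mathy.agent import A3CAgent, AgentConfig
import shutil
import tempfile

model_folder = tempfile.mkdtemp()
setup_tf_env()
args = AgentConfig(
    max_eps=2,
    topics=["poly-combine"],
    model_dir=model_folder,
    # More workers than a typical desktop has cores...
    num_workers=8,
    # ...with non-primary workers pausing 250ms between actions.
    # (worker_wait as a direct AgentConfig field is an assumption.)
    worker_wait=250,
    print_training=True,
)
A3CAgent(args).train()
shutil.rmtree(model_folder)
```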