Agent

Mathy provides an on-policy learning agent (A3C) that can be trained on a modern desktop CPU using Python's threading APIs.

For some tasks the A3C agent trains quickly, but for more complex tasks it can require long training periods.

Asynchronous Advantage Actor-Critic

Asynchronous Advantage Actor-Critic (A3C) is an algorithm that uses multiple workers to train a shared model. It roughly breaks down like this:

A3C Pseudocode

  1. create a global model that workers will update
  2. create n worker threads and pass the global model to them
  3. [worker loop]
    • create a local model by copying the global model weights
    • take up to update_interval actions or complete an episode
    • apply updates to the local model from gathered data
    • merge local model changes with global model
  4. done
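
Below is a minimal, toy sketch of that loop using Python threads. The list-of-floats "model" and random "gradients" are placeholders for the real network and its updates; this illustrates the coordinator/worker pattern and is not Mathy's implementation.

import random
import threading

# Toy "global model": a list of weights guarded by a lock. In the real agent
# this is a neural network shared by all workers.
global_model = [0.0, 0.0, 0.0]
global_lock = threading.Lock()


def worker(worker_id: int, episodes: int = 3) -> None:
    for _ in range(episodes):
        # 1. copy the global weights into a local model
        with global_lock:
            local_model = list(global_model)
        # 2. take up to update_interval actions and gather (fake) gradients
        gradients = [random.uniform(-0.1, 0.1) for _ in local_model]
        # 3. apply the updates to the local model
        local_model = [w + g for w, g in zip(local_model, gradients)]
        # 4. merge the local changes back into the global model
        with global_lock:
            for i, g in enumerate(gradients):
                global_model[i] += g


threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("global model after training:", global_model)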

The coordinator/worker architecture used by A3C has a few features that stabilize training and allow it to quickly find solutions to some challenging tasks.

Because many workers explore the environment at the same time, the model sees a more diverse set of training inputs, which helps it generalize.

Examples

You can interact with the A3C agent via the CLI or directly through the Python API.

Training

You can import the required pieces and train an A3C agent from your own Python code:

Open Example In Colab

#!pip install gym
from mathy.cli import setup_tf_env
from mathy.agent import A3CAgent, AgentConfig
import shutil
import tempfile

model_folder = tempfile.mkdtemp()
setup_tf_env()

args = AgentConfig(
    max_eps=3,
    verbose=True,
    topics=["poly-combine"],
    model_dir=model_folder,
    num_workers=2,
    print_training=True,
)
A3CAgent(args).train()
# Comment this out to keep your model
shutil.rmtree(model_folder)

Training with the CLI

Once mathy is installed on your system, you can train an agent using the CLI:

mathy train a3c poly output/my_agent --show

Viewing the agent training

You can view the agent's in-episode actions by passing the --show argument to the CLI.

Performance Profiling

The A3C agent accepts a --profile flag on the CLI, or the profile option in AgentConfig when using the API directly.

Open Example In Colab

#!pip install gym
import os
from mathy.cli import setup_tf_env
from mathy.agent import A3CAgent, AgentConfig
import shutil
import tempfile

model_folder = tempfile.mkdtemp()
setup_tf_env()

args = AgentConfig(
    profile=True,
    max_eps=2,
    verbose=True,
    topics=["poly-grouping"],
    model_dir=model_folder,
    num_workers=2,
    print_training=True,
)
A3CAgent(args).train()

assert os.path.isfile(os.path.join(args.model_dir, "worker_0.profile"))
assert os.path.isfile(os.path.join(args.model_dir, "worker_1.profile"))

# Comment this out to keep your model
shutil.rmtree(model_folder)

Learn how to view the output profiles on the debugging page.
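
If the worker profiles are standard cProfile/pstats dumps (an assumption here), Python's built-in pstats module can summarize them; the path below matches the CLI example above.

import pstats

# Load the profile written by worker 0 and show the ten most expensive calls.
stats = pstats.Stats("output/my_agent/worker_0.profile")
stats.sort_stats("cumulative").print_stats(10)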

Multiprocess A3C

One challenging thing about A3C is that all of the workers need to push their gradients to a shared "global" model. TensorFlow doesn't make this easy to do across process boundaries, so the A3C implementation is strictly multi-threaded and has limited ability to scale.

Help Wanted - Parallelizing A3C updates

If you would like to help make the A3C implementation scale using multiprocessing, open an issue here

As a workaround for the inability to use multiprocessing, the agent configuration defines a "worker_wait" hyperparameter: every worker except the main one (worker 0) waits that many milliseconds between each action it attempts. This lets you run more workers than you have cores. The total number of examples gathered may not be greater with this trick, but the diversity of the data should be.
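
For example, you might raise num_workers past your core count and slow the extra workers down with worker_wait. The field names and values below follow the description above and the earlier training examples, so treat this as a sketch rather than a tuned configuration.

from mathy.cli import setup_tf_env
from mathy.agent import A3CAgent, AgentConfig

setup_tf_env()
args = AgentConfig(
    topics=["poly-combine"],
    model_dir="output/my_agent",
    num_workers=6,    # more workers than most desktop CPUs have cores
    worker_wait=250,  # per-action delay for workers other than worker 0 (milliseconds, per the description above)
)
A3CAgent(args).train()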

