Model

Overview

Mathy uses a single model that predicts which action to take in an environment and estimates the scalar value of the current state.

Model

Mathy's policy/value model takes in a window of observations and outputs a weighted distribution over all the possible actions and value estimates for each observation.

[Model architecture diagram] Four inputs (nodes_in, values_in, type_in, time_in) are each transformed first: an Embedding for nodes_input (to 128 units) and SinusodialRepresentationDense layers for values_input, type_input, and time_input (to 64 units each). The results are concatenated into a single 320-unit feature vector and passed through a SIRENModel trunk that outputs 64 units per node. The trunk's per-node output feeds the policy_head directly, while a Mean over the sequence feeds the value_head and reward_head, which each output a single scalar.
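The data flow above can be traced with a plain NumPy shape walkthrough. This is a sketch, not Mathy's actual implementation: the `dense` helper stands in for the SinusodialRepresentationDense and SIREN layers, and the batch size, sequence length, and vocabulary size are made-up values.

```python
import numpy as np

batch, seq_len, vocab = 1, 12, 512

nodes = np.random.randint(0, vocab, size=(batch, seq_len))  # nodes_in
values = np.random.rand(batch, seq_len, 1)                  # values_in (after ExpandDims)
node_types = np.random.rand(batch, seq_len, 2)              # type_in
times = np.random.rand(batch, seq_len, 1)                   # time_in

# Embedding lookup for the node tokens: (1, 12) -> (1, 12, 128)
embedding_table = np.random.rand(vocab, 128)
nodes_embedded = embedding_table[nodes]

def dense(x: np.ndarray, units: int) -> np.ndarray:
    """Stand-in for a sinusoidal dense projection (hypothetical)."""
    w = np.random.rand(x.shape[-1], units)
    return np.sin(x @ w)

# Concatenate all input vectors: 128 + 64 + 64 + 64 = 320 features per node
combined = np.concatenate(
    [nodes_embedded, dense(values, 64), dense(node_types, 64), dense(times, 64)],
    axis=-1,
)

features = dense(combined, 64)  # stand-in for the SIREN trunk: (1, 12, 64)
pooled = features.mean(axis=1)  # Mean over the sequence: (1, 64)

policy = dense(features, 6)     # policy_head: one distribution per node
value = dense(pooled, 1)        # value_head: scalar state value
reward = dense(pooled, 1)       # reward_head: scalar reward estimate
assert combined.shape == (1, 12, 320)
assert policy.shape == (1, 12, 6)
assert value.shape == (1, 1)
```

The key structural point is that the policy is produced per node (before the mean), while the value and reward estimates are produced from the pooled sequence representation.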

Examples

Call the Model

The simplest thing to do is to load a blank model and pass some data through it. This gives us a sense of how things work:


import tensorflow as tf

from mathy import envs
from mathy.agent.config import AgentConfig
from mathy.agent.model import build_agent_model
from mathy.env import MathyEnv
from mathy.state import MathyObservation, observations_to_window

args = AgentConfig()
env: MathyEnv = envs.PolySimplify()
observation: MathyObservation = env.state_to_observation(env.get_initial_state()[0])
model = build_agent_model(args, predictions=env.action_size)
inputs = observations_to_window([observation]).to_inputs()
policy, value, reward = model.predict(inputs)
# TODO: this is broken until the model is restructured to produce a single output

# The policy output has shape (batch, num_nodes, num_actions)
assert policy.shape == (1, len(observation.nodes), env.action_size)

# The value head outputs a single floating point value estimate
assert value.shape == (1, 1)
assert isinstance(float(value.squeeze()), float)
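Given a policy shaped (batch, num_nodes, num_actions) as asserted above, one way to select an action is to take the argmax over the flattened node/action grid and recover the (node, action) pair. This is a hypothetical sketch with made-up dimensions, not Mathy's agent logic:

```python
import numpy as np

# Hypothetical policy output: batch of 1, 5 nodes, 6 actions per node
policy = np.random.rand(1, 5, 6)

# Flatten the (node, action) grid and pick the highest-scoring pair
flat_index = int(policy[0].argmax())
node_index, action_index = divmod(flat_index, policy.shape[-1])

# node_index tells us which expression node to act on,
# action_index tells us which rule to apply there
assert 0 <= node_index < policy.shape[1]
assert 0 <= action_index < policy.shape[2]
```

Greedy argmax selection is shown for simplicity; a training loop would typically sample from the distribution instead.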

Save Model with Optimizer

Mathy's optimizer is stateful, so it must be saved alongside the model if you want to pause and resume training later. To help with this, Mathy provides the function get_or_create_agent_model.

The helper function handles:

  • Creating a folder if needed to store the model and related files
  • Saving the agent hyperparameters used for training to model.config.json
  • Initializing and sanity checking the model by compiling and calling it with a random observation
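The folder and config handling described above can be sketched with the standard library alone. Note this `get_or_create_model_dir` helper is hypothetical, shown only to illustrate the first two responsibilities; it is not Mathy's actual implementation:

```python
import json
import os
import tempfile

def get_or_create_model_dir(model_dir: str, config: dict) -> str:
    """Create the model folder if needed and persist the training config."""
    os.makedirs(model_dir, exist_ok=True)
    config_path = os.path.join(model_dir, "model.config.json")
    if not os.path.exists(config_path):
        # Save hyperparameters so the same settings can be restored later
        with open(config_path, "w") as f:
            json.dump(config, f, indent=2)
    return config_path

folder = tempfile.mkdtemp()
path = get_or_create_model_dir(folder, {"topics": ["poly"], "units": 4})
assert os.path.exists(path)
```

Persisting the config next to the weights is what lets a later call rebuild the model with identical hyperparameters before restoring the optimizer state.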


#!pip install gym
import shutil
import tempfile

from mathy.agent import A3CAgent, AgentConfig
from mathy.agent.model import get_or_create_agent_model
from mathy.cli import setup_tf_env
from mathy.envs import PolySimplify

model_folder = tempfile.mkdtemp()
setup_tf_env()

args = AgentConfig(
    max_eps=3,
    verbose=True,
    topics=["poly"],
    model_dir=model_folder,
    update_gradients_every=4,
    num_workers=1,
    units=4,
    embedding_units=4,
    lstm_units=4,
    print_training=True,
)
instance = A3CAgent(args)
instance.train()
# Load the model back in
model_two = get_or_create_agent_model(
    config=args, predictions=PolySimplify().action_size, is_main=True
)
# Comment this out to keep your model
shutil.rmtree(model_folder)


Last update: July 24, 2020