
Overview

Mathy includes a framework for building reinforcement learning environments that transform math expressions using a set of user-defined actions.

There are a number of built-in environments aimed at simplifying algebra problems, and generous customization points for creating new ones.

Episodes

Mathy agents interact with environments through sequences of interactions called episodes, which follow a standard RL episode lifecycle:

Episode Pseudocode

  1. set state to an initial state from the environment
  2. while state is not terminal
    • take an action and update state
  3. done
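The loop above can be sketched in plain Python against a toy environment. `ToyEnv` here is a hypothetical stand-in for a Mathy environment, not part of the library; its states simply count down to zero:

```python
import random


class ToyEnv:
    """A hypothetical stand-in environment: states count down to zero."""

    def initial_state(self) -> int:
        return 3

    def is_terminal(self, state: int) -> bool:
        return state <= 0

    def step(self, state: int, action: int) -> int:
        return state - action


env = ToyEnv()
state = env.initial_state()         # 1. set state to an initial state
steps = 0
while not env.is_terminal(state):   # 2. while state is not terminal
    action = random.choice([1])     #    (a real agent would choose here)
    state = env.step(state, action) #    take an action and update state
    steps += 1
print(steps)                        # 3. done -> prints 3
```

A real Mathy episode has the same shape, with expressions as states and tree-transformation rules as actions.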

Extensions

Because algebra problems are only a tiny sliver of what can be represented using math expression trees, Mathy has customization points to allow altering or creating entirely new environments with little effort.

New Problems

Subclassing a base environment and supplying a new problem type is probably the simplest way to create a custom challenge for the agent.

You can inherit from a base environment like Poly Simplify, whose win conditions require that all like terms be combined and all complex terms be simplified. From there you can provide any valid input expression:


from mathy import MathyEnv, MathyEnvProblem, MathyEnvProblemArgs


class CustomSimplifyEnv(MathyEnv):
    def get_env_namespace(self) -> str:
        return "custom.polynomial.simplify"

    def problem_fn(self, params: MathyEnvProblemArgs) -> MathyEnvProblem:
        return MathyEnvProblem("4x + y + 13x", 3, self.get_env_namespace())


env: MathyEnv = CustomSimplifyEnv()
state, problem = env.get_initial_state()
assert problem.text == "4x + y + 13x"
assert problem.complexity == 3

New Actions

Build your own tree transformation actions and use them with the built-in agents:


"""Environment with user-defined actions"""

from mathy_core import (
    AddExpression,
    BaseRule,
    NegateExpression,
    SubtractExpression,
)
from mathy import (
    MathyEnvState,
    envs,
    MathyEnv,
)


class PlusNegationRule(BaseRule):
    """Convert subtract operators to plus negative to allow commuting"""

    @property
    def name(self) -> str:
        return "Plus Negation"

    @property
    def code(self) -> str:
        return "PN"

    def can_apply_to(self, node) -> bool:
        is_sub = isinstance(node, SubtractExpression)
        is_parent_add = isinstance(node.parent, AddExpression)
        return is_sub and (node.parent is None or is_parent_add)

    def apply_to(self, node):
        change = super().apply_to(node)
        change.save_parent()  # connect result to node.parent
        result = AddExpression(node.left, NegateExpression(node.right))
        result.set_changed()  # mark this node as changed for visualization
        return change.done(result)


class CustomActionEnv(envs.PolySimplify):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.rules = MathyEnv.core_rules() + [PlusNegationRule()]


env = CustomActionEnv()

state = MathyEnvState(problem="4x - 2x")
expression = env.parser.parse(state.agent.problem)
action = env.random_action(expression, PlusNegationRule)
out_state, transition, _ = env.get_next_state(state, action)
assert out_state.agent.problem == "4x + -2x"
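The point of this rewrite is that subtraction does not commute while addition does, so converting `a - b` into `a + (-b)` lets commutative rules reorder the terms. A quick numeric check of that rationale, using plain integers as stand-ins for the terms:

```python
a, b = 4, 2  # stand-ins for the "4x" and "2x" terms

# Subtraction is not commutative: swapping operands changes the value.
assert a - b != b - a

# After rewriting to plus-negation, the operands commute freely
# and the value is unchanged.
assert a + (-b) == (-b) + a == a - b
```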

Custom Win Conditions

Environments can implement their own logic for win conditions, or inherit them from a base class:


"""Custom environment with win conditions that are met whenever
two nodes are adjacent to each other that can have the distributive
property applied to factor out a common term """

from typing import Optional
from mathy_core import (
    MathExpression,
    rules,
)
from mathy import (
    MathyEnv,
    MathyEnvState,
    MathyObservation,
    is_terminal_transition,
    time_step,
)


class CustomWinConditions(MathyEnv):
    rule = rules.DistributiveFactorOutRule()

    def transition_fn(
        self,
        env_state: MathyEnvState,
        expression: MathExpression,
        features: MathyObservation,
    ) -> Optional[time_step.TimeStep]:
        # If the rule can find any applicable nodes
        if self.rule.find_node(expression) is not None:
            # Return a terminal transition with reward
            return time_step.termination(features, self.get_win_signal(env_state))
        # Returning None continues the episode using the default
        # transition logic
        return None


env = CustomWinConditions()

# This state is not terminal because none of the nodes can have the distributive
# factoring rule applied to them.
state_one = MathyEnvState(problem="4x + y + 2x")
transition = env.get_state_transition(state_one)
assert is_terminal_transition(transition) is False

# This is a terminal state because the nodes representing "4x + 2x" can
# have the distributive factoring rule applied to them.
state_two = MathyEnvState(problem="4x + 2x + y")
transition = env.get_state_transition(state_two)
assert is_terminal_transition(transition) is True
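The `transition_fn` pattern above boils down to a predicate over the expression: the state is terminal as soon as some pair of nodes admits the rule. A library-free sketch of that shape, where the helper and its string-based term handling are illustrative only, not Mathy APIs:

```python
from typing import List


def has_adjacent_like_terms(terms: List[str]) -> bool:
    """Terminal iff two neighboring terms share a variable part."""

    def var(term: str) -> str:
        # Strip the leading coefficient, keeping the variable part.
        return term.lstrip("0123456789")

    return any(
        var(a) == var(b) and var(a) != ""
        for a, b in zip(terms, terms[1:])
    )


# Mirrors the two states above: "4x + y + 2x" is not terminal because
# the like terms are separated, while "4x + 2x + y" is terminal.
assert has_adjacent_like_terms(["4x", "y", "2x"]) is False
assert has_adjacent_like_terms(["4x", "2x", "y"]) is True
```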

Custom Timestep Rewards

Specify which actions the agent should be rewarded for using and which it should be penalized for:


"""Environment with user-defined rewards per-timestep based on the
rule that was applied by the agent."""

from typing import List, Type

from mathy import MathyEnv, MathyEnvState
from mathy_core import BaseRule, rules


class CustomTimestepRewards(MathyEnv):
    def get_rewarding_actions(self, state: MathyEnvState) -> List[Type[BaseRule]]:
        return [rules.AssociativeSwapRule]

    def get_penalizing_actions(self, state: MathyEnvState) -> List[Type[BaseRule]]:
        return [rules.CommutativeSwapRule]


env = CustomTimestepRewards()
problem = "4x + y + 2x"
expression = env.parser.parse(problem)
state = MathyEnvState(problem=problem)

_, transition, _ = env.get_next_state(
    state, env.random_action(expression, rules.AssociativeSwapRule),
)
# Expect positive reward
assert transition.reward > 0.0

_, transition, _ = env.get_next_state(
    state, env.random_action(expression, rules.CommutativeSwapRule),
)
# Expect negative reward
assert transition.reward < 0.0
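Conceptually, `get_rewarding_actions` and `get_penalizing_actions` define a mapping from the applied rule to the sign of a small per-step reward. A library-free sketch of that shaping logic, where the helper name and the 0.01 magnitude are illustrative assumptions rather than Mathy's actual values:

```python
REWARDING = {"AssociativeSwapRule"}
PENALIZING = {"CommutativeSwapRule"}


def timestep_reward(rule_name: str, magnitude: float = 0.01) -> float:
    """Positive for encouraged rules, negative for discouraged, else zero."""
    if rule_name in REWARDING:
        return magnitude
    if rule_name in PENALIZING:
        return -magnitude
    return 0.0


assert timestep_reward("AssociativeSwapRule") > 0.0
assert timestep_reward("CommutativeSwapRule") < 0.0
assert timestep_reward("DistributiveFactorOutRule") == 0.0
```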

Custom Episode Rewards

Specify (or calculate) custom floating point terminal reward values:


"""Environment with user-defined terminal rewards"""

from mathy import MathyEnvState, envs, is_terminal_transition
from mathy_core.rules import ConstantsSimplifyRule


class CustomEpisodeRewards(envs.PolySimplify):
    def get_win_signal(self, env_state: MathyEnvState) -> float:
        return 20.0

    def get_lose_signal(self, env_state: MathyEnvState) -> float:
        return -20.0


env = CustomEpisodeRewards()

# Win by simplifying constants and yielding a single simple term form
state = MathyEnvState(problem="(4 + 2) * x")
expression = env.parser.parse(state.agent.problem)
action = env.random_action(expression, ConstantsSimplifyRule)
out_state, transition, _ = env.get_next_state(state, action)
assert is_terminal_transition(transition) is True
assert transition.reward == 20.0
assert out_state.agent.problem == "6x"

# Lose by applying a rule with only 1 move remaining
state = MathyEnvState(problem="2x + (4 + 2) + 4x", max_moves=1)
expression = env.parser.parse(state.agent.problem)
action = env.random_action(expression, ConstantsSimplifyRule)
out_state, transition, _ = env.get_next_state(state, action)
assert is_terminal_transition(transition) is True
assert transition.reward == -20.0
assert out_state.agent.problem == "2x + 6 + 4x"

Other Libraries

Mathy has basic support for alternative reinforcement learning libraries.

OpenAI Gym

Mathy supports OpenAI Gym via a small wrapper.

You can import the mathy.envs.gym module separately to register the environments:


#!pip install gym
import gym
import mathy.envs.gym
from mathy.state import MathyObservation

all_envs = gym.envs.registration.registry.all()
# Filter to just mathy registered envs
mathy_envs = [e for e in all_envs if e.id.startswith("mathy-")]

assert len(mathy_envs) > 0

# Each env can be created and produce an initial observation without
# special configuration.
for gym_env_spec in mathy_envs:
    wrapper_env: mathy.envs.gym.MathyGymEnv = gym.make(gym_env_spec.id)
    assert wrapper_env is not None
    observation: MathyObservation = wrapper_env.reset()
    assert isinstance(observation, MathyObservation)
    assert observation is not None

Tensorflow Agents

Mathy does not currently have a TF-agents environment wrapper.

That being said, it may be possible to use it indirectly through the OpenAI Gym wrapper.

Help Wanted

TF-Agents seems like a great library.

If you use it and would like to contribute a PR to add support, it would be very welcome.


Last update: December 25, 2019