Mathy includes a framework for building reinforcement learning environments that transform math expressions using a set of user-defined actions.
There are a number of built-in environments aimed at simplifying algebra problems, and generous customization points for creating new ones.
Mathy agents interact with environments through sequences of interactions called episodes, which follow a standard RL episode lifecycle (see the code sketch after this list):
- set state to an initial state from the environment
- while state is not terminal:
    - take an action and update state
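
In code, that lifecycle looks roughly like the following minimal sketch. It only uses calls that appear in the examples later on this page, and choosing the first applicable rule is a stand-in for a learned policy, so treat it as illustrative rather than canonical:

```python
from mathy import envs, is_terminal_transition

env = envs.PolySimplify()
# Set state to an initial state from the environment
state, problem = env.get_initial_state()
for _ in range(20):  # cap the episode length for this sketch
    expression = env.parser.parse(state.agent.problem)
    # Choose a rule that can apply somewhere in the tree; this stands in
    # for a learned policy selecting an action.
    applicable = [r for r in env.rules if r.find_node(expression) is not None]
    if not applicable:
        break
    # Take an action and update the state
    action = env.random_action(expression, type(applicable[0]))
    state, transition, _ = env.get_next_state(state, action)
    # Stop once the transition is terminal (win or lose)
    if is_terminal_transition(transition):
        break
```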
Because algebra problems are only a tiny sliver of what math expression trees can represent, Mathy provides customization points for altering existing environments or creating entirely new ones with little effort.
## Custom Problems

Generating a new problem type by subclassing a base environment is probably the simplest way to create a custom challenge for the agent.

You can inherit from a base environment like Poly Simplify, whose win conditions require that all like terms be gone from the expression and that all complex terms be simplified. From there, you can provide any valid input expression:
```python
from mathy import MathyEnv, MathyEnvProblem, MathyEnvProblemArgs


class CustomSimplifyEnv(MathyEnv):
    def get_env_namespace(self) -> str:
        return "custom.polynomial.simplify"

    def problem_fn(self, params: MathyEnvProblemArgs) -> MathyEnvProblem:
        return MathyEnvProblem("4x + y + 13x", 3, self.get_env_namespace())


env: MathyEnv = CustomSimplifyEnv()
state, problem = env.get_initial_state()
assert problem.text == "4x + y + 13x"
assert problem.complexity == 3
```
## Custom Actions

Build your own tree transformation actions and use them with the built-in agents:
"""Environment with user-defined actions""" from mathy import ( AddExpression, BaseRule, MathyEnvState, NegateExpression, SubtractExpression, envs, MathyEnv, ) class PlusNegationRule(BaseRule): """Convert subtract operators to plus negative to allow commuting""" @property def name(self) -> str: return "Plus Negation" @property def code(self) -> str: return "PN" def can_apply_to(self, node) -> bool: is_sub = isinstance(node, SubtractExpression) is_parent_add = isinstance(node.parent, AddExpression) return is_sub and (node.parent is None or is_parent_add) def apply_to(self, node): change = super().apply_to(node) change.save_parent() # connect result to node.parent result = AddExpression(node.left, NegateExpression(node.right)) result.set_changed() # mark this node as changed for visualization return change.done(result) class CustomActionEnv(envs.PolySimplify): def __init__(self, **kwargs): super().__init__(**kwargs) self.rules = MathyEnv.core_rules() + [PlusNegationRule()] env = CustomActionEnv() state = MathyEnvState(problem="4x - 2x") expression = env.parser.parse(state.agent.problem) action = env.random_action(expression, PlusNegationRule) out_state, transition, _ = env.get_next_state(state, action) assert out_state.agent.problem == "4x + -2x"
## Custom Win Conditions
Environments can implement their own logic for win conditions, or inherit them from a base class:
"""Custom environment with win conditions that are met whenever two nodes are adjacent to each other that can have the distributive property applied to factor out a common term """ from typing import Optional from mathy import ( DistributiveFactorOutRule, MathExpression, MathyEnv, MathyEnvState, MathyObservation, is_terminal_transition, time_step, ) class CustomWinConditions(MathyEnv): rule = DistributiveFactorOutRule() def transition_fn( self, env_state: MathyEnvState, expression: MathExpression, features: MathyObservation, ) -> Optional[time_step.TimeStep]: # If the rule can find any applicable nodes if self.rule.find_node(expression) is not None: # Return a terminal transition with reward return time_step.termination(features, self.get_win_signal(env_state)) # None does nothing return None env = CustomWinConditions() # This state is not terminal because none of the nodes can have the distributive # factoring rule applied to them. state_one = MathyEnvState(problem="4x + y + 2x") transition = env.get_state_transition(state_one) assert is_terminal_transition(transition) is False # This is a terminal state because the nodes representing "4x + 2x" can # have the distributive factoring rule applied to them. state_two = MathyEnvState(problem="4x + 2x + y") transition = env.get_state_transition(state_two) assert is_terminal_transition(transition) is True
## Custom Timestep Rewards
Specify which actions the agent should be rewarded for using and which it should be penalized for:
"""Environment with user-defined rewards per-timestep based on the rule that was applied by the agent.""" from typing import List, Type from mathy import BaseRule, MathyEnv, MathyEnvState from mathy.rules import AssociativeSwapRule, CommutativeSwapRule class CustomTimestepRewards(MathyEnv): def get_rewarding_actions(self, state: MathyEnvState) -> List[Type[BaseRule]]: return [AssociativeSwapRule] def get_penalizing_actions(self, state: MathyEnvState) -> List[Type[BaseRule]]: return [CommutativeSwapRule] env = CustomTimestepRewards() problem = "4x + y + 2x" expression = env.parser.parse(problem) state = MathyEnvState(problem=problem) _, transition, _ = env.get_next_state( state, env.random_action(expression, AssociativeSwapRule), ) # Expect positive reward assert transition.reward > 0.0 _, transition, _ = env.get_next_state( state, env.random_action(expression, CommutativeSwapRule), ) # Expect neagative reward assert transition.reward < 0.0
## Custom Episode Rewards
Specify (or calculate) custom floating point terminal reward values:
"""Environment with user-defined terminal rewards""" from mathy import MathyEnvState, envs, is_terminal_transition from mathy.rules import ConstantsSimplifyRule class CustomEpisodeRewards(envs.PolySimplify): def get_win_signal(self, env_state: MathyEnvState) -> float: return 20.0 def get_lose_signal(self, env_state: MathyEnvState) -> float: return -20.0 env = CustomEpisodeRewards() # Win by simplifying constants and yielding a single simple term form state = MathyEnvState(problem="(4 + 2) * x") expression = env.parser.parse(state.agent.problem) action = env.random_action(expression, ConstantsSimplifyRule) out_state, transition, _ = env.get_next_state(state, action) assert is_terminal_transition(transition) is True assert transition.reward == 20.0 assert out_state.agent.problem == "6x" # Lose by applying a rule with only 1 move remaining state = MathyEnvState(problem="2x + (4 + 2) + 4x", max_moves=1) expression = env.parser.parse(state.agent.problem) action = env.random_action(expression, ConstantsSimplifyRule) out_state, transition, _ = env.get_next_state(state, action) assert is_terminal_transition(transition) is True assert transition.reward == -20.0 assert out_state.agent.problem == "2x + 6 + 4x"
## Other Libraries

Mathy has basic support for alternative reinforcement learning libraries.
### OpenAI Gym

Mathy supports OpenAI Gym via a small wrapper. You can import the `mathy.envs.gym` module separately to register the environments:
```python
#!pip install gym
import gym

import mathy.envs.gym
from mathy.state import MathyObservation

all_envs = gym.envs.registration.registry.all()
# Filter to just mathy registered envs
mathy_envs = [e for e in all_envs if e.id.startswith("mathy-")]
assert len(mathy_envs) > 0

# Each env can be created and produce an initial observation without
# special configuration.
for gym_env_spec in mathy_envs:
    wrapper_env: mathy.envs.gym.MathyGymEnv = gym.make(gym_env_spec.id)
    assert wrapper_env is not None
    observation: MathyObservation = wrapper_env.reset()
    assert isinstance(observation, MathyObservation)
    assert observation is not None
```
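
As a follow-up, here is a hypothetical sketch of driving one of the wrapped environments through the standard Gym step loop. The environment id is illustrative (any id from the loop above works), and randomly sampled actions may not always map to valid rule applications:

```python
# Hypothetical sketch: step a wrapped environment with the standard gym API.
import gym

import mathy.envs.gym  # noqa: F401 (registers the mathy-* environments)

env = gym.make("mathy-poly-easy-v0")  # assumption: one of the registered ids
observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # a trained agent would use its policy
    observation, reward, done, info = env.step(action)
env.close()
```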
### TF-Agents

Mathy does not currently have a TF-Agents environment wrapper. That said, it may be possible to use Mathy with TF-Agents indirectly through the OpenAI Gym wrapper, as in the sketch below.
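
Here is an untested, hypothetical sketch of that approach. `suite_gym.load` is part of TF-Agents, but the environment id is illustrative, and Mathy's observation type may need extra adaptation:

```python
# Untested sketch: load a registered Mathy gym environment as a TF-Agents
# PyEnvironment. Assumes tf-agents is installed; the env id is illustrative.
import mathy.envs.gym  # noqa: F401 (registers the mathy-* gym environments)
from tf_agents.environments import suite_gym

tf_env = suite_gym.load("mathy-poly-easy-v0")
time_step = tf_env.reset()
print(time_step.observation)
```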
TF-Agents seems like a great library. If you use it and would like to contribute a PR adding support, it would be very welcome.