mathy.env

MathyEnv

MathyEnv(
    self,
    rules: List[mathy.core.rule.BaseRule] = None,
    max_moves: int = 20,
    verbose: bool = False,
    reward_discount: float = 0.99,
)
Implement a math-solving game where a player wins by executing the right sequence of actions to reduce a math expression to an agreeable basic representation in as few moves as possible.
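
For example, the base class can be constructed directly to inspect its rule set and action space. This is a minimal sketch that relies only on the signatures documented on this page; concrete environments that also generate problems live elsewhere in the package:

from mathy.env import MathyEnv

# Construct the base environment with its default core rules. Concrete
# subclasses supply problem generation through problem_fn; the base class is
# still useful for inspecting rules and the size of the action space.
env = MathyEnv(max_moves=20, verbose=False)
for rule in env.core_rules():
    print(type(rule).__name__)
print(env.action_size)  # documented below as the number of available actions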

action_size

Return the number of available actions

core_rules

MathyEnv.core_rules(
    preferred_term_commute: bool = False,
) -> List[mathy.core.rule.BaseRule]
Return the core rules that mathy agents use as actions

finalize_state

MathyEnv.finalize_state(self, state: mathy.state.MathyEnvState)
Perform final checks on a problem state to ensure the episode yielded results that were not corrupted by transformation errors.

get_action_indices

MathyEnv.get_action_indices(self, action: int) -> Tuple[int, int]
Get the normalized action/node_index values from a given absolute action value.

Returns a tuple of (rule_index, node_index)
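
A quick sketch of decoding an absolute action value into its parts; the action value used here is an arbitrary illustration, not a meaningful move for any particular expression:

from mathy.env import MathyEnv

env = MathyEnv()
action = 7  # arbitrary example value, not tied to a specific expression
rule_index, node_index = env.get_action_indices(action)
print(rule_index, node_index)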

get_agent_actions_count

MathyEnv.get_agent_actions_count(
    self,
    env_state: mathy.state.MathyEnvState,
) -> int
Return the total number of possible actions

get_env_namespace

MathyEnv.get_env_namespace(self) -> str
Return a unique dot-namespaced string representing the current environment, e.g. mycompany.envs.differentiate

get_initial_state

MathyEnv.get_initial_state(
    self,
    params: Optional[mathy.types.MathyEnvProblemArgs] = None,
    print_problem: bool = True,
) -> Tuple[mathy.state.MathyEnvState, mathy.types.MathyEnvProblem]
Generate an initial MathyEnvState for an episode
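
As a sketch, generating a starting state from a concrete environment might look like the following; PolySimplify and its import path are assumptions here, as is the .text field on the returned problem, so check them against your installed version:

from mathy.envs import PolySimplify  # assumed concrete environment and import path

env = PolySimplify()
state, problem = env.get_initial_state(print_problem=False)
print(problem.text)  # assumes MathyEnvProblem exposes the generated problem text as .text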

get_lose_signal

MathyEnv.get_lose_signal(self, env_state: mathy.state.MathyEnvState) -> float
Calculate the reward value for failing to complete the episode. This is done so that the reward signal can be problem-type dependent.

get_next_state

MathyEnv.get_next_state(
    self,
    env_state: mathy.state.MathyEnvState,
    action: int,
    searching: bool = False,
) -> Tuple[mathy.state.MathyEnvState, mathy.time_step.TimeStep, mathy.core.rule.ExpressionChangeRule]

Parameters

  • env_state: current env_state
  • action: action taken
  • searching: boolean set to True when called by MCTS

Returns

next_state: env_state after applying action

transition: the timestep that represents the state transition

change: the change descriptor describing the change that happened
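
Combined with get_valid_moves, this supports a simple random-agent episode loop. A rough sketch, again assuming a PolySimplify environment exists at the import path shown:

import random

from mathy.envs import PolySimplify  # assumed concrete environment and import path

env = PolySimplify()
state, problem = env.get_initial_state(print_problem=False)
for _ in range(20):  # bounded number of moves for this sketch
    mask = env.get_valid_moves(state)  # 0/1 vector over the action space
    valid = [i for i, ok in enumerate(mask) if ok == 1]
    if not valid:
        break
    action = random.choice(valid)
    state, transition, change = env.get_next_state(state, action)
    # transition is a mathy.time_step.TimeStep; its reward field (assumed name)
    # reflects the move and, per get_win_signal/get_lose_signal, scales with
    # how quickly the episode is completed.
    print(transition.reward)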

get_penalizing_actions

MathyEnv.get_penalizing_actions(
    self,
    state: mathy.state.MathyEnvState,
) -> List[Type[mathy.core.rule.BaseRule]]
Get the list of penalizing action types. When these actions are selected, the agent gets a negative reward.

get_rewarding_actions

MathyEnv.get_rewarding_actions(
    self,
    state: mathy.state.MathyEnvState,
) -> List[Type[mathy.core.rule.BaseRule]]
Get the list of rewarding action types. When these actions are selected, the agent gets a positive reward.

get_state_transition

MathyEnv.get_state_transition(
    self,
    env_state: mathy.state.MathyEnvState,
    searching: bool = False,
) -> mathy.time_step.TimeStep
Given an input state, calculate the transition value of the timestep.

Parameters

  • env_state: current env_state
  • searching: True when called by MCTS simulation

Returns

transition: the current state value transition

get_token_at_index

MathyEnv.get_token_at_index(
    self,
    expression: mathy.core.expressions.MathExpression,
    focus_index: int,
) -> Optional[mathy.core.expressions.MathExpression]
Get the token that is focus_index tokens from the left of the expression

get_valid_moves

MathyEnv.get_valid_moves(self, env_state: mathy.state.MathyEnvState) -> List[int]
Get a vector the length of the action space, filled with 0/1 values indicating whether the action at each index is valid for the current state.

get_valid_rules

MathyEnv.get_valid_rules(self, env_state: mathy.state.MathyEnvState) -> List[int]
Get a vector the length of the rule set, filled with 0/1 values indicating whether each rule can be applied to at least one node in the expression.

Note

If you want to get a list of which nodes each rule can be applied to, prefer to use the get_valid_moves method.
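
A small sketch contrasting the two masks, reusing the assumed PolySimplify environment from the earlier examples:

from mathy.envs import PolySimplify  # assumed concrete environment and import path

env = PolySimplify()
state, _problem = env.get_initial_state(print_problem=False)
rule_mask = env.get_valid_rules(state)  # one 0/1 entry per rule
move_mask = env.get_valid_moves(state)  # one 0/1 entry per (rule, node) action
print(sum(rule_mask), "rules apply;", sum(move_mask), "concrete actions available")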

get_win_signal

MathyEnv.get_win_signal(self, env_state: mathy.state.MathyEnvState) -> float
Calculate the reward value for completing the episode. This is done so that the reward signal can be scaled based on the time it took to complete the episode.

max_moves_fn

MathyEnv.max_moves_fn(
    self,
    problem: mathy.types.MathyEnvProblem,
    config: mathy.types.MathyEnvProblemArgs,
) -> int
Return the environment-specific maximum move count for a given problem.

print_state

MathyEnv.print_state(
    self,
    env_state: mathy.state.MathyEnvState,
    action_name: str,
    token_index: int = -1,
    change: mathy.core.rule.ExpressionChangeRule = None,
    change_reward: float = 0.0,
)
Render the given state to stdout for visualization

problem_fn

MathyEnv.problem_fn(
    self,
    params: mathy.types.MathyEnvProblemArgs,
) -> mathy.types.MathyEnvProblem
Return a problem for the environment given a set of parameters to control problem generation.

This is implemented per environment so each environment can generate its own dataset with no required configuration.
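
As an illustration of overriding problem_fn (and get_env_namespace) in a custom environment, here is a hedged sketch; the MathyEnvProblem fields used (problem text, complexity, type) are an assumption about mathy.types and should be checked against your installed version:

from mathy.env import MathyEnv
from mathy.types import MathyEnvProblem, MathyEnvProblemArgs

class FixedProblemEnv(MathyEnv):
    """Toy environment that always emits the same problem text."""

    def get_env_namespace(self) -> str:
        return "mycompany.envs.fixed_problem"

    def problem_fn(self, params: MathyEnvProblemArgs) -> MathyEnvProblem:
        # Assumed field order: problem text, complexity estimate, problem type.
        return MathyEnvProblem("4x + 2x + 7", 2, self.get_env_namespace())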

random_action

MathyEnv.random_action(
    self,
    expression: mathy.core.expressions.MathExpression,
    rule: Type[mathy.core.rule.BaseRule],
) -> int
Get a random action index that represents a particular rule

render_state

MathyEnv.render_state(
    self,
    env_state: mathy.state.MathyEnvState,
    action_name: str,
    token_index: int = -1,
    change: mathy.core.rule.ExpressionChangeRule = None,
    change_reward: float = 0.0,
)
Render the given state to a string suitable for printing to a log

state_to_observation

MathyEnv.state_to_observation(
    self,
    state: mathy.state.MathyEnvState,
    rnn_size: Optional[int] = None,
    rnn_state_h: Optional[List[float]] = None,
    rnn_state_c: Optional[List[float]] = None,
    rnn_history_h: Optional[List[float]] = None,
) -> mathy.state.MathyObservation
Convert an environment state into an observation that can be used by a training agent.
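
A small sketch of producing an observation for a model; the rnn_size value is illustrative only, and the concrete environment is the same assumption as in the earlier examples:

from mathy.envs import PolySimplify  # assumed concrete environment and import path

env = PolySimplify()
state, _problem = env.get_initial_state(print_problem=False)
observation = env.state_to_observation(state, rnn_size=128)  # rnn_size chosen arbitrarily
print(type(observation).__name__)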

to_hash_key

MathyEnv.to_hash_key(self, env_state: mathy.state.MathyEnvState) -> str
Convert env_state to a string key for the MCTS cache
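
For example, the hash key can be used to memoize per-state work such as MCTS statistics; a minimal sketch using the assumed PolySimplify environment:

from mathy.envs import PolySimplify  # assumed concrete environment and import path

env = PolySimplify()
state, _problem = env.get_initial_state(print_problem=False)
cache = {}
key = env.to_hash_key(state)
if key not in cache:
    cache[key] = env.get_valid_moves(state)  # any expensive per-state computation
mask = cache[key]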

transition_fn

MathyEnv.transition_fn(
    self,
    env_state: mathy.state.MathyEnvState,
    expression: mathy.core.expressions.MathExpression,
    features: mathy.state.MathyObservation,
) -> Optional[mathy.time_step.TimeStep]
Provide environment-specific transitions per timestep.