
# Swarm Planning

Mathy can solve problems using a planning-only algorithm that does not require a pretrained model to be installed.

It comes with support for planning using a swarm search algorithm from the fragile library.

The basic idea behind fragile's swarm planning is to simulate many possible action sequences in parallel, then select the most rewarding one to take.

In practice, the swarm planning algorithm can solve almost all of Mathy's environments with little effort.
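To build intuition for the approach, here is a minimal sketch of the core idea, independent of fragile or Mathy: sample many random action sequences ("walkers") from the current state, score each rollout, and take the first action of the most rewarding one. The toy environment, function names, and parameters below are illustrative assumptions, not part of the fragile API.

```python
import random

def rollout(state: int, actions: list) -> int:
    """Toy environment: apply each action (a step of -1, 0, or +1) to the
    state, then reward the walker by how close it ends to a target value."""
    target = 10
    for a in actions:
        state += a
    return -abs(target - state)

def swarm_step(state: int, n_walkers: int = 64, horizon: int = 5) -> int:
    """Sample n_walkers random action sequences, score each one with a
    full rollout, and return the first action of the best sequence."""
    random.seed(0)  # deterministic sampling for reproducibility
    best_reward = float("-inf")
    best_first_action = 0
    for _ in range(n_walkers):
        actions = [random.choice([-1, 0, 1]) for _ in range(horizon)]
        reward = rollout(state, actions)
        if reward > best_reward:
            best_reward = reward
            best_first_action = actions[0]
    return best_first_action
```

fragile's swarm is considerably more sophisticated (walkers share information and are cloned toward promising regions rather than sampled independently), but the select-the-most-rewarding-simulation loop above is the essence of why no pretrained model is needed.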

## Solve Many Tasks

Because the swarm planning algorithm doesn't require training, we can apply it to any task that Mathy exposes and expect reasonable results.

Open Example In Colab

```python
#!pip install gym

import random

import gym

from mathy.agents.fragile import SwarmConfig, swarm_solve
from mathy.envs.gym import MathyGymEnv

# Configure the swarm with a small iteration budget
config = SwarmConfig(max_iters=10)
# Pick a random task type and build an easy problem for it
task = random.choice(["poly", "binomial", "complex"])
env: MathyGymEnv = gym.make(f"mathy-{task}-easy-v0")
_, problem = env.mathy.get_initial_state(env.env_problem_args)
# Solve the problem text with swarm planning
swarm_solve(problem.text, config)
```

## Generate Training Datasets

Fragile has built-in support for generating batched datasets that can be used to train ML models. A basic example goes like this:

Open Example In Colab

```python
#!pip install gym

import random

import gym

from mathy.agents.fragile import SwarmConfig, mathy_swarm
from mathy.envs.gym import MathyGymEnv
from mathy.state import MathyEnvState

# Which values do we want from the history tree?
history_names = ["states", "actions", "rewards"]

# Configure the swarm
swarm = mathy_swarm(SwarmConfig(history=True, history_names=history_names))

# Run the swarm to generate tree history
swarm.run()

# Sample random batches
random_batches = swarm.tree.iterate_nodes_at_random(batch_size=32, names=history_names)
total_set = set()
total_generated = 0
for states, actions, rewards in random_batches:
    texts = [MathyEnvState.from_np(s).agent.problem for s in states]
    total_generated += len(texts)
    total_set.update(set(texts))
best_state = MathyEnvState.from_np(swarm.walkers.states.best_state)
swarm.env._env._env.mathy.print_history(best_state)
print(f"Generated {total_generated} states, {len(total_set)} of which are unique")
print(f"Highest reward encountered: {swarm.walkers.states.best_reward}")
print(best_state.agent.problem)
```


Last update: April 19, 2020