Swarm Planning

Mathy can solve problems using a swarm-planning algorithm from the fragile library that does not require a pre-trained model.

The basic idea behind fragile's swarm planning is to simulate many different possible actions simultaneously, then select the most rewarding one to take.

In practice, the swarm-planning algorithm can solve almost all of the Mathy environments with little effort.
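To make the core idea concrete, here is a toy, pure-Python sketch. It is not the fragile API: the actions, reward, and function names below are invented for illustration, and where fragile scores rollouts with many random walkers, this sketch simply enumerates every rollout of a fixed depth and keeps the first action of the best one.

```python
from itertools import product

# Two toy actions over integers; the "problem" is to reach a target number.
ACTIONS = {"add_one": lambda n: n + 1, "double": lambda n: n * 2}


def best_first_action(state: int, target: int, depth: int) -> str:
    """Simulate every action sequence of length `depth`, score each final
    state by closeness to the target, and return the first action of the
    best-scoring rollout."""
    best_score, best_action = None, None
    for seq in product(ACTIONS, repeat=depth):
        s = state
        for name in seq:
            s = ACTIONS[name](s)
        score = -abs(target - s)  # higher is better; 0 means solved
        if best_score is None or score > best_score:
            best_score, best_action = score, seq[0]
    return best_action


# From 3, the rollout double -> double -> double reaches 24 exactly,
# so planning selects "double" as the first move.
print(best_first_action(3, 24, depth=3))
```

A real planner would take that one action, observe the new state, and plan again; fragile also resamples promising walkers instead of brute-forcing every sequence.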

Solve Many Tasks

Because the swarm planning algorithm doesn't require training, we can apply it to any task that Mathy exposes and expect to see a decent result.

#!pip install gym

import random

import gym
from mathy.solver import SwarmConfig, swarm_solve
from mathy_envs.gym import MathyGymEnv

# Limit swarm iterations so the example finishes quickly
config = SwarmConfig(max_iters=10)

# Pick a random task type and build the matching gym environment
task = random.choice(["poly", "binomial", "complex"])
env: MathyGymEnv = gym.make(f"mathy-{task}-easy-v0")

# Generate a problem for the environment, then solve it with the swarm
_, problem = env.mathy.get_initial_state(env.env_problem_args)
swarm_solve(problem.text, config)

Generate Training Datasets

Fragile has built-in support for generating batched datasets for training ML models. A basic example goes like this:

#!pip install gym

from mathy.solver import SwarmConfig, mathy_swarm
from mathy_envs.state import MathyEnvState

# Which values do we want from the history tree?
history_names = ["states", "actions", "rewards"]

# Configure the swarm
swarm = mathy_swarm(SwarmConfig(history=True, history_names=history_names))

# Run the swarm to generate tree history
swarm.run()

# Sample random batches
random_batches = swarm.tree.iterate_nodes_at_random(batch_size=32, names=history_names)
total_set = set()
total_generated = 0
for states, actions, rewards in random_batches:
    texts = [MathyEnvState.from_np(s).agent.problem for s in states]
    total_generated += len(texts)
    total_set.update(texts)

best_state = MathyEnvState.from_np(swarm.walkers.states.best_state)
print(f"Generated {total_generated} states, {len(total_set)} of which are unique")
print(f"Best problem found: {best_state.agent.problem}")
print(f"Highest reward encountered: {swarm.walkers.states.best_reward}")
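The batch bookkeeping in the loop above can be exercised without mathy or fragile installed. In this sketch, `iterate_batches` is an invented stand-in for `swarm.tree.iterate_nodes_at_random`, and the duplicate-heavy list of problem texts stands in for the decoded states:

```python
import random


def iterate_batches(samples, batch_size, seed=0):
    # Stand-in for swarm.tree.iterate_nodes_at_random: yield randomly
    # sampled batches (with replacement) until roughly every sample
    # has been drawn once.
    rng = random.Random(seed)
    for _ in range(len(samples) // batch_size):
        yield rng.choices(samples, k=batch_size)


# Duplicate-heavy "problem texts", as a swarm history typically contains
texts = [f"{k}x + {k}x" for k in (1, 2, 3) for _ in range(40)]

total_set = set()
total_generated = 0
for batch in iterate_batches(texts, batch_size=32):
    total_generated += len(batch)
    total_set.update(batch)  # accumulate unique problems across batches

print(f"Generated {total_generated} states, {len(total_set)} of which are unique")
```

Counting totals with a list and uniques with a set is the same pattern the dataset example uses: sampling with replacement means the same state can appear in many batches, so the unique count is what matters for dataset size.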

Last update: November 22, 2020