Enhancing Your AI Using Unity Machine Learning Agents

In Unity 2021+

So you’ve got your game going with some awesome artificial intelligence (AI) characters. You have this neat behavior tree set up with some fancy moves. How about taking this up a notch by making your AI learn, adapt, and surprise you with each move you make? Here we shall explore using Unity’s Machine Learning Agent (ML-Agent) to achieve this.


Understanding ML-Agents

Unity ML-Agents is an open-source framework that allows developers to incorporate machine learning capabilities into their games and simulations. It uses reinforcement learning, where agents learn from interactions with the environment to improve their performance over time.

I won’t delve too much into what this magic is all about. But, if you’re interested in finding out more, head over here.

This article serves as a guide to how we introduced ML-Agents to an ongoing project to create a learning environment for our AI.

We will be covering the following in this post:

  1. Installation
  2. Agent Setup
  3. Rules: Observation, Action, Reward
  4. Training
  5. Evaluation
  6. Deployment

Without going into the details of the AI, I’m going to assume the following have already been set up:

  • A scene where I have an AI bot (that I’m going to name botBehaviour, which you’ll see referenced later in our code) that can move around and collect items.
  • The bot takes damage to health and armor over time.
  • There are collectibles spawned in the scene.
  • The collectibles respawn after a fixed duration after they are collected by the bot.
  • The scene is bound so the AI bot will never drop out of the world.


Installation

The following is required:

  • Unity (2021.3+)
  • Python (3.8+)
  • ML-Agents (we’ll be using release 20 here)

Unity Package — ML-Agents

Go ahead and clone ML-Agents from the Unity ML-Agents Toolkit.

Once done, copy the com.unity.ml-agents folder into the Packages folder of your Unity project. You can also add this through the package manager if you prefer.

Python Package — mlagents

The recommendation is to set this up in a virtual environment. I’m setting this up on macOS (refer to this for other system setups).

  1. Create a directory for the virtual environment.
    mkdir ~/python-envs
  2. Create a new virtual environment.
    python3 -m venv ~/python-envs/ml-agents
  3. Activate the virtual environment.
    source ~/python-envs/ml-agents/bin/activate
  4. Upgrade to the latest pip.
    pip3 install --upgrade pip
  5. Upgrade to the latest setup tools.
    pip3 install --upgrade setuptools
  6. Almost but not quite there yet. Next, go ahead and install PyTorch.
    pip3 install torch~=1.9.1 -f https://download.pytorch.org/whl/torch_stable.html
  7. Finally, install mlagents.
    python -m pip install mlagents==0.30.0
  8. The mlagents package seems to run with an older version of protobuf so I had to downgrade my protobuf package. You don’t have to do this if it’s not the case for you.
    pip install --upgrade "protobuf<=3.20.1"

To check if everything is installed correctly, type mlagents-learn --help. If you see a bunch of help usage prompts, then we’re all good to go!

If you require more information regarding the installation, you can refer to this site for help.

Agent Setup

Now that we’ve installed the necessary packages, let’s get some action in.

Take a look at this very awesome behavior tree. Essentially, this bot is randomly trying to pick up collectibles every 5 seconds.

Behavior tree.

We want to create an agent that seeks a particular collectible based on certain conditions instead of randomly looking for any collectible at each turn. Say, if we’re low on health, grab a health collectible instead of the nearest available weapon.

Now let’s create our agent.

  1. Create a new script MLAgent.cs and add the following:


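The script itself isn’t embedded here, but a minimal skeleton, assuming the project’s bot component is the botBehaviour mentioned earlier, might look like this:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class MLAgent : Agent
{
    // Reference to the existing bot behaviour (component name assumed).
    private botBehaviour _bot;

    public override void Initialize()
    {
        _bot = GetComponent<botBehaviour>();
    }

    public override void OnEpisodeBegin() { }

    public override void CollectObservations(VectorSensor sensor) { }

    public override void OnActionReceived(ActionBuffers actions) { }

    public override void Heuristic(in ActionBuffers actionsOut) { }
}
```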
We’ll be filling up the functions in the next few sections.

2. Locate your bot.prefab and attach this script to it.

A Behavior Parameters script will be attached as well. In this script, set the Behavior Name to Collector. Leave the rest as default for now.

Behavior Parameters script

Now we’re ready to start adding some rules to the agent.

Rules: Observation, Action, Reward


Before the agent can make any observations, we want to make sure that it has a target to work towards.

This is essentially what we’re doing in OnEpisodeBegin. We’re telling our agent, “At the start, please look for a random collectible. If you have found one, move towards it and collect it if possible. If not, just move to a random position within range.”
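A minimal sketch of that logic might look like the following, where FindRandomCollectible, MoveTowards, and RandomPositionInRange are hypothetical helpers standing in for the project’s own movement code:

```csharp
public override void OnEpisodeBegin()
{
    // Hypothetical helpers — not part of ML-Agents or the original project.
    var target = FindRandomCollectible();
    if (target != null)
    {
        MoveTowards(target.transform.position); // collect it on arrival
    }
    else
    {
        MoveTowards(RandomPositionInRange()); // no collectible available
    }
}
```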


Observation

Next, we need to identify the data points the agent can use to perceive the environment and learn. Depending on what you’re trying to achieve, these can include position, velocity, distance to target, etc.


Here we will be looking at what type of collectible the bot has collected and how much armor and health it has.
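As a sketch, assuming the bot exposes its health, armour, and last-collected type (the field and property names are illustrative), the observations could be added like so:

```csharp
public override void CollectObservations(VectorSensor sensor)
{
    // Three observations in total — this must match the vector observation
    // Space Size of 3 set in the Behavior Parameters.
    sensor.AddObservation((int)_lastCollectedType); // hypothetical enum field
    sensor.AddObservation(_bot.Health);             // assumed bot property
    sensor.AddObservation(_bot.Armour);             // assumed bot property
}
```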


Action

Actions are the decisions the agent makes to interact with the environment. In your AI behaviour tree, this is the choice you make when you encounter a certain condition.

For us, that would be choosing the next collectible in the behavior tree above.


For our agent, what we want here is for the bot to move to our target location. Once the bot has arrived, it will stay put for a while before trying to find a new target to move towards.

If the bot is no longer alive we want to stop this episode and restart a new episode.
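Assuming a single discrete action branch of size three is configured in the Behavior Parameters (one value per collectible type), the action handler could be sketched as follows; SeekCollectible and CollectibleType are illustrative names, not part of the original project:

```csharp
public override void OnActionReceived(ActionBuffers actions)
{
    // One discrete branch chooses the next collectible type to seek:
    // 0 = health, 1 = armour, 2 = weapon (illustrative mapping).
    int choice = actions.DiscreteActions[0];
    SeekCollectible((CollectibleType)choice);

    // Stop this episode when the bot dies so a new one can begin.
    if (!_bot.IsAlive)
    {
        EndEpisode();
    }
}
```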


Reward

Rewards are an essential aspect of the reinforcement learning process. We assign positive or negative rewards based on the agent’s behaviour and the goals we want to achieve. Depending on how we award the rewards, we can guide the agent towards desired behaviours.

For example, when our bot is low on health, we want to award our bot with a positive reward for collecting a health collectible and a negative reward for doing otherwise.


OnCollected is called when the bot has collected the collectible. Here I’m teaching my agent that if you’re low on health or armour it’s good to get a health or armour collectible respectively. If you collect a weapon during these conditions it’s a boo boo.

I’ve also made sure that every time my bot collects a health or armor collectible, it gets respawned after an interval. In my setup, my bot drops a weapon every time it collects a different one, so there’s no need to respawn any weapon collectibles in this case.
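A sketch of that reward logic, with the thresholds and property names as assumptions:

```csharp
private void OnCollected(CollectibleType type)
{
    bool lowHealth = _bot.Health < LowHealthThreshold; // assumed threshold
    bool lowArmour = _bot.Armour < LowArmourThreshold; // assumed threshold

    if (lowHealth && type == CollectibleType.Health)
    {
        AddReward(1f);  // good: grabbed health while low on health
    }
    else if (lowArmour && type == CollectibleType.Armour)
    {
        AddReward(1f);  // good: grabbed armour while low on armour
    }
    else if ((lowHealth || lowArmour) && type == CollectibleType.Weapon)
    {
        AddReward(-1f); // boo boo: grabbed a weapon when survival mattered
    }
}
```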

Here’s what the final script should look like:


Since there is no combat happening to damage the bot, I’m just going to decrease the health and armor per frame in the update loop. However, I don’t really want to kill my bot at this time. So when it gets low on health, I’ll just heal it back to full health.

Alternatively, you can go ahead and reset the bot when it’s killed, then EndEpisode will be called and a new episode for training will begin. Make sure to reset your bot in OnEpisodeBegin in this case.
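The per-frame drain described above could be sketched like this, with the drain rate and field names as assumptions:

```csharp
private void Update()
{
    // Simulate damage over time, since there's no combat yet.
    _bot.Health -= DrainPerSecond * Time.deltaTime; // assumed rate
    _bot.Armour -= DrainPerSecond * Time.deltaTime;

    // Rather than letting the bot die, heal it back to full so
    // training can continue uninterrupted.
    if (_bot.Health < LowHealthThreshold)
    {
        _bot.Health = MaxHealth;
    }
}
```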

I’ve left the heuristic section blank since my bot is constantly moving towards a target. If you need to test your bot with a given input, you can also extend and implement the Heuristic section.

Hang in there, we’re almost ready.

Remember the behavior parameters on the bot? Set the Space Size to 3. This corresponds to the number of observations we added in CollectObservations.


Training

Now that you have your rules in place, it’s time to head over to the gym to train your agents. Training allows your agent to learn from its interactions with the environment and the rewards it is awarded.

  1. In your Unity project create the following ml_agent_config.yaml file in a config folder — this should be in the root of the project and not in the Assets folder.

    For this example, I’m keeping most of the parameters as default, but tweak them according to your needs. I’m keeping the steps small here, but the more you train your agent the more accurate it becomes. More configuration options can be found here.
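The exact file isn’t reproduced here, but a config consistent with the hyperparameters printed during training would look something like this:

```yaml
behaviors:
  Collector:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 0.0003
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 1
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 50000
    time_horizon: 64
    summary_freq: 10000
```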


2. In your terminal, make sure that the Python environment is activated.

3. Navigate to your Unity project then run the following command:

mlagents-learn config/ml_agent_config.yaml --run-id=Collector

If we’ve done everything right, we should see something like:

[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.

4. Press play in the editor. Now let’s wait…


5. When you see the following result, it means we’re finally done! Yay~

[INFO] Hyperparameters for behavior name Collector:
trainer_type: ppo
hyperparameters:
  batch_size: 1024
  buffer_size: 10240
  learning_rate: 0.0003
  beta: 0.005
  epsilon: 0.2
  lambd: 0.99
  num_epoch: 3
  shared_critic: False
  learning_rate_schedule: linear
  beta_schedule: linear
  epsilon_schedule: linear
network_settings:
  normalize: False
  hidden_units: 256
  num_layers: 1
  vis_encode_type: simple
  memory: None
  goal_conditioning_type: hyper
  deterministic: False
reward_signals:
  extrinsic:
    gamma: 0.99
    strength: 1.0
    network_settings:
      normalize: False
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
      memory: None
      goal_conditioning_type: hyper
      deterministic: False
init_path: None
keep_checkpoints: 5
checkpoint_interval: 50000
max_steps: 50000
time_horizon: 64
summary_freq: 10000
threaded: False
self_play: None
behavioral_cloning: None
[INFO] Collector. Step: 10000. Time Elapsed: 57.982 s. Mean Reward: 1.974. Std of Reward: 1.491. Training.
[INFO] Collector. Step: 20000. Time Elapsed: 134.329 s. Mean Reward: 2.500. Std of Reward: 2.480. Training.
[INFO] Collector. Step: 30000. Time Elapsed: 204.257 s. Mean Reward: 3.150. Std of Reward: 2.798. Training.
[INFO] Collector. Step: 40000. Time Elapsed: 282.698 s. Mean Reward: 0.475. Std of Reward: 2.922. Training.
[INFO] Collector. Step: 50000. Time Elapsed: 357.583 s. Mean Reward: 5.225. Std of Reward: 3.215. Training.
[INFO] Exported results/Collector/Collector/Collector-50064.onnx
[INFO] Copied results/Collector/Collector/Collector-50064.onnx to results/Collector/Collector.onnx.


Evaluation

After training, it’s important to determine whether your agents’ performance is up to standard. Test them in various environments to see if they meet your criteria. If not, fine-tune the rules, then rinse and repeat.


Deployment

Once you’re satisfied with the trained agents’ performance, it’s time to deploy them in your game.

  1. Create a new folder TFModel in Assets/ML-Agents in your Unity project.
  2. Locate your trained model Collector.onnx in results/Collector and copy it to Assets/ML-Agents/TFModel
  3. Select your mlAgent prefab, then assign Collector.onnx to the Model field in the Behavior Parameters.
  4. Set the Inference Device to CPU.

Click play in the editor and watch your newly trained agent go!


There we have it! By throwing in ML-Agents, you’ve made your game a playground for AI evolution. No more predictable, hardcoded scripts — your AI can grow and adapt to its environment, and you’re now creating a dynamic and engaging experience for your players.

Try it for yourself in our in-development battle royale, Mighty Action Heroes.

Don’t forget to follow Mighty Bear Games on Medium for more articles like this, and drop me some claps if you found this article helpful!

Enhancing Your AI Using Unity Machine Learning Agents was originally published in Mighty Bear Games on Medium.