Monday, March 12, 2018

Using Unity's machine learning to teach bots to play my game Frog Smashers

Ever since it was announced, I've been very excited to try out the new ML library in Unity, but I hadn't had time until Game Jam Island. This past week I've been setting it up and doing some experiments. I wanted to see whether it can be used in a real-world game development scenario, so I set about trying to create some bots for Frog Smashers.

Frog Smashers is a simple local multiplayer versus game (made during the first Game Jam Island) in which frogs try to knock each other off the screen with baseball bats. It's something of a mixture between Samurai Gunn and Nintendo's Smash series. It was a good choice because its mechanics are familiar, the project is small enough to be malleable, and the game is simple enough that I'd expect good results from machine-learnt bots.

If you want more information on how to set up machine learning in Unity or how it works, I recommend starting here.

Here is a video of the end result, at about 3 million steps of training.


The first step was to change all game logic to run in FixedUpdate, so the game can run at the massive timescales required to train within a reasonable amount of time. This might be a problem for larger or different games, and I'm not quite sure how to address it; Unity would need an engine-level feature for faking Time.deltaTime. Luckily the project was small enough that this wasn't a big issue.
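
To make that concrete, here's roughly what the pattern looks like (an illustrative sketch, not the actual Frog Smashers code): all per-frame simulation moves into FixedUpdate so it advances by fixed physics steps, and training then just cranks up Time.timeScale.

    // Sketch: game logic driven from FixedUpdate so it still behaves
    // correctly when the simulation is sped up for training.
    using UnityEngine;

    public class FrogMovement : MonoBehaviour
    {
        public float moveSpeed = 5f;
        private Rigidbody2D body;

        void Awake()
        {
            body = GetComponent<Rigidbody2D>();
        }

        // All simulation logic lives here instead of Update(), so it runs
        // once per physics step regardless of how fast real time is passing.
        void FixedUpdate()
        {
            float input = GetHorizontalInput(); // game-specific input / agent hook
            body.velocity = new Vector2(input * moveSpeed, body.velocity.y);
        }

        float GetHorizontalInput()
        {
            return Input.GetAxisRaw("Horizontal");
        }
    }

    // During training, something along these lines speeds the whole game up:
    // Time.timeScale = 100f;        // run far faster than realtime
    // Time.fixedDeltaTime = 0.02f;  // the physics step itself stays the same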

I used a vectorised observation for each frog, consisting of position, velocity, and some state information (e.g. whether they are currently knocked or attacking). I've found with ML that keeping the observations as small as possible gives better results. I've seen some people try a strategy of feeding in as much information as possible and trusting ML to figure out what's important, but every time I've tried that, training just seemed much slower, without improving on a simplified model even at high step counts. I tried two strategies: the first simply tracks the agent's own frog and the closest enemy frog; the second has an array of observations corresponding to 4 frogs. I expected the 4-frog observation model to eventually perform better, but it did not discernibly do so even at high (~5 million) step counts. Given that the closest-frog model also scales to any number of frogs, that's the one I'd recommend (the linked FrogAgent is the 4-frog model, but switching between the two is trivial).
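
For reference, a minimal sketch of the closest-enemy version might look something like this. It assumes an ML-Agents Agent subclass with the CollectObservations()/AddVectorObs() API of that era (exact method names vary by version), and FrogController, its fields, and FindClosestEnemy are placeholders rather than the real project code.

    using UnityEngine;

    // Sketch: small, hand-picked observation vector for one frog agent.
    public class FrogAgent : Agent
    {
        public FrogController self;   // hypothetical component holding frog state

        public override void CollectObservations()
        {
            // Own state
            AddVectorObs(self.transform.localPosition.x);
            AddVectorObs(self.transform.localPosition.y);
            AddVectorObs(self.Velocity.x);
            AddVectorObs(self.Velocity.y);
            AddVectorObs(self.IsKnocked ? 1f : 0f);
            AddVectorObs(self.IsAttacking ? 1f : 0f);

            // Closest enemy, position given relative to us
            FrogController enemy = FindClosestEnemy();
            AddVectorObs(enemy.transform.localPosition.x - self.transform.localPosition.x);
            AddVectorObs(enemy.transform.localPosition.y - self.transform.localPosition.y);
            AddVectorObs(enemy.Velocity.x);
            AddVectorObs(enemy.Velocity.y);
            AddVectorObs(enemy.IsKnocked ? 1f : 0f);
            AddVectorObs(enemy.IsAttacking ? 1f : 0f);
        }

        FrogController FindClosestEnemy()
        {
            // game-specific lookup over the other frogs in the level
            return null;
        }
    }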

This model would require retraining for each level, as the bots learn the layout of the level implicitly. If I were to try to make it generalised, I would add a camera observation that can only see the level layout (and perhaps the frogs themselves), together with level switching on academy reset, but that felt beyond the scope of what I was trying to do. Still, it would be interesting to see whether a camera observation of the level makes them learn the layout more effectively.

The first iteration had a continuous action vector setup (e.g. input.left = act[0] > 1), but switching over to a discrete action vector setup made a HUGE difference. The second iteration had a discrete action space of size 128 (one bit corresponding to each bool in InputState), and I'd bitmask the action index to find whether each button was pressed. This still had a fair amount of redundancy: in this game, pressing left and right together is the same as pressing neither, and frogs can't attack and use the tongue at the same time. Removing those redundancies and decreasing the action space size to 54 (9 directions * 3 attack states * 2 jump states) made another near order-of-magnitude difference in training time.
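
As a sketch of what that reduced action space can look like in code (the InputState fields and the direction numbering here are illustrative, not the project's actual layout), a single discrete action index just gets unpacked into its direction, attack and jump components:

    // Sketch: decode one discrete action index in [0, 54) into button presses.
    // 9 directions * 3 attack states * 2 jump states = 54 distinct actions.
    public struct InputState
    {
        public bool left, right, up, down, attack, tongue, jump;
    }

    static InputState DecodeAction(int action)
    {
        int direction = action % 9;        // 0 = neutral, 1-8 = compass directions
        int attack    = (action / 9) % 3;  // 0 = none, 1 = bat, 2 = tongue
        bool jump     = (action / 27) % 2 == 1;

        // Illustrative direction layout: 1=left, 2=right, 3=up, 4=down,
        // 5=up-left, 6=up-right, 7=down-left, 8=down-right.
        InputState input = new InputState();
        input.left   = direction == 1 || direction == 5 || direction == 7;
        input.right  = direction == 2 || direction == 6 || direction == 8;
        input.up     = direction == 3 || direction == 5 || direction == 6;
        input.down   = direction == 4 || direction == 7 || direction == 8;
        input.attack = attack == 1;
        input.tongue = attack == 2;
        input.jump   = jump;
        return input;
    }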

Tuning the reward function takes some time and a fair bit of guesswork. I recommend making the game window larger than the default and adding keyboard controls to slow down or speed up the timescale, so you can keep an eye on what your agents are doing. For the most part, my frogs (like most frogs) only care about hitting each other with baseball bats. There are some penalties for missing attacks, to stop them continually spamming attacks, and a minor conservation-of-energy penalty for jumping. Adding a small 'hint' reward for closing the distance to the nearest enemy also speeds up the first stages of training significantly. Adding a reward for knocking a frog out of the level (rather than just for hitting them) did not make a big discernible difference in the output, though I suspect at ~10 million steps of training you might see them starting to strategise a bit more.
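
In code, that kind of shaping boils down to a handful of AddReward calls inside the agent. The hooks and constants below are guesses for illustration, not the values I actually ended up with:

    // Illustrative reward shaping inside the agent; values are placeholders.
    void OnHitEnemy()     { AddReward(1.0f); }    // the main thing frogs care about
    void OnMissedAttack() { AddReward(-0.05f); }  // discourage spamming attacks
    void OnJump()         { AddReward(-0.01f); }  // small conservation-of-energy penalty

    // Small per-step hint for closing the distance to the nearest enemy,
    // which helps the very first stages of training get going.
    void RewardDistanceHint(float previousDistance, float currentDistance)
    {
        AddReward(0.001f * (previousDistance - currentDistance));
    }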

The agents do still behave quite botlike, and typically rely on their fast reaction speed to win. Still, at ~3 million steps they play at a reasonably advanced level and will likely be able to beat most humans. One of the challenges would be to get them to play in a more human way and to slow down their reaction speed. Increasing the number of timesteps between decisions will likely help; I'd also be interested in seeing what happens if you buffer input for about 0.15 seconds before executing it, so that they have to play with more human reaction speeds. I'm also not sure how you would go about scaling bot difficulty with a single brain, beyond slowing down their reaction speeds.
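
One way to experiment with that buffering idea might be a simple delay queue sitting between the brain's decision and the frog's controls. This is purely a sketch (it reuses the hypothetical InputState struct from the earlier snippet), not something that exists in the project:

    using System.Collections.Generic;
    using UnityEngine;

    // Sketch: hold each decided action for ~0.15s of game time before applying it,
    // to mimic human reaction speed.
    public class DelayedInputBuffer
    {
        struct PendingAction
        {
            public float applyAt;    // game time at which this action becomes active
            public InputState input;
        }

        const float ReactionDelay = 0.15f;
        readonly Queue<PendingAction> pending = new Queue<PendingAction>();
        InputState current;

        // Called whenever the brain makes a new decision.
        public void Push(InputState input)
        {
            pending.Enqueue(new PendingAction
            {
                applyAt = Time.time + ReactionDelay,
                input = input
            });
        }

        // Called each FixedUpdate; returns the most recent action that is old enough to act on.
        public InputState Current()
        {
            while (pending.Count > 0 && pending.Peek().applyAt <= Time.time)
                current = pending.Dequeue().input;
            return current;
        }
    }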

I'm confident that with more tweaking and training I could get even better results than I have so far. However, for a day's work, which included reworking the project to run in FixedUpdate, setting up training parameters, and training overnight, these results are already very encouraging, and I'm really looking forward to doing more ML experiments and seeing what people come up with as the tools mature.

You can tweet me at @rrza with any questions/comments, thanks for reading! Frogs.
