Multi-Agent Collaboration for Wind Farm Control

Multi-agent collaboration has been a developing interest of mine, especially how agents could communicate with each other. Could they develop their own language? Does communication have a cost? And can collaboration and communication among decentralized agents surpass a traditional single-agent reinforcement learning setup? Those are the questions I started asking myself, and this project grew out of an initial sketch to explore them. As a proof of concept, I have set up an experiment with two competing agent-environment setups: one in which a single agent controls all turbines in a wind farm, and one in which each turbine is an agent making its own decisions. In the second setup, the agents can communicate and share data, such as wind directions, with a set number of neighbours and learn to predict changes locally, in order to optimize their orientation and therefore the energy generated by the wind farm.

I was inspired by a real-world problem solved by DeepMind, which reduced the energy used to cool data centres by 40%: https://deepmind.com/blog/article/deepmind-ai-reduces-google-data-centre-cooling-bill-40

Following that, I thought about optimizing the generation of energy using reinforcement learning (RL). This sketch can also be explored as a web app here: https://philippds-pages.github.io/RL-Wind-Farm_WebApp/

Offshore Wind Turbine Farm

Multi-Agent Collaboration

Agent & Environment Setup:

Set-up: Each wind farm has 8 turbines. Wind is simulated using 3D Perlin noise combined with a main wind direction. Each turbine can be rotated.
Goal: Maximize orientation efficiency. A turbine is optimally oriented when it faces against the wind.
Agents: The single-agent environment contains one agent; the multi-agent environment contains one agent per turbine.
Agent Reward Function: A negative reward from -0.0 to -1.0 when the turbine is rotated with the wind direction (270 to 90 degrees). A positive reward from +0.1 to +1.0 when the turbine is rotated against the wind direction (90 to 270 degrees).
Behavior Parameters:
Single Agent Observation space:
Agent Vector Observations: 72
Total Vector Observations per Wind Farm: 72
For all turbines in farm:
Turbine direction vector
Turbine location vector
Wind direction vector at turbine
Multi Agent Observation space:
Agent Vector Observations: 27
Total Vector Observations per Wind Farm: 216
Individual Agent:
Turbine direction vector
Turbine location vector
Wind direction vector at turbine
For all neighbours:
Turbine location vector
Wind direction vector at turbine
Actions: 3 discrete actions: rotate left, do nothing, rotate right.
Benchmark Max Reward: 2000
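
To make the reward description above more concrete, here is a minimal sketch in Python. It is not the project's actual implementation: the function name `orientation_reward`, the use of 2D direction vectors, and the negative-cosine mapping are all assumptions chosen for illustration. The mapping gives -1.0 when the turbine points with the wind, 0 at 90 degrees, and +1.0 when it faces directly against the wind. It does not reproduce the exact +0.1 lower bound mentioned in the list.

```python
import math


def orientation_reward(turbine_dir, wind_dir):
    """Hypothetical reward in [-1, 1] based on the angle between two 2D vectors.

    +1.0: turbine faces directly against the wind (180 degrees apart).
    -1.0: turbine points in the same direction as the wind (0 degrees apart).
    """
    tx, ty = turbine_dir
    wx, wy = wind_dir
    t_norm = math.hypot(tx, ty)
    w_norm = math.hypot(wx, wy)
    if t_norm == 0.0 or w_norm == 0.0:
        # No defined direction: treat as neutral (assumption).
        return 0.0
    cos_angle = (tx * wx + ty * wy) / (t_norm * w_norm)
    # Guard against tiny floating-point overshoot outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return -cos_angle


if __name__ == "__main__":
    wind = (1.0, 0.0)
    for name, direction in [("against", (-1.0, 0.0)),
                            ("sideways", (0.0, 1.0)),
                            ("with", (1.0, 0.0))]:
        print(name, orientation_reward(direction, wind))
```

A smooth mapping like this gives the agent a gradient to follow at every angle, which is one reason a cosine is a common choice for alignment rewards. The real environment may use a different, for example linear, mapping.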

Hyperparameters:

Single-Agent

The wind farm is controlled by a single agent, which sets the angle of each individual turbine. The objective is to maximize the efficiency of the wind farm.

Multi-Agent

Each turbine in the wind farm is an individual agent controlling its own angle. Agents can share data with each other. The objective is to maximize the efficiency of the wind farm.
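
The following sketch shows how one turbine agent's observation vector could be assembled from its own data and the data shared by its neighbours, following the multi-agent observation layout listed in the setup. The function name `build_agent_observation` and the `Vec3` alias are hypothetical. The neighbour count of 3 is not stated in the setup; it is inferred from the sizes given there (27 = 9 own values + 3 neighbours x 6 values), assuming every vector has three components.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

TURBINES_PER_FARM = 8   # from the setup description
NEIGHBOURS = 3          # inferred from 27 = 9 + 6 * n, not stated explicitly


def build_agent_observation(
    own_direction: Vec3,
    own_location: Vec3,
    own_wind: Vec3,
    neighbours: Sequence[Tuple[Vec3, Vec3]],
) -> List[float]:
    """Flatten one turbine's own state plus neighbour data into one vector.

    Each neighbour contributes (location, wind direction at that turbine).
    """
    obs: List[float] = [*own_direction, *own_location, *own_wind]
    for location, wind in neighbours:
        obs.extend(location)
        obs.extend(wind)
    return obs


if __name__ == "__main__":
    zero: Vec3 = (0.0, 0.0, 0.0)
    shared = [(zero, zero)] * NEIGHBOURS
    obs = build_agent_observation(zero, zero, zero, shared)
    print(len(obs))                      # per-agent observation size
    print(len(obs) * TURBINES_PER_FARM)  # total per wind farm
```

With three neighbours this yields 27 values per agent and 216 per farm, matching the figures in the setup. The single-agent layout works out the same way: 8 turbines x 9 values = 72.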

Single-Agent Training: Max. 100k Step Count

Multi-Agent Collaboration Training: Max. 100k Step Count

TensorBoard Results: Max. 1 million Step Count

Max. Cumulative Reward: 2000
Single Agent: 1h 58m 26s - Mean Cumulative Reward: 1572
Multi-Agent Collab: 53m 40s - Mean Cumulative Reward: 1963
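
To put the two runs side by side, the short sketch below converts the reported training times to seconds and expresses each mean cumulative reward as a share of the maximum of 2000. It only uses the figures listed above; the dictionary layout and the helper name `to_seconds` are illustrative choices.

```python
MAX_REWARD = 2000


def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


runs = {
    "single_agent": {"reward": 1572, "seconds": to_seconds(1, 58, 26)},
    "multi_agent": {"reward": 1963, "seconds": to_seconds(0, 53, 40)},
}

for name, run in runs.items():
    share = run["reward"] / MAX_REWARD
    print(f"{name}: {share:.1%} of max reward after {run['seconds']} s")

time_ratio = runs["multi_agent"]["seconds"] / runs["single_agent"]["seconds"]
print(f"multi-agent training time relative to single-agent: {time_ratio:.0%}")
```

In these runs the multi-agent setup reached roughly 98% of the maximum reward compared with roughly 79% for the single agent, in a bit under half the training time.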