# AlphaZeroICGA

## Table of contents

1. [Presentation](#presentation)
2. [Project architecture](#project-architecture)
3. [Competition](#competition)
4. [Games](#games)
5. [Baseline](#baseline)
6. [Environment & Setup](#environment--setup)
7. [Try it](#try-it)
8. [Fight it](#fight-it)
9. [What I learned](#what-i-learned)

## Presentation

<p align="center"><img width="800" src="img.jpg"></p>

This project implements deep reinforcement learning algorithms for the ICGA competition. It is carried out as part of my first-year master's internship at LIP6 (Sorbonne University / CNRS).

## Project architecture

<pre><code>AlphaZeroICGA/
      ├── src/
      |       ├── main/
      |       |      ├── agents/               (Contains the jar files of the final agents)
      |       |      ├── bin/                  (Contains the binary files compiled from src_java)
      |       |      ├── datasets/             (Contains the (state, distrib, value) datasets)
      |       |      ├── final_model/          (Contains the final weights of the best models)
      |       |      ├── libs/                 (Contains the libraries such as JPY/Ludii...)
      |       |      ├── models/               (Contains the current models)
      |       |      ├── src_java/             (Contains all the Java source code)
      |       |      ├── src_python/           (Contains all the Python source code)
      |       |      |      ├── brain/         (Contains the deep learning part)
      |       |      |      ├── mcts/          (Contains the vanilla MCTS and AlphaZero MCTS)
      |       |      |      ├── optimization/  (Contains the optimization part such as precomputations)
      |       |      |      ├── other/         (Contains utility files)
      |       |      |      ├── run/           (Contains files run by the Java files such as dojo, trials...)
      |       |      |      ├── scripts/       (Contains all the scripts such as merge_datasets.py)
      |       |      |      ├── settings/      (Contains the hyperparameters and game settings)
      |       |      |      └── utils.py       (File containing the utility functions)
      |       |      ├── alphazero.py          (Script running the whole AlphaZero algorithm)
      |       |      ├── build.xml             (Ant build file used to run Java commands, clean...)
      |       |      └── notes.txt             (Some notes I left while working on this project)
      |       └── test/                        (Some Ludii tutorials and tests)
      ├── alphazero_env.yml                    (Conda environment save)
      ├── README.md
      └── LICENSE
</code></pre>

## Competition

"The Ludii AI Competition involves general game playing events focussed on developing agents that can play a wide variety of board games. The events use the Ludii general game system to provide the necessary games and API. Games will be provided in the Ludii game description format (.lud). The version used for this competition (1.3.2) of Ludii includes over 1,000 games.

Three events are proposed:

- Kilothon: Best utility obtained on more than 1,000 games against UCT.
- General Game Playing (GGP): Competition on games present or not in our library.
- Learning: A set of games are announced months before the actual competition, the agents are invited to learn before competing."

**Here we focus on the learning event.**

Links:
- https://icga.org/?page_id=3468
- https://github.com/Ludeme/LudiiAICompetition

## Games

The games of this year's learning event are:
- Bashni: https://ludii.games/details.php?keyword=Bashni
- Ploy: https://ludii.games/details.php?keyword=Ploy
- Quoridor: https://ludii.games/details.php?keyword=Quoridor
- Mini Wars: https://ludii.games/details.php?keyword=Mini%20Wars
- Plakoto: https://ludii.games/details.php?keyword=Plakoto
- Lotus: https://ludii.games/details.php?keyword=Lotus

## Baseline

We use deep reinforcement learning algorithms for this competition, starting with AlphaZero as a baseline. AlphaGo plays Go at a superhuman level by combining supervised learning on human expert games with reinforcement learning. AlphaGo Zero reaches an even higher level while learning from scratch through self-play only, without any human data, hence the "Zero" in its name. AlphaZero generalizes the same approach to other games such as chess and shogi.
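
As a reminder of what the network optimizes: the AlphaZero paper trains a single network with weights θ that outputs a policy p and a value v, using the loss l = (z - v)² - πᵀ log p + c‖θ‖², where z is the game outcome and π is the MCTS visit distribution. The sketch below computes this loss for one sample in plain Python. The function name and arguments are illustrative assumptions, not code from this repository, and the regularization term is passed in as a precomputed squared norm.

```python
import math
from typing import Sequence


def alphazero_loss(
    z: float,
    v: float,
    pi: Sequence[float],
    p: Sequence[float],
    theta_sq_norm: float = 0.0,
    c: float = 1e-4,
    eps: float = 1e-12,
) -> float:
    """Loss of a single training sample, following the AlphaZero paper.

    z: game outcome from the point of view of the player to move (-1, 0 or +1)
    v: value predicted by the network
    pi: MCTS visit-count distribution (target for the policy head)
    p: policy predicted by the network
    theta_sq_norm: squared L2 norm of the weights (regularization term)
    """
    if len(pi) != len(p):
        raise ValueError("pi and p must have the same length")
    value_loss = (z - v) ** 2
    # Cross-entropy between the search distribution and the predicted policy;
    # eps avoids log(0) for moves the network gives zero probability.
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    return value_loss + policy_loss + c * theta_sq_norm


# Perfect value and a policy equal to its target: only the entropy of pi
# remains, i.e. log 2 for a 50/50 split.
print(round(alphazero_loss(1.0, 1.0, [0.5, 0.5], [0.5, 0.5]), 4))  # 0.6931
```

The value and policy heads share one loss, which is why the network is multi-headed rather than two separate models.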

Links:
- https://www.nature.com/articles/nature16961 (AlphaGo)
- https://discovery.ucl.ac.uk/id/eprint/10045895/1/agz_unformatted_nature.pdf (AlphaGo Zero)
- https://arxiv.org/abs/1712.01815 (AlphaZero)
- https://arxiv.org/pdf/1903.08129.pdf (Hyper-parameters sweep on AlphaZero)
- https://www.scitepress.org/Papers/2021/102459/102459.pdf (Improvements to increase the efficiency of AlphaZero)

## Environment & Setup

The games run on the Ludii software, which is written in Java. Since our algorithms are written in Python, we need a Java-Python bridge such as **JPY**. Building JPY requires Microsoft Visual C++ 14.0 and Java JDK 7.0.
We also need the **Ludii** software to run our algorithms in the game environment.
We use **Apache Ant** to compile our AI into a jar file that can be exported to Ludii, so Ant is also required (although the jar file can be built in other ways).

Links:
- https://github.com/Ludeme/LudiiPythonAI
- https://github.com/jpy-consortium/jpy
- https://visualstudio.microsoft.com/visual-cpp-build-tools/
- https://www.oracle.com/java/technologies/downloads/
- https://maven.apache.org/download.cgi
- https://ant.apache.org/bindownload.cgi
- https://ludii.games/download.php

First, clone the Ludii and JPY repositories, then install the Visual C++ build tools and the Java JDK if you do not have them yet. Apache Maven is also required to build JPY. Once everything is installed, go to the JPY folder and run:

`SET VS100COMNTOOLS=<visual-studio-tools-folder>`

`SET JDK_HOME=<your-jdk-dir>`

`SET PATH=<maven-bin-dir>;%PATH%`

`python setup.py build maven bdist_wheel`

If everything worked, you should now have a build directory. Copy the contents of its lib directory into a folder called **/LudiiPythonAI/libs/** inside the Ludii directory, and move the Ludii jar file into that libs directory as well. Finally, build the jar file with Ant using the **build.xml** file, then import it into Ludii.

You might also have to specify some paths in configuration files such as **jpyconfig.py** and **jpyconfig.properties**. You may also need to edit the **build.xml** file to set the correct classpath for the JPY snapshot.

The **alphazero_env.yml** file can be used to create a conda environment from scratch with all the required libraries, using the command `conda env create -f alphazero_env.yml`.

The required Python libraries are:
- tensorflow-gpu (CUDA, cuDNN, TensorFlow...)
- onnx, onnxruntime, onnxruntime-gpu, tf2onnx
- numpy, matplotlib, keras

## Try it

Go to the src/main/ directory and run the following commands in a terminal:

`nano src_python/settings/config.py`: set the settings used to run AlphaZero, such as the number of simulations, the game type...

`python3 alphazero.py <n_loop> <n_workers>`: runs the whole loop (MCTS simulations with random moves -> dataset -> train model -> save model -> MCTS simulations with the model predicting moves -> dataset -> ...). **n_loop** is the number of iterations of this loop. **n_workers** is the number of processes executed in parallel.
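
The loop above can be sketched as follows. This is a simplified illustration under assumptions, not the code of alphazero.py: `run_alphazero`, `self_play` and `train` are hypothetical names standing in for the MCTS self-play and training steps. `map_fn` defaults to the built-in sequential `map`; passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` with `n_workers` workers would run the self-play workers as separate processes (the callbacks must then be picklable, top-level functions).

```python
from functools import partial
from typing import Callable, Iterable, List, Optional, Tuple

# One training sample: (state, MCTS move distribution, game outcome).
Sample = Tuple[object, List[float], float]


def run_alphazero(
    n_loop: int,
    n_workers: int,
    self_play: Callable[..., List[Sample]],
    train: Callable[[Optional[object], List[Sample]], object],
    map_fn: Callable[..., Iterable] = map,
) -> Optional[object]:
    """Alternate self-play and training for n_loop iterations.

    self_play(worker_id, model=...) returns the samples of one worker.
    During the first iteration model is None (random moves).
    """
    model = None
    for _ in range(n_loop):
        # Every worker plays its own games with the current model.
        play = partial(self_play, model=model)
        datasets = map_fn(play, range(n_workers))
        # Merge the per-worker datasets into a single training set.
        dataset = [sample for part in datasets for sample in part]
        # Train on the merged dataset; the result is used by the next loop.
        model = train(model, dataset)
    return model


def fake_self_play(worker_id: int, model: Optional[int] = None) -> List[Sample]:
    # Hypothetical stand-in: one sample per worker, tagged with the model version.
    return [(f"state-{worker_id}", [1.0], float(model or 0))]


def fake_train(model: Optional[int], dataset: List[Sample]) -> int:
    # Hypothetical stand-in: the "model" just counts training rounds.
    return (model or 0) + 1


print(run_alphazero(n_loop=3, n_workers=2, self_play=fake_self_play, train=fake_train))  # 3
```

Using `functools.partial` rather than a lambda keeps the per-worker callable picklable, which process pools require.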

The Python alphazero script does everything; the following commands are for debugging purposes:

`ant clean`: cleans all the directories (**bin/** **build/** **models/** **datasets/**).

`ant build`: compiles the Java files into **bin/**.

`ant run_trials`: runs the MCTS simulations only (with random moves, or with the model if there is one in **models/**) and creates a dataset.
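
Each self-play game produces (state, distrib, value) triples like those stored in **datasets/**. A common way to fill in the value, used in AlphaZero, is to wait until the game ends and then label every stored state with the final outcome seen from the point of view of the player to move in that state. The sketch below assumes a two-player game with players numbered 1 and 2 and `winner=None` for a draw; these conventions and the function name `label_game` are illustrative, not the repository's actual data format.

```python
from typing import List, Optional, Tuple


def label_game(
    history: List[Tuple[object, List[float], int]],
    winner: Optional[int],
) -> List[Tuple[object, List[float], float]]:
    """Turn one finished game into (state, distrib, value) samples.

    history: (state, MCTS move distribution, player to move) for each position
    winner: 1 or 2, or None for a draw
    """
    samples = []
    for state, distrib, player in history:
        if winner is None:
            value = 0.0   # draw
        elif winner == player:
            value = 1.0   # the player to move in this state eventually won
        else:
            value = -1.0  # the player to move in this state eventually lost
        samples.append((state, distrib, value))
    return samples


game = [("s0", [0.7, 0.3], 1), ("s1", [0.2, 0.8], 2), ("s2", [1.0, 0.0], 1)]
print([v for _, _, v in label_game(game, winner=1)])  # [1.0, -1.0, 1.0]
```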

`ant run_dojos`: runs a 1 versus 1 match between the latest model (the outsider) and the current best model (the champion) and outputs some statistics.
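
One way to turn dojo statistics into a promotion decision is a win-rate threshold. AlphaGo Zero, for example, replaced its best network only when the challenger won by a margin of more than 55% (AlphaZero dropped this evaluation step and always used the latest network). The sketch below applies such a rule; the 0.55 default, counting draws as half a win, and the names `DojoStats` and `should_promote` are assumptions, not settings read from this project.

```python
from dataclasses import dataclass


@dataclass
class DojoStats:
    """Results of a dojo between the outsider and the champion."""
    outsider_wins: int
    champion_wins: int
    draws: int

    @property
    def games(self) -> int:
        return self.outsider_wins + self.champion_wins + self.draws

    @property
    def outsider_score(self) -> float:
        """Score of the outsider, counting a draw as half a win."""
        if self.games == 0:
            return 0.0
        return (self.outsider_wins + 0.5 * self.draws) / self.games


def should_promote(stats: DojoStats, threshold: float = 0.55) -> bool:
    """Promote the outsider only if its score is strictly above the threshold."""
    return stats.outsider_score > threshold


stats = DojoStats(outsider_wins=12, champion_wins=7, draws=1)
print(stats.outsider_score, should_promote(stats))  # 0.625 True
```

Requiring a margin above 50% avoids promoting a model that only won a short match by noise.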

`ant run_tests`: runs tests against the Ludii built-in AIs.

`ant train_model`: only trains the model on the dataset and saves the best model.

`ant create_agent`: takes the best model and builds an agent as a jar file for the Ludii software.

`python3 src_python/scripts/merge_datasets.py`: merges all the hashed datasets in **datasets/** into a single dataset.
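
Merging the per-worker datasets amounts to loading every dataset file and concatenating the samples. The sketch below assumes, purely for illustration, that each worker pickles a list of (state, distrib, value) tuples to a file named `dataset_<hash>.pkl` and that the result goes to `dataset.pkl`; the real naming scheme and serialization format of merge_datasets.py may differ.

```python
import glob
import os
import pickle
import tempfile
from typing import List


def merge_datasets(directory: str, output_name: str = "dataset.pkl") -> List[tuple]:
    """Concatenate every hashed dataset file of `directory` into one file.

    Hypothetical format: files named dataset_<hash>.pkl, each holding a
    pickled list of (state, distrib, value) samples.
    """
    paths = sorted(glob.glob(os.path.join(directory, "dataset_*.pkl")))
    merged: List[tuple] = []
    for path in paths:
        with open(path, "rb") as f:
            merged.extend(pickle.load(f))
    # The output name does not match the glob pattern, so rerunning the
    # script does not merge the merged file into itself.
    with open(os.path.join(directory, output_name), "wb") as f:
        pickle.dump(merged, f)
    return merged


with tempfile.TemporaryDirectory() as tmp:
    for h, samples in [("a1", [("s0", [1.0], 1.0)]), ("b2", [("s1", [0.5, 0.5], -1.0)])]:
        with open(os.path.join(tmp, f"dataset_{h}.pkl"), "wb") as f:
            pickle.dump(samples, f)
    print(len(merge_datasets(tmp)))  # 2
```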

`python3 src_python/scripts/merge_txts.py`: merges all the hashed text files in **models/** into a single txt file.

`python3 src_python/scripts/switch_model.py`: turns the optimizer model into the champion and the previous champion into the old_star.
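
Because the current champion must be saved before it is overwritten, the two renames have to happen in a specific order: first champion to old_star, then optimizer to champion. The sketch below illustrates this with `os.replace`; the function name `switch_models` and the file names (`optimizer.onnx`, `champion.onnx`, `old_star.onnx`) are hypothetical, and the actual script may store its models differently.

```python
import os
import tempfile


def switch_models(models_dir: str, suffix: str = ".onnx") -> None:
    """Promote optimizer -> champion and champion -> old_star.

    File names are hypothetical: <models_dir>/{optimizer,champion,old_star}<suffix>.
    """
    optimizer = os.path.join(models_dir, "optimizer" + suffix)
    champion = os.path.join(models_dir, "champion" + suffix)
    old_star = os.path.join(models_dir, "old_star" + suffix)
    if not os.path.exists(optimizer):
        raise FileNotFoundError(optimizer)
    # Order matters: archive the current champion first, otherwise the
    # next replace would overwrite it and it would be lost.
    if os.path.exists(champion):
        os.replace(champion, old_star)
    os.replace(optimizer, champion)


with tempfile.TemporaryDirectory() as tmp:
    for name, content in [("optimizer", "v2"), ("champion", "v1")]:
        with open(os.path.join(tmp, name + ".onnx"), "w") as f:
            f.write(content)
    switch_models(tmp)
    print(sorted(os.listdir(tmp)))  # ['champion.onnx', 'old_star.onnx']
```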

## Fight it

When the project is over, the model will be available in the **models/final_model/** folder and the Ludii AI will be available as a jar file in the **agents/** folder, ready to be loaded in the Ludii software. You will then be able to pit it against other AIs, or play against it yourself, on different games.

## What I learned

**General knowledge:**
- Understanding and implementing research papers (AlphaGo, AlphaGo Zero, AlphaZero)
- Software architecture with different tasks communicating with each other (alphazero.sh)
- Bridging Java and Python with JPY

**Deep learning:**
- Multi-headed neural networks (here for policy + value prediction)
- Large CNN model with residual blocks and skip connections

**Reinforcement learning:**
- MCTS with UCB/PUCT scores
- State and action representation, reward system, temperature, Dirichlet noise on the policy for exploration, etc.
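
The PUCT score and the exploration tricks listed above fit in a few lines. In the AlphaZero family of algorithms, a child a of node s is selected by maximizing Q(s, a) + c_puct · P(s, a) · √N(s) / (1 + N(s, a)), where N(s) is the total visit count of the node; Dirichlet noise is mixed into the root priors, and the move actually played is sampled with probability proportional to N(s, a)^(1/τ). The constants below (c_puct = 1.5, ε = 0.25, α = 0.3) and the function names are illustrative assumptions, not this project's settings.

```python
import math
import random
from typing import Dict, Optional


def puct_score(q: float, prior: float, parent_visits: int, child_visits: int,
               c_puct: float = 1.5) -> float:
    """Q(s, a) + c_puct * P(s, a) * sqrt(N(s)) / (1 + N(s, a))."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_move(q: Dict[str, float], priors: Dict[str, float],
                visits: Dict[str, int], c_puct: float = 1.5) -> str:
    """Return the child with the highest PUCT score."""
    parent_visits = sum(visits.values())
    return max(priors, key=lambda a: puct_score(q[a], priors[a], parent_visits,
                                                  visits[a], c_puct))


def add_dirichlet_noise(priors: Dict[str, float], alpha: float = 0.3,
                        epsilon: float = 0.25,
                        rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Mix Dirichlet(alpha) noise into the root priors: (1 - eps) * P + eps * noise."""
    rng = rng or random.Random()
    # A Dirichlet sample is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
    draws = [rng.gammavariate(alpha, 1.0) for _ in priors]
    total = sum(draws)
    return {a: (1 - epsilon) * p + epsilon * d / total
            for (a, p), d in zip(priors.items(), draws)}


def sample_move(visits: Dict[str, int], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> str:
    """Sample a move with probability proportional to N(s, a) ** (1 / temperature).

    temperature == 0 is treated as "play the most visited move".
    """
    rng = rng or random.Random()
    moves = list(visits)
    if temperature == 0:
        return max(moves, key=lambda a: visits[a])
    weights = [visits[a] ** (1.0 / temperature) for a in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


q = {"a": 0.1, "b": 0.0}
priors = {"a": 0.2, "b": 0.8}
visits = {"a": 10, "b": 2}
print(select_move(q, priors, visits))  # b
print(sample_move(visits, temperature=0))  # a
```

Here "b" is selected despite its lower Q because its high prior and low visit count give it a large exploration bonus, while the final move choice at temperature 0 follows the visit counts.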

**Time and memory optimization:**
- Multithreading and GPU clusters (for the self-play games and the model training)
- Code optimization with profilers, since the algorithm is very time-consuming
- Precomputing functions that are called a huge number of times (in the MCTS algorithm)
- ONNX format for faster model inference
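
Precomputing functions that MCTS calls a huge number of times often means caching results that depend only on the board geometry. As an illustration, the orthogonal neighbours of every cell of a square board can be computed once and then looked up, with `functools.lru_cache` memoizing the table per board size. The square board and the name `neighbours_table` are hypothetical examples, not the project's actual precomputations.

```python
from functools import lru_cache
from typing import Tuple


@lru_cache(maxsize=None)
def neighbours_table(size: int) -> Tuple[Tuple[int, ...], ...]:
    """Orthogonal neighbours of every cell of a size x size board.

    Computed once per board size, then served from the cache, so the MCTS
    hot loop only does a tuple lookup instead of recomputing coordinates.
    """
    table = []
    for cell in range(size * size):
        row, col = divmod(cell, size)
        adjacent = []
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + dr, col + dc
            if 0 <= r < size and 0 <= c < size:
                adjacent.append(r * size + c)
        table.append(tuple(adjacent))
    return tuple(table)


neighbours = neighbours_table(9)
print(neighbours[0], neighbours[40])  # (9, 1) (31, 49, 39, 41)
```

Returning tuples keeps the cached table immutable, so no caller can accidentally corrupt the shared precomputation.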