Quick description of how to use the deep RL policies in PyDial

Note that this text is not a full tutorial, nor is it meant to be. Its purpose is merely to give guidance on how to use the deep RL policies that are part of the new PyDial release.

Running the algorithms with current domains

For a quick start with the deep RL algorithms, we suggest looking at the benchmarking paper:

https://arxiv.org/abs/1711.11023

and following the guide at:

http://www.camdial.org/pydial/benchmarks/

so that you can reproduce the results.

Tests with a new domain

You will most likely have to change the model when you apply it to a new domain. Neural network models are highly sensitive to changes in architecture and training hyperparameters. That is why we also added a script that helps with finding the right parameter set. The script is at:

scripts/gridengine/paramsearch/runScript.py

runScript.py is a wrapper over the repo. It lets you quickly run all policy models with different parameter set-ups. Inside the script you can specify the ranges of the tested parameters, the number of training and testing dialogues, the exploration schedule, and many other model-specific hyperparameters.
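Running every model over a range of set-ups amounts to expanding a grid of parameter values. The block below is a minimal sketch of that idea and is not the code in runScript.py: the dictionary `search_space`, the helper `expand_grid`, and all the values are hypothetical and only illustrate how a few candidate values per parameter multiply into individual runs.

```python
from itertools import product

# Hypothetical search ranges (assumption): the values are illustrative only
# and are not taken from runScript.py.
search_space = {
    "learning_rate": [0.001, 0.0001],
    "h1_size": [130, 300],
    "h2_size": [50, 100],
}


def expand_grid(space):
    """Return one dict per combination of parameter values."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


configs = expand_grid(search_space)
print(len(configs))  # number of runs needed to cover the whole grid
for cfg in configs[:2]:
    print(cfg)
```

The number of runs grows multiplicatively with every added parameter, so it usually pays to vary only a few parameters at a time.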

The most influential parameters shared across all architectures are discussed below:

learning_rate - all models currently use the Adam optimizer; however, the starting value of the learning rate can heavily influence the results,
capacity_vary - number of dialogues stored in the buffer; wrong values can lead to overfitting,
epsilon_s_e_vary - annealing of the epsilon-greedy policy; too small a value at the start of training leads to overfitting,
h1_size and h2_size - sizes of the hidden layers, which need to be adapted to the given domain.
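To make the role of the epsilon annealing concrete, here is a minimal sketch of a linearly annealed epsilon-greedy policy. It is an illustration under assumptions, not PyDial's implementation: the function names `annealed_epsilon` and `epsilon_greedy`, the linear shape of the schedule, and the start and end values in the usage lines are all hypothetical.

```python
import random


def annealed_epsilon(dialogue_idx, total_dialogues, eps_start, eps_end):
    """Linearly interpolate epsilon from eps_start to eps_end.

    Assumes total_dialogues > 0. Linear annealing is an assumption made
    for this sketch; other schedules are possible.
    """
    fraction = min(max(dialogue_idx / float(total_dialogues), 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical schedule: start at 0.3 and anneal to 0.0 over 1000 dialogues.
for idx in (0, 500, 1000):
    print(idx, annealed_epsilon(idx, 1000, 0.3, 0.0))
```

With a very small starting epsilon the policy acts greedily almost from the first dialogue, so it rarely tries alternative actions.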

An importance sampling mechanism for A2C and ENAC is implemented, but it is highly unstable. To turn it on, set the importance_sampling parameter to True.
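A general source of instability in importance sampling is that the weight is a ratio of action probabilities, which can become very large when the behaviour policy assigned a small probability to the chosen action. The sketch below illustrates this with a generic per-step weight and an optional truncation. It is not PyDial's implementation: the function `importance_weights` and its `clip` argument are hypothetical.

```python
def importance_weights(target_probs, behaviour_probs, clip=None):
    """Per-step ratios pi(a|s) / mu(a|s), optionally truncated at `clip`.

    target_probs: probabilities of the taken actions under the current policy.
    behaviour_probs: probabilities of the same actions under the policy
    that generated the data.
    """
    weights = []
    for pi, mu in zip(target_probs, behaviour_probs):
        if mu <= 0.0:
            raise ValueError("behaviour probability must be positive")
        w = pi / mu
        if clip is not None:
            w = min(w, clip)  # truncation bounds the weight, at the cost of bias
        weights.append(w)
    return weights


# A rarely chosen action (mu = 0.01) produces a very large weight.
print(importance_weights([0.5, 0.9], [0.25, 0.01]))
print(importance_weights([0.5, 0.9], [0.25, 0.01], clip=5.0))
```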

To run training, execute the following from the script's directory:

python runScript.py

and then to test:

python runScript.py --test

The results can then be quickly parsed by:

python parseResults_all.py gRun tra_ no_of_runs .log no_of_models