Introduction to PyDial

In this short tutorial you will install and explore the most important parts of PyDial so that you can start using and extending the existing architecture.

Content:

Installation
Interacting with the system
Training a policy

Installation

First, clone the repository to your disk and cd into it:

git clone https://bitbucket.org/dialoguesystems/pydial.git
cd pydial

Next, use pip to install the dependencies listed in the requirements.txt file. If pip is not already available, you can install it using easy_install. It is recommended that all packages are installed in your local directory by including the --user option:

sudo easy_install pip
pip install --user -r requirements.txt

Note: PyDial has few external dependencies. Nevertheless, it is recommended that the repository and its dependencies are installed in a virtual environment (see http://docs.python-guide.org/en/latest/dev/virtualenvs/).

Interacting with the system

Let's start by exploring the basic PyDial system using the pydial.py script. This script is a command-line driven wrapper that enables you to chat, train and test architectures, and plot performance data in PyDial.

First, make the script pydial.py executable and add a symbolic link to it (e.g. pydial) in your local bin directory so that you can run the script from the terminal.

chmod 700 pydial.py
ln -s /path/to/repository/pydial/pydial.py /usr/local/bin/pydial

To start a chat session, run:

pydial chat config/Tut-hdc-CamInfo.cfg

where the configuration file specifies all the parameters needed to run the system. You can now ask about restaurants in Cambridge or look for a hotel room. An example dialogue is shown below:

> pydial chat config/Tut-hdc-CamInfo.cfg
*** Chatting with policies ['CamRestaurants', 'CamHotels']: ***
Prompt > Hello, welcome to the Cambridge Multi- Domain dialogue system. How may I help you?.
User   > I want a cheap hotel in the centre of town.
Prompt > I am sorry but there is no place, is a hotel, in the cheap price range and in the area of centre.
User   > Ok, how about a moderately priced hotel?
Prompt > Cityroomz is a nice place, it is in the moderate price range , it is a hotel and it is in the centre of town.
User   > What is the phone number?
Prompt > Cityroomz is a nice place Their phone number is 01223304050.
User   > I would also like a cheap chinese restaurant in the centre.
Prompt > Golden house is a nice place, it is in the centre of town, it is in the cheap price range and it serves chinese food.
User   > What is the address?
Prompt > Golden house is a nice place. Their address is 12 Lensfield Road, City Centre.
User   > Goodbye.
Prompt > Thank you, goodbye.

The conversation is divided into turns. Each turn consists of a system act, which is converted to a prompt by the generation module, followed by a user response, which is converted to a user dialogue act by the semantic decoder. A more detailed trace of the dialogue is stored in the log file _tutoriallogs/Tut-hdc-CamInfo-00.0.chat.log. The first few lines of this log file are shown below (truncated and reformatted slightly to fit on a page):

 FlatOntologyManager.py <_set_ontology>126 :  Loading ontology: ontology/ontologies/CamRestaurants-rules.json
       FlatOntologyManager.py <_set_db>141 :  Loading database: ontology/ontologies/CamRestaurants-dbase.db
 FlatOntologyManager.py <_set_ontology>126 :  Loading ontology: ontology/ontologies/CamHotels-rules.json
       FlatOntologyManager.py <_set_db>141 :  Loading database: ontology/ontologies/CamHotels-dbase.db
                  pydial <chat_command>729 :  *** Chatting with policies ['CamRestaurants', 'CamHotels']: ***
               Agent.py <_hand_control>430 :  Launching Dialogue Manager for domain: topicmanager
                  Agent.py <start_call>178 :  >> NEW DIALOGUE SESSION. Number: 1
               Agent.py <_hand_control>447 :  Domain topicmanager is both already running and has control
                 Agent.py <_print_turn>554 :  ** Turn 0 **
              Agent.py <_print_sys_act>569 :  | Sys > hello()
                Agent.py <_agents_semo>653 :  Domain with CONTROL: topicmanager
                 Agent.py <_print_turn>554 :  ** Turn 1 **
    RuleTopicTrackers.py <infer_domain>151 :  CamHotels keyword found in: I want a cheap hotel in the centre of town.
         TopicTracking.py <track_topic>136 :  TopicTracker switched domains. From topicmanager to CamHotels
         TopicTracking.py <track_topic>142 :  After user_act - domain is now: CamHotels
               Agent.py <_hand_control>430 :  Launching Dialogue Manager for domain: CamHotels
                       SemI.py <decode>195 :  [(inform(area=centre)|inform(kind=hotel)|inform(pricerange=cheap)|
                                              inform(type=placetostay)', 1.0)]
     SemI.py <_add_context_to_user_act>254 :  Possibly adding context to user semi hyps: [(inform(area=centre)|inform(
                                              kind=hotel)|inform(pricerange=cheap)|inform(type=placetostay)', 1.0)]
ModSemBeliefTrack.p<update_belief_state>68 :  SemI   > [(inform(area=centre)|inform(kind=hotel)|
                                              inform(pricerange=cheap)|inform(type=placetostay)', 1.0)]
              Agent.py <_print_sys_act>569 :  | Sys > inform(name=none, kind='hotel',pricerange='cheap',area='centre')
                Agent.py <_agents_semo>653 :  Domain with CONTROL: CamHotels
                 Agent.py <_print_turn>554 :  ** Turn 2 **
....
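The dialogue acts in the trace above, such as hello() or inform(area=centre)|inform(kind=hotel), follow a simple act(slot=value,...) notation, with several acts joined by a | character. The following is a minimal sketch of how such a string could be split into its parts. The function parse_dialogue_acts is a hypothetical helper written for illustration and is not PyDial's own parser; it assumes that values contain no commas, pipes or parentheses.

```python
import re

# Matches one act such as "inform(area=centre)" or "hello()".
ACT_PATTERN = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def parse_dialogue_acts(text):
    """Split 'act(slot=value,...)|act(...)' into (act_type, slots) pairs.

    Illustrative only: assumes values contain no ',', '|' or parentheses.
    """
    acts = []
    for part in text.split("|"):
        match = ACT_PATTERN.match(part)
        if match is None:
            raise ValueError("not a dialogue act: %r" % part)
        act_type, body = match.groups()
        slots = {}
        for item in body.split(","):
            item = item.strip()
            if not item:
                continue  # empty body, as in hello()
            slot, _, value = item.partition("=")
            # Drop surrounding quotes, as seen in the system acts of the trace.
            slots[slot.strip()] = value.strip().strip("'\"")
        acts.append((act_type, slots))
    return acts
```

For example, parse_dialogue_acts("inform(area=centre)|inform(kind=hotel)") yields two inform acts, one carrying the slot area and the other the slot kind.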

Training a policy

The basic system demonstrated above uses very simple hand-crafted components and rule-based dialogue policies. The pydial.py script also provides basic functions for training and testing a policy and analysing the results.

Learning

Dialogue may be seen as a control problem in which, given a distribution over possible belief states, the system must choose an action that determines what it will do next. We can learn a good policy for making these decisions by defining a reward function and then using reinforcement learning to find a policy which maximises the total reward accumulated in each dialogue. One algorithm for doing this is GP-SARSA (for a detailed explanation see the Policy module tutorial).

Let's train a GP policy for the Cambridge Restaurants domain:

pydial train config/Tut-gp-CamRestaurants.cfg

This command starts the PyDial simulator and GP trainer according to the parameters set in the config file Tut-gp-CamRestaurants.cfg (for more details on config files, see below). In this case, the config is set to train the policy in 5 batches of 100 dialogues per batch at zero error rate. Each batch is tested (with exploration turned off) before moving on to the next batch. Depending on the speed of your computer, this should take only a few minutes. When it has finished, inspect the directory _tutorialpolicies: it will contain 5 pairs of files, one pair for each batch. Each pair consists of a dictionary of training points (.dct) and a set of parameters (.prm). The filenames include the error rate that the policy was trained on and the batch iteration number.
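As a rough illustration of that naming scheme, the sketch below splits a policy label into its parts. The layout name-errorrate.iteration is an assumption inferred from labels such as Tut-gp-CamRestaurants-00.5 that appear in the results table later in this tutorial. The function parse_policy_label is a hypothetical helper, not part of PyDial, and the actual filenames on disk may carry extra components.

```python
import re

# Assumed label layout: <config name>-<error rate>.<iteration>
# e.g. "Tut-gp-CamRestaurants-00.5" (inferred, not documented behaviour).
LABEL_PATTERN = re.compile(r"^(?P<name>.+)-(?P<error>\d+)\.(?P<iteration>\d+)$")


def parse_policy_label(label):
    """Return (name, error_rate_percent, iteration) for a policy label."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError("unexpected policy label: %r" % label)
    return (
        match.group("name"),
        int(match.group("error")),
        int(match.group("iteration")),
    )
```

Under this assumption, the label Tut-gp-CamRestaurants-10.5 would denote the policy trained at a 10% error rate after batch iteration 5.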

Most of the parameters set in the [exec_config] section of the config file can be overridden on the command line. For example, the following will train another GP policy, but this time with a 10% simulated error rate:

pydial train config/Tut-gp-CamRestaurants.cfg --trainerrorrate=10

You could also extend a previous training run by adding, say, another 2 batches:

pydial train config/Tut-gp-CamRestaurants.cfg --trainsourceiteration=5 --numtrainbatches=2

where the --trainsourceiteration option sets the batch to use as the input for the new training run.

Visualising the results

After training, it is useful to see how the reward and success rate increase as a function of the number of training dialogues. The pydial plot command will scan log files and produce a composite plot for each policy. In the case of log files produced by a training run, it will plot the reward and success rate as a function of the total number of training dialogues:

pydial plot _tutoriallogs/*train*

which should produce a plot of reward and success rate against the number of training dialogues (the figure is not reproduced here).

The GP framework enables an effective policy to be trained with relatively few dialogues. After only 500 dialogues, the policy trained at a 0% error rate achieves a success rate of around 85% using the default kernel parameters.

All policy information is stored in poldir. Since pydial overrides some config params, the actual configs used for each run are recorded in cfgdir.

Once trained, a policy can be explicitly tested using the pydial test command. This runs the same test as is used for testing batches during training. However, running separate tests allows performance to be tested with different parameters. For example:

pydial test config/Tut-gp-CamRestaurants.cfg 5 --trainerrorrate=0 --testerrorrate='(0,50,10)'
pydial test config/Tut-gp-CamRestaurants.cfg 5 --trainerrorrate=10 --testerrorrate='(0,50,10)'

will test the 5th iteration of the policy trained at 0% errors over the range of error rates 0%, 10%, 20%, ..., 50%, and then do the same for the policy trained at 10% errors. The results can again be viewed with the plot command, but this time picking up the log files containing eval rather than train:

pydial plot _tutoriallogs/*eval* --printtab

which should produce a plot of performance against error rate (the figure is not reproduced here). In this case, the option to print a table of results is also included, so the following table should be printed in addition to the graph:

Tut: Performance vs Error Rate
Reward                          0           10           20           30           40           50
Tut-gp-CamRestaurants-00.5 :  10.0 +- 0.9   9.8 +- 1.0   7.7 +- 1.1   4.1 +- 1.2   4.0 +- 1.2  -0.1 +- 1.2
Tut-gp-CamRestaurants-10.5 :   9.6 +- 1.0   8.2 +- 1.1   6.1 +- 1.2   4.1 +- 1.2   1.3 +- 1.1  -0.9 +- 1.1

Success                         0           10           20           30           40           50
Tut-gp-CamRestaurants-00.5 :  85.7 +- 4.0  84.7 +- 4.1  76.3 +- 4.8  61.7 +- 5.5  59.7 +- 5.6  41.0 +- 5.6
Tut-gp-CamRestaurants-10.5 :  82.0 +- 4.4  76.3 +- 4.8  66.7 +- 5.4  55.7 +- 5.6  41.7 +- 5.6  30.7 +- 5.2

Turns                           0           10           20           30           40           50
Tut-gp-CamRestaurants-00.5 :   7.1 +- 0.4   7.1 +- 0.4   7.6 +- 0.4   8.2 +- 0.4   8.0 +- 0.4   8.3 +- 0.4
Tut-gp-CamRestaurants-10.5 :   6.8 +- 0.4   7.1 +- 0.3   7.2 +- 0.3   7.0 +- 0.3   7.1 +- 0.3   7.0 +- 0.3
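The three blocks of the table are related. This tutorial does not state the reward function, but the figures above happen to be consistent with a reward of 20 for a successful dialogue minus 1 per turn. The sketch below checks this arithmetic. The constants SUCCESS_REWARD and TURN_PENALTY are assumptions chosen to fit the table, not values taken from the PyDial configuration.

```python
# Assumed reward shape (not stated in this tutorial): +20 on success, -1 per turn.
SUCCESS_REWARD = 20.0
TURN_PENALTY = 1.0

# (success %, average turns, reported reward), copied from the table above.
RESULTS = {
    "Tut-gp-CamRestaurants-00.5": [
        (85.7, 7.1, 10.0), (84.7, 7.1, 9.8), (76.3, 7.6, 7.7),
        (61.7, 8.2, 4.1), (59.7, 8.0, 4.0), (41.0, 8.3, -0.1),
    ],
    "Tut-gp-CamRestaurants-10.5": [
        (82.0, 6.8, 9.6), (76.3, 7.1, 8.2), (66.7, 7.2, 6.1),
        (55.7, 7.0, 4.1), (41.7, 7.1, 1.3), (30.7, 7.0, -0.9),
    ],
}


def estimated_reward(success_percent, turns):
    """Average reward implied by the assumed reward shape."""
    return SUCCESS_REWARD * success_percent / 100.0 - TURN_PENALTY * turns


def max_discrepancy(results):
    """Largest gap between the estimated and the reported reward."""
    return max(
        abs(estimated_reward(success, turns) - reported)
        for rows in results.values()
        for success, turns, reported in rows
    )
```

For the first entry, 20 x 0.857 - 7.1 is approximately 10.0, which matches the reported reward. The remaining entries agree to within the rounding of the table.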

Note that the graph itself can be suppressed using the --noplot option. This is useful when running pydial without a graphical display.
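The range notation used with --testerrorrate above, '(0,50,10)', is described as covering 0%, 10%, 20%, ..., 50%, i.e. a start, an inclusive end and a step. The sketch below expands such a specification into the list of error rates. The function expand_error_rates is a hypothetical helper for illustration; how pydial itself parses the option is not shown in this tutorial.

```python
import ast


def expand_error_rates(spec):
    """Expand '(start,end,step)' into a list of rates, end inclusive.

    A plain number such as '10' is returned as a single-element list.
    Illustrative only; not pydial's own option parsing.
    """
    value = ast.literal_eval(spec)
    if isinstance(value, int):
        return [value]
    start, end, step = value
    # range() excludes its stop value, so add 1 to include the end rate.
    return list(range(start, end + 1, step))
```

With this reading, each pydial test command shown above evaluates the policy at six error rates.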

The Configuration File

As will be apparent, pydial.py relies on a configuration file to set up the many options in the PyDial system. In fact, virtually all uses of PyDial require a configuration file. This file is global (there is only ever one config file) and it follows the standard Python config file format, i.e. one option=value per line, broken into sections headed by a section name in square brackets. The [GENERAL] section provides several key global variables:

[GENERAL]
 singledomain = True                           # turn off multi-domain handling ...
 domains = CamRestaurants,CamHotels, ....      # or list of possible domains
 tracedialog = 0                               # set trace level to 0,1 or 2
 seed = 12345                                  # set to make simulation reproducible

PyDial runs in either single or multi-domain mode. In multi-domain mode, the option variable domains provides a list of all possible domains. The tracedialog variable sets the console trace level, where 0=off and 2=verbose. Tracing is distinct from logging, and its prime use is to control the display of dialogue turns in simulation mode. Logging provides a much more detailed output at 4 possible levels: error, warn, info, debug. Logging to a file and logging to the screen are controlled independently. For example, in the following, the system will output all logging levels to the screen in colour (i.e. errors, warnings, info, and debug messages), but only output errors and warnings to the log file:

[logging]
usecolor = True
screen_level = debug
file_level = warn
file = debug.log
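To make the independent screen and file levels concrete, here is a minimal sketch that uses Python's standard logging module with one handler per destination. It illustrates the idea only and is not PyDial's own logger. The function build_logger and the LEVELS mapping are hypothetical names, and the usecolor option is not modelled.

```python
import logging
import sys

# Assumed mapping from the config level names to standard logging levels.
LEVELS = {
    "error": logging.ERROR,
    "warn": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


def build_logger(name, screen_level, file_level, file_path):
    """Create a logger whose screen and file outputs filter independently."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.propagate = False
    logger.handlers.clear()

    screen = logging.StreamHandler(sys.stdout)
    screen.setLevel(LEVELS[screen_level])
    logger.addHandler(screen)

    to_file = logging.FileHandler(file_path)
    to_file.setLevel(LEVELS[file_level])
    logger.addHandler(to_file)
    return logger
```

Calling build_logger("demo", "debug", "warn", "debug.log") would mirror the settings above: debug messages reach the screen only, while warnings and errors reach both the screen and the file.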

The operation of the pydial.py command line handler is controlled by a section labeled [exec_config]. The various options are listed below with typical values, most of which have been introduced in the sections above.

[exec_config]
domain = CamRestaurants     # specific train/test domain
policytype = gp             # type of policy to train/test
configdir = cfgdir          # folder to store configs
logfiledir = logdir         # folder to store logfiles
numtrainbatches = 2         # num training batches (iterations)
traindialogsperbatch = 10   # num dialogs per batch
numbatchtestdialogs =  100  # num dialogs to eval each batch
trainsourceiteration = 0    # iteration of source policy to update
testiteration = 1           # policy iteration to test
numtestdialogs =  100       # num dialogs per test
trainerrorrate = 0          # train error rate in %
testerrorrate  = 0          # test error rate in %
testeverybatch = True       # enable batch testing

Note that in actual config files, comments introduced by a # symbol are not allowed on the same line as an option variable but must instead be on a line of their own.
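The reason for this restriction can be seen with Python's standard configparser module, used here as a stand-in because PyDial's own config loader is not shown in this tutorial. With its default settings, configparser skips full-line comments but treats a trailing # comment as part of the option value. SAMPLE_CONFIG and read_section are hypothetical names used for illustration.

```python
import configparser

SAMPLE_CONFIG = """
[GENERAL]
# a comment on its own line is ignored
singledomain = True
seed = 12345
tracedialog = 0  # trailing text is kept as part of the value
"""


def read_section(text, section):
    """Parse config text and return one section as a plain dict of strings."""
    parser = configparser.ConfigParser()  # default: no inline comment stripping
    parser.read_string(text)
    return dict(parser[section])


general = read_section(SAMPLE_CONFIG, "GENERAL")
print(general["seed"])
print(repr(general["tracedialog"]))
```

Here the value of seed is read cleanly, whereas the value of tracedialog still contains the trailing comment text and could no longer be converted to an integer.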

For details of the many other configuration variables, see the relevant tutorials and the Configuration Variable Dictionary. The file OPTIONS.cfg gives an overview of all possible configuration options.