The semantic decoding component is responsible for decoding the meaning of utterances sent to the system. Dialogue systems are often deployed in noisy settings, where background noise and diverse user populations can result in high speech recognition error rates. SLU systems therefore operate on the top $k$ hypotheses output by the speech recogniser, which creates additional challenges for the effective transformation of spoken utterances into dialogue acts. Let's look at an illustrative example:
Dialogue turn: Is there maybe a cheap place in the north of town please?
Dialogue act type: inform
Semantic slots: price, area
Semantic values: cheap, north
Dialogue act: inform(price=cheap, area=north)
As we can see, during semantic decoding we extract the dialogue act type, which encodes the system's or the user's intention in a dialogue turn, as well as the semantic slots and values that further describe the entities from the ontology that the turn refers to.
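The structure above can be sketched in code. The helper below is purely illustrative (it is not part of PyDial); it only shows how an act type plus slot-value pairs combine into the textual dialogue act form used above:

```python
# Hypothetical helper, for illustration only: render a dialogue act
# string from an act type and a list of (slot, value) pairs.
def make_act(act_type, slots):
    """Render an act like inform(price=cheap, area=north)."""
    args = ", ".join(f"{slot}={value}" for slot, value in slots)
    return f"{act_type}({args})"

make_act("inform", [("price", "cheap"), ("area", "north")])
# → 'inform(price=cheap, area=north)'
```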
We can tackle this problem from two perspectives: the first involves creating a list of grammar rules that parse user utterances; the second relies on statistically trained models, where classifiers are trained on labelled data to label utterances directly.
A natural solution to the problem is to create a set of rules that transform the utterance into a dialogue act based on the domain ontology. For general communication with the system, they might look like this:
self.rHELLO = "(\b|^|\ )(hi|hello)\s"
self.rNEG = "(\b|^|\ )(no\b|wrong|incorrect|error)|not\ (true|correct|right)\s"
self.rAFFIRM = "(yes|ok\b|OK\b|okay|sure|(that('?s| is) )?right)"
self.rBYE = "(\b|^|\ )(bye|goodbye|that'*s*\ (is\ )*all)(\s|$|\ )"
self.GREAT = "(great|good|awesome)"
self.HELPFUL = "(that((\')?s|\ (is|was))\ (very\ )?helpful)"
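As a minimal sketch of how such rules can be applied, the snippet below matches simplified patterns (in the spirit of the rules above, but not PyDial's exact ones) against an utterance and returns the first dialogue act that fires:

```python
import re

# Simplified stand-ins for the RegexSemI-style rules above (assumption:
# these patterns are illustrative, not PyDial's actual expressions).
RULES = {
    "hello": re.compile(r"(\b|^| )(hi|hello)\b"),
    "negate": re.compile(r"(\b|^| )(no\b|wrong|incorrect|error)|not (true|correct|right)"),
    "affirm": re.compile(r"(yes|ok\b|okay|sure|(that('?s| is) )?right)"),
    "bye": re.compile(r"(\b|^| )(bye|goodbye)"),
}

def decode(utterance):
    """Return the first dialogue act whose pattern matches the utterance."""
    for act, pattern in RULES.items():
        if pattern.search(utterance.lower()):
            return act + "()"
    return "null()"
```

Rule order matters here: an utterance like "no that is wrong" is caught by the negate rule before the affirm rule can fire on "right".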
More specific expressions depend entirely on the domain we are operating in. A general framework for implementing a rule-based system can be derived from the base class RegexSemI.py.
You can find scripts with regular expressions for each specific domain in the folder
semi/RegexSemI_DOMAIN. To test this model you can just run:
pydial chat config/texthub.cfg
To use it in your specific config file, set the variable semitype to RegexSemI in the semi section, for example:
[semi_CamRestaurants]
semitype = RegexSemI
As mentioned previously, we can view semantic decoding as a classification task in which we make predictions about a set of semantic concepts. However, this requires corpora labelled with semantic concepts. Such an approach can naturally handle the $N$-best list of automatic speech recogniser hypotheses: in real conversational systems the error rate of the top hypothesis is typically around 20-30%, so a robust system is needed.
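One simple way to exploit an $N$-best list can be sketched as follows (a hypothetical helper, not PyDial's implementation): decode each hypothesis separately and accumulate the ASR confidence mass behind each semantic item, so that items supported by several hypotheses come out on top:

```python
from collections import defaultdict

# Hypothetical sketch: combine semantic items decoded from an ASR
# N-best list, weighting each item by the confidence of the
# hypotheses that support it.
def decode_nbest(nbest, decode_one):
    """nbest: list of (hypothesis, confidence) pairs;
    decode_one: function mapping an utterance to an iterable of items."""
    scores = defaultdict(float)
    for hypothesis, confidence in nbest:
        for item in decode_one(hypothesis):
            scores[item] += confidence
    # Rank items by total supporting confidence, highest first
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

For example, if "cheap" is heard in two hypotheses with confidences 0.6 and 0.3, the item price=cheap receives a combined score of 0.9 and outranks items seen in only one hypothesis.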
The classification model implemented in PyDial is the support vector machine (SVM), which maps the input into a high-dimensional feature space where the data is linearly separable. It has been shown that such a statistical approach can substantially improve performance, both in terms of accuracy and overall dialogue reward.
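The statistical view can be sketched with a toy binary linear classifier for a single semantic item over bag-of-words features. PyDial's SVMSemI trains actual SVMs; here a perceptron update stands in for SVM training purely so the sketch stays dependency-free, and the data is invented for illustration:

```python
# Toy stand-in for the statistical approach: a binary linear classifier
# for one semantic item (e.g. price=cheap), trained with the perceptron
# rule instead of an SVM so the example needs no external libraries.
def featurise(utterance):
    return set(utterance.lower().split())

def train(examples, epochs=10):
    """examples: list of (utterance, label) pairs with label +1 or -1."""
    weights = {}
    for _ in range(epochs):
        for utterance, label in examples:
            feats = featurise(utterance)
            score = sum(weights.get(f, 0.0) for f in feats)
            if label * score <= 0:          # misclassified: nudge weights
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + label
    return weights

def predict(weights, utterance):
    return sum(weights.get(f, 0.0) for f in featurise(utterance)) > 0
```

In a real system there would be one such classifier per semantic concept, with richer features (n-grams, N-best confidence scores) and a proper max-margin objective.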
To use this approach in your specific config file, set the variable semitype to SVMSemI in the semi section, for example:
[semi_CamRestaurants]
semitype = SVMSemI
You can test it by chatting with the system through pydial:
pydial chat config/texthub_svm.cfg
In PyDial there is a dedicated module for spoken language understanding in the directory
semi (semantic input). The interface class is
SemI.py, from which the models explained above derive. The SVM model is implemented in
SVMSemI.py, while the base class for the rule-based model is in
RegexSemI.py, with specific instances for each domain in the RegexSemI_DOMAIN files.