Ideas for an Expert System for IT Helpdesk

By Martin Stut, 2012-02-04

In my spare time, I'm helping a Christian charity that runs its own IT helpdesk for about 1000 users, distributed across much of the globe. This helpdesk service is greatly appreciated by the users, but it also absorbs (too) much of the IT team's capacity.

With that many users, similar issues tend to recur. So the idea came up of a pre-screening website that would ask certain questions and point to applicable instructions - or to a human (i.e. mail to RT), if there is no standard answer for the user's situation.

In the computer science world, such a system is called an "expert system", because it tries to mimic an expert. While looking for such a system for helpdesk use, I found surprisingly little that would match:

Existing Products

http://code.google.com/p/interactive-decision-tree/ : a very simple editor (PHP, creating an XML file) and viewer (HTML/JS/CSS, reading the XML file). Nice, working, but not flexible enough for the complexities of real IT issues.
CLIPS is a popular expert system shell, created by NASA, but it has next to no web interface. The PHLIPS project provides a PHP extension, so PHP gets a basic interface to a CLIPS environment. Unfortunately, this only works reliably with Apache 1.3, because the multi-threaded architecture of Apache 2 causes stability issues with the single-threaded CLIPS architecture.
D3web is a huge Java framework, requiring a lot of effort to get a system going. Probably too complex for that charity.

All other helpdesk software I found was about handling tickets by humans, not solving the underlying issue by a machine.

Perhaps I just didn't use the right search terms. Does anybody out there have an idea which search terms would lead to expert-system-backed helpdesk or troubleshooting systems?

So, with no web-ready expert system shell available, I started thinking about how I'd create one. So far I have not written a single line of code, but the idea might inspire others to do it - or to point me to someone who has done something similar.

Client (Customer) Interface

The user (client, customer, person having a problem and looking for help) enters the website of (starts a session with) the expert system, e.g. helpdesk.example.com.
The user selects the area of trouble, e.g. "I can't retrieve my e-mail".
The system starts asking yes/no questions, e.g. "Are you sure that your Internet connection really works?", "Has it ever worked after October 2011 (when we changed servers)?".
The user answers the question by selecting "yes", "no", "I don't know", plus possibly a comment field (e.g. explaining why the user doesn't know the answer). Or maybe the user can choose "Could you please rephrase that question?".
Depending on the user's answers, the system asks more questions.
When the system is reasonably sure what the underlying issue is, it tells the user and shows proper instructions.
When the system has to give up (nothing more to ask, too few precise answers from the user), it sends the log of the interaction to a human helpdesk technician.

Representing the Knowledge

The underlying expert knowledge needs to be stored in some data structure, residing e.g. in a database like MySQL. Here is my idea of how to represent that expert knowledge in a way that's ready to use for a machine (program):

Fact

A short statement, represented by a short string ("atom" in LISP or Ruby parlance), e.g. mail_works, worked_after_server_change or internet_connection_ok.

For each session (user problem instance), each fact has a state of "true" (either by user input or by logical inference [see rules below]), "false" (likewise), "user doesn't know" or "not yet considered".

The state of the interaction is basically the combined state of all facts.

Certain states of certain facts are solution candidates: if e.g. internet_connection_ok becomes known to be false while looking for the reason for not (mail_works), then there is only one way out: "Go fix your Internet connection".

So the full data structure (database schema) of a fact is

ID: number for handling in the database
atom_string: one word, suitable for an atom in an underlying expert system
is_solution: whether and when this fact is a solution: "always", "when_true", "when_false", "never"; the description should tell the user why.
for each session (possibly in PHP session storage): state
- user_input: one of "not yet asked", "true", "false", "don't know"
- user_comment: thoughts of the user, why he can't tell or why he thinks it is the way he wrote
- deduction_result: one of "not yet determined", "true", "false", "don't know". May need to be changed, if the user changes his input at a later stage.
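The fact record and its per-session state could be sketched in Python roughly as follows. This is only an illustration of the fields listed above: the class names, the enum values, the precedence in effective() (a definite user answer wins over a deduction) and the reading of "always" as "solution in either definite state" are assumptions, not part of the design.

```python
from dataclasses import dataclass
from enum import Enum


class SolutionWhen(Enum):
    ALWAYS = "always"
    WHEN_TRUE = "when_true"
    WHEN_FALSE = "when_false"
    NEVER = "never"


class State(Enum):
    UNKNOWN = "not yet determined"  # also stands for "not yet asked"
    TRUE = "true"
    FALSE = "false"
    DONT_KNOW = "don't know"


@dataclass(frozen=True)
class Fact:
    id: int
    atom_string: str
    is_solution: SolutionWhen = SolutionWhen.NEVER


@dataclass
class FactState:
    """State of one fact within one session."""
    user_input: State = State.UNKNOWN
    user_comment: str = ""
    deduction_result: State = State.UNKNOWN

    def effective(self) -> State:
        """Assumed precedence: a definite user answer wins, then a deduction."""
        if self.user_input in (State.TRUE, State.FALSE):
            return self.user_input
        if self.deduction_result in (State.TRUE, State.FALSE):
            return self.deduction_result
        return self.user_input  # UNKNOWN or DONT_KNOW


def is_solution_in(fact: Fact, state: State) -> bool:
    """True if the fact, in the given state, counts as a solution candidate."""
    if fact.is_solution is SolutionWhen.ALWAYS:
        return state in (State.TRUE, State.FALSE)
    if fact.is_solution is SolutionWhen.WHEN_TRUE:
        return state is State.TRUE
    if fact.is_solution is SolutionWhen.WHEN_FALSE:
        return state is State.FALSE
    return False
```

With this sketch, a fact such as internet_connection_ok would be declared with SolutionWhen.WHEN_FALSE, so it becomes a solution candidate as soon as its effective state is false.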

Rule

A description of a material implication, i.e. of which combination of facts determines the state of another fact, e.g.

not (internet_connection_ok) => not (mail_works)

account_settings_outdated => not (worked_after_server_change)

This needs to be handled with full care, and it can be used with the full power of propositional logic.

a => b can be pronounced as "a implies b", or if a is true, then (it can be implied with mathematical certainty that) b must also be true. If a is not true, then nothing is known about b. If b is true, then nothing is known about a, because b can be true for other reasons than a.
In the above example: if the internet connection is broken, then mail will certainly not work. But "mail not working" can be true for many other reasons than a broken internet connection.

By the rules of mathematical logic (the material implication (=>) works that way), a => b is fully equivalent to not (b) => not (a), so if b is false, then a must be false too, because if a were true, b would also have to be true, which it isn't.
Applied to the above example, an equivalent way to put the rule is
mail_works => internet_connection_ok
In other words, if you know that your mail works, you can infer that your internet connection is working too.
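The equivalence of a => b and not (b) => not (a) can be checked by going through all four rows of the truth table. A minimal sketch in Python (the function name implies is just chosen for this illustration):

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: false only when a is true and b is false."""
    return (not a) or b


for a, b in product([False, True], repeat=2):
    # a => b must agree with not(b) => not(a) in every row of the truth table
    assert implies(a, b) == implies(not b, not a)
    print(f"a={a!s:5} b={b!s:5} a=>b: {implies(a, b)}")
```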

It may require more than one condition to be able to infer a fact. So the rules need to have the option of at least two conditions, each of them negate-able, and a link operator (AND, OR, perhaps XOR); the result may be negated, so the full data structure of a rule is:

ID: for identifying the rule in the database
short_name: to keep a list of reasons short
long_description: human readable sentences what this rule is really saying and why it is true; optional but highly recommended
cond1_fact: ID of fact
cond1_negate: true or false; true means "negate cond1_fact"
cond2_fact: ID of fact, may be empty if the rule depends on only one single fact
cond2_negate: true or false; true means "negate cond2_fact"
operator: "AND", "OR", "XOR"; may be empty if cond2_fact is empty
result_fact: ID of the fact that's being set by this rule.
result_negate: true or false; true means "set the fact to false if the condition(s) are true"
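As an illustration of how such a rule record could be evaluated, here is a sketch in Python. It assumes that an unknown fact is represented by None and that AND/OR follow three-valued logic (a false condition already decides an AND, a true one already decides an OR). The names Rule, condition and fire are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Tri = Optional[bool]  # True, False, or None for "not known"


@dataclass(frozen=True)
class Rule:
    id: int
    short_name: str
    cond1_fact: int
    result_fact: int
    cond1_negate: bool = False
    cond2_fact: Optional[int] = None
    cond2_negate: bool = False
    operator: Optional[str] = None  # "AND", "OR", "XOR"
    result_negate: bool = False
    long_description: str = ""


def _negate(value: Tri, negate: bool) -> Tri:
    """Negate a known value if requested; unknown stays unknown."""
    return value if value is None or not negate else not value


def condition(rule: Rule, known: Dict[int, Tri]) -> Tri:
    """Evaluate the left side of the rule; None means undecided so far."""
    a = _negate(known.get(rule.cond1_fact), rule.cond1_negate)
    if rule.cond2_fact is None:
        return a
    b = _negate(known.get(rule.cond2_fact), rule.cond2_negate)
    if rule.operator == "AND":
        if a is False or b is False:
            return False
        return True if (a is True and b is True) else None
    if rule.operator == "OR":
        if a is True or b is True:
            return True
        return False if (a is False and b is False) else None
    if rule.operator == "XOR":
        return None if a is None or b is None else a != b
    raise ValueError(f"unknown operator: {rule.operator!r}")


def fire(rule: Rule, known: Dict[int, Tri]) -> Tri:
    """Value the rule assigns to result_fact, or None if it does not fire."""
    if condition(rule, known) is True:
        return not rule.result_negate
    return None
```

The example rule not (internet_connection_ok) => not (mail_works) would then be a Rule with cond1_negate and result_negate both set to true.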

Question

Somehow the facts need to be given to the system. A machine is perfectly happy dealing with an atom-like fact, but the average end-user (typically an expert in theology, not technology) would feel rudely treated if asked "worked_after_server_change ? yes/no/don't know".

So for each fact there should be one (or more, see below) longer string containing a question, e.g. "Are you sure that your Internet connection works?" This is a question most end users can answer with yes or no. Sometimes there may be alternative wordings for the same question. So there may be multiple questions for a single fact. One of these questions needs to be marked "preferred" (or get a numeric preference value, e.g. 1-100), so the system can know which one to present first.

Some of these alternative wordings may have the opposite meaning, e.g. "Are you currently experiencing trouble with your Internet connection, so you have problems visiting other websites too?". So another attribute per question needs to be "meaning reversed" (yes/no).

In a worldwide user group, people tend to prefer different languages. So for each fact there could (or rather should) be one question per language. The user could specify his preferred language(s) in his user profile on the helpdesk system, e.g. "native German, good English, very little Japanese".

So the full data structure of a question could be:

ID: to identify it within the database
fact_id: ID of the fact, the state of which would be set if the question were answered; there can be several questions for a single fact.
text: the string that's going to be presented as a question to the user
language: id of the language the text is in
meaning_reversed: yes or no; yes means that answering "yes" to this question should set the fact's state to "false"
level_of_preference: 0-10; higher preference questions will be suggested first to the user
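Choosing which wording to present, and translating the answer back into a fact value, could look roughly like this in Python. The sketch assumes that languages are identified by short codes such as "en" and "de", and that the user's profile supplies them in order of preference. The function names and the sample questions in the usage example are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Question:
    id: int
    fact_id: int
    text: str
    language: str                  # assumed to be a short code like "en"
    meaning_reversed: bool = False
    level_of_preference: int = 0   # 0-10, higher is asked first


def pick_question(questions: Sequence[Question], fact_id: int,
                  user_languages: Sequence[str]) -> Optional[Question]:
    """Best question for a fact: the user's language order first, then preference."""
    candidates = [q for q in questions
                  if q.fact_id == fact_id and q.language in user_languages]
    if not candidates:
        return None
    return min(candidates,
               key=lambda q: (user_languages.index(q.language),
                              -q.level_of_preference))


def answer_to_fact_value(question: Question, answered_yes: bool) -> bool:
    """A "yes" sets the fact to true, unless the question's meaning is reversed."""
    return answered_yes != question.meaning_reversed
```

A possible use, with hypothetical records:

```python
questions = [
    Question(1, 7, "Are you sure that your Internet connection works?", "en", False, 10),
    Question(2, 7, "Are you currently experiencing trouble with your Internet connection?", "en", True, 5),
]
q = pick_question(questions, fact_id=7, user_languages=["de", "en"])
fact_value = answer_to_fact_value(q, answered_yes=True)
```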

Description

When the system describes what it has concluded (or what the user has entered), it should present its findings to the user in understandable human language. Because there are multiple languages (English for users, English for technicians, German, Spanish, ...), a single description field won't suffice. Additionally, a "yes" state and a "no" state may have quite different wordings. So there is a need for a separate table of descriptions:

id: technical field for the database; possibly unneeded
fact_id: fact this description refers to
language_id: language this description is written in
fact_state: "true", "false", "don't know"
text: the description of the fact when being in the specified state
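Looking up a description then amounts to a keyed lookup with a fallback through the user's languages. A minimal Python sketch, assuming the table is held in a dictionary keyed by (fact_id, language_id, fact_state); the function name describe and the language codes in the usage example are assumptions:

```python
from typing import Dict, Optional, Sequence, Tuple

# (fact_id, language_id, fact_state) -> text
Descriptions = Dict[Tuple[int, str, str], str]


def describe(descriptions: Descriptions, fact_id: int, fact_state: str,
             languages: Sequence[str]) -> Optional[str]:
    """Return the description in the first of the given languages that has one."""
    for language_id in languages:
        text = descriptions.get((fact_id, language_id, fact_state))
        if text is not None:
            return text
    return None
```

For example, with made-up rows:

```python
table = {
    (1, "en", "false"): "Your Internet connection is not working.",
    (1, "en-tech", "false"): "Client reports no Internet connectivity.",
}
print(describe(table, 1, "false", ["de", "en"]))
```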

Rule (Inference) Engine

Given all these (database) records about the state (within the session) of facts, rules and questions, the core logic of the expert system (I'll call it "rule engine") needs to decide which question to ask the user next. Because each question is closely linked to exactly one fact, the task can be reworded to "determine which fact to consider next".

Core parts of the algorithm could be:

Facts with known state (yes, no, don't know) don't need to be reconsidered.
For each rule, check whether the rule could trigger (left side of the "=>" evaluates to true, right side is unknown or not yet considered) using the known facts. This method is called "forward chaining". Enhancement: use the reversal law (contraposition) for rules having only one fact on the left side, i.e. when the right side of the "=>" evaluates to false, conclude that the left side is false. Further enhancement: do the same for rules having a known right side fact and one of two left side facts known, if the states, negators and link operators permit drawing a conclusion.
- If a rule can be triggered, trigger it, thus putting more facts into a known state.
For each rule, check whether it could trigger if just one (or, in "desperate mode", two) additional fact(s) would be known.
- Take note of these "interesting" facts. If a fact becomes interesting in several rules, increase its score. If a fact would only be useful if a second fact is also known, then increase the score only by a little.
- Select the fact with the highest score and present its question to the user; record the answer, set "unconsidered" to false.
- If no such fact (and corresponding rule) is found, give up, "can't solve the problem with so few supplied facts", send the log of the session to a human technician.
If a fact reaches a state where it becomes a solution (e.g. internet_connection_ok becomes a known false), tell the user the solution and the path (facts and rules) that has led to the conclusion.
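The core of the algorithm above (forward chaining, then scoring the facts that would let further rules trigger) could be sketched as follows in Python. This is a simplified illustration, not a complete engine: it only handles rules whose conditions are linked by AND, it leaves out the reversal-law enhancements, and the score weights 1.0 and 0.25 are arbitrary assumptions. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Tuple

Cond = Tuple[str, bool]  # (fact atom, negate?)


class Rule(NamedTuple):
    conds: List[Cond]            # one or two conditions, linked by AND only
    result: str                  # atom of the fact set by this rule
    result_negate: bool = False  # True: set the result fact to false


def holds(cond: Cond, known: Dict[str, bool]) -> Optional[bool]:
    """True/False if the condition is decided by the known facts, else None."""
    fact, negate = cond
    if fact not in known:
        return None
    return known[fact] != negate


def forward_chain(rules: List[Rule], known: Dict[str, bool]) -> Dict[str, bool]:
    """Trigger rules until no further fact can be deduced."""
    known = dict(known)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.result not in known and all(
                    holds(c, known) is True for c in rule.conds):
                known[rule.result] = not rule.result_negate
                changed = True
    return known


def next_fact(rules: List[Rule], known: Dict[str, bool],
              dont_know: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Pick the unknown fact with the highest score, or None to give up."""
    scores: Counter = Counter()
    for rule in rules:
        if rule.result in known:
            continue  # nothing left to deduce from this rule
        if any(holds(c, known) is False for c in rule.conds):
            continue  # a condition is already violated, the rule cannot trigger
        missing = [fact for fact, _ in rule.conds if fact not in known]
        if not missing or any(fact in dont_know for fact in missing):
            continue  # the user cannot supply what this rule needs
        # Full score if one answer would suffice, only a little if two are needed
        gain = 1.0 if len(missing) == 1 else 0.25
        for fact in missing:
            scores[fact] += gain
    return scores.most_common(1)[0][0] if scores else None


RULES = [
    Rule([("internet_connection_ok", True)], "mail_works", True),
    Rule([("account_settings_outdated", False)], "worked_after_server_change", True),
]
```

In a session loop, forward_chain would run after every answer, and next_fact would pick the fact whose question is shown next. When next_fact returns None, the system gives up and hands the log to a human technician.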

Possible Extensions of the Basic Design

In rough order of complexity, easiest first:

Save the state of the sessions for several weeks, in order to enable the user to return later and continue the session, e.g. after having determined the answer to a hard question. Enable the user to supply modified (corrected) fact answers, especially for facts previously answered with "don't know", or in cases of "a second look has revealed that things actually are different".

Specify a preset probability for facts as an additional criterion for choosing them, e.g. "a lot of people forgot to change their account settings when we switched servers" would increase the preset probability of account_settings_outdated.

Specify an "easiness to find out" for facts. Facts that are easier to find out (for the end-user) will be asked first.
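Both the preset probability and the "easiness to find out" could enter the choice of the next fact as simple weights. The following Python sketch multiplies them with the score a fact has collected from the rules. The multiplication, the 0...1 scale and the default of 0.5 for facts without a preset value are assumptions made for this illustration only.

```python
from typing import Dict, Optional


def pick_weighted(rule_scores: Dict[str, float],
                  preset_probability: Dict[str, float],
                  easiness: Dict[str, float]) -> Optional[str]:
    """Rank candidate facts by rule score x preset probability x easiness."""
    if not rule_scores:
        return None

    def weight(fact: str) -> float:
        # Facts without a preset value get a neutral 0.5 (an assumption)
        return (rule_scores[fact]
                * preset_probability.get(fact, 0.5)
                * easiness.get(fact, 0.5))

    return max(rule_scores, key=weight)
```

For instance, a fact that is likely but hard to check can lose against a less likely fact that the user can verify in seconds.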

Multiple possible solutions: if a fact is not a definite solution, but only a possible one, don't terminate, but continue investigating.

Use fuzzy logic for fact values (values 0...1) instead of Boolean logic (false or true, nothing else).

Specify a probability for rules, e.g. "if (100% a and 100% b) then there is an 80% probability of c".
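To give an impression of these last two extensions, here is a small Python sketch. It uses the common min/max/complement operators for fuzzy AND/OR/NOT, and it scales the degree of the condition by the rule's probability. Other operator choices exist, and the scaling by multiplication is an assumption of this sketch.

```python
def fuzzy_not(a: float) -> float:
    """Complement of a truth degree in 0...1."""
    return 1.0 - a


def fuzzy_and(a: float, b: float) -> float:
    """A conjunction is only as true as its weakest part."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """A disjunction is as true as its strongest part."""
    return max(a, b)


def apply_rule(condition_degree: float, rule_probability: float) -> float:
    """Degree assigned to the result fact (assumed: simple multiplication)."""
    return condition_degree * rule_probability


# "if (100% a and 100% b) then there is an 80% probability of c"
c = apply_rule(fuzzy_and(1.0, 1.0), 0.8)
```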