## Finite-State Machines

Click here for important instructions on how to print this web page.

A finite-state machine (FSM) is an abstract model of a system (physical, biological, mechanical, electronic, or software). Key components are

• a finite number of states which represent the internal "memory" of the system by implicitly storing information about what has happened before.
• transitions which represent the "response" of the system to its environment. Transitions depend upon the current state of the machine as well as the current input and often result in a change of state.

Consider, for example, the use of an FSM to model an old-time soda machine that dispenses soda for 30 cents. The possible inputs to the machine are ` n - nickel, d - dime, q - quarter, s - select soda.`

The states of the machine are designated by circles, each labeled with the amount of money that has been deposited so far. State 00 is designated as the start or initial state by the incoming arrow. States which represent a total input of 30 or more cents are considered final states and are designated by double circles. The transitions from state to state are shown as arcs (lines with arrows). Each transition is labeled with the input that caused it. If a person puts a nickel into the machine followed by a dime followed by a quarter, the FSM would transition from state 00 to state 05 to state 15 to final state 40. At that point, he or she could select a soda (and hopefully receive 10 cents in change).

A finite-state machine can also be used to model a simple automated teller machine (ATM). In this case, each transition is labeled with an input coming from the user (such as `insert card`) or a central database (such as ```PIN OK``` versus `bad PIN`). The level of abstraction in this example is much greater than that of the soda machine. Each of the transitions may, in actuality, involve a number of complex steps which are not shown on the diagram. Thus in this case, the finite-state machine is being used as a system design tool, rather than as a detail-oriented description of the ATM's operation. Another application of finite-state machines is as a notation for the precise description of a programming language syntax. For example, real constants in Pascal are described in English by a leading programming text as follows:

A real number in PASCAL is written as a string of digits containing a decimal point. There must be at least one digit before and after the decimal point. Real data may also be represented using PASCAL scientific notation. A real data item written in scientific notation consists of a sign followed by a real number, followed by the letter E, another sign and an integer (+ signs may be omitted).

The English description is imprecise. Are the strings `.9` and `9.` valid, or do you have to say `0.9` and `9.0`, respectively?

The finite-state machine below gives a clear answer to this question. A valid real constant in Pascal is any string that leads the FSM from the start state to a final (double-circled) state. For the first string `.9`, whose first character is a decimal point `.`, the FSM stops in state 0 because only a digit or a sign will allow it to move to state 1. The second string `9.` causes the FSM to go to state 2 based on the first character 9 and to state 3 based on the second character which is a decimal point. It cannot go beyond state 3 since there are no more characters in the string. Since neither string causes the FSM to get to either final state 4 or 7, neither is valid. However, using the FSM with the strings `0.9` and `9.0` proves that these are indeed valid Pascal real constants. Note that `.9` and `9.` are valid in the C++ language.

### Text Processing with FSM's

Finite-state machines are often used in text processing. A simple example is a string search that takes place in an editor or in the `grep` utility, which is used to search a file for a particular pattern. The `grep` utility takes a string or regular expression and converts it to a finite-state machine before doing a search. To simplify this scenario, suppose a file consists of a's and b's only and the search is for the string "abba". The corresponding finite-state machine would look like this with the double circle indicating a successful conclusion to the search. If, for example, this FSM were used to locate the string "abba" in a file containing "aababbbabba...", it would move from state to state as follows:

```    Input:     a   a   b   a   b   b   b   a   b   b   a

State: 0   1   1   2   1   2   3   0   1   2   3   4
```
The states listed above are the states of the machine after the corresponding input. When the final state 4 is reached, the search is successful.

For now we will use FSM's to design text-processing programs. What follows is a simple case study to show how an FSM design can be developed and translated directly into a program.

Consider the following specifications:

• A word is a maximal sequence of upper- and lower-case letters. For example, in the string `Hello, Judy!`, the sequence `Judy!` is not a word because it includes `!` (not a letter), and the sequence `Jud` is not a word because it's not maximal (it does not contain all of the contiguous letters). However, `Judy` is a word. In the string `Goodbe now.`, `Goodbe` is considered a word(by our definition) because it is maximal and contains only letters, even though it should be spelled `Goodbye`.
• `wc` is the word count, initially 0
• `lc` is the line count, initially 0
• `cc` is the character count, initially 0
The following FSM both clarifies the definition of a word (which might have been ambiguous without the examples) and gives a design for a program that counts words, lines, and characters in a text file. Each transition is labeled with an input character (or "other", which means any character that does not appear as a label on a transition from that state) and a corresponding action. For example, the transition labelled "A-Za-z `++wc,++cc`" from state 0 to state 1 means "when the current state is 0 and the next input character is a letter, enter state 1 and increment both `wc` and `cc`". In state 0, the FSM remembers that we're not currently in the middle of a word, while state 1 remembers that we are. Another way to depict an FSM uses a table instead of a "bubble diagram". Each row of the table corresponds to a state and each column to a possible input symbol (or category of input symbols). The table method is useful for a comment at the beginning of a file that implements a finite-state machine. The table version for the word counting FSM is given below (as it would appear in a comment). Each table entry shows the new state the machine would enter, based on its current state and current input, and the action(s) that would be taken.

```// STATE      |     A-Za-z    |     \n        |  other   |
// -------------------------------------------------------
//    0       | 1 : ++wc,++cc | 0 : ++lc,++cc | 0 : ++cc |
// -------------------------------------------------------
//    1       | 1 : ++cc      | 0 : ++lc,++cc | 0 : ++cc |
// -------------------------------------------------------
```
A standard idiom for translating an FSM into a program is the while-switch idiom. A `while` loop gets one character at a time from the input stream. Inside the loop is a `switch` statement that causes different code to be executed based on the current state.

Our example is rendered as follows. Input to the program could come from the keyboard or from a file using redirection (```eos% a.out < filename```).

```#include <iostream.h>    // cin, cout
#include <ctype.h>       // isalpha()
int main()
{
int wc = 0, lc = 0, cc = 0;
char ch;
int state = 0;

while ( cin.get(ch) ) {
switch (state) {
case 0: if ( '\n' == ch ) {
++lc;
++cc;
}
else if ( isalpha(ch) ) {
state = 1;
++wc;
++cc;
}
else
++cc;
break;
case 1: if ( '\n' == ch ) {
state = 0;
++lc;
++cc;
}
else if ( isalpha(ch) )
++cc;
else {
state = 0;
++cc;
}
}
}
cout << lc << "\t" << wc << "\t" << cc << endl;
}
```
Within each case of the switch, a decision is made based on the classification of the current input character. The program can be simplified considerably by recognizing the following facts about the FSM:
• `++cc` is done on every transition
• whenever the current input is `\n` we do `++lc` and the state either remains or becomes 0
• the only other change in state or in values of variables occurs when a letter is encountered in state 0 or a character other than a letter or `\n` is encountered in state 1
The simplified program looks like:
```#include <iostream.h>    // cin, cout
#include <ctype.h>       // isalpha()
int main()
{
int wc = 0, lc = 0, cc = 0;
char ch;
int state = 0;

while ( cin.get(ch) ) {
++cc;
if ( '\n' == ch ) {
++lc;
state = 0;
}
else if ( isalpha(ch) && 0 == state ) {
state = 1;
++wc;
}
else if ( !isalpha(ch) && 1 == state)
state = 0;
}
cout << lc << "\t" << wc << "\t" << cc << endl;
}
```

This simplified version of the program is included solely to point out that it may not be obvious that a particular piece of code is actually implementing a finite-state machine. For now, it is best to stick with the more straight forward original implementation.

A slightly more complicated use of a finite-state machine is to output each of the words contained in a C++ source file while excluding any that occur in comments. For this example, a word will be defined as any string of alphabetic characters. Comments are defined as strings that begin with the characters `//` and end with the new line character `'\n'`. The FSM along with the program used to accomplish this task is as follows. ```// TABLE form of FSM:
//
// state      |   A-Za-z   |      /     |     \n     |   other    |
//-----------------------------------------------------------------
// IGNORE     |  IN_WORD   | ONE_SLASH  |   IGNORE   |  IGNORE    |
//-----------------------------------------------------------------
// IN_WORD    |  IN_WORD   | ONE_SLASH  |   IGNORE   |  IGNORE    |
//-----------------------------------------------------------------
// ONE_SLASH  |  IN_WORD   | IN_COMMENT |   IGNORE   |  IGNORE    |
//-----------------------------------------------------------------
// IN_COMMENT | IN_COMMENT | IN_COMMENT |   IGNORE   | IN_COMMENT |
//-----------------------------------------------------------------
// words are output on transitions out of the IN_WORD state

#include <iostream.h>
#include <ctype.h>           //isalpha

int main()
{
enum { IGNORE, IN_WORD, ONE_SLASH, IN_COMMENT };
int  state = IGNORE;
char buf;
int  pos;
char ch;

while (cin.get(ch)) {
switch (state) {
case IGNORE:
if  (isalpha(ch)) {
pos = 0;
buf[pos++] = ch;
state = IN_WORD;
}
else if ('/' == ch  )
state = ONE_SLASH;
break;
case IN_WORD:
if   (isalpha(ch)) {
buf[pos++] = ch;
state = IN_WORD;
}
else {
buf[pos] = '\0';
cout << buf << endl;
if ('/' == ch)
state = ONE_SLASH;
else
state = IGNORE;
}
break;
case ONE_SLASH:
if ('/' == ch)
state = IN_COMMENT;
else if (isalpha(ch)) {
pos = 0;
buf[pos++] = ch;
state = IN_WORD;
}
else
state = IGNORE;
break;
case IN_COMMENT:
if ('\n' == ch)
state = IGNORE;
break;
}
}
}

```

### Horner's Rule

A finite-state machine can also be used to convert an ASCII string of characters representing a real number to its actual numerical value. It performs the same operation as the `atof`(ASCII to floating point number) function and uses an algorithm known as Horner's Rule.

The letters shown in the FSM stand for the following:

```c - current character
s - sign of the number
v - value of the number
p - power
```
Note that this FSM assumes that the string contains a valid floating point number that starts with an optional + or -, has at least one digit, an optional decimal point, and any number (including 0) of digits before and after the decimal point. Like the `atof` function, a value of 0 is returned if an invalid string is encountered. What follows is a function that implements the above finite-state machine. Since we're dealing with a string stored in an array (instead of an input stream), the method for getting the next character and the corresponding loop control are modified accordingly.

```
// TABLE form of FSM:
//
// state      |  + or -  |     .    |    digit    |  other  |
//------------------------------------------------------------
// START      |  INTEGER | DECIMAL  |   INTEGER   |  ERROR  |
//------------------------------------------------------------
// INTEGER    |  ERROR   | DECIMAL  |   INTEGER   |  ERROR  |
//------------------------------------------------------------
// DECIMAL    |  ERROR   | ERROR    |   DECIMAL   |  ERROR  |
//-----------------------------------------------------------
//

#include <ctype.h>       //isdigit
#include <iostream.h>    //cerr

double string_to_number(char* string)

{
double sign = 1;
double value = 0;
double power;
int i = 0;
enum {START, INTEGER, DECIMAL, ERROR};
int state = START;
char ch;              //current character in string

while ((state != ERROR) && (ch = string[i++])) {

switch (state) {

case START:      if ('.' == ch) {
power = .1;
state = DECIMAL;
}
else if ('-' == ch) {
sign = -1.;
state = INTEGER;
}
else if ('+' == ch)
state = INTEGER;

else if (isdigit(ch)) {
value = ch - '0';
state = INTEGER;
}
else
state = ERROR;

break;

case INTEGER:    if ('.' == ch) {
power = .1;
state = DECIMAL;
}
else if (isdigit(ch))
value = 10.*value + (ch - '0');
else {
value = 0.;
state = ERROR;
}
break;

case DECIMAL:    if (isdigit(ch)) {
value += power * (ch - '0');
power /= 10.;
}
else {
value = 0.;
state = ERROR;
}
break;
default: cerr << "Should not be here!" << endl;
}
}

return sign * value;
}

```

### Summary

1. Finite-state machines can be used to model the interaction between a system and its environment.
2. The state of a FSM is a way of remembering what has occurred so far. In addition to the FSM state, there may be variables that remember other details. The designer has to use judgement to decide what to model with a FSM state and what to leave as a variable.
3. A transition occurs when an event in the environment causes the system to change state (either the FSM state or variables).
4. A FSM can be depicted either by a bubble diagram or a transition table.
5. In text stream applications each transition corresponds to a single input character.
6. The while-switch idiom gives a method for mechanically translating a FSM to a program. Simplifications can be made by taking advantage of special features of the FSM.