Communications Strategies for Flexible Manufacturing

A MAP IMPLEMENTATION

Edward Lipchus
itp boston, inc.
U.S.A.

The accuracy and timeliness of message transmissions in a high volume FCS (factory control system) are critical. Part of the solution can be the use of fault-tolerant hardware and MAP, but additional measures must be taken.

This paper describes the success of one major automotive factory FMS specifically designed to meet this communication challenge. The elements of the design are discussed, the communication system architecture is outlined, and the unique approach is highlighted.

The system will be processing transaction volumes of 200,000 messages per 8-hour shift between 40 MAP nodes. An inverted interface structure allows the network to control the application modules and thus ensure efficient message transmission and database processing.

This paper was presented at AIM TECH '87, March 29 - April 1, 1987, Sheraton Tara Castle Hotel, Framingham MA, for the Association for Integrated Manufacturing Technology.

Introduction
MAP

MAP Application Layer

MAP Application Layer Commands

MAP Applications Layer Limitations

MAP Transaction Interface

MAP Transaction Interface Overview

MAP Transaction Interface Design Goals

MAP Transaction Interface Structure

MAP Transaction Interface Features

INTRODUCTION

This paper discusses the MAP Transaction Interface (MTI), an interface between a MAP (Manufacturing Automation Protocol) network and application programs for a factory control system. The factory control system was developed by itp boston, inc. of Cambridge, Massachusetts. The interface was developed by the author as part of that system.

The purpose of the factory control system is to operate a fully automated automobile parts factory. The factory is a large (80,000 square feet) facility with over forty work cells. When completed, the factory will be capable of running without human intervention. It is expected that at least one full shift a day will be run in this manner. The function of itp boston was to act as system development coordinator and to provide the overall control software for the factory.

After discussing MAP and the Application Layer facilities, the paper will discuss why an interface was needed, the general structure that was used, and some of the features it provides. The conclusion points out some of the specific successes, problems and future issues that have arisen from this project.

Since the MAP Transaction Interface (MTI) is a piece of computer software, some of the discussion will be moderately technical from a computer programming point of view. It is hoped that the non-programming reader can see some of the problems that were encountered by the programming staff and some approaches that work.

Several aspects of the environment were fixed before design work started. A Stratus XA600, a fault-tolerant computer, would be the computer for the factory control programs. Oracle, a relational database handler, would be used for all file accesses. For portability, all code would be written in C and run under Unix. MAP would be used for all communications, using an implementation of the Level 7 Application Layer developed by Stratus Computer according to Appendix 13 of the MAP 2.1 specification.

Transactions are assumed to contain MMFS (Manufacturing Message Format Standard) format data. See Appendix 6A of the MAP 2.1 specification for details of this manner of encoding data.

The "normal" scenario for communication between two processes, whether they are on the same computer or not, is fairly simple. Figure 1 gives a sample scenario.

Figure 1. Message Scenario

This sequence of events is relatively easy to implement if there is no particular urgency in communicating the message and there are no constraints on resources.

The communications system developed by itp boston for the factory control system had to handle heavy traffic, with severe requirements for reliable transmission. The equipment and system software imposed strong restrictions on the design.

The system to be developed would interconnect 40 to 50 manufacturing cells and 7 major factory control processes. Computers from at least four different vendors would be used. Transaction volume was expected to be 200,000 transactions per 8-hour shift, though transactions would be relatively small.
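As a rough sense of scale, the stated volume can be converted to an average rate. The sketch below assumes that traffic is spread evenly across the shift, which is an assumption made here for illustration only; real factory traffic is likely to arrive in bursts. Python is used for brevity, although the system itself was written in C.

```python
# Average rate implied by the stated volume, assuming (hypothetically)
# that transactions are spread evenly over the shift.
TRANSACTIONS_PER_SHIFT = 200_000
SHIFT_HOURS = 8


def average_rate_per_second(transactions: int, hours: float) -> float:
    """Return the mean number of transactions per second over the period."""
    return transactions / (hours * 3600)


rate = average_rate_per_second(TRANSACTIONS_PER_SHIFT, SHIFT_HOURS)
```

Under that assumption the average comes to roughly seven transactions per second, so the design pressure comes less from raw speed than from sustaining that rate reliably across many nodes.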

MAP

In general, communications systems fall into 2 categories — packet oriented and connection oriented.

Packet oriented systems are analogous to the mail system, where a third party handles data transmission between two communicating entities. There is no direct interaction between the two parties.

A connection oriented system is analogous to the telephone system. The two communicating parties participate directly. For example, consider what is done to call a person with a telephone. You must be part of the telephone system. You dial a special number specific to the desired party, wait for them to pick up the phone, and are able to talk only if they complete the connection by picking up their phone. Once the connection is made, communication is bi-directional (both parties can talk at the same time) and the communications facilities are reserved for that connection until one of the parties hangs up.

MAP is a connection oriented protocol and its functions are strongly similar to those of the telephone system.

The MAP Application Layer is the top layer of the 7 level ISO model for data communications (see Figure 2).

Figure 2. MAP Architecture - Communication Layers

The lower layers deal with the physical aspects of communications; the Application Layer provides tools to maintain communications from a logical, more application-oriented viewpoint.

The Application Layer uses 10 primitives for communicating:

Activate
Connect
Answer
Wait
Send
Receive
Disconnect
Abort
Check Event
Check I/O

Most of the commands are available in either synchronous or asynchronous modes.

Synchronous commands do not complete until the underlying function is complete. For example, when a synchronous receive returns, either a message has been read into the user’s receiving area or there was no message to receive.

An asynchronous command may not have completed its task when it returns. For example, an asynchronous receive will try to read a message, but if none is available, it will leave behind a "demon" to accept a message, while control is returned to the program which called it. The main program then can continue processing in the interim. The main program must periodically check MAP to see if any pending actions have completed.
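The difference between the two modes can be sketched with a toy in-memory channel. This is not the Stratus MAP interface; `ToyChannel`, `PendingReceive`, and every method name are hypothetical, chosen only to show a synchronous receive that returns a result at once and an asynchronous receive that leaves a pending handle behind.

```python
from collections import deque


class PendingReceive:
    """Handle for an asynchronous receive that may not have completed yet."""

    def __init__(self):
        self.done = False
        self.message = None


class ToyChannel:
    """Hypothetical in-memory stand-in for one end of a connection."""

    def __init__(self):
        self._inbox = deque()   # messages that arrived with nobody waiting
        self._pending = []      # asynchronous receives still outstanding

    def deliver(self, message):
        """Simulate the network delivering a message to this end."""
        if self._pending:
            handle = self._pending.pop(0)
            handle.message, handle.done = message, True
        else:
            self._inbox.append(message)

    def receive_sync(self):
        """Synchronous receive: a message, or None if there was none."""
        return self._inbox.popleft() if self._inbox else None

    def receive_async(self):
        """Asynchronous receive: always returns at once with a handle."""
        handle = PendingReceive()
        if self._inbox:
            handle.message, handle.done = self._inbox.popleft(), True
        else:
            self._pending.append(handle)  # the "demon" left behind
        return handle
```

The caller of `receive_async` is free to do other work and must later look at `handle.done`, which mirrors the periodic checking described above.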

MAP Application Layer

MAP Application Layer Commands

The following is a brief explanation of the MAP Application Layer commands and some of the problems that were encountered in their use.

Activate

Activate announces the user process to MAP. Referring to the telephone analogy this is like getting a phone number from the telephone company. To successfully activate, the activating process must be known to MAP; however, there is no security check beyond the MAP name used by the activating process. Even though the MAP network has a list of allowable MAP names available to the network director, a process is unknown to the network until it has been activated.

Connect

Connect initiates a two-way connection. This is like dialing the phone. A feature of the asynchronous version of this command is that it will return to the application process while the actual connection is pending if the called party does not answer. This greatly simplifies the timing of making a link - both the connecting and answering party can wait on the other. Be aware that this works only if the called party has activated itself onto the network. If the called process is not fully active, the MAP network director has no knowledge of its existence. Thus, the director will end the connect attempt, saying that no session connection is available, even though the connection attempt might be a legitimate one.

Answer

Answer completes the connection. This is similar to picking up the phone when it rings. Just as a person has no idea who is calling when the phone rings, answer is not specific for any one connection; the command will respond to any attempt to connect to it.

Wait

Wait suspends the process until a MAP event occurs. This is analogous to waiting for the phone to ring or for the other party to say something. It is used in conjunction with asynchronous MAP commands. Wait and Check I/O are the only ways to determine the outcome of an asynchronous command. Wait cannot also wait for operating system events; indeed, waiting on MAP events and waiting on operating system events are mutually exclusive.

Send

Send sends a message. This is like speaking over the phone. Be aware that completion of a send does not mean that the message has been received at the other end. It only means that MAP has successfully accepted the message and thinks it knows what to do with it. The message could still be lost.

Receive

Receive receives the message. The telephone system analogue is listening on the phone while someone speaks. When a receive command is issued, one of the parameters is the size of the receiving buffer. MAP has an excellent feature in that it will segment the incoming message so as not to overflow the buffer; however, it places the data into the buffer based only on a pointer to that buffer and the size passed to it. If the buffer is in fact smaller than that, MAP will overwrite adjacent memory.

Disconnect

Disconnect breaks a connection. The analogy is hanging up, though, unlike a telephone system, there may be outstanding messages in the MAP pipe. These messages are still available to the receiver. No more messages can be sent. It is the responsibility of each process to clear out any outstanding messages.

Abort

Abort also breaks a connection. The analogy is cutting the telephone wire. Any outstanding messages are lost.

Check Event

Check event is the companion, or other half, of the wait-for-event command. If more than one command is associated with the same event flag, check event is the only means of distinguishing between them. When a wait detects a MAP event (for example, the completion of a previously-issued connect that had not immediately completed), it tells the user that something has happened. Check event tells the user what has happened. If more than one event associated with the same event flag has completed between the time the wait woke up and the check event call was issued, there is no indication that there is more than one outstanding command to be accepted. It is the responsibility of the user to interrogate each possible command that could have set the triggered event flag.
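The obligation to interrogate every candidate command can be sketched as follows. The record type and function are hypothetical and do not correspond to any MAP call; the point is only that the caller walks every pending command tied to the triggered flag rather than stopping at the first completion it finds.

```python
class PendingCommand:
    """Hypothetical record of one asynchronous command awaiting completion."""

    def __init__(self, name, event_flag):
        self.name = name
        self.event_flag = event_flag
        self.completed = False


def collect_completed(pending, triggered_flag):
    """Interrogate every pending command tied to the triggered flag.

    Returns the commands that have completed and removes them from
    `pending`, so a second completion on the same flag is not overlooked.
    """
    finished = [c for c in pending
                if c.event_flag == triggered_flag and c.completed]
    for command in finished:
        pending.remove(command)
    return finished
```

Commands on other flags are left in the pending list untouched, even if they have completed, because their own flag has not yet been reported.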

Check I/O

Check I/O interrogates a pending (asynchronous) command to see if it has completed and, if so, allows its completion to take effect. For example, if an asynchronous receive has finally gotten a message, the message is not yet available to the user program. Issuing a Check I/O command causes the receive to place the received message into the input buffer. Anyone using this command must bear in mind that MAP blindly dumps the received data into that part of memory given to it back when the receive was issued; the Check I/O may be issued in a very different part of the program. The user must make sure that the receiving area that MAP had been given remains available until the receive completes.

MAP Applications Layer Limitations

These primitives certainly make a communications system easier to use, much more so than having to deal with physical attributes, such as an RS-232C link requires. While the systems programmer undoubtedly will appreciate this, the application programmer has even simpler needs. Simply put, the applications programmer just wants to send or receive to a named party.

Applications programmers have too many application details to worry about to have to concern themselves even with the reduced set of commands provided by the Applications Layer.

The problem is to provide to application programmers a view that is just as simple as that mentioned above, while also providing additional services.

For example, completion of a send by a sending process does not mean that the message has been received at the other end. It only means that MAP has successfully accepted the message and thinks it knows what to do with it. The message could still be lost. This is one reason a MAP interface was provided. An interface can verify the receipt of messages and re-send them as necessary.

Maintenance of connections is another major reason for having an interface to MAP. Something is needed to handle all the details of figuring out when a link has gone down, archiving messages until the link is back up, re-establishing the connection, etc.

MAP Transaction Interface

MAP Transaction Interface Overview

The purpose of the MAP Transaction Interface is to provide to application modules an easy-to-use window into MAP. The aim is to remove from the application module any concerns about MAP or about transmitting/receiving a transaction, other than the data in the individual transaction and the destination of that transaction.

To do this, several services are provided that are invisible to the application module:

Connecting to MAP
Connecting to MAP nodes
Reconnecting to failed nodes
Disconnecting from nodes and MAP
Backup for selected messages sent and received
Automatic restore of selected unprocessed transactions on warm or cold restart.
Transaction (not application) acknowledgements
Automatic process rollback on failure (via commits)
Services to enhance performance
- backup buffering
- message transaction buffering

MAP Transaction Interface Design Goals

Absolute recoverability
Minimal disk i/o
High throughput (but not necessarily fast response time)
CPU efficiency
Time-sequentiality of transactions from any one source
Automatic accommodation of node failure
Flexibility in handling performance tuning

Programs that had to wait on both MAP and system events (such as terminal handlers) were divided into two cooperating processes. One process is system-event driven, the other is MAP driven, and the two are connected by a Unix pipe.

The behavior of the connect function had a serious impact on the manner in which connections were re-established by the MTI. We had hoped that an asynchronous connect would wait for failed processes to be restarted, but MAP will wait only for activated processes. As a consequence, re-connections are handled by a polling scheme.
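A polling scheme of this kind can be sketched in a few lines. The function below is a hypothetical illustration, not the MTI's code: `try_connect` stands for whatever call attempts the connection, and the attempt limit and delay are invented parameters. The real interface presumably interleaves its polling with other work rather than blocking in a loop.

```python
import time


def poll_reconnect(try_connect, max_attempts, delay_seconds=0.0,
                   sleep=time.sleep):
    """Retry try_connect() until it succeeds or the attempts run out.

    Returns the number of attempts used on success, or None on failure.
    """
    for attempt in range(1, max_attempts + 1):
        if try_connect():
            return attempt
        if attempt < max_attempts:
            sleep(delay_seconds)  # wait before polling the node again
    return None
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real delays.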

MAP Transaction Interface Structure

The intuitive approach to communicating would have the application process running as the main process, calling server subroutines to handle data transmission and receipt. While simple in concept, this immediately presented several difficulties. Since communications events occur at logically different times in the cycle of an application, the application programs would have had to devote considerable attention to the details of communicating. Also, details of message backup, restoration, communications synchronization, and process restart, became very complex.

Most of these problems went away when the structure was inverted and a transaction-processing approach was used (see Figure 3). All the details of communicating can now be outside the application module. The only thing of which an application module is aware is that it has been invoked to handle a message which is presented to it as an argument. It does not need to know how the message got there. The very fact that the routine has been called implies that a message has arrived and is ready to be processed. Moreover, since the entire system is transaction driven, the application modules represent state transitions in a factory control state machine. This simplified the overall design problem.
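The inversion can be illustrated with a minimal sketch in which the shell owns all communication and simply calls an application module with the message as its argument. All names here (`TransactionShell`, `part_finished`, the transaction types) are hypothetical, and Python is used for brevity although the system itself was written in C.

```python
class TransactionShell:
    """Hypothetical shell that owns communication and calls the modules."""

    def __init__(self):
        self._modules = {}

    def register(self, transaction_type, module):
        """Associate an application module (a callable) with a type."""
        self._modules[transaction_type] = module

    def handle(self, transaction_type, message):
        """Invoke the module for this transaction.

        The module never touches the network; it returns the transactions
        it wants sent and leaves the sending to the shell.
        """
        module = self._modules[transaction_type]
        return module(message)


def part_finished(message):
    """Example application module: one state transition."""
    return [("ROUTE_PART", {"part": message["part"], "to": "inspection"})]
```

The application module contains no connection, send, or receive logic at all, which is the property the inverted structure is meant to provide.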

Figure 3. MAP Transaction Interface

Figure 4 shows the structure of the MAP Transaction Interface shell. The boxes marked AP (Application Process) are places where user code is invoked by the shell to process individual transactions.

Figure 4. MAP Transaction Interface Shell

The general structure of any process that uses MAP and the MAP Transaction Interface is that the MAP Transaction Interface wraps around the application routines and provides a shell of MAP services (see Figure 4). The shell processes MAP transactions and passes them to application modules.

MAP Transaction Interface Features

Message Priorities

Message priorities are not supported at this time, though the design of the Transaction Interface supports the concept of message priorities as a future enhancement.

A problem to be aware of is that prioritizing transactions can result in transactions being processed out of the order in which they were generated.

MAP Events

If there are no incoming messages to be processed, the MAP Transaction Interface will go into a wait state. It will stay asleep until MAP wakes it up to handle the completion of some previously pending event. Note that the MAP event flags used to awaken the MTI are not operating system event flags. It is hoped that eventually there will be a way to respond to system and MAP events through a single facility. This facility does not exist yet, so, at this time, application modules must not do operating system wait-for-event calls.

There is a facility to trigger application processing at specific times. As part of the implementation of the MTI, a special node was provided to handle "wake-up calls". An application process that wishes to be triggered at a certain time sends a transaction via MAP to that node. At the designated time, the special node will send a transaction back to the application module. Receipt of the message triggers application processing. It is the responsibility of each person implementing a MAP node with the MTI to make sure there is an appropriate module to handle each transaction.
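The wake-up node can be pictured as a small timer service. The sketch below is a hypothetical model, not the node as implemented: requests are kept in a heap ordered by due time, and `due` returns the requesters that should now be sent their wake-up transaction.

```python
import heapq


class WakeUpNode:
    """Hypothetical model of a node that returns transactions on schedule."""

    def __init__(self):
        self._requests = []  # heap of (due_time, sequence, requester)
        self._sequence = 0   # tie-breaker that preserves request order

    def request(self, requester, due_time):
        """Record that `requester` wants a wake-up at `due_time`."""
        heapq.heappush(self._requests, (due_time, self._sequence, requester))
        self._sequence += 1

    def due(self, now):
        """Return requesters whose time has arrived, earliest first."""
        ready = []
        while self._requests and self._requests[0][0] <= now:
            _, _, requester = heapq.heappop(self._requests)
            ready.append(requester)
        return ready
```

In the scheme described above, each name returned by `due` would be sent an ordinary transaction, and its arrival would trigger the application processing.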

Processing Control

The general statement of MAP node processing is that the shell reads as many incoming transactions as it can, stacking them up. Each message on the stack is then presented to the appropriate application module for processing. When all the incoming messages have been processed, outgoing messages accumulated during processing are sent. Then either the cycle is repeated or the MTI waits for something to happen.

An application module can interrupt this cycle if it has an urgent message to send. The cycle can also be interrupted by system efficiency requirements. For example, it might be desired to return to reading MAP at least every n seconds, regardless of how far along the current stack of transactions may be in processing.
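One pass of this cycle can be sketched as below. The function and its parameters are hypothetical; it assumes that stacked transactions are processed in arrival order, and it omits the interruptions for urgent messages and periodic returns to reading MAP.

```python
def run_cycle(read_incoming, modules, send):
    """One pass: read everything waiting, process it, then send the output.

    read_incoming() returns a (type, payload) pair, or None when nothing
    is waiting; modules maps a transaction type to a callable that
    returns a list of outgoing messages.
    """
    stack = []
    while True:
        transaction = read_incoming()
        if transaction is None:
            break
        stack.append(transaction)

    outgoing = []
    for transaction_type, payload in stack:
        outgoing.extend(modules[transaction_type](payload))

    # Outgoing messages are held until all processing is finished.
    for message in outgoing:
        send(message)
    return len(stack), len(outgoing)
```

Deferring all sends to the end of the pass is what gives the shell a natural place to put a synchronization point between processing and transmission.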

Commits

A commit is a synchronization event that provides a known, stable point to which a MAP node can return in the case of failure. The node can be sure that processing is correct (or at least manageable) up to that point, and recovery procedures can then complete restoring the node to the system.

A single commit applies to every update that was done since the prior commit, whether an update was done by the MTI in backing up transactions, or by an application module updating an application database. This means that if a rollback is done, not only are the transactions restored to the way they were prior to being processed, but also, every update that was done by any application modules is undone. This is the heart of the recovery process in the MTI. Moreover, the inverted structure of the MTI makes this almost as simple as a single statement.

Of course, to allow the integrity of the MTI commit, application modules must never do a commit on their own.

The interface shell provides two commits. The first is after new messages are read in. This ensures that messages are recoverable and provides an application rollback point if this node should fail while processing this batch of incoming transactions. The second is after application processing but before sending the accumulated transactions. This secures the completed application processing and makes sure that messages are recoverable if this node should fail while sending.
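The effect of the two commits can be sketched with Python's built-in `sqlite3` module standing in for Oracle. Everything here is an assumption made for illustration: the table names, the example module, and in particular the choice to delete a transaction's backup row as it is processed. The point is only that one rollback undoes both the shell's bookkeeping and the application's updates.

```python
import sqlite3


def open_store():
    """Create a throwaway database with a backup table and an application table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE backup (txn_no INTEGER PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE parts_made (cell TEXT)")
    db.commit()
    return db


def count_part(db, body):
    """Example application module: updates the database, never commits."""
    if body == "bad":
        raise ValueError("malformed transaction")
    db.execute("INSERT INTO parts_made VALUES (?)", (body,))


def process_batch(db, batch, module):
    # First commit: incoming messages are backed up and become recoverable.
    db.executemany("INSERT INTO backup VALUES (?, ?)", batch)
    db.commit()
    try:
        for txn_no, body in batch:
            module(db, body)
            db.execute("DELETE FROM backup WHERE txn_no = ?", (txn_no,))
        # Second commit: application work and bookkeeping succeed together.
        db.commit()
        return True
    except Exception:
        # Back to the first commit: application updates are undone and
        # the whole batch is still in the backup table.
        db.rollback()
        return False
```

A failure partway through a batch leaves the application table untouched and the batch intact for reprocessing, which is why application modules must not commit on their own.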

Acknowledgements

Acknowledgements are messages sent between two parties telling each other about the messages they have been sending to each other. The MTI uses acknowledgements but they are not for use by application modules. In a network, acknowledgements may be viewed as existing at three levels.

Level I is network acknowledgement. This is used by the transport mechanism (in this case, MAP) to tell itself whether a message has been delivered. It does not guarantee anything other than noting a certain movement of data from one place to another. MAP guarantees a "best effort" to deliver a message, which means that the message absolutely will be delivered as long as MAP, the MAP hardware, the sending and receiving nodes are all up for the duration of the transmittal; if any of these conditions is not true, the message possibly will be lost. While these conditions are true, MAP uses an acknowledgement mechanism to ensure delivery. These acknowledgements are never seen by either the interface or application modules.

Level II is the MTI acknowledgement. These are the acknowledgements used by the MAP Transaction Interface. They are used by the Interface to ensure transmission in those cases not covered by MAP. To the Interface, receipt of an acknowledgement indicates that responsibility for a message has been accepted by the receiving node. This means that the sender no longer needs to protect the message for resending, rollback, or recovery. This does not mean the receiver understands the received message or knows what to do with it. These acknowledgements are never seen by application modules and are treated by the transport mechanism as ordinary messages.

Level III is the application acknowledgement. This is sent from one application to another to say that a certain application event has occurred. It is used to synchronize factory processes. Such acknowledgements are treated by the interface as ordinary messages and are the responsibility of the application modules.

There are two categories of transaction in the itp system: those that require acknowledgements and those that do not. Transactions in the first group will be delivered by the Transaction Interface no matter what happens, even if this process dies, the receiving process dies, MAP dies, or the entire system dies. To ensure this level of robustness, a moderate amount of overhead is incurred, primarily through backing these messages up to disk.

Some transactions do not need this level of service, either because of their data content (for example, logging transactions) or volume. Such transactions can be designated to the interface and they will be neither backed up nor acknowledged. This saves an extra network message and possibly 4 database accesses. The cost is that delivery of these transactions is not guaranteed.
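The sender's side of this distinction can be sketched as an outbox that protects only the transactions needing acknowledgement. The class is hypothetical and simplified to a single destination; it keeps protected messages in memory where the real interface backs them up to disk, and it assumes that both categories draw numbers from the same counter.

```python
class Outbox:
    """Hypothetical sender-side record of messages awaiting acknowledgement."""

    def __init__(self, send):
        self._send = send
        self._unacked = {}  # txn_no -> message held until acknowledged
        self._next_no = 1

    def post(self, message, needs_ack=True):
        """Send a message, protecting it only if it needs acknowledgement."""
        txn_no = self._next_no
        self._next_no += 1
        if needs_ack:
            self._unacked[txn_no] = message  # stand-in for the disk backup
        self._send(txn_no, message)
        return txn_no

    def acknowledge(self, txn_no):
        """The receiver has accepted responsibility; stop protecting it."""
        self._unacked.pop(txn_no, None)

    def resend_unacknowledged(self):
        """Re-send everything still protected, in original order."""
        for txn_no, message in sorted(self._unacked.items()):
            self._send(txn_no, message)
```

A message posted with `needs_ack=False` is sent once and forgotten, which is the saving, and the risk, described above.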

Duplicate Messages

There is a separate transaction number counter kept for each node with which a given node can speak. This is incremented as each transaction is sent or received. It is used by the receiving process to detect duplicate messages. A duplicate message can be sent if the transport network is very slow and an acknowledgement is not received for a while. If a node receives a message with a transaction number less than or equal to the current high number associated with that message’s source, it is assumed to be a duplicate. An acknowledgement is sent, if necessary, and the message ignored. The acknowledgement is necessary to ensure that the sender is relieved of responsibility for the message.
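The duplicate test reduces to a comparison against the highest transaction number seen from each source. The class below is a hypothetical sketch of that rule; it assumes numbering starts above zero and leaves the sending of the acknowledgement to the caller.

```python
class DuplicateFilter:
    """Hypothetical per-source record of the highest number accepted."""

    def __init__(self):
        self._highest = {}  # source node -> highest transaction number

    def accept(self, source, txn_no):
        """Return True for a new transaction, False for a duplicate."""
        if txn_no <= self._highest.get(source, 0):
            return False  # already seen: acknowledge if needed, then ignore
        self._highest[source] = txn_no
        return True
```

Because the counters are kept per source, the same number arriving from two different nodes is not mistaken for a duplicate.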

Connections

Connection control is exclusively that of the Transaction Interface. Connections are made, maintained, and restored without any application knowledge.

A MAP node can be up, down, or pending. An up node is one for which the MAP connection has successfully been made and is available for use. A down node is one for which its link has failed in either a send or receive operation. It is available for only the connect/answer function. A pending node is one which has an outstanding connect or answer function.

An Oracle database is available to each node describing the other nodes in the system and the connect/answer protocol to be used. Each node shell consults this table to determine whether it answers or connects to another node.

A node may also be blocked. This is a node in which an application module has detected a malfunction. The node has been disconnected and is not available for any function, even connect/answer.
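The four node states and the operations each permits can be summarized in a small table. The sketch below is a hypothetical encoding of the description above; the operation names are invented, and the assumption that a pending node permits nothing until its connect or answer completes is ours, since the text does not say.

```python
from enum import Enum


class NodeState(Enum):
    UP = "up"
    DOWN = "down"
    PENDING = "pending"
    BLOCKED = "blocked"


# Operations permitted in each state (operation names are hypothetical).
ALLOWED = {
    NodeState.UP: {"send", "receive"},
    NodeState.DOWN: {"connect", "answer"},
    NodeState.PENDING: set(),   # assumed: wait for the connect/answer to finish
    NodeState.BLOCKED: set(),   # not available for any function
}


def may_perform(state, operation):
    """Return True if the operation is permitted for a node in this state."""
    return operation in ALLOWED[state]
```

Keeping the rules in one table makes the difference between a down node, which may still be reconnected, and a blocked node, which may not, explicit.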

CONCLUSIONS

The MAP Applications Layer works. It greatly simplifies the use of the MAP communications network.

One of the hopes for easing the job of interfacing with MAP was the promised extra reliability of a fault-tolerant computer. In fact, the hardware has proven extremely reliable.

While volume tests of the MAP implementation have not been done, services provided by the Stratus MAP implementation seem easily adequate for the volumes expected, with transmission times within a single processor of about 1 second on a lightly loaded machine.

In spite of all this, there still was need for a network interface. MAP still does not address the needs of application programmers. Reliability of transmission and network security are areas outside the scope of MAP and thus must be implemented by users.

Even with fault-tolerant hardware, the reality of non-fault-tolerant software necessitates almost as many precautions.

The itp MAP Transaction Interface has addressed and solved many of these issues. It is very easy to use, and imposes an architecture on the system that has been easy to work with. Messages do get received even when processes and hardware crash. Connections can be broken and re-established dynamically. Even with all the services of MAP and a special mainframe computer, all this took a great deal of work.

And even with the services of the MTI, there is still more to be done.

The area of restarting parts of a factory is one that needs more investigation. If a cell or factory control process drops out of the MAP for a short time for hardware reasons, this interface can easily handle the situation. If the dropout is for software reasons, resolving the software or data problem takes a fair but acceptable degree of work. If the dropout, for either reason, is for an extended period of time, the situation can be very complex, because it involves a changed state of the factory and how the revived process can be re-integrated into that new state.

In summary, the MAP Application Layer Interface is a useful and effective way for application programs to communicate with each other. The addition of fault-tolerant hardware has eliminated or greatly reduced the most obvious "sin" in networks that users see: system hardware failure. What remains for the systems designer is dealing with software failure, and this is now the most difficult problem in a networked system.

As a final note, the author specifically would like to acknowledge the contribution made in developing and implementing this interface by Douglas Richardson, Robert Hartman, and John Silletto, all of itp boston, inc.