CMS/TSO PipeLines is a product that runs on IBM mainframes. On a z/VM system, it runs under CMS; on a z/OS system, it runs under TSO. As the name suggests, data is passed through a series of built-in or user-defined stages. Along the way, the data can be modified or deleted from the pipe. You may be saying to yourself that there is nothing new in that. What is different is that PipeLines supports concurrent parallel pipes, and data can be routed into and out of a pipe.
While the PipeLib class library can be accessed from any .NET program, I think you will find that running it under PowerShell is more beneficial. I do all of my testing through PowerShell, and any pipelines I supply will most likely be PowerShell scripts. Therefore, as minimum system requirements, I recommend the following:
Data can be seen as flowing from one stage to another in sequence. This also applies when data is sent to a second pipe: it flows in sequence through the second pipe. But what happens when the second pipe routes data back to the first pipe? What will the sequence of the data be after the merge in the first pipe?
A Seemingly Contradictory Statement
The order in which the stages of a pipeline run is undefined. You may ask yourself: how is the concept of data flow compatible with the previous paragraph? The answer is that the I/O logic in the stages imposes an order on the data flow. A stage processes data using the following logic:
- While data is available, do the following:
  - Peek at the next available record in the data buffer
  - Process that record
  - Write the record
  - Wait for the record to be consumed by the reading stage
  - Read, consuming the record from the data buffer
  - Repeat from the first step
As long as this basic logic is followed by a stage, data will flow in sequence through the pipeline.
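PipeLib's internal synchronization is not shown in this article, but the write-then-wait handshake above can be sketched in plain Python (an illustration, not PipeLib code). A one-slot `queue.Queue` stands in for the stream: `put()` models the write, `join()` models waiting for the reader, and `task_done()` models consuming the record.

```python
import queue
import threading

def writer(out_q, records):
    """Upstream stage: write a record, then wait until it is consumed."""
    for rec in records:
        out_q.put(rec)   # write the record
        out_q.join()     # wait for the reader to consume it (task_done)
    out_q.put(None)      # end-of-file marker (an assumption of this sketch)
    out_q.join()

def reader(in_q, results):
    """Downstream stage: take the next record, process it, then consume it."""
    while True:
        rec = in_q.get()      # peek at / take the next available record
        if rec is None:
            in_q.task_done()  # consume the EOF marker, releasing the writer
            break
        results.append(rec.upper())  # process the record
        in_q.task_done()      # consume it, releasing the writer

pipe = queue.Queue(maxsize=1)  # at most one record in flight at a time
out = []
t1 = threading.Thread(target=writer, args=(pipe, ["a", "b", "c"]))
t2 = threading.Thread(target=reader, args=(pipe, out))
t1.start(); t2.start()
t1.join(); t2.join()
print(out)  # records arrive in input order: ['A', 'B', 'C']
```

Because each write blocks until the matching consume, only one record is ever in flight between the two stages, and the output order must match the input order.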
'(end ?) < file.txt'+
'|a: find abc'+
'|spec /-> / 1 1-* n'+
'|f: faninany'+
'|cons'+
'?a:'+
'|spec / / 1 1-* n'+
'|f:'
This pipe reads a file and looks for records that begin with the string "abc". When one is found, it is marked and written to the console. If the string is not found, the record text is shifted to the right and sent back to the first pipe to be written to the console. When the find stage writes a record, whether to its primary or secondary output, it waits for the record to be consumed before reading the next record. The spec stage gets the record, writes its output and then waits. The faninany stage gets and writes the record, then waits. The console stage gets the record, writes it to the terminal and, since it is the last stage and has nobody to write to, it consumes the record.
When the console stage consumes the record, that releases the faninany stage from its wait, and it consumes its record. That in turn releases the spec stage, and then the find stage. The find stage reads the next record and the process starts over.
If this read, write and wait structure were not imposed, the data written to the console would be out of sequence with the data file. The faninany stage reads data from any input that has data available. Thus, without the imposed structure, it could read several records from its primary stream before it reads from the secondary, altering the data sequence.
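The difference is easy to see outside of PipeLib. In this generic Python sketch (not PipeLib code), two free-running producers feed one faninany-like merge point with no write-and-wait handshake: every record arrives, but the interleaving depends entirely on the thread scheduler.

```python
import queue
import threading

merged = queue.Queue()  # faninany-like merge point: first come, first served

def produce(tag, count):
    # With no rendezvous, each producer pushes records as fast as it can.
    for i in range(count):
        merged.put(f"{tag}{i}")

t1 = threading.Thread(target=produce, args=("p", 3))  # "primary" stream
t2 = threading.Thread(target=produce, args=("s", 3))  # "secondary" stream
t1.start(); t2.start()
t1.join(); t2.join()

out = [merged.get() for _ in range(6)]
print(out)  # all six records arrive, but the interleaving is scheduler-dependent
```

Within each stream the order is preserved, but the merge of the two streams is nondeterministic; that is exactly what the rendezvous discipline prevents.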
Stalling a Pipe
A pipeline has stalled when data stops flowing through it. Consider the following:
'(end ?) ...'+
The fanout stage writes a record to its primary output. The fanin stage reads it, then issues a read for the next record on its primary input. The fanout stage then writes a record to its secondary output. This record ends up on the secondary input of the fanin stage. The pipeline then stalls. Why? The fanin stage is waiting for data on its primary input, and the fanout stage is waiting for the record on its secondary output to be consumed. You have a lockout, and the pipeline stalls. In general, all stalls are caused by a writing stage waiting on a reading stage which is, in turn, waiting on the writing stage.
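The lockout can be reproduced with the same toy rendezvous model used earlier (plain Python, not PipeLib code). The fanout-like thread writes and waits; the fanin-like thread reads only its primary input. The secondary write is never consumed, so the pipe stops making progress, which a `join` with a timeout detects.

```python
import queue
import threading

primary = queue.Queue(maxsize=1)
secondary = queue.Queue(maxsize=1)

def rendezvous_write(q, rec):
    q.put(rec)  # write the record...
    q.join()    # ...then wait until the reader consumes it

def fanout():
    # Alternate records between the primary and secondary outputs.
    for i in range(4):
        rendezvous_write(primary if i % 2 == 0 else secondary, i)

def fanin():
    # Read only the primary input; the secondary input is never drained.
    while True:
        primary.get()
        primary.task_done()

fo = threading.Thread(target=fanout, daemon=True)
threading.Thread(target=fanin, daemon=True).start()
fo.start()
fo.join(timeout=1.0)     # give the pipe a moment to make progress
stalled = fo.is_alive()  # fanout is stuck waiting on its secondary write
print("stalled:", stalled)
```

The first record flows normally; the second, written to the secondary output, is never consumed, and from then on the writer waits on the reader while the reader waits on the writer.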
Preventing A Stall
To prevent a stall, you need a stage that consumes its input, thus freeing the writing stage, before writing its output. Using the above example:
'(end ?) ...'+
The buffer stage reads and stores all input before it writes the contents of its internal buffer. Now when the fanout stage gets an EOF condition on its input, it forwards that condition to its primary and secondary outputs. When the fanin stage receives an EOF on its primary input, it begins to read from its secondary input. The buffer stage receives the EOF on its primary input and starts to write the contents of its buffer. The fanin stage, which is waiting for data, can now read the data from the buffer.
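Under the same toy rendezvous model (generic Python, not PipeLib code; `None` stands in for EOF), a buffer-like stage on the secondary leg resolves the stall: it consumes each input record immediately, freeing the writer, and writes its held records only after seeing end-of-file.

```python
import queue
import threading

def rendezvous_write(q, rec):
    q.put(rec)  # write the record...
    q.join()    # ...then wait until the reader consumes it

def consume(q):
    rec = q.get()
    q.task_done()  # consuming releases the writer
    return rec

primary = queue.Queue(maxsize=1)
secondary = queue.Queue(maxsize=1)
buffered = queue.Queue(maxsize=1)

def fanout(records):
    # Alternate records between primary and secondary, then EOF both.
    for i, rec in enumerate(records):
        rendezvous_write(primary if i % 2 == 0 else secondary, rec)
    rendezvous_write(primary, None)
    rendezvous_write(secondary, None)

def buffer_stage(in_q, out_q):
    # Consume every input record at once; write them only at EOF.
    held = []
    while (rec := consume(in_q)) is not None:
        held.append(rec)
    for rec in held:
        rendezvous_write(out_q, rec)
    rendezvous_write(out_q, None)

def fanin(first, second, results):
    # Drain the primary input, then the (buffered) secondary input.
    for q in (first, second):
        while (rec := consume(q)) is not None:
            results.append(rec)

out = []
threads = [
    threading.Thread(target=fanout, args=(["r1", "r2", "r3", "r4"],)),
    threading.Thread(target=buffer_stage, args=(secondary, buffered)),
    threading.Thread(target=fanin, args=(primary, buffered, out)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # ['r1', 'r3', 'r2', 'r4']: primary records, then buffered secondary
```

Because the buffer consumes immediately, fanout's secondary writes never block, and the pipeline runs to completion instead of stalling.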
The stall could also have been prevented by using a faninany stage instead of a fanin. That would probably change the order of the output records and you would have to add other stages to preserve the order, if that was a requirement.
When a pipeline stalls, you will see a message similar to this:
WARNING: PipeLine stalled: stages still running are:
P1S1-< State: WaitingWrite
P1S2-fanout State: WaitingWrite
P1S3-fanin State: WaitingRead
P1S4-cons State: WaitingRead
The pipeline that generated this message is:
'(end ?) < sss.ps1'+
'|f: fanout'+
'|i: fanin'+
'|cons'+
'?f:'+
'|i:'
If a pipeline stalls, inserting a buffer stage will in all likelihood prevent it. But, depending on the pipeline, a buffer stage may be overkill; it buffers all input and only writes it at EOF. Maybe a copy stage will suffice: it buffers one record before writing it. Or the elastic stage: it buffers just enough records to prevent a stall.
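In the same toy model (plain Python, not PipeLib code; `None` again stands in for EOF), a copy-like stage is just a one-record version of the buffer: it consumes each input record before writing it downstream, which frees the upstream writer one record earlier and is often enough to break a mutual write-wait/read-wait stall.

```python
import queue
import threading

def copy_stage(in_q, out_q):
    """One-record buffer: consume each input record before writing it on."""
    while True:
        rec = in_q.get()
        in_q.task_done()  # consume first: the upstream writer may proceed
        out_q.put(rec)    # then write the record downstream...
        out_q.join()      # ...and wait for it to be consumed
        if rec is None:   # EOF marker has been forwarded; stop
            break

a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
out = []

def source():
    for rec in ["x", "y", None]:
        a.put(rec)
        a.join()

def sink():
    while True:
        rec = b.get()
        b.task_done()
        if rec is None:
            break
        out.append(rec)

threads = [threading.Thread(target=t)
           for t in (source, lambda: copy_stage(a, b), sink)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out)  # ['x', 'y']
```

An elastic-like stage would generalize this with a growable internal list, accepting input whenever the writer is ready and writing whenever the reader is; the copy stage is the minimal one-record case.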
The purpose of this article was to explain how data flows through a pipeline, how stalls arise and how to prevent stalls.
Stay tuned for Part III: User Written Stages.