Data Polymorphism

sonofdelphi

Nov 22, 2005

CPOL

Why data polymorphism is sometimes needed...

Introduction

Languages that support polymorphism, such as C++, seem to require functions to model polymorphic requirements. This suffices in most cases, but there are cases where only the data is polymorphic: the data takes a different form in a different context. While data polymorphism can be modeled with templates (to be discussed in a later post), this article makes a case for the language to allow interfaces to have abstract data members, whose types are unknown in the base class and become meaningful only in the derived context.

Interfaces (and abstract base classes) are an expressive notation for mandating the implementation of required functions in derived classes. Why not use the same notation for data as well? In a nutshell, the following is possible in C++:

class Abstract
{
 virtual void pureVirtualFunction()=0; //Unknown behaviour, specifiable
 void concreteFunction() 
 {
  pureVirtualFunction(); // We can call the undefined function
 }
};

But the following is not:

class Abstract
{
 virtual pureVirtualData; //Unknown data: hypothetical syntax, not allowed in C++

 void concreteFunction()
 {  
   std::cout << pureVirtualData.Count; // We know how to use the data though
 }
};
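Standard C++ has no syntax like the one above. A common approximation is to make the data reachable only through a pure virtual accessor, so that the base class can use the data while each derived class decides where it lives. The sketch below assumes the "data" is a std::vector<int> whose size plays the role of Count; `data()` and `Concrete` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Abstract
{
public:
    virtual ~Abstract() {}

    // Stand-in for "virtual pureVirtualData": derived classes must supply it.
    virtual const std::vector<int>& data() const = 0;

    // The base class knows how to use the data without owning it.
    std::size_t concreteFunction() const { return data().size(); }
};

// Hypothetical derived class that owns the actual storage.
class Concrete : public Abstract
{
public:
    Concrete() : items_{1, 2, 3} {}
    const std::vector<int>& data() const override { return items_; }

private:
    std::vector<int> items_;
};
```

Note that the type of the data (here std::vector<int>) still has to be named in the base class, which is exactly the restriction argued against here.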

What does this imply? Could there be cases where such a feature would help?

Some months back, I encountered a requirement that I thought really called for more direct language support for data polymorphism. In this article I'll explain the problem in a generalized way, leaving out some domain-specific details. Please do let me know if more details are needed to understand the problem and why the limitations mentioned are significant.

The Requirement

An algorithm needs to be applied to the transmissions of an existing client-server system to improve its performance. Broadly, the algorithm transparently intercepts the packets, applies some transforms based on each packet, and relays them.

To implement the algorithm, components must be deployed at both the server and the client. The algorithm depends on the functionality provided by a packet and on a factor that is determined by whether the component is at the server or at the client.

Design with Behavioural Polymorphism

Packet is a concrete class which provides a buffer-storage area and some domain-specific functionality.

class Packet
{
public:
 std::vector<Byte> Buffer; //Buffer-storage area (Byte: an unsigned char type)

 virtual ~Packet() {}
 virtual double toDouble();
 virtual int toInteger();
 
 void applyXXXToPacket();
 ...//Other domain-specific functions
};

The algorithm uses Packet functions to transform the packets with a factor that is based on Packet::toDouble() and Packet::toInteger(). Both conversions must be done slightly differently from the implementations in the Packet class, and also differently on the client and server sides.

We therefore have the inheritance:
ServerPacket : public Packet, which reimplements toInteger() and toDouble() for the server
ClientPacket : public Packet, which reimplements toInteger() and toDouble() for the client
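The actual conversion rules are domain-specific and are not given, so the arithmetic below (sums and scalings of the buffer's bytes) is purely hypothetical, as is the helper `byteSum()`. The sketch only illustrates the shape of the hierarchy: each side overrides both conversions, and the override is selected at run time through a Packet reference.

```cpp
#include <cassert>
#include <vector>

typedef unsigned char Byte;

class Packet
{
public:
    explicit Packet(const std::vector<Byte>& bytes) : Buffer(bytes) {}
    virtual ~Packet() {}

    // Default conversions (hypothetical rules).
    virtual int toInteger() const { return byteSum(); }
    virtual double toDouble() const { return byteSum() / 2.0; }

protected:
    // Hypothetical shared helper: the sum of all bytes in the buffer.
    int byteSum() const
    {
        int sum = 0;
        for (Byte b : Buffer) sum += b;
        return sum;
    }

    std::vector<Byte> Buffer;
};

class ServerPacket : public Packet
{
public:
    using Packet::Packet; // reuse Packet's constructor
    int toInteger() const override { return byteSum() * 2; }
    double toDouble() const override { return byteSum() / 4.0; }
};

class ClientPacket : public Packet
{
public:
    using Packet::Packet;
    int toInteger() const override { return byteSum() + 1; }
    double toDouble() const override { return byteSum() * 1.5; }
};
```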

Algorithm components only need to apply some transforms to Packets, so we generalize them into the abstraction:

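class AlgorithmComponent
{
protected:
 Packet& p; //Reference type required to achieve behavioural polymorphism

public:
 explicit AlgorithmComponent(Packet& packet) : p(packet) {} //A reference member must be bound at construction

 void ProcessPacketXXX();
 ... //Other functions which call Packet's member functions
};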

The component implementing the algorithm differs only slightly between the server and client sides. But it does differ, so we have the specializations:
ServerComponent : public AlgorithmComponent
ClientComponent : public AlgorithmComponent
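A minimal sketch of this behavioural-polymorphism design follows. The Packet classes return fixed, hypothetical values, and the body of `ProcessPacketXXX()` (here it returns a product instead of the original's void) is invented for illustration. The point is that ProcessPacketXXX() is written once in AlgorithmComponent, and every call through `p` dispatches to the server or client override.

```cpp
#include <cassert>

// Minimal stand-ins for the packet hierarchy (hypothetical values).
class Packet
{
public:
    virtual ~Packet() {}
    virtual double toDouble() const { return 1.0; }
    virtual int toInteger() const { return 1; }
};

class ServerPacket : public Packet
{
public:
    double toDouble() const override { return 2.0; }
    int toInteger() const override { return 20; }
};

class ClientPacket : public Packet
{
public:
    double toDouble() const override { return 3.0; }
    int toInteger() const override { return 30; }
};

class AlgorithmComponent
{
public:
    explicit AlgorithmComponent(Packet& packet) : p(packet) {}
    virtual ~AlgorithmComponent() {}

    // Written once in the base; each call dispatches to the derived Packet.
    double ProcessPacketXXX() const { return p.toDouble() * p.toInteger(); }

protected:
    Packet& p; // reference required for behavioural polymorphism
};

// Side-specific components (their small differences are omitted here).
class ServerComponent : public AlgorithmComponent
{
public:
    explicit ServerComponent(ServerPacket& packet) : AlgorithmComponent(packet) {}
};

class ClientComponent : public AlgorithmComponent
{
public:
    explicit ClientComponent(ClientPacket& packet) : AlgorithmComponent(packet) {}
};
```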

It worked! But...

Performance Issues

It turned out that both of the required conversion functions were very expensive, leading to erratic performance. All that was really required was a one-time computation of each conversion, which could be done in the constructors of the specialized Packets. Saving the results in member variables would solve the performance issues, and ClientComponent and ServerComponent could then use the saved values directly. But the generalized AlgorithmComponent cannot access a data member through a Packet reference and get the derived class's value: unlike calls to virtual functions, data member accesses are not dispatched at run time.
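The sketch below shows the caching idea under the assumption that a virtual getter is acceptable; `expensiveServerConversion()`, `conversionCount`, `cachedDouble()` and `useManyTimes()` are hypothetical names, and the value 42.0 is arbitrary. The conversion runs once, in the derived constructor (a virtual call made from Packet's own constructor would not reach the ServerPacket override). Even so, base-class code can reach the saved data member only through a virtual function, which is the limitation described above.

```cpp
#include <cassert>

// Counts how often the (hypothetically) expensive conversion runs.
static int conversionCount = 0;

static double expensiveServerConversion()
{
    ++conversionCount; // stands in for the real, costly work
    return 42.0;
}

class Packet
{
public:
    virtual ~Packet() {}
    // Data cannot be virtual, so the cached value is exposed via a getter.
    virtual double cachedDouble() const = 0;
};

class ServerPacket : public Packet
{
public:
    // Computed once, in the derived constructor.
    ServerPacket() : cached_(expensiveServerConversion()) {}
    double cachedDouble() const override { return cached_; }

private:
    const double cached_;
};

// Stand-in for AlgorithmComponent code that reads the value repeatedly.
double useManyTimes(const Packet& p, int times)
{
    double total = 0.0;
    for (int i = 0; i < times; ++i) total += p.cachedDouble();
    return total;
}
```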

Consider the code snippet below:

#include <iostream>
using std::cout;

class Base{ 
public:
 int i;
 Base(){i=2;} //In Base, i is 2
};

class Derived:  public Base
{
public:
 int i; //Hides Base::i; it does not override it
 Derived(){i=5;} //In Derived, i is 5
};

int main()
{
 Derived d;
 Base& b=d;
 cout << b.i ; //Prints 2, not 5. No polymorphism for data access. :-(
 return 0;
}
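Both `i` members exist in every Derived object: Derived::i hides Base::i rather than overriding it, and which one is read depends on the static type of the expression. The small check below confirms this by returning the values instead of printing them; the three helper functions are hypothetical names introduced for the illustration.

```cpp
#include <cassert>

struct Base    { int i = 2; };
struct Derived : Base { int i = 5; }; // hides Base::i; both members exist

// Member lookup for data uses the static type of the expression.
int readThroughBase(const Base& b) { return b.i; }
int readThroughDerived(const Derived& d) { return d.i; }
int readHiddenMember(const Derived& d) { return d.Base::i; }
```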

This means that the abstraction of AlgorithmComponent would collapse, breaking the class hierarchy.

And a Trivial(?) Violation of a Design Constraint

An AlgorithmComponent necessarily has to be either server-side or client-side. AlgorithmComponent itself is just a design abstraction that accurately summarizes how the algorithm operates. It should not be instantiable; it is an abstract base class by nature.

Enforcing this particular design constraint is possible if we make the functions that depend on Packet pure virtual. Technically, for AlgorithmComponent to be abstract, only one of the ProcessPacketXXX() functions needs to be pure virtual. But that would not accurately model the other functions' dependency on Packet.

Yet the functions that depend on Packet can be completely specified, and coding them once in the base class itself would avoid code duplication. AlgorithmComponent is abstract because of its dependency on Packet. So Packet is really what should be pure virtual.

What we should have had was an AlgorithmComponent with completely specified functions but with an unspecified Packet that had to be compulsorily supplied by the derived class.
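In standard C++, the nearest approximation of such an "unspecified Packet" is probably a pure virtual accessor that every derived component must implement. The sketch below assumes that approach; `packet()`, `ProcessPacketYYY()` and the arithmetic inside both processing functions are hypothetical. All processing functions are fully specified in the base class, the accessor is the only pure virtual member, and the static_assert checks at compile time that AlgorithmComponent cannot be instantiated.

```cpp
#include <cassert>
#include <type_traits>

class Packet
{
public:
    virtual ~Packet() {}
    virtual int toInteger() const { return 1; }
};

class ServerPacket : public Packet
{
public:
    int toInteger() const override { return 7; }
};

class AlgorithmComponent
{
public:
    virtual ~AlgorithmComponent() {}

    // Fully specified once in the base class: no duplication in subclasses.
    int ProcessPacketXXX() const { return packet().toInteger() + 1; }
    int ProcessPacketYYY() const { return packet().toInteger() * 2; }

protected:
    // The only pure virtual member: the Packet must come from a subclass.
    virtual const Packet& packet() const = 0;
};

static_assert(std::is_abstract<AlgorithmComponent>::value,
              "AlgorithmComponent must not be instantiable");

class ServerComponent : public AlgorithmComponent
{
protected:
    const Packet& packet() const override { return serverPacket_; }

private:
    ServerPacket serverPacket_;
};
```

The Packet is still reached through a function call, and its static type must still be Packet, so this does not give truly polymorphic data.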

Conclusion

What if, in the discussed example, ServerComponent depended on a double/class Foo and ClientComponent depended on an int/class Bar? Shouldn't abstract base classes be allowed to have undefined data members? Or equivalently, why not allow data members in interfaces?

Workarounds do exist, for example:

with templates (the discussion stopped just short of it, don't you think?), though that is a compile-time solution. This will be discussed in a later article.
with composition instead of inheritance (which is how we finally chose to implement it), though it lacks the simplicity and elegance of the design explained above, and brings code duplication and what not.
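For comparison, here is one possible shape of the template workaround. It is only a sketch and not the design promised for the later article; the `factor` members, their values and the formula are hypothetical. Because the packet type is a template parameter, base code can read the data member directly, and the server and client packets may even hold different types (double versus int), as the Conclusion asks.

```cpp
#include <cassert>

// Hypothetical packets whose precomputed factor has different types.
struct ServerPacket { double factor = 0.5; };
struct ClientPacket { int factor = 3; };

// The packet type is chosen at compile time, so its data member can be
// used directly, with no virtual call.
template <typename PacketT>
class AlgorithmComponent
{
public:
    explicit AlgorithmComponent(const PacketT& packet) : p(packet) {}

    // Written once; works for any PacketT with a numeric 'factor' member.
    double ProcessPacketXXX(double value) const { return value * p.factor; }

protected:
    PacketT p;
};

typedef AlgorithmComponent<ServerPacket> ServerComponent;
typedef AlgorithmComponent<ClientPacket> ClientComponent;
```

The trade-off is that ServerComponent and ClientComponent are now unrelated types with no common base class, so they cannot be handled through one AlgorithmComponent reference at run time.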

Would it not be simpler and cleaner to express such a design if the language just allowed the data also to be truly polymorphic?

- Thomas Jay Cubb