Instance Model Programming

Abstract

Here the term "instance" refers primarily to instance data. Instance data is persistent data allocated to enable code reentrancy. It is commonly found in shared libraries, as well as in instances of a class in object oriented programming (OOP). The programming model presented here is similar to OOP, but with a very different emphasis. The OOP model along with its terminology is derived from SIMULA, a simulation programming language. Most programming is not simulation, so the fit is somewhat awkward. The encapsulation and information hiding provided by SIMULA can be used in general programming, without the need to impose the entire simulation model.

Background

Computer programming is fundamentally procedural. At its core, it is detailed instructions to a computer on how to accomplish a task. Historically, assembly language was the first abstraction over the basic binary instructions that a computer recognizes. Next came still higher order languages like FORTRAN and COBOL. Around the same time, structured programming was introduced as a general way to organize code into relatively autonomous units.

Along the way, other languages were developed to address specialized areas of programming. Functional programming was introduced to aid the special needs of artificial intelligence development. (Some see functional programming as a vain attempt to turn programming into mathematics.) Other so called fourth generation languages provided even higher levels of abstraction, but did not find usage much out of their limited domains. Object oriented programming was a new paradigm that gained wide acceptance at all levels of software development. It had its roots in the simulation language, SIMULA.

SIMULA addressed the specialized needs of simulation programming. Briefly, simulation usually involves the creation, operation, and destruction of large numbers of data structures that mimic objects in the real world for the purpose of statistical analysis to model, analyze, and predict the behavior of a system. Real world objects are classified and described by their properties. So an aggregation of properties and behaviors becomes a class. Where classes may have similarities, inheritance is used to generalize the model and simplify its implementation.

Object oriented programming

OOP is a higher order model that attempts by structure to impose a less error prone method on the basic procedure of software development. Literally by structure, since the fundamental unit is a class which is just an enhanced C struct in most common implementations. Definitions of OOP vary, but here a simple static model of OOP is used. So no dynamic dispatch or other advanced features. Basically just information hiding and encapsulation, aggregation, limited inheritance, and of course instance data.

Classes are structures that encapsulate all of both the public and private data and methods that are used by the class. Everything in the class is visible and included, at least its declaration is visible and often even some or all of the implementation is included in the struct itself. This tends to result in a large, ungainly structure. The private data declarations are included even though they cannot be used. Private method declarations are also included presumably to simplify the implementation of the class by a compiler.

Problems with OOP

The main problem is the kitchen sink approach that has already been alluded to above. The Parnas(3) information hiding principle is violated by the inclusion of directly accessible public data in the class structure. The inclusion of declarations of private data and private methods just simply violates common sense. So in OOP, huge amounts of unnecessary information obscure and complicate the structure declaration. This is unfortunately necessary for a simple compiler to be able to implement transparent inheritance.

So it would seem that the most important and overly used feature of OOP is also its main weakness. As the Gang of Four(1) has pointed out, inheritance as practiced in its usual white box fashion is a problem because it tends to break encapsulation. It can be argued that this fault overshadows the code reuse benefit provided by inheritance. That is to say, inheritance as it is commonly implemented and used, not the idea itself. Inheritance will be shown to be easily implementable in a black box fashion that does not break encapsulation.

Another problem is the OOP overemphasis on data. Except in a database, the data in most programs is a minor player. The design and implementation of algorithms constitutes the vast bulk and difficulty in developing software. Data is needed and used, but it generally is not the major component of a program. Yet in the OOP paradigm methods only exist to serve the data. This is just a perspective that has been promoted in an attempt to legitimize a somewhat flawed and artificial model of programming. The emphasis on objects and how they interact with each other often results in code that tends to be convoluted by its effort to fit a procedural algorithm into a data centric model.

A concrete example of the convolution charge is in order. Traditionally in parsing the algorithm is implemented as a parsing routine that processes tokens one at a time. A simpler scanner routine is used by the parser to assemble tokens from the input text that is being parsed. The parser calls the scanner to supply tokens as needed by the algorithm. In the early rush to OOP, some programmers dutifully tried to fit parsing to the OOP model by having the "data parse itself." So the data supplies itself to the scanner which then invokes the parser a separate time as each token is recognized. Of course this will work, but at the expense of having the parser maintain state between calls. So an efficient algorithm is turned on its head solely because the data must be in charge of the process.

A different model

Structures, or records as they are sometimes called, provide most of the means to implement the basic useful features of OOP. They were designed specifically to implement aggregation of disparate data types. A structure that includes an instance of a different structure can be thought of as inheriting the properties of that second structure. If the primary structure provides similarly named methods to wrap around the secondary structure methods, then explicit manual inheritance is achieved. Wrapped methods are exposed just as public methods are in OOP. Methods not explicitly wrapped can still be used by the primary structure, and be thought of as being private, though not exactly in the same way as the term private is used in OOP.

Inheritance for the purpose of code reuse is a valuable tool. If the inherited code is then modified and reexported with its new functionality, even more value is obtained. To continue in the parsing vein, if a parser requires more than one level of lookahead, it is useful to wrap a peeker around a scanner. The peeker can buffer input tokens at different lookahead levels and supply them to the parser as needed. The scanner can remain unchanged and the peeker has no need to have knowledge of the internal workings of the scanner, only its public interface. This means that an existing scanner or scanner generator like LEX can be used without modification or even the need for it to be written in an OOP language. The key is that the interface must be black box.

Constructors and destructors are also easily implemented manually in most programming languages. The main difference from OOP is that they must be called explicitly instead of having the compiler call them at declaration time and at the end of the scope of the structure. A very minor difference indeed for the constructor since a "new" declaration looks quite like a function call. Remembering to call the destructor is an inconvenience and a possible source of error. It is here that the reader is reminded that most code is not simulation, so not that much instance destruction is expected to occur in many programming environments. In any case, we are talking about an avoidable programmer error, not a flaw in the model.

The last and most important feature to implement is the persistent instance data. Again, trivially easy to do in most languages. An extra anonymous pointer as the first argument to a function has long been used as a way to provide instance data. In OOP, the same pointer is provided implicitly and can be referred to as "this" or similar. Actually, in OOP the implicit pointer refers to the structure itself, not just the data part of the structure. The suggestion here is that all of the data should be kept completely hidden. Access is provided solely through routines in the structure that contains only a single anonymous data pointer referred to as the "id." The id points to a data structure that is declared and defined in the implementation code. Thus all of the data is truly private and the implementation of the black box is complete.

An implementation in C

In C the fundamental unit of the instance model is a header file along with one or more implementation C code files(2). The instance structure is a typedef as in figure 1. Note that the structure that would be a class in OOP consists of only a single data pointer along with the public function pointers. The defines are syntactic sugar making it unnecessary to specify the name of the structure twice. A side benefit of this is that it returns primary importance to the function name. Of course, the defines and their use is optional in the model.


typedef struct _peeker
{
    void     *id;

    short   (*peek) ( void *id, int level );
    short   (*get)  ( void *id );      
    int     (*get_line_number) ( void *id );
    void    (*release) ( void *id );
    
} peeker_t;

#define peek(self,a)            self.peek(self.id, a)
#define get(self)               self.get(self.id)
#define get_line_number(self)   self.get_line_number(self.id)
#define release(self)           self.release(self.id)

PUBLIC
peeker_t
MakePeeker ( symbols_t  symbols ); 

Figure 1. Instance header file for a peeker


The implementation file in C is shown in figure 2. Note that everything in this file is private except for the constructor. In C, file level privacy is provided by the "static" keyword. The structs are private by virtue of not being included in any header file.


#include "scanner.h"
#include "peeker.h"

typedef struct _peek {
    symbol_t       *buffer [ 128 ],
                  **current;
    int             count;
} peek_t;

typedef struct _data {
    symbols_t  sym;         // symbol table instance
    scanner_t  scan;        // inherited instance of a scanner
    peek_t     peek;        // peek buffer
} data_t;

// ----------------------------------------------------------------------- 
// get - scan next token
// ----------------------------------------------------------------------- 
                           
PRIVATE
short 
(get) ( void *id )
{                          
    data_t *d = (data_t *) id;
    ushort  token;

    if ( d->peek.count ) {                          // previous peek token
        token = (*d->peek.current)->token;
        set_current ( d->sym, *d->peek.current );
        if ( --d->peek.count == 0 ) {               // peek buffer empty
            d->peek.current = d->peek.buffer;
        } else { 
            ++d->peek.current;
        }
    } else {
        token = get (d->scan);                      // get a new token
    }

    return  token;
}

// ----------------------------------------------------------------------- 
// peek - scan next token without consuming it
// ----------------------------------------------------------------------- 

PRIVATE
short 
(peek) ( void *id,
         int   level )
{
    data_t   *d = (data_t *) id;
    short     token;
    symbol_t *save_symbol;

    if ( level > d->peek.count ) {                      // need to scan 
        save_symbol = get_current (d->sym);
        token = get (d->scan);
        d->peek.current[d->peek.count++] = get_current (d->sym);
        set_current ( d->sym, save_symbol );
    }
    token = d->peek.current[level-1]->token;

    return  token;
}

// -----------------------------------------------------------------------
// get_line_number - only a wrapper for the scanner_t function
// -----------------------------------------------------------------------

PRIVATE
int     
(get_line_number) ( void *id )
{
    data_t  *d = (data_t *) id;

    return  get_line_number ( d->scan );
}

// -----------------------------------------------------------------------
// release - free the allocated resources
// -----------------------------------------------------------------------

PRIVATE
void
(release) ( void  *id )
{
    data_t  *d = (data_t *) id;

    release ( d->scan );
    free ( id );
}

// -----------------------------------------------------------------------
// MakePeeker - inherits scanner_t
// ----------------------------------------------------------------------- 

PUBLIC
peeker_t
MakePeeker ( symbols_t  symbols )                // symbol table instance
{
    data_t   data;
    peeker_t peeker;

    data.scan = MakeScanner ( symbols );         // inherited instance
    data.sym = symbols;
    data.peek.current = data.peek.buffer;
    data.peek.count = 0;

    peeker.id = malloc ( sizeof ( data_t ) );
    if ( peeker.id ) {
        *(data_t *) peeker.id = data;
    } else {
        fprintf ( stderr, "MakePeeker: out of memory\n" );
    }
    peeker.get_line_number = get_line_number;
    peeker.get = get;
    peeker.peek = peek;
    peeker.release = release;

    return  peeker;
}
Figure 2. Instance implementation C code file for a peeker

The MakePeeker C function is the instance constructor for the peeker. It inherits a scanner instance by using MakeScanner to construct one. A symbol table manager instance is passed to the peeker as an argument by value and then saved in the peeker structure. Once constructed, the peeker is also returned by value. Since nothing in the instance structure is modifiable, and since the data is defined by reference within the structure, passing the structure by value and passing it by reference are semantically equivalent. By value is preferred here because it allows the use of dot notation which is more esthetically pleasing than the C pointer arrows. However, nothing prevents the use of by reference for those who may prefer it.

Note that the unfortunate need to use pointers is limited to the case where the implementation data is being used within the implementation file. The code presented here uses unusually short identifiers to try to make the pointers less obtrusive. So the anonymous data pointer is called "id", and the typed references to it are named simply "d" which is as compact as possible an abbreviation for "data." Similarly, the instances contained within the peeker have abbreviated names. They are basically just tags that contribute little to the understanding of the code, so barely enough to identify them is sufficient. The use of an explicit pointer does however provide a very clear notion of what is happening in the code. When functionality is provided under the covers, it can sometimes be confusing.

A more traditional model

The instance model as presented so far provides complete information hiding. If this requirement is relaxed slightly, a simpler and more familiar model is possible. This is accomplished by allowing the instance structure to contain the public data. Private data and methods are still hidden in the implementation file. Macros can be provided to access the public data in a function style rather than directly, providing efficiency along with a sort of information hiding. In newer implementations of C, inline functions can also be used for data access. Figures 3 and 4 provide C code to illustrate this version of the instance model.


#include "scanner.h"

typedef struct _peek {
    symbol_t   *buffer [ 128 ],
              **current;
    int         count;
} peek_t;

#define SELF    struct _peeker

typedef struct _peeker
{
    symbols_t *sym;
    scanner_t  scan;            // inherited instance
    peek_t     pk;

    short   (*peek) ( SELF *d, int level );
    short   (*get)  ( SELF *d );
    int     (*get_line_number) ( SELF *d );

} peeker_t;

#define peek(self,a)            (*self).peek(self, a)
#define get(self)               (*self).get(self)
#define get_line_number(self)   (*self).get_line_number(self)

PUBLIC
void
InitializePeeker ( peeker_t  *peeker,
                   symbols_t *symbols ); 
Figure 3. Simpler Instance header

The header file in figure 3 illustrates some of the advantages and disadvantages of the simpler version of the instance model. The view of the public instance data in this simple case is not much of a problem, except for the need to include the peek_t structure. This data structure is part of the low level implementation details that should ideally be hidden from casual view. A considerable advantage that is shown here is that the release function is not needed at all for an instance that does not require user level cleanup. The standard C scope rules handle destruction of the instance automatically.

The implementation file in figure 4 is quite similar to the one in figure 2. The primary structural difference is that the data declarations have been moved to the header file. The constructor is also functionally different. It is descriptively named InitializePeeker rather than MakePeeker because it only initializes an existing instance that has already been created by a standard C declaration in the calling code. The data area already exists in that structure, so no need to allocate it here.


#include "peeker.h"

// ----------------------------------------------------------------------- 
// get - scan next token
// ----------------------------------------------------------------------- 
                           
INSTANCE_METHOD
short 
(get) ( peeker_t *d )
{                          
    short   token;

    if ( d->pk.count ) {                          // previous peek_token
        token = (*d->pk.current)->token;
        set_current ( d->sym, *d->pk.current );
        if ( --d->pk.count == 0 ) {                 // peek buffer empty
            d->pk.current = d->pk.buffer;
        } else { 
            ++d->pk.current;
        }
    } else {
        token = get ( &d->scan );
    }

    return  token;
}

// ----------------------------------------------------------------------- 
// peek - scan next token without consuming it
// ----------------------------------------------------------------------- 

INSTANCE_METHOD
short 
(peek) ( peeker_t *d,
         int       level )
{
    short     token;
    symbol_t *save_symbol;

    if ( level > d->pk.count ) {
        save_symbol = get_current(d->sym);
        token = get ( &d->scan );
        d->pk.current[d->pk.count++] = get_current(d->sym);
        set_current ( d->sym, save_symbol );
    }
    token = d->pk.current[level-1]->token;

    return  token;
}

// -----------------------------------------------------------------------
// get_line_number - return line_number from scanner_t
// -----------------------------------------------------------------------

INSTANCE_METHOD
int     
(get_line_number) ( peeker_t *d )
{
    return  d->scan.line_number;
}

// ----------------------------------------------------------------------- 
// InitializePeeker - inherits scanner_t
// ----------------------------------------------------------------------- 

PUBLIC
void
InitializePeeker ( peeker_t  *peeker,
                   symbols_t *symbols ) 
{
    InitializeScanner ( &peeker->scan, symbols );
    peeker->sym = symbols;
    peeker->pk.current = peeker->pk.buffer;
    peeker->pk.count = 0;

    peeker->get_line_number = get_line_number;
    peeker->get = get;
    peeker->peek = peek;
}
Figure 4. Simpler Instance implementation

Critique

Several of the conveniences provided by OOP compilers are lost in the instance model. A major one is the implicit call to inherited methods without the need to explicitly include them in the class. The loss of this convenience to the programmer is compensated for by the full description of the interface in the structure itself. There is no need to refer to the inherited class to find out what methods are available. The black box implementation shields the programmer from all implementation details. If a method "get" is provided in the structure, only its description is needed, not its source or even any knowledge of whether it is inherited or directly implemented. This provides even more flexibility since an inherited structure can be transparently replaced with a similar one that provides the "get" method without the need to change anything in the code declaration that uses the inherited method.

Conclusion

A very simple OO programming model was presented. The value of the model is that it provides a method to implement a black box version of OOP in a non-OOP language. The model is similar enough to existing OOP languages that it should be easy for most programmers to pick up. Features of the model include

  • implemented in native C code
  • no special syntax
  • no additional compiler or preprocessor
  • no additional runtime
  • The code samples presented were extracted from a full C language parser. The complete source code for that parser is located at http://slkpg.byethost7.com.

    Dedication

    The Instance Model is dedicated to the memory of Dennis M. Richie.


    (1) Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides (1994). Design Patterns: Elements of Reusable Object-Oriented Software Addison-Wesley, Boston, Massachusetts.

    (2) Kernighan, B.W. and D.M. Ritchie (1988). The C Programming Language, 2nd ed. Prentice-Hall, Englewood Cliffs, NJ.

    (3) Parnas, David (1972). "On the Criteria to Be Used in Decomposing Systems Into Modules," Communications of the ACM, December, pp ???.


    Home

    Copyright © 2001-2012 SLK Systems LLC