Tag: programming

PIC18F Software Project Survival Guide 2

This is the second installment in a series of posts about software development for the PIC18F CPU family. You can find the first here.
Tools
You can (and, for a non-trivial project, you really should if you care about your sanity) program the 18F using the C language.
Compiler
Basically there are two options – the first is the MCC18 compiler from Microchip, the other is HiTech C. MCC18 is cheap and crappy, HiTech C is expensive and optimizes better (I cannot say whether it is crappy or not, since I never used it).
MCC18 is not fully C89 compliant; on the other hand, you need some extensions to get your work done on this little devil. HiTech may be more ISO/ANSI compliant (I don't know), but it is not compatible with MCC18 (compatibility is something they are planning to add in future releases; anyway, I wouldn't hold my breath). For this reason you'd better choose early which compiler you want to go with. You can probably manage to write portable code, but be prepared to write a lot of wrapper layers. Either way, you have to sort this out before you start coding.
Just to give you a hint of the compatibility problem I am talking about: apart from the way the two compilers provide access to the hardware registers, HiTech uses the "const" attribute to choose the storage for variables, while MCC18 relies on the non-standard storage qualifier keywords rom and (optionally) ram.
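To give an idea of what such a wrapper layer looks like, here is a minimal sketch; the predefined macros __18CXX (MCC18) and HI_TECH_C (HiTech) are assumptions to verify against the compiler manuals:

#if defined(__18CXX)            /* MCC18 */
    #define CODE_MEM rom        /* non-standard keyword */
#elif defined(HI_TECH_C)        /* HiTech C */
    #define CODE_MEM const      /* storage chosen through const */
#else
    #define CODE_MEM            /* hosted build: plain RAM */
#endif

CODE_MEM char banner[] = "fw 1.0";  /* ends up in program memory on target */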

When I say that MCC18 is crappy, I have a number of arguments to support my point. Each one cost me at least a couple of hours to discover and work around; sometimes I needed to spend days.
ISO/ANSI compliance is lacking, from the preprocessor to the compiler. Not only does the preprocessor fail to properly expand nested macros, it also messes up line numbering when a function-like macro is invoked across multiple lines.
For the first problem I haven't found any workaround other than hand-coding part of the preprocessor's work. For the line numbers, I use backslashes to fool the preprocessor into believing the invocation is just one long line:

#define A(B,C,D) /* macro definition */

A( longParameterB, \
    longParameterC, \
    longParameterD );

Compiler warnings are inadequate at best. For example, you don't get any message if a function that returns a non-void type has no return statement. On the contrary, when you compare an unsigned int to 0 (and not 0u) you get a warning. And you get warnings for correct code: for example, you can't pass a T* to a const void* parameter without getting a warning, even if the two pointers have the same size and the same internal representation.
This behavior makes your life hard if your programming guidelines require the maximum warning level and zero warnings, and it doesn't help you with the real problems in your code. I use PC-Lint to spot real problems, but a run of gcc with some #defines to handle the non-standard constructs will spot most of them.
Speaking of warnings, I had to fight back my loathing of useless casts and add them just to shut the compiler up.
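Just to illustrate, a made-up snippet along the lines of the code that triggered it:

void consume( const void* data );   /* any object pointer should convert implicitly */

void caller( int* values )
{
    consume( (const void*)values ); /* cast useless in ANSI C, required to silence MCC18 */
}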
Given the poor state of the lad, I haven't been able to write a static assertion macro. Usually you write such a macro by turning a boolean condition into a compile-time construct that can be either valid or invalid (e.g. declaring an array with -1 or 0 elements, declaring an enum and assigning its first value to 1/0 or 1/1…). I haven't found any way to get the compiler to refuse any of these constructs.
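For reference, this is the kind of macro I was trying to write – a sketch that conforming compilers reject when the condition is false (the array size becomes negative), but that MCC18 happily swallows either way:

#define STATIC_ASSERT( cond, tag ) \
    typedef char static_assert_##tag[ (cond) ? 1 : -1 ]

STATIC_ASSERT( sizeof(unsigned int) == 2, uint_is_16_bits );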
One of the worst parts of the toolchain is that it can produce code that breaks hardware limitations without a warning. For example, the compiler relies on a global temporary area for computing numerical expressions (array access is one case). The generated code expects this temporary area to be entirely contained in one data memory bank. Neither the compiler nor the linker is able to detect when the area falls across a data memory bank boundary and alert the programmer. This is nasty because you can get subtle problems, or a program that fails right after a recompilation.
Similarly, the C startup code relies on the same kind of constraint for a group of variables: should they not fit in the same data memory bank, the initialization silently fails.
It took me a few minutes to rewrite the startup initialization routine, and I can't see any noticeable slowdown.
I would advise you to:

  • rewrite the C startup code, keeping in mind the limitations of the compiler (it breaks on objects laid across a bank boundary, on accessing structs larger than 127 bytes, and on accessing automatic variables once they take more than a few tens of bytes);
  • use another tool (gcc/PC-Lint) to parse the source code and get meaningful warnings (missing returns, == instead of =, unused variables, uninitialized variables and so on) – see the shim sketch after this list;
  • enforce data structure invariants and consistency by using assertions;
  • if you find a way to implement a static assertion, let me know.
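The shim can be as simple as a header included only for the gcc pass – a minimal sketch, assuming MCC18 predefines __18CXX (check your compiler manual):

/* gcc_shim.h - let gcc parse MCC18 sources for warnings only. */
#ifndef __18CXX          /* assumed to be predefined by MCC18, not by gcc */
#define rom              /* strip the non-standard storage qualifiers */
#define ram
#define near
#define far
#endif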

Next time I’ll write about linker and assembler.

PIC18F Software Project Survival Guide

Now that I’m getting nearly through I feel confident about posting this series of posts about my work experience on PIC18F. Although my writing could seem a bit intimidating or cathedratical I would like to receive your feedback and your thoughts on the matter. I got through, but I don’t like to have universal solutions 🙂
So, at last you failed to defend your position. No use came of all the documentation you provided, the internet articles and the blog posts where, beyond any doubt, the PIC was depicted as the wrong choice.
But either your boss or your customer (at this point it makes little difference) imposed a PIC18F on your project. And she also gave a reason you can hardly argue with – Microchip never sends a CPU the way of the dodo… so, twenty years from now, we could still manufacture the same device with the same hardware, avoiding the need for engineering maintenance.
Given that this device will be sold in billions of units, that makes a lot of sense.
Their problem is solved, but yours are just looming at a horizon crowded with dark clouds.
Good news first: you can do it – PIC18Fs (after some twiddling) have enough CPU power for most of the applications you can throw at them. I just completed a device which acts as the central hub of a real-time network and provides information via a 128×64 pixel display.
Bad news – it won't be easy. For anything more convoluted than a remote gate opener, due at most by yesterday (as most projects require nowadays), your life is going to be a little hell. I'll try to describe, if not the safest path through this hell, at least the one where you cannot get hurt too badly.
So, let's start with the architecture.

Architecture
The PIC18 architecture is described almost everywhere (checked the back of your cereal box recently?), but the first place you are going to look, the datasheet, will be mostly unhelpful. So I will try not to repeat anything and will not go into much detail; instead I will try to paint a picture of the capabilities and drawbacks of these gizmos.
First, these are 8-bit CPUs rooted in the RISC field – simple instructions, simple tasks, low code density.
The memory follows the so-called Harvard architecture – two distinct memories for data and for program instructions. Data memory is called the Register File, while program memory is called… Program Memory. Data memory is RAM, while program memory is flash.
Program memory is linear (no banks, no pages) and each word is 16 bits wide, but the memory can be accessed for reading (or writing) data one byte at a time. Current PIC18s have program memory sizes up to 128k, but nothing in their design prevents them from addressing up to 16Mbytes (2^24).
You can erase and write the program memory from the PIC program itself (this is called self-programming), but there are some constraints. First, for erasing purposes the memory is organized in pages of 1024 bytes each. In order to write the program memory you first have to erase it, and this can be done only one page at a time. Once a page has been erased you may write it as a whole or just one byte at a time.
The worst part is that while the program memory is being erased or written, the program memory bus is busy and execution stalls. This stall can last several milliseconds.
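To give an idea, erasing one page from the running firmware goes through the datasheet's unlock sequence, more or less like the hedged sketch below (SFR names as in the usual MCC18 device headers; double-check everything against your part's datasheet):

/* Sketch only: erase the 1024-byte page containing addr. */
void FlashErasePage( unsigned long addr )
{
    TBLPTRU = (unsigned char)( addr >> 16 );   /* point into program memory */
    TBLPTRH = (unsigned char)( addr >> 8 );
    TBLPTRL = (unsigned char)addr;

    EECON1bits.EEPGD = 1;       /* select flash program memory */
    EECON1bits.CFGS  = 0;       /* not the configuration area  */
    EECON1bits.WREN  = 1;       /* enable writes               */
    EECON1bits.FREE  = 1;       /* next WR starts an erase     */

    INTCONbits.GIE = 0;         /* the unlock sequence must not be interrupted */
    EECON2 = 0x55;              /* required unlock sequence    */
    EECON2 = 0xAA;
    EECON1bits.WR = 1;          /* execution stalls here for milliseconds */
    INTCONbits.GIE = 1;

    EECON1bits.WREN = 0;
}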
Data memory can be accessed either linearly or through banks of 256 bytes each, depending on the assembly instruction you use. Data memory on PIC18s goes up to 4k; again, nothing in the design prevents the CPU from addressing up to 64k of RAM. In the data memory there is a special section (Special Function Registers) through which the hardware registers can be accessed.
The PIC18 architecture becomes quite funny with the SFRs, since you can find the usual timer, interrupt and peripheral control registers alongside CPU registers such as the status flags and the W register (a sort of accumulator). Furthermore, there are registers that basically implement specific addressing modes. For example, the PIC18 has no instruction for indirect addressing (i.e. reading from a location pointed to by a register); if you want to access a location indirectly, you have to load its address into an SFR (say FSR0) and then read from another SFR (e.g. INDF0). If you want a post-increment, you read from POSTINC0 instead.
That may sound elegant, but it is a nightmare for a C compiler: basically any function that accepts a pointer could thrash part of the CPU state, since most of the CPU state is memory mapped!
That's also the reason why, conservatively, the C compiler pushes about 60 bytes of context onto the stack on entering a generic interrupt handler.
There is a third memory in every PIC18F – the hardware return stack. This is a LIFO memory with 31 entries; each entry holds the return address stored every time a CALL (or RCALL) assembly instruction is executed.
Still on the CPU side, the PIC18F features two levels of interrupts – high priority and low priority – and you can assign every interrupt source on the MCU to one level or the other.
Talking about peripherals, you will find almost everything – from low pin count devices to 100-pin MCUs with a parallel port interface, external memory and an ethernet controller. Even in a 28-pin DIL package you find a number of digital I/Os, comparators, DA and AD converters, and PWMs. Every pin is multiplexed onto two or three different functions. I2C and SPI are available on every chip, while a USB port is available only on a couple of sub-families.

Next time, I’ll talk about tools

Commit logs

Geek’s humor comes in several flavors, “[http://whatthecommit.com/|what the commit]” is a tasty one. I wonder how funny it is looking at my commit log… say yesterday:


 9:42 - 127 bytes saved
 11:07 - 140 bytes saved
 11:20 - 882 bytes saved (got rid of Xyz state machine)
 11:25 - 22 bytes saved (got rid of Wuz - unused code, just program memory waster)
 11:41 - 431 bytes saved (got rid entirely of Xyz)
 14:00 - prevented race condition on frame receive.
 14:13 - ISO/ANSI insulting warning removed with the following comment
     // The cast below is not needed in ANSI C and should be considered a
     // bad coding practice. BUT we are stuck with MCC18 compiler that is
     // NOT ANSI compliant.
 14:46 - 18 bytes saved by splitting fn() into newFn() and retryFn()
 15:55 - 54 bytes saved by removing a couple of arguments from a function.
 16:20 - 54 bytes saved by replacing function fu().
 

With less than 10k free in the program memory, yesterday was again Spring Cleaning Day… but long gone are the days of easy gains; now we are sort of scraping the bottom of the barrel. Here's some math for you… if a contractor is paid 400€/day and the price difference between a @!* PIC18 and any other micro is less than half a dollar, how many units of your device do you need to sell to justify the contractor's work? (At roughly 0.5€ saved per unit, a single contractor day is repaid only after some 800 units.)

PIC 18F and its stinking stack

During the past week the PIC18F continued to be my main supplier of headaches. I think I have already written enough about the messiness of the Harvard architecture combined with "modern" languages such as C. Well, since two separate address spaces didn't seem enough, the chip designers opted to add a third one for the call return stack. The place where the CPU stores the return address before jumping to a subroutine is a distinct address space… which is not really addressable, since you can read and write only the topmost location.
Our model being top of the range, it features 31 levels in the stack. That means you cannot nest calls more than 31 deep without breaking the stack (or, you can't compute factorials greater than 31! :-)).
Well, 31 is a reasonable amount, or at least I thought so. Then we ran out of program memory, so we enabled the compiler's space optimizations. I must admit that these optimizations are quite effective, trimming the size of the binary by some 30%. Most of that percentage is achieved by the so-called "procedural abstraction" optimization, which factorizes common sequences of assembly into subroutines. By repeatedly applying the technique, more and more blocks get extracted and replaced by a CALL instruction.
The side effect is that this technique has a dramatic impact on the stack – it may double the number of nested calls.
In our case the stack exploded when a deeply nested graphics routine was interrupted by the low-priority interrupt, which in turn was interrupted by the high-priority interrupt.
I had that terrible "game over" feeling – I had to use the optimization to fit in the memory, but I had no control over the nesting of the calls. In fact, every action I could take to soften the nesting would be automatically undone by the optimizer.
I don’t like to give up, so I fired up acrobat reader on the datasheet trying to dig a way out of troubles.
PIC 18f datasheet about call return stack isn’t terribly clear, so I had to read it several times. Eventually I concluded that stack overflow and underflow have two kinds of handling selectable once for all by configuration bits (sort of fuses you set when programming the device). In the first way when the execution attempts to push the 31st entry in the stack, just flips a flag. Next pushes just overwrite the entry. I thought hard about this, but I found no use whatsoever.
Next mode provides a sort of interrupt. On the 31st push the execution jumps to 0x000000 (the reset vector) with the overflow condition flagged.
Woha, I realized, I could trap this address and check for the overflow before handing the execution flow to the standard firmware. If I am here for an overflow, I could dump the hardware stack (one entry at time, it is possible) into somewhere in RAM and continue the execution where…
Exactly, where?
The CPU discards the target address of the call; it only pushes the return address onto the stack. Well, I could peek around the return address to see whether a CALL, either relative or absolute, had been issued and recover the target address from there.
I really enjoyed the idea, but the more I thought about it, the more critical cases I found. What if the jump was caused by a call through a pointer? The PIC code for that kind of addressing is anything but straightforward. Maybe I could detect the call pattern and perform some recovery.
Eventually I found the real killer of my idea: interrupts. The call return stack is shared between normal execution and interrupts. The code for recovering the target address would have to be rather convoluted – check for a CALL/RCALL instruction, check for a call through a pointer, check the interrupt flags (there are tons of them) and decide whether a low or high priority interrupt had triggered…
Dead end. I wondered what the chip designers had in mind when they implemented the stack overflow mechanism. Done this way, I find it useless; with some additional circuitry it could have worked fine – it just needed to save the target address, and it would have been great.
Back to paper and headache.
I did some searching on the internet and found another compiler (a bit too experimental to be employed in an industrial project) that moves the return address from the hardware stack onto a software stack. Doing this, the hardware stack never grows beyond 3 entries. Good! But the MCC18 compiler is really basic: it doesn't implement this technique, nor does it allow the programmer to specify user-defined prologues/epilogues for functions.
At this point it occurred to me that the stack explosions had always happened during interrupt time. So, I thought, I could save the stack on entering the low-priority interrupt whenever the stack level is over a certain threshold, and then restore everything before leaving the interrupt.
The implementation was quite straightforward (almost flawless, if you pardon my ubiquitous off-by-one bug). The only problem is that sometimes the low-priority interrupt incurs the hefty penalty of copying some 100 bytes from the stack into RAM in order to avoid the stack explosion.
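In case you want to try something similar, the idea boils down to popping entries off the hardware stack through the STKPTR/TOS registers and pushing them back later. A hedged sketch, not the actual project code (check the SFR names against your device header; this must run with the relevant interrupts disabled, and each routine has to set its own return address aside before touching the stack):

#define SPILL_THRESHOLD 16u

static unsigned char savedStack[ 31 * 3 ];   /* 3 bytes per return address */
static unsigned char savedCount;

void StackSpill( void )              /* call on entering the low-priority ISR */
{
    unsigned char retL = TOSL;       /* set aside our own return address */
    unsigned char retH = TOSH;
    unsigned char retU = TOSU;
    --STKPTR;

    savedCount = 0;
    while( ( STKPTR & 0x1Fu ) > SPILL_THRESHOLD )   /* low 5 bits = depth */
    {
        savedStack[ savedCount * 3u + 0u ] = TOSL;  /* top entry into RAM */
        savedStack[ savedCount * 3u + 1u ] = TOSH;
        savedStack[ savedCount * 3u + 2u ] = TOSU;
        --STKPTR;                                   /* pop it */
        ++savedCount;
    }

    ++STKPTR;                        /* put our return address back on top */
    TOSL = retL;
    TOSH = retH;
    TOSU = retU;
}

void StackRefill( void )             /* call just before leaving the ISR */
{
    unsigned char retL = TOSL;
    unsigned char retH = TOSH;
    unsigned char retU = TOSU;
    --STKPTR;

    while( savedCount > 0u )         /* deepest entry first, old top last */
    {
        --savedCount;
        ++STKPTR;
        TOSL = savedStack[ savedCount * 3u + 0u ];
        TOSH = savedStack[ savedCount * 3u + 1u ];
        TOSU = savedStack[ savedCount * 3u + 2u ];
    }

    ++STKPTR;
    TOSL = retL;
    TOSH = retH;
    TOSU = retU;
}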
But it makes a good war story.

messed up startup

  /* we'll make the assumption in the following code that these statics
   * will be allocated into the same bank.
   */

When I first saw this comment in the MCC18 C startup code some time ago, it sent a chill down my spine. But I promptly recovered: after all, this is one of the most widely used MCUs on the planet and this is its preferred development environment. If they did it this way, they must have had good reasons.

Then I wasted a whole afternoon tracking down a problem in initialization, only to discover that the assumption had been silently broken by the linker and the code had ceased to work properly, wreaking havoc in the initialization of the static data.
It is beyond my understanding how you could, while professionally doing your job, base a core component (the startup code) on such an assumption without either making sure the assumption always holds or adding an automatic check that reports when it no longer does.
Objectively it would have been fairer to provide a less efficient but always-working version, leaving to the eager programmer the task of crafting a high-performance version if needed, rather than vice versa.
Valentino, in the comments to a previous post, asked me if I have data about the price difference between PICs and other MCUs. Not at hand, but the difference is really small (my guess is 0.5€ to 1.5€) and it doesn't really make sense to go through all these troubles unless your volumes are filthy high.

Pic Indolor

Some time ago there was a TV ad for a syringe named "Pic indolor" (if it's not clear, that translates as "Pic painless"). Fast forward a few decades, that ad is long gone – and Pic, to me, is now only Pic painful, the regrettable MCU from Microchip that I am so unwillingly forced to use in my daily job. I have already written about it, but there are still complaints left.
The current device I am working on sports quite a comprehensive set of complex features that are expected to run at once:
• proprietary field bus communication with failure detection and avoidance;
• distributed monitoring;
• USB communication;
• Graphical User Interface, with status and menus;
• Audio (analog, thank God).
We have the top-of-the-range PIC18F, meaning 128k of program memory and about 3.5k of RAM (or, as Microchip engineers would have us say, general purpose registers, GPRs for short).
The PIC is renowned for its code density or, to put it better, for the lack thereof. The Harvard architecture makes things even worse: there is no such thing as a generic pointer. The good ol' void* is not. Pointers have to be differentiated into Program Memory pointers (24 bits wide) and GPR pointers (16 bits wide). And the difference does not end there: it goes down to the assembly level – different instructions and registers have to be used depending on whether you access Program Memory or GPRs. That means that the same algorithm, which can be coded in C with the help of some macro tricks, has to be translated into two copies – or into a combinatorial explosion of copies if more than one pointer is involved. A perfect example of this nightmare is the standard library that comes with the Microchip C Compiler (MCC18) – take strcpy and you will find four versions, since you may want to copy from ram to ram, ram to rom, rom to rom or rom to ram. That is annoying only up to the point where you run out of memory. Then you can no longer afford the flexibility. A sketch of the problem is below.
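A minimal sketch of the explosion, written with MCC18's rom/ram qualifiers (function names are mine, not the library's – the library's strcpy variants follow the same pattern):

/* The very same loop, once per pointer-space combination. */
void copy_ram2ram( ram char* dst, const ram char* src )
{
    while( (*dst++ = *src++) != '\0' )
        ;                      /* plain INDF-style accesses on both sides */
}

void copy_rom2ram( ram char* dst, const rom char* src )
{
    while( (*dst++ = *src++) != '\0' )
        ;                      /* src is read through TBLPTR/TABLAT instead */
}

Two more versions are needed for the rom destinations, and mixed-pointer algorithms multiply from there.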
That was my case for the image blitting functions – I had left open the chance to copy images either from GPRs or from Program Memory. Now that chance is gone.
The troubles do not end here.
The PIC18F architecture provides two levels of interrupts – high and low priority; since the name tells it all, I will not insult your wits by explaining it.
The high priority interrupt has some hardware facilities for saving and restoring the non-interrupt context, but we cannot even think of using them, since that level is strictly dedicated to the proprietary field bus driver. This ISR has some hard time constraints, needing to run every few microseconds, but that is another story.
So we use the low priority interrupt vector for anything else, most notably timers.
Since we are a gang of C programmers, we try to stay away from assembly as much as possible. That's fine, but writing interrupt code on the PIC18 with MCC18 comes at a hefty price.
On entering the interrupt, the C runtime support dumps 63 bytes of context onto the stack. Given that the stack size is 256 bytes (if you don't want to incur extra penalties), the CRT eats up one quarter of the stack.
To be fair it is not just MCC18 fault, it is more how the PIC architecture has been designed – rather than having real CPU registers and operations that work on those registers, PIC has memory mapped hardware registers that both implements CPU registers and addressing modes.
For example, when saving the context you have to save:
• FSR0 and FSR2, the indirect-addressing registers for RAM;
• TBLPTR, the indirect-addressing register for Program Memory;
• TABLAT, the Program Memory read latch;
• PRODH and PRODL, the multiplication result registers.
By comparison, you can save the whole Z80 context in 20 bytes – and if you restrict the code to not use the alternate register set, you save just 12. Moreover, you don't have dumb limitations on the stack size.
Well, enough with the digression. What is in my ISR?
There is an interrupt source detection routine that calls the specific ISR for the interrupt that occurred. If the specific ISR is a timer tick, the timer list is swept and triggered timers are notified by means of a deferred procedure call.
That's another piece of code I am proud of having written. Basically, rather than performing callbacks from within the interrupt, you register your callback with its arguments and let a handler perform the call later, from non-interrupt context.
This has the main advantage that the callback code can take its time to do whatever it is supposed to do, since interrupts are enabled while it runs. Also, the called-back code can mostly ignore interrupt re-entrance problems, since it is always called synchronously with the non-interrupt main loop.
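The mechanism can be as small as a ring buffer of function pointer/argument pairs – a hedged sketch of the idea (names are mine), with the ISR as the only writer and the main loop as the only reader:

typedef void DpcFn( void* );

typedef struct
{
    DpcFn* fn;
    void*  arg;
} Dpc;

#define DPC_SLOTS 8u

static volatile Dpc           dpcQueue[ DPC_SLOTS ];
static volatile unsigned char dpcHead;       /* consumer index */
static volatile unsigned char dpcTail;       /* producer index */

void DpcPost( DpcFn* fn, void* arg )         /* called from the ISR */
{
    unsigned char next = (unsigned char)(( dpcTail + 1u ) % DPC_SLOTS);
    if( next != dpcHead )                    /* drop the call if the queue is full */
    {
        dpcQueue[ dpcTail ].fn  = fn;
        dpcQueue[ dpcTail ].arg = arg;
        dpcTail = next;
    }
}

void DpcDispatch( void )                     /* called from the main loop */
{
    while( dpcHead != dpcTail )
    {
        DpcFn* fn  = dpcQueue[ dpcHead ].fn;
        void*  arg = dpcQueue[ dpcHead ].arg;
        dpcHead = (unsigned char)(( dpcHead + 1u ) % DPC_SLOTS);
        fn( arg );                           /* interrupts are enabled here */
    }
}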
This also proved to be a life saver on this occasion, keeping the stack from growing too much during interrupts on top of those 63 bytes. To ease the impact I moved all the timer code from interrupt to non-interrupt context via deferred procedure calls.
This bought me some oxygen, but the application was still suffocating in the limited amount of stack.
I turned my attention to the display driver and its functions. Their weight on the stack was considerable, and they basically don't need to be re-entrant. Consider a masked blit: aside from offset registers, coordinates, and read and write values, you have one pointer to the data, one pointer to the mask and, in my case, two pointers to the previous data and mask lines. That's quite a lot of stuff living in stack space.
With a deep sense of sadness I moved all those auto variables into the static universe, and almost by magic I both solved the stack problem and freed enough Program Memory to fit in the memory space even without optimization.
The emergency alert just went off, but I am not sleeping soundly, because I know it is just a question of time: sooner or later, well before the end of the project, the problem will rear its ugly head again.
The cost impact on the project is unlikely to be light…

Managing Project

The current project I am working on could be considered medium-sized. It involves 5 software engineers and 1 hardware engineer for about 8 months. I contributed to the planning of the software components, but I am quite critical of my own skill at predicting the future. Considering the pressure from top and middle management to complete the project by a given date, I would take my own planning predictions with great care. So I dared to ask the project manager whether he was doing some sort of risk assessment. His answer – "What the heck! If I had to do _even_ risk assessment, I would have no time left for anything".

Harmful + Evil = RAII for C

Some days ago I read an article about the goto heresy that prompted me to write about my personal experience with the infamous goto instruction in C. The only use I ever found for goto in C was error handling. Valentino provided me with a good idea, on which I elaborate a bit here. Thanks to this idea you can achieve a passable approximation of RAII in C.

It is not free (as in beer) the way it is in C++ or other languages that support automatic object destruction, but if you stick to a set of (IMHO acceptable) conventions you should be happy with it. Should these conventions not fit you, you may easily bend the solution to your taste.

Before delving into the technical description of the idea I am going to list the conventions that are requested for the idea to work.

First, class names have to be defined with the typedef instruction. E.g.

typedef struct
{
    int x;
}
Foo;

Then each class needs a constructor named after the class, with a trailing "_ctor". In the same way, the destructor has a trailing "_dtor". The first argument of the constructor is a pointer to the object to construct. Moreover, the constructor returns true if the operation has been successful, or false in case of construction failure. It is up to the constructor to clean up in case of failure and not leak any resources.
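In practice, the conventions for the Foo class above would look like this (illustrative sketch):

bool Foo_ctor( Foo* self, int x )
{
    self->x = x;        /* establish the invariant */
    return true;        /* on failure: clean up, then return false */
}

void Foo_dtor( Foo* self )
{
    (void)self;         /* nothing to release for such a simple class */
}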

In the same way, the destructor has a single argument – the pointer to the object to destruct. By the way, by constructing I mean receiving a chunk of raw memory and turning it into a well-formed, invariant-ready, usable, and valid object. It has nothing to do with memory allocation – memory is provided by the code that calls the constructor. The destructor does the opposite – it takes a valid object and, by freeing the associated resources, turns it back into a useless bunch of raw bytes, ready to be recycled by someone else.

Now the idea is simple (as most of them are, once you know them) – you need a way to keep track of what you construct, so that when an error occurs you can go back and call the destructors for each object already built. Since you don't know how many objects are going to be constructed, the data structure that fits best is the linked list. And, if you are clever enough, you may avoid dynamic allocation altogether by employing cleverly crafted node names.

When an object is successfully built, a node is added to the list. Inside the node, the pointer to the built object is stored along with the pointer to the destructor. You know which destructor to use because you have the object type. When a constructor fails, execution jumps (via a goto) to the error handling trap. The trap simply sweeps the linked list and processes each node by calling the destructor on the object. Thanks to the C preprocessor, the implementation is not so convoluted.

#define RAII_INIT   typedef void DtorFn( void* );      \
                    struct DtorNode                    \
                    {                                  \
                        DtorFn* dtor;                  \
                        void* object;                  \
                        struct DtorNode* next;         \
                    }                                  \
                    * dtorHead__ = NULL

#define RAII_CTOR( x__, T__, ... ) \
    RAII_CTOR_WITH_LINE( __LINE__, x__, T__, __VA_ARGS__ )


#define RAII_CTOR_WITH_LINE( L__, x__, T__, ... )           \
    struct DtorNode dtor_##T__##_##L__;                     \
    if( T__##_ctor( x__, __VA_ARGS__ ) )                    \
    {                                                       \
        dtor_##T__##_##L__.dtor = (DtorFn*)T__##_dtor;      \
        dtor_##T__##_##L__.object = x__;                    \
        dtor_##T__##_##L__.next = dtorHead__;               \
        dtorHead__ = &dtor_##T__##_##L__;                   \
    }                                                       \
    else                                                    \
    {                                                       \
        goto failureTrap__;                                 \
    }

#define RAII_TRAP                                      \
    failureTrap__:                                     \
        while( dtorHead__ != NULL )                    \
        {                                              \
            dtorHead__->dtor( dtorHead__->object );    \
            dtorHead__ = dtorHead__->next;             \
        }

RAII_INIT initializes the mechanism by defining the type of the linked list node and the pointer to the head of the list. Note that a singly linked list is enough, since I want LIFO behavior (the first constructed object is the last to be destroyed). Also, the name of the type will be local to the function where this macro is instantiated, therefore there won't be any collision in the global namespace.

The RAII_CTOR macro is used to invoke an object constructor. The real work is done by RAII_CTOR_WITH_LINE, which accepts the same arguments as RAII_CTOR plus the line where the macro is expanded. The line is needed to create unique node identifiers within the same function.

RAII_CTOR needs the name of the object type in order to be able to build the name of the constructor and the name of the destructor. From this information, the macro is able to call the constructor and add a node to the destruction list if successful or jump to the destructor trap if the constructor fails.

RAII_TRAP is the trap, to be located at the end of the function. It intercepts a constructor failure and performs the required destruction by scanning the list.

In order to use the macros you lay out the function according to the following canvas:

bool f( /* whatever */ )
{
    RAII_INIT;
    // some code
    RAII_CTOR( ... );  // one or more ctor(s)
    return true;    // everything was fine

    RAII_TRAP;      // code below is executed only in case of error.
    return false;
}

As you can see, the trap performs the destruction but leaves you room to add your own code (in the example, the "return false;" statement).

So far so good, but you may argue that memory allocation and file open/close already have their conventions set in the standard library, and they don't fit my macro requirements.

Don’t worry, it is quite straightforward to hammer malloc/free and fopen/fclose in the _ctor/_dtor schema. It is as simple as:

#define malloc_ctor(X__,Y__) (((X__) = malloc( Y__ )) != NULL)
#define malloc_dtor free

#define fopen_ctor(X__,NAME__,MODE__)   (((X__) = fopen( NAME__, MODE__ ))!= NULL )
#define fopen_dtor fclose

Here is an example of how the code that employs my RAII macros could look:

bool
f( void )
{
    RAII_INIT;

    Foo foo;
    FILE* file;
    void* memory;

    RAII_CTOR( memory, malloc, 100 );
    RAII_CTOR( file, fopen, "zippo", "w" );
    RAII_CTOR( &foo, Foo, 0 );

    return true;

    RAII_TRAP;
    return false;
}

This code has some great advantages over the solutions I presented in my old post. First, it has no explicit goto (the goto is hidden, as much as it is in any other structured statement). Second, you don't have to care about the construction order or explicitly write the destructor calls.

There are some drawbacks, though. First, the linked list has an overhead that I don't think the optimizer will be able to remove. The space overhead is 1 function pointer and 2 data pointers (plus alignment padding) for each constructed object. This space is taken from the stack, but it is completely released when the function returns.

The code requires a C99-compliant compiler or, at least, a compiler that allows you to declare variables anywhere in the code (not just at the beginning of a block). I also think that the function pointer and argument pointer juggling is a bit on (or maybe beyond) the edge of standard compliance. I tested the code on a PC, but maybe it fails on more exotic architectures.

So, what do you think?

Considering Goto Harmful, but…

From when I started programming in C until a few months ago, I religiously practiced the rule "Don't use goto" (totaling about 23 years of abstinence). I remember being puzzled at first – coming from BASIC programming, I could hardly believe you could get along without the infamous instruction. I took the change of habit as a challenge, and in a short time I was able to get rid of the evil statement.
In practice I was helped by a bunch of C statements that are basically disguised goto instructions: break, continue and return.
Break and continue let you jump out of a loop or to its next iteration, while return is a jump out of the current function.
A single exit point (i.e. just one return per function) is often preached as the Right Thing, but when programming in C, single exit fights madly with error management, forcing you either to deeply nest conditionals or to add boolean variables whose sole purpose is to skip code in case of error.
The Amiga was the first computer I programmed in C. It was an advanced machine for its time, but experimental in many ways. For example, the Amiga operating system provided full multitasking capabilities, but the hardware lacked an MMU, so there was no memory protection. This forced the programmer to be very careful about error conditions – one unhandled error and the entire system could be nuked by a single failing program.
That's probably why I have always been attentive to error handling and graceful exits.
It was back then that I started using this idiom:

if( f1() && f2() && f3() )
{
    // f1(), f2() and f3() returned ok.
}

This helps to avoid some nesting, but it fails to track which function succeeded and which didn't. That may be fine in some situations, but not in others. For example, if you have to free some resources allocated by f2(), you need to know whether f2() succeeded.
Conversely, the idiom below:

bool ok1;
bool ok2;
bool ok3;

ok1 = f1();
ok2 = f2();
ok3 = f3();

if( ok1 && ok2 && ok3 )
{
    // f1(), f2() and f3() returned ok.
}

if( ok1 ) free1();
if( ok2 ) free2();
if( ok3 ) free3();

performs proper cleanup, but fails to capture that f2() has to be executed if, and only if, f1() succeeded.
Then I went the C++ way for several years and picked up a markedly object-oriented approach.
In C++ you don't have to worry much about these details if you use the RAII idiom. That is, automatic objects (i.e. local instances) get automatically destroyed when the scope is left, regardless of the reason the execution leaves the scope.
In other words, if a function fails, be it with an exception or by reporting a specific error and triggering a return, the objects that were built are destroyed, leaving the system in a good, non-leaking state.
Fast forward some years, and I am back to C programming with a heavy legacy of the object-oriented approach. This means I try to design modules in an object-oriented way – modules define classes, and each class has one constructor that prepares the instance for use. Each class also has one destructor (which may be empty, but that is an implementation detail, so if it changes in the future you don't have to change the calling code).
This is the setting where the C error management issue arose again. I want to mimic C++-like behavior, so that when there are 3 "sub-objects" to construct inside a constructor, proper cleanup (i.e. destructor calls) is invoked in case of error.
If you follow a strictly structured approach (without exception support), you get a very convoluted code:

if( f1_ctor() )
{
    if( f2_ctor() )
    {
        if( f3_ctor() )
        {
            // successful
            return true;
        }
        else
        {
            f2_dtor();
            f1_dtor();
        }
    }
    else
    {
        f1_dtor();
    }
}
return false;

The lack of "fall through" semantics forces you to duplicate code, and therefore makes coding and maintenance more error prone. In fact, suppose you have to add another call, f0_ctor(), that must happen before f1_ctor(). Then you have to change nearly everything, indentation included.
Time to reconsider my mental framework. I needed something that selects a portion of the "destructor" sequence. Something like a switch with fall through:

progress = 1;
if( f1_ctor() )
{
    progress = 2;
    if( f2_ctor() )
    {
        progress = 3;
        if( f3_ctor() )
        {
            progress = 0;
        }
    }
}
switch( progress )
{
    case 0:
        return true;
    case 3:
        f2_dtor();
        // fall through
    case 2:
        f1_dtor();
        // fall through
    case 1:
        return false;
}

This works, but it is somewhat error prone when writing and/or changing the code: if you duplicate one of the progress codes you get a wrong cleanup, and that can go undetected.
Moreover, it doesn't seem to add much over the goto-based error management:

if( !f1_ctor() )
{
    goto error1;
}
if( !f2_ctor() )
{
    goto error2;
}
if( !f3_ctor() )
{
    goto error3;
}
return true;

error3:
    f2_dtor();
error2:
    f1_dtor();
error1:
    return false;

This notation is more terse (thus more readable) and appears to be more robust than the previous ones.
So why should I refrain from the goto statement in this case? There isn’t any good motivation.
I don’t want to provide any sort of “free for all” authorization in the wild usage of goto instruction. On the contrary my claim is that first you have to become of age without using goto (i.e. write programs for at least 18 years), practice Object Oriented Programming, carefully deal with error handling, then if you find yourself in a language lacking of a suitable error management, you may use goto… only if everything else is worse.

Z80 vs. PIC

Yesterday I wrote some lines of PIC assembly code to manage the interrupt service routine, so that I can select from C the code to execute when an interrupt occurs. Just to give you an idea of the pain, I will show you a comparison between a Z80 (1971) and a PIC18 (2002) doing an indirect call. Let's say that you want to jump to a program address stored in two bytes at addresses TL and TH.

Z80:

    jr L1
L2: ld hl,(TL)
    jp (hl)
L1: call L2

PIC18:

   bra L1
L2 movff TH,PCLATH
   movlb bank(TL)
   movf TL,W,B
   movwf PCL
L1 rcall L2

The Z80 routine is 9 bytes long, while the PIC18 one spreads over 14 bytes. When it comes to execution times, things are not so bad for the PIC18 – 36 machine cycles compared to 53 on the Z80. My guess is that a 2002 architecture involves a pipeline that allows the CPU to crank out one instruction per machine cycle. In fact, modern incarnations of the Zilog CPU have a revised architecture that runs 4 times faster or more than the original Z80.