PIC 18F and its stinking stack

During the past week PIC 18f continued to be my main headaches supplier. I think I already wrote enough about the messiness of the Harvard architecture combined with “modern” languages such as C. Well, since two separate address spaces didn’t seem enough, the chip designers opted to add a third for call return stack. The place where the CPU stores the return address before jumping to a subroutine is into a distinct address space… that’s not so addressable – since you can read and write only the topmost location.
Being our model the top of the range, it features 31 levels in the stack. That means that you can sub-call no more than 31 times without breaking the stack (or, you can’t compute factorials greater than 31! :-)).
Well 31 is a reasonable amount, or at least I thought so. Then we ran out of Program Memory, so we enabled the compiler space optimization. I must admit that these optimizations are quite effective being able to trim down the size of your binary of some 30%. Most of this percentage is achieved by the so-called “procedural abstraction” optimization. This procedure factorize common sequence of assembly into sub-routine. By repeatedly applying the technique, more and more blocks get extracted and replaced by a CALL instruction.
The side effect is that this technique has a dramatic impact on the stack – it may doubles the number of nested calls.
In our case the stack exploded when a deeply nested graphic routine was interrupted by the low-level interrupt that in turn was interrupted by the high level interrupt.
I had the terrible “game-over” feeling – I had to use the optimization to fit in the memory, but I had no control over the nesting of the calls. In fact every action I could take to soften the nesting would be undone automatically by the optimizer.
I don’t like to give up, so I fired up acrobat reader on the datasheet trying to dig a way out of troubles.
PIC 18f datasheet about call return stack isn’t terribly clear, so I had to read it several times. Eventually I concluded that stack overflow and underflow have two kinds of handling selectable once for all by configuration bits (sort of fuses you set when programming the device). In the first way when the execution attempts to push the 31st entry in the stack, just flips a flag. Next pushes just overwrite the entry. I thought hard about this, but I found no use whatsoever.
Next mode provides a sort of interrupt. On the 31st push the execution jumps to 0x000000 (the reset vector) with the overflow condition flagged.
Woha, I realized, I could trap this address and check for the overflow before handing the execution flow to the standard firmware. If I am here for an overflow, I could dump the hardware stack (one entry at time, it is possible) into somewhere in RAM and continue the execution where…
Exactly, where?
The CPU discards the target address of the call, it just pushes the return address into the stack. Well I could peek around the return address to see whether a CALL either relative or absolute has been issued and recover the target address.
I really enjoyed the idea, but the more I though of it, the more I found critical cases. What if the jump was caused by a call to a pointer? The PIC code for such an addressing is all but straightforward. Maybe I could detect the call pattern and perform some recovery.
Eventually I found the real killer of my idea in interrupts. In fact the call return stack is shared for normal execution and interrupts. The code for recovering the target address would have to be rather convoluted – check for a CALL/RCALL instruction, check for a CALL to a pointer, check for interrupt flags (there are tons of them) and decided whether a low or high priority interrupt has triggered…
Dead end. I wondered what was the idea of the chip designer when implementing the stack overflow mechanism. I found it useless, done this way, and it could have worked fine with some additional circuitry – it needed just to save the target address and it would have been great.
Back to paper and headache.
I did some search on internet and found another compiler (a bit too experimental to be employed in an industrial project) that save the return address from the hardware stack into the software stack. Doing this, the hardware stack never grew over 3 entries. Good! But the MCC18 compiler is really basic, it doesn’t allow this technique, nor it allows the programmer to specify user defined prologue/epilogue for functions.
At this point occurred to me that the stack explosion had always been during interrupt time. So, I thought, I may save the stack when I enter the low level interrupt and the stack level is over a certain threshold, and then restore everything before leaving the interrupt.
The implementation was quite straightforward (almost flawless, if you pardon my ubiquitous off-by-one bug). The only problem is that sometimes the low level interrupt incurs in a bad penalty of copying some 100 bytes from the stack into RAM in order to avoid the stack explosion.
But it makes a good war story.


E’ la prima volta che ci capita ed è stato bello, ci ha lasciato senza parole, ma con una bellissima sensazione di tenerezza. Oggi è il nostro anniversario di matrimonio, e Juan e Mariana ci hanno fatto una bellissima sorpresa nel poco tempo che c’è tra il rientro in casa e la cena.Juan, questa mattina ha preparato un biglietto di auguri, mentre Mariana si è messa a decorare la tavola con le candele e i segnaposti a forma di cuore. Poi, mentre la pasta cuoceva, si sono chiusi in cameretta intimandoci di non entrare. A pasta già in tavola, gli abbiamo dato ben 10 secondi per completare la sorpresa. 1 (corr) 2 (nnguardr) 3 (spost) 4 (torn) 5 (attac) 6 (stacc) 7 (quilì) 8 (ecco) 9 eeee… 10.
Oh, ma come hanno fatto così in fretta a gonfiare e legare insieme così tanti palloncini e appenderli in cucina?! E i palloncini numerati riportano 14… come i nostri anni di nozze.
Grazie bimbi!

messed up startup

When I first saw the this comment in the MCC18 C startup code some time ago, it sent a chill down my spine. But promptly I recovered, after all, this is one of the most widely used MCU on the planet and this is the preferred development environment. If they did it this way, they would have good reasons.

Then I wasted a whole afternoon tracking a problem in initialization, only to discover that the assumption had been silently broken by the linker and the code ceased to properly work, wreaking havoc in initialization of the static data.
It is beyond my understanding how could you, while professionally doing your job, base a core component (the startup code) on such assumption without either being sure that the assumption is always correct or there is an automatic detection that reports when the assumption no longer holds true.
Objectively it would have been fairer to provide a less efficient version safely and always working leaving to the eager programmer the task of crafting a high performance version if needed, rather than vice-versa.
Valentino, in comments to a previous post, asked me if I have data about difference in pricing among PIC and other MCUs. Well, not handy, but the difference is really small (my guess is 0.5€ to 1.5€) and it doesn’t really make sense to undergo all these troubles unless your volumes are filthy high.