This is the fifth installment of a series of posts about software programming on the PIC18F CPU family. Previous installments: first, second, third and fourth.
Lost in memories
The first problem you face when designing your application is the PIC memory architecture. And saying that this is a problem is like saying that a multi-warhead nuke is a sort of firecracker. You could say that most of the problems you’ll find working with PIC18s fall into two sets: those that stem from memory and those that impact memory.
First problem is size.
The PIC18F, if programmed in C (and you do want to use C rather than assembly, to keep your sanity for what is left of your life outside working hours), tends to be quite memory hungry. Code density is low even when compared to the earliest CPUs developed by mankind.
To make things worse, the Harvard architecture isn’t going to help you. Pointers are implemented differently, right down to the assembly instruction level, depending on whether they point to RAM (file registers) or to flash (program memory), so you will need to code the same function twice (or more). Consider that the standard library strcpy function has four different implementations because of the four combinations you get from its arguments (copy RAM to RAM, RAM to flash, flash to RAM and flash to flash).
I read about a C compiler that masks these differences away (with a pointer wrapper, I presume), but according to its website it is far from production-level completeness. Such an approach also penalizes execution time when you know where the data is located and that its position is not going to change.
If you really need to handle data both in data memory and program memory you can write a wrapper. I need to sequentially read bytes from any source, so I wrote two wrapper classes: ByteReader and ByteWriter (the latter with limited functionality). The additional benefit is that you can adapt the wrapper to read from an external flash memory as well.
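To give an idea of what such a wrapper may look like, here is a minimal sketch in portable C. It is not my actual ByteReader: the struct layout and the names fetch_from_memory, ByteReader_initMemory and ByteReader_next are made up for illustration. Only a plain-memory backend is shown; on the target you would add one backend reading through a rom pointer and, if needed, one for the external flash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: one interface, several possible backing stores. */
typedef struct ByteReader {
    /* Returns the byte at offset pos of the backing store. */
    uint8_t (*fetch)(const void *source, size_t pos);
    const void *source;  /* opaque handle to the backing store */
    size_t pos;          /* offset of the next byte to read */
    size_t size;         /* total number of readable bytes */
} ByteReader;

/* Backend for plain memory (assumption: other backends share this shape). */
static uint8_t fetch_from_memory(const void *source, size_t pos)
{
    return ((const uint8_t *)source)[pos];
}

/* Binds the reader to a block of plain memory. */
void ByteReader_initMemory(ByteReader *r, const uint8_t *data, size_t size)
{
    assert(r != NULL && data != NULL);
    r->fetch = fetch_from_memory;
    r->source = data;
    r->pos = 0;
    r->size = size;
}

/* Reads the next byte into *out; returns 0 once the source is exhausted. */
int ByteReader_next(ByteReader *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->pos >= r->size) {
        return 0;
    }
    *out = r->fetch(r->source, r->pos);
    r->pos++;
    return 1;
}
```

The code consuming the bytes only ever calls ByteReader_next, so it exists once no matter where the data lives. The price is an indirect call per byte, which is why I would not use it where the location is known in advance.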
When you let your hardware engineers decide which PIC18 to put on the board of the device you are developing, pay attention to some flash memory subtleties.
All PIC18Fs can program their own program memory. But there is a sub-family of PICs (18Fx(456)K2x) whose members have a small dedicated data flash (usually 1k or less) attached to the CPU core by a third memory bus.
You may wonder why the Microchip engineers went to the trouble of adding a third bus and differentiating on-chip memory and addressing. Well, they had, indeed, very good reasons:
- Program memory can be written one byte at a time, but needs to be erased one 4k page at a time. 4k out of 128k is quite a fraction but, worse, you are forced to juggle the data you want to preserve, and considering you don’t have 4k of data memory it is going to be quite a juggle.
- When you write the program memory the corresponding bus is stalled. Since this is the bus the program instructions are fetched from and there is no instruction cache, the CPU is stalled as well. Typical erase/write times for flash memory cause the CPU to stall for 5-10ms, possibly more, and that can be a showstopper for a real-time application.
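The juggling from the first point can be sketched on a host machine with a simulated flash. Everything here is an assumption made for illustration: the flash array, the two-page device size and the function names are invented, and the page size is simply the 4k figure quoted above. Note the page-sized shadow buffer, which is exactly what you cannot afford on the PIC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096u             /* erase granularity quoted in the text */
#define FLASH_SIZE (2u * PAGE_SIZE)  /* tiny simulated device (assumption) */

static uint8_t flash[FLASH_SIZE];    /* stand-in for program memory */

/* Erasing sets every byte of the page to 0xFF. */
static void flash_erase_page(uint32_t page_start)
{
    memset(&flash[page_start], 0xFF, PAGE_SIZE);
}

/* Programming can only clear bits, as in real flash. */
static void flash_program_byte(uint32_t addr, uint8_t value)
{
    flash[addr] &= value;
}

/* Reads back one simulated flash byte. */
uint8_t flash_read(uint32_t addr)
{
    assert(addr < FLASH_SIZE);
    return flash[addr];
}

/* Updates one byte while preserving the rest of the page it lives in. */
void flash_update_byte(uint32_t addr, uint8_t value)
{
    static uint8_t shadow[PAGE_SIZE];  /* page-sized copy: the costly part */
    uint32_t page_start;
    uint32_t i;

    assert(addr < FLASH_SIZE);
    page_start = addr - (addr % PAGE_SIZE);
    memcpy(shadow, &flash[page_start], PAGE_SIZE);  /* save the neighbours */
    shadow[addr - page_start] = value;              /* patch the copy */
    flash_erase_page(page_start);                   /* wipe the whole page */
    for (i = 0; i < PAGE_SIZE; i++) {               /* write everything back */
        flash_program_byte(page_start + i, shadow[i]);
    }
}
```

On the real device the shadow copy would have to go somewhere else (another flash page, an external memory), and the CPU would be stalled during the erase and the write-back.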
If you need persistent storage and there are no PICs with the feature list you need, you may resort to an external flash memory connected either by I2C or SPI.
Anyway, regardless of the application, always beg for the device with the largest memory (i.e. 128k in current production).
Since you are going to sell billions of devices there will be some pressure to pick a device with a small memory footprint. Resist! You must not lose this battle! You can argue that PICs with different memory sizes are pin-to-pin compatible and there’s no need to add risk to the development when you can downsize the memory in pre-production or at the first technical review.
128k of program memory may seem a lot for an embedded system, but given the low code density and the naivety of the optimizer it is not that much.
On some devices of the 18F family (the ones with a high pin count) you can extend program memory with an external memory.
For our application we managed to fit everything in the base memory and we used the extra pins to connect an LCD screen (the parallel port and the memory bus share the same pins). We also employed an external flash for data only, connected via either SPI or I2C depending on the specific device we were developing.
Although it may seem a good idea from a theoretical point of view, having two distinct address spaces for program and data isn’t one, and having distinct instructions with distinct addressing modes makes things ugly.
In fact you do want to store data in non-volatile memory (initialization data, constants, lookup tables), if only because you have at most 4k of data memory.
The compiler, trying to favor performance over conformance, doesn’t help much.
Let’s start from the void pointer (void*). In C this kind of pointer is just a (temporary) container for any pointer. You can convert any typed pointer into void* and then, when you need it, convert it back to the original type.
With MCC18 you have two main problems stemming from the default storage chosen for pointers. In fact pointers are data memory pointers by default. The void pointer is no exception and it is two bytes long. The problem is that it cannot hold a program memory pointer when you use the large memory model. The large memory model is needed only when you have to access more than 64k of program memory, and it implies that program memory pointers are 3 bytes long.
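The size mismatch is plain arithmetic, and a few lines of portable C show it. This is only an illustration of what happens to the numbers, not of what MCC18 generates; the function name store_in_two_bytes is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Squeezes a program memory address (up to 3 bytes wide) into a 2-byte
   container, as a 2-byte void* would have to, and returns what is left. */
uint32_t store_in_two_bytes(uint32_t rom_address)
{
    uint16_t narrow;

    assert(rom_address <= 0xFFFFFFul);  /* must fit in three bytes */
    narrow = (uint16_t)rom_address;     /* the upper byte is dropped here */
    return narrow;
}
```

An address below 64k such as 0x001234 survives the round trip, while 0x012345 comes back as 0x002345 and now points somewhere else entirely.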
The other problem is that, by this convention, the pointed-to type is not enough to tell program and data memory pointers apart. That is, be it a uint8_t* or a uint8_t const*, you cannot tell anything about the memory region where the uint8_t is stored.
For this discrimination MCC18 provides two qualifiers: rom and ram; the latter being optional, since by default everything is stored in ram.
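Here is what the duplication looks like in practice, as a sketch. The two functions are made up for illustration (they are not library functions). So that the snippet also compiles on a host compiler, rom and ram are defined away unless __18CXX is defined, which I am assuming is the symbol MCC18 predefines; check your compiler manual before relying on it.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins (assumption: a real MCC18 build skips these defines
   and uses its own rom/ram qualifiers). */
#ifndef __18CXX
#define rom
#define ram
#endif

/* Length of a string stored in data memory. */
size_t length_in_ram(const ram char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Same algorithm, same source text, but a separate function is needed
   because the pointer now refers to program memory. */
size_t length_in_rom(const rom char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0') {
        n++;
    }
    return n;
}
```

On a host both functions compile to the same thing. On the PIC they are translated into different instructions, and each one takes its own share of program memory.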
On one side I prefer the qualifier approach to an “intelligent” approach where the compiler silently decides where to put what based on “const”-ness or other considerations. In fact I use the const qualifier almost everywhere and not just for data stored in program memory.
On the other side, having to explicitly provide several versions of the same function is a plain waste of space. I would like to have a flexible approach where by default I get generic pointers, handled by the compiler through an adapter layer, and specialized pointers (rom/ram) on demand where performance matters.
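To make the wish concrete, this is roughly what I imagine the adapter layer doing behind the scenes: a pointer that carries a tag telling which memory it refers to. It is a host-side sketch, not something MCC18 offers. The two arrays stand in for the two memories, and GenericPtr, make_generic, ram_store, generic_read and generic_sum are all invented names.

```c
#include <assert.h>
#include <stdint.h>

#define MEMORY_SIZE 16u

static uint8_t data_memory[MEMORY_SIZE];                             /* simulated RAM */
static const uint8_t program_memory[MEMORY_SIZE] = { 'P', 'I', 'C' }; /* simulated flash */

typedef enum { SPACE_RAM, SPACE_ROM } MemorySpace;

/* Hypothetical generic pointer: an address plus the space it belongs to. */
typedef struct {
    MemorySpace space;
    uint32_t address;
} GenericPtr;

GenericPtr make_generic(MemorySpace space, uint32_t address)
{
    GenericPtr p;

    p.space = space;
    p.address = address;
    return p;
}

/* Writes a byte into the simulated RAM. */
void ram_store(uint32_t address, uint8_t value)
{
    assert(address < MEMORY_SIZE);
    data_memory[address] = value;
}

/* The adapter: picks the right access method at run time. */
uint8_t generic_read(GenericPtr p)
{
    assert(p.address < MEMORY_SIZE);
    if (p.space == SPACE_ROM) {
        return program_memory[p.address];
    }
    return data_memory[p.address];
}

/* Written once, works on both memories. */
unsigned int generic_sum(GenericPtr p, uint32_t count)
{
    unsigned int total = 0;
    uint32_t i;

    for (i = 0; i < count; i++) {
        total += generic_read(make_generic(p.space, p.address + i));
    }
    return total;
}
```

The test on the tag at every access is the run-time cost, hence the wish to keep the specialized rom/ram pointers available for the hot paths.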
Hardware and Software stack
The first PICs were really simple processors featuring a couple of levels of call stack. The 18F architecture has 31 levels of call stack, thus enabling the CPU to power medium-sized applications.
At a first glance 31 levels may seem a lot… what the heck, even Windows stack trace rarely spans over such a distance.
That would be fine, but let’s take a closer look at what happens when you run low on memory and you enable all the optimizations.
One such optimization is called Procedural Abstraction and it does a fine job of trimming down the size of the code. This transformation examines the intermediate code (or maybe the assembly code directly) and creates subroutines for duplicated code snippets. Operating at a lower abstraction level than the C source, the optimization has far more opportunities to apply the transformation.
Although clever, the optimization has a drawback: it takes control of the call stack out of the programmer’s hands. This is generally true for every optimizing compiler (e.g. when the compiler inlines a static function at the call site), but to a much lesser extent. MCC18 is capable of factoring out any bunch of assembly instructions from the middle of any C statement, building up C lines by composing several subroutines. A nightmare to debug, hell to understand in the disassembly listing and a sure way to eat those 31 call stack entries really quickly.
I already wrote about how to recover from a stack over/underflow and restart debugging without having to re-program the chip. Now let’s see how to avoid the overflow altogether.
First, you can decide how stack overflow is handled by setting the STVREN configuration bit. Basically you can choose between the “ignore” and the “trap” policies. Unfortunately, as we’re going to see, they are both rather ineffective.
Ignoring stack overflow means that when the limit is exceeded, execution continues by jumping to the requested address without pushing the return address onto the call stack.
This means that at the first return instruction, rather than returning to the caller, execution jumps back to the return address of the last non-overflowing call.
Ideally, since the overflow flag STKFUL (in the STKPTR register) is set on overflow, you could think of a function stub that checks this flag on entry. The trouble is that once you are in the called function you have no way to recover the return address, since it is lost.
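A small host-side model makes the “ignore” policy easier to follow. It is a simulation built from the description above, not register-accurate code: HwStack, hw_reset, hw_call and hw_return are made-up names, and the exact moment at which the real chip raises STKFUL should be checked against the datasheet.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DEPTH 31u  /* call stack levels of the 18F architecture */

/* Simulated hardware return stack under the "ignore" policy. */
typedef struct {
    uint32_t slots[STACK_DEPTH];
    uint8_t count;       /* number of return addresses currently stored */
    uint8_t overflowed;  /* stands in for the STKFUL flag */
} HwStack;

void hw_reset(HwStack *s)
{
    assert(s != 0);
    s->count = 0;
    s->overflowed = 0;
}

/* A call pushes the return address, unless the stack is already full:
   then the address is silently dropped and only the flag remembers it. */
void hw_call(HwStack *s, uint32_t return_address)
{
    assert(s != 0);
    if (s->count < STACK_DEPTH) {
        s->slots[s->count] = return_address;
        s->count++;
    } else {
        s->overflowed = 1;  /* the return address is lost for good */
    }
}

/* A return pops whatever is on top, right or wrong. */
uint32_t hw_return(HwStack *s)
{
    assert(s != 0 && s->count > 0u);
    s->count--;
    return s->slots[s->count];
}
```

After 31 nested calls, a 32nd call only sets the flag. The next return then yields the address pushed by the 31st call, so the 32nd function “returns” into the caller of its caller.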
Changing from ignoring to trapping may seem more promising. In fact, when this mode is selected, exceeding the limit makes execution jump to address 0x0 with the STKFUL flag set. This acts somewhat like a reset, but since the micro state is not reset you could think of it as a trap.
Sadly, yet again you don’t have a way to recover the return address, so you can’t save and restore the call stack.
After scratching my head far too long over this problem, I decided to simplify it by assuming that if a stack overflow occurs, it occurs only within interrupts. That makes some sense, since interrupts are sure to impact the stack. So I added some code in the low-level interrupt that checks the stack pointer against a threshold and saves the whole hardware stack in a data memory region. This allows the stack pointer to be reset so the interrupt code can continue to execute. Before leaving the interrupt the opposite operation is performed.
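The idea can be modelled on a host as follows. This is a simulation, not the code running on the chip: the structs, the threshold value and the names isr_enter and isr_leave are invented for the sketch. On the real device the copy would presumably go through the stack pointer and top-of-stack registers rather than a memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HW_STACK_DEPTH  31u
#define SPILL_THRESHOLD 24u  /* hypothetical: spill when deeper than this */

/* Simulated hardware return stack. */
typedef struct {
    uint32_t slots[HW_STACK_DEPTH];
    uint8_t depth;
} HwStack;

/* Data memory region receiving the saved stack. */
typedef struct {
    uint32_t slots[HW_STACK_DEPTH];
    uint8_t depth;  /* 0 means nothing was spilled */
} SpillArea;

/* Interrupt entry: if the stack is too deep, move it out of the way. */
void isr_enter(HwStack *hw, SpillArea *spill)
{
    assert(hw != 0 && spill != 0 && hw->depth <= HW_STACK_DEPTH);
    spill->depth = 0;
    if (hw->depth > SPILL_THRESHOLD) {
        memcpy(spill->slots, hw->slots, hw->depth * sizeof hw->slots[0]);
        spill->depth = hw->depth;
        hw->depth = 0;  /* the interrupt code starts with an empty stack */
    }
}

/* Interrupt exit: put everything back before returning. */
void isr_leave(HwStack *hw, SpillArea *spill)
{
    assert(hw != 0 && spill != 0);
    if (spill->depth != 0u) {
        memcpy(hw->slots, spill->slots, spill->depth * sizeof hw->slots[0]);
        hw->depth = spill->depth;
        spill->depth = 0;
    }
}
```

The interrupt code between the two calls must leave the stack balanced, and the restore has to happen before the return from interrupt, since the interrupt’s own return address is part of what was saved.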
Ideally it would have been nice to have the C run-time support handle the issue by adding a prologue/epilogue to every function so that the hardware stack could be “virtualized”. It is not much of a pessimization since you already have to handle the software stack for parameters, and you may optimize out the prologue/epilogue for functions that do not perform subroutine calls.
As of today no industrial C compiler implements this.
Next time I’ll write about Extended Mode.