Using GCC with the TI Stellaris Launchpad – Newlib

I left off in my last post with compiling programs using GCC and flashing them to the Launchpad. Now I’m going to discuss some of what’s going on in the background when we try to use the ever so popular printf and malloc functions from the C Standard Library. The implementation of libc we’re using is provided by newlib (in place of glibc), and most everything will work right out of the box for us, but some functionality requires a little extra work. For example, if you tried to use any function that allocates memory or prints to standard out, you would encounter linker errors when building your project. The functions the linker will be complaining about are known as System Calls and are the glue that connects libc to the host platform.

What I’m hoping to do here is explain how to implement a few of the system calls that will interact with the memory and UARTs on our Launchpad. Doing so will give us a mostly functional system, allowing us to write programs just like you would on your desktop. The end goal is to be able to compile and run the de facto C intro program, which will exercise our newfound heap and stdio skills.

#include <stdio.h>
int main()
{
    printf("Hello, World\n");
    return 0;
}

More fun with Linker Scripts

Before we get to writing system calls we need to set up the heap and stack so that functions have a place to allocate memory. Most of the work at this point is done in the linker script that gets passed to ld at link time. We’re going to modify the script provided by Stellarisware to provide symbols that define the beginning/end of the stack/heap. Below is an excerpt from the script I’ve been using.

_stack_size = 4K;                                               /* 1 */
...
SECTIONS
{
    ...
    .bss : AT(ADDR(.data) + SIZEOF(.data))
    {
        _bss = .;
        *(.bss*)
        *(COMMON)
        _ebss = .;
    } > SRAM
    _heap_bottom = .;                                           /* 2 */
    _heap_top = ORIGIN(SRAM) + LENGTH(SRAM) - _stack_size;      /* 3 */

    _stack_bottom = _heap_top;                                  /* 4 */
    _stack_top = ORIGIN(SRAM) + LENGTH(SRAM);                   /* 5 */
}

What we’re doing here is defining four symbols, _heap_bottom, _heap_top, _stack_bottom, and _stack_top, that mark the start and end of the heap and the stack. I’ve included a picture depicting the stereotypical memory layout overlaid with the symbols you see in the script.

  1. Define the size of the stack.
  2. Define the symbol that locates the bottom of the heap, immediately after the end of .bss.
  3. This takes the address of the end of SRAM and subtracts 4K bytes from it, giving us the top of our heap and the bottom of our stack.
  4. Bottom of the stack = top of the heap.
  5. This calculates the end address of SRAM, which is also the top of our stack.
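To make the arithmetic in the steps above concrete, here is a small host-side sketch that reproduces it in plain C. The names compute_layout, heap_size, and mem_layout_t are made up for illustration, and the SRAM origin of 0x20000000 and the .bss end of 0x20000900 are assumptions taken from the addresses printed later in this post, not values read from the real linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the symbols the linker script defines. */
typedef struct {
    uint32_t heap_bottom;
    uint32_t heap_top;
    uint32_t stack_bottom;
    uint32_t stack_top;
} mem_layout_t;

/* sram_origin and sram_length mirror ORIGIN(SRAM) and LENGTH(SRAM);
 * bss_end is wherever the location counter sits after .bss. */
mem_layout_t compute_layout(uint32_t sram_origin, uint32_t sram_length,
                            uint32_t bss_end, uint32_t stack_size)
{
    mem_layout_t m;

    assert(stack_size <= sram_length);

    m.heap_bottom  = bss_end;                                  /* step 2 */
    m.heap_top     = sram_origin + sram_length - stack_size;   /* step 3 */
    m.stack_bottom = m.heap_top;                               /* step 4 */
    m.stack_top    = sram_origin + sram_length;                /* step 5 */

    /* If .bss already reaches into the stack region there is no heap. */
    assert(m.heap_bottom <= m.heap_top);
    return m;
}

/* Number of bytes available to the heap. */
uint32_t heap_size(const mem_layout_t *m)
{
    return m->heap_top - m->heap_bottom;
}
```

With 32KB of SRAM and a 4K stack, the heap top lands at 0x20007000 and the heap is 26368 bytes, which matches the layout printed in the malloc section.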

Operation of the Stack and Heap

As several commenters pointed out on my last post, I made a mistake in the linker script I posted by defining the stack “below” the heap, which is the opposite of the standard layout of stack on top and heap on bottom. Alex made this comment about the operation of malloc. I wasn’t able to reproduce this behaviour, but it is something to look out for.

… As I remember, I did this switching once, and default malloc understood that it ran out of memory (it checks that the stack pointer > heap pointer) and returned 0.

To avoid going down the rabbit hole of explaining how to gauge stack/heap utilization, I’m just going to say that allocating 4K for a stack should be plenty for anything you would do on a platform of this size. One thing to note is that if you build your application in debug mode (typically gcc flags -g -O0), the compiler will keep all your variables on the stack rather than in registers, making it easier to inspect the operation of your program at runtime. It is the lack of optimization (-O0) that causes this; -g on its own only adds debug information. Depending on the size of your application, this could mean that you need to allocate a larger stack. (I can’t find my source for this, but I’m 99% sure it’s true; if anyone has a link to some relevant documentation please leave a comment below.)

More than one way to do it

If you take a look through the original startup_gcc.c file from the Stellarisware examples you’ll see that the folks at TI created a 64-word array and used that as the stack. The processor we’re using has 32KB of RAM available, and allocating only 64 words seemed a little conservative to me. Also, if you’ve ever worked on other embedded systems, or do in the future, you’ll find that defining the stack from the linker script is standard procedure.

Note: In typical applications you wouldn’t define start and end symbols for both the stack and the heap; I’m just trying to illustrate what’s happening here. Normally you would define a symbol to indicate where the statically allocated memory ends, _end, as well as the size of the stack, which would be sufficient to completely describe our memory layout.

System Calls

Namespaces

Throughout the rest of this article you’ll be seeing references to system calls like read and write; you will also see references to the similarly named _read and _write functions. The reason for the distinction between these two naming conventions is to allow Board Support Package (BSP) developers to develop board-specific system calls without stepping on the toes of pre-existing functions. In this case, functions with the leading _ are considered to be in the user namespace, and functions without are considered to be in the C namespace (yes, even though none of these functions are standard C functions, they’re considered to be in the C namespace).

This was a design decision made by the newlib developers, and when push comes to shove a call to write() will end up calling _write(). The BSP developer is more than welcome to cut out the middleman and just implement write, though it will require some fancy footwork when invoking the linker. But, in the spirit of being in Rome, we’re going to do as they do, follow convention, and implement our functions with the leading _.

More information can be found here

Re-entrancy

Another small thing that deserves its own blurb here is that two sets of system calls exist: the ones mentioned previously, like _sbrk, and the re-entrant set, like _sbrk_r. For our purposes we don’t need to worry about the re-entrant versions; they exist to allow multiple threads to use the same set of system calls by allocating a per-thread data structure that contains all the global variables normally used by the standard library (e.g. errno).

The way it’s set up for us is that a call to write will call _write_r, which will call _write, so if you try to compile a program that is missing a _write implementation you will see an error about a missing reference to _write in _write_r. Try and say that 10 times fast.
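If the three layers are hard to picture, here is a host-side sketch of the chain. Everything in it is a stand-in I made up so it compiles on a PC without clashing with the real libc: demo_write plays the role of write, demo_write_r plays _write_r, demo_write_impl plays the board-specific _write, struct demo_reent plays the per-thread reentrancy structure, and demo_errno plays the global errno. It is modelled on my reading of how newlib wires these together, not copied from it.

```c
#include <stddef.h>

#define DEMO_EBADF 9   /* arbitrary error code used only in this sketch */

/* Stand-in for the per-thread structure that holds "global" state. */
struct demo_reent {
    int err;   /* plays the role of a per-thread errno */
};

struct demo_reent demo_global_reent;   /* the single-threaded default */
int demo_errno;                        /* plays the role of the global errno */

/* Layer 3, the part a BSP author writes (the _write equivalent). */
int demo_write_impl(int fd, const void *buf, size_t cnt)
{
    (void)buf;
    if (fd != 1) {              /* only "stdout" exists in this sketch */
        demo_errno = DEMO_EBADF;
        return -1;
    }
    return (int)cnt;
}

/* Layer 2, the re-entrant wrapper (the _write_r equivalent): it copies
 * any error from the global variable into the caller's own structure. */
int demo_write_r(struct demo_reent *r, int fd, const void *buf, size_t cnt)
{
    int ret;

    demo_errno = 0;
    ret = demo_write_impl(fd, buf, cnt);
    if (ret == -1 && demo_errno != 0)
        r->err = demo_errno;
    return ret;
}

/* Layer 1, what application code calls (the write equivalent). */
int demo_write(int fd, const void *buf, size_t cnt)
{
    return demo_write_r(&demo_global_reent, fd, buf, cnt);
}
```

This is also why the linker error names _write_r: the middle layer is the only code that references the missing bottom layer.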

Moving On

At this point if you tried to compile our program you would be greeted by the sight of some link errors. The functions that it will be complaining about are part of the System Call gang: _sbrk, _read, _write, _open, _close, etc.

$ make
CC         src/proj0.c
CC         src/stellariscommon.c
CC         src/startup.c
CC         src/syscalls.c
LD         .obj/proj0.out
undefined reference to `_lseek'
undefined reference to `_isatty'
...
undefined reference to `_write'
undefined reference to `_read'

To get started we’re going to implement skeleton versions of these. If you take a look at the Sourceware site (maintainers of newlib) you can find a sample implementation of all these system calls. They don’t have to do anything at this point; they just need to exist to get the linker off our backs.

caddr_t _sbrk(unsigned int incr){ return 0; }

int _close(int file){ return -1; }

int _fstat(int file){ return -1; }

int _isatty(int file){ return -1; }

int _lseek(int file, int ptr, int dir){ return -1; }

int _open(const char *name, int flags, int mode){ return -1; }

int _read(int file, char *ptr, int len){ return -1; }

int _write(int file, char *ptr, unsigned int len){ return -1; }
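Once the linker is happy, two of these stubs are worth making slightly less stubby. The sketch below follows the minimal examples in the newlib documentation, where every file is reported as a character device and every descriptor is treated as a terminal. The stub_ prefix is my own, chosen so the code builds on a PC; on the Launchpad these would be named _fstat and _isatty. As far as I can tell, stdio consults these two calls when deciding how to buffer a stream, so treat the buffering effect as an assumption to verify on your own build.

```c
#include <sys/stat.h>

/* Fallback for strict ISO C builds where the header hides S_IFCHR;
 * 0020000 is the usual POSIX value for a character device. */
#ifndef S_IFCHR
#define S_IFCHR 0020000
#endif

/* Report every file as a character device (a UART, in our case). */
int stub_fstat(int file, struct stat *st)
{
    (void)file;
    st->st_mode = S_IFCHR;
    return 0;
}

/* Claim that every descriptor is attached to a terminal. */
int stub_isatty(int file)
{
    (void)file;
    return 1;
}
```

Note that the two-argument form takes a struct stat pointer, which the one-argument skeleton above leaves out.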

_sbrk

The first function I’m going to talk about is _sbrk, which is the workhorse behind malloc, calloc, realloc, and even parts of printf (see note). _sbrk is the interface between all of these functions and the heap we previously set up in our linker script. Below is a copy of the default implementation from Sourceware that has been modified to use our symbols.

#include <sys/types.h>                                /* caddr_t */

static char *heap_end = 0;
extern unsigned long _heap_bottom;
extern unsigned long _heap_top;

caddr_t _sbrk(unsigned int incr)
{
    char *prev_heap_end;
    if (heap_end == 0) {
        heap_end = (caddr_t)&_heap_bottom;            //1
    }
    prev_heap_end = heap_end;                         //2
    if (heap_end + incr > (caddr_t)&_heap_top) {      //3
        return (caddr_t)-1;                           //4
    }
    heap_end += incr;
    return (caddr_t) prev_heap_end;                   //5
}

  1. This gets executed the first time through and sets our static heap_end pointer to the bottom of the heap.
  2. We want to keep track of the current heap_end value because it’s the value we return to the caller.
  3. This checks to make sure that we don’t allocate more memory than we have access to. If we allocated space on the heap past _heap_top we could possibly be interfering with the stack.
  4. If allocating this new block of memory would push us into the stack space, return (caddr_t)-1, the value sbrk conventionally returns on failure and the one its callers check for.
  5. We return the previous value of heap_end, since that is the starting address of the newly allocated memory block.
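If you want to poke at this logic without flashing a board, the same bookkeeping can be run on a PC against a plain array. This is only a sketch: demo_heap, demo_sbrk, and the 64-byte size are made up for the test, and failure is reported as (void *)-1, the conventional sbrk failure value.

```c
#include <stddef.h>

#define DEMO_HEAP_SIZE   64u
#define DEMO_SBRK_FAILED ((void *)-1)

/* A small array stands in for the region between the heap symbols. */
char demo_heap[DEMO_HEAP_SIZE];
static char *demo_heap_end = NULL;

void *demo_sbrk(size_t incr)
{
    char *prev;

    if (demo_heap_end == NULL)
        demo_heap_end = demo_heap;        /* first call: start at the bottom */

    prev = demo_heap_end;

    /* Compare against the space that is left instead of forming a
     * pointer that might lie beyond the end of the array. */
    if (incr > (size_t)(demo_heap + DEMO_HEAP_SIZE - demo_heap_end))
        return DEMO_SBRK_FAILED;

    demo_heap_end += incr;
    return prev;                          /* start of the new block */
}
```

Two 16-byte requests return demo_heap and demo_heap + 16, and a request that does not fit leaves the break where it was.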

Note: At first I was a little confused about what printf could possibly use the heap for, but come to find out it’s used to allocate temporary space for creating string representations of floating point numbers (which, as I recently learned, is a somewhat involved process outside the scope of this blog, but if you’re interested, do a search for ecvt or fcvt).

malloc

While in the process of putting these pieces together and debugging some issues I was having with _sbrk, I was messing around with malloc to make sure everything was working correctly. In doing so I made an unexpected observation: right after entering main I called my PrintMemoryLayout function and noticed that there was already some memory allocated on the heap (1792 bytes’ worth).

_heap_bottom   = 0x20000900
_heap_end      = 0x20001000 : usage: 1792
_heap_top      = 0x20007000
Sizeof(heap)   = 26368 bytes

Alright, no big deal. I then made a call to malloc(512), expecting the heap_end pointer to move by 512 bytes (128 words), but it didn’t. So I tried again, this time using malloc(1024), and the heap_end pointer jumped to 5888 bytes of usage, a change of 4KB. I made another call to malloc(1024) and the heap_end pointer didn’t move.

_heap_bottom   = 0x20000900
_heap_end      = 0x20002000 : usage: 5888
_heap_top      = 0x20007000
Sizeof(heap)   = 26368 bytes

I’m not really sure what’s happening here, but I suspect it’s using a memory pool/chunking in the background to reduce fragmentation. I haven’t had time (or inclination) to investigate this yet, so if anyone has any insight on what’s happening, please share.
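For what it’s worth, the numbers are at least consistent with an allocator that only moves the break in whole 4KB pages and always leaves it page-aligned. The sketch below is just that guess written down as arithmetic; demo_page_align and demo_grow_break are names I made up, and the 4096-byte page size is an assumption that happens to fit the output, not something traced through newlib’s malloc.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed granularity, must be a power of two */

/* Round addr up to the next multiple of DEMO_PAGE_SIZE. */
uint32_t demo_page_align(uint32_t addr)
{
    return (addr + DEMO_PAGE_SIZE - 1u) & ~(DEMO_PAGE_SIZE - 1u);
}

/* Where the break would land if the allocator needs `need` more bytes
 * past `brk` and always rounds the result up to a page boundary. */
uint32_t demo_grow_break(uint32_t brk, uint32_t need)
{
    return demo_page_align(brk + need);
}
```

Starting from 0x20000900, any small first request lands the break on 0x20001000 (1792 bytes of usage), and a later request that no longer fits pushes it to 0x20002000 (5888 bytes). Requests that still fit in the current page would not move it at all.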

_write

Cool, so at this point we can allocate memory and compile our program, but if you flash it to the Launchpad you won’t see anything on the UART. What gives? Well, I happen to know that our call to printf will eventually call write to output to stdout, and right now our _write function is just returning -1, which isn’t very useful.

Below is a basic implementation of _write that will output characters to UART0 (assuming that my SetupStdio() or an equivalent routine has been called).

int _write(int file, char *ptr, unsigned int len){
    unsigned int i;
    for(i = 0; i < len; i++){
        MAP_UARTCharPut(UART0_BASE, ptr[i]);
    }
    return i;
}
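A slightly more defensive variant, sketched so it can be tried on a PC: it refuses descriptors other than stdout and stderr, and it sends a carriage return before every newline so terminal programs don’t stair-step the output. The names uart_write_sketch, demo_putc_fn, and demo_capture_putc are hypothetical, and the function pointer stands in for MAP_UARTCharPut so that no TI headers are needed. Whether you want the CR/LF translation depends on your terminal settings, so treat it as optional.

```c
#define DEMO_STDOUT_FD 1
#define DEMO_STDERR_FD 2

/* Stand-in for the routine that pushes one character out of the UART. */
typedef void (*demo_putc_fn)(char c);

int uart_write_sketch(int file, const char *ptr, unsigned int len,
                      demo_putc_fn put)
{
    unsigned int i;

    /* Only stdout and stderr are wired to the UART in this sketch. */
    if (file != DEMO_STDOUT_FD && file != DEMO_STDERR_FD)
        return -1;

    for (i = 0; i < len; i++) {
        if (ptr[i] == '\n')
            put('\r');          /* optional CR before LF */
        put(ptr[i]);
    }

    /* Report the number of caller bytes consumed, not bytes emitted. */
    return (int)len;
}

/* Host-side capture buffer so the output can be inspected in a test. */
char demo_capture[32];
unsigned int demo_capture_len;

void demo_capture_putc(char c)
{
    if (demo_capture_len < sizeof demo_capture)
        demo_capture[demo_capture_len++] = c;
}
```

On the board, the put argument would be a one-line wrapper that calls MAP_UARTCharPut(UART0_BASE, c).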

Now, if you rebuild our test program you should see this coming across the wire.

$ screen /dev/ttyACM0 115200
Hello, World

Wrapping Everything Up

So we have a working copy of printf and its friends fprintf and sprintf. We also have a working copy of malloc, but any self-respecting embedded engineer wouldn’t touch that with a 10-foot pole. At this point you could leave everything as is, or continue implementing the other syscalls to do something meaningful. I’m currently working on using open to set up the different UARTs by passing in strings like “UART0”, “UART1”, etc. to indicate what I want to open. I’ll post this code to the repo later this week.
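As a rough idea of where the open work might start, here is a sketch of just the name parsing, the part that turns a string like “UART1” into an index. It is not the code that will land in the repo: uart_index_from_name is a hypothetical helper, and the count of eight UARTs (UART0 through UART7) is an assumption about the part.

```c
#include <string.h>

#define DEMO_UART_COUNT 8   /* assumption: UART0 through UART7 */

/* Map a name such as "UART3" to its index, or return -1 if the name
 * is not of the form "UART" followed by exactly one valid digit. */
int uart_index_from_name(const char *name)
{
    if (name == NULL || strncmp(name, "UART", 4) != 0)
        return -1;

    /* The digit check fails first on a bare "UART", so name[5] is only
     * read when name[4] is a real character. */
    if (name[4] < '0' || name[4] >= '0' + DEMO_UART_COUNT || name[5] != '\0')
        return -1;

    return name[4] - '0';
}
```

An _open built on top of this would look up the index, configure that UART, and hand back a file descriptor that _write and _read can map to the right peripheral.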

Admittedly, all this is a bit overkill for an embedded system, but when I’m working on a project I like to write and debug as much of it as I can on a PC, and then move it to the embedded device. Having all of this functionality makes it easier to get the business logic out of the way, allowing me to get a running prototype that I can always optimize later on.

The Goods

The source code for this post can be found on my Github account.

wget https://github.com/eehusky/Stellaris-GCC/archive/b2-newlib.tar.gz
tar xf b2-newlib.tar.gz
cd Stellaris-GCC-b2-newlib/prog0

Up Next

My next post(s) will discuss setting up the hard floating point unit, doing a hard/softfp/soft floating point library performance comparison, the CMSIS DSP Library, and changing toolchains. (I’m probably going to split this one across two posts.)

I’m finding this whole blogging experience to be rather fun and would love to hear any suggestions my readers have for future topics.

This Series

Stellaris-GCC: Intro
Stellaris-GCC: Newlib-SysCalls-Stacks-Heaps-Oh My!
Stellaris-GCC: The Hard FPU
Stellaris-GCC: SD-Cards and File Systems
