Using GCC with the TI Stellaris Launchpad – Newlib

I left off in my last post with compiling programs using GCC and flashing them to the Launchpad. Now I’m going to discuss some of what’s going on in the background when we try to use the ever so popular printf and malloc functions from the C Standard Library. The implementation of libc we’re using is provided by newlib (in place of glibc), and most everything will work right out of the box for us, but some functionality requires a little extra work. For example, if you tried to use any function that allocates memory or prints to standard out, you would encounter linker errors when building your project. The functions the linker will be complaining about are known as System Calls, and they are the glue that connects libc to the host platform.

What I’m hoping to do here is explain how to implement a few of the system calls that will interact with the memory and UARTs on our Launchpad. Doing so will give us a mostly functional system, allowing us to write programs just like you would on your desktop. The end goal is to be able to compile and run the de facto C intro program, which will exercise our newfound heap and stdio skills.

#include <stdio.h>
int main()
{
    printf("Hello, World\n");
    return 0;
}

More fun with Linker Scripts

Before we get to writing system calls we need to set up the heap and stack so that functions have a place to allocate memory. Most of the work at this point is done in the linker script that gets passed to ld at link time. We’re going to modify the script provided by Stellarisware to provide symbols that define the beginning and end of the stack and heap. Below is an excerpt from the script I’ve been using.

_stack_size = 4K;                                               /* 1 */
...
SECTIONS
{
    ...
    .bss : AT(ADDR(.data) + SIZEOF(.data))
    {
        _bss = .;
        *(.bss*)
        *(COMMON)
        _ebss = .;
    } > SRAM
    _heap_bottom = .;                                           /* 2 */
    _heap_top = ORIGIN(SRAM) + LENGTH(SRAM) - _stack_size;      /* 3 */

    _stack_bottom = _heap_top;                                  /* 4 */
    _stack_top = ORIGIN(SRAM) + LENGTH(SRAM);                   /* 5 */
}

What we’re doing here is defining 4 symbols, _heap_bottom, _heap_top, _stack_bottom, and _stack_top, that mark the start and end of the heap and the stack. I’ve included a picture depicting the stereotypical memory layout overlaid with the symbols you see in the script.

  1. Define the size of the stack.
  2. Define the symbol that locates the bottom of the heap, which starts right after .bss.
  3. This takes the address of the end of SRAM and subtracts 4K bytes from it, giving us the top of our heap and the bottom of our stack.
  4. Bottom of the stack = top of the heap.
  5. This calculates the end address of SRAM, which is also the top of our stack.
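To make the arithmetic in those five steps concrete, here is a small sketch that redoes it in plain C so it can be checked on a PC. The SRAM origin of 0x20000000 is an assumption on my part (it is consistent with the addresses printed later in this post), and compute_layout, memory_layout and the two size helpers are made-up names for illustration. On the target these values come from the linker, not from C code.

```c
#include <stdint.h>

/* Hypothetical constants for illustration; on the target these values
 * come from the linker script, not from C code. */
#define SRAM_ORIGIN 0x20000000u   /* assumed start of SRAM */
#define SRAM_LENGTH (32u * 1024u) /* 32KB of SRAM */
#define STACK_SIZE  (4u * 1024u)  /* _stack_size = 4K */

typedef struct {
    uint32_t heap_bottom;
    uint32_t heap_top;
    uint32_t stack_bottom;
    uint32_t stack_top;
} memory_layout;

/* Mirrors the numbered steps of the linker script.
 * bss_end plays the role of "." right after the .bss section. */
memory_layout compute_layout(uint32_t bss_end)
{
    memory_layout m;
    m.heap_bottom  = bss_end;                                 /* step 2 */
    m.heap_top     = SRAM_ORIGIN + SRAM_LENGTH - STACK_SIZE;  /* step 3 */
    m.stack_bottom = m.heap_top;                              /* step 4 */
    m.stack_top    = SRAM_ORIGIN + SRAM_LENGTH;               /* step 5 */
    return m;
}

/* Number of bytes available to the heap. */
uint32_t heap_size(const memory_layout *m)
{
    return m->heap_top - m->heap_bottom;
}

/* Number of bytes reserved for the stack. */
uint32_t stack_size(const memory_layout *m)
{
    return m->stack_top - m->stack_bottom;
}
```

Calling compute_layout(0x20000900) gives a heap top of 0x20007000 and a heap of 26368 bytes, with the remaining 4096 bytes left for the stack.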

Operation of the Stack and Heap

As several commenters pointed out on my last post, I made a mistake in the linker script I posted by defining the stack “below” the heap, which is the opposite of the standard layout with the stack on top and the heap on the bottom. Alex made this comment about the operation of malloc. I wasn’t able to reproduce this behaviour, but it is something to look out for.

… As I remember, I did this switching once, and default malloc understood that it ran out of memory (it checks that the stack pointer > heap pointer) and returned 0.

To avoid going down the rabbit hole of explaining how to gauge stack/heap utilization, I’m just going to say that allocating 4K for a stack should be plenty for anything you would do on a platform of this size. One thing to note, though, is that if you build your application in debug mode (gcc flag -g, usually paired with -O0), the compiler will keep most of your variables on the stack instead of in registers, making it easier to inspect the operation of your program at runtime. Strictly speaking it is the lack of optimization, not -g itself, that does this. Depending on the size of your application, this could mean that you need to allocate a larger stack. (I can’t find my source for this, but I’m 99% sure it’s true; if anyone has a link to some relevant documentation please leave a comment below.)

More than one way to do it

If you take a look through the original startup_gcc.c file from the Stellarisware examples you’ll see that the folks at TI created a 64 word array and used that as the stack. The processor we’re using has 32KB of RAM available for use, and allocating only 64 words seemed a little conservative to me. Also, if you’ve ever worked on other embedded systems or do in the future, defining the stack from the linker script is standard procedure.

Note: In typical applications you wouldn’t define start and end symbols for both the stack and the heap; I’m just trying to illustrate what’s happening here. Normally you would define a symbol to indicate where the end of memory is, _end, as well as the size of the stack, which is sufficient to completely describe our memory layout.

System Calls

Namespaces

Throughout the rest of this article you’ll be seeing references to system calls like read and write, and you will also see references to similarly named _read and _write functions. The reason for the distinction between these two naming conventions is to allow Board Support Package (BSP) developers to develop board specific system calls without stepping on the toes of pre-existing functions. In this case, functions with the leading _ are considered to be in the user namespace, and functions without are considered to be in the C namespace (yes, even though none of these functions are standard C functions, they’re considered to be in the C namespace).

This was a design decision made by the newlib developers, and when push comes to shove a call to write() will end up calling _write(). The BSP developer is more than welcome to cut out the middleman and just implement write, though it will require some fancy footwork when invoking the linker. But, in the spirit of being in Rome, we’re going to do as they do, follow convention, and implement our functions with the leading _.

More information can be found here

Re-entrancy

Another small thing that deserves its own blurb here is that two sets of system calls exist: the ones mentioned previously, like _sbrk, and the re-entrant set, like _sbrk_r. For our purposes we don’t need to worry about the re-entrant versions. They exist to allow multiple threads to use the same set of system calls by allocating a per thread data structure that contains all the global variables normally used by stdlib (ex. errno).

The way it’s set up for us is that a call to write will call _write_r, which will call _write, so if you try to compile a program that is missing a _write implementation you will see an error about a missing reference to _write in _write_r. Try and say that 10 times fast.
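The write, _write_r, _write chain is easier to follow with a toy version of it. The sketch below is only loosely modeled on how I understand newlib to do it: struct demo_reent and every demo_ function are hypothetical names that exist nowhere in newlib, and the real per-thread structure carries far more than an error code.

```c
#include <errno.h>

/* Hypothetical stand-in for newlib's per-thread context; the real
 * structure has many more fields than a single error code. */
struct demo_reent {
    int err; /* per-thread copy of errno */
};

/* Board-specific layer (the role _write plays): reports failure
 * through the global errno, like a plain system call would. */
int demo_board_write(int file, const char *ptr, int len)
{
    (void)ptr;
    if (file != 1 && file != 2) { /* only stdout and stderr exist here */
        errno = EBADF;
        return -1;
    }
    return len; /* pretend every byte was sent */
}

/* Re-entrant wrapper (the role _write_r plays): calls the board layer
 * and copies any error into the caller's own context. */
int demo_write_r(struct demo_reent *r, int file, const char *ptr, int len)
{
    int ret;

    errno = 0;
    ret = demo_board_write(file, ptr, len);
    if (ret == -1 && errno != 0) {
        r->err = errno;
    }
    return ret;
}

/* Default context used when the caller does not supply one. */
static struct demo_reent demo_global_reent;

/* Public entry point (the role write plays). */
int demo_write(int file, const char *ptr, int len)
{
    return demo_write_r(&demo_global_reent, file, ptr, len);
}
```

The point of the middle layer is that each thread can hand in its own context, so an error in one thread does not overwrite the error code another thread is about to read.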

Moving On

At this point if you tried to compile our program you would be greeted by the sight of some link errors. The functions that it will be complaining about are part of the System Call gang: _sbrk, _read, _write, _open, _close, etc.

$ make
CC         src/proj0.c
CC         src/stellariscommon.c
CC         src/startup.c
CC         src/syscalls.c
LD         .obj/proj0.out
undefined reference to `_lseek'
undefined reference to `_isatty'
...
undefined reference to `_write'
undefined reference to `_read'

To get started we’re going to implement skeleton versions of these. If you take a look at the Sourceware site (maintainers of newlib) you can find a sample implementation of all these system calls. They don’t have to do anything at this point, they just need to exist to get the linker off our backs.

caddr_t _sbrk(unsigned int incr){ return 0; }

int _close(int file){ return -1; }

int _fstat(int file){ return -1; }

int _isatty(int file){ return -1; }

int _lseek(int file, int ptr, int dir){ return -1; }

int _open(const char *name, int flags, int mode){ return -1; }

int _read(int file, char *ptr, int len){ return -1; }

int _write(int file, char *ptr, unsigned int len){ return -1; }
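Once the linker is happy, a few of these stubs are worth giving slightly more honest return values. Below is a minimal sketch of what that might look like, roughly in the spirit of the minimal implementations described in the newlib documentation. The stub_ prefix is my own invention so the code can be compiled on a desktop without clashing with the host libc; on the target the functions would carry the _close, _isatty, _lseek and _read names, and treating descriptors 0 to 2 as the UART is an assumption about how the board is wired up.

```c
#include <errno.h>

/* Hypothetical names: the stub_ prefix keeps this sketch from clashing
 * with a desktop libc. On the target these would be _close, _isatty,
 * _lseek and _read. */

/* Nothing can be closed because nothing can be opened yet. */
int stub_close(int file)
{
    (void)file;
    errno = EBADF;
    return -1;
}

/* Treat the three standard streams as a terminal (our UART). */
int stub_isatty(int file)
{
    return (file >= 0 && file <= 2) ? 1 : 0;
}

/* A UART is not seekable, so just report position 0. */
int stub_lseek(int file, int ptr, int dir)
{
    (void)file;
    (void)ptr;
    (void)dir;
    return 0;
}

/* No input source yet: report end of file. */
int stub_read(int file, char *ptr, int len)
{
    (void)file;
    (void)ptr;
    (void)len;
    return 0;
}
```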

_sbrk

The first function I’m going to talk about is _sbrk, which is the workhorse behind malloc, calloc, realloc, and even parts of printf (see note). _sbrk is the interface between all of these functions and the heap we previously set up in our linker script. Below is a copy of the default implementation from Sourceware that has been modified to use our symbols.

#include <errno.h>
#include <sys/types.h>

static char *heap_end = 0;
extern unsigned long _heap_bottom;
extern unsigned long _heap_top;

caddr_t _sbrk(unsigned int incr)
{
    char *prev_heap_end;
    if (heap_end == 0) {
        heap_end = (caddr_t)&_heap_bottom;            //1
    }
    prev_heap_end = heap_end;                         //2
    if (heap_end + incr > (caddr_t)&_heap_top) {      //3
        errno = ENOMEM;
        return (caddr_t)-1;                           //4
    }
    heap_end += incr;
    return (caddr_t) prev_heap_end;                   //5
}

  1. This gets executed the first time through and sets our static heap_end pointer to the bottom of the heap.
  2. We want to keep track of the current heap_end value because it’s the value we return to the caller.
  3. This checks to make sure that we don’t allocate more memory than we have access to. If we allocated space on the heap past _heap_top we could possibly be interfering with the stack.
  4. If allocating this new block of memory would push us into the stack space, set errno to ENOMEM and return (caddr_t)-1, which is the failure value callers of sbrk (malloc included) check for. Returning 0 here would look like a valid address to them.
  5. We return the previous value of heap_end, since that is the starting address of the newly allocated memory block.
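Since _sbrk is awkward to poke at on the target, here is a sketch of the same bookkeeping running against a plain array so it can be tried on a PC. Everything prefixed sim_ or SIM_ is a made-up name, the 64 byte arena is an arbitrary size chosen to make the limit easy to hit, and the failure value follows the usual sbrk convention of (void *)-1.

```c
#include <stddef.h>

/* Hypothetical 64-byte arena standing in for the region between
 * _heap_bottom and _heap_top. */
#define SIM_HEAP_SIZE 64u
static char sim_heap[SIM_HEAP_SIZE];
static char *sim_heap_end = NULL;

/* Conventional sbrk failure value. */
#define SIM_SBRK_FAILED ((void *)-1)

/* Same bookkeeping as _sbrk, but against the array above. */
void *sim_sbrk(unsigned int incr)
{
    char *prev_heap_end;

    if (sim_heap_end == NULL) {
        sim_heap_end = sim_heap;          /* first call: start at the bottom */
    }
    prev_heap_end = sim_heap_end;

    /* Compare sizes rather than pointers so we never form a pointer
     * beyond the end of the arena. */
    if (incr > (size_t)(sim_heap + SIM_HEAP_SIZE - sim_heap_end)) {
        return SIM_SBRK_FAILED;           /* would run past the top */
    }
    sim_heap_end += incr;
    return prev_heap_end;                 /* start of the new block */
}
```

Each successful call returns the address where the previous call left off, and a request that does not fit leaves the break untouched so a smaller request can still succeed afterwards.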

Note: At first I was a little confused about what printf could possibly use the heap for, but come to find out it’s used to allocate temporary space for creating string representations of floating point numbers (which, as I recently learned, is a somewhat involved process outside the scope of this blog, but if you’re interested, do a search for ecvt or fcvt).

malloc

While in the process of putting these pieces together and debugging some issues I was having with _sbrk, I was messing around with malloc to make sure everything was working correctly. In doing so I made an unexpected observation: right after entering main I called my PrintMemoryLayout function and noticed that there was already some memory allocated on the heap (1792 bytes worth).

_heap_bottom   = 0x20000900
_heap_end      = 0x20001000 : usage: 1792
_heap_top      = 0x20007000
Sizeof(heap)   = 26368 bytes

Alright, no big deal. I then made a call to malloc(512), expecting the heap_end pointer to move by 512 bytes (128 words), but it didn’t. So I tried again, this time using malloc(1024), and the heap_end pointer jumped to 5888 bytes of usage, a change of 4KB. I made another call to malloc(1024) and the heap_end pointer didn’t move.

_heap_bottom   = 0x20000900
_heap_end      = 0x20002000 : usage: 5888
_heap_top      = 0x20007000
Sizeof(heap)   = 26368 bytes

I’m not really sure what’s happening here, but I suspect it’s using a memory pool/chunking in the background to reduce fragmentation. I haven’t had time (or inclination) to investigate this yet, so if anyone has any insight on what’s happening, please share.
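One guess that at least fits the numbers is that the allocator asks _sbrk for memory in 4KB aligned chunks rather than byte by byte. The sketch below just checks that arithmetic. ASSUMED_CHUNK is exactly that, an assumption that matches the 4KB step seen above and not a value I have confirmed in newlib’s source, and both function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed allocation granularity; 4096 is a guess that matches the
 * 4KB step seen above, not a value taken from newlib's source. */
#define ASSUMED_CHUNK 4096u

/* Round addr up to the next multiple of ASSUMED_CHUNK (a power of two). */
uint32_t round_up_to_chunk(uint32_t addr)
{
    return (addr + (ASSUMED_CHUNK - 1u)) & ~(ASSUMED_CHUNK - 1u);
}

/* Bytes handed out by _sbrk so far, measured from the heap bottom. */
uint32_t heap_usage(uint32_t heap_bottom, uint32_t heap_end)
{
    return heap_end - heap_bottom;
}
```

Rounding _heap_bottom (0x20000900) up to the next 4KB boundary lands on 0x20001000, which is the 1792 bytes of usage seen right after entering main, and one more chunk on top of that gives the 5888 bytes seen after the second malloc. It doesn’t explain which call triggered each step, only why the steps have the sizes they do.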

_write

Cool, so at this point we can allocate memory and compile our program, but if you flash it to the Launchpad you won’t see anything on the UART. What gives? Well, I happen to know that our call to printf will eventually call write to output to stdout, and right now our _write function is just returning -1, which isn’t very useful.

Below is a basic implementation of _write that will output characters to UART0 (assuming that my SetupStdio() or an equivalent routine has been called).

int _write(int file, char *ptr, unsigned int len){
    unsigned int i;
    for(i = 0; i < len; i++){
        MAP_UARTCharPut(UART0_BASE, ptr[i]);
    }
    return i;
}

Now, if you rebuild our test program you should see this coming across the wire.

$ screen /dev/ttyACM0 115200
Hello, World
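The _write above sends every byte to UART0 no matter which file it was asked to write to. A slightly more defensive variant might look like the sketch below. It is written so it can be tried on a PC: sketch_uart_putc records into a buffer where the real thing would call MAP_UARTCharPut(UART0_BASE, c), and the sketch_ names, the 64 byte log, and the choice to expand "\n" into "\r\n" are all my own assumptions, not something newlib or driverlib requires.

```c
#include <errno.h>

/* Capture buffer used in place of a real UART when running on a PC. */
static char uart_log[64];
static unsigned int uart_log_len = 0;

/* Stand-in for the UART. On the Launchpad the body would be a call to
 * MAP_UARTCharPut(UART0_BASE, c) instead of this capture buffer. */
static void sketch_uart_putc(char c)
{
    if (uart_log_len < sizeof uart_log) {
        uart_log[uart_log_len++] = c;
    }
}

/* Like _write, but only accepts stdout (1) and stderr (2) and expands
 * "\n" to "\r\n" for serial terminals. Returns the number of bytes
 * consumed from ptr, or -1 with errno set to EBADF. */
int sketch_write(int file, const char *ptr, unsigned int len)
{
    unsigned int i;

    if (file != 1 && file != 2) {
        errno = EBADF;
        return -1;
    }
    for (i = 0; i < len; i++) {
        if (ptr[i] == '\n') {
            sketch_uart_putc('\r');
        }
        sketch_uart_putc(ptr[i]);
    }
    return (int)i;
}
```

Note that the return value counts the bytes taken from the caller’s buffer, not the bytes that went out on the wire, so the extra carriage returns don’t confuse stdio.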

Wrapping Everything Up

So we have a working copy of printf and its friends fprintf and sprintf. We also have a working copy of malloc, but any self-respecting embedded engineer wouldn’t touch that with a 10 foot pole. At this point you could leave everything as is or continue implementing the other syscalls to do something meaningful. I’m currently working on using open to set up the different UARTs by passing in strings like “UART0”, “UART1”, etc. to indicate what I want to open. I’ll post this code to the repo later this week.

Admittedly, all this is a bit overkill for an embedded system, but when I’m working on a project I like to write and debug as much of it as I can on a PC, and then move it to the embedded device. Having all of this functionality makes it easier to get the business logic out of the way, allowing me to get a running prototype that I can always optimize later on.
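As a taste of the open idea mentioned above, the first step would be turning a name like “UART0” into something the driver code can use. This is only a sketch of that lookup and not the code that will end up in the repo: uart_names, UART_COUNT and uart_index_from_name are hypothetical, and the table of two UARTs is an arbitrary choice for the example.

```c
#include <string.h>

/* Hypothetical device table; the names follow the "UART0", "UART1"
 * strings mentioned above, the count of two is an arbitrary choice. */
static const char *const uart_names[] = { "UART0", "UART1" };
#define UART_COUNT (sizeof uart_names / sizeof uart_names[0])

/* Returns the index of the named UART, or -1 if the name is unknown.
 * A real _open would go on to configure the peripheral and hand back
 * a file descriptor. */
int uart_index_from_name(const char *name)
{
    size_t i;

    if (name == NULL) {
        return -1;
    }
    for (i = 0; i < UART_COUNT; i++) {
        if (strcmp(name, uart_names[i]) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```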

The Goods

The source code for this post can be found on my Github account.

wget https://github.com/eehusky/Stellaris-GCC/archive/b2-newlib.tar.gz
tar xf b2-newlib.tar.gz
cd Stellaris-GCC-b2-newlib/prog0

Up Next

My next post(s) will discuss setting up the hardware floating point unit, doing a hard/softfp/soft floating point library performance comparison, the CMSIS DSP Library, and changing toolchains. (I’m probably going to split this one across two posts.)

I’m finding this whole blogging experience to be rather fun and would love to hear any suggestions my readers have for future topics.

This Series

Stellaris-GCC: Intro
Stellaris-GCC: Newlib-SysCalls-Stacks-Heaps-Oh My!
Stellaris-GCC: The Hard FPU
Stellaris-GCC: SD-Cards and File Systems

Using GCC with TI Stellaris Launchpad – A more in depth look

This is the first in a series of posts that will be going over various aspects of using the new Stellaris Launchpad with GCC. This post is going to be a rundown of how the various compiler flags, linker scripts, libraries and drivers work together to give us a working program for our dev board.

A couple of weeks ago the folks over at Recursive Labs posted about using the ARM gcc toolchain to build binaries for the TI Stellaris Launchpad. The directions that RL laid out were pretty straightforward: they made use of the Summon ARM Toolchain (SAT) to get a working copy of GCC and newlib. After that it was a matter of wading through the maze of TI supplied Makefiles to get the proper flags to build our projects.

Multilibs and the Awesomeness of SAT

I’ve never used SAT before to build an ARM toolchain and I have to say that it’s really fantastic; the folks who have been working on it are, in no uncertain terms, heroes in my book. When you build the toolchain it will build it with Multilib support by default, which means it will compile a version of newlib (libc, libm) and libgcc for various ARM processors (Cortex-M0, M3, and our M4).

Not only that, it will build libraries for our Cortex-M4 in both softfp and hard floating point unit flavors so we can bounce back and forth between the two just by changing a gcc flag.

GCC

There are a bunch of flags being passed to the compiler that at first glance didn’t make a bunch of sense. After a little digging around I’ve found out what most of the flags stand for.

CFLAGS += -mthumb                  #Using the Thumb Instruction Set
CFLAGS += -mcpu=cortex-m4          #The CPU Variant
CFLAGS += -mfloat-abi=softfp       #Which floating point ABI to use
CFLAGS += -mfpu=fpv4-sp-d16        #The type of FPU we are using
CFLAGS += -Os                      #Compile with Size Optimizations
CFLAGS += -ffunction-sections      #Place each function in its own section
CFLAGS += -fdata-sections          #Place each data item in its own section
CFLAGS += -MD                      #Create dependency files (*.d)
CFLAGS += -std=c99                 #Comply with C99
CFLAGS += -Wall                    #Enable All Warnings
CFLAGS += -pedantic                #More ANSI Checks
CFLAGS += -Dgcc                    #Flag used in driverlib for compiler specific flags
CFLAGS += -DPART_LM4F120H5QR       #Flag used in driverlib for specifying the silicon version.
CFLAGS += -DTARGET_IS_BLIZZARD_RA1 #Used in driverlib to determine what is loaded in rom.

Loader

Another thing that could use some de-mystification are the loader scripts that tell the linker where to put all the data, text, and bss sections of the executable you build when it’s flashed to the device. There is one included with each of the demo projects and if you open them all up, you’ll notice that they are all the same. So you can pick one and re-use it with each of your projects.

After all of the sources are compiled and you have a stack of object files the next step is to link them together using arm-none-eabi-ld. This is also the step where the stdlib functions get brought into the mix as well. Unlike compiling a normal program there are a couple extra things that need to be done here.

LDFLAGS    += -T blinky.ld              #Path to Linker Script
LDFLAGS    += --entry ResetISR          #Name of the application entry point
LDFLAGS    += --gc-sections             #Tell the linker to ignore functions that aren't used.

Using GCC as the Linker

Normally when you compile a program, gcc will perform a lot of magic for you behind the scenes by compiling and linking your program all in one step. Here we are doing it in three steps: compiling all of the source files, linking all the objects together into an ELF executable, and then copying the relevant sections out of the ELF executable and dumping them in a raw binary format that can be flashed to the device.

If you were interested in skipping the middle step, that would be possible, but the flags can get kind of messy. Since we need to pass a few arguments to the linker itself (listed above), we do that by prefacing each argument with -Wl,<ARG>. It would look something like this.

arm-none-eabi-gcc $(CFLAGS) -c blinky.c
arm-none-eabi-gcc $(CFLAGS) -c startup_gcc.c
arm-none-eabi-gcc $(CFLAGS) -Wl,--script=blinky.ld -Wl,--entry=ResetISR \
    -Wl,--gc-sections -o blinky.out blinky.o startup_gcc.o libm.a libc.a libgcc.a

Is there any advantage to doing this? I don’t think so, but I thought it was worth a blurb to show the difference.

Using StellarisWare’s DriverLib

When you are writing your program and want to use the functionality within driverlib to set up the peripherals, you will need to compile driverlib with gcc. After you have built your toolchain and added the bin directory to your PATH variable, cd into the Stellarisware root folder and type make. This will go through and build all the projects and driverlib, allowing you to link the function calls into your programs.

With that out of the way, now we’re going to discover why you don’t actually need to use any of it!

ROM, MAP, and Driverlib

One thing that confused me for a bit in using Stellarisware was the difference between calls to driverlib functions being prefixed with ROM_ or MAP_. Confusion that would have been easily remedied if I had bothered to read the manual, but that’s no fun. So to save anyone else a trip to the documentation center, here is a brief rundown of what’s happening.

The Stellaris Launchpad comes preloaded with a copy of driverlib in ROM so that when you are creating your programs you don’t need to link in a copy of driverlib yourself. What the MAP_ and ROM_ prefixes are meant to accomplish is to increase the portability of applications you write for the LaunchPad. Different versions of the hardware will have different subsets of driverlib loaded into ROM.

//Has the defines for the constants passed to the
//functions as well as the stubs for the plain function calls
#include "driverlib/sysctl.h"
//Has the definitions for ROM_XXX function calls
#include "driverlib/rom.h"
//Has the definitions for MAP_XXX function calls
//and relies on the information in rom.h to work.
#include "driverlib/rom_map.h"

/* This function call to set the clock rate will require that you link
 * Driverlib as part of the compilation process.
 */
SysCtlClockSet(SYSCTL_SYSDIV_4|SYSCTL_USE_PLL|SYSCTL_XTAL_16MHZ|SYSCTL_OSC_MAIN);

/* This function call to set the clock rate will not require that you
 * link in driverlib during compilation.  But will fail to compile if
 * the particular device you are using does not have this function loaded into ROM.
 */
ROM_SysCtlClockSet(SYSCTL_SYSDIV_4|SYSCTL_USE_PLL|SYSCTL_XTAL_16MHZ|SYSCTL_OSC_MAIN);

/* This function call to set the clock rate will not require that you
 * link in driverlib during compilation, unless the device you are
 * using doesn't have the function in ROM, then it will link against driverlib.
 */
MAP_SysCtlClockSet(SYSCTL_SYSDIV_4|SYSCTL_USE_PLL|SYSCTL_XTAL_16MHZ|SYSCTL_OSC_MAIN);

One thing that may be important to note about using this feature is the -DTARGET_IS_BLIZZARD_RA1 flag that is passed to gcc during compilation. It tells the preprocessor what device the code is being compiled for and maps in the correct function calls for the MAP_ and ROM_ calls. Not passing this in will result in compilation errors.
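The selection logic behind the three spellings can be boiled down to a few preprocessor lines. The miniature below is my own reconstruction of the idea and can be compiled on a PC. DemoAdd, rom_copy_of_DemoAdd and DEMO_TARGET_HAS_ROM are invented for the example and none of them exist in driverlib, where the ROM_ names are resolved through tables in ROM instead of an ordinary function.

```c
/* Hypothetical miniature of the ROM_/MAP_ scheme; none of these names
 * exist in driverlib. */

/* Flash copy of a driver routine, the kind you get by linking driverlib. */
int DemoAdd(int a, int b)
{
    return a + b;
}

/* Pretend ROM copy of the same routine. */
static int rom_copy_of_DemoAdd(int a, int b)
{
    return a + b;
}

/* Define this to model a part that ships the routine in ROM, in the
 * same spirit as -DTARGET_IS_BLIZZARD_RA1. */
#define DEMO_TARGET_HAS_ROM

/* rom.h role: a ROM_ name only exists on targets that have it. */
#ifdef DEMO_TARGET_HAS_ROM
#define ROM_DemoAdd rom_copy_of_DemoAdd
#endif

/* rom_map.h role: prefer the ROM copy, fall back to the flash copy. */
#ifdef ROM_DemoAdd
#define MAP_DemoAdd ROM_DemoAdd
#else
#define MAP_DemoAdd DemoAdd
#endif
```

With DEMO_TARGET_HAS_ROM defined, MAP_DemoAdd resolves to the ROM copy. Remove the define and ROM_DemoAdd stops existing, while MAP_DemoAdd quietly falls back to the flash copy, which is the portability argument for the MAP_ spelling.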

Another thing that has burned me a few times is nesting these function calls like this:

//Won’t Work
MAP_UARTConfigSetExpClk(UART0_BASE, MAP_SysCtlClockGet(), 115200, UART_CONFIG_PAR_NONE);
//Should Work
MAP_UARTConfigSetExpClk(UART0_BASE, (MAP_SysCtlClockGet()) , 115200, UART_CONFIG_PAR_NONE);

You need to make sure that you wrap the inner calls to MAP_ and ROM_ functions in (), otherwise the preprocessor will get them mucked up. That will result in a compilation error or, and this is what happened to me, in a runtime error where the UART will stop working for apparently no reason.

So if you plan on moving your code to a different development platform down the road, it would be a smart idea to use the MAP_ version of the function calls to save yourself some hassle. If not, at least use the ROM_ calls to save yourself some code size. If you end up needing to modify the code in driverlib you can just make a call to the regular function and it will link in your modified version.

If you want further reading check out the Driverlib User’s Guide (SW-DRL-UG-9453.pdf) in the doc/ folder of your Stellarisware directory.

The Goods

All of the code presented here can be found on my github account located here. There is some extra functionality included in the code that I haven’t covered yet, so if you want a sneak peek at what’s coming, take a look.

wget https://github.com/eehusky/Stellaris-GCC/archive/b1-intro.tar.gz
tar xf b1-intro.tar.gz
cd Stellaris-GCC-b1-intro/prog0

Disclaimer:

This is my first attempt at “blogging”, so if anyone has any comments/criticisms please drop a comment below. Any feedback would be greatly appreciated.

References

Recursive Labs: Programming the Stellaris Launchpad with GNU/Linux
GitHub: EEhusky – Stellaris-GCC
GitHub: summon-arm-toolchain
GitHub: lm4tools
TI: Stellaris Ware Download
gnu.org: ARM-Options