Blink LED in ARM assembly on STM32F4
If you are like me, you have always wanted to understand the ARM Cortex-M architecture in more detail. You want to write a small piece of code in assembly language to get closer to the hardware, but you struggle to find a good how-to guide for assembly programming. Most of the examples you find are written in higher-level languages like C/C++. Don’t give up yet. Read on: this post will help you take the first step in learning ARM assembly programming.
After exploring how startup code works for an STM32F4 microcontroller, it is now time to do something useful. For an embedded system, the time tested tradition for saying Hello World! is to blink an LED. Keeping with the tradition, let’s try to blink an LED in ARM assembly. Using the STM32F4 Discovery board, we will see how to blink all the on-board LEDs.
Things you need to blink an LED in ARM Assembly
You will need a few things in order to enjoy coding in ARM assembly.
Software - KEIL MDK - The code presented here uses ARM assembler instead of GNU assembler. The ARM assembler is part of MDK (Microcontroller Development Kit) toolchain. The MDK-ARM bundles the toolchain, uVision IDE and required software packs. You can download the Lite edition from Keil website for free.
Hardware - STM32F4 discovery board - The STM32F4 Discovery board from ST Micro is based on the STM32F407VGT6, an ARM Cortex-M4 microcontroller. The board provides 4 user-controllable LEDs in addition to other peripherals. The on-board ST-Link debugger lets you flash your code, debug it and single-step through it from within the Keil IDE. We will blink all 4 of these LEDs with our ARM assembly code.
Note: If you don’t have a discovery board, you can use the simulator in the Keil IDE. This post, however, sticks to using the discovery board.
- Documents - I refer to these documents while explaining code in this post. These documents are also available from within Keil uVision IDE from the Books tab in project window.
Before we understand the assembly code to blink an LED, let’s directly go to the execution phase. To make it easier for you, the source code and instructions to execute code are available on GitHub.
Essentially the steps to execute the code are:
- Clone GitHub repo
- Install Keil MDK Lite and open the project in Keil uVision
- Connect STM32F4 discovery board to your PC
- Build and download code to the board and see the LEDs blinking
How It Works
Now that you have seen the execution, let’s take it apart piece by piece and understand how the ARM assembly code to blink an LED works.
Blinking an LED involves controlling some hardware. Therefore, we need to get acquainted with it before starting to code. The STM32F4 discovery board comes with 4 onboard LEDs connected to PD12 to PD15. These are the GPIO port D pins 12, 13, 14 and 15 respectively. The port pins connect to anodes of LEDs through a resistor with no external driving circuitry. Hence, to blink the LEDs, we need to switch these GPIO pins high and low with a time delay. Switching a pin to logic high will turn on the LED. Conversely switching it to logic low will turn off the LED. We configure the pins in push-pull mode. Consequently, we won’t need any pull-up or pull-down resistors. For more details refer to the STM32F4 Discovery board user manual.
To blink an LED in ARM assembly, we use the startup code from the STM32F4 software pack and put the application logic in the file LED.s. The startup code initializes the stack and heap areas and provides a framework for exception handlers, including the reset handler. We are not going to use the heap or the stack in our code. However, it is good practice to keep the startup code separate. You can get more details on how startup code works in an earlier post.
The startup code calls two functions: SystemInit and __main (referred to as main in this post). We therefore provide definitions for these two functions in our application code in ARM assembly in the file LED.s. Let’s now explore the application logic in these functions to blink the LEDs.
Constants and Register Definitions
Towards the top of this file, we have a constant value.
EQU is an ARM assembler directive which gives a symbolic name to a numeric constant. It does not reserve any memory by itself. When the code later loads the constant with the LDR Rd, =value pseudo-instruction, the assembler either encodes the value directly in a MOV instruction or places it in a literal pool in code space (as a DCD directive would) and generates an LDR (load data from memory to register) instruction that reads it from there. I shall explain the calculation of this value shortly.
Next there are some register definitions, which give names to the addresses of these registers in the STM32F4. These are EQU constants too, so the assembler handles them in the same way.
Let’s get familiar with these registers as it will prove useful to understand the code. We will get into more details later in the post.
RCC_AHB1ENR(RCC AHB1 peripheral clock enable register) enables or disables clock supply to various peripherals. AHB1 indicates these peripherals are on AHB1 bus of processor. To use GPIO port D, we need to enable clock for GPIO-D via this register.
GPIOD_MODER(GPIO-D Mode register) controls the mode for port D pins. We can configure each pin independently as either of these modes:
- Digital Input
- Digital Output
- Analog Function
- Alternate function peripheral within the microcontroller
GPIOD_OTYPER(GPIO-D Output Type register) controls whether the pin works in push-pull mode or as an open-drain pin. In open-drain mode the pin can only pull the line low, so we need a pull-up resistor (internal or external) to get a logic high.
GPIOD_OSPEEDR(GPIO-D Output Speed register) controls the maximum switching speed on the port pin. The maximum speed also depends on supply voltage, current drawn from pin and load capacitance. We can configure the speed as low / medium / high / very high.
GPIOD_PUPDR(GPIO-D Pull-up / Pull-down register) enables internal pull-up or pull-down resistor for each port D pin.
GPIOD_ODR(GPIO-D Output Data register) is the register to write data to output on the port pins.
How to Find Register Address
The registers in STM32F4 are memory mapped so every register has an address associated with it. However, neither the reference manual nor the datasheet mentions these addresses explicitly. So let me elaborate about how to find (or rather form) the address of a register.
First of all, navigate to description of the register in STM32F407 reference manual and note the register offset. Next, navigate to memory map (section 2.3) in the manual. You will find boundary address range of the peripheral associated with the register. Finally, form the address of the register by adding the offset to base address of boundary range.
As an example let’s find out the address of register RCC_AHB1ENR. This register is part of the RCC registers (Reset and Clock control). The offset of this register is 0x30 as described in section 7.3.10 of reference manual. The memory map shows the RCC boundary address range as 0x4002 3800 to 0x4002 3BFF. Therefore the address of RCC_AHB1ENR register is 0x4002 3830.
Similarly, the boundary address range for GPIO-D registers is 0x4002 0C00 to 0x4002 0FFF. So we can calculate addresses for all GPIO-D registers by adding their corresponding offsets to the base address 0x4002 0C00. For example, the GPIOD_MODER register address is 0x4002 0C00 since it has an offset of 0x00. You can verify the addresses for the remaining GPIO-D registers. Section 8.4.1 of the reference manual contains the description of the GPIO-D registers.
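The same base-plus-offset arithmetic can be checked on a PC with a short Python sketch. It is only an illustration and not part of the project. The base addresses and the 0x30 and 0x00 offsets come from the text above, while the remaining GPIO offsets are assumed from the standard STM32F4 GPIO register layout and should be verified against section 8.4 of the reference manual.

```python
# Base addresses taken from the memory map quoted above.
RCC_BASE = 0x40023800
GPIOD_BASE = 0x40020C00

# Offset 0x30 is quoted in the text for RCC_AHB1ENR.
RCC_AHB1ENR_OFFSET = 0x30

# MODER (0x00) is quoted in the text; the other offsets are ASSUMED from
# the usual STM32F4 GPIO register layout. Verify them in the manual.
GPIO_OFFSETS = {
    "MODER": 0x00,
    "OTYPER": 0x04,
    "OSPEEDR": 0x08,
    "PUPDR": 0x0C,
    "ODR": 0x14,
}


def register_address(base, offset):
    """Form a register address by adding its offset to the peripheral base."""
    return base + offset


RCC_AHB1ENR = register_address(RCC_BASE, RCC_AHB1ENR_OFFSET)
GPIOD = {
    "GPIOD_" + name: register_address(GPIOD_BASE, offset)
    for name, offset in GPIO_OFFSETS.items()
}

print(f"{'RCC_AHB1ENR':14s} 0x{RCC_AHB1ENR:08X}")
for name, address in GPIOD.items():
    print(f"{name:14s} 0x{address:08X}")
```

The printed values can be compared one by one against the register definitions at the top of LED.s.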
Let’s now move to the actual code. The processor startup code calls SystemInit function when it starts to execute but before calling the main function. This function contains all the required initialization for our application. In our example this code enables clock for GPIO-D peripheral and further configures the GPIO-D registers. We configure pins PD12 to PD15 as output in push-pull mode with pull-up/pull-down disabled.
To configure a register we follow a read-modify-write cycle. Let’s illustrate this with the code to update the register RCC_AHB1ENR.
The instruction LDR R1, =RCC_AHB1ENR loads the address of the register RCC_AHB1ENR into R1. LDR is an instruction to load data from memory. In this case, the data stored in memory is the address of the register, which we defined with the EQU statement earlier. Note the use of the = prefix on the value to be loaded into R1. This prefix turns LDR into a pseudo-instruction: the assembler itself stores the value in a literal pool in code space (as a DCD directive would) and generates a PC-relative LDR that reads it from there. The # prefix, in contrast, marks an immediate constant that is encoded directly in the instruction and is never stored separately in memory.
The next instruction, LDR R0, [R1], reads data from the address contained in R1, which is the address of the RCC_AHB1ENR register. So this instruction reads the current value of the register and puts it in R0.
RCC AHB1 Peripheral Clock Enable register
Let’s go through the details of RCC_AHB1ENR register to understand which bits we need to modify.
As you can see, setting bit 3 of this register enables the clock for GPIO-D. Hence we bitwise OR the contents of R0 with 0x08 to set this bit, using the instruction ORR.W R0, #0x08. The .W suffix is an instruction width specifier: it asks the assembler for the 32-bit Thumb-2 encoding of the instruction and has no effect on which bits are ORed. The .W suffix is actually redundant here, since ORR with an immediate operand is only available as a 32-bit encoding. However, I have followed the format of using the .W suffix in all instructions for consistency. The assembler generates the same machine code with or without the suffix in such cases.
Finally, the instruction STR R0, [R1] stores the contents of R0 to the address contained in R1, which is the address of the RCC_AHB1ENR register in our case.
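To make the read-modify-write cycle concrete, here is a minimal Python model of the four instructions, meant to be run on a PC. The dictionary standing in for memory, the function name and the starting register value are assumptions made for the illustration; only the address and the bit position come from the text above.

```python
RCC_AHB1ENR = 0x40023830   # register address formed earlier in the post
GPIODEN_BIT = 1 << 3       # bit 3 enables the GPIO-D clock


def enable_gpiod_clock(memory):
    """Mimic the LDR / LDR / ORR / STR sequence on a dict used as memory."""
    r1 = RCC_AHB1ENR       # LDR   R1, =RCC_AHB1ENR  (address into R1)
    r0 = memory[r1]        # LDR   R0, [R1]          (read current value)
    r0 |= GPIODEN_BIT      # ORR.W R0, #0x08         (modify: set bit 3)
    memory[r1] = r0        # STR   R0, [R1]          (write the value back)
    return r0


# Arbitrary starting value, chosen only to show that other bits survive.
memory = {RCC_AHB1ENR: 0x00100001}
result = enable_gpiod_clock(memory)
print(f"RCC_AHB1ENR = 0x{result:08X}")
```

Bits that were already set before the call are preserved, which is the whole point of reading the register first instead of writing a fixed value.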
You can see that the code uses same pattern of instructions for all the registers. All we need to do is find out the required value for respective registers. Here are the layouts of registers we need to configure with the relevant bits.
GPIO Mode Register
Each port pin associates with two bits in the mode register. With these two mode bits, we can configure the corresponding port pin in one of the four possible modes as listed below.
- Input mode (00)
- Output mode (01)
- Analog mode (10)
- Alternate function (11).
As shown in the layout, setting the bits to 01b configures the corresponding pin as an output. Pins 12 to 15 use bits 24 to 31 of the register. We need a bitwise AND to clear the odd bits between 24 and 31 (AND with 0x55FF FFFF). In addition, we need a bitwise OR to set the even bits between 24 and 31 (OR with 0x5500 0000). Here is the code to update the mode register.
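The two mask values can be derived rather than worked out by hand. The following Python sketch, intended as a PC-side check, builds them from the pin numbers and the 2-bit mode value. The function and variable names are hypothetical and do not appear in LED.s.

```python
LED_PINS = (12, 13, 14, 15)   # PD12 to PD15 drive the on-board LEDs
OUTPUT_MODE = 0b01            # 01b selects general purpose output mode


def moder_masks(pins, mode):
    """Return (and_mask, or_mask) that write `mode` into each pin's 2-bit field."""
    and_mask = 0xFFFFFFFF
    or_mask = 0
    for pin in pins:
        shift = 2 * pin                         # each pin owns two bits
        bits_to_clear = (~mode & 0b11) << shift
        and_mask &= ~bits_to_clear & 0xFFFFFFFF
        or_mask |= mode << shift
    return and_mask, or_mask


def apply_masks(value, and_mask, or_mask):
    """Clear the unwanted bits first, then set the wanted ones."""
    return (value & and_mask) | or_mask


AND_MASK, OR_MASK = moder_masks(LED_PINS, OUTPUT_MODE)
print(f"AND mask 0x{AND_MASK:08X}, OR mask 0x{OR_MASK:08X}")
print(f"0xFFFFFFFF becomes 0x{apply_masks(0xFFFFFFFF, AND_MASK, OR_MASK):08X}")
```

Whatever the register held before, bits 24 to 31 end up as 01 01 01 01 while bits 0 to 23 are left untouched.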
GPIO Output Type Register
The output type register lets us configure the output of each port pin individually as either push-pull or open-drain.
Setting a bit configures the corresponding pin in open-drain mode, while clearing it configures the pin in push-pull mode. Since we are going to use push-pull mode, we bitwise AND the register value with 0xFFFF 0FFF. This clears bits 12 to 15. Here is the code to update this register.
GPIO Output Speed Register
The output speed register allows us to configure the maximum output switching speed for the port pins. The actual maximum speed depends on other factors such as supply voltage and load capacitance. As per the STM32F407 datasheet, the maximum speed in the slow setting is between 2 MHz and 8 MHz. We configure the speed to slow because our blink rate (the pins toggle every 500 msec, so about 1 Hz) is far lower than this.
Setting the bits to ‘00b’ configures the corresponding port pins for slow speed. Thus here is the code that updates this register.
GPIO Pull-up / Pull-down Register
The STM32F407 provides configurable internal pull-up / pull-down resistors for each pin. This register lets us enable or disable these internal resistors. We don’t need any pull-up or pull-down to drive the LEDs.
Clearing the bits disables pull-up/pull-down for the corresponding pin. So we carry out a bitwise AND with 0x00FF FFFF to configure the pins. Here is the required code.
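The output type, output speed and pull-up / pull-down registers all need their PD12 to PD15 fields cleared, and only the field width differs: one bit per pin in the output type register, two bits per pin in the other two. As a quick PC-side check, this Python sketch derives the three AND masks. The helper name is hypothetical, and the output speed mask is inferred from the 2-bit field layout rather than quoted from the post.

```python
LED_PINS = (12, 13, 14, 15)


def clear_mask(pins, bits_per_pin):
    """Return a 32-bit AND mask that zeroes the field of every listed pin."""
    field = (1 << bits_per_pin) - 1          # 0b1 or 0b11
    mask = 0xFFFFFFFF
    for pin in pins:
        mask &= ~(field << (pin * bits_per_pin)) & 0xFFFFFFFF
    return mask


OTYPER_MASK = clear_mask(LED_PINS, 1)    # push-pull: clear bits 12 to 15
OSPEEDR_MASK = clear_mask(LED_PINS, 2)   # slow speed: clear bits 24 to 31
PUPDR_MASK = clear_mask(LED_PINS, 2)     # no pull-up/down: clear bits 24 to 31

for name, mask in (("OTYPER", OTYPER_MASK),
                   ("OSPEEDR", OSPEEDR_MASK),
                   ("PUPDR", PUPDR_MASK)):
    print(f"{name:8s} AND mask 0x{mask:08X}")
```

Because every field is cleared to zero, no OR step is needed for these three registers.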
After initialization, the main loop repeats these steps infinitely:
- Switch on the LEDs,
- Add a time delay,
- Switch off the LEDs
- Add a time delay
To turn on the LEDs, we need to write a 1 to the corresponding bits in the register GPIOD_ODR. In contrast, clearing a bit turns the LED off. The process of writing to GPIOD_ODR is exactly the same as the one we followed earlier to configure the registers. We read the current value of GPIOD_ODR, then do a bitwise OR to set the bits or a bitwise AND to clear them.
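As an illustration of the two operations on the output data register, here is a short Python sketch for a PC. The mask 0xF000 follows from the LEDs sitting on pins 12 to 15; the function names are hypothetical and the sample register values are arbitrary.

```python
LED_MASK = 0xF000   # bits 12 to 15 correspond to PD12 to PD15


def leds_on(odr):
    """Bitwise OR sets the four LED bits and leaves the others alone."""
    return odr | LED_MASK


def leds_off(odr):
    """Bitwise AND with the inverted mask clears only the four LED bits."""
    return odr & ~LED_MASK & 0xFFFFFFFF


state = 0x0005            # arbitrary value with two unrelated pins high
state = leds_on(state)
print(f"after ON : 0x{state:04X}")
state = leds_off(state)
print(f"after OFF: 0x{state:04X}")
```

The unrelated low bits keep their value through both steps, which matters as soon as other port D pins are in use.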
Delay Loop and Timing Calculation
The delay loop is a blocking loop which decrements the DELAY_INTERVAL until it becomes zero.
LDR R2, =DELAY_INTERVAL loads the value of DELAY_INTERVAL into register R2. The CBZ instruction jumps to the specified label if the contents of the specified register are zero. With CBZ R2, turnOFF, if R2 has become zero then the code jumps to the label turnOFF and turns the LEDs off. Otherwise it continues to the next instruction, SUBS R2, R2, #1, which decrements R2 and stores the result back in R2. The next instruction, B, is an unconditional branch. The code simply jumps back to the start of the loop with B delay1 for the next iteration.
The delay loop executes 3 instructions repeatedly, as many times as the DELAY_INTERVAL value loaded in R2. We need to find the execution time for each of these instructions. We can then calculate the total delay.
The first two instructions, CBZ and SUBS, take one cycle each to execute. The unconditional branch instruction B takes 3 cycles. As a result, we have a total of 5 cycles in one iteration of the delay loop. You can get more details about instruction timing from the Technical Reference Manual for ARMv7-M (which the Cortex-M4 processor is based on).
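To see how the per-instruction costs add up, here is a small Python model of the loop that counts iterations and cycles. It is a sketch under stated assumptions: the cycle costs are the ones quoted above, the function name is hypothetical, and the extra cycles of the final taken CBZ are ignored.

```python
# Cycle costs quoted in the text (the final taken CBZ is counted as 1 cycle).
CBZ_CYCLES = 1
SUBS_CYCLES = 1
B_CYCLES = 3


def run_delay_loop(count):
    """Model the delay1 loop and return (iterations, cycles)."""
    r2 = count                 # LDR  R2, =DELAY_INTERVAL
    iterations = 0
    cycles = 0
    while True:
        cycles += CBZ_CYCLES   # CBZ  R2, turnOFF
        if r2 == 0:
            break              # branch out to turnOFF
        r2 -= 1                # SUBS R2, R2, #1
        cycles += SUBS_CYCLES
        cycles += B_CYCLES     # B    delay1
        iterations += 1
    return iterations, cycles


iterations, cycles = run_delay_loop(1000)
print(f"{iterations} iterations, {cycles} cycles")
```

For a count of N the model gives 5N cycles plus one for the last CBZ, so the overhead outside the loop body is negligible for large counts.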
We have used the default internal 16 MHz clock. One clock cycle therefore takes 62.5 nsec. At this speed, one iteration of the delay loop takes (5 * 62.5) = 312.5 nsec. Rounding this to 313 nsec, the number of iterations for a 500 msec delay is 500 msec / 313 nsec. This results in 1597444 decimal, or 0x186004 in hexadecimal. We can ignore the overhead of the additional instructions since it is negligible compared to the blinking interval.
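The arithmetic is easy to reproduce on a PC. This Python sketch recomputes the loop count with the rounded 313 nsec figure used above and, for comparison, with the unrounded 312.5 nsec figure. The variable names are chosen for the illustration only.

```python
CLOCK_HZ = 16_000_000        # default internal clock
CYCLES_PER_ITERATION = 5     # CBZ (1) + SUBS (1) + B (3)
DELAY_NS = 500_000_000       # 500 msec expressed in nanoseconds

cycle_ns = 1_000_000_000 / CLOCK_HZ               # 62.5 nsec per cycle
iteration_ns = CYCLES_PER_ITERATION * cycle_ns    # 312.5 nsec per iteration

# Count used in the post: iteration time rounded up to 313 nsec.
rounded_count = DELAY_NS // 313

# Count without rounding, computed in whole clock cycles to stay exact.
exact_count = (DELAY_NS * CLOCK_HZ // 1_000_000_000) // CYCLES_PER_ITERATION

print(f"one iteration : {iteration_ns} nsec")
print(f"rounded count : {rounded_count} = 0x{rounded_count:X}")
print(f"exact count   : {exact_count} = 0x{exact_count:X}")
```

The two counts differ by well under 1%, which is invisible on a blinking LED.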
We explored this small ARM assembly program, which blinks an LED using very few instructions. I must confess that this is neither a foolproof nor a good way to code. In particular, the blocking delay loop is not the best use of the processor. However, the purpose of this post was to get a fair idea of assembly language programming on an ARM processor. In real-world applications, you will use exceptions, interrupts and much smarter ways of coding.
Programming in assembly helps you get intimate with the microcontroller hardware. As a result you get to learn more about internals of a microcontroller / processor. Assembly may not be the preferred choice when it comes to large complex programs due to the complications involved. Yet learning assembly programming does help in debugging and optimizing your code.
In conclusion, this code example can be a good base to learn assembly programming for ARM Cortex-M. As a suggestion, you can try switching on the 4 LEDs one by one with a time delay in between. You can also implement a 4-bit binary counter from 0000b to 1111b (0x0 to 0xF) using the 4 LEDs.
In one of the next blog posts, I will explain how to make use of interrupts and avoid blocking the processor. Till then, stay tuned and do let me know your suggestions and comments about this post!
Have fun coding!