Tutorials: Microcontroller systems (MICSY)


last updated: 2021-04-23


Song of this chapter: Barbra Streisand > Memories > Memory

The Central Processing Unit (CPU) of our computer, embedded device or microcontroller has at least three parts.


A microcontroller (μC) is a small computer on a single integrated circuit. The chip contains at least one CPU along with memory (RAM, Flash and EEPROM) and programmable input/output peripherals.


Microcontrollers often don't use the Complex Instruction Set Computer (CISC) design of classical computer processors; they are Reduced Instruction Set Computers (RISC).

With RISC, the commands are implemented directly in hardware (no microcode), so they execute fast (less decoding). This gives e.g. fast interrupt handling and a short reaction time to sensors. Most commands execute in one clock cycle. Only the load and store commands (2 cycles) access the memory. The other commands work on the many General Purpose Registers (GPR), which reduces memory accesses and saves time.

Harvard vs Von Neumann

The Harvard architecture uses two physically separate memories, one for the data and one for the commands (instructions). The memories are also accessed over separate buses for data and commands.

In a pure von Neumann architecture, instructions and data are stored in the same memory and accessed over the same bus. The CPU therefore cannot read a command and read or write data at the same time.

Most modern computers use a hybrid design that combines the best of both architectures.

The Harvard architecture is faster than Von Neumann because the CPU can read an instruction and access the data memory at the same time! Because it has distinct code and data address spaces, the Harvard architecture also allows different word lengths for the two memories. So in short: Harvard is faster, Von Neumann is more flexible.


Microcontrollers have to be cheap and fast, so they mostly use the Harvard architecture. Our Arduino chips (ATmega328 (Uno) or ATmega32u4 (Leonardo, Teensy 2.0)) use a relatively pure Harvard architecture: programs are stored in Flash memory and data is stored in SRAM.

Even if memory access is mostly managed by the compiler and run-time system, it is essential to understand memory access and addressing to write secure and error-free software.

Microcontroller (µC) (wiki)

A small computer used for embedded devices, with one or more processor cores, memory and programmable input/output peripherals integrated on a single integrated circuit is called a microcontroller.

System on Chip (SoC) is a term that is not well defined. Often it means a microcontroller plus mixed-signal circuits and/or radio frequency signal processing functions and/or an FPGA.

Let's look at a µC of the AVR series from ATMEL (Microchip):

avr microcontroller

We distinguish the following blocks:

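AVRs are available in many different packages and can use supply voltages from 1.8 V to 5.5 V. Apart from ISP, they can be programmed using a bootloader (Arduino), and they can be debugged over JTAG (on chips that have it, e.g. the ATmega32u4) or debugWIRE (e.g. the ATmega328P).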

Each year new controllers with further new functions hit the market. To reduce the complexity, we will focus on the ATmega328 (Arduino Uno) and the ATmega32u4 (Arduino Leonardo or Teensy 2.0).


Let's look at our Arduino Uno (ATmega328P).
Three memory blocks are available: the non-volatile 16-bit Flash program memory, a small non-volatile 8-bit EEPROM and the volatile 8-bit SRAM.

memory atmega328

The 1 kibibyte EEPROM (wiki)

Memory that retains its data when the power supply is switched off is called non-volatile memory.

EPROM stands for Erasable Programmable Read-Only Memory. The programming process is not electrically reversible, and photons (ultraviolet light) are needed to erase the chip. All memory is erased at the same time. Today EPROMs are mostly replaced by EEPROMs (Electrically Erasable Programmable ROM).

EEPROMs can be programmed and erased electrically in-circuit, by applying special programming signals. The ATmega328P contains 1 kibibyte of data EEPROM memory. It is organized as a separate data space, in which single bytes can be read and written. The EEPROM has an endurance of at least 100,000 write/erase cycles; reading does not wear it out, so you can read it as often as you like. Writing to the EEPROM is rather slow (a few milliseconds per byte), while reading a byte takes only a few clock cycles.

It is important not to write to the EEPROM accidentally inside a loop! The maximum number of write cycles could be reached very quickly.
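Arduino's EEPROM library offers `EEPROM.update()`, which only writes a byte when the new value differs from the stored one. The idea can be sketched on a PC as below, assuming a plain array stands in for the 1 KiB EEPROM and a counter per cell stands in for the wear. `FakeEeprom`, `updateByte()` and `secondsUntilWornOut()` are hypothetical names for illustration, not part of any library.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the 1 KiB EEPROM of the ATmega328P.
struct FakeEeprom {
    std::array<uint8_t, 1024> cells{};   // stored bytes
    std::array<uint32_t, 1024> writes{}; // write cycles spent per cell (wear)
    FakeEeprom() { cells.fill(0xFF); }   // an erased EEPROM cell reads 0xFF
};

// Write only if the stored byte differs, the same idea as EEPROM.update().
// Returns true if a write cycle was actually spent.
bool updateByte(FakeEeprom& e, uint16_t address, uint8_t value) {
    assert(address < e.cells.size());
    if (e.cells[address] == value) {
        return false;  // value already there: no write, no wear
    }
    e.cells[address] = value;
    ++e.writes[address];
    return true;
}

// Seconds until one cell reaches 'endurance' write cycles if it is
// rewritten every 'periodMs' milliseconds, e.g. unconditionally in loop().
double secondsUntilWornOut(uint32_t endurance, double periodMs) {
    return endurance * periodMs / 1000.0;
}
```

Assuming one write takes about 3.3 ms (the typical EEPROM programming time in the ATmega328P data sheet), a cell that is rewritten on every pass of `loop()` would reach 100,000 cycles after about 330 seconds, i.e. in less than six minutes.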

memory atmega328 EEPROM

Information on using EEPROMs in Arduino can be found in the Arduino reference.

"Just do it" Memory 1:

The 32 kibibyte Flash (wiki)

The on-chip program (instruction) memory uses the Flash technology and is In-System reProgrammable (ISP).

memory atmega328 Flash

The AVR's Flash memory uses Atmel's high-density non-volatile memory technology and is specially designed to hold the program (the hex file). It can be written/erased only 10,000 times (EEPROM: 100,000 times).

All AVR instructions are 16 or 32 bits wide. The Flash is organized in 16-bit words, so the 32 KiB give 16K (16384) words, i.e. room for at most about 16k instructions.
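The arithmetic behind "about 16k instructions" can be written as compile-time constants. This is a sketch; the 512-byte bootloader size is an assumption for a stock Arduino Uno with the Optiboot bootloader, and the names are only illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Figures for the ATmega328P (Arduino Uno).
constexpr uint32_t kFlashBytes = 32UL * 1024UL;  // 32 KiB = 32768 bytes
constexpr uint32_t kBytesPerWord = 2;             // one Flash word = 16 bits
constexpr uint32_t kFlashWords = kFlashBytes / kBytesPerWord;
constexpr uint32_t kBootloaderBytes = 512;        // assumption: Optiboot on a stock Uno

// Most AVR instructions take one word, a few (jmp, call, lds, sts) take two,
// so the number of instructions lies between these two bounds.
constexpr uint32_t maxInstructions(uint32_t bytes) { return bytes / 2; }
constexpr uint32_t minInstructions(uint32_t bytes) { return bytes / 4; }

static_assert(kFlashWords == 16384, "32 KiB of Flash are 16K words");
static_assert(kFlashBytes - kBootloaderBytes == 32256, "space left for the sketch");
```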

"Just do it" Memory 2:

The Flash is divided into two main sections: the boot program (bootloader) section and the application program section, which includes the interrupt vector table. Both sections have lock bits that allow write and read/write protection, so reverse engineering is not easily possible with a locked chip.

Interrupt vector table

At the beginning of the Flash resides the interrupt vector table. We will look at this table in the next chapter. After power-up, the hardware loads the address 0x0000 into the program counter and executes the RESET vector. The first two words in Flash hold a jump instruction that tells where the main program is located.
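To see how "two words per vector" and "the first two words tell where the main program is located" fit together, here is a hypothetical decoder for the 32-bit `jmp` instruction, using the bit layout from the AVR instruction set manual. The vector numbering assumes the ATmega328P, where vector 0 is RESET and the last of its 26 vectors (SPM READY) sits at word address 0x0032. The target word `0x0034` in the test is a made-up value, not the address a real Arduino program jumps to.

```cpp
#include <cassert>
#include <cstdint>

// Each vector occupies two Flash words, room for one jmp.
// Vector 0 is RESET; these are word addresses, as in the data sheet
// (byte addresses are twice as large).
constexpr uint16_t vectorWordAddress(uint8_t vector) {
    return static_cast<uint16_t>(2 * vector);
}

// jmp k is 32 bits wide: 1001 010k kkkk 110k  kkkk kkkk kkkk kkkk
bool isJmp(uint16_t firstWord) { return (firstWord & 0xFE0E) == 0x940C; }

// Extract the 22-bit word address k from the two words of a jmp.
uint32_t jmpTarget(uint16_t firstWord, uint16_t secondWord) {
    assert(isJmp(firstWord));
    const uint32_t high = ((firstWord & 0x01F0u) << 13)  // k21..k17
                        | ((firstWord & 0x0001u) << 16); // k16
    return high | secondWord;                            // k15..k0
}
```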

Bootloader section

For In System Programming we can use a programmer like the ATMEL AVRISP mk2. The SPI serial interface is used to program the chip and a 6-pin or 10-pin header connects the programmer to the ATmega chip.

If we don't have a programmer, a second Arduino board can be used to replace the programmer (look here).

To program an Arduino board using a programmer, you have to hold the Shift key before clicking on Upload (or use Sketch > Upload Using Programmer). The list of supported programmers is found under Tools > Programmer.


isp header

Arduino was so successful because we don't need a programmer! There is a special Assembly language command (spm, store program memory) to write to the Flash. It only works when it is executed from the bootloader section, so it is used by a bootloader program that was written with a programmer to a special memory section at the end of the Flash memory. The bootloader program can then reprogram the Flash. Such a bootloader program can be a security problem in IoT devices!

The chips on the Arduino boards are preprogrammed and already have a bootloader program. If we want to replace a chip on a board, or want to build our own Arduino board, we have to program the bootloader ourselves. This can be done with Tools > Burn Bootloader in the Arduino IDE.

More information can be found at https://www.arduino.cc/en/Tutorial/ArduinoISP.

Use Flash in Arduino

The Flash is not meant to be changed by the program, but it is possible to read data from the Flash with the Assembly language command lpm (load program memory, indirect addressing). Because the Flash (32 KiB) is much larger than the SRAM (2 KiB), it can be interesting to store constants (e.g. constant text) in the Flash to save SRAM space. If we do this in Assembly language, it is important not to overwrite the program or the bootloader! In Arduino the compiler helps to avoid these errors.
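A minimal sketch of putting a constant table into Flash with `PROGMEM` and reading it back with `pgm_read_word()` from avr-libc's `<avr/pgmspace.h>`, which generates the lpm instructions for us. This is not the tutorial's own PROGMEM example; the `#else` branch is an assumption added only so the same file also compiles on a PC for testing, and `sumDigits()` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __AVR__
#include <avr/pgmspace.h>  // real PROGMEM and pgm_read_word() from avr-libc
#else
// Assumption: host build for testing only. Flash and RAM are the same here,
// so PROGMEM does nothing and reading "from Flash" is a plain memory read.
#define PROGMEM
#define pgm_read_word(addr) (*(const uint16_t*)(addr))
#endif

// Constant table kept in Flash on the AVR instead of being copied to SRAM.
const uint16_t digits[] PROGMEM = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

// On the AVR, data in Flash must not be read through a normal pointer;
// pgm_read_word() reads it from program memory instead.
uint16_t sumDigits() {
    uint16_t sum = 0;
    for (uint8_t i = 0; i < sizeof(digits) / sizeof(digits[0]); i++) {
        sum += pgm_read_word(&digits[i]);
    }
    return sum;
}
```

In an Arduino sketch you would call `sumDigits()` from `setup()`; the header is already pulled in through `Arduino.h`.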

We have two possibilities to use the Flash:

The number of bytes used in Flash is shown in Arduino after compiling, in the output window (File > Preferences > Show verbose output during: both off!).

Flash memory

Let's look at the Flash used by another program not explicitly using PROGMEM:

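/* Flash_test_no_PROGMEM.ino */

int digits[] = {0,1,2,3,4,5,6,7,8,9};

void setup() {
  Serial.begin(9600);
  for (int i = 0; i < 10; i++) {
    int digit = digits[i];
    Serial.println(digit);   // use the value, so the loop is not optimized away
  }
}

void loop() {}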

Flash memory

There is a difference in the global variables used (SRAM) but not in the Flash size. Why is this? The C/C++ compiler (gcc) optimizes the code, and the kind of optimisation is selected by a flag passed to the compiler. By default Arduino passes the flag -Os, which stands for size optimisation. So the generated Assembly code already uses the Flash (with lpm) for us. Let's see what happens if we pass -O3 for speed optimisation instead (this can be changed in the file platform.txt for the Uno or boards.txt for the Teensy in the arduino/hardware folder).

For our example with PROGMEM:

Flash memory

The example without PROGMEM:

Flash memory

Now we see a difference in the Flash size.
But we also see a large overhead of additional Arduino code and the toll of using a high-level language instead of the Assembly language. In Assembly language we could write the same program by using less than 100 bytes!

The 2.25 kibibyte SRAM (wiki)

The static RAM (random-access memory) uses flip-flops to store the bits. SRAM is volatile, meaning that its data is lost when the memory is not powered.

memory atmega328 SRAM

SRAM is fast, so most operations of our µC happen in SRAM. The Harvard architecture, with its separate memories and buses for program and data, allows instructions to be executed with a single-level pipeline: while one instruction is executed, the next instruction is pre-fetched from the Flash, so most instructions complete in one clock cycle.

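We get 5 different sections in our SRAM: the general purpose registers (32 bytes), the special-purpose registers (64 bytes), the extended special-purpose registers (160 bytes), and the data memory and the stack, which share the remaining 2048 bytes (2304 bytes = 2.25 KiB in total).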
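The data-space addresses of these sections on the ATmega328P (taken from its data sheet memory map) can be summarized in a small classifier. This is an illustrative sketch; `Region` and `classify()` are hypothetical names. The data memory and the stack share the same 2048 bytes, so no fixed address separates them.

```cpp
#include <cassert>
#include <cstdint>

enum class Region { Gpr, Io, ExtIo, InternalSram, Invalid };

constexpr uint16_t kRamEnd = 0x08FF;  // last SRAM address of the ATmega328P

Region classify(uint16_t address) {
    if (address <= 0x001F) return Region::Gpr;           // r0..r31
    if (address <= 0x005F) return Region::Io;            // 64 special-purpose registers
    if (address <= 0x00FF) return Region::ExtIo;         // 160 extended SPR
    if (address <= kRamEnd) return Region::InternalSram; // data memory + stack;
                                                          // the stack starts at kRamEnd
                                                          // and grows downwards
    return Region::Invalid;
}
```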

Assembly language

To better understand the access to SRAM memory, and the different addressing modes, let's have a brief glance at the Assembly language.
As we don't want to code directly in binary or hex (as was done 50 years ago), we use the (low-level) Assembly language. Assembly language uses one command per machine instruction. The command's mnemonic is often an acronym or abbreviation of the command (e.g. brne for BRanch if Not Equal). Apart from the commands, assembler directives, macros and symbolic labels for program and memory locations are used to make programming easier. The source code is a plain text file, which is converted into executable machine code by a utility program called an assembler.

An overview of all the Assembly language commands can be found here and the complete instruction set here and here.

One instruction consists of the opcode (mnemonic) and the operands. There are commands without an operand, with one operand or with two operands. Here are three examples:

               opcode  operand(s)  description
no operand:    cli                 clear the global interrupt flag
one operand:   inc     r16         increment the working register r16
two operands:  ldi     r17,120     load the immediate decimal value 120 into working register r17

With two operands, the destination always comes first and then the source:

avr instruction

Most commands need only one word (16 bit) in Flash. An exception is e.g. the jump command (jmp k), which needs 2 words. This can be seen in the Flash vector table, where two words are reserved per interrupt to hold a jump instruction.
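To see what "one word in Flash" means, here is a sketch that encodes `ldi Rd,K` into its 16-bit machine word, following the bit pattern `1110 KKKK dddd KKKK` from the AVR instruction set manual. `encodeLdi()` is a hypothetical helper written for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Encode "ldi Rd,K" as 1110 KKKK dddd KKKK.
// ldi only works with the upper registers r16..r31, so dddd = d - 16.
uint16_t encodeLdi(uint8_t d, uint8_t k) {
    assert(d >= 16 && d <= 31);
    return static_cast<uint16_t>(0xE000              // opcode bits 1110
                                 | ((k & 0xF0) << 4) // high nibble of K -> bits 11..8
                                 | ((d - 16) << 4)   // register number -> bits 7..4
                                 | (k & 0x0F));      // low nibble of K  -> bits 3..0
}
```

For `ldi r17,120` from the examples above, the result is 0xE718. A disassembly listing (e.g. from avr-objdump) shows it as the bytes `18 e7`, because the AVR stores the low byte of each word first.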

Some commands we will need in our following exercise:

opcode  operand(s)  description                   operation                        flags      clocks
sbiw    Rd, K       Subtract Immediate from Word  Rd+1:Rd ← Rd+1:Rd - K            Z,C,N,V,S  2
ldi     Rd, K       Load Immediate                Rd ← K                           -          1
push    Rr          Push Register on Stack        STACK ← Rr                       -          2
pop     Rd          Pop Register from Stack       Rd ← STACK                       -          2
sbi     A, b        Set Bit in I/O Register       I/O(A, b) ← 1                    -          2
cbi     A, b        Clear Bit in I/O Register     I/O(A, b) ← 0                    -          2
rjmp    k           Relative Jump                 PC ← PC + k + 1                  -          2
rcall   k           Relative Call Subroutine      PC ← PC + k + 1                  -          3 / 4(1)
brne    k           Branch if Not Equal           if (Z = 0) then PC ← PC + k + 1  -          1 / 2
ret                 Subroutine Return             PC ← STACK                       -          4 / 5(1)

(1) The higher value applies to devices with a 22-bit program counter; the ATmega328P and ATmega32u4 use the lower value. For brne: 1 clock if the branch is not taken, 2 if it is taken.

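The shortcuts for the operands are: Rd: destination (and source) register, Rr: source register, K: constant data, k: constant address (jump or branch offset), A: I/O register address, b: bit number (0-7) in the register.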

The shortcuts for the flags (see SREG later in this chapter): C: Carry flag, Z: Zero flag, N: Negative flag, V: two’s complement overflow flag, S: sign flag (N ⊕ V), H: Half carry flag, T: Transfer flag, I: global Interrupt flag.

Assembly code needs less space and can be much faster than code from a high-level language. Because today's controllers have much more memory and higher clock rates than some years ago, Assembly code is used less and less. The big disadvantage of Assembly code is that it is not portable: it runs only on the controller for which it was written.

"Just do it" Memory 3:

verbose output

Inline assembler

It is also possible to use assembler inside the Arduino (C) code. Here is an example with the Assembly instruction nop (no operation), which does nothing except waste time (exactly one clock cycle, 1/16 µs with a 16 MHz crystal):

    asm ("nop \n");

The Assembly code is enclosed in parentheses, preceded by the compiler keyword asm or __asm__. The assembler instructions are written inside quotation marks and terminated with the escape sequence for the linefeed character, '\n', because the avr-as assembler used by Arduino expects a single instruction per line.

By looking in the Arduino libraries, we see that inline assembler is used, among other things, to define new functions (macros). In the header file hardware/arduino/avr/cores/arduino/Arduino.h we find the following code:

    // avr-libc defines _NOP() since 1.6.2
    #ifndef _NOP
    #define _NOP() do { __asm__ volatile ("nop"); } while (0)
    #endif

so we could write _NOP(); instead of asm ("nop \n");.

General purpose register GPR (r0-r31)

memory atmega328 SRAM GPR

The 32 general purpose registers (also called the register file) are the workhorses and are needed by many instructions. In particular, all arithmetic and logic instructions go through these registers. In the instruction set, Rd (destination) and Rr (source) denote the GPR. Typical commands are ADD and ADC:

Mnemonic Operands Description Operation Flags Clocks
add Rd, Rr Add without Carry Rd ← Rd + Rr Z,C,N,V,S,H 1
adc Rd, Rr Add with Carry Rd ← Rd + Rr + C Z,C,N,V,S,H 1
Special-purpose register SPR

memory atmega328 SRAM SPR

We have already seen the SPR DDR, PIN and PORT, and know that we can manipulate them directly from Arduino (using the whole port, or setting and clearing bits with masks). This makes sense e.g. if we want to use a whole port, as in our SSD example, and it can reduce and simplify the source code.
But another reason can be that we want to use features of our ATmega controller that are not implemented in Arduino (e.g. the differential input of the Analog to Digital Converter), or want access to data we don't see in Arduino, like the Status REGister SREG or the Stack Pointer SP.
Often each single bit in the SPRs has its own role to play and must be manipulated separately. For the first 32 SPRs this can be done with the Assembly commands sbi and cbi, as we have seen. For the other 32 SPRs and the extended SPRs, masking must be used.
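Setting and clearing single bits with masks can be tried on a PC with an ordinary variable standing in for a register. In this sketch `fakePortB` and the helper functions are hypothetical. On the Arduino you would write the same expressions directly on `PORTB` with bit names such as `PB5` from `<avr/io.h>`, and avr-gcc usually turns them into `sbi`/`cbi` when the register lies in the first 32 I/O addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an 8-bit special-purpose register such as PORTB.
// On the AVR, <avr/io.h> declares the real register as volatile.
uint8_t fakePortB = 0;

inline void setBit(uint8_t& reg, uint8_t bit)    { reg |= static_cast<uint8_t>(1u << bit); }    // like sbi
inline void clearBit(uint8_t& reg, uint8_t bit)  { reg &= static_cast<uint8_t>(~(1u << bit)); } // like cbi
inline void toggleBit(uint8_t& reg, uint8_t bit) { reg ^= static_cast<uint8_t>(1u << bit); }
inline bool testBit(uint8_t reg, uint8_t bit)    { return (reg >> bit) & 1u; }
```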

A list of the SPR can be found at the end of the data sheet of each controller.

To better understand SPR's, let's take a closer look at one of them:

The Status REGister SREG (wiki)

It is one of the most important registers. It is often also called the flag register, because it marks important states with bits (flag shown: bit set; flag not shown: bit cleared).

It is a register mostly used by the hardware to signal information about the results of operations (bits 0-5). Which flags are affected by an operation can be seen in the instruction set. Additionally, we have a user flag (T) that programs can use to mark special events, and the Interrupt flag to enable or disable interrupts globally. For each flag of the status register there are two commands to set or clear it (e.g. set and clt for the T flag).

Status register SREG

bit    7          6          5           4      3          2          1      0
       Interrupt  Transfer   Half carry  Sign   oVerflow   Negative   Zero   Carry
       I          T          H           S      V          N          Z      C
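The following sketch computes the result of `add Rd,Rr` together with the six arithmetic flags, using the bit positions of the SREG table and the flag rules of the AVR instruction set manual. It is a host-side model for illustration; `addAndFlags()` and the `FLAG_*` names are hypothetical (avr-libc has its own names such as `SREG_C`).

```cpp
#include <cassert>
#include <cstdint>

// Bit positions in SREG.
enum : uint8_t { FLAG_C = 0, FLAG_Z = 1, FLAG_N = 2, FLAG_V = 3,
                 FLAG_S = 4, FLAG_H = 5, FLAG_T = 6, FLAG_I = 7 };

// Model of "add Rd,Rr": returns the 8-bit result and updates C, Z, N, V, S, H.
// The I and T bits of 'sreg' are left unchanged.
uint8_t addAndFlags(uint8_t rd, uint8_t rr, uint8_t& sreg) {
    const uint16_t wide = static_cast<uint16_t>(rd) + rr;
    const uint8_t r = static_cast<uint8_t>(wide);

    const bool c = wide > 0xFF;                          // carry out of bit 7
    const bool z = (r == 0);                             // result is zero
    const bool n = (r & 0x80) != 0;                      // bit 7 of the result
    const bool v = ((rd ^ r) & (rr ^ r) & 0x80) != 0;    // two's complement overflow
    const bool s = n != v;                               // S = N xor V
    const bool h = ((rd & 0x0F) + (rr & 0x0F)) > 0x0F;   // carry out of bit 3

    sreg &= (1u << FLAG_I) | (1u << FLAG_T);             // clear the six arithmetic flags
    sreg |= (c << FLAG_C) | (z << FLAG_Z) | (n << FLAG_N) |
            (v << FLAG_V) | (s << FLAG_S) | (h << FLAG_H);
    return r;
}
```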

Example 1:

For a 16-bit addition, for example, we first add the two low bytes with an ADD command. This sets the Carry flag if a carry occurs. The two high bytes are then added with an ADC command, which adds the Carry bit if needed:

            ldi     r16,LOW(1000)   ;16 bit addition first number in r17:r16
            ldi     r17,HIGH(1000)  ;
            ldi     r18,LOW(2000)   ;second number in r19:r18
            ldi     r19,HIGH(2000)  ;
            add     r16,r18         ;result in r17:r16
            adc     r17,r19         ;add high bytes plus Carry from the low bytes
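Example 1 can be replayed in C++ to watch the Carry flag travel from the low bytes to the high bytes. The sketch below is a model, not AVR code; `add8()` and `add16()` are hypothetical names, and the variables are named after the registers used in Example 1.

```cpp
#include <cassert>
#include <cstdint>

// Model of add/adc on 8-bit registers: returns the 8-bit sum, sets carryOut.
uint8_t add8(uint8_t a, uint8_t b, bool carryIn, bool& carryOut) {
    const uint16_t wide = static_cast<uint16_t>(a) + b + (carryIn ? 1 : 0);
    carryOut = wide > 0xFF;
    return static_cast<uint8_t>(wide);
}

// 16-bit addition: low bytes with add (no carry in),
// high bytes with adc (carry in from the low bytes).
uint16_t add16(uint16_t x, uint16_t y, bool& carry) {
    uint8_t r16 = static_cast<uint8_t>(x);       // LOW(x)
    uint8_t r17 = static_cast<uint8_t>(x >> 8);  // HIGH(x)
    const uint8_t r18 = static_cast<uint8_t>(y);       // LOW(y)
    const uint8_t r19 = static_cast<uint8_t>(y >> 8);  // HIGH(y)

    r16 = add8(r16, r18, false, carry);  // add r16,r18
    r17 = add8(r17, r19, carry, carry);  // adc r17,r19 (carry passed in by value first)
    return static_cast<uint16_t>((r17 << 8) | r16);
}
```

For 1000 + 2000 the low bytes 0xE8 + 0xD0 = 0x1B8 overflow, so the Carry flag is set and adc adds it to the high bytes: 0x03 + 0x07 + 1 = 0x0B. The result 0x0BB8 is 3000.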

Example 2:

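A branch if equal command (breq) branches (jumps) to another address if the operation before it gave an equal result, meaning the Zero flag was set. Equality is tested by subtracting two numbers: they are equal if the difference is zero. The compare command cpi does this subtraction without storing the result, and dec also sets the Zero flag when the register reaches zero.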

            cpi     r16,100         ;compare r16 to 100
            breq    NEWLABEL        ;branch if equal to label NEWLABEL
            dec     r17             ;decrement r17
            breq    R17ZERO         ;branch if zero to label R17ZERO
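The Zero flag set by `dec` also drives the classic counting (delay) loop built from `dec` and `brne`. Using the clock counts from the command table (brne: 2 cycles when it branches, 1 when it doesn't) and 1 cycle each for `ldi` and `dec`, the loop length can be computed and checked with a small simulation. This is an illustrative sketch; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Cycles of the loop
//         ldi  r17,N      ; 1 cycle
// loop:   dec  r17        ; 1 cycle, sets Z when r17 reaches 0
//         brne loop       ; 2 cycles if taken (Z = 0), 1 if not
// N = 0 gives 256 passes, because dec wraps 0 -> 255.
uint32_t delayLoopCycles(uint8_t n) {
    const uint32_t passes = (n == 0) ? 256 : n;
    return 1 + passes + (passes - 1) * 2 + 1;  // ldi + dec + taken brne + last brne
}

// The same loop simulated pass by pass, to check the formula.
uint32_t simulateDelayLoop(uint8_t n) {
    uint32_t cycles = 1;  // ldi
    uint8_t r17 = n;
    for (;;) {
        r17--;            // dec
        cycles += 1;
        if (r17 != 0) {   // Z = 0: brne taken
            cycles += 2;
        } else {          // Z = 1: brne not taken
            cycles += 1;
            break;
        }
    }
    return cycles;
}

// Duration in microseconds at 16 MHz (one cycle = 1/16 µs).
double delayMicros(uint32_t cycles) { return cycles / 16.0; }
```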
"Just do it" Memory 4:
"Just do it" Memory 5:
"Just do it" Memory 6:
Extended special-purpose register SPR

memory atmega328 SRAM SPR ext.

In newer controllers the 64 SPR were not enough, because of their enhanced features and supplementary hardware (e.g. USB on the Teensy/Leonardo (ATmega32u4)). The addresses from 0x60 to 0xFF (96-255) in SRAM are used for the supplementary SPR. Accessing them in Assembly language is slower, because the commands in and out (1 clock cycle) can't be used; instead the load and store commands (lds/sts or ld/st, 2 clock cycles) are needed.
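Which access instruction can be used follows from the address. Here is a sketch for the ATmega328P, assuming (as in its data sheet) that I/O addresses are data addresses minus 0x20. `Access` and `fastestAccess()` are hypothetical names. Example addresses from the data sheet: PORTB at 0x25, SREG at 0x5F and the USART data register UDR0 at 0xC6.

```cpp
#include <cassert>
#include <cstdint>

enum class Access { SbiCbi, InOut, LdsSts };

// Fastest way to reach a register at a given data-space address.
Access fastestAccess(uint16_t dataAddress) {
    assert(dataAddress >= 0x20 && dataAddress <= 0xFF);  // SPR and extended SPR only
    const uint16_t ioAddress = dataAddress - 0x20;
    if (ioAddress <= 0x1F) return Access::SbiCbi;  // single bits with sbi/cbi, bytes with in/out
    if (ioAddress <= 0x3F) return Access::InOut;   // in/out (1 cycle), no bit instructions
    return Access::LdsSts;                         // extended SPR: lds/sts (2 cycles)
}
```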

Data memory

memory atmega328 SRAM data

The data memory area is the biggest block of volatile SRAM memory. It is normally used to store variables, tables etc.; the part used for dynamic allocation is called the heap. At a given time, some parts of the heap are in use and some are free (unused and thus available for future allocations). It is the programmer's job to manage this memory allocation.

Mostly in Arduino we use static variables and memory management is no big concern, except the restricted size of the SRAM. But if we want or have to allocate memory dynamically this can be done in C. For more information you can read: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation. The improper use of dynamic memory allocation is frequently a source of bugs (security issues, program crashes and segmentation faults), so it is best avoided if possible.
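If dynamic allocation cannot be avoided, the basic C pattern is: allocate, check for failure, use, free. Here is a minimal sketch in C++ using `std::malloc()` and `std::free()` (avr-libc provides `malloc()` and `free()` as well); `sumWithHeapBuffer()` is a hypothetical example function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Allocate a buffer on the heap, fill it, use it and give it back.
// Returns false if the allocation failed, which can really happen with 2 KiB of SRAM.
bool sumWithHeapBuffer(std::size_t n, uint32_t& sum) {
    uint8_t* buf = static_cast<uint8_t*>(std::malloc(n));
    if (buf == nullptr) {
        return false;  // never use the pointer without this check
    }
    for (std::size_t i = 0; i < n; i++) {
        buf[i] = static_cast<uint8_t>(i);
    }
    sum = 0;
    for (std::size_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    std::free(buf);    // forgetting this leaks memory; using buf afterwards is a bug
    return true;
}
```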

If the lack of data memory is a problem, it is better to change the microcontroller. New boards with better controllers and more memory are often available for the same price (e.g. Arduino Uno vs ESP boards).

Stack (wiki)

The stack is a part of the memory always needed if we use subroutines (functions) or interrupts (interrupt service routines (ISR)) in our program.

Even though the stack is a part of the SRAM, it will get its own chapter because of its importance in programming :).

Interesting links: