Microcontroller Systems: Memory

last updated: 14/05/19

Introduction

Song of this chapter: Barbra Streisand > Memories > Memory

The Central Processing Unit (CPU) of our computer, embedded device or microcontroller has at least three parts.

[Figure: computer]

A microcontroller (μC) is a small computer on a single integrated circuit. The chip contains at least one CPU along with memory (RAM, Flash and EEPROM) and programmable input/output peripherals.

RISC vs CISC

Microcontrollers often do not use the Complex Instruction Set Computer (CISC) design of classical computer processors; instead they are Reduced Instruction Set Computers (RISC). Modern PC processors combine CISC and RISC techniques.

With RISC the commands are implemented directly in hardware (no microcode), so execution is fast (less decoding). This gives, for example, quick interrupt handling and a low reaction time to sensors. Most commands execute in one clock cycle. Only the load and store commands (2 cycles) act on the memory. The other commands use the many general-purpose registers, which reduces memory accesses and saves time.

Harvard vs Von Neumann

The Harvard architecture uses two physically separated memories, one for the data and one for the commands (instructions). Also the access to the memories is done over separated buses for data and commands.

In a pure von Neumann architecture, instructions and data are stored in the same memory and accessed over the same bus. The CPU cannot simultaneously read commands and data from or to the memory.

Most modern computers have a hybrid design and use the best of both architectures.

The Harvard architecture is faster than Von Neumann because the CPU can read an instruction and perform a data memory access at the same time! The Harvard architecture also makes it possible to use a different word length for each memory, because it has distinct code and data address spaces. So in short: Harvard is faster, Von Neumann is more flexible.

[Figure: computer]

Microcontrollers have to be cheap and fast, so they mostly use the Harvard architecture. Our Arduino chips (ATmega328 (Uno) or ATmega32u4 (Leonardo, Teensy 2.0)) use a relatively pure Harvard architecture. Programs are stored in Flash memory and data is stored in SRAM.

Even if memory access is mostly managed by the compiler and run-time system, it is essential to understand memory access and addressing to write secure and error-free software.

Microcontroller (µC) (wiki)

A small computer used for embedded devices, with one or more processor cores, memory and programmable input/output peripherals integrated on a single integrated circuit is called a microcontroller.

A system on chip (SoC) is a term that is not well defined. Often it is a microcontroller plus mixed-signal circuits and/or radio frequency signal processing functions and/or FPGA.

Let's look at a µC of the AVR series from ATMEL (Microchip):

[Figure: AVR microcontroller]

We distinguish the following blocks:

AVRs are available in many different housings and can use voltages from 1.8 V to 5.5 V. Apart from ISP, they can be programmed using a bootloader (Arduino) or debugged over JTAG.

Each year new controllers hit the market with further new functions. To reduce the complexity we will focus on the ATmega328 (Uno) and ATmega32u4 (Leonardo, Teensy 2.0).

Memory

Let's look at our Arduino Uno (ATmega328P). Three memory blocks are available: the non-volatile 16-bit Flash program memory, a small non-volatile 8-bit EEPROM and the volatile 8-bit SRAM.

[Figure: memory of the ATmega328]

The 1 kibibyte EEPROM (wiki)

Memory that retains its data when the power supply is switched off is called non-volatile memory.

EPROM stands for Erasable Programmable Read-Only Memory. The programming process is not electrically reversible and photons (light) are needed to erase the chip. All memory is erased at the same time. Today EPROMs are mostly replaced by EEPROMs (Electrically Erasable Programmable ROM).

EEPROMs can be programmed and erased electrically in-circuit, by applying special programming signals. The ATmega328P contains 1 kibibyte of data EEPROM memory. It is organized as a separate data space, in which single bytes can be read and written. The EEPROM has an endurance of at least 100,000 write/erase cycles. Naturally you can read it as often as you like. Access to the EEPROM is rather slow compared to SRAM; a write takes a few milliseconds per byte.

It is important not to write to the EEPROM accidentally in a loop! The maximum number of write cycles could be reached very quickly.
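
One way to guard against such accidental wear is to write a byte only when its value actually changes. The following is a minimal sketch of that idea, modelled on a PC with a hypothetical `MockEeprom` class (an assumption for illustration, not part of the Arduino library) that merely counts write cycles. On the Arduino itself, `EEPROM.update()` from the EEPROM library follows the same idea.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the 1 KiB EEPROM; it only counts write cycles.
// (Simplification: cells start at 0 here, a real erased EEPROM reads 0xFF.)
class MockEeprom {
public:
    std::uint8_t read(std::size_t addr) const { return cells_[addr]; }

    // Unconditional write: always costs one write/erase cycle.
    void write(std::size_t addr, std::uint8_t value) {
        cells_[addr] = value;
        ++write_cycles_;
    }

    // Write only if the stored value differs, so repeated identical
    // writes (e.g. from loop()) cost no additional cycles.
    void update(std::size_t addr, std::uint8_t value) {
        if (read(addr) != value) {
            write(addr, value);
        }
    }

    unsigned long write_cycles() const { return write_cycles_; }

private:
    std::array<std::uint8_t, 1024> cells_{};
    unsigned long write_cycles_ = 0;
};

// Stores the same byte `repeats` times and returns the cycles consumed.
unsigned long cycles_used(bool use_update, unsigned repeats) {
    MockEeprom eeprom;
    for (unsigned i = 0; i < repeats; ++i) {
        if (use_update) {
            eeprom.update(0, 42);
        } else {
            eeprom.write(0, 42);
        }
    }
    return eeprom.write_cycles();
}
```

In this model, storing the same byte 1000 times with `write` consumes 1000 cycles, while `update` consumes a single one.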

[Figure: memory of the ATmega328, EEPROM]

Information on using EEPROMs in Arduino can be found in the Arduino reference.

"Just do it" M1:
    /* jdim1_EEPROM_read_write.ino *
     * Asks a password over Serial and stores it in EEPROM
     */

    #include <EEPROM.h>

    char password[30];      // array to store the password
    char in_char = 0;       // incoming serial character
    short eeprom_addr = 0;  // EEPROM address
    byte index = 0;
    byte pw_length = 0;

    void setup() {
      Serial.begin(9600); // initialize serial
      Serial.print("Please type a password (min. 12 bytes): ");
      while (true) {
        if(Serial.available() > 0) {
          in_char = Serial.read();      // read a character until
          if (in_char == '\n') {        // ENTER is pressed
            password[index] = '\0'; // end the pw string with NULLBYTE
            pw_length = index;
            break;                      // and get out
          }
          if (index < sizeof(password) - 1) { // leave room for the NULLBYTE
            password[index] = in_char; // store the character
            index++;                   // increment where to write next
          }
        }
      }
      Serial.println(password);
      // write the password to EEPROM beginning with address 0
      for (eeprom_addr=0; eeprom_addr<pw_length;eeprom_addr++) {
        EEPROM.write(eeprom_addr, password[eeprom_addr]);
      }
    }

    void loop() {
      for (eeprom_addr=0; eeprom_addr<pw_length;eeprom_addr++) {
        in_char = EEPROM.read(eeprom_addr);
        Serial.print(in_char);
      }
      Serial.println();
      delay(2000);
    }
    Serial.println(millis());
    /* JDIM1_EEPROM_md5 password.ino
     * Write the md5 hash of a password to the EEPROM
     * Asks a password over Serial and stores it in EEPROM
     * https://github.com/tzikis/ArduinoMD5/
     */

    #include <EEPROM.h>
    #include <MD5.h>

    char md5str_arr[50];
    char password[50];      // array to store the password
    char in_char = 0;       // incoming serial character
    short eeprom_addr = 1;  // EEPROM address
    byte pw_index = 0;
    byte pw_length = 0;
    unsigned char* hash;
    char *md5str;
    byte hash_length;


    void setup() {
      Serial.begin(9600); // initialize serial
      Serial.print("Please type a password (min. 12 bytes): ");
      while (true) {
        if(Serial.available() > 0) {
          in_char = Serial.read();      // read a character until
          if (in_char == '\n') {        // ENTER is pressed
            password[pw_index] = '\0'; // end the pw string with NULLBYTE
            pw_length = pw_index;
            break;                      // and get out
          }
          if (pw_index < sizeof(password) - 1) { // leave room for the NULLBYTE
            password[pw_index] = in_char; // store the character
            pw_index++;                   // increment where to write next
          }
        }
      }
      Serial.println(password);
      // Calculate the hash
      hash=MD5::make_hash(password); //generate the MD5 hash for our string
      md5str = MD5::make_digest(hash, 16); //generate the hex encoding of our hash
      free(hash);
      Serial.print("Hash is: ");
      Serial.println(md5str);
      hash_length = strlen(md5str);

      EEPROM.write(0, hash_length);
      // write the hash to EEPROM beginning with address 1 (address 0 holds the length)
      for (eeprom_addr=1; eeprom_addr<hash_length+1;eeprom_addr++) {
        EEPROM.write(eeprom_addr, md5str[eeprom_addr-1]);
      }
      free(md5str); // give the memory back (optional only needed in loop)

      // only to test if properly written:
      Serial.print("Number of bytes (address 0 in EEPROM): ");
      Serial.println(EEPROM.read(0));
      Serial.print("Hash in EEPROM: ");
      for (eeprom_addr=1; eeprom_addr<hash_length+1;eeprom_addr++) {
        in_char = EEPROM.read(eeprom_addr);
        Serial.print(in_char);
      }
    }

    void loop() { }

The 32 kibibyte Flash (wiki)

The on-chip program (instruction) memory uses the Flash technology and is In-System reProgrammable (ISP).

[Figure: memory of the ATmega328, Flash]

The AVR's Flash memory uses Atmel's high-density non-volatile memory technology and is specially designed to hold the program data (hex file). It can be written/erased only 10,000 times (EEPROM: 100,000 times). The Flash cannot be changed byte-wise like the EEPROM but only page-wise (block-wise); on the ATmega328P a page is 64 words, i.e. 128 bytes.

All AVR instructions are 16 or 32 bits wide. The Flash is organized in 16-bit words, so there is room for about 16k instructions.
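
The "about 16k" figure follows directly from the word size. As a small worked example (the names `kFlashBytes`, `kBytesPerWord` and `flash_words` are chosen here for illustration only):

```cpp
#include <cstdint>

// Flash size of the ATmega328P as stated above: 32 kibibytes.
constexpr std::uint32_t kFlashBytes = 32u * 1024u;
// One Flash word is 16 bits, i.e. 2 bytes.
constexpr std::uint32_t kBytesPerWord = 2;

// Number of 16-bit words, which is an upper bound for the instruction count.
constexpr std::uint32_t flash_words(std::uint32_t flash_bytes) {
    return flash_bytes / kBytesPerWord;
}

static_assert(flash_words(kFlashBytes) == 16384, "32 KiB hold 16384 words");
```

The real number of instructions is lower, since 32-bit instructions occupy two words and part of the Flash is taken by the bootloader.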

"Just do it" M2:

The Flash is divided into two main sections: the boot program (bootloader) section and the application program section, which includes the interrupt vector table. Both sections have lock bits that allow write and read/write protection, so reverse-engineering a locked chip is not easily possible.

Interrupt vector table

At the beginning of the Flash resides the interrupt vector table. We will look at this table in the next chapter. After power-up the hardware loads the address 0x0000 into the program counter and executes a RESET interrupt. The first two words in Flash tell where the main program is located.

Bootloader section

For In-System Programming we can use a programmer like the ATMEL AVRISP mk2. The SPI serial interface is used to program the chip, and a 6-pin or 10-pin header connects the programmer to the ATmega chip.

If we don't have a programmer, a second Arduino board can be used to replace the programmer (look here).

To program an Arduino board using a programmer, you have to hold the Shift key while clicking on Upload. The list of supported programmers is found under Tools > Programmer.

[Figure: ISP header]

Arduino was so successful, because we don't need a programmer! There are special commands in assembly language to write to the Flash! These commands are only used by a bootloader program that was programmed with a programmer to a special memory section at the end of the Flash memory. The bootloader program can reprogram the flash. Such a bootloader program can be a security problem in IoT devices!

The chips on the Arduino boards are preprogrammed and already contain a bootloader program. If we want to replace a chip in a board, or want to build our own Arduino board, we have to program the bootloader ourselves. This can be done with Tools > Burn Bootloader in the Arduino IDE.

More information can be found at https://www.arduino.cc/en/Tutorial/ArduinoISP.

Use Flash in Arduino

The Flash is not meant to be changed by the program, but it is possible to read data from the Flash with the assembly language command lpm (load program memory, indirect addressing). Because we have a huge amount of Flash memory, it can be interesting to write constants (e.g. constant text) to the Flash to save SRAM space. If we do this in assembly language it is important not to overwrite the program or the bootloader! In Arduino the compiler helps to avoid these errors.

We have two possibilities to use the flash:

    /* test_PROGMEM.ino */

    #include <avr/pgmspace.h>

    PROGMEM const int digits[] = {0,1,2,3,4,5,6,7,8,9};

    void setup() {
      Serial.begin(115200);
      for (int i = 0; i<10; i++) {
        int digit = pgm_read_word(&digits[i]);
        Serial.println(digit);
      }
    }

    void loop() {}

The number of bytes used in Flash is shown in the Arduino output window after compiling.

[Figure: flash memory]

Let's look at the Flash used by another program not explicitly using PROGMEM:

    /* Flash_test_no_PROGMEM.ino */

    int digits[] = {0,1,2,3,4,5,6,7,8,9};

    void setup() {
      Serial.begin(115200);
      for (int i = 0; i<10; i++) {
        int digit = digits[i];
        Serial.println(digit);
      }
    }

    void loop() {}

[Figure: flash memory]

There is a difference in the global variables used (SRAM) but not in the Flash size. Why is this? The C or C++ compiler (gcc) optimizes the code. Which optimization is applied is selected by passing a flag to the compiler. In Arduino the flag -Os is passed by default; -Os stands for size optimization. So the generated assembly code already uses the Flash with lpm for us. Let's see what happens if we pass -O3 for speed optimization (the flag can be changed in the file platform.txt for the Uno or boards.txt for the Teensy in the arduino/hardware folder).

For our example with PROGMEM:

[Figure: flash memory]

The example without PROGMEM:

[Figure: flash memory]

Now we see a difference in the Flash size.
But we also see a large overhead of additional Arduino code: the toll of using a high-level language instead of assembly language. In assembly language we could write the same program using less than 100 bytes!

The 2.25 kibibyte SRAM (wiki)

The static RAM (random-access-memory) uses flip-flops to store the bits. SRAM is volatile meaning that data is eventually lost when the memory is not powered.

[Figure: memory of the ATmega328, SRAM]

SRAM is fast, so most operations of our µC happen in SRAM. The Harvard architecture with its separated memories and buses for program and data allows instructions to be executed with single-level pipelining. During execution of an instruction, the next instruction is pre-fetched from the Flash, which allows instructions to be executed in one clock cycle.

We get 5 different sections in our SRAM:

Assembly language

The access to the different SRAM bytes depends on the architecture's machine code instructions. So for a better understanding of the SRAM, let's have a brief glance at assembly language.
As we don't want to code directly in binary or hex (which was done 50 years ago), we use the low-level assembly language. Assembly language uses one statement per machine instruction. The statement is often an acronym or abbreviation of the command (e.g. brne for BRanch if Not Equal). Apart from the commands, assembler directives, macros, and symbolic labels of program and memory locations are used to make programming easier. The source code is a text-only file. This text file is converted into executable machine code by a utility program called an assembler.

An overview of all the assembly commands can be found here and the complete instruction set here and here.

Some commands we will need in our following exercise:

    Mnemonic  Operands  Description                   Operation                        Flags      Clocks
    sbiw      Rd, K     Subtract Immediate from Word  Rd + 1:Rd ← Rd + 1:Rd - K        Z,C,N,V,S  2
    ldi       Rd, K     Load Immediate                Rd ← K                           None       1
    push      Rr        Push Register on Stack        STACK ← Rr                       None       2
    pop       Rd        Pop Register from Stack       Rd ← STACK                       None       2
    sbi       A, b      Set Bit in I/O Register       I/O(A, b) ← 1                    None       2
    cbi       A, b      Clear Bit in I/O Register     I/O(A, b) ← 0                    None       2
    rjmp      k         Relative Jump                 PC ← PC + k + 1                  None       2
    rcall     k         Relative Call Subroutine      PC ← PC + k + 1                  None       3 / 4(1)
    brne      k         Branch if Not Equal           if (Z = 0) then PC ← PC + k + 1  None       1 / 2
    ret                 Subroutine Return             PC ← STACK                       None       4 / 5(1)

Assembly code needs less space and can be much faster than code from a high-level language. Because today's controllers have much more memory and are clocked higher than some years ago, assembly code is used less and less. The reason is its big disadvantage of not being portable: assembly code runs only on the controller for which it was written.

"Just do it" M3:
      ;*******************************************************************************
      ;*    blink.asm
      ;*******************************************************************************
      .NOLIST
      .INCLUDE "m328Pdef.inc"         ;include AVR definitions file (Mega m2560def.inc)
      .LIST
      .CSEG                           ;code segment: all lines from here go to Flash
      .ORG    0x0000                  ;organize addr. 0x0000 for RESET: program start
      RESET:  rjmp    SETUP           ;jump to SETUP (skip the ISR vector table)
      .ORG    INT_VECTORS_SIZE        ;organize address after vector table
      SETUP:  sbi     DDRB,5          ;PB5 = OUTPUT (Mega PB7)
      LOOP:   sbi     PORTB,5         ;LED On
              ldi     YL,LOW(1000)    ;parameter for subroutine Y = 1000
              ldi     YH,HIGH(1000)   ;(Y is external 16 bit counter)
              rcall   DELAY           ;call subroutine DELAY with parameter 1000ms
              cbi     PORTB,5         ;LED Off
              ldi     YL,LOW(1000)    ;delay(1000)
              ldi     YH,HIGH(1000)   ;
              rcall   DELAY           ;
              rjmp    LOOP            ;endless loop

      DELAY:  push    XL              ;save the 4 used (global) register to stack, so
              push    XH              ;that they are not changed by the subroutine
              push    YL              ;the 2 double reg. X and Y (16 bit) are now free
              push    YH              ;for local use
      DELAYE: ldi     XL,LOW(4000)    ;X = (t-tT)/4tT = 1ms/(4*62.5ns) ~ 4000
              ldi     XH,HIGH(4000)   ;initialize internal loop with 4000 for 1 ms
      DELAYI: sbiw    X,1              ;decrement internal loop (counter X)
              brne    DELAYI           ;get out from internal loop if X = 0
              sbiw    Y,1              ;decrement external loop (counter Y)
              brne    DELAYE           ;get out from external loop if Y = 0
              pop    YH
              pop    YL
              pop    XH
              pop    XL
              ret                      ;return to mainloop (return address on stack)
      .EXIT                            ;end of program
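
The loop constant 4000 in the listing above can be checked with a little arithmetic. The sketch below is a host-side calculation, not AVR code. It assumes a 16 MHz clock (the 62.5 ns period mentioned in the listing's comment) and 4 clock cycles per pass of the inner loop (sbiw: 2 cycles, brne when taken: 2 cycles). The names `kCyclesPerPass` and `loop_count` are chosen for illustration.

```cpp
#include <cstdint>

// Clock cycles of one pass through the inner loop:
// sbiw (2 cycles) + brne when the branch is taken (2 cycles).
constexpr std::uint32_t kCyclesPerPass = 4;

// Loop count needed to burn `delay_us` microseconds at `f_cpu_hz`.
// The product is computed in 64 bits to avoid overflow.
constexpr std::uint32_t loop_count(std::uint32_t f_cpu_hz, std::uint32_t delay_us) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(f_cpu_hz) * delay_us / 1000000u / kCyclesPerPass);
}

// 16 MHz and 1000 us (1 ms) give the constant used in the listing.
static_assert(loop_count(16000000u, 1000u) == 4000, "inner loop count for 1 ms");
```

The calculation ignores the overhead of the outer loop, the push/pop instructions and the call/return, so the real delay is slightly longer. The result must also fit into the 16-bit register pair X, i.e. be at most 65535.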

[Figure: verbose output]

General purpose register GPR (r0-r31)

The 32 general-purpose registers (also called the register file) are the workhorses and are needed by many instructions. In particular, all arithmetic and logic instructions operate on these registers. In the instruction set, Rd (destination) and Rr (source) denote general-purpose registers. A typical command is the ADD or ADC command:

    Mnemonic  Operands  Description        Operation         Flags        Clocks
    add       Rd, Rr    Add without Carry  Rd ← Rd + Rr      Z,C,N,V,S,H  1
    adc       Rd, Rr    Add with Carry     Rd ← Rd + Rr + C  Z,C,N,V,S,H  1

Special-purpose register SPR

We have already seen the SPR DDR, PIN and PORT, and know that we can manipulate them directly from Arduino (using the whole port or setting and clearing bits with masks). This makes e.g. sense if we want to use a whole port as in our SSD example. It can reduce and simplify the source code.
But another reason can be that we want to use features of our ATmega controller that are not implemented in Arduino (e.g. differential input for the Analog to Digital Converter), or want access to data we don't see in Arduino, like the status register or the Stack Pointer SP.
Often each single bit in the SPRs has its own role to play and must be manipulated separately. For the first 32 SPRs this can be done, as seen, with the assembly commands sbi and cbi. For the other 32 SPRs and the extended SPRs, masking must be used.
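
Masking means combining the register value with a bit pattern using OR (to set a bit) or AND with the inverted pattern (to clear a bit). A minimal sketch on plain 8-bit values, where the helper names `set_bit`, `clear_bit` and `test_bit` are chosen for illustration:

```cpp
#include <cstdint>

// Returns `reg` with bit number `bit` (0..7) set to 1.
constexpr std::uint8_t set_bit(std::uint8_t reg, unsigned bit) {
    return static_cast<std::uint8_t>(reg | (1u << bit));
}

// Returns `reg` with bit number `bit` (0..7) cleared to 0.
constexpr std::uint8_t clear_bit(std::uint8_t reg, unsigned bit) {
    return static_cast<std::uint8_t>(reg & ~(1u << bit));
}

// Returns true if bit number `bit` of `reg` is set.
constexpr bool test_bit(std::uint8_t reg, unsigned bit) {
    return (reg & (1u << bit)) != 0;
}
```

In an Arduino sketch the same pattern is usually written directly on the register, e.g. `PORTB |= (1 << 5);` to set and `PORTB &= ~(1 << 5);` to clear bit 5.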

A list of the SPR can be found at the end of the data sheet of each controller.

To better understand SPRs, let's take a closer look at one of them:

The Status register (wiki)

It is one of the most important registers. It is often also called the flag register because it marks important states with bits (flag shown: bit set; flag not shown: bit reset).

It is a register mostly used by the hardware to signal information about the results of operations (bits 0-5). Which flags are affected by an operation can be seen in the instruction set. Additionally we have a user flag (T) to use in programs to mark special events, and the Interrupt flag to enable or disable interrupts globally. For each flag of the status register there are two commands to set or reset the flag (e.g. set and clt for the T flag).

Status register SREG

    Bit:   7          6         5           4     3         2         1     0
    Name:  Interrupt  Transfer  Half carry  Sign  oVerflow  Negative  Zero  Carry
    Flag:  I          T         H           S     V         N         Z     C

Example 1:

For a 16-bit addition, for example, we first add the two low bytes with an ADD command. This sets the Carry flag if a carry occurs. The two high bytes are then added with an ADC command, which adds the Carry bit if needed:

    ...
            ldi     r16,LOW(1000)   ;16 bit addition first number in r17:r16
            ldi     r17,HIGH(1000)  ;
            ldi     r18,LOW(2000)   ;second number in r19:r18
            ldi     r19,HIGH(2000)  ;
            add     r16,r18         ;result in r17:r16
            adc     r17,r19
    ...
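
To see how the Carry flag links the two 8-bit additions, the following host-side model mimics add and adc on plain integers. It is only a sketch of the principle, not AVR code. The names `Add8Result`, `add8` and `add16` are chosen for illustration, and only the Carry flag is modelled.

```cpp
#include <cstdint>

struct Add8Result {
    std::uint8_t sum;  // low 8 bits of the result
    bool carry;        // models the Carry flag C of SREG
};

// Models add (carry_in = false) and adc (carry_in = previous Carry flag).
inline Add8Result add8(std::uint8_t a, std::uint8_t b, bool carry_in) {
    const unsigned wide = static_cast<unsigned>(a) + b + (carry_in ? 1u : 0u);
    return {static_cast<std::uint8_t>(wide & 0xFFu), wide > 0xFFu};
}

// 16-bit addition built from two 8-bit additions, as in the listing above.
inline std::uint16_t add16(std::uint16_t x, std::uint16_t y) {
    const Add8Result low = add8(static_cast<std::uint8_t>(x & 0xFF),
                                static_cast<std::uint8_t>(y & 0xFF),
                                false);      // add r16,r18
    const Add8Result high = add8(static_cast<std::uint8_t>(x >> 8),
                                 static_cast<std::uint8_t>(y >> 8),
                                 low.carry); // adc r17,r19
    return static_cast<std::uint16_t>((high.sum << 8) | low.sum);
}
```

For 1000 (0x03E8) and 2000 (0x07D0) the low bytes give 0xE8 + 0xD0 = 0x1B8, so the low result is 0xB8 with the carry set. The high bytes then give 0x03 + 0x07 + 1 = 0x0B, and the total is 0x0BB8 = 3000.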

Example 2:

A branch if equal command (breq) will branch (jump) to another address if the result of the operation before the breq command is zero, meaning the Zero flag is set. Equality is tested by subtracting two numbers: they are equal if the difference is zero.

    ...
            cpi     r16,100         ;compare r16 to 100
            breq    NEWLABEL        ;branch if equal to label NEWLABEL
            dec     r17             ;decrement r17
            breq    R17ZERO         ;branch if zero to label R17ZERO
    ...
"Just do it" M4:
    void setup() {
      Serial.begin(9600);
      Serial.println(SREG,BIN);
                                 // Set the T-flag
      Serial.println(SREG,BIN);
                                 // Reset the T-flag
      Serial.println(SREG,BIN);
    }

    void loop() {}
"Just do it" M5:
    // sinewave_pwm_analogWrite.ino (https://www.arduino.cc/en/Tutorial/PWM)
    // https://www.daycounter.com/Calculators/Sine-Generator-Calculator2.phtml

    const byte PWM_pin = 14;    // PB5 on Teensy

    byte  sine_wave[128] = {    // the table holds 128 values
      128,134,140,146,152,158,165,170,176,182,188,193,198,203,208,213,
      218,222,226,230,234,237,240,243,245,248,250,251,253,254,254,255,
      255,255,254,254,253,251,250,248,245,243,240,237,234,230,226,222,
      218,213,208,203,198,193,188,182,176,170,165,158,152,146,140,134,
      128,121,115,109,103,97,90,85,79,73,67,62,57,52,47,42,37,33,29,
      25,21,18,15,12,10,7,5,4,2,1,1,0,0,0,1,1,2,4,5,7,10,12,15,18,21,
      25,29,33,37,42,47,52,57,62,67,73,79,85,90,97,103,109,115,121
    };

    void setup() {}

    void loop() {
      for (int i=0; i<128; i++) {
        analogWrite(PWM_pin, sine_wave[i]);
        delay(10);
      }
    }
    ...
    void setup() {
      pinMode(PWM_pin, OUTPUT);
      // Fast PWM for Teensy Timer 1 (3 outputs)
      TCCR1A = 0xA9; // COM1A1,COM1B1,COM1C1,WGM10;
      TCCR1B = 0x09; // WGM12 (Fast PWM 8 Bit) CK/1  -> 62.5 kHz
      OCR1A = 0;     // pwm off
    }

    void loop() {
      for (int i=0; i<128; i++) {
        OCR1A = sine_wave[i];
        delayMicroseconds(yourNumberHere);
      }
    }
Extended special-purpose register SPR
Heap
Stack

Addressing modes

Interesting links: