Tutorials: Microcontroller systems (MICSY)

Memory

last updated: 2021-04-23

Quick links to the subchapters

Introduction
Microcontroller (µC)
Memory
Interesting links

Introduction

Song of this chapter: Barbra Streisand > Memories > Memory

The Central Processing Unit CPU of our computer, embedded device or microcontroller has at least three parts.

We have seen so far the arithmetic blocks, the logical blocks and the registers used to create an Arithmetic and Logical Unit (ALU).
The Control Unit (CU) can be built from the same blocks.
The Memory Unit (of a microcontroller) will be treated in this chapter.
High-end microprocessors have a memory management unit (MMU), translating logical addresses and providing memory protection. Microcontrollers seldom include an MMU.

A microcontroller (μC) is a small computer on a single integrated circuit. The chip contains at least one CPU along with memory (RAM, Flash and EEPROM) and programmable input/output peripherals.

RISC vs CISC

Microcontrollers often don't use the Complex Instruction Set Computer (CISC) design of classical computer processors; they are Reduced Instruction Set Computers (RISC).

With RISC we have commands that are implemented directly in hardware (no microcode), so we get fast execution (less decoding), e.g. of interrupts, and a low reaction time to sensors. Most commands are executed in one clock cycle. Only the load and store commands (2 cycles) act on the memory. The other commands use the many General Purpose Registers (GPR), which reduces memory access and saves time.

Harvard vs Von Neumann

The Harvard architecture uses two physically separated memories, one for the data and one for the commands (instructions). Also the access to the memories is done over separated buses for data and commands.

In a pure von Neumann architecture, instructions and data are stored in the same memory and accessed over the same bus. The CPU cannot fetch an instruction and access data at the same time.

Most modern computers have hybrid designs and use the best of both architectures.

The Harvard architecture is faster than Von Neumann because the CPU can read an instruction and perform a data memory access at the same time! The Harvard architecture also makes it possible to use different word lengths for the two memories, because it has distinct code and data address spaces. So in short: Harvard is faster, Von Neumann is more flexible.

Microcontrollers have to be cheap and fast. They mostly use the Harvard architecture design. Our Arduino chips (ATmega328 (Uno) or ATmega32u4 (Leonardo, Teensy 2.0)) use a relatively pure Harvard architecture. Programs are stored in Flash memory and data is stored in SRAM.

Even if memory access is mostly managed by the compiler and run-time system, it is essential to understand memory access and addressing to write secure and error-free software.

Microcontroller (µC) (wiki)

A small computer used for embedded devices, with one or more processor cores, memory and programmable input/output peripherals integrated on a single integrated circuit is called a microcontroller.

A System on Chip SoC is a term that is not well defined. Often it is a microcontroller plus mixed-signal circuits and/or radio frequency signal processing functions and/or FPGA.

Let's look at a µC of the AVR series from ATMEL (Microchip):

We distinguish the following blocks:

CPU:
The CPU has an 8-bit ALU with status register, stack pointer, and 32 general purpose registers (GPR). Eight GPR can be used as four 16-bit registers; three of these double registers serve as address pointers (X, Y, Z). The GPR are part of the SRAM and thus volatile, meaning they lose their content without power, and are not defined when power returns!
Control Unit:
The RISC command and control unit with program counter can execute most 16-bit commands (instructions) in one clock cycle. We reach nearly 16 million instructions per second (MIPS) with a 16 MHz crystal (about the speed of an INTEL 80386 in 1990).
Input/Output:
Depending on the controller we get from one to four 8-bit GPIO ports, 8- and 16-bit timer/counter, a real time clock (RTC), PWM outputs, external interrupts, serial interfaces (EIA232, I²C(TWI), SPI), analog to digital converter (ADC) and USB.
Memory:
Three memory blocks are available: non-volatile 16-bit FLASH program memory, a small non-volatile 8-bit EEPROM and the volatile 8-bit SRAM. We will look in detail at these memories in this chapter.
Crystal:
With an internal RC oscillator we can get a clock up to 8 MHz. With an external crystal we can push the clock up to 20 MHz (even more with an internal PLL). A watchdog timer helps to detect and recover from hardware faults or program errors.
Interrupt unit:
An interrupt handler manages external and internal interrupts to handle different events. The controller can use up to six sleep modes.
Programming logic:
The controller ROM memory (FLASH, EEPROM, fuse and lock-bits) can be programmed and cleared over SPI (Serial Peripheral Interface). This is called In-System Programming ISP.

AVRs are available in many different housings, and can use voltages from 1.8 V to 5.5 V. Apart from ISP they can be programmed using a bootloader (Arduino) or debugged over JTAG.

Each year new controllers hit the market with new functions. To reduce the complexity we will focus on the ATmega328 (Arduino Uno) and ATmega32u4 (Arduino Leonardo or Teensy 2.0).

Memory

Let's look at our Arduino Uno (ATmega328P).
Three memory blocks are available. Non-volatile 16-bit FLASH program memory, a little non-volatile 8-bit EEPROM and the volatile 8-bit SRAM.

The 1 kibibyte EEPROM (wiki)

Memory that retains its data when the power supply is switched off is called non-volatile memory.

EPROM stands for Erasable Programmable Read-Only Memory. The programming process is not electrically reversible and photons (light) are needed to erase the chip. All memory is erased at the same time. Today EPROMs are mostly replaced by EEPROMs (Electrically Erasable Programmable ROM).

EEPROMs can be programmed and erased electrically in-circuit, by applying special programming signals. The ATmega328P contains 1 kibibyte of data EEPROM memory. It is organized as a separate data space, in which single bytes can be read and written. The EEPROM has an endurance of at least 100,000 write/erase cycles. Naturally you can read it as often as you like. Writing to the EEPROM is rather slow (some milliseconds per byte); reading is much faster.

It is important not to write to the EEPROM accidentally in a loop! The maximum number of write cycles could be reached very quickly.
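
To get a feeling for "very quickly", here is a minimal sketch in plain C++ (it runs on a PC, it is not an Arduino sketch). It assumes the write time of about 3.3 ms per byte given in the Arduino EEPROM reference and the endurance of 100,000 cycles from above. The names `seconds_until_worn_out` and `needs_write` are made up for this illustration.

```cpp
#include <stdint.h>

// Assumed values: endurance from the text above, write time from the
// Arduino EEPROM reference (about 3.3 ms per byte).
constexpr uint32_t kEnduranceCycles = 100000;
constexpr uint32_t kWriteTimeUs     = 3300;

// Seconds until one EEPROM cell reaches its rated endurance when a
// program rewrites the same address in a tight loop.
constexpr uint32_t seconds_until_worn_out(uint32_t cycles, uint32_t write_us) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(cycles) * write_us) / 1000000ULL);
}

// "Write only if changed": true if a real write would be needed.
// Hypothetical helper that mirrors the idea behind EEPROM.update().
constexpr bool needs_write(uint8_t stored, uint8_t wanted) {
  return stored != wanted;
}

// 100000 writes * 3.3 ms = 330 s, so about five and a half minutes.
static_assert(seconds_until_worn_out(kEnduranceCycles, kWriteTimeUs) == 330,
              "rated endurance is reached after 330 seconds");
```

In a sketch, EEPROM.update(address, value) from the Arduino EEPROM library follows the same idea as `needs_write`: it only writes if the stored byte differs from the new one.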

Information on using EEPROMs in Arduino can be found in the Arduino reference.

"Just do it" Memory 1:

We want to read and write to the internal EEPROM. For this we read a text from the serial monitor. Use Arduino UNO or Mega (Leonardo and Teensy are problematic, because they use Serial over USB) and test the program.
Enhance the program, so that passwords below 12 characters and above 16 characters are not possible.

    /* jdim1_EEPROM_read_write.ino *
     * Asks a password over Serial and stores it in EEPROM
     */

    #include <EEPROM.h>

    char password[30];      // array to store the password
    char in_char = 0;       // incoming serial character
    word eeprom_addr = 0;  // EEPROM address
    byte pw_index = 0;
    byte pw_length = 0;

    void setup() {
      Serial.begin(9600); // initialize serial
      Serial.print("Please type a password (min. 12 bytes): ");
      while (true) {
        if(Serial.available() > 0) {
          in_char = Serial.read();      // read a character until
          if (in_char == '\n') {        // ENTER is pressed
            password[pw_index] = '\0'; // end the pw string with NULLBYTE
            pw_length = pw_index;
            break;                      // and get out
          }
          if (pw_index < sizeof(password) - 1) { // leave room for the NULLBYTE
            password[pw_index] = in_char; // store the character
            pw_index++;                   // increment where to write next
          }
        }
      }
      Serial.println(password);
      // write the password to EEPROM beginning with address 0
      for (eeprom_addr=0; eeprom_addr<pw_length;eeprom_addr++) {
        EEPROM.write(eeprom_addr, password[eeprom_addr]);
      }
    }

    void loop() {
      for (eeprom_addr=0; eeprom_addr<pw_length;eeprom_addr++) {
        in_char = EEPROM.read(eeprom_addr);
        Serial.print(in_char);
      }
      Serial.println();
      delay(2000);
    }

When an Arduino program starts, a hardware timer is started. With the millis() function we can get the time in milliseconds since the start. Use the following command twice, once before and once after the write loop, to find how many milliseconds are needed to write the data to the EEPROM.
```
    Serial.println(millis());
```

Our password program is very nice, but it is not secure at all!

ESP8266 Arduino boards normally store the cleartext WiFi SSID and password in the Flash. If an intruder gets hold of the hardware (e.g. a fine dust sensor in the garden), he can "borrow" the hardware, read out the password (near the ASCII text "ssid" or "SSID") and put the hardware back, so that no one will notice.

We will use an MD5 hash in this example, so that no cleartext is stored. Even if MD5 is not absolutely secure, it prevents cleartext and, if the source text of your program is not known, makes it very difficult to get the password. We don't use a stronger hash function, to keep the program simple.
Note that the ESP32 has cool hardware possibilities to calculate complex hashes.

Install the md5 library from: https://github.com/tzikis/ArduinoMD5/. Download the zip file and install it by clicking on Sketch > Include Library > Add .zip library....

Now run the following code to write a password once to the EEPROM (note the password on a piece of paper!!).
Write a new program to enter a password over Serial, calculate the hash and send a message when the hash is equal to the hash from EEPROM. This program does not write to the EEPROM!

    /* jdim1_EEPROM_write_md5_password.ino
     *  
     * Write the md5 hash of a password to the EEPROM
     * 
     * Install the md5 library from:
     * https://github.com/tzikis/ArduinoMD5/
     * Download zip and then: Sketch -> Include Library -> Add .zip library...
     */

    #include <EEPROM.h>
    #include <MD5.h>

    char password[30];      // array to store the password
    char in_char = 0;       // incoming serial character
    word eeprom_addr = 0;  // EEPROM address
    byte pw_index = 0;
    byte pw_length = 0;     // length of the clear password
    byte hash_length;
    unsigned char* hash;
    char *md5str;

    void setup() {  
      Serial.begin(9600); // initialize serial
      Serial.print("Please type a password (min. 12 bytes): ");
      while (true) {  
        if(Serial.available() > 0) {
          in_char = Serial.read();      // read a character until 
          if (in_char == '\n') {        // ENTER is pressed
            if ((pw_index < 12) || (pw_index > 16)) {
              Serial.println("Your password length must be between 12 and 16"
                             " character!");
              pw_index = 0;
            }
            else {
              password[pw_index] = '\0'; // end the pw string with NULLBYTE
              pw_length = pw_index;
              break;                      // and get out
            }  
          }
          else if (pw_index < sizeof(password) - 1) { // avoid a buffer overflow
            password[pw_index] = in_char; // store the character
            pw_index++;                   // increment where to write next
          }  
        }
      }  
      Serial.println(password);
      // Calculate the hash
      hash=MD5::make_hash(password); //generate the MD5 hash for our string
      md5str = MD5::make_digest(hash, 16); //generate the hex encoding of our hash
      free(hash);    
      Serial.print("Hash is: ");
      Serial.println(md5str);
      hash_length = strlen(md5str);
      Serial.println(millis());
      EEPROM.write(0, hash_length);
      // write the hash to EEPROM beginning with address 1 (address 0 holds the length)
      for (eeprom_addr=1; eeprom_addr<hash_length+1;eeprom_addr++) {    
        EEPROM.write(eeprom_addr, md5str[eeprom_addr-1]);
      }
      Serial.println(millis());
      free(md5str); // give the memory back (optional only needed in loop)
      // only to test if properly written:
      Serial.print("Number of bytes (address 0 in EEPROM): ");
      Serial.println(EEPROM.read(0));
      Serial.print("Hash in EEPROM: ");
      for (eeprom_addr=1; eeprom_addr<hash_length+1;eeprom_addr++) {
        in_char = EEPROM.read(eeprom_addr);    
        Serial.print(in_char);
      }  
    }

    void loop() { }

Find and note solutions to make the finding of the hash in EEPROM more difficult.

The 32 kibibyte Flash (wiki)

The on-chip program (instruction) memory uses the Flash technology and is In-System reProgrammable (ISP).

The AVR's Flash memory uses Atmel's high density non-volatile memory technology and is specially designed to hold the program data (hex file). It can be written/erased only 10,000 times (EEPROM 100,000 times).

All AVR instructions are 16 or 32 bits wide. The Flash is organized in 16-bit words, so we have room for about 16k instructions.
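
The "about 16k instructions" can be checked with a short calculation. The following is a minimal sketch in plain C++ (runs on a PC); the helper names `flash_words` and `address_bits` are made up for this illustration. As a second example it computes the number of address bits for the 1 kibibyte EEPROM, which is addressed byte-wise.

```cpp
#include <stdint.h>

// Number of 16-bit words that fit into a Flash of the given size in bytes.
constexpr uint32_t flash_words(uint32_t flash_bytes) {
  return flash_bytes / 2;
}

// Number of address bits needed to select one of `locations` cells,
// i.e. the smallest n with 2^n >= locations.
constexpr uint8_t address_bits(uint32_t locations) {
  return (locations <= 1) ? 0 : 1 + address_bits((locations + 1) / 2);
}

// 32 kibibytes of Flash hold 16384 words (one word per short instruction).
static_assert(flash_words(32UL * 1024UL) == 16384, "16k words");

// The 1 kibibyte EEPROM has 1024 byte cells and needs 10 address bits.
static_assert(address_bits(1024) == 10, "2^10 = 1024");
```

The same two helpers can be used to check your answers to the questions about the program counter in the next exercise.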

"Just do it" Memory 2:

How many bits does the program counter PC need to address the whole Flash of an ATmega32?
How many bits would a memory counter need to address the Flash of an ATmega32 byte-wise?
Calculate the word and the byte address FLASHEND for an ATmega256 chip.

The Flash is divided in two main sections, the boot program (bootloader) section and the application program section including the interrupt vector table. Both sections have lock bits, that allow write and read/write protection, so reverse-engineering is not easily possible with a locked chip.

Interrupt vector table

At the beginning of the Flash resides the interrupt vector table. We will look at this table in the next chapter. After powering up, the hardware loads the address 0x0000 into the program counter and executes a RESET interrupt. The first two words in Flash tell where the main program is located.

Bootloader section

For In System Programming we can use a programmer like the ATMEL AVRISP mk2. The SPI serial interface is used to program the chip and a 6-pin or 10-pin header connects the programmer to the ATmega chip.

If we don't have a programmer, a second Arduino board can be used to replace the programmer (look here).

To program an Arduino board using a programmer, you have to hold the Shift key before clicking on Upload (or use Sketch > Upload Using Programmer). The list of supported programmers is found under Tools > Programmer.

Arduino was so successful, because we don't need a programmer! There are special commands in Assembly language to write to the Flash! These commands are only used by a bootloader program that was programmed with a programmer to a special memory section at the end of the Flash memory. The bootloader program can reprogram the Flash. Such a bootloader program can be a security problem in IoT devices!

The chips on the Arduino boards are preprogrammed and already have a bootloader program. If we want to replace a chip in a board, or want to build our own Arduino board, we have to program the bootloader ourselves. This can be done with Tools > Burn Bootloader in the Arduino IDE.

More information can be found at https://www.arduino.cc/en/Tutorial/ArduinoISP.

Use Flash in Arduino

The Flash is not meant to be changed by the program, but it is possible to read data from the Flash with the Assembly language command lpm (load program memory, indirect addressing). Because we have a large amount of Flash memory, it can be interesting to write constants (e.g. constant text) to the Flash to save SRAM space. If we do this in Assembly language it is important not to overwrite the program or the bootloader! In Arduino the compiler helps to avoid these errors.

We have two possibilities to use the Flash:

Storing string constants only in Flash memory.
If we use a Serial.print("Text") function, the text is stored in the program, but it is loaded to SRAM when the program runs. To avoid this and store the text in Flash only, we can use the F() macro:
```
    Serial.println(F("this text goes to Flash only!!"));
```

A second, more flexible possibility is to use the program memory directive PROGMEM for the compiler. Naturally the data must be constant (it cannot change during the program). We have to include a header file (#include <avr/pgmspace.h>) and can then use the functions pgm_read_byte(), pgm_read_word(), and pgm_read_dword() to read 1 byte, 2 bytes or 4 bytes. Here is a little example how to do this:

    /* test_PROGMEM.ino */

    #include <avr/pgmspace.h>

    PROGMEM const int digits[] = {0,1,2,3,4,5,6,7,8,9};

    void setup() {
      Serial.begin(115200);
      for (int i = 0; i<10; i++) {
        int digit = pgm_read_word(&digits[i]);
        Serial.println(digit);
      }
    }

    void loop() {}

The number of bytes used in Flash is shown in the Arduino output window after compiling (File > Preferences > Show verbose output during: both off!).

Let's look at the Flash used by another program not explicitly using PROGMEM:

    /* Flash_test_no_PROGMEM.ino */

    int digits[] = {0,1,2,3,4,5,6,7,8,9};

    void setup() {
      Serial.begin(115200);
      for (int i = 0; i<10; i++) {
        int digit = digits[i];
        Serial.println(digit);
      }
    }

    void loop() {}

There is a difference in the global variables used (SRAM) but not in the Flash size. Why is this? The C or C++ compiler (gcc) does optimisation. Which optimisation is done is chosen by passing a flag to the compiler. In Arduino the flag -Os is passed by default. -Os stands for size optimisation. So the generated Assembly code already uses the Flash with lpm for us. Let's try what happens if we pass -O3 for speed optimisation (can be changed in the file platform.txt for Uno or boards.txt for Teensy in the arduino/hardware folder).

For our example with PROGMEM:

The example without PROGMEM:

Now we see a difference in the Flash size.
But we also see a large overhead of additional Arduino code and the toll of using a high-level language instead of the Assembly language. In Assembly language we could write the same program by using less than 100 bytes!

The 2.25 kibibyte SRAM (wiki)

The static RAM (random-access-memory) uses flip-flops to store the bits. SRAM is volatile meaning that data is eventually lost when the memory is not powered.

SRAM is fast, so most operations of our µC happen in SRAM. The Harvard architecture with its separate memories and buses for program and data allows instructions to be executed with single-level pipelining. During execution of an instruction, the next instruction is pre-fetched from the Flash, which allows instructions to be executed in one clock cycle.

We get 5 different sections in our SRAM:

The 32 fast-access GPR (General Purpose working Registers) with 8 bits allow single-cycle arithmetic logic unit (ALU) operations. In one clock cycle, two operands are taken from the GPR, the operation is executed, and the result is stored back in the GPR. Eight of the 32 GPR can be used as four 16-bit registers. Three of them serve as indirect address register pointers (X, Y, Z) for data space addressing, and one (Z) as address pointer for look-up tables in the Flash (assembler command lpm).
Following the 32 GPR we have 64 SPR (Special-Purpose Registers), called I/O registers by Atmel. These registers are control registers, where virtually every bit controls functions of the internal µC hardware. The status register and the stack pointer can also be found in these 64 bytes. The SPR can be accessed directly, and 32 of them even bitwise (assembler commands in, out, sbi, cbi).
For newer ATmega controllers like the ATmega328 or the ATmega32u4 with more features, 64 SPR were not enough, so the memory space from 0x60 to 0xFF (160 bytes) is now reserved as extended SPR space. Accessing these registers is slower, as the data memory commands must be used.
The real data memory (2 kibibytes), beginning at address 0x100, is accessed directly or indirectly (byte-wise) with the load and store commands (assembler commands ld, lds, ldd, st, sts, std).
For interrupts and functions a place in memory is needed to store the return address. This place is called the stack and normally resides at the end of the SRAM. This is necessary because the stack addresses are decremented when the stack grows. We will look at this in more detail in the next chapter.
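
The address ranges of these sections can be summarised as a small look-up function. This is a minimal sketch in plain C++ (runs on a PC) for the ATmega328P; the names `Region` and `region_of` are made up for this illustration, and the upper limit 0x08FF is derived from the 2 kibibytes of data memory starting at 0x100. The stack is not a separate address range: it lives at the upper end of the data memory.

```cpp
#include <stdint.h>

// Sections of the ATmega328P data address space as described above.
enum class Region { GPR, IO, ExtendedIO, SRAM, Unmapped };

// Classify a data-space address (byte address).
constexpr Region region_of(uint16_t addr) {
  return addr <= 0x001F ? Region::GPR          // 32 general purpose registers
       : addr <= 0x005F ? Region::IO           // 64 SPR (I/O registers)
       : addr <= 0x00FF ? Region::ExtendedIO   // 160 bytes extended SPR space
       : addr <= 0x08FF ? Region::SRAM         // 2 KiB data memory incl. stack
                        : Region::Unmapped;
}

// 32 + 64 + 160 + 2048 bytes = 2304 bytes = 2.25 kibibytes.
static_assert(32 + 64 + 160 + 2048 == 0x0900, "size of the data space");
```

For example, `region_of(0x0100)` gives `Region::SRAM`, the first byte of the real data memory.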

Assembly language

To better understand the access to SRAM memory, and the different addressing modes, let's have a brief glance at the Assembly language.
As we don't want to code directly in binary or hex (which was done 50 years ago), we use the (low-level) Assembly language. Assembly language uses one command per machine instruction. The command's mnemonic is often an acronym or abbreviation of the command (e.g. brne for BRanch if Not Equal). Apart from the commands, assembler directives, macros, and symbolic labels for program and memory locations are used to facilitate programming. The source code is a text-only file. This text file is converted into executable machine code by a utility program called an assembler.

An overview of all the Assembly language commands can be found here and the complete instruction set here and here.

One instruction consists of the opcode or mnemonic and the operands. We get commands without operand, with one operand or two operands. Here three examples:

	opcode	operand(s)	description
no operand:	`cli`		clear the global interrupt flag
one operand:	`inc`	`r16`	increment the working register 16
two operands:	`ldi`	`r17,120`	load immediate decimal 120 to working register r17

With two operands we always have the destination first and then the source.

Most commands need only one word (16 bit) in Flash. An exception is e.g. the jump command (jmp k), which needs 2 words. This can be seen in the Flash vector table where two words are reserved per interrupt to place a jump instruction.
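
To see how a command and its operands fit into one 16-bit word, we can encode the ldi example from the table above by hand. This is a minimal sketch in plain C++ (runs on a PC); the function name `encode_ldi` is made up for this illustration, and the bit layout 1110 KKKK dddd KKKK is the one given for ldi in the AVR instruction set manual. Note that ldi only works with the registers r16 to r31.

```cpp
#include <stdint.h>

// Encode "ldi Rd, K" into its 16-bit opcode.
// Layout: 1110 KKKK dddd KKKK, with dddd = d - 16 (only r16..r31 allowed).
// The caller has to pass a register number d between 16 and 31.
constexpr uint16_t encode_ldi(uint8_t d, uint8_t k) {
  return static_cast<uint16_t>(0xE000
       | ((k & 0xF0) << 4)    // high nibble of K -> bits 11..8
       | ((d & 0x0F) << 4)    // d - 16           -> bits 7..4
       | (k & 0x0F));         // low nibble of K  -> bits 3..0
}

// ldi r17,120 : K = 120 = 0x78, dddd = 1  ->  0xE718
static_assert(encode_ldi(17, 120) == 0xE718, "ldi r17,120");
```

The resulting word is what the assembler writes to the hex file and shows in the list file for this command.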

Some commands we will need in our following exercise:

opcode	operand(s)	description	operation	flags	clocks
sbiw	Rd, K	Subtract Immediate from Word	Rd + 1:Rd ← Rd + 1:Rd - K	Z,C,N,V,S	2
ldi	Rd, K	Load Immediate	Rd ← K		1
push	Rr	Push Register on Stack	STACK ← Rr		2
pop	Rd	Pop Register from Stack	Rd ← STACK		2
sbi	A, b	Set Bit in I/O Register	I/O(A, b) ← 1		2
cbi	A, b	Clear Bit in I/O Register	I/O(A, b) ← 0		2
rjmp	k	Relative Jump	PC ← PC + k + 1		2
rcall	k	Relative Call Subroutine	PC ← PC + k + 1		3 / 4(1)
brne	k	Branch if Not Equal	if (Z = 0) then PC ← PC + k + 1		1 / 2
ret		Subroutine Return	PC ← STACK		4 / 5(1)

The shortcuts for the operands are:

Rd: destination (and source) working register
Rr: source working register
R: result after instruction is executed
K: constant data
k: constant address
b: bit GPR or SPR (3 bit: 0-7)
s: bit in the SREG (3 bit:0-7)
X,Y,Z: indirect address register (X=r27:r26, Y=r29:r28, and Z=r31:r30)
A: SPR address
q: displacement for direct addressing (6 bit: 0-63)

The shortcuts for the flags (see SREG later in this chapter): C: Carry flag, Z: Zero flag, N: Negative flag, V: two’s complement overflow flag, S: sign flag (N ⊕ V), H: Half carry flag, T: Transfer flag, I: global Interrupt flag.

Assembly code needs less space and can be much faster than code from a high-level language. Because today's controllers have much more memory and are clocked higher than some years ago, Assembly code is used less and less. The big disadvantage of Assembly code is that it is not portable. Assembly code runs only on the controller for which it was written.

"Just do it" Memory 3:

We will test a blink.asm program written in AVR Assembly language. First run the standard Arduino blink program (Blink.ino, which uses LED_BUILTIN) on an Arduino Uno (File > Examples > 01.Basics > Blink) and note how much Flash memory is needed.

Create a separate folder for this exercise! All files must reside in this folder. The Assembly code is a text file with the ending .asm. So copy the code below to a text-only file using your favorite editor and name the file blink.asm (not .txt! Windows users: activate the option to display the real file ending in the file explorer). We need software to "assemble" the text file and create a hex file with the machine code. For this we will use the free assembler from Gerhard Schmidt:
http://www.avr-asm-tutorial.net/gavrasm/index_en.html (Thanks Gerhard :)).
Go to the newly created directory using the cd command (Windows: first open terminal with cmd.exe). Check with ls -l if everything is as expected (Windows: dir). Assemble the file with the following command:

gavrasm blink.asm

Now we get a hex file (blink.hex), but also a list file with the Assembly code and the commands in hex. Add a printout of both files to your documentation.
The assembler directives with a leading dot (capital letters) are not commands. They help to get the addresses right. Labels with a colon (capital letters) correspond to addresses. The assembler functions LOW() and HIGH() return the low byte and the high byte of a 16-bit number (shown in hex in the list file).
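
What LOW() and HIGH() do can be reproduced in a few lines. This is a minimal sketch in plain C++ (runs on a PC); the names `low_byte` and `high_byte` are made up for this illustration and only mimic the assembler functions.

```cpp
#include <stdint.h>

// Low byte (bits 7..0) of a 16-bit number, like LOW() in the assembler.
constexpr uint8_t low_byte(uint16_t value) {
  return static_cast<uint8_t>(value & 0xFF);
}

// High byte (bits 15..8) of a 16-bit number, like HIGH() in the assembler.
constexpr uint8_t high_byte(uint16_t value) {
  return static_cast<uint8_t>(value >> 8);
}

// 1000 = 0x03E8 and 4000 = 0x0FA0, the two constants used in blink.asm.
static_assert(low_byte(1000) == 0xE8 && high_byte(1000) == 0x03, "1000");
static_assert(low_byte(4000) == 0xA0 && high_byte(4000) == 0x0F, "4000");
```

In an Arduino sketch the built-in macros lowByte() and highByte() do the same job.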

      ;*******************************************************************************
      ;*    blink.asm
      ;*******************************************************************************
      .NOLIST
      .INCLUDE "m328Pdef.inc"         ;include AVR definitions file
      .LIST
      .CSEG                           ;code segment: all lines from here go to Flash
      .ORG    0x0000                  ;organize addr. 0x0000 for RESET: program start
      RESET:  rjmp    SETUP           ;jump to SETUP (skip the ISR vector table)
      .ORG    INT_VECTORS_SIZE        ;organize address after vector table
      SETUP:  sbi     DDRB,5          ;PB5 = OUTPUT (Mega PB7)
      LOOP:   sbi     PORTB,5         ;LED on
              ldi     YL,LOW(1000)    ;parameter for subroutine Y = 1000
              ldi     YH,HIGH(1000)   ;(Y is external 16 bit counter)
              rcall   DELAY           ;call subroutine DELAY with parameter 1000ms
              cbi     PORTB,5         ;LED off
              ldi     YL,LOW(1000)    ;delay(1000)
              ldi     YH,HIGH(1000)   ;
              rcall   DELAY           ;
              rjmp    LOOP            ;endless loop
      DELAY:  push    XL              ;save the 4 needed register to stack, so
              push    XH              ;that they are not changed by the subroutine
              push    YL              ;the 2 double reg. X and Y (16 bit) are now free
              push    YH              ;for local use
      DELAYE: ldi     XL,LOW(4000)    ;X = (t-tT)/4tT = 1ms/4*62,5ns ~ 4000
              ldi     XH,HIGH(4000)   ;initialize internal loop with 4000 for 1 ms
      DELAYI: sbiw    X,1             ;decrement internal loop (counter X)
              brne    DELAYI          ;get out from internal loop if X = 0
              sbiw    Y,1             ;decrement external loop (counter Y)
              brne    DELAYE          ;get out from external loop if Y = 0
              pop     YH              ;recover the 4 register from stack
              pop     YL
              pop     XH
              pop     XL
              ret                      ;return to main loop (return address on stack)
      .EXIT                            ;end of program
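
The value 4000 for the inner loop can be checked with the clock counts from the instruction table (sbiw 2 clocks, brne 2 clocks if the branch is taken and 1 clock if not). This is a minimal sketch in plain C++ (runs on a PC), assuming a 16 MHz clock; the helper names are made up for this illustration.

```cpp
#include <stdint.h>

// Clock cycles taken from the instruction table above.
constexpr uint32_t kSbiwCycles   = 2;
constexpr uint32_t kBrneTaken    = 2;
constexpr uint32_t kBrneNotTaken = 1;

// Cycles spent in the inner loop (DELAYI) for a start value n >= 1 in X:
// n - 1 rounds with a taken branch, one last round that falls through.
constexpr uint32_t inner_loop_cycles(uint32_t n) {
  return (n - 1) * (kSbiwCycles + kBrneTaken) + (kSbiwCycles + kBrneNotTaken);
}

// Duration in nanoseconds (rounded down) at a clock of f_mhz MHz.
constexpr uint32_t cycles_to_ns(uint32_t cycles, uint32_t f_mhz) {
  return cycles * 1000 / f_mhz;
}

// 4000 rounds need 15999 cycles, i.e. just under 1 ms at 16 MHz.
static_assert(inner_loop_cycles(4000) == 15999, "inner loop of blink.asm");
static_assert(cycles_to_ns(15999, 16) == 999937, "about 1 ms");
```

The few extra cycles for the two ldi commands and the outer sbiw/brne per round are neglected here, which is why the comment in the listing uses "~".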

To program our hex file, we will use the programmer software included in Arduino named avrdude. We can find the command line we need in the Arduino terminal window at the bottom. For this we have to go to File > Preferences and enable Show verbose output during: for compilation and upload. Recompile your Blink.ino sketch and search for the last command line with avrdude in it.
In Linux:
Copy the command line from the Arduino terminal window to a terminal and change only the path to your folder and the name of the blink file. Execute the command to program.
In Windows:
To avoid problems with the path and blank spaces in Windows, we copy the files avrdude.exe, libusb0.dll and avrdude.conf to our folder. They can be found in the Arduino folders C:\Program Files (x86)\Arduino\hardware\tools\avr\bin and C:\Program Files (x86)\Arduino\hardware\tools\avr\etc. Now we strip our command to:

avrdude -v -patmega328P -carduino -PCOMx -b115200 -D -Uflash:w:blink.hex

and execute it after changing the directory to our folder (command cd).
Note how much Flash memory is needed by the assembler program.

Inline assembler

It is also possible to use assembler inside the Arduino (C) code. Here is an example with the Assembly instruction nop (no operation), which does nothing but kill time (exactly 1/16 µs with a 16 MHz crystal):

    asm ("nop \n");

The Assembly code is enclosed in parentheses preceded by the compiler keyword asm or __asm__. The assembler instructions are placed inside quotation marks and terminated with the escape sequence for the linefeed character, '\n', because the avr-as assembler used by Arduino requires a single instruction per line.

By looking in the Arduino libraries we see that inline assembler is used to define among other things new functions. In the header file hardware/arduino/avr/cores/arduino/Arduino.h we find the following code:

    // avr-libc defines _NOP() since 1.6.2
    #ifndef _NOP
    #define _NOP() do { __asm__ volatile ("nop"); } while (0)
    #endif

so we could write _NOP(); instead of asm ("nop \n");.

General purpose register GPR (r0-r31)

The 32 general purpose registers (also called the register file) are the workhorses and are needed by many instructions. In particular, all arithmetic and logic instructions pass through these registers. In the instruction set Rd (destination) and Rr (source) are used for the GPR. A typical command is the ADD or ADC command:

Mnemonic	Operands	Description	Operation	Flags	Clocks
add	Rd, Rr	Add without Carry	Rd ← Rd + Rr	Z,C,N,V,S,H	1
adc	Rd, Rr	Add with Carry	Rd ← Rd + Rr + C	Z,C,N,V,S,H	1

Special-purpose register SPR

We have already seen the SPR DDR, PIN and PORT, and know that we can manipulate them directly from Arduino (using the whole port or setting and clearing bits with masks). This makes e.g. sense if we want to use a whole port as in our SSD example. It can reduce and simplify the source code.
But another reason can be that we want to use features of our ATmega controller that are not implemented in Arduino (e.g. differential input for the Analog to Digital Converter), or want access to data we don't see in Arduino, like the Status REGister SREG or the Stack Pointer SP.
Often each single bit in the SPR has its own role to play and must be manipulated separately. For the first 32 SPR this can be done, as seen, with the Assembly commands sbi and cbi. For the other 32 SPR and the extended SPR, masking must be used.
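
Masking means combining the register value with a bit pattern that has a 1 only at the bit we are interested in. This is a minimal sketch in plain C++ (runs on a PC and works on an ordinary byte value instead of a real SPR); the helper names are made up for this illustration.

```cpp
#include <stdint.h>

// Mask with a single 1 at position `bit` (0..7).
constexpr uint8_t bit_mask(uint8_t bit) {
  return static_cast<uint8_t>(1u << bit);
}

// Set one bit: OR with the mask, all other bits stay unchanged.
constexpr uint8_t set_bit(uint8_t reg, uint8_t bit) {
  return static_cast<uint8_t>(reg | bit_mask(bit));
}

// Clear one bit: AND with the inverted mask, all other bits stay unchanged.
constexpr uint8_t clear_bit(uint8_t reg, uint8_t bit) {
  return static_cast<uint8_t>(reg & ~bit_mask(bit));
}

// Test one bit: AND with the mask and compare with zero.
constexpr bool test_bit(uint8_t reg, uint8_t bit) {
  return (reg & bit_mask(bit)) != 0;
}

static_assert(set_bit(0x00, 5) == 0x20, "only bit 5 is set");
static_assert(clear_bit(0xFF, 5) == 0xDF, "only bit 5 is cleared");
```

On the controller the same operations are usually written directly on the register, e.g. PORTB |= (1 << 5); to set and PORTB &= ~(1 << 5); to clear bit 5.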

A list of the SPR can be found at the end of the data sheet of each controller.

To better understand SPR's, let's take a closer look at one of them:

The Status REGister `SREG` (wiki)

It is one of the most important registers. It is often also called the flag register because it marks important states with bits (flag shown, bit set; flag not shown, bit reset).

It is a register mostly used by hardware to signal information about the results of operations (bits 0-5). Which flags are affected by an operation can be seen in the instruction set. Additionally we have a user flag (T) to use in programs to mark special events and the Interrupt flag to allow or disable interrupts globally. For each flag of the status register two commands exist to set or reset the flag (e.g. set and clt for the T flag).

Status register SREG

7	6	5	4	3	2	1	0
Interrupt	Transfer	Half carry	Sign	oVerflow	Negative	Zero	Carry
`I`	`T`	`H`	`S`	`V`	`N`	`Z`	`C`

Example 1:

For a 16-bit addition, for example, we first add the two low bytes with an ADD command. This sets the Carry flag if a carry occurs. The two high bytes are then added with an ADC command, thus adding the Carry bit if needed:

    ...
            ldi     r16,LOW(1000)   ;16 bit addition first number in r17:r16
            ldi     r17,HIGH(1000)  ;
            ldi     r18,LOW(2000)   ;second number in r19:r18
            ldi     r19,HIGH(2000)  ;
            add     r16,r18         ;result in r17:r16
            adc     r17,r19
    ...
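
The same byte-wise addition can be followed on a PC. The following is a minimal model in plain C++, not AVR code: the struct `Add16Result` and the function `add16` are names invented for this sketch, and the carry is kept in an ordinary variable instead of the SREG.

```cpp
#include <cstdint>

// Result of the modelled add/adc pair.
struct Add16Result {
  uint8_t low;    // corresponds to r16 after add
  uint8_t high;   // corresponds to r17 after adc
  bool carry;     // Carry flag after adc (overflow of the 16 bit result)
};

// Add two 16 bit numbers byte by byte, passing the carry from low to high.
Add16Result add16(uint16_t a, uint16_t b) {
  unsigned lowSum = (a & 0xFF) + (b & 0xFF);          // add r16,r18
  bool lowCarry = lowSum > 0xFF;                      // Carry flag after add
  unsigned highSum = (a >> 8) + (b >> 8) + (lowCarry ? 1u : 0u);  // adc r17,r19
  return {static_cast<uint8_t>(lowSum),
          static_cast<uint8_t>(highSum),
          highSum > 0xFF};
}
```

With 1000 (0x03E8) and 2000 (0x07D0), the low bytes give 0xE8 + 0xD0 = 0x1B8, so the low result is 0xB8 and the carry is set. The high bytes give 0x03 + 0x07 + 1 = 0x0B, so the result is 0x0BB8 = 3000.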

Example 2:

A branch if equal command (breq) will branch (jump) to another address if the operation before it signalled "equal", meaning the Zero flag was set. Equality is tested by subtracting the two numbers: they are equal if the result of the subtraction is zero. The same flag is set when a register reaches zero after a decrement.

    ...
            cpi     r16,100         ;compare r16 to 100
            breq    NEWLABEL        ;branch if equal to label NEWLABEL
            dec     r17             ;decrement r17
            breq    R17ZERO         ;branch if zero to label R17ZERO
    ...
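
How the Zero flag comes about in the two cases can be modelled in a few lines. This is a simplified host-side sketch in plain C++ that looks only at the Zero flag; the function names are hypothetical and the other flags that cpi and dec affect are ignored.

```cpp
#include <cstdint>

// cpi: the subtraction reg - constant is carried out, but only the flags
// are kept. Returns the resulting Zero flag.
bool zeroFlagAfterCompare(uint8_t reg, uint8_t constant) {
  uint8_t difference = static_cast<uint8_t>(reg - constant);
  return difference == 0;
}

// dec: the register is decremented in place (wrapping from 0 to 255).
// Returns the resulting Zero flag.
bool zeroFlagAfterDecrement(uint8_t &reg) {
  reg = static_cast<uint8_t>(reg - 1);
  return reg == 0;
}
```

A following breq would branch exactly when one of these functions returns true.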

"Just do it" Memory 4:

We want to use the T-flag (user flag) in Arduino and test our masking skills. Add the two missing lines (mask the SREG register with Arduino commands, no assembler) and test the program. Document the source and the output.

    void setup() {
      Serial.begin(9600);
      Serial.println(SREG,BIN);
                                 // Set the T-flag
      Serial.println(SREG,BIN);
                                 // Reset the T-flag
      Serial.println(SREG,BIN);
    }

    void loop() {}

Find the absolute address in SRAM of the SPR SREG in the ATmega328 data sheet.

"Just do it" Memory 5:

This is a good moment to revise some of our skills from the first module. We know pulse width modulation (PWM) from the chapter on alternating current in ELEctronic FUndamentals. PWM is a handy way to get a direct current voltage (DC 0-5V) from our Arduino if no Digital to Analog Converter (DAC) is at hand. The width of the pulse determines the height of our DC voltage.
The same method can be used to produce alternating voltages. Let's try with a sine wave. We save 128 values of a sine wave in an array (such arrays can be produced by an online calculator; see the link in the header of the sketch). The command analogWrite() in Arduino will produce a PWM signal corresponding to the value in the array. Test the following program with a Teensy 2.0. Use an oscilloscope to measure the frequency of the PWM (document the oscilloscope screen).

    // sinewave_pwm_analogWrite.ino (https://www.arduino.cc/en/Tutorial/PWM)
    // https://www.daycounter.com/Calculators/Sine-Generator-Calculator2.phtml

    const byte PWM_pin = 14;    // PB5 on Teensy

    byte  sine_wave[128] = {
      128,134,140,146,152,158,165,170,176,182,188,193,198,203,208,213,
      218,222,226,230,234,237,240,243,245,248,250,251,253,254,254,255,
      255,255,254,254,253,251,250,248,245,243,240,237,234,230,226,222,
      218,213,208,203,198,193,188,182,176,170,165,158,152,146,140,134,
      128,121,115,109,103,97,90,85,79,73,67,62,57,52,47,42,37,33,29,
      25,21,18,15,12,10,7,5,4,2,1,1,0,0,0,1,1,2,4,5,7,10,12,15,18,21,
      25,29,33,37,42,47,52,57,62,67,73,79,85,90,97,103,109,115,121
    };

    void setup() {}

    void loop() {
      for (int i=0; i<128; i++) {
        analogWrite(PWM_pin, sine_wave[i]);
        delay(10);
      }
    }
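
Instead of the online calculator, such a table can also be computed on a PC. The following is a sketch in plain C++ under one assumption: the values are scaled as 127.5 + 127.5·sin(2πi/128) and rounded to the nearest integer. The scaling actually used by the calculator is not documented here, so single entries may differ by one. The function name `makeSineTable` is invented for this example.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Compute one full sine period as 128 unsigned bytes (0..255).
// Assumed scaling: midpoint 127.5, amplitude 127.5, rounded to nearest.
std::array<uint8_t, 128> makeSineTable() {
  const double kPi = 3.14159265358979323846;
  std::array<uint8_t, 128> table{};
  for (int i = 0; i < 128; ++i) {
    double s = std::sin(2.0 * kPi * i / 128.0);
    table[i] = static_cast<uint8_t>(std::lround(127.5 + 127.5 * s));
  }
  return table;
}
```

The table starts in the middle of the range (index 0), reaches the maximum after a quarter period (index 32) and the minimum after three quarters (index 96). Replacing the sine by another function of i gives other wave forms.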

We will get two frequencies: the PWM frequency (digital signal) and the frequency of the analogue sine wave. We learned that RC filters help to eliminate unwanted frequencies. For this to work, the two frequencies must not be too close, because an RC filter does not have a very sharp cutoff. To get a good result, we choose a sine wave frequency 100 times lower than the PWM frequency. To get that frequency, replace the delay() command with the delayMicroseconds() command and calculate the right number of microseconds to get a frequency of about 30 Hz (not audible :().
We will now use a low pass filter to eliminate the higher PWM frequency. The cutoff frequency should be 100 Hz, and we will use a capacitor of 100 nF. Calculate the value of the resistor (f_c = 1/(2πRC)). Build the circuit and document the 30 Hz sine wave with the oscilloscope screen.
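
The two calculations can be written down as small helper functions. This is a sketch in plain C++ with hypothetical function names. It assumes an ideal first-order RC filter and ignores the time the loop itself and the PWM output need, so the real sine frequency will be somewhat lower than calculated.

```cpp
#include <cassert>
#include <cmath>

// Resistor value in ohms for an RC low pass: f_c = 1/(2*pi*R*C) solved for R.
double rcResistorOhms(double cutoffHz, double capacitanceFarad) {
  assert(cutoffHz > 0.0 && capacitanceFarad > 0.0);
  const double kPi = 3.14159265358979323846;
  return 1.0 / (2.0 * kPi * cutoffHz * capacitanceFarad);
}

// Time per table entry in microseconds, so that 'samples' entries
// make up one period of a sine wave with frequency 'sineHz'.
// Loop overhead is ignored (assumption).
double sampleDelayMicros(double sineHz, int samples) {
  assert(sineHz > 0.0 && samples > 0);
  return 1.0e6 / (sineHz * samples);
}
```

As an example with other values than in the exercise: a cutoff frequency of 1 kHz with 100 nF needs about 1.59 kΩ, and a 50 Hz sine wave from 128 table entries needs 156.25 µs per entry.
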
Our ATmega32u4 can do better! Fast PWM on Teensy 2.0 is cool because it is very simple to initialise and needs no interrupts. And we can boost our PWM frequency up to 16 MHz/256 = 62.5 kHz. Because Timer0 is used by Arduino for the millis() function, we will use the 16 bit Timer1. We have to set the right bits in the TCCRnA and TCCRnB registers. That's all. The byte in the OCR register (8 bit PWM) defines the pulse width from 0 (0 %) to 255 (100 %). Timer1 has 3 PWM outputs (14, 15 and 4). We will use OCR1A on Arduino pin 14 (PB5 on Teensy). Replace the code in setup() and loop() and save the sketch under a different name. Change the RC low pass filter (fc = 4.4 kHz) and the delayMicroseconds() function to get a 440 Hz sine wave! Document the new calculations and the oscilloscope screen. Add a loudspeaker (+ amplifier) to listen to the sound.
```
    ...
    void setup() {
      pinMode(PWM_pin, OUTPUT);
      // Fast PWM for Teensy Timer 1 (3 outputs)
      TCCR1A = 0xA9; // COM1A1,COM1B1,COM1C1,WGM10;
      TCCR1B = 0x09; // WGM12 (Fast PWM 8 Bit) CK/1  -> 62.5 kHz
      OCR1A = 0;     // pwm off
    }

    void loop() {
      for (int i=0; i<128; i++) {
        OCR1A = sine_wave[i];
        delayMicroseconds(yourNumberHere);
      }
    }
```

"Just do it" Memory 6:

We want to extend the program to get a function generator. Create three additional tables with 128 bytes each (sawtooth, rectangle, triangle or arbitrary). Add two switches to your board to select one of the four sounds. Use the switch statement to test your inputs. Watch the four wave forms on the oscilloscope and listen to the different sounds. Document the program and the screens.

Extended special-purpose register SPR

In newer controllers, the 64 SPRs were not enough, because of their enhanced features and supplementary hardware (e.g. USB for Teensy/Leonardo (ATmega32u4)). The addresses from 0x60 to 0xFF (96-255) in SRAM are used for the supplementary SPRs. Access to them in Assembly language is slower, because the commands in and out (1 clock cycle) can't be used; instead the commands load (ld) and store (st) are needed (2 clock cycles).
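
Which kind of access is possible depends only on the address. The following host-side sketch in plain C++ summarises this. It assumes the usual ATmega data address map, with the GPRs r0-r31 at 0x00-0x1F and the 64 SPRs at 0x20-0x5F (check the data sheet of your controller); the enum `Access` and the function `accessFor` are names invented for this illustration.

```cpp
#include <cstdint>

// Fastest way to reach a location in the data address space.
enum class Access {
  GprDirect,   // r0-r31, used directly by most commands
  InOutBit,    // first 32 SPRs: in/out and the bit commands sbi/cbi
  InOut,       // other 32 SPRs: in/out, single bits only with masks
  LoadStore    // extended SPRs and data memory: ld/st (2 clock cycles)
};

// Classify a data space address (assumed ATmega layout, see text above).
Access accessFor(uint16_t dataAddress) {
  if (dataAddress <= 0x1F) return Access::GprDirect;
  if (dataAddress <= 0x3F) return Access::InOutBit;
  if (dataAddress <= 0x5F) return Access::InOut;
  return Access::LoadStore;   // 0x60-0xFF extended SPRs, then SRAM
}
```

In Arduino code this difference is hidden: the compiler chooses the matching commands by itself.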

Data memory

The data memory area is the biggest block of volatile SRAM memory, sometimes called the heap. It is normally used to store variables, tables etc. At a given time, some parts of the heap are in use and some are free (unused and thus available for future allocations). It is the programmer's job to manage this memory allocation.

Mostly in Arduino we use static variables and memory management is no big concern, except the restricted size of the SRAM. But if we want or have to allocate memory dynamically this can be done in C. For more information you can read: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation. The improper use of dynamic memory allocation is frequently a source of bugs (security issues, program crashes and segmentation faults), so it is best avoided if possible.
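
If dynamic allocation really is needed, the basic pattern is always the same: allocate, check the result, use, free. Here is a minimal sketch with the standard C functions malloc() and free(); the function `sumOfDynamicBuffer` is only a made-up example and written for a PC, although the same pattern applies on the controller.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Allocate 'count' bytes on the heap, fill them with 0, 1, 2, ...
// and return their sum. Returns -1 if the allocation failed.
long sumOfDynamicBuffer(std::size_t count) {
  if (count == 0) return 0;                 // nothing to allocate
  uint8_t *buffer = static_cast<uint8_t *>(std::malloc(count));
  if (buffer == nullptr) return -1;         // always check: SRAM is small!
  long sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    buffer[i] = static_cast<uint8_t>(i);
    sum += buffer[i];
  }
  std::free(buffer);                        // give the memory back
  return sum;
}
```

Forgetting the check or the free() is exactly the kind of bug mentioned above.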

If the lack of data memory is a problem, it is better to change the microcontroller. New boards with better controllers and more memory are often available for the same price (e.g. Arduino Uno vs ESP boards).

Stack (wiki)

The stack is a part of the memory always needed if we use subroutines (functions) or interrupts (interrupt service routines (ISR)) in our program.

Even though the stack is a part of the SRAM, because of its importance in programming it will get its own chapter :).
