Brother, can you spare a GPIO?

The core recommendation of this post will be familiar to an experienced embedded systems engineer, yet it is worthy of note and a useful trick for engineers learning their trade. The “trick” in question:

Design embedded systems with accessible spare microcontroller GPIO(s)

Why?

If the focus is firmware development, then real-time responsiveness is likely critical to the performance of some portion of the system’s design. Perhaps a microcontroller’s Interrupt Service Routine (ISR) must respond to a hardware signal within 10 usecs, or a calculation must complete within 1 msec. With a spare GPIO and an oscilloscope, the firmware engineer has a fast, low-overhead method to measure key performance metrics. How might we use this trick? Examples include:

  • Measure the performance of firmware calculations.
  • Confirm the performance of an ISR’s response time, including jitter.
  • Measure key RTOS performance metrics, such as task context switching time and associated jitter.
  • Measure CPU idle time, indirectly measuring CPU usage.
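
As a rough illustration of the first bullet, the sketch below wraps a calculation in a GPIO pulse so that the pulse width seen on the oscilloscope approximates the calculation's execution time. The pin number (`PROBE_GPIO`), the `probe_set()` helper, and the `checksum()` workload are hypothetical names chosen for this example; only `gpio_set_level()` comes from the ESP-IDF GPIO driver. It also assumes that ESP-IDF builds define `ESP_PLATFORM`; off-target, a small stub stands in for the driver so the logic can be compiled and checked on a PC.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "driver/gpio.h"
#define PROBE_GPIO GPIO_NUM_4   /* assumption: any spare, scope-accessible pin */
/* Assumes the pin has already been configured as an output elsewhere. */
static void probe_set(int level) { gpio_set_level(PROBE_GPIO, (uint32_t)level); }
#else
/* Host-side stub: records the pin state instead of driving hardware. */
static int probe_level = 0;
static unsigned probe_edges = 0;
static void probe_set(int level)
{
    if (level != probe_level) {
        probe_edges++;          /* count each rising or falling edge */
    }
    probe_level = level;
}
#endif

/* Hypothetical workload: a simple additive checksum. */
static uint32_t checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* The pin is high only while the calculation runs; measure the pulse width. */
uint32_t timed_checksum(const uint8_t *data, size_t len)
{
    probe_set(1);
    uint32_t sum = checksum(data, len);
    probe_set(0);
    return sum;
}
```

The same pattern applies to the other bullets: raise the pin at the event of interest and lower it where the response begins, then read the interval off the scope.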

To illustrate this trick, we will measure the FreeRTOS task context switch time between two tasks where the task switch is due to a change in semaphore status. The specific target will be the ESP32 and its two-core design. This target recently caught my attention with the following note in the ESP32’s IDF SDK v1.0 release:

FreeRTOS: Task unblocking on other CPU now happens instantaneously

Really? As an engineer, I couldn’t help but wonder about the use of the word “instantaneously.” Is the unblocking speed really instantaneous? With tools in hand, and to help illustrate the usefulness of this post’s “favorite trick,” we seek an answer.

To help answer the question, a baseline measurement is needed. In our test code there are two tasks: a “release” task and a “wait” task. The “release” task periodically raises our spare GPIO level and then immediately “gives” a semaphore a token. The “wait” task waits forever on the same semaphore and immediately lowers the spare GPIO level when the task executes. The code in question is listed below.

Release Task Code:

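void release_task(void *pvParameter) {
   // Set up the test GPIO as an output before use
   gpio_pad_select_gpio(TEST_GPIO);
   gpio_set_direction(TEST_GPIO, GPIO_MODE_OUTPUT);
   while (1) {
      vTaskDelay(3);                 // block for 3 RTOS ticks between measurements
      gpio_set_level(TEST_GPIO, 1);  // rising edge: start of the measured interval
      xSemaphoreGive(m_sema);        // unblock the wait task
   }
}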

Wait Task Code:

void wait_task(void *pvParameter) {
   while (1) {
      xSemaphoreTake(m_sema, portMAX_DELAY);  // block until the release task gives the semaphore
      gpio_set_level(TEST_GPIO, 0);           // falling edge: end of the measured interval
   }
}

Main code snippet (baseline):

   xTaskCreatePinnedToCore(&release_task, "release_task", 512, NULL, 5, NULL, 0);
   xTaskCreatePinnedToCore(&wait_task, "wait_task", 512, NULL, 5, NULL, 0);

In this baseline code, both FreeRTOS tasks execute on the same core. Connecting an oscilloscope to our test spare GPIO, we see the following results:

Figure: Baseline Results

These results demonstrate consistent behavior, with the ESP32 and FreeRTOS taking about 7.6 usecs to release the semaphore and task switch to the “wait” task running on the same core as the “release” task. There is also GPIO API overhead in this measurement, which we assume is negligible.
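
One way to check the "negligible overhead" assumption, sketched here rather than taken from the original test code, is to emit a calibration pulse with two back-to-back `gpio_set_level()` calls. The pin changes state at roughly the same point inside each call, so the calibration pulse width approximates the GPIO API's contribution to the measured interval. The function names `gpio_overhead_pulse()` and `corrected_time_ns()` are hypothetical, as is the fallback pin number; the host-side stub exists only so the logic can be compiled off-target.

```c
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "driver/gpio.h"
#ifndef TEST_GPIO
#define TEST_GPIO GPIO_NUM_4    /* assumption: use the project's own spare pin */
#endif
#else
/* Host-side stub standing in for the ESP-IDF driver call. */
#define TEST_GPIO 4
static unsigned stub_calls = 0;
static int stub_level = 0;
static int gpio_set_level(int pin, uint32_t level)
{
    (void)pin;
    stub_calls++;
    stub_level = (int)level;
    return 0;
}
#endif

/* Back-to-back writes: the pulse width approximates the GPIO API overhead. */
void gpio_overhead_pulse(void)
{
    gpio_set_level(TEST_GPIO, 1);
    gpio_set_level(TEST_GPIO, 0);
}

/* Subtract the calibration pulse width from a measured width (both in ns).
 * Clamps at zero so a noisy reading cannot wrap around. */
uint32_t corrected_time_ns(uint32_t measured_ns, uint32_t overhead_ns)
{
    return (measured_ns > overhead_ns) ? (measured_ns - overhead_ns) : 0;
}
```

If the calibration pulse turns out to be a small fraction of the measured interval, the assumption holds; otherwise the subtraction gives a closer estimate of the switch time itself.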

We can also examine the “jitter” or variation in the time measurement. Most modern oscilloscopes contain a persistence feature. By zooming in to the falling edge of the pulse and turning on persistence, we can also visualize this measurement’s jitter. Here are the results for this baseline setup:

Figure: Baseline Jitter

The baseline jitter measurement shows a variation of approximately 10 nanoseconds.

Now, the software is modified with each task running on a separate ESP32 core. The main code snippet is changed to:

xTaskCreatePinnedToCore(&release_task, "release_task", 512, NULL, 5, NULL, 0);
xTaskCreatePinnedToCore(&wait_task, "wait_task", 512, NULL, 5, NULL, 1);

Once again, we confirm our results on the oscilloscope:

Figure: Tasks on Different ESP32 Cores

The results are surprising. The total task context switch time has increased to 12 usecs! I tried different core settings and the results were consistent:

  • If both tasks were on the same core, regardless of which core, the measured time was 7.6 usecs.
  • If the tasks were on different cores, regardless of which core was used for either task, the measured time was 12 usecs.
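
For reference, the arithmetic behind this comparison, using only the two measured values above:

```latex
\Delta t = t_{\text{cross-core}} - t_{\text{same-core}}
         = 12\ \mu\text{s} - 7.6\ \mu\text{s} = 4.4\ \mu\text{s},
\qquad
\frac{t_{\text{cross-core}}}{t_{\text{same-core}}} = \frac{12}{7.6} \approx 1.58
```

In other words, unblocking a task on the other core took roughly 58% longer than unblocking one on the same core in these tests.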

FWIW: The “jitter” measurements in the multi-core measurement were nearly identical to the single core measurement.

We have now used our “spare” GPIO to measure a key RTOS performance metric using FreeRTOS and the ESP32. These measurements have only raised more questions, especially regarding the contradiction between the ESP32 IDF release notes’ use of the word “instantaneously” and our measured results. Imagine if the target embedded system did not have a spare GPIO to enable such detailed measurements. The firmware engineer working on such a system would likely be in the dark, wandering the streets, and asking:

Brother, can you spare a GPIO?


Followup questions to consider:

  • Is the Visual GDB ESP32 package up-to-date?
  • Is there an inherently longer context switch between ESP32 cores?
  • If this measurement is “instantaneous”, how long was the task context switch between cores prior to version 1.0 of the IDF?
  • In these examples, the ESP32 Wi-Fi and Bluetooth stacks were not enabled. What will the impact be on such measurements when Wi-Fi is enabled? Will the jitter increase?
  • In comparison to the FreeRTOS semaphore, how fast might other inter-task communication mechanisms be? (Queues, etc.)

Equipment and targets used during the creation of this post:
