Real Time solutions on AT91SAM SoC
This page presents the two most important real-time Linux solutions, PREEMPT_RT and Xenomai, their setup and the performance that can be expected on AT91SAM SoCs. The study was carried out under the following conditions:
- The boards used are the AT91SAM9G20EK, AT91SAM9M10-G45-EK and AT91SAM9263-EK.
- The kernel versions used are 3.0.10 for the PREEMPT_RT patch and 2.6.38 for Xenomai.
- The root filesystem was generated using Buildroot and was mounted over the network through NFS.
- The toolchain used was gcc-4.4-2010q1 from CodeSourcery.
Stress during benchmarks
RT benchmarks are only meaningful under heavy stress, because only a loaded system is likely to exhibit its worst-case latencies. The following programs are used to stress the system:
- hackbench to stress the scheduler
- netcat to stress the Ethernet interface and generate a lot of interrupts
- ls and dd for global stress
Here is the script used to create load (doload.sh):
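#!/bin/sh
# Usage: doload.sh <duration in seconds>
# Network load: feed netcat.data every 7 s to a netcat server listening on port 5566
(while true; do cat /usr/share/netcat/netcat.data; sleep 7; done | netcat -vv -l -p 5566 ) &
a=$!
# Global load: copy zeros to /dev/null
dd if=/dev/zero of=/dev/null &
b=$!
# Scheduler load: restart hackbench continuously and kill it every 5 s
while true; do killall hackbench > /dev/null 2>&1; sleep 5; done &
d=$!
while true; do /bin/hackbench 1 > /dev/null 2>&1; done &
e=$!
# Filesystem load: list the whole root filesystem recursively
while true; do ls -lR / > /dev/null 2>&1; done &
f=$!
# Keep the load running for the requested duration, then stop all background jobs
sleep "$1"
kill $a $b $d $e $f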
When developing real-time applications with a system such as Linux, the typical scenario is the following:
- An event from the physical world happens and gets notified to the CPU by means of an interrupt
- The interrupt handler recognizes and handles the event, and then wakes up the user-space task that will react to this event
- Some time later, the user-space task will run and be able to react to the physical world event
Real-time is about providing a guaranteed worst-case bound on this reaction time, called latency.
All the following benchmarks will measure this latency.
Latency accumulates at different stages of the system: first the interrupt latency itself, then the time spent in the interrupt handler, then the latency of the scheduler and finally the time spent in the scheduler.
For better performance, the task can be done in kernel space or in an interrupt handler. The drawback in these cases is that development is more difficult and the code has restricted access to kernel or userspace facilities.
PREEMPT_RT
Short presentation
- PREEMPT_RT is a long-term project led by Linux kernel developers Ingo Molnar, Thomas Gleixner and Steven Rostedt.
- The goal is to gradually improve the Linux kernel regarding real-time requirements and to get these improvements merged into the mainline kernel
- PREEMPT_RT development works very closely with the mainline Linux development
- Many of the improvements designed, developed and debugged inside PREEMPT_RT over the years are now part of the mainline Linux kernel
- The project is a long-term branch of the Linux kernel that ultimately should disappear as everything will have been merged
- Development of real-time applications and development of drivers within the PREEMPT_RT patch set applied on the Linux kernel is exactly the same as on a standard Linux kernel. There are no special APIs, no special library. The PREEMPT_RT patch set simply offers more reliable worst-case latencies and more configurable system behaviour (mostly through the configuration of interrupt threads priorities).
Setup
- PREEMPT_RT is delivered as a patch against the mainline kernel
- It is best to use a board supported by the mainline kernel; otherwise the PREEMPT_RT patch may not apply cleanly and may require some adaptations
- Many official kernel releases are supported, but not all. For example, 2.6.31 and 2.6.33 are supported, but not 2.6.32. (At the end of November 2011 the versions available were 2.6.22, 2.6.24, 2.6.25, 2.6.26, 2.6.29, 2.6.31, 2.6.33, 3.0 and 3.2)
- Enable CONFIG_ATMEL_TCLIB (located in the drivers/misc menu)
You are now running the real-time Linux kernel.
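One way to confirm this from the running system is to look at the kernel version string. The following is a minimal sketch, assuming that the string printed by uname -v contains "PREEMPT RT" on a kernel built with the patch, as the 3.0.12-rt30+ results listed on this page suggest. Other kernel versions may use a different marker, and is_rt_version is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the version string in $1 (as printed by `uname -v`)
# contains the "PREEMPT RT" marker (assumed marker, see above).
is_rt_version() {
    case "$1" in
        *"PREEMPT RT"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the kernel that is currently running
if is_rt_version "$(uname -v)"; then
    echo "PREEMPT RT kernel detected"
else
    echo "no PREEMPT RT marker in: $(uname -v)"
fi
```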
Of course, some system configuration remains to be done, in particular setting appropriate priorities to the interrupt threads, which depends on your application.
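The priority tuning can be sketched as follows, assuming the interrupt threads appear in the process list as irq/<number>-<name> and that the ps and chrt commands on the target accept the options shown (a BusyBox ps may need different options). The helper names irq_thread_pids and set_irq_prio are hypothetical, and the eth0 pattern and priority 80 in the usage note are illustrative values, not the ones used for the benchmarks.

```shell
#!/bin/sh
# Print the PIDs of interrupt threads whose name contains $1.
# Input: "PID COMMAND" lines on stdin, for example from `ps -eo pid,comm`.
irq_thread_pids() {
    awk -v pat="$1" '$2 ~ /^irq\// && index($2, pat) { print $1 }'
}

# Give SCHED_FIFO priority $2 to every interrupt thread matching $1.
set_irq_prio() {
    ps -eo pid,comm | irq_thread_pids "$1" | while read -r pid; do
        chrt -f -p "$2" "$pid"
    done
}
```

For example, set_irq_prio eth0 80 would raise the Ethernet interrupt thread above the other interrupt threads, provided its name contains eth0.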
Git users can also clone this git tree: git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git
Benchmark
Latency from external signal (through GPIO)
This benchmark reproduces a common use case: a signal comes through a GPIO and, after this event, a signal is output on another GPIO.
All the code runs in userspace using the sysfs gpio interface. It shows the worst case we can expect when the code handling the event is located in a userspace application. Lower latencies can be achieved by handling the event directly inside the kernel, but implementing application-specific code within the kernel is usually much more difficult. What is measured here is the full scheduling latency plus the time spent in the task to set the output GPIO, including the switch to kernel space.
For AT91SAM9M10-G45-EK and AT91SAM9G20EK boards, GPIOs from the ISI connector are used:
- PB26 is used as input
- PB10 is used as output
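The sysfs GPIO numbers used in the scripts on this page (74, 75 and 90) follow from the pin names. Here is a minimal sketch of the conversion, assuming the numbering of the AT91 kernels of that period, where the first pin of bank A is GPIO 32 and each bank holds 32 pins. The helper name at91_gpio_number is hypothetical.

```shell
#!/bin/sh
# Convert an AT91 pin name such as PB26 to its sysfs GPIO number.
# Assumption: bank A starts at GPIO 32 and each bank holds 32 pins.
at91_gpio_number() {
    bank=$(printf '%s' "$1" | cut -c2)   # bank letter, e.g. B
    pin=$(printf '%s' "$1" | cut -c3-)   # pin index within the bank, e.g. 26
    case "$bank" in
        A) idx=0 ;;
        B) idx=1 ;;
        C) idx=2 ;;
        D) idx=3 ;;
        E) idx=4 ;;
        *) echo "unknown bank in $1" >&2; return 1 ;;
    esac
    echo $((32 + 32 * idx + pin))
}

at91_gpio_number PB26   # prints 90 (the input pin)
at91_gpio_number PB10   # prints 74 (the output pin)
```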
Both signals are connected to an oscilloscope which is set up in overlay mode in order to get the worst latency between input and output. The oscilloscope triggers on the high-to-low transition on PB26.
The external signal is generated using another board with this kind of command line:
echo 75 > /sys/class/gpio/export
echo out > /sys/class/gpio/gpio75/direction
while true; do echo 1 >/sys/class/gpio/gpio75/value; ./microsleep ; echo 0 >/sys/class/gpio/gpio75/value; ./microsleep ; done
microsleep is just a simple C program calling nanosleep(); for the tests, the sleep value was 30 ms.
Here is the script used to set up the GPIO (prepare_gpio.sh) on the target board:
#!/bin/sh
echo 74 > /sys/class/gpio/export
echo out > /sys/class/gpio/gpio74/direction
echo 90 > /sys/class/gpio/export
echo in > /sys/class/gpio/gpio90/direction
echo both > /sys/class/gpio/gpio90/edge
Then the following command line is run:
./prepare_gpio.sh ; chrt -f 99 ./rttest_gpio & while true; do ./doload.sh 65 ; done
The source code is available in rttest_gpio.c.
This code is compiled with this simple command line:
arm-none-linux-gnueabi-gcc -Wall -o rttest_gpio rttest_gpio.c -lrt
On the following oscilloscope snapshot, we can see the result after 12 hours on AT91SAM9M10-G45-EK with a 3.0.10-rt27 kernel with PREEMPT_RT_FULL selected:
- The input signal is in green and the output signal is in yellow.
- As the oscilloscope triggered on the falling edge of the input signal, we barely see this signal at the left of the snapshot.
- The medium green band is due to a small noise on the reference ground.
- Time base is 1 ms per division and vertical scale is 1 V per division.
On the following oscilloscope snapshot, we can see the result after 4 hours on AT91SAM9M10-G45-EK with a 3.0.10-rt27 kernel without PREEMPT:
- The input signal is in green and the output signal is in yellow.
- The input signal has a 30 ms period. The output signal fills these 30 ms and overlaps the next input period, so we can only conclude that the maximum latency is at least 30 ms.
- The medium green band is due to a small noise on the reference ground.
- Time base is 5 ms per division and vertical scale is 1 V per division.
On the following oscilloscope snapshot, we can see the result after 11 hours on AT91SAM9G20-EK with a 3.0.10-rt27 kernel with PREEMPT_RT_FULL selected:
- The input signal is in yellow and the output signal is in green.
- As the oscilloscope triggered on the falling edge of the input signal, we barely see this signal at the left of the snapshot.
- Time base is 500 us per division and vertical scale is 1 V per division.
On the following oscilloscope snapshot, we can see the result after 11 hours on AT91SAM9G20-EK with a 3.0.10-rt27 kernel without PREEMPT:
- The input signal is in yellow and the output signal is in green.
- The input signal has a 30 ms period. The output signal fills these 30 ms and overlaps the next input period, so we can only conclude that the maximum latency is at least 30 ms.
- Time base is 5 ms per division and vertical scale is 1 V per division.
Kernel version used was 3.0.10-rt27, with PREEMPT_RT_FULL and without PREEMPT:
| Board | Max latency with PREEMPT_RT_FULL | Max latency with NO PREEMPT |
| AT91SAM9M10-G45-EK | 618us | >30ms |
| AT91SAM9G20-EK | 608us | >30ms |
Timer latency
This benchmark measures the timer latency using cyclictest, a high-resolution test program.
What is measured here is the full scheduling latency plus the short time spent in the task to read the current time.
There are several ways to test the latency:
- Using timer_settime (high-resolution POSIX timer), no option with cyclictest
- Using setitimer (high-resolution interval timer), option -s with cyclictest
- Using clock_nanosleep (high-resolution sleep with clock selection), option -n with cyclictest
- Using nanosleep (high-resolution sleep), option -n -s with cyclictest
Here are the results for AT91SAM9G20 and AT91SAM9G45:
- AT91SAM9G20 - 3.0.12-rt30+ PREEMPT RT - $cyclictest -m -a -t -n -p99
- Command used - $cyclictest -m -a -t -n -p99
- Benchmark duration: 44 h 30 min
- Number of samples: 160247021
- Min Latencies: 00016 us
- Avg Latencies: 00087 us
- Max Latencies: 00177 us
- AT91SAM9G20 - 3.0.12-rt30+ PREEMPT RT - $cyclictest -m -a -t -p99
- Command used - $cyclictest -m -a -t -p99
- Benchmark duration: 16 h 40 min
- Number of samples: 060037201
- Min Latencies: 00076 us
- Avg Latencies: 00213 us
- Max Latencies: 02136 us
- AT91SAM9G20 - 3.0.12-rt30+ PREEMPT RT - $cyclictest -m -a -t -s -n -p99
- Command used - $cyclictest -m -a -t -s -n -p99
- Benchmark duration: 9 h 47 min
- Number of samples: 035257192
- Min Latencies: 00020 us
- Avg Latencies: 00100 us
- Max Latencies: 50229 us
- AT91SAM9G20 - 3.0.12 - $cyclictest -m -a -t -p99
- Command used - $cyclictest -m -a -t -p99
- Benchmark duration: 11 h 09 min
- Number of samples: 031919110
- Min Latencies: 00027 us
- Avg Latencies: 00380 us
- Max Latencies: 39823 us
- AT91SAM9G45 - 3.0.12-rt30+ PREEMPT RT - $cyclictest -m -a -t -p99
- Command used - $cyclictest -m -a -t -p99
- Benchmark duration: 54 h 29 min
- Number of samples: 196190335
- Min Latencies: 00070 us
- Avg Latencies: 00219 us
- Max Latencies: 01106 us
- AT91SAM9G45 - 3.0.12-rt30+ PREEMPT RT - $cyclictest -m -a -t -s -n -p99
- Command used - $cyclictest -m -a -t -s -n -p99
- Benchmark duration: 33 h 1 min
- Number of samples: 118900140
- Min Latencies: 00021 us
- Avg Latencies: 00106 us
- Max Latencies: 00264 us
- AT91SAM9G45 - 3.0.12-rt30+ PREEMPT RT - $cyclictest -m -a -t -s -p99
- Command used - $cyclictest -m -a -t -s -p99
- Benchmark duration: 29 h 22 min
- Number of samples: 105720602
- Min Latencies: 00100 us
- Avg Latencies: 02441 us
- Max Latencies: 03861 us
- AT91SAM9G45 - 3.0.12 - $cyclictest -m -a -t -p99
- Command used - $cyclictest -m -a -t -p99
- Benchmark duration: 11 h 02 min
- Number of samples: 030618980
- Min Latencies: 00032 us
- Avg Latencies: 00503 us
- Max Latencies: 94991 us
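The sample counts above can be cross-checked against the benchmark durations. Here is a small sketch, assuming cyclictest ran with its default interval of 1000 us (no -i option appears in the commands above). The helper name expected_samples is hypothetical.

```shell
#!/bin/sh
# Expected number of samples for a run of $1 hours and $2 minutes
# with one sample every $3 microseconds.
expected_samples() {
    awk -v h="$1" -v m="$2" -v us="$3" \
        'BEGIN { printf "%d\n", (h * 3600 + m * 60) * 1000000 / us }'
}

# 44 h 30 min at the assumed 1000 us interval
expected_samples 44 30 1000   # prints 160200000
```

This is within 0.03 % of the 160247021 samples reported for the first AT91SAM9G20 run. The difference corresponds to about 47 seconds, which is below the one-minute resolution of the reported duration.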
Xenomai
Short presentation
- Xenomai started in 2001 as a project aiming at emulating traditional RTOS.
- The initial goal was to make it easier to port programs to GNU/Linux by providing skins mimicking the APIs of traditional RTOS such as VxWorks, pSOS+, and VRTXsa as well as the POSIX API, and a “native” API.
- It was initially related to the RTAI project (as the RTAI / fusion branch), but it is now independent.
- Real-time applications in Xenomai cannot use the Linux device drivers, so special device drivers must be written for the devices that are involved with the real-time tasks. Those drivers cannot benefit from the existing hardware support in Linux, and must be rewritten independently.
- Real-time applications in Xenomai are generally written against the native skin (a Xenomai API) or the POSIX skin (which reproduces the traditional API in Linux but binds it to Xenomai calls)
Setup
- Download Xenomai sources at http://download.gna.org/xenomai/stable/
- Download one of the Linux versions supported by this release (see ksrc/arch/arm/patches/)
- Since version 2.0, Xenomai uses a split kernel/user building model. On the kernel side, a script called scripts/prepare-kernel.sh integrates Xenomai kernel-space support into the Linux sources.
- Kernel
  - Fast Context Switch Extension (FCSE): CONFIG_ARM_FCSE_GUARANTEED
  - High Resolution Timer Support: CONFIG_HIGH_RES_TIMERS (see PREEMPT_RT part)
  - CONFIG_ATMEL_TCLIB (see PREEMPT_RT part)
- Userspace
/home/gclement/$ export DESTDIR=/home/gclement/rootfs/
/home/gclement/$ arm-linux-gnueabi-gcc `xeno-config --skin=posix --cflags` -o rttest_xeno rttest.c `xeno-config --skin=posix --ldflags`
/home/gclement/$ file rttest_xeno
rttest_xeno: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.31, not stripped
- A more straightforward option is to use Buildroot, which now integrates support for Xenomai
Benchmarks
Benchmarks are done using the latency program provided with Xenomai. It comes in 3 flavors:
- User task: periodic user-mode task
As for the timer latency benchmark done with cyclictest, what is measured here is the full scheduling latency plus the short time spent in the task to read the current time.
- Kernel Task: in-kernel periodic task
What is measured here is the full scheduling latency plus the short time spent in the task to read the current time, but here the task is in kernel space, so there is no context switch between kernel and user space.
- Timer IRQ: in-kernel timer handler
What is measured here is only the interrupt latency and the time spent in the interrupt handlers.
It has been tested on 3 boards, with default configuration plus the following ones:
- CONFIG_HIGH_RES_TIMERS
- CONFIG_ATMEL_TCLIB
- CONFIG_ATMEL_TCB_CLKSRC
- CONFIG_XENOMAI
- CONFIG_XENO_DRIVERS_TIMERBENCH
- CONFIG_ARM_FCSE_GUARANTEED
- CONFIG_XENO_HW_UNLOCKED_SWITCH
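Before building, the options listed above can be checked against the kernel configuration file. Here is a minimal sketch, assuming a standard .config file in which an enabled option appears as NAME=y or NAME=m. The helper name missing_options is hypothetical.

```shell
#!/bin/sh
# Print every option from the argument list that is neither built in (=y)
# nor built as a module (=m) in the kernel configuration file $1.
# Returns non-zero if at least one option is missing.
missing_options() {
    config=$1
    shift
    status=0
    for opt in "$@"; do
        if ! grep -Eq "^${opt}=(y|m)\$" "$config"; then
            echo "$opt"
            status=1
        fi
    done
    return $status
}
```

A typical call from the kernel source directory would be missing_options .config CONFIG_HIGH_RES_TIMERS CONFIG_ATMEL_TCLIB CONFIG_XENOMAI, which prints nothing when all three options are enabled.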
Xenomai 2.6 was used with a 2.6.38 kernel.
The commands used for testing were the following:
- For User task:
while true; do ./doload.sh 65 ; done & latency
- For Kernel task:
while true; do ./doload.sh 65 ; done & latency -t1
- For Timer IRQ:
while true; do ./doload.sh 65 ; done & latency -t2
Here are the results for AT91SAM9G20-EK and AT91SAM9M10G45-EK:
- AT91SAM9G20-EK - 2.6.38.8-ipipe+ - $latency -h -H10000 -P99 -T36000 -q
- Command used - $latency -h -H10000 -P99 -T36000 -q
- Benchmark duration: 10 h 00 min
- Number of samples: 35999927
- Min Latencies: 8.720 us
- Avg Latencies: 32.945 us
- Max Latencies: 106.589 us
- AT91SAM9G20-EK - 2.6.38.8-ipipe+ - $latency -h -H10000 -t1 -P99 -T36000 -q
- Command used - $latency -h -H10000 -t1 -P99 -T36000 -q
- Benchmark duration: 10 h 00 min
- Number of samples: 35999817
- Min Latencies: 3.876 us
- Avg Latencies: 19.065 us
- Max Latencies: 42.542 us
- AT91SAM9G20-EK - 2.6.38.8-ipipe+ - $latency -h -H10000 -t2 -P99 -T36000 -q
- Command used - $latency -h -H10000 -t2 -P99 -T36000 -q
- Benchmark duration: 10 h 00 min
- Number of samples: 35999845
- Min Latencies: 0.000 us
- Avg Latencies: 6.607 us
- Max Latencies: 21.240 us
- AT91SAM9M10G45-EK - 2.6.38.8-ipipe+ - $latency -h -H10000 -P99 -T36000 -q
- Command used - $latency -h -H10000 -P99 -T36000 -q
- Benchmark duration: 10 h 00 min
- Number of samples: 36022992
- Min Latencies: 6.720 us
- Avg Latencies: 34.560 us
- Max Latencies: 118.080 us
- AT91SAM9M10G45-EK - 2.6.38.8-ipipe+ - $latency -h -H10000 -t1 -P99 -T36000 -q
- Command used - $latency -h -H10000 -t1 -P99 -T36000 -q
- Benchmark duration: 10 h 00 min
- Number of samples: 35999920
- Min Latencies: 1.087 us
- Avg Latencies: 19.414 us
- Max Latencies: 47.028 us
- AT91SAM9M10G45-EK - 2.6.38.8-ipipe+ - $latency -h -H10000 -t2 -P99 -T36000 -q
- Command used - $latency -h -H10000 -t2 -P99 -T36000 -q
- Benchmark duration: 10 h 00 min
- Number of samples: 35999830
- Min Latencies: -3.835 us
- Avg Latencies: 6.343 us
- Max Latencies: 21.106 us
For AT91SAM9263-EK, each test ran for at least 3 hours.
| Board | Max latency for User task | Max latency for Kernel Task | Max latency for Timer IRQ |
| AT91SAM9263-EK | 105us | 62us | 26us |