Real-Time Performance for Multi-Core Designs

May 20, 2020

In multi-core systems, it is almost impossible to predict how the cores will affect one another when they access shared resources. Until now, the entire system has been locked as a precaution while critical code segments are executed. But this does not have to be the case, because it comes at the expense of real-time behaviour.

Multi-core CPUs superseded single-core processors in desktop computers some time ago. Since then, they have caught on in large areas of embedded computing, where their price-performance ratio offers considerable benefits. As a result, high-performance embedded CPUs with just one core are becoming increasingly scarce. This trend is now also reaching areas such as functional safety, which has traditionally opted for single-core processors.

Safety-related systems are highly dependent on the deterministic behaviour of algorithms. This applies not only to the outcome of the computation, but also to its timing. It must be possible to specify a maximum runtime within which the algorithms are guaranteed to have run through to the end. This is the most important criterion in terms of real-time capability. When the scheduler's behaviour is known, this maximum runtime, the worst-case execution time (WCET), can be ascertained on single-core processors. One complication is the effect of interrupts, so these are generally disabled, at least while critical code is running. Other influences include caches and the behaviour of memory controllers. Generally speaking, however, the WCET can be calculated using analysis tools. Often, a combination of static code analysis and dynamic runtime analysis is used.
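How static and dynamic analysis complement each other can be illustrated with a small sketch. It assumes that a static analysis tool has produced a safe upper bound and that runtime measurements yield a high-water mark, which can only ever be a lower bound on the true WCET. The function name `wcet_report` and all figures are hypothetical and are not taken from any particular tool.

```python
def wcet_report(static_bound_us: float, measured_us: list[float]) -> dict[str, float]:
    """Compare a static WCET bound with measured execution times (microseconds)."""
    high_water_mark = max(measured_us)  # longest run observed so far
    if high_water_mark > static_bound_us:
        # A measurement above the "safe" bound means the analysis model is wrong.
        raise ValueError("measurement exceeds static bound")
    return {
        "high_water_mark_us": high_water_mark,
        "static_bound_us": static_bound_us,
        # Ratio > 1: how far the static bound lies above what was observed.
        "pessimism": static_bound_us / high_water_mark,
    }

# Hypothetical figures: static bound of 120 us, four measured runs.
report = wcet_report(120.0, [81.5, 90.2, 96.0, 88.7])
print(report)
```

The true WCET lies somewhere between the two values; a large ratio hints at an overly pessimistic bound or at measurements that have not yet triggered the worst case.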

Multi-core processors, on the other hand, experience significant interference between software running on different cores. This interference can be caused by the hardware or the software, and software running on one core can significantly slow the execution of critical code on a different core. Either it becomes impossible to establish the WCET, or the WCET has to be specified with such a large safety margin that the benefits of a multi-core system are cancelled out. Different industry sectors have reacted to this situation in different ways. For example, PikeOS has had SIL 4 certification in accordance with EN 50128 for multi-core platforms in railway applications since 2013, whereas the avionics sector has, to date, deactivated all but one of the cores on a multi-core processor (MCP). Changes do now appear to be under way in the aviation industry, however: a group of certification experts, the Certification Authorities Software Team (CAST), has published a position paper that sets out the conditions for real multi-core support (1).

The operating system plays a key role in the behaviour of a multi-core system. PikeOS's flexible time partition management allows the system integrator to specify precisely when an application is to run on a particular core, and for how long. However, this does not guarantee that the relevant algorithms will be able to run through to the end within their specified time window. In this scenario, system calls must be considered critical, especially if they are made on multiple cores at once. In particular, resources must be protected against competing access. Usually, a central lock is taken for this purpose in kernel space when a system call is made, so that software can only run on one core during the critical path. All other cores that request the lock in the meantime must wait actively (spinlock). The advantage of this design is that it is straightforward to implement. If the load is high because system calls are used heavily, however, it has drawbacks in terms of scalability.
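The central-lock design described above can be sketched in a few lines. This is only a user-space model under stated assumptions: Python threads stand in for cores, `central_lock`, `counters` and `spins` are hypothetical names, and the retry loop merely imitates active waiting. It is not PikeOS kernel code.

```python
import threading

central_lock = threading.Lock()       # one lock for the whole model "kernel"
counters = {"core0": 0, "core1": 0}   # stand-in for unrelated kernel resources
spins = {"core0": 0, "core1": 0}      # how often each core had to retry

def syscall(core: str) -> None:
    # Active waiting: keep retrying instead of going to sleep.
    while not central_lock.acquire(blocking=False):
        spins[core] += 1
    try:
        counters[core] += 1           # critical path, one core at a time
    finally:
        central_lock.release()

def run(core: str, calls: int) -> None:
    for _ in range(calls):
        syscall(core)

threads = [threading.Thread(target=run, args=(core, 10_000)) for core in counters]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counters, spins)
```

Note that the two threads never touch the same counter, yet each still has to queue for the same lock. Any non-zero value in `spins` is time a core spent waiting for work that did not concern it.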

Granular Locking for improved Performance

For these reasons, PikeOS 5.0 uses multiple, resource-specific locks (fine-grained locking) instead of a single, central lock. With this approach, partitions that do not share any resources no longer cause conflicts with one another. Additionally, the sections of code that are protected by locks are significantly shorter.
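The idea of resource-specific locks can be sketched as a lock table keyed by resource. The resource names and the `syscall` helper below are hypothetical, and Python threads again stand in for cores. The sketch illustrates the principle only and says nothing about how PikeOS organises its locks internally.

```python
import threading

# Hypothetical kernel resources, each owned by a different partition.
resources = {"partition_1_queue": 0, "partition_2_queue": 0}

# One lock per resource instead of a single kernel-wide lock.
locks = {name: threading.Lock() for name in resources}

def syscall(resource: str) -> None:
    with locks[resource]:             # only callers of the same resource contend
        resources[resource] += 1      # short critical section

def run(resource: str, calls: int) -> None:
    for _ in range(calls):
        syscall(resource)

threads = [threading.Thread(target=run, args=(name, 10_000)) for name in resources]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(resources)
```

Because each thread works on its own resource, neither ever waits for a lock held by the other.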

Figure 1: System-wide locks cause an overhead that grows with the number of cores in the system

Figure 1 shows what happens when a central lock is used. Application A, running on the first core, issues SYSCALL_A to the kernel. Kernel space KA is entered and requests the central lock CL. Because the lock is currently free, it is granted immediately and the system call can be executed in the context of the kernel. The lock is then released and the application is notified of the (successful) execution. In this example, application B runs on a second core and issues SYSCALL_B shortly after application A. Here too, kernel space (KB) is entered and the central lock CL is requested. However, the lock is still held by KA, so KB has to wait actively in a spinlock. Only once KA has released the CL can KB acquire it and execute the actual system call. The runtime for the application on the second core is significantly increased as a result.
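The sequence in Figure 1 can be reproduced with a small timeline calculation. The arrival times and durations below are invented for illustration, and the model assumes that the lock is handed over in order of arrival, which a real spinlock does not necessarily guarantee.

```python
from dataclasses import dataclass

@dataclass
class Syscall:
    name: str
    arrival: float    # time at which kernel space is entered
    duration: float   # time spent holding the lock

def central_lock_timeline(calls: list[Syscall]) -> dict[str, tuple[float, float]]:
    """Return {name: (wait, finish)} when a single lock serialises all calls."""
    lock_free_at = 0.0
    result = {}
    for call in sorted(calls, key=lambda c: c.arrival):
        start = max(call.arrival, lock_free_at)   # spin until the lock is free
        finish = start + call.duration
        result[call.name] = (start - call.arrival, finish)
        lock_free_at = finish
    return result

# Hypothetical timing: B enters the kernel one time unit after A.
timeline = central_lock_timeline([
    Syscall("SYSCALL_A", arrival=0.0, duration=5.0),
    Syscall("SYSCALL_B", arrival=1.0, duration=5.0),
])
print(timeline)
```

With these figures, SYSCALL_A does not wait at all, whereas SYSCALL_B spins for four time units before it can start and finishes at time 10 instead of 6.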

Figure 2: Symmetrical situation due to resource-specific locks

Figure 2 shows what happens when multiple, fine-grained locks are used. The procedure is essentially the same, except that the kernel requests different locks for the two system calls. KB therefore does not need to wait for a lock held by KA, and the situation is largely symmetrical for both cores. With a central lock, by contrast, the delay would be amplified significantly by each additional core. In other words, a system with a central lock scales very badly.

It should be noted that, when resources are explicitly shared between partitions, more than one core may still request the same lock even though fine-grained locking is used. The system integrator can adjust this in detail in the configuration.
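The scaling argument, including the case of explicitly shared resources, can be made concrete with a simple bound. The sketch assumes fair hand-over of each lock, at most one pending request per core and an identical critical-section length everywhere. The lock names and the function `worst_case_wait` are hypothetical.

```python
from collections import Counter

def worst_case_wait(lock_of_core: dict[str, str], critical_section: float) -> dict[str, float]:
    """Upper bound on spin time per core: every other core that uses the
    same lock may hold it once before this core gets its turn."""
    contenders = Counter(lock_of_core.values())
    return {
        core: (contenders[lock] - 1) * critical_section
        for core, lock in lock_of_core.items()
    }

# Central lock: all four cores queue for the same lock "CL".
central = worst_case_wait(
    {"core0": "CL", "core1": "CL", "core2": "CL", "core3": "CL"}, 5.0)

# Fine-grained: only core2 and core3 share a resource (and thus a lock).
fine = worst_case_wait(
    {"core0": "res_a", "core1": "res_b", "core2": "shm", "core3": "shm"}, 5.0)

print(central)
print(fine)
```

Under the central lock, each of the four cores may wait for up to three critical sections, and the bound grows with every core added. With resource-specific locks, only the two cores that share a resource can delay each other.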

Linux went through a similar development in the past (for more information, see the history of the removal of the big kernel lock (2)). However, PikeOS has a micro-kernel with a codebase that is many times smaller. In addition, PikeOS has incorporated the partitioning concept directly into the kernel design, meaning that parts of the kernel memory are allocated exclusively to particular partitions and the resources are stored in memory strictly according to partition and task. This has made it easier to define the local locks.

In addition to reducing the WCET, dispensing with the central spinlock has improved performance: processor time is no longer spent on active waiting, so the application can continue running straight away.

Faster Certification thanks to verified Tools

Another important improvement that comes with PikeOS 5.0 concerns the build system and the amount of work involved in certifying a complex system. The hypervisor concept enables multiple applications to be run on one machine. These applications are strictly isolated from one another by means of resource and time partitioning mechanisms. Dedicated communication channels are still possible, however. All applications are kept apart in the form of completely separate files. An application can be either an individual binary file native to PikeOS or a complex operating system with its own, separate file system. When the operating system starts up, config files control which partition an application runs in, which resources are available to it and which files and file systems it can access.

Two config files are available to the system integrator here:

1. The virtual machine initialisation table (VMIT): Configuration of all partitions, permission to access resources and communication channels

2. ROM file system configuration (RBX): Configuration of all (executable) files that are to be included in the ROM file system

Both config files are in XML format. The VMIT compiler and the ROM image builder convert these files into a binary format, which is read in and processed by a component of the PikeOS operating system at runtime. The binary form was chosen because it is less resource-intensive and avoids the need to certify a complicated XML parser. Certification does, however, require the correctness of the binary files to be verified. Until now, this involved a complicated process that required additional tools.

Additionally, any change to an XML file forces a repeat of the process. But with the release of PikeOS 5.0, a VMIT compiler and a ROM image builder are available as fully verified tools. This means that the correctness of the files can be established directly based on the original XML files. This considerably speeds up the certification workflow.
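To give an idea of the kind of check that unverified conversion tools make necessary, the following sketch compiles a small XML configuration into a binary table and then decodes the binary again to compare it with the source. Everything here is an assumption made for illustration: the XML elements, the attribute names and the record layout are invented and have nothing to do with the real VMIT or ROM image formats, and the article does not describe how the earlier verification process actually worked.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical configuration; element and attribute names are illustrative only.
XML_CONFIG = """
<partitions>
  <partition id="1" cores="1" memory_kib="4096"/>
  <partition id="2" cores="2" memory_kib="8192"/>
</partitions>
"""

RECORD = struct.Struct("<HHI")  # id, cores, memory_kib (little-endian, 8 bytes)

def parse(xml_text: str) -> list[tuple[int, int, int]]:
    root = ET.fromstring(xml_text)
    return [
        (int(p.get("id")), int(p.get("cores")), int(p.get("memory_kib")))
        for p in root.iter("partition")
    ]

def compile_to_binary(records: list[tuple[int, int, int]]) -> bytes:
    # Header: record count; body: fixed-size records.
    return struct.pack("<I", len(records)) + b"".join(RECORD.pack(*r) for r in records)

def decode(blob: bytes) -> list[tuple[int, int, int]]:
    (count,) = struct.unpack_from("<I", blob, 0)
    return [RECORD.unpack_from(blob, 4 + i * RECORD.size) for i in range(count)]

records = parse(XML_CONFIG)
blob = compile_to_binary(records)

# Back-check: the binary must decode to exactly what the XML specified.
matches = decode(blob) == records
print(len(blob), matches)
```

A check of this kind has to be repeated after every change to the XML source. If the compiler itself is verified, the correctness argument can rest on the XML file alone.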

Further Improvements

PikeOS 5.0 also features the following new additions and adjustments:

Faster access to kernel drivers
All driver classes also available in the kernel space
Supports file systems in the kernel space
DAL-B certification kit available for PowerPC
Partition callback hooks allow error counters to be implemented, in turn enabling security auditing
The APEX API complies with the latest version of the ARINC 653 standard, Part 1 Supplement 5 and Part 2 Supplement 4 (December 2019).
New BSP: i.MX 8 BSP
Supports the ARMv8 generic interrupt controller (GIC) v3

SYSGO has its own Linux distribution called "ELinOS", which many customers use as a Linux partition directly on PikeOS. The following changes are available in ELinOS version 7 for PikeOS:

Toolchain gcc v8.3
binutils v2.31
Linux Kernel 4.19 with real-time extensions
Standard library glibc v2.2
The host tools support 64-bit native versions on Windows and Linux

Version 7 of the "CODEO" development environment that is required for PikeOS and ELinOS features a number of changes that make it easier for the user to implement project configurations:

The ROM file system is now easier to create.
The validation can now be configured individually on a project-by-project basis
The ROM structure of the PikeOS boot files can now be displayed.
The structure of a binary virtual machine initialisation table can now be displayed in the XML format.
Git team support is included as standard
The PikeOS monitor displays information about shared memories.
Shared memories are automatically identified on a PikeOS target.
New drag & drop view

"Ready for Take-off"

Version 5.0 of PikeOS boasts significant advancements in terms of both performance and scalability compared to the previous version. Access to kernel drivers is now faster and the system is significantly more scalable with regard to multi-core architectures thanks to the introduction of fine-grained locks. Additionally, all driver classes and file systems are now available in the kernel.

In terms of certifiability, PikeOS 5.0 complies with the CAST-32A position paper and is therefore ready for the use of multi-core processors in the aviation industry. The verified configuration tools make certification of complex systems easier and faster. A DAL-B certification kit is now available for the PowerPC architecture. Further architectures and further certification kits that comply with IEC 61508, ISO 26262 and EN 50128 will follow shortly.

References

(1) https://blog.sysgo.com/cast-32a-multi-core-ready-to-become-airborne

(2) https://de.wikipedia.org/wiki/Big_Kernel_Lock
