Make or Buy - Development on Bare Metal vs. RTOS
When developing embedded systems that must be real-time capable, one of the first and most important questions is whether the applications should run under a real-time operating system (RTOS) or whether a bare-metal solution should be developed. Bare-metal programming is generally understood to mean writing an application directly for the hardware, without an intermediate programming interface such as an operating system. The application accesses the microcontroller's hardware registers directly. A typical approach is an endless loop that executes tasks one after another with a fixed computing time; this sequential execution is interrupted only when an interrupt event occurs. The bare-metal approach for embedded systems is therefore also known as the super-loop.
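The super-loop pattern can be sketched in C as follows. This is a minimal, host-compilable sketch under assumptions: `timer_isr`, `read_sensors`, `run_control` and `super_loop_iteration` are hypothetical names, and the counters merely stand in for real work so that the behavior can be checked on a PC. On a target, `main()` would call `super_loop_iteration()` inside `for (;;)`, and the ISR would be registered in the vector table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set by the (hypothetical) timer ISR, read by the main loop.
   'volatile' because it is shared between interrupt and loop context. */
static volatile bool tick_pending = false;

/* Counters standing in for real work so the behavior is observable. */
static uint32_t sensor_reads = 0;
static uint32_t control_runs = 0;

/* Hypothetical interrupt handler. On a real MCU it would be entered via
   the vector table and would also acknowledge the peripheral interrupt. */
void timer_isr(void)
{
    tick_pending = true;
}

static void read_sensors(void) { sensor_reads++; }
static void run_control(void)  { control_runs++; }

/* One pass of the super-loop. On target: for (;;) super_loop_iteration(); */
void super_loop_iteration(void)
{
    read_sensors();             /* fixed, sequential work on every pass */

    if (tick_pending) {         /* deviate only when the ISR has fired */
        /* A production version would test and clear the flag with
           interrupts disabled so that a tick cannot be lost. */
        tick_pending = false;
        run_control();
    }
}
```

All timing in this design follows from the order and duration of the calls inside the loop; there is no scheduler that could reorder them.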
In contrast to bare-metal programming, RTOS-based embedded systems place an operating system kernel with a scheduler and device drivers between the hardware and the application code. The advantage is that multithreading becomes possible. This is particularly relevant when, instead of a few clearly defined tasks with well-understood resource needs, many tasks are to run on the same hardware, some of which must be prioritized flexibly, or when the performance of several CPU cores is to be combined. In other words, real-time operating systems use their schedulers to run processes in parallel, flexibly and according to priority, and they take on responsibility for the functioning of the overall system.
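The core decision a priority-based RTOS scheduler makes can be sketched in a few lines of C. This is a simplified, hypothetical model rather than any particular kernel's implementation: `task_create`, `task_set_ready` and `schedule` are invented names, and the convention that a lower number means a higher priority is an assumption (kernels differ on this). A real scheduler would also save and restore CPU context and keep ready queues instead of scanning an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical task control block: lower number = higher priority. */
typedef struct {
    const char *name;
    unsigned priority;
    bool ready;
} task_t;

static task_t tasks[MAX_TASKS];
static size_t task_count = 0;

/* Register a task (initially not ready) and return its index. */
size_t task_create(const char *name, unsigned priority)
{
    assert(task_count < MAX_TASKS);
    tasks[task_count] = (task_t){ name, priority, false };
    return task_count++;
}

void task_set_ready(size_t id, bool ready)
{
    assert(id < task_count);
    tasks[id].ready = ready;
}

/* Fixed-priority decision: among all ready tasks, pick the one with the
   highest priority. Returns -1 if none is ready (an idle task would run). */
int schedule(void)
{
    int best = -1;
    for (size_t i = 0; i < task_count; i++) {
        if (tasks[i].ready &&
            (best < 0 || tasks[i].priority < tasks[best].priority)) {
            best = (int)i;
        }
    }
    return best;
}
```

At every scheduling point (timer tick, interrupt exit, blocking call), a kernel built this way would call `schedule()` and switch to the returned task, which is what lets a high-priority task preempt a lower-priority one as soon as it becomes ready.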
In general, the operating system kernel and device drivers form an interface between the actual application code and the microcontroller hardware. SYSGO's PikeOS takes a somewhat different approach: it uses a very small microkernel that contains only the bare essentials and is therefore easier to certify, while certain drivers run in user space. These drivers are usually placed in a dedicated driver partition and addressed through mechanisms such as inter-partition communication or queuing ports. As in PikeOS, the operating system kernel can also be implemented as a separation kernel and hypervisor, which enforces the strict separation of multiple applications required, for example, in security- and safety-critical systems.
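The queuing-port idea can be illustrated with a bounded FIFO of fixed-size messages between a sender partition and a driver partition. This is a minimal sketch under assumptions and not the PikeOS API: `qport_t`, `qport_send` and `qport_receive` are hypothetical names, and the depth and message size would in practice be fixed in the system configuration. In a partitioned system the kernel would typically own such a buffer and copy messages in and out, so that neither partition touches the other's memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QPORT_DEPTH   4   /* max queued messages (configuration constant) */
#define QPORT_MSG_MAX 32  /* max bytes per message */

/* Hypothetical queuing port: a ring buffer of fixed-size message slots. */
typedef struct {
    unsigned char data[QPORT_DEPTH][QPORT_MSG_MAX];
    size_t len[QPORT_DEPTH];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of queued messages */
} qport_t;

void qport_init(qport_t *p) { memset(p, 0, sizeof *p); }

/* Sender side (e.g. application partition). Returns false if the queue
   is full or the message is empty/too large; a real kernel might block. */
bool qport_send(qport_t *p, const void *msg, size_t len)
{
    if (p->count == QPORT_DEPTH || len == 0 || len > QPORT_MSG_MAX)
        return false;
    size_t tail = (p->head + p->count) % QPORT_DEPTH;
    memcpy(p->data[tail], msg, len);
    p->len[tail] = len;
    p->count++;
    return true;
}

/* Receiver side (e.g. driver partition). Copies the oldest message into
   buf (which must hold QPORT_MSG_MAX bytes) and returns its length,
   or 0 if the queue is empty. */
size_t qport_receive(qport_t *p, void *buf, size_t buf_size)
{
    if (p->count == 0)
        return 0;
    size_t n = p->len[p->head];
    assert(n <= buf_size);
    memcpy(buf, p->data[p->head], n);
    p->head = (p->head + 1) % QPORT_DEPTH;
    p->count--;
    return n;
}
```

Messages are delivered in FIFO order, and a full queue is reported to the sender instead of overwriting data, which keeps the behavior predictable under load.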
Bare-metal programming has the advantage that embedded software can be planned in detail for a specific use case, down to the lowest level, without accepting the overhead and potential errors of an operating system. In the best case, the result is a tailor-made solution that performs its tasks reliably and with minimal resources. This approach makes sense whenever the tasks to be executed are manageable, because it then allows the highest possible degree of deterministic behavior. Typical use cases are found above all in control applications that a simple microcontroller can handle.
On the other hand, bare-metal programmers face the difficulty of managing high complexity. If projects that are not small to begin with unexpectedly grow during implementation, for example because new features are added that were not foreseen in the initial planning, it can become difficult to keep an overview. If, in addition, the code is written in assembler and the board documentation is inadequate or a hardware abstraction layer is missing, even an experienced programmer can start to sweat. Operating systems can be criticized for their overhead and their functionality distrusted, but they undoubtedly reduce programming effort: they provide memory management and process management, and they let developers work in a high-level language such as C with libraries that make the programmer's life easier and that are often unavailable, or available only in limited form, on bare metal. Admittedly, working with an operating system also involves effort at times, for example when shared resources must be protected manually with semaphores. In return, however, developers gain a great deal of organizational freedom. This becomes clear when one considers that separating resources in time and space makes it possible to run security- and safety-critical applications alongside Linux/Android partitions on the same hardware.
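Protecting a shared resource with a semaphore boils down to a counter of free units that tasks take and give back. The following is a single-threaded model for illustration, not an RTOS API: `counting_sem`, `csem_init`, `csem_try_take` and `csem_give` are hypothetical names. In a real RTOS the take operation would block the calling task when no unit is free, and the kernel would make the check-and-decrement atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counting semaphore guarding a pool of identical units. */
typedef struct {
    unsigned count;  /* currently free units */
    unsigned max;    /* total units guarded by the semaphore */
} counting_sem;

void csem_init(counting_sem *s, unsigned units)
{
    s->count = units;
    s->max = units;
}

/* Try to acquire one unit. Returns false if none is free; a real RTOS
   would block the task here instead of returning. */
bool csem_try_take(counting_sem *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Release one unit. Giving back more than was taken is a programming error. */
void csem_give(counting_sem *s)
{
    assert(s->count < s->max);
    s->count++;
}
```

If two units represent, say, two DMA channels (a hypothetical example), a third task trying to take one must wait until another task gives a unit back; forgetting that `csem_give` call is exactly the kind of manual bookkeeping the text refers to.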
Deciding when bare-metal programming makes sense and when it does not is not always easy, even though both camps have proponents who categorically advocate one or the other. Beyond the aspects already mentioned, some further considerations, including non-technical ones, should be taken into account. Apart from the effort of writing programs directly for the target hardware and of bearing full responsibility for the application, for better or worse, it can become a project risk if key employees are unexpectedly absent for a longer period. Finding a replacement quickly is not easy even for C, Rust or Python programmers, and it becomes critical when assembly skills are required, especially when a project is well advanced and in-depth expertise is needed. On the technical side, reliability is often cited. It is certainly true that error-proneness generally increases with complexity and that a third-party component can, at best, be monitored rather than fully controlled, but errors can just as well occur in bare-metal code. The advantage of an operating system is that certain functionalities are field-tested and developed according to strict safety standards such as DO-178C in aviation or ISO 26262 in automotive, which justifies a reasonable degree of confidence in their functionality.
Networking and Security
As the degree of networking increases, the submodules of an application or embedded system are increasingly coming into focus as well, whether for maintenance or for monitoring a system's efficiency remotely, so the ability to change existing applications is becoming ever more important. Certainly, some modules will remain unnetworked in the future, for example because their owners do not want to expose them to IT security threats. Realistically, however, this will often no longer meet future requirements, so systems and their code will have to be modified. This can cause difficulties because in bare-metal architectures the timing and response times are determined by the structure of the code and inevitably change when functions are modified or added; as a result, the effort and duration of such modifications increase. An RTOS, in contrast, usually offers great flexibility in networked systems and often comes with built-in security functions. Security will also become an increasingly pressing issue, and users as well as programmers will have to decide whether they want networked or non-networked systems. For networked systems, an RTOS such as SYSGO's PikeOS offers pre-certified artifacts and supports new security concepts such as the zero-trust approach.
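Why functional additions change the timing of a bare-metal system can be shown with a small worst-case calculation. The sketch assumes a super-loop that polls every event once per pass, so an event arriving just after its check waits up to one full pass and is then handled within the next one, giving a conservative bound of two loop periods. `loop_wcet_us` and `meets_deadline` are hypothetical helpers, and interrupt overhead is ignored for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Worst-case duration of one super-loop pass: the sum of the worst-case
   execution times (WCETs) of all steps it executes. */
unsigned long loop_wcet_us(const unsigned long *step_wcet_us, size_t n)
{
    assert(step_wcet_us != NULL || n == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += step_wcet_us[i];
    return total;
}

/* Conservative response-time check: an event arriving just after it was
   polled waits up to one pass and is handled within the next one, so the
   worst-case response is bounded by two loop periods. */
bool meets_deadline(const unsigned long *step_wcet_us, size_t n,
                    unsigned long deadline_us)
{
    return 2 * loop_wcet_us(step_wcet_us, n) <= deadline_us;
}
```

With three invented steps of 200, 300 and 400 µs, the loop period is 900 µs and the bound of 1800 µs meets a 2000 µs deadline. Adding a 250 µs step, for example for remote diagnostics, raises the period to 1150 µs and the bound to 2300 µs, so the deadline is missed even though none of the existing code changed.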
In addition to basic OS functions such as scheduling, including time partitioning, a modern RTOS also offers (hard) real-time prioritization of tasks and thus enables deterministic response times for applications. Although interrupts can be used to implement a form of preemption on bare-metal systems, a mature RTOS offers far greater convenience and a rich range of functions here. Bare-metal programming, in turn, has advantages when a very short and deterministic boot time is required: an RTOS can take considerably longer to boot, in some configurations several seconds.
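Time partitioning can be pictured as a static schedule: a major frame of fixed length is divided into windows, each owned by one partition, and the frame repeats cyclically. The sketch below assumes an invented frame layout and hypothetical names (`time_window`, `major_frame_ms`, `active_partition`); it is not a PikeOS configuration format. It only shows why a partition can never consume CPU time outside its own windows.

```c
#include <assert.h>
#include <stddef.h>

/* One window of the major frame, owned by exactly one partition. */
typedef struct {
    unsigned partition;  /* partition owning this window */
    unsigned length_ms;  /* window length */
} time_window;

/* Invented example frame of 20 ms, fixed at configuration time. */
static const time_window frame[] = {
    { 1, 5 },  /* safety-critical control */
    { 2, 3 },  /* communication stack */
    { 1, 5 },  /* control again, to bound its waiting time */
    { 3, 7 },  /* e.g. a Linux guest */
};
#define FRAME_WINDOWS (sizeof frame / sizeof frame[0])

unsigned major_frame_ms(void)
{
    unsigned total = 0;
    for (size_t i = 0; i < FRAME_WINDOWS; i++)
        total += frame[i].length_ms;
    return total;
}

/* Partition that owns the CPU at absolute time t_ms. Because the schedule
   is static and repeats every major frame, no partition can run outside
   its windows, no matter how busy the others are. */
unsigned active_partition(unsigned long t_ms)
{
    unsigned long offset = t_ms % major_frame_ms();
    for (size_t i = 0; i < FRAME_WINDOWS; i++) {
        if (offset < frame[i].length_ms)
            return frame[i].partition;
        offset -= frame[i].length_ms;
    }
    assert(0 && "unreachable: offset always lies inside the frame");
    return 0;
}
```

With this 20 ms frame, `active_partition(5)` returns partition 2 and `active_partition(13)` returns partition 3; at 20 ms the frame starts over with partition 1. Splitting partition 1's budget into two windows is one way to shorten the longest gap between its time slices.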
Abstraction Facilitates Integration
A major advantage of an RTOS is the ecosystem that usually comes with it. Many real-time operating systems offer easy-to-use middleware stacks such as file systems, USB or TCP/IP that can be integrated with little effort. The same applies to device drivers and other third-party components. With an RTOS, these become plug-and-play components of the software, which can drastically speed up development. The decision to use third-party software can therefore be an important indicator that an RTOS should be preferred over a bare-metal scheduler.
Integrating such middleware into a bare-metal system would be very time-consuming and error-prone. An RTOS also makes it much easier to develop portable and reusable code, since an RTOS-based solution usually results in firmware with clearly defined tasks that lend themselves to reuse. Moreover, the abstraction between hardware and application provided by an operating system makes hardware changes and the integration of third-party modules much easier. Finally, an RTOS is the practical choice for modern, powerful processors with an MMU: on bare metal, the developer would have to set up and manage virtual addressing and memory protection by hand.
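The abstraction between hardware and application is commonly expressed in C as an interface struct of function pointers. In this sketch, `uart_if`, `app_report_temperature` and the RAM-backed `mock_uart` are hypothetical names; a target build would supply a backend that writes to the real UART registers, while the application function stays unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical hardware abstraction for a serial port. The application
   depends only on this struct, never on registers. */
typedef struct {
    void *ctx;  /* driver-private state */
    size_t (*write)(void *ctx, const char *buf, size_t len);
} uart_if;

/* Portable application code: formats a status line and hands it to
   whatever backend the interface points to. */
size_t app_report_temperature(const uart_if *uart, int temp_c)
{
    char line[32];
    int n = snprintf(line, sizeof line, "T=%dC\n", temp_c);
    assert(n > 0 && (size_t)n < sizeof line);
    return uart->write(uart->ctx, line, (size_t)n);
}

/* Host-side mock backend: captures output in RAM for unit tests.
   A target backend would write to the UART data register instead. */
typedef struct {
    char buf[64];
    size_t used;
} mock_uart;

static size_t mock_write(void *ctx, const char *buf, size_t len)
{
    mock_uart *m = ctx;
    size_t room = sizeof m->buf - 1 - m->used;  /* keep space for '\0' */
    if (len > room)
        len = room;
    memcpy(m->buf + m->used, buf, len);
    m->used += len;
    m->buf[m->used] = '\0';
    return len;
}
```

Because the application only sees `uart_if`, the same code can run against the mock on a PC for testing and against a real driver on the target, which is the kind of reuse and portability described above.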
Safety, Security and Certification
A modern RTOS plays to its strengths when the system to be developed is extensive and safety-critical and requires certification according to national or industry-specific standards, whether in aviation, railways, medical or industrial applications. Even small changes to the code of a bare-metal solution necessitate a complete, time-consuming and therefore expensive recertification. With an RTOS, the effort is much lower because only the affected parts need to be re-evaluated. In addition, parts that have already been certified can be reused, which generally reduces certification costs.
Since critical systems usually contain less critical components, such as a communications stack, in addition to their core functionality, an RTOS based on a separation kernel and hypervisor is often recommended here. With such a real-time operating system, critical and non-critical applications can be strictly separated through resource and time partitioning, so that mixed-criticality systems can be implemented on a single hardware platform. The understandable concern that interference between partitions might compromise safety is addressed by proven practice under strict safety standards, which attests to an extremely high level of reliability for such systems. Individual partitions can host different guest operating systems: less critical applications can run on an embedded Linux such as ELinOS, for example, while critical applications run on ARINC 653 (aerospace), AUTOSAR (automotive) or the native RTOS. With SYSGO's PikeOS, POSIX partitions can also be used for moderately critical tasks for which Linux would be too risky.
During partitioning, all spatial and temporal resources are allocated statically. Each application is guaranteed access to the resources allocated to it but has no access to the resources of other partitions. Strict enforcement of this separation ensures that applications are completely isolated from each other and can communicate only via the hypervisor or explicitly configured channels. This prevents a bug in a non-critical application from propagating through the system and affecting critical applications. For example, PikeOS allows a Linux-based subsystem and a safety-critical application running on its own proprietary operating system to share a single CPU platform. All partitions run in user mode and therefore cannot affect the kernel, which runs in privileged kernel mode.
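The static allocation and the rule that partitions communicate only over configured channels can be modeled as a build-time table plus two checks. This is a conceptual sketch with invented addresses, partition numbers and names (`partition_cfg`, `channel_cfg`, `mem_access_allowed`, `channel_allowed`); in a real separation kernel the MMU or MPU enforces the memory ranges in hardware and the kernel mediates every message, rather than a C function being called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static memory assignment, fixed at build time. */
typedef struct {
    uint32_t mem_base;
    uint32_t mem_size;
} partition_cfg;

static const partition_cfg partitions[] = {
    [0] = { 0x20000000u, 0x10000u },  /* safety-critical application */
    [1] = { 0x20010000u, 0x40000u },  /* Linux guest */
};
#define NUM_PARTITIONS (sizeof partitions / sizeof partitions[0])

/* Explicitly configured one-way channels (sender -> receiver). */
typedef struct { unsigned from, to; } channel_cfg;

static const channel_cfg channels[] = {
    { 0, 1 },  /* critical app may publish status to the Linux guest */
};

/* Spatial separation: may partition p access [addr, addr + len)?
   Written so that addr + len cannot overflow. */
bool mem_access_allowed(unsigned p, uint32_t addr, uint32_t len)
{
    assert(p < NUM_PARTITIONS);
    const partition_cfg *c = &partitions[p];
    return addr >= c->mem_base &&
           len <= c->mem_size &&
           addr - c->mem_base <= c->mem_size - len;
}

/* Communication is allowed only over channels listed in the configuration. */
bool channel_allowed(unsigned from, unsigned to)
{
    for (size_t i = 0; i < sizeof channels / sizeof channels[0]; i++)
        if (channels[i].from == from && channels[i].to == to)
            return true;
    return false;
}
```

In this example, partition 1 (the Linux guest) cannot touch partition 0's memory, and because only the channel from 0 to 1 is configured, it cannot send messages to the critical application either.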
Because of the strict separation of applications, approaches based on a separation kernel and hypervisor also have significant advantages when certifying critical systems. Certification artifacts can be reused, and when changes are made, only the new or modified partitions need to be recertified, which both shortens time to market and noticeably reduces certification costs. In addition, PikeOS itself is certified to Common Criteria EAL 3+, so that a high level of IT security can be achieved alongside functional safety. This is particularly important for applications that communicate with the outside world. Bare-metal applications can be secured behind an edge perimeter, but once this barrier has been breached, no further security mechanisms are in place to limit the damage. Additional defense-in-depth measures, such as heuristic analyses or mitigation techniques like Control Flow Integrity, are particularly useful here.
With its partitioning, PikeOS also offers a relatively simple migration path from bare-metal implementations to RTOS-based systems. As a first step, existing bare-metal applications can be placed in their own partitions; they can then use API calls to communicate with the kernel and, through it, with other applications.
Communicating Systems Need an RTOS
Ultimately, the choice between bare metal and an RTOS is a classic make-or-buy decision. Economic aspects play a role, as does the question of where embedded systems are heading, especially with regard to the degree of networking. Developing small and simple systems on bare metal makes sense, but an RTOS offers economic advantages in terms of return on investment (ROI) and total cost of ownership (TCO), as well as technological advantages with respect to foreseeable future developments. In addition, reusing existing code and, where applicable, certification artifacts usually shortens time to market. For the same reasons, new functions can be integrated quickly, economically and in a functionally safe manner with an RTOS, and with a hypervisor-based RTOS in particular. RTOS-based systems are also prepared for future IT security requirements, without which functional safety can no longer be guaranteed in today's networked environments.
For developers, adopting an RTOS initially involves a learning effort, but this pays off quickly. Suitable tools and middleware significantly reduce development effort, so the wheel does not have to be reinvented every time and project risks can be mitigated. Finally, an RTOS shows its true strength in multithreading: its built-in functions make programming much more convenient, whereas scheduling based exclusively on interrupt routines is difficult to implement deterministically and reliably. With an integrated hypervisor, mixed-criticality systems also become possible, in which, for example, a distance-measurement function can run safely alongside an infotainment system, as in a car.
More information at www.sysgo.com/pikeos