IT Security Architecture for the next Generation of Control and Safety Technology

Mar 02, 2021

With the increasing use of networked digital technologies in safety-critical areas of railway operations, the requirements for cyber security, which must be considered together with functional safety, are growing. In the HASELNUSS project, a safety platform is being developed that allows the joint operation of safety and security applications on one system.

Digitalisation and networking have now reached safety-critical areas of operational technology (OT) such as the railway sector, where hardware and software monitor and control physical devices. In control and safety technology, so-called object controllers monitor and control the points, signals or axle counters located in the field.

The self-contained, vendor-specific systems used previously, which are typically proprietary, monolithic and expensive, are increasingly being replaced by standard hardware and software technologies (COTS, Commercial Off-The-Shelf) and IP-based communication. This increases cost-effectiveness, profitability and maintainability, and enables new customer offerings in the form of digital services. However, it also increases the risk of IT attacks. It is no longer sufficient to consider only the functional safety of OT systems (cf. DIN EN 50129 [1]); cyber security must also be considered. The joint consideration of safety and security is now also required in standards [2, 3].

However, the integration of security mechanisms into a safety-certified OT system is a major challenge [9]. For example, the railway safety standard EN 50128 [4] requires all software components to be certified to the highest safety integrity level (SIL) unless proof of non-interference can be provided. IT security functions are mostly realised with open source solutions (for example OpenSSL), which may have been developed with less stringent development and verification methods than highly safety-critical software components (which are SIL 4 certified, for example). As a result, they cannot easily be certified to the highest SIL. Security mechanisms also have a different life cycle, as newly found vulnerabilities have to be patched. This must be done independently of the safety components to avoid costly recertification. Furthermore, the systems must be protected against manipulation. For example, an attacker with access to the system could try to manipulate the firmware.

This article provides an overview of the IT security architecture developed in the HASELNUSS research project (Hardware-based Security Platform for Railway Control and Safety Technology) [5], which is adapted to railway OT systems. Based on a hardware security anchor and special software components, safety and security measures can be realised on a hardware platform.

Security Requirements

With the DIN preliminary standard VDE V 0831-104 [3], a standard was developed for the first time that provides recommendations for the definition of IT security requirements for electrical railway systems. The approaches of IEC 62443 for industrial communication networks were adopted. Based on this pre-standard, 14 IT security requirements were identified for the development of the HASELNUSS architecture [6]. The following requirements were considered particularly relevant: secure storage of cryptographic secrets, assurance of system integrity and detection of manipulations, secure software update mechanisms, detection of attack attempts via the network, secure communication between endpoints, and the possibility of operating both safety and security applications together on one system so that they do not influence each other.

HASELNUSS Architecture

This section describes the HASELNUSS architecture [8]. First, the basic concept is described, along with how it fulfils the security requirements described above. The individual components are then explained in detail in the following sections.

Basic Concept

The HASELNUSS architecture makes it possible to operate security measures on safety OT systems such as object controllers as shown in Figure 1. However, it can be transferred to other safety-critical systems. It consists of three essential components: a hardware platform with a hardware security module in the form of a Trusted Platform Module (TPM) 2.0, a MILS (Multiple Independent Levels of Safety and Security) operating system and various security applications.

The TPM serves as a security anchor and enables, among other things, the secure storage of cryptographic keys (for example, to secure communication connections), Measured Boot to detect manipulations of the system software, and Remote Attestation, which allows authorised external parties to detect manipulations. The start of the MILS operating system is monitored by Measured Boot; once started, the operating system enables the joint operation of safety and security applications. Security applications are, for example, anomaly detection procedures that detect attacks via the network, secure software update protocols or a classic firewall. Certified safety applications can run in parallel to the security applications, for example an object controller that communicates with railway interlockings via the RaSTA (Rail Safe Transport Application) protocol [7]. The individual layers of the architecture are described in detail below.

MILS Platform

The MILS platform enables certification of systems with different criticality [10]. Software partitions are strictly separated in space and time, and any communication between partitions is controlled. Besides hardware components such as the CPU (Central Processing Unit), MMU (Memory Management Unit) or I/O MMU, the central software component is the Separation Kernel (SK). The SK provides separate security domains (so-called partitions), manages the hardware and enforces security policies for information flow, access control and resource availability. Applications can be isolated in different partitions so that there are no interdependencies. However, strictly controlled communication channels can be configured if required. The separation and control mechanisms of the SK cannot be bypassed and are implemented in such a small and simple way that the correctness of the code can be easily verified.
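The idea of explicitly configured, otherwise forbidden communication channels can be illustrated with a minimal Python sketch. It is not the configuration format of any real separation kernel; the class names and partition names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """A one-way communication channel between two partitions (hypothetical model)."""
    sender: str
    receiver: str


class SeparationPolicy:
    """Default-deny information-flow policy: only configured channels may be used."""

    def __init__(self, partitions, channels):
        self.partitions = frozenset(partitions)
        for ch in channels:
            # Reject channels that point at partitions that do not exist.
            if ch.sender not in self.partitions or ch.receiver not in self.partitions:
                raise ValueError(f"channel references unknown partition: {ch}")
        self.channels = frozenset(channels)

    def may_send(self, sender, receiver):
        # Anything not explicitly configured is forbidden.
        return Channel(sender, receiver) in self.channels


# Hypothetical partition layout: the firewall forwards traffic to two consumers.
policy = SeparationPolicy(
    partitions={"object_controller", "anomaly_detection", "firewall"},
    channels={
        Channel("firewall", "object_controller"),
        Channel("firewall", "anomaly_detection"),
    },
)

print(policy.may_send("firewall", "object_controller"))           # configured channel
print(policy.may_send("anomaly_detection", "object_controller"))  # not configured
```

In this sketch a security partition has no path to the safety partition unless one is configured, which mirrors the requirement that the applications must not influence each other.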

Figure 1: HASELNUSS Architecture

In the HASELNUSS architecture, safety and security applications are assigned to individual partitions. The SK ensures the separation of the partitions and distributes the available resources to the individual partitions. Compliance with real-time requirements for safety applications can be configured accordingly. Specifically, the functionality of an object controller is implemented as a SIL4-certified safety application that runs together with non-certified security applications on a hardware platform. In order to have sufficient resources for security applications in addition to the safety applications, the hardware platform used must be sufficiently fast. Existing safety certifications of the safety application can thus remain in place, even if security applications are updated.

The SK represents the "single point of failure" in the MILS architecture and must guarantee the highest safety level required for the system according to EN 50128 [4]. The SK [11] used in the HASELNUSS architecture is certified according to SIL4.

Hardware Platform

The hardware platform of the HASELNUSS architecture shown in Figure 1 consists of the CPU, the main memory, a TPM and interfaces to Ethernet and field elements such as signals, railway gates, etc. The TPM represents the security anchor with which the boot process of the platform is logged in a traceable and tamper-proof manner and can be verified later. This process is called Measured Boot. All components are "measured", including the SK and the partitions. "Measure" means that a cryptographic hash value, similar to a checksum, is computed over each component. A component "measures" the next component in the boot sequence, stores the measured value securely in the TPM and then starts the next component. The stored measured values can then be compared with reference values to detect manipulations.
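The measuring chain can be sketched in a few lines of Python. The sketch models the extend operation of a TPM register in software (new value = hash of old value concatenated with the measurement) and does not talk to a real TPM; the component names and image contents are made up for illustration.

```python
import hashlib

PCR_SIZE = 32  # length of a SHA-256 digest in bytes


def measure(component: bytes) -> bytes:
    """Compute the cryptographic hash (the "measurement") of a component image."""
    return hashlib.sha256(component).digest()


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Fold a measurement into the register: new = SHA-256(old || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()


def measured_boot(boot_chain):
    """Measure each component before it would be started; return register and event log."""
    pcr = bytes(PCR_SIZE)  # register starts as all zeros
    event_log = []
    for name, image in boot_chain:
        m = measure(image)
        event_log.append((name, m))
        pcr = extend(pcr, m)
    return pcr, event_log


# Hypothetical boot chain with placeholder images.
chain = [
    ("bootloader", b"bootloader image"),
    ("separation_kernel", b"SK image"),
    ("partition_oc", b"object controller image"),
]
pcr, log = measured_boot(chain)

# Changing any component changes the final register value.
tampered = [chain[0], ("separation_kernel", b"SK image with backdoor"), chain[2]]
pcr_tampered, _ = measured_boot(tampered)
print(pcr != pcr_tampered)
```

Because each value depends on all previous ones, a manipulated component cannot be hidden by later components without also changing the final register value.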

A TPM 2.0 driver for PikeOS serves as an interface for all partitions to communicate with the TPM. The TPM 2.0 Software Stack (TSS 2.0) is used as middleware by applications to communicate with the TPM using the TPM driver.

Figure 2: Exemplary implementation of the HASELNUSS architecture on a hardware platform with three CPUs

In order to implement the HASELNUSS architecture certified to SIL4, fail-safety must be ensured via redundant hardware. Figure 2 shows an exemplary implementation with three CPUs. Two Safe Computing Units, each with one CPU, exclusively execute safety applications. Both perform exactly the same calculations. The two HW Safety Monitors synchronise the two CPUs and compare the results of the calculations. In case of a deviation of the calculated results, the system is set to "Failure Mode" and thus becomes inactive. The I/O CPU and the TPM are part of the peripherals and are responsible for the security applications. Only the I/O CPU has I/O access and can communicate with the rest of the hardware in the system. The TPM requires I/O and has a unique secret from which the initial keys are derived. Furthermore, the TPM has a true random number generator that can be used to generate additional secure keys. For this reason, two different TPMs always generate two different keys. Therefore, the TPM cannot be designed redundantly and is directly connected to the I/O CPU.
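The comparison of the two redundant calculations can be illustrated in software as follows. This is only a sketch of the principle; in the architecture described here the comparison is done by hardware safety monitors, and the function names and the simplified signal logic are assumptions made for the example.

```python
class FailureMode(Exception):
    """Raised when the two channels disagree; the system must become inactive."""


def compare_channels(channel_a, channel_b, inputs):
    """Run the same calculation on both channels and release the result only on agreement."""
    result_a = channel_a(inputs)
    result_b = channel_b(inputs)
    if result_a != result_b:
        raise FailureMode(f"channel mismatch: {result_a!r} != {result_b!r}")
    return result_a


def signal_aspect(inputs):
    """Simplified, hypothetical safety calculation."""
    if inputs["route_set"] and not inputs["section_occupied"]:
        return "proceed"
    return "stop"


def faulty_signal_aspect(inputs):
    """Simulates a channel with a hardware fault that always outputs 'proceed'."""
    return "proceed"


state = {"route_set": True, "section_occupied": True}
print(compare_channels(signal_aspect, signal_aspect, state))

try:
    compare_channels(signal_aspect, faulty_signal_aspect, state)
except FailureMode as err:
    print("Failure Mode:", err)
```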

Anomaly Detection

In recent years, an increasing number of attacks on critical infrastructures have been observed that require detailed knowledge of the attacker about the structure and behaviour of the attacked system. Such incidents, known as semantic attacks, exploit system-specific protocols to provoke dangerous conditions in the controlled system and cause damage.

Examples of semantic attacks on railway signalling technology are the switching of occupied points, the clearing of occupied track sections or the setting of protecting signals to proceed. From an IT security point of view, there are numerous possibilities for attackers to carry out such attacks in a digitalised and networked signalling system if the infrastructure does not have appropriate protective measures. The special feature of semantic attacks is that the attacker injects commands into the signalling communication that cannot be easily distinguished from legitimate commands. Therefore, conventional firewalls and transport layer-oriented intrusion detection systems fail to detect such an attack.

Anomaly detection systems can, however, be an effective protective measure if they understand the semantics of the controlled cyber-physical system (in this case, the signalling technology) and can thus put each command into the current context. By taking into account the complex interrelationships in the infrastructure, anomaly detection is able to recognise and filter dangerous communication introduced by the attacker. Two models are integrated in the HASELNUSS architecture to map the signalling technology for anomaly detection.

The first model uses artificial neural networks (ANNs) that learn the typical behaviour of the infrastructure in a station (e.g. signals, switches, track vacancy signals) from a training data set and, based on that, predict future behaviour. If the predicted behaviour does not match what is actually observed, an anomaly, i.e. a possible semantic attack, can be assumed. By choosing the threshold that distinguishes between normality and anomaly, the error rates of the system can be adapted to the respective requirements.
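The role of the threshold can be shown with a small Python sketch. It deliberately leaves out the neural network: the predicted values are simply given, and the error measure (mean absolute difference) and the numbers are assumptions for illustration, not the project's actual model.

```python
def prediction_error(predicted, observed):
    """Mean absolute difference between predicted and observed element states."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def is_anomaly(predicted, observed, threshold):
    """Flag an anomaly when the prediction error exceeds the chosen threshold."""
    return prediction_error(predicted, observed) > threshold


# Hypothetical example: predicted probabilities that three field elements are
# in state 1, and the states actually observed (the third element deviates).
predicted = [0.9, 0.1, 0.8]
observed = [1, 0, 0]

print(is_anomaly(predicted, observed, threshold=0.2))  # strict threshold
print(is_anomaly(predicted, observed, threshold=0.5))  # lenient threshold
```

A lower threshold flags more deviations (fewer missed attacks, more false alarms), a higher one flags fewer, which is the trade-off the text refers to.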

In contrast to ANNs, whose learning process is subject to a certain degree of randomness, the second model is based on a deterministic, rule-based approach. The dependencies between the field elements that follow from the interlocking logic are mapped in a distributed system on the object controllers of the field elements. Additional communication channels between the field elements enable them to assess, for each incoming control command, whether it is a semantic attack.
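A rule of this kind might look as follows in a strongly simplified Python sketch. The single rule shown (do not switch points in an occupied section) is taken from the attack examples in this article; the class, its fields and the command names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PointController:
    """Hypothetical object controller for a set of points with one local rule."""
    point_id: str
    # Occupancy of the point's track section, as reported by the neighbouring
    # track vacancy element over the additional communication channel.
    section_occupied: bool = False

    def update_occupancy(self, occupied: bool) -> None:
        self.section_occupied = occupied

    def accept(self, command: str) -> bool:
        """Return True if the command is plausible in the current context."""
        if command == "switch" and self.section_occupied:
            # Switching occupied points is never legitimate: treat as attack.
            return False
        return True


controller = PointController("W1")
print(controller.accept("switch"))   # section free: command accepted

controller.update_occupancy(True)
print(controller.accept("switch"))   # section occupied: command rejected
```

Unlike the learned model, the same state and command always produce the same decision.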

Remote Attestation

Remote Attestation is used to verify the integrity of object controllers. In the process, the event logs created by Measured Boot for the started components and SK partitions, together with the integrity anchor in the TPM, are digitally signed with a private cryptographic key known only to the TPM and transmitted to a verifier. The verifier is located in the Maintenance and Data Management (MDM) at the interlocking layer. It evaluates the event logs by checking the digital signatures of the TPM (with the public key) and comparing the measured components with known reference measurements from a whitelist. If the signatures are valid and the verifier knows all components, then it can be assumed that the target system, the object controller, is in a state of integrity.
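The verifier's evaluation can be sketched in Python as follows. The sketch covers only the replay of the event log and the whitelist comparison; checking the TPM's digital signature is omitted and the reported register value is assumed to be authentic. The register model (SHA-256 hash chain starting at zero) and all names are assumptions for illustration.

```python
import hashlib
import hmac


def replay(event_log):
    """Recompute the register value that the given event log must have produced."""
    pcr = bytes(32)
    for _name, measurement in event_log:
        pcr = hashlib.sha256(pcr + measurement).digest()
    return pcr


def verify_attestation(event_log, quoted_pcr, whitelist):
    """Accept only if the log matches the reported register and every entry is known."""
    # Step 1: the log must be consistent with the (assumed authentic) register value.
    if not hmac.compare_digest(replay(event_log), quoted_pcr):
        return False
    # Step 2: every measured component must match its reference measurement.
    return all(whitelist.get(name) == m for name, m in event_log)


# Hypothetical reference measurements held by the verifier.
reference = {
    "separation_kernel": hashlib.sha256(b"SK image").digest(),
    "partition_oc": hashlib.sha256(b"object controller image").digest(),
}

good_log = list(reference.items())
print(verify_attestation(good_log, replay(good_log), reference))

# An additional, unknown component makes the attestation fail.
bad_log = good_log + [("unknown_partition", hashlib.sha256(b"malware").digest())]
print(verify_attestation(bad_log, replay(bad_log), reference))
```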

Further Security Applications

In addition, further security applications can be implemented in the partitions. These include, for example, a firewall that monitors network traffic at the transport level and can filter for IP addresses and ports if necessary. Furthermore, a partition can also take over the termination of a VPN tunnel, so that no additional hardware needs to be set up for such functionality.
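A filter on IP addresses and ports can be sketched with Python's standard `ipaddress` module. The rule format, the network range and the port numbers below are hypothetical and do not describe the firewall used in the project.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Allow traffic from a source network to one destination port."""
    source: ipaddress.IPv4Network
    port: int


def allowed(rules, src_ip: str, dst_port: int) -> bool:
    """Default deny: a packet passes only if some rule matches address and port."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in rule.source and dst_port == rule.port for rule in rules)


# Hypothetical rule set: only the interlocking network may reach port 5000.
rules = [Rule(ipaddress.ip_network("10.0.0.0/24"), 5000)]

print(allowed(rules, "10.0.0.17", 5000))    # matching network and port
print(allowed(rules, "192.168.1.5", 5000))  # unknown source network
print(allowed(rules, "10.0.0.17", 22))      # port not permitted
```

Such a filter sees only addresses and ports, which is why it cannot detect the semantic attacks discussed above.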

An essential security application provides secure software updates. The SK used offers the possibility of replacing partitions with others at runtime. In the HASELNUSS architecture, the secure update process is based on the TPM. Updates are encrypted and integrity-protected by the MDM so that only the TPM with the keys stored in it can check the integrity and decrypt the update. The update is installed in a new partition; the partition loader then terminates the old partition and activates the new one, so that no reboot is necessary for updates.
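The order of the steps (check first, then install, then switch) can be illustrated with the following Python sketch. It is a stand-in under clear assumptions: the integrity check uses an HMAC with a key held in software instead of keys protected by the TPM, decryption is omitted, and the class and method names are hypothetical.

```python
import hashlib
import hmac


class PartitionLoader:
    """Hypothetical loader that swaps a partition image only after an integrity check."""

    def __init__(self, key: bytes, initial_image: bytes):
        self._key = key              # stand-in for a key protected by the TPM
        self.active = initial_image  # image of the currently running partition

    def apply_update(self, image: bytes, tag: bytes) -> bool:
        # 1. Check integrity before touching the running partition.
        expected = hmac.new(self._key, image, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False
        # 2. Install the update into a new partition (modelled as a variable).
        staged = image
        # 3. Terminate the old partition and activate the new one, without reboot.
        self.active = staged
        return True


key = b"demo key for illustration only"
loader = PartitionLoader(key, b"firewall v1")

update = b"firewall v2"
tag = hmac.new(key, update, hashlib.sha256).digest()

print(loader.apply_update(update, tag), loader.active)
print(loader.apply_update(b"manipulated image", tag), loader.active)
```

A rejected update leaves the active partition untouched, so a manipulated package cannot interrupt the running application.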

Conclusion

The safety certification of classic components in control and safety technology does not take security measures into account and is not able to evaluate the security of control and safety technology (LST) systems with regard to an active attacker. However, systems that are "safe" are not necessarily "secure". Safety-critical applications must therefore be equipped with the necessary IT security functions. Until now, the implementation of these functions has been strictly separated from the actual application, sometimes even on physically separate platforms, to facilitate the approval process. The HASELNUSS architecture shows for the first time that IT security functions and safety-critical applications can be implemented on the same platform, thanks to strong separation through a MILS architecture, and can therefore be more strongly integrated. The HASELNUSS architecture will be tested in practice in the coming year.

Authors

Christoph Krauß, Head of Department
Maria Zhdanova, Research Assistant
Michael Eckel, Research Associate, all Cyber-Physical Systems Security, Fraunhofer Institute for Secure Information Technology SIT, Darmstadt
Stefan Katzenbeisser, Chair of Computer Engineering, University of Passau
Markus Heinrich, Research Associate, Security Engineering, TU Darmstadt
Don Kuzhiyelil, Research Engineer, SYSGO GmbH, Klein-Winternheim
Jasmin Cosic, IT Security for Operational Technology and Processes
Matthias Drodt, Head of IT Security for Operational Technology and Processes LST/TK/ATO, both DB Netz AG, Frankfurt am Main

Sources and Literature

[1] DIN EN 50129: Bahnanwendungen - Telekommunikationstechnik, Signaltechnik und Datenverarbeitungssysteme – Sicherheitsbezogene elektronische Systeme für Signaltechnik, 2019.

[2] CENELEC. PD CLC/TS 50701 Railway Applications - Cybersecurity, 18.09.2020.

[3] DIN VDE V 0831-104: Elektrische Bahn-Signalanlagen – Teil 104: Leitfaden für die IT-Sicherheit auf Grundlage IEC 62443.

[4] E DIN EN 50128/A1 VDE 0831-128/A1:2019-09, Bahnanwendungen, Telekommunikationstechnik, Signaltechnik und Datenverarbeitungssysteme - Software für Eisenbahnsteuerungs- und Überwachungssysteme.

[5] HASELNUSS Projekt Webseite: haselnuss-projekt.de

[6] M. Heinrich, T. Vateva-Gurova, T. Arul, S. Katzenbeisser, N. Suri, H. Birkholz, A. Fuchs, C. Krauß, M. Zhdanova, D. Kuzhiyelil, S. Tverdyshev, C. Schlehuber. (2019). Security Requirements Engineering in Safety-Critical Railway Signalling Networks. Security and Communication Networks, vol. 2019, Article ID 8348925, 14 pages. doi.org/10.1155/2019/8348925

[7] DIN VDE V 0831-200: Elektrische Bahn-Signalanlagen – Teil 200: Sicheres Übertragungsprotokoll RaSTA nach DIN EN 50159 (VDE 0831-159).

[8] H. Birkholz, C. Krauß, M. Zhdanova, D. Kuzhiyelil, T. Arul, M. Heinrich, S. Katzenbeisser, N. Suri, T. Vateva-Gurova, C. Schlehuber. (2018). A Reference Architecture for Integrating Safety and Security Applications on Railway Command and Control Systems. Zenodo. doi.org/10.5281/zenodo.1314095

[9] C. Schlehuber, M. Heinrich, T. Vateva-Gurova, S. Katzenbeisser, N. Suri. (2017). Challenges and approaches in securing safety-relevant railway signalling. IEEE European Symposium on Security and Privacy Workshops (EuroS&PW).

[10] S. Tverdyshev, H. Blasum, B. Langenstein et al., "MILS Architecture," EURO-MILS, 2013.

[11] SYSGO. PikeOS Separation Kernel. www.sysgo.com

Copyright

Deine Bahn 12/2020
