Engineering Trustworthy Intelligent Software Systems

This dedicated page presents one of the strategic themes / areas currently under discussion as part of our Strategic Research Funding Program. All the themes / areas under discussion are listed on the program page. Each dedicated page (including this one) may be at a varying level of detail and maturity. To take part in the discussion on this theme or to propose new ones, please use this form. If you would like to be kept informed of developments around this theme / area, sign up below.

Description and rationale of the area

Intelligent software systems (ISS) are systems of systems that integrate machine learning (ML) systems with other software systems to enable intelligence. Beyond employing ML, these systems introduce new properties and attributes, and they need to be trustworthy.

Increasingly, industry sectors and government services in Québec and Canada (e.g., healthcare, transportation, information technology, commerce, energy networks) rely on ISS to augment their decision-making abilities. However, trust is still one of the main obstacles to adopting these systems in practice; it is also the main competitive advantage for those who can provide it. Recently, concerns about whether we can trust an intelligent software system have been widely discussed in the media, with evidence suggesting that we may not be ready yet. Take, for example, the case of the Uber self-driving car that struck a pedestrian even though the car’s sensors had detected her presence. Unfortunately, these discussions tend to focus on explainability, an important quality attribute of an ISS that supports accountability but only addresses what happens after the fact. The goal of this research program is to enable the engineering of intelligent software systems that we can trust, not merely explain.

As a first attempt, we define a trustworthy system as one that is trusted by its users and other stakeholders (e.g., regulators). Trustworthiness goes beyond correctness, performance, explainability, or reliability. It is an umbrella for quality attributes that include ethics, fairness, legality, safety, security, privacy, availability, usability, business integrity, and more. It covers all the attributes that a system must faithfully uphold in order to gain the trust of its users and other stakeholders.

Engineering trustworthy intelligent software systems is the systematic process of defining trustworthiness, collecting requirements from stakeholders, architecting and building trustworthy ISS, verifying and validating them, and releasing, deploying, monitoring, and continuously improving them. It is aligned with SEMLA’s mission of building complex intelligent software systems and with all three pillars of the Canadian government’s science, technology and innovation strategy (i.e., people, knowledge, and innovation). However, engineering trustworthy intelligent software systems is challenging and requires a shift in software development and assurance practices. SEMLA and its many collaborators wish to develop methods, tools, and technologies that enable the development of trustworthy intelligent applications.

Here are some of the major challenges of engineering trustworthy intelligent software systems:

Trustworthiness is ill-defined. There is no clear mapping from the quality attributes of trustworthiness to software application components, practices and patterns. Moreover, there are no quantitative measures for trustworthiness or for the attributes that make a system trustworthy. Accordingly, there is a need to define trustworthiness from a software perspective and to provide a set of quantitative measures to evaluate a system’s trustworthiness.
Unlike traditional software, the behaviour of software systems powered by ML is not entirely specified and coded but is learnt from data, and is hence unpredictable and continuously evolving. This paradigm shift has resulted in non-deterministic software systems whose behaviour is difficult to reason about. Such systems are intrinsically challenging to architect, test and verify.
An intelligent software system is a system of systems that requires coordination among all of its constituent systems. The uncertainties of the individual components compound when these components work together, which makes it even harder to ensure the trustworthiness of such complex systems.
Intelligent software systems usually need to be deployed in heterogeneous environments. In addition to being deployed on centralized servers (e.g., in the cloud), they are often also deployed on edge devices (e.g., IoT devices), so that decisions can be made efficiently by placing the computation near where data is collected and decisions are needed. This poses challenges for the architectures, deployment strategies and models used to build and deploy such applications. For example, for a healthcare device to collect patient data and make efficient decisions, a trustworthy intelligent software system should be deployed physically close to the device, in order to (i) reduce latency and allow real-time operation of edge systems, as the time for sending requests to a machine learning service and receiving the results is saved; (ii) reduce bandwidth and data communication costs, as large amounts of real-time data need not be transferred; (iii) improve the security and privacy of data, as data is not transferred over the internet or stored in the cloud; (iv) increase the availability of the provided services, as they remain available when the connection to other services becomes unavailable.
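
The compounding of uncertainty across components can be illustrated with a deliberately simplified sketch. It assumes a series pipeline in which each component is correct with some probability and components fail independently; the independence assumption, the component names, and the numbers are hypothetical and are not taken from any real system.

```python
from math import prod


def pipeline_reliability(component_reliabilities):
    """Probability that every component in a series pipeline is correct.

    Assumption (hypothetical): components fail independently, so the
    per-component probabilities simply multiply.
    """
    probabilities = list(component_reliabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each reliability must be a probability in [0, 1]")
    return prod(probabilities)


# Hypothetical reliabilities for a perception -> prediction -> planning chain.
components = {"perception": 0.99, "prediction": 0.97, "planning": 0.98}
overall = pipeline_reliability(components.values())
print(f"overall reliability: {overall:.4f}")
```

Even when every component looks reliable on its own, the end-to-end figure is lower than the weakest component, which is one reason system-level measures are needed in addition to component-level ones.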

To achieve the intended goal, this research needs to provide answers to the following research questions:

What is a trustworthy intelligent software system and how can we measure trustworthiness? In order to engineer trustworthy ISS, we first need to define what a trustworthy ISS is, a notion that currently lacks a clear definition. We will apply successful software engineering practices to define the quality attributes under the trustworthiness umbrella and develop quantitative measures to assess them.
How can we architect and build a trustworthy intelligent software system? To build trust in an ISS, we should ensure that the decisions made by its components (including its ML-powered components) are reliable and fair, and that they will not cause harm. Moreover, recent regulations such as the European Union’s General Data Protection Regulation (GDPR) enforce the right to explanation, requiring that ML models, and in turn the critical systems that integrate them, be explainable and interpretable as well as tamper-proof and secure.
How can trustworthiness be verified? While critical decisions are made based on these systems, we currently lack methods to verify the trustworthiness of the results that they generate. Rising to this challenge requires expertise in both ML and software engineering. Software testing techniques (e.g., property-based testing, metamorphic testing) can be leveraged to check for incorrect behaviours in ML components.
How can we continuously deploy, operate, monitor, and evolve trustworthy intelligent software systems? After an ISS is built, it is essential to ensure that it is reliably deployed, operated, and monitored in different environments. ISS typically contain large deep learning models, so deploying them on edge devices usually involves compressing and optimizing the models for the target devices. How to ensure the trustworthiness of such systems after the models are compressed remains an open challenge. In addition, as the behaviour of an ISS is usually unpredictable, operations and monitoring should account for this unpredictability while ensuring trustworthiness. Further, we need to be able to continuously evolve an ISS based on feedback from monitoring while preserving its trustworthiness.
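
To make the mention of metamorphic testing concrete, here is a minimal sketch. It does not test a real ML system: the 1-nearest-neighbour classifier, the randomly generated data, and the chosen relation (uniformly scaling all features by a positive factor must not change any prediction) are assumptions picked only to show the shape of such a test, where no ground-truth label is needed.

```python
import random


def nearest_neighbour_predict(train, query):
    """1-nearest-neighbour: return the label of the closest training point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(train, key=lambda item: squared_distance(item[0], query))
    return label


def scale(point, factor):
    """Multiply every feature of a point by the same factor."""
    return tuple(factor * x for x in point)


def check_scaling_relation(train, queries, factor):
    """Metamorphic relation: scaling training and query features by the same
    positive factor must leave every prediction unchanged.
    Returns the list of queries that violate the relation."""
    scaled_train = [(scale(features, factor), label) for features, label in train]
    violations = []
    for query in queries:
        original = nearest_neighbour_predict(train, query)
        follow_up = nearest_neighbour_predict(scaled_train, scale(query, factor))
        if original != follow_up:
            violations.append(query)
    return violations


# Hypothetical data: 50 labelled 2-D points and 200 unlabelled queries.
rng = random.Random(0)
train = [((rng.uniform(-1, 1), rng.uniform(-1, 1)), rng.choice("AB")) for _ in range(50)]
queries = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]

violations = check_scaling_relation(train, queries, factor=2.0)
print(f"{len(violations)} violations out of {len(queries)} queries")
```

The factor 2.0 is used because scaling by a power of two is exact in floating point, so any reported violation would point to a defect in the classifier rather than to rounding.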

Added 14/07: While AI technologies are reshaping our world today, doubts about how much we can trust such systems are still on the rise. Unfortunately, developers of AI-enabled systems have so far had limited support for bridging the trust gap by design. For this reason, enabling software developers to build trustworthy intelligent systems should be given the utmost priority.

Keywords:

intelligent systems, artificial intelligence, trustworthy AI, privacy, security, explainable AI, edge AI, legislation, fairness, bias, ethics, transparency and traceability, machine learning, software engineering, system engineering, systems of systems, DevOps, continuous deployment, smart devices, AI systems, trust, compliance, explainability, auditability, reliability, ML-Ops / AI-Ops.
(Added 22/07) CTI (Cyber Threat Intelligence), intelligence, espionage, prediction, system resilience

Relevant organizations:

SEMLA, Mila, Polytechnique Montréal, Université de Montréal, Concordia University, University of Ottawa, Queen’s University, York University, University of Alberta, NII, Propulsion Québec, ENCQOR, FRQNT, Prompt, InnovÉÉ, CRSNG, Mitacs, BMW Group, Bombardier, SAP, Ericsson, IBM, Banque Nationale (BNC), Thales, CISCO, TD Bank, BMO, Airudi, Radio-Canada, MoovAI, HAXIO Machine Vision and Robotics.
(Added 22/07) Government bodies and business associations.

Relevant people suggested during the consultation:

The following names were proposed by the community, and the people listed below have agreed to have their names displayed publicly. Note, however, that the names of all professors (whether or not they are displayed publicly on our website) will be forwarded to the advisory committee for the stage of identifying and selecting strategic themes. Note also that people identified during the consultation stage are not guaranteed to receive any part of the funding. This stage is intended above all to provide an overview of the area, including the relevant people, and not to assemble teams for the framework programs.

Giuliano Antoniol
Foutse Khomh
Mohammad Hamdaqa
Michalis Famelis
Marios Fokaefs
Ettore Merlo

Potential framework programs

To achieve the goal of engineering trustworthy intelligent software systems, we divide our work into four work packages (WPs) based on the research questions listed in the previous section.

WP1: Define quality attributes for ISS and design quantitative metrics for evaluating them, with a special focus on the new system attributes that emerge from integrating ML systems into the software design process (e.g., software ethicality, privacy, fairness). We will start from the following list of attributes:

Correctness: The main factor in the trustworthiness of an intelligent software system is its correctness. For ML-powered intelligent software systems, correctness measures the probability that the ML system under test ‘gets things right’. This depends directly on the quality of the training data, the features extracted from the data for training, the characteristics of the trained ML models, the correctness of the implementation, and the interactions between the ML components and other software components.
Robustness and security: ML models can be attacked and manipulated for malicious purposes (e.g., unauthorized access to resources) or can trigger accidents (e.g., an image segmentation error leading to an incorrect diagnosis). Sensors and measurement devices may also be subject to noise, and ML models must be robust to noisy inputs. In addition, the input data and the operating environment of an ISS tend to evolve continuously, so a trustworthy ISS needs to preserve its performance under such evolution.
Explainability: Decisions made by ML-powered systems should be explainable; otherwise the systems cannot be certified, nor will they gain the trust of their stakeholders. Unfortunately, the most advanced and accurate ML techniques (such as Deep Learning (DL)) are also the least interpretable ones.
Fairness: ML models should not be biased against certain subgroups; for example, they should not discriminate on the grounds of sex, gender, age, ethnicity or race. Testing for fairness is essential and challenging. Research shows that even when fairness is a design goal, developers can easily introduce discrimination into software.
Privacy: The confidentiality and privacy of users’ data are among the most important challenges in applying machine learning in critical domains such as healthcare, e.g., in data acquisition, reporting, and the potential re-identification of patient data.
Ethicality: How can we translate high-level principles like “trustworthiness,” “fairness” and “explainability” into clear ethical engineering requirements? How can we ensure appropriate levels of trust in an ML-powered system? It is critical to develop techniques and tools that allow these systems to be audited.
Transparency: To gain trust, the decision-making process of an ISS should be transparent to its stakeholders. Transparency includes the traceability of AI decisions to the data source, the model, and the business logic built on them. Such transparency also requires that the ISS be monitorable.
Efficiency (speed, size, energy consumption): For cloud-based ISS, efficiency is usually not a major concern, as it can be addressed by increasing the capacity of the hardware running the ISS. However, as the computing power and storage capacity of edge devices are limited, ISS deployed on edge devices need to be efficient in terms of speed, energy consumption and size.
Availability: For cloud-based ISS, the services are usually provided by several replicated instances at the same time, so the failure of a single instance usually does not cause the services to fail. However, in an edge environment, the services are usually deployed on single devices, so ensuring their availability becomes particularly challenging.
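
As an example of what a quantitative metric for one of these attributes could look like, the sketch below computes the demographic parity difference, a commonly used fairness measure: the gap in positive-decision rates between groups. It is offered only as one candidate, not as the metric this work package will adopt; the predictions and group labels are made up for illustration.

```python
def demographic_parity_difference(predictions, groups):
    """Return (gap, rates): the largest difference in positive-prediction
    rate between any two groups, and the per-group rates.

    predictions: iterable of 0/1 decisions produced by the system.
    groups: iterable of group labels, aligned with predictions.
    """
    by_group = {}
    for prediction, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(prediction)
    if not by_group:
        raise ValueError("at least one prediction is required")
    rates = {group: sum(values) / len(values) for group, values in by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Hypothetical decisions for two groups of four people each.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(predictions, groups)
print(rates, gap)
```

A gap of 0 means every group receives positive decisions at the same rate. What gap is acceptable is a requirements question for stakeholders, not something the metric itself decides.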

WP2: Design architectures for trustworthy ISS based on the quality attributes defined in WP1.

We will define a set of design patterns and best practices that focus on the newly identified attributes of trustworthiness.

We will define components, their interactions and constraints to support the trustworthiness quality attributes and then use the defined metrics from WP1 to quantitatively evaluate different architectures and optimize them.

We will define a reference architecture (comprising components and their interactions) for trustworthy ISS based on the results of evaluating different architectures.

WP3: Devise new testing and verification techniques for trustworthiness.

We will devise software testing mechanisms that can verify correctness, robustness, security, privacy, and fairness/bias of ISS. Key underlying techniques will include evolutionary computation and ML to optimize search toward specific cases (data inputs) where the prediction results violate requirements.

We will provide explainability mechanisms (based on model inference, program comprehension, and debugging techniques) that can shed light on how and why predictions are made and under which conditions they can be trusted, both in the learning phase and in production.

We will provide ML model adaptation approaches that can diagnose the root causes of poor predictions, together with automated techniques to repair models, for example by synthesising new data to retrain them.
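
The idea of using evolutionary computation to steer testing toward inputs that violate requirements can be sketched as follows. Everything here is a toy assumption: `model_under_test` is a hypothetical stand-in with a planted defect, the requirement is a fixed tolerance against a reference function, and the search is a basic elitist evolutionary loop rather than any specific algorithm proposed in this work package.

```python
import math
import random

TOLERANCE = 0.25  # hypothetical requirement: |model - reference| <= 0.25


def reference(x, y):
    """Expected behaviour (toy oracle)."""
    return x + y


def model_under_test(x, y):
    """Hypothetical stand-in for an ML component, with a defect planted near (3, -2)."""
    return x + y + 0.5 * math.exp(-((x - 3.0) ** 2 + (y + 2.0) ** 2))


def violation(point):
    """Fitness: how far the model deviates from the reference at this input."""
    x, y = point
    return abs(model_under_test(x, y) - reference(x, y))


def evolutionary_search(fitness, bounds=(-5.0, 5.0), population_size=20,
                        generations=60, sigma=0.5, seed=0):
    """Elitist search: mutate every parent, keep the fittest individuals."""
    rng = random.Random(seed)
    low, high = bounds
    population = [(rng.uniform(low, high), rng.uniform(low, high))
                  for _ in range(population_size)]
    for _ in range(generations):
        # Gaussian mutation, clamped to the input domain.
        children = [
            tuple(min(high, max(low, coord + rng.gauss(0.0, sigma))) for coord in parent)
            for parent in population
        ]
        population = sorted(population + children, key=fitness, reverse=True)[:population_size]
    return population[0]


best = evolutionary_search(violation)
print("worst input found:", best)
print("deviation:", violation(best), "violates requirement:", violation(best) > TOLERANCE)
```

The fitness function is what encodes the requirement; swapping in a robustness, privacy, or fairness objective changes what the search looks for without changing the loop.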

WP4: Build an automated pipeline to deploy, operate, monitor, and continuously improve ISS running in different deployment environments (e.g., centralized servers and edge devices).

We will develop reliable methodologies and techniques for deploying ISS on different platforms, including optimizing/compressing ML systems to run on edge devices and ensuring that the trustworthiness of the ML systems is not impaired when they are optimized/compressed.

We will develop approaches for profiling and scheduling the digital twin services (i.e., services provided by both the centralized servers and edge devices), to plan the workload allocation that takes into consideration all the quality attributes listed above (e.g., data security and privacy) and achieves the highest level of system performance.

We will develop monitoring techniques that ensure high observability and transparency of ISS while imposing minimal overhead on the monitored systems (in particular, those deployed on edge devices) and minimal data communication between edge devices and centralized servers. Such monitoring would ensure the health of the code, data, and model, and provide feedback/alerts in cases of failures, anomalies, or data/model drift.

We will develop methods and techniques to continuously repair, maintain, and improve the ISS based on the feedback obtained from monitoring, to improve its reliability and robustness. In particular, when data/model shifts are detected, the techniques will automatically and continuously update the model with minimum footprint on the running systems. When failures or anomalies are detected, the techniques will automatically repair or mitigate such failures/anomalies or provide instant alerts or suggestions to the operators.
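
One lightweight way to raise the kind of drift alert described above is to compare the distribution of a feature seen in production against the distribution seen at training time. The sketch below uses the two-sample Kolmogorov-Smirnov statistic for this; the choice of statistic, the alert threshold of 0.3, and the sample data are illustrative assumptions, not the monitoring technique this work package will deliver.

```python
from bisect import bisect_right


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical cumulative distribution functions of the two samples."""
    ref = sorted(reference)
    cur = sorted(live)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    largest = 0
    for value in ref + cur:
        # Counts of observations <= value; cross-multiplied to stay in integers.
        ref_count = bisect_right(ref, value)
        cur_count = bisect_right(cur, value)
        largest = max(largest, abs(ref_count * len(cur) - cur_count * len(ref)))
    return largest / (len(ref) * len(cur))


def drift_alert(reference, live, threshold=0.3):
    """Return True when the live sample has drifted past the (hypothetical) threshold."""
    return ks_statistic(reference, live) > threshold


# Hypothetical feature values: training-time sample vs. two production windows.
training_sample = list(range(100))
stable_window = list(range(100))
shifted_window = [value + 50 for value in range(100)]

print(ks_statistic(training_sample, stable_window), drift_alert(training_sample, stable_window))
print(ks_statistic(training_sample, shifted_window), drift_alert(training_sample, shifted_window))
```

Because the statistic needs only sorted samples and integer counts, it can be computed on an edge device and only the resulting number or alert sent to the central servers, in keeping with the goal of minimal data communication.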

This project can have wide-ranging industrial impact. In addition to the organizations and institutes mentioned in the proposal, all my current collaborators will be interested in this project. Other parties likely to be interested include:

AI and cloud providers: IBM, Microsoft, Google, Facebook, etc.
Autonomous systems providers (e.g., self-driving car providers, robotics)
Intelligent decision-making providers for services such as insurance technology, banking, and health informatics
Governments (e.g., the Government of Canada)
The entire software development community

ML systems are and will be pervasive. We need ways to ensure the robustness, resiliency and trustworthiness of AI-enabled systems. The hurdles are many; consider, for example, the seminal 2015 NIPS paper from Google.

Interested? Enter your email address to receive updates related to this page:

History

July 13, 2021: First version

July 15, 2021: Added supplementary information, keywords, relevant organizations and people.

July 22, 2021: Supplementary information in the “Context” section (keywords, relevant organizations and people)