We consider Large-Scale, Complex IT Systems (LSCITS) to be a mesh of different systems, each of which is a technical or a socio-technical system in its own right. Adding functionality to an LSCITS may involve developing some new software and composing this with newly procured commercial off-the-shelf (COTS) products and with existing systems. LSCITS may therefore be in a state of continual change, with systems and processes added and removed in response to changing organisational needs and ongoing technological advances.
This state of continual change, where parts of the system have no control over changes made elsewhere in the system, is the primary contributor to system complexity. It also means that the notion of there being discrete phases in the life-cycle of such systems is a significant over-simplification. Rather, there is a continuous cycle of procurement, development, deployment and decommissioning, with component systems being regularly modified and replaced. As a consequence, there cannot be an over-arching system design process where the overall system is designed and developed top-down. Each phase in the procure-develop-deploy cycle includes a range of overlapping processes which interact with each other and with processes in other phases. Thus, procurement decisions influence system development and evolution, design decisions place constraints on how the system will be deployed, deployment issues affect future procurements, and so on.
From a business point of view, the key LSCITS challenges are: (a) ensuring that the system and the organisation's goals remain in alignment as changes are made to both the organisation and the system; (b) ensuring that, as changes are implemented, the system is compliant with external laws, regulations and the expectations of customers and wider society; (c) reducing the time required to make changes to the system in response to external factors; and (d) deploying changes across a large population of users without serious disruption to normal business activity.
There is therefore a fundamental tension between change and stability in LSCITS. Change is inevitable because of factors outside of the system; however, stability in both functionality and system properties is required to ensure that business is not unduly disrupted by change and that the system's properties (availability, reliability, performance, etc.) satisfy both customers and external regulators. Fundamentally, many of the problems that arise in large software projects are a consequence of this tension -- either because there is a premature commitment to stability and requirements are inappropriate, or because changes to the socio-technical environment of a system and their relationship with the technology are not properly understood.
The business challenges identified above lead to a large number of related research questions. Our resources are limited, so the specific research issues that we will initially focus on in the LSCITS research programme are:
(1). What can we learn from agile approaches to software development? How can these agile approaches be used in conjunction with other techniques and methods used in LSCITS engineering (e.g. formal verification of system properties)? How should approaches developed to support socio-technical and high-integrity systems engineering be influenced by agile methods?
(2). What information, methods and techniques are required to help us become better at predicting the consequences and costs of change? How can we relate organisational and system structures? How can we assess whether process changes are organisationally and socially viable?
(3). How do we deliver safety, security and dependability in a complex distributed system where different parts are designed and operated by independent organisations? How do we maintain certain system properties (notably dependability, security and safety) in the face of continual change? How can we reduce the time and costs of convincing organisations and regulators that these properties have been maintained?
(4). How can we cost-effectively gather information about organisations, culture and operational processes so that it can be used to inform procurement, development and deployment processes? How can we use this knowledge of organisations, cultures and operational processes to assess whether or not a system or component is likely to be integratable with other systems?
(5). How do we design for failure (in complex systems of systems, we assume that a top-down approach to fault and failure avoidance is impractical)? What information do system operators need to facilitate recovery? What level of redundancy and diversity in system components is necessary? How do organisations cope with systems failure? Can systems be designed to reduce the likelihood of organisational failure?
(6). To what extent can the configuration and re-configuration of an overall system that is required to deliver changes be automated? What automation techniques and mechanisms may be effective for different types of system?
This is far from a complete set of research questions but it illustrates the nature of research questions for LSCITS engineering. We have identified a wide range of additional thematic research in the proposal and, as discussed there, it is our intention to engage both the EngD community and other researchers in complementary research activities.
Overall, the consortium includes a spread of expertise that can cover the broad questions raised. The proposed work at two or more levels of the LSCITS stack is relevant to each of the sets of research questions set out above.
The relationship between the expertise of partners and the above research agenda is:
Predictable Software Systems (PSS, led by Oxford). This focuses on the application of formal methods and model checking and the synthesis of systems. The challenges addressed include verification for adaptive systems and the scalability of verification to industrial-scale systems (question (1) in the above list), quantitative evaluation of embedded systems (question (2)), and security and dependability (question (3)).
High Integrity Software Engineering (HISE, led by York). This is primarily concerned with processes for developing and validating high integrity systems. It relates specifically to the sets of questions posed in (1), (2) and (3) above but is also concerned with the questions raised in (5) and (6).
Novel Computation Approaches (NCA, led by Bristol). The primary concern of this work is the development of robust and scalable adaptive and decentralised approaches to self-organising, self-managing, and self-repairing LSCITS. This relates primarily to questions (5) and (6) above, but is also relevant to question (3).
Socio-Technical Systems Engineering (STSE, led by St Andrews). This is primarily concerned with operationalising approaches to understanding complexities inherent in socio-technical systems and in understanding how organisations and people cope with failure. It relates specifically to the questions raised in (1), (2), (4) and (5).
Complexity in Organisations (CiO, led by Leeds). This work is primarily concerned with understanding failures, how organisations and people cope with them, and how the use of a system relates to broader issues of organisational change. It therefore relates specifically to the questions raised in (1), (4) and (5) above.
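The relationship between partners and research questions described above can be tabulated to check that each set of questions is addressed by two or more areas of work. The following is a minimal sketch only: the dictionary layout and the function name `coverage` are illustrative assumptions, each partner theme is assumed to correspond to one level of the LSCITS stack, and no distinction is drawn between questions a theme relates to primarily and those it is also concerned with.

```python
# Partner-to-question mapping transcribed from the partner descriptions above.
# The structure and names here are illustrative assumptions, not part of the
# programme itself. Primary and secondary relevance are treated alike.
PARTNER_QUESTIONS = {
    "PSS": {1, 2, 3},
    "HISE": {1, 2, 3, 5, 6},
    "NCA": {3, 5, 6},
    "STSE": {1, 2, 4, 5},
    "CiO": {1, 4, 5},
}


def coverage(mapping, questions=range(1, 7)):
    """Return, for each question number, the sorted list of partners addressing it."""
    return {
        q: sorted(partner for partner, qs in mapping.items() if q in qs)
        for q in questions
    }


if __name__ == "__main__":
    for question, partners in coverage(PARTNER_QUESTIONS).items():
        print(f"({question}) {', '.join(partners)}")
```

Under these assumptions, every question is addressed by at least two partner themes, with questions (4) and (6) having the thinnest coverage at two themes each.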
HEALTH & SOCIAL CARE