WHY DO SO MANY large-scale IT projects go wrong? How can we manage and control the behaviour of systems involving tens or hundreds of thousands of human users, interacting with tens or hundreds of thousands of interconnected computers, such as modern-day financial markets or future digitally enabled national healthcare systems? What happens when these systems grow to involve millions of people and machines, all interacting and adapting?

These are the kinds of questions addressed by researchers in the UK national research and training programme - the Large-Scale Complex IT Systems (LSCITS) Initiative. From 2007 until early 2014, the UK Engineering and Physical Sciences Research Council (EPSRC) funded the LSCITS Initiative, which was aimed at developing a new community of researchers and practitioners skilled in dealing with the challenges of ultra-large-scale software-dependent socio-technical systems. The funding paid for research conducted in a network of six UK universities by leading academic faculty members, PhD-qualified research fellows and a group of more than 50 PhD-level research students, the last of whom is set to conclude their studies in 2016.

The LSCITS Initiative was set up in response to the findings of a consultation process that the UK Government conducted in the late 1990s and early 2000s, polling the views of leading information and communication technology (ICT) companies with a presence in the UK. The key message from these companies was that, with the spread of the internet and the world wide web, previously separate ICT and software-dependent systems would be ever more connected, ever more networked and hence would form supersystems - so-called 'systems of systems' - that were critically dependent on the safe and reliable functioning of the ICT, and that could have major economic and social impacts if they ever went wrong.

The scale and complexity of these socio-economically critical supersystems were increasing rapidly, and there was a genuine concern that the ability to manage and predict their behaviour would not keep pace. This could lead to situations in which the only way we would learn that we did not understand them would be when they suddenly malfunctioned or broke down, with potentially disastrous results. That is, there is a risk that we could find ourselves reliant on ultra-large-scale IT systems that we do not fully understand because of their complexity, and that we cannot effectively manage. Complexity in IT systems stems from their increasing size, the increasing involvement of many different organisations in their construction and use, and the increasing rate of business and social change that these systems have to accommodate. To manage and control complexity, there is a need for better technical tools and methods of systems development. There is also a need for better understanding of the human, social and organisational issues that affect the procurement, development, deployment and use of large-scale complex IT systems.

The LSCITS Initiative's coordinated national network linked industry and academia, providing the skills and knowledge appropriate to dealing with the problems of current and future large-scale complex IT systems across their lifecycles. The Initiative's training programme was intended to contribute to the next generation of systems engineers and technology innovation leaders.

The aim of the LSCITS Initiative was to improve existing approaches to complex systems engineering and develop new sociotechnical approaches that help to understand the complex interactions between organisations, processes and systems. It set out to explore and address issues in:

  • System understanding - the principal functional and non-functional properties of large-scale complex IT systems often cannot be completely understood by existing reductionist approaches to engineering and systems management.

  • System interaction - systems interact with their operational environment in many different ways, and the nature of those interactions can change dramatically over time.

  • Systems and organisations - large-scale complex IT systems are specified, developed, used and maintained within organisations or within networks of organisations that may themselves be thought of as complex, adaptive systems. The development, deployment, evolution, and use of these IT systems are thus influenced by human, organisational, business, legal, social, and political factors.

The inherent tensions between stability and change in large-scale complex IT systems require an approach to research that includes both of these perspectives:

  • Stability - the system's essential properties must be maintained, its key variables kept within the limits of system viability, and its goals must be kept in step with the goal of the organisation or network of interconnected organisations that it serves.

  • Change - agile reaction and adaptation are desirable, reducing the time required to make appropriate changes in response to external pressures and perturbation, and to deploy these changes across organisations. Furthermore, in very large systems, random failures of individual components or constituents are commonplace, simply because of the very large numbers involved - so-called 'normal failure' - and these failures have to be accommodated. The responses of the system must be robust to such failures.
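
The arithmetic behind 'normal failure' can be made concrete with a short sketch. The Python below is illustrative only: it assumes that every component fails independently, with the same probability, in a given period, which real systems rarely satisfy. The function names and the figures used (100,000 components, a 0.1% failure probability) are hypothetical and are not taken from LSCITS research.

```python
def prob_at_least_one_failure(n_components: int, p_fail: float) -> float:
    """Chance that at least one of n independent components fails in a period."""
    return 1.0 - (1.0 - p_fail) ** n_components


def expected_failures(n_components: int, p_fail: float) -> float:
    """Mean number of failed components in the same period."""
    return n_components * p_fail


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the effect of scale.
    n, p = 100_000, 0.001
    print(f"P(at least one failure) = {prob_at_least_one_failure(n, p):.6f}")
    print(f"Expected failures       = {expected_failures(n, p):.1f}")
```

Under these assumptions each component is individually very reliable, yet around a hundred failures are expected across the whole system in every period, so some failure is all but certain and has to be treated as routine.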

One of the key research questions, then, is: how are the essential properties of large-scale complex IT systems maintained in the face of ongoing change and normal failure? Although there is still no simple answer to this, it is clearly necessary to reason at different levels, such as the implementation level, the intermediate level of development and operational processes, and the higher level of organisational dynamics.

In total, approximately 250 person-years of research will have been conducted by members of the LSCITS Initiative by the time that the programme concludes.

From the outset, the LSCITS Initiative has been committed to a full programme of dissemination and public engagement activities, helping the relevant community of researchers and practitioners, as well as the wider general public, to understand why the Initiative was set up, what it is doing, and what issues remain as significant challenges. Team members have addressed this commitment in myriad different ways. For example, LSCITS Initiative researchers have given public lectures, keynote presentations, seminars and media interviews; published papers; and also released teaching materials and research software under Creative Commons open-source licences.

The LSCITS Initiative team at the University of St Andrews has led the creation of the St Andrews Socio-Technical Systems Engineering Handbook, an online archive of key papers and overview/commentary articles in sociotechnical systems engineering (available from http://archive.cs.st-andrews.ac.uk/STSE-Handbook/index.html).

Via releases under Creative Commons licences on the SourceForge and GitHub source-code repositories, the University of Bristol group has made available the software necessary to replicate and extend their published experiments studying interactions between automated trading systems and human traders, and also exploring the dynamics of markets populated entirely by automated trading systems of varying types. Marco de Luca’s open Exchange system is available from (http://sourceforge.net/projects/open-exchange/), Dr John Cartlidge’s Exchange Portal system is available from (http://sourceforge.net/projects/exchangeportal/) and Professor Dave Cliff’s BSE system is available from (https://github.com/davecliff/BristolStockExchange). The combined total number of downloads of these three software packages is now more than one thousand.

The Bristol and St Andrews teams have also released source code used in their cloud-computing research. This includes Owen Rogers’ (Bristol) software for exploring financial brokerage and options contracts in the pricing of cloud computing services (http://sourceforge.net/projects/cloudoptions/), James Smith’s (St Andrews) Cloudmonitor system for predicting power usage in cloud computing systems (https://github.com/jws7/Cloudmonitor) and the Cloud Research Simulation Toolkit (CReST), developed at Bristol by Cartlidge, Cliff and various LSCITS Initiative-funded undergraduate interns (http://sourceforge.net/projects/cloudresearch/).

Teaching materials for core taught courses on the LSCITS Initiative Engineering Doctorate training programme have also been made available on the web: Ian Sommerville (St Andrews) has released the slides for his course Systems Engineering for LSCITS at (http://www.software-engin.com/teaching/systems-engineering-for-lscits), and the slides for his course Socio-Technical Systems Engineering at (http://www.software-engin.com/teaching/socio-technical-systems-engineering). Cliff has added several pages to the LSCITS Initiative website from which it is possible to download the slides, reading lists and video lectures for the Technology Innovation LSCITS EngD module, which includes guest lecturers from industry talking about related technology innovation at Google, Hewlett Packard Labs and also in start-up companies; see (http://lscits.cs.bris.ac.uk/techinnovation.html).