Graph analytics and mining
Making sense of the information contained in networks is currently an expensive and relatively inefficient process. New computing platforms could improve its performance and make it accessible to as many people as possible.
Portrait / project description (completed research project)
Computation is usually performed on a network whose state does not vary during the entire process. Yet new connections are constantly appearing, and they are only taken into account when a new computation is launched, so there is a delay before they are reflected in the result. This project aims to incorporate new connections into computations that are already running, so that their effect becomes apparent almost immediately: if, for example, the journey time on a section of motorway increases following an accident, this can be taken into account at once and the system can suggest a faster alternative route. Furthermore, the researchers are working to make this mode of computing available on conventional servers and not only on expensive supercomputers.
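The minimal sketch below illustrates this principle on a toy road network; it is not code from the project, and the segment names and travel times are invented for illustration. For simplicity the route is recomputed from scratch with Dijkstra's algorithm after each update, whereas a dynamic-graph platform would fold the update into the running computation.

    import heapq

    def dijkstra(graph, source, target):
        # graph maps each node to {neighbour: travel_time}.
        dist = {source: 0}
        prev = {}
        queue = [(0, source)]
        while queue:
            d, node = heapq.heappop(queue)
            if node == target:
                break
            if d > dist.get(node, float("inf")):
                continue
            for neighbour, weight in graph[node].items():
                nd = d + weight
                if nd < dist.get(neighbour, float("inf")):
                    dist[neighbour] = nd
                    prev[neighbour] = node
                    heapq.heappush(queue, (nd, neighbour))
        # Reconstruct the route from source to target.
        path, node = [target], target
        while node != source:
            node = prev[node]
            path.append(node)
        return list(reversed(path)), dist[target]

    # Invented motorway segments with travel times in minutes.
    roads = {
        "A": {"B": 10, "C": 15},
        "B": {"D": 10},
        "C": {"D": 12},
        "D": {},
    }

    print(dijkstra(roads, "A", "D"))   # -> (['A', 'B', 'D'], 20)

    # An accident slows the B-D segment; the live graph is updated and the
    # very next query already suggests the alternative route via C.
    roads["B"]["D"] = 40
    print(dijkstra(roads, "A", "D"))   # -> (['A', 'C', 'D'], 27)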
Background
Network computing seeks to retrieve information using the connections established between the entities in a network. The selection of advertisements presented to Facebook users, for example, is based on all of the connections they have within this network. Having been the subject of research from the earliest days of information technology, this field has been experiencing a revival since the advent of large networks. But new computing platforms are needed to exploit their data.
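As a minimal illustration of computing with connections alone (a toy example, not taken from the project), the sketch below recommends to a user whichever topic is most popular among the accounts they are connected to.

    from collections import Counter

    # Invented example data: who follows whom, and each account's main interest.
    follows = {"alice": ["bob", "carol", "dave"], "bob": [], "carol": [], "dave": []}
    interests = {"bob": "cycling", "carol": "cooking", "dave": "cooking"}

    def recommend(user):
        # Count the interests of the user's connections and pick the most common one.
        counts = Counter(interests[f] for f in follows[user] if f in interests)
        return counts.most_common(1)[0][0] if counts else None

    print(recommend("alice"))   # -> 'cooking'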
Aim
The aim of this project is to develop a flexible platform for high-performance computing on large networks. In particular, it will support what are known as dynamic networks, whose structure can evolve during the computing process, and will also function on conventional computers.
Relevance/application
If dynamic computing on very large networks becomes available by means of a platform that runs on conventional computers, it will be more affordable and therefore accessible to a greater number of laboratories and companies.
Results
The original plan was to explore graph analytics on different platforms, with the intent to determine whether it was possible to build a single system that would provide good performance on all of them, including all combinations of in-core and out-of-core processing and of single-machine and cluster platforms.
While good progress was made on this front, with the 2017 USENIX ATC best paper award and more generally a thesis completed in the framework of this project, it quickly became obvious that this goal was too ambitious to be accomplished within the project's time frame. As a result, the original goals were modified in a number of ways. The original focus on graph analytics was widened to also include graph mining, which is pursued in a 2021 EuroSys paper.
Most importantly, the storage system underlying the original out-of-core graph processing system proved to have much wider applicability beyond graph processing: it was successfully used as the storage layer for general big data processing in a 2018 EuroSys paper. A very surprising outcome of this line of work was that it could also be used with great success for distributing database workloads, as demonstrated in a 2020 ASPLOS paper and in a second Ph.D. thesis carried out in the framework of this project.
Finally, we pursued an independent line of inquiry into scheduling for modern multicore computers, which among other things led to a well-received 2018 USENIX ATC paper comparing the most prevalent schedulers used in industry.
Original title
Building Flexible Large-Graph Processing Systems on Commodity Hardware