Data centres: efficient performance monitoring
To enhance user experience, datacentres monitor millions of resource usage series, producing big data from which useful insights must be gathered. Dapprox developed methods and tools that predict performance anomalies in real time by selecting a key subset of the data, and proposed solutions for managing resources more effectively.
Portrait / project description (completed research project)
Dapprox is a set of methods and software tools for fast and approximate analyses of resource usage series in real time. The goal of Dapprox was to predict potential anomalies (and propose solutions) by simultaneously taking into account accuracy requirements, maximum delays and available resources. Dapprox first looks for characteristics that are common across servers over time, and then processes only subsets of “key” data in a way that does not sacrifice the accuracy of the results. In particular, Dapprox can dynamically select and process the optimal amount of data, based on common structures that change over time. Dapprox comprised three work packages: dependency-aware predictive analytics for forecasting, approximate streaming analytics for live data, and datacentre anomaly management.
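As a rough illustration of this “key data” idea (a minimal sketch, not the project's actual algorithm), the Python snippet below subsamples a usage series more aggressively when its temporal dependency is strong; the lag-1 autocorrelation heuristic and all function names are assumptions made for the example.

```python
import numpy as np

def lag1_autocorrelation(series: np.ndarray) -> float:
    """Estimate the lag-1 autocorrelation of a usage series."""
    x = series - series.mean()
    denom = float(np.dot(x, x))
    if denom == 0.0:
        return 0.0
    return float(np.dot(x[:-1], x[1:])) / denom

def select_key_samples(series: np.ndarray, max_stride: int = 10) -> np.ndarray:
    """Keep every k-th point, where k grows with the temporal dependency:
    strongly autocorrelated series change slowly, so fewer samples are
    needed to reach a given accuracy target."""
    rho = max(lag1_autocorrelation(series), 0.0)
    stride = 1 + int(rho * (max_stride - 1))  # stride in [1, max_stride]
    return series[::stride]

# A smooth (highly dependent) CPU series is subsampled more aggressively
# than a noisy one.
rng = np.random.default_rng(0)
smooth = 50 + 0.01 * np.cumsum(rng.normal(size=1_000))  # random walk: high autocorrelation
noisy = rng.normal(loc=50, scale=5, size=1_000)         # white noise: low autocorrelation
print(len(select_key_samples(smooth)), len(select_key_samples(noisy)))
```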
Background
To ensure quality of service and system reliability, datacentres monitor and collect performance logs from many virtual and physical computing resources. The sheer quantity of data generated is so large that consistently analysing it correctly in real time is nearly impossible. Existing analyses tend to be unsophisticated and slow, which leads to delays in addressing performance anomalies and significantly degrades the end-user experience.
Aim
The goal was to analyse performance data in order to manage computing resources in cloud datacentres more effectively and thus enhance user experience. Rather than analysing all of the data, the project set out to develop approximate analytics – i.e. methods and tools based on subsets of data – to predict complex patterns in resource usage series and so-called critical states. It also aimed to create tools for real-time processing and anomaly analysis, and finally to propose anomaly management policies for cloud datacentres.
Relevance
The project developed practical solutions to exploit the value of big data in performance logs from today’s cloud datacentres, to efficiently and approximately process jobs on big data platforms and to enhance users’ computing experience in the cloud.
Dapprox was expected to benefit datacentre practitioners, researchers and users of big data analytics and cloud computing platforms. Because the approach is based on the generic structure of big data, the techniques should be widely applicable to different types of big data (e.g. data from Internet of Things devices) and to different system scenarios (e.g. energy-optimised datacentres).
Results
The project advocated selective processing of big data: by leveraging spatial and temporal dependencies, computational resources are reserved for the most critical data. Such selective processing is also motivated by the amount of “dirty” data found in big data sets, so strategies for selectively choosing informative and accurate data to train robust analytical models were derived. The results also confirmed that the insights of big data come at the expense of privacy, showing a strong trade-off between data utility and the level of privacy preservation. To address these challenges, the following objectives were achieved:
- Making big data processing lighter and faster: this was addressed through strategies of low-bit representation, intelligent data subsampling and hierarchical modelling tailored to time series models.
- Making big data processing predictable: stochastic models were developed to predict the latency of big data applications, whether simple data sorting or complex analytics. With such a model, one can make an explicit trade-off between model accuracy and model training time (and the resources required).
- Making big data processing privacy-preserving: differentially private algorithms that limit the privacy leakage through big data and its analysis were derived (see the sketch after this list). Together with the latency models, these expand the criteria portfolio for designing big data analytics to accuracy, latency and privacy.
- Making big data analytics distributed: various distributed and decentralised learning algorithms were investigated so that big data analytics can take place everywhere, more precisely on the premises where the data is collected.
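As an illustration of the privacy-preserving item above, the sketch below applies the standard Laplace mechanism to an aggregate query over usage data; this is the textbook mechanism rather than the project's own algorithms, and the variable names and parameter choices are assumptions for the example. Smaller values of epsilon give stronger privacy and noisier answers, which is the utility-privacy trade-off discussed above.

```python
import numpy as np

def dp_mean(values: np.ndarray, lower: float, upper: float, epsilon: float,
            rng: np.random.Generator) -> float:
    """Differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper]; the sensitivity of the mean of
    n clipped values is (upper - lower) / n, so Laplace noise with scale
    sensitivity / epsilon yields an epsilon-DP estimate."""
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

rng = np.random.default_rng(1)
cpu_util = rng.uniform(0, 100, size=10_000)  # per-server CPU utilisation (%)
for eps in (0.1, 1.0, 10.0):                 # smaller epsilon = more privacy, more noise
    print(eps, dp_mean(cpu_util, 0.0, 100.0, eps, rng))
```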
The last work package, on sharing datacentre traces via machine learning models, attracted interest from the Dutch national science foundation and from industry in commercialising the solution. A tabular data synthesiser was developed so that proprietary data held by commercial companies can be shared with the public without the risk of leaking private data. This development was unexpected and opened up a new direction: a follow-up project on the tabular data synthesiser, funded by the Dutch national science foundation to commercialise the idea.
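The project's synthesiser itself is not reproduced here; as a much simpler stand-in, the sketch below fits independent per-column distributions to a private table and publishes only sampled synthetic rows. It conveys the idea of sharing synthetic rather than real records, but deliberately ignores cross-column dependencies and offers no formal privacy guarantee; the class and column names are invented for the example.

```python
import numpy as np
import pandas as pd

class IndependentColumnSynthesiser:
    """Toy tabular synthesiser that models each column independently.

    Numeric columns are resampled from a fitted Gaussian; categorical
    columns are resampled from their empirical frequencies."""

    def fit(self, df: pd.DataFrame) -> "IndependentColumnSynthesiser":
        self.columns_ = {}
        for name, col in df.items():
            if pd.api.types.is_numeric_dtype(col):
                self.columns_[name] = ("numeric", col.mean(), col.std())
            else:
                freq = col.value_counts(normalize=True)
                self.columns_[name] = ("categorical", freq.index.to_numpy(), freq.to_numpy())
        return self

    def sample(self, n: int, seed: int = 0) -> pd.DataFrame:
        rng = np.random.default_rng(seed)
        out = {}
        for name, spec in self.columns_.items():
            if spec[0] == "numeric":
                _, mu, sigma = spec
                out[name] = rng.normal(mu, sigma, size=n)
            else:
                _, categories, probabilities = spec
                out[name] = rng.choice(categories, size=n, p=probabilities)
        return pd.DataFrame(out)

# Usage: fit on the private trace, release only the synthetic rows.
rng = np.random.default_rng(2)
private = pd.DataFrame({"cpu": rng.uniform(0, 100, size=500),
                        "job_type": rng.choice(["batch", "web"], size=500)})
synthetic = IndependentColumnSynthesiser().fit(private).sample(500)
print(synthetic.head())
```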
Original title
Dapprox: Dependency-aware Approximate Analytics and Processing Platforms