Flood detection: automatic geotagging of crowdsourced videos
When extraordinary incidents take place, such as flooding after heavy rain, people record the event with their smartphones. Such eyewitness videos may contain valuable information for incident management. This project aimed to identify relevant videos and prepare their content for use by crisis managers.
Portrait / project description (completed research project)
Humans decide quickly and efficiently whether videos contain relevant information. In a crisis situation, however, there is not enough time to watch hours of video recordings. The aim of this project was therefore to develop artificial intelligence methods to evaluate and filter videos. Selected videos were then compared with known data and images of a region and correctly positioned and oriented. Depending on the content and the chosen settings, these steps produced more or fewer false selections. Accordingly, the information was prepared visually so that the content of the videos could be related spatially and thematically to other relevant data sets and is thus potentially usable for crisis management.
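The first step described above, evaluating and filtering videos, can be sketched as a simple score-and-threshold pipeline. The scoring function, feature names, and threshold below are purely illustrative assumptions; the project's actual models are learned classifiers, not a hand-weighted sum.

```python
# Minimal sketch of a relevance-filtering step. `score_relevance`,
# its feature names, and the threshold are hypothetical placeholders,
# not the project's actual code.

def score_relevance(video_features):
    # Placeholder: a real system would use a trained classifier.
    # Here, a toy weighted sum of hand-picked feature values.
    weights = {"water_pixels": 0.6, "motion": 0.3, "outdoor": 0.1}
    return sum(weights[k] * video_features.get(k, 0.0) for k in weights)

def filter_videos(videos, threshold=0.5):
    """Keep only videos whose relevance score reaches the threshold."""
    return [v for v in videos if score_relevance(v["features"]) >= threshold]

videos = [
    {"id": "a", "features": {"water_pixels": 0.9, "motion": 0.8, "outdoor": 1.0}},
    {"id": "b", "features": {"water_pixels": 0.1, "motion": 0.2, "outdoor": 0.0}},
]
selected = filter_videos(videos)  # only video "a" passes the threshold
```

Filtering first keeps the expensive later steps (localisation, visual preparation) focused on a small candidate set.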
Background
Online platforms may collect and provide video recordings of a wide range of different incidents. However, the identification and communication of relevant content for selected purposes is an open problem. Methods that automatically assess and process eyewitness videos are intended to facilitate the use of such information.
Aim
The objective was to develop and test methods and algorithms that select and prepare information from eyewitness videos to support different applications, for example crisis management. The challenges were assessing the videos for relevance, analysing their content, and correctly positioning and orienting them geographically.
Developing appropriate visual presentations of the results ensures that they can be efficiently and beneficially integrated into operational procedures.
Relevance/application
The availability of current, relevant information is crucial. Crisis managers, for example, can draw on it to make the right decisions quickly and avoid decisions with negative and costly consequences. Algorithms for interpreting information, improved precise georeferencing, and suitable visual communication of findings are in demand in any application area where (spatial) information from a range of sources needs to be integrated.
Results
Initial expert interviews clarified how crisis management operates and helped refine the research questions. Regarding relevance, it could be shown that reliably located video content is potentially relevant and that contextualising videos with other mapped domain data is beneficial.
Video classification algorithms are mostly trained on labelled data sets. To make them more robust to unseen videos, algorithms were developed and tested that learn intuitive physics without supervision and reason over object-centric decompositions of unlabelled videos. Unlike prior approaches, these methods learn in an unsupervised fashion directly from raw visual input to discover objects, parts, and their relations. They explicitly distinguish multiple levels of abstraction and improve over other models at modelling synthetic and real-world videos of human actions (Stanić et al. 2019, Stanić et al. 2021).
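To make the idea of unsupervised object discovery from raw video concrete, here is a deliberately simple stand-in: moving pixels are found by frame differencing and grouped into connected components ("objects"). This toy sketch is not the project's learned object-centric method; it only illustrates the goal of segmenting a raw frame into objects without labels.

```python
import numpy as np

def moving_mask(prev_frame, frame, thresh=0.1):
    """Binary mask of pixels that changed noticeably between frames."""
    return np.abs(frame - prev_frame) > thresh

def connected_components(mask):
    """Label 4-connected regions of a binary mask (iterative flood fill)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        current += 1
        stack = [start]
        while stack:
            r, c = stack.pop()
            if not (0 <= r < mask.shape[0] and 0 <= c < mask.shape[1]):
                continue
            if not mask[r, c] or labels[r, c]:
                continue
            labels[r, c] = current
            stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return labels, current

# Two toy 6x6 greyscale frames in which two separate blobs appeared.
prev = np.zeros((6, 6))
curr = np.zeros((6, 6))
curr[0:2, 0:2] = 1.0   # first "object"
curr[4:6, 4:6] = 1.0   # second "object"
labels, n_objects = connected_components(moving_mask(prev, curr))
# n_objects == 2: the two blobs receive distinct labels
```

The learned models cited above go far beyond this, discovering parts and relations at multiple levels of abstraction, but the input/output contract is the same: raw frames in, an object-level decomposition out.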
To locate video content more precisely, subparts of the visual localisation pipeline were investigated. Fine localisation was improved with image pose estimation based on a Structure-from-Motion approach that relies on approximate position knowledge and reference images (Rettenmund et al. 2018). Tests with different videos showed that the quality of pose estimation is influenced by differences in viewpoint and changes in the appearance of the environment. The processing pipeline was then adapted and extended (Meyer et al. 2020) to improve robustness to changes in the environment. Changes in viewpoint remain a challenge.
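The coarse step assumed by such a pipeline, using approximate position knowledge to shortlist nearby reference images before fine pose estimation, can be sketched with a great-circle distance filter. The reference data, identifiers, and search radius below are illustrative assumptions, not values from the project.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def candidate_references(query_pos, references, radius_m=150.0):
    """Return IDs of reference images within radius_m of the approximate position."""
    lat, lon = query_pos
    return [ref_id for ref_id, (rlat, rlon) in references.items()
            if haversine_m(lat, lon, rlat, rlon) <= radius_m]

# Hypothetical reference images with known camera positions (lat, lon).
refs = {
    "img_001": (47.5350, 7.6420),
    "img_002": (47.5360, 7.6425),
    "img_003": (47.5600, 7.6000),  # kilometres away
}
nearby = candidate_references((47.5352, 7.6421), refs)
# nearby contains img_001 and img_002, but not the distant img_003
```

Only the shortlisted references would then be passed to the expensive fine step, e.g. feature matching and pose estimation against the Structure-from-Motion model.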
To contextualise video imagery with other domain data, and considering the multi-granular nature of the events, visualisations and interactions were developed that allow the visual integration of spatial data holding relevant information at several levels of scale (Hollenstein & Bleisch 2021). Further, a multi-perspective interface for mentally linking street-level images and mapped data was designed and tested (Hollenstein & Bleisch 2022).
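One concrete building block of linking street-level imagery to mapped data is projecting WGS84 camera positions into Web Mercator (EPSG:3857), the projection used by most web maps, so video locations and domain data share one coordinate frame. This is standard cartographic maths, shown here as a sketch rather than project code.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS84 semi-major axis in metres

def to_web_mercator(lat, lon):
    """Project a WGS84 lat/lon pair to Web Mercator x/y in metres (EPSG:3857)."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

x0, y0 = to_web_mercator(0.0, 0.0)      # the equator/Greenwich origin maps to (0, 0)
x_edge, _ = to_web_mercator(0.0, 180.0) # map edge: ~20037508 m
```

With all layers in one planar frame, a multi-scale interface can overlay video positions, viewing directions, and other mapped data at any zoom level.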
Original title
EVAC – Employing Video Analytics for Crisis Management