Effective surveillance solutions need to make complicated scenes simple to understand and act on. In practice, that requires gathering enormous amounts of detail from video and/or audio streams, potentially from a large number of sources. The more devices there are in the system, the more potential details of interest there are.
Ultimately, effective surveillance is about acting on the details in a scene that matter to system administrators and operators. But what if administrators or operators do not know what information they need, or even what they are looking for in a scene, for instance, what is out of the ordinary? The details that matter might relate to a specific person, object or movement in a scene, or to a type of event that has never occurred before.
Modern surveillance systems generate an overwhelming (and mostly unused) amount of data. This is especially true when recording video in 24/7 operations, which is essential for capturing evidence, incidents and events. Picking out what really matters in a scene is not only hard but also extremely time-consuming. Applying metadata descriptors to the key details in a scene is what makes that data identifiable and actionable.
This is why metadata is the foundation for gathering intelligence from surveillance video and/or audio streams. Metadata provides a fast way to find, evaluate and act on the specific details that matter most, whether across one, hundreds or thousands of video and audio streams. Metadata is now an essential part of effective security and business operations.
But what is metadata?
A definition is needed. As any web search will reveal, metadata is typically defined as ‘data about other data.’ In the context of video surveillance, that translates to ‘data about video data.’ But that is very broad. To be more specific, you need to consider the scene details that matter: the “where, what and how” of changes to a scene in a video stream.
Video metadata accurately describes the details that matter in a scene in terms of where those details are located, what they are and how they move in a scene.
That means metadata attributes can describe all sorts of details about moving objects of interest, e.g.:
- Location, time, colors, sizes, shapes, coordinates, tracks, volume decibels, speed, voice, duration in scene, direction of travel
In addition, more foundational details can be included:
- Video stream descriptions, codec, time stamps, device identity, etc.
All of the above are ‘meta’ descriptions of details in or related to a scene.
Based on AI techniques such as machine learning and deep learning, meta descriptions can be more (or less) granular, meaning they can describe attributes at a high level or at deeper levels. This allows a group of pixels to be classified as a person, animal, vehicle or other pre-defined object class, or described more precisely with refined attributes of people or objects, for instance:
- Sub-type
  - Vehicle: car, bus, bicycle, etc.
- License plate
- Model and make
- Color
  - Red, yellow, blue, green, etc.
- Movement characteristics
  - Type of movement
  - Speed
  - Location coordinates
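As a sketch of how such layered descriptions might be represented in practice (the field names below are illustrative, not taken from any specific product), a metadata record can carry a coarse classification alongside optional refined attributes:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ObjectMetadata:
    # High-level classification: person, vehicle, animal, ...
    object_class: str
    # Deeper-level attributes; left as None when the analytics
    # cannot determine them for a given detection
    sub_type: Optional[str] = None        # e.g. "car", "bus", "bicycle"
    color: Optional[str] = None           # e.g. "red", "blue"
    license_plate: Optional[str] = None
    speed_kmh: Optional[float] = None
    # Bounding-box position in the frame: (x, y, width, height)
    bbox: Optional[Tuple[int, int, int, int]] = None

# A coarse description of an object ...
coarse = ObjectMetadata(object_class="vehicle")
# ... and a more refined description of the same kind of object
refined = ObjectMetadata(object_class="vehicle", sub_type="car",
                         color="red", license_plate="XYZ123",
                         speed_kmh=42.5, bbox=(120, 80, 200, 150))
```

Making the refined attributes optional reflects the point above: the same record type works whether the analytics stops at "vehicle" or drills down to make, color and plate.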
The value of metadata
Metadata not only provides details about people, objects and events in a scene. It also allows large amounts of live and recorded video to be quickly grouped, sorted, searched, retrieved and used. As a result, the overall use cases for metadata fit into three areas:
- Real-time alarm triggering and notifications
- Post event forensic searching
- Statistical analysis and reporting
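The first of these areas, real-time triggering, amounts to evaluating each incoming metadata event against a set of rules. A minimal sketch, assuming metadata events arrive as plain dictionaries (the rule names and fields here are hypothetical):

```python
from typing import Callable, Dict, List

# A rule maps a metadata event to True (trigger) or False (ignore).
Rule = Callable[[Dict], bool]

def evaluate(event: Dict, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of all rules this event triggers."""
    return [name for name, rule in rules.items() if rule(event)]

rules = {
    # Alarm when a person is detected outside office hours
    "after_hours_person": lambda e: e["object_class"] == "person"
                                    and not 8 <= e["hour"] <= 18,
    # Notify when any vehicle exceeds 30 km/h on site
    "speeding_vehicle": lambda e: e["object_class"] == "vehicle"
                                  and e.get("speed_kmh", 0) > 30,
}

event = {"object_class": "person", "hour": 23}
evaluate(event, rules)  # → ["after_hours_person"]
```

In a real system the triggered rule names would be routed to notifications or alarm outputs; the point is that the rules operate on metadata, never on raw pixels.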
Adding intelligence to scenes with metadata
Metadata essentially assigns digital meaning to each video frame, describing the objects and events within it. In other words, it adds interpretation, or intelligence, about the scene, rather than just raw video footage that an operator must process manually.
Once software can interpret scenes in this way, it can understand the scene details and enable them to be acted on in real time via events, after the fact (post-event) via manual search, or through statistical analysis. This enables metadata to be used to build baselines that define what is ‘normal’ for the scene fed from any individual camera. In turn, this allows software to recognize any degree of deviation, anomaly or specific behavior or activity, and even to predict, with a given probability, what will happen in that scene.
Metadata enables many new use cases, for example:
- Performing a post-event search: e.g. find people wearing red clothing in a scene
- Performing an automation rule: e.g. open the barrier for a blue car with Texas license plate number XYZ123
- Performing statistical analysis: e.g. count how many cars moved in a specified direction on a road
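The post-event search case in particular reduces to filtering stored metadata records by attribute, with no need to re-process any video. A minimal sketch, assuming records are stored as dictionaries (the field names are illustrative):

```python
def search(records, **criteria):
    """Return all metadata records matching every given attribute."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

# Metadata records extracted from recorded footage
records = [
    {"object_class": "person",  "color": "red",  "camera": "cam-01"},
    {"object_class": "person",  "color": "blue", "camera": "cam-02"},
    {"object_class": "vehicle", "color": "red",  "camera": "cam-01"},
]

# "Find people wearing red clothing" becomes a metadata query:
search(records, object_class="person", color="red")
# → [{"object_class": "person", "color": "red", "camera": "cam-01"}]
```

Each returned record would carry a timestamp and camera identity, so the operator can jump straight to the matching footage instead of scrubbing through hours of video.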
Video metadata adds immense value to a video management system. In fact, its true potential is realized when applied to multiple inputs spanning visual, audio, activity and process-related sources: RFID tracking, GPS coordinates, tampering alerts, meter readings (e.g. temperature or chemical levels), noise detection, and point-of-sale transaction data. In the management of any site, these are all high-value data sources, and they can all be aligned based on their timestamps. Unifying metadata from different sources yields far more insight than any single, isolated system can provide on its own.
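Aligning sources on their timestamps can be as simple as merging the per-source event streams into one ordered timeline. A sketch with two hypothetical sources (the event values are made up for illustration):

```python
import heapq

# Events from two independent systems, each as (timestamp, source, description);
# each stream is already ordered by timestamp
video_events = [(1001, "video", "person detected"),
                (1007, "video", "vehicle detected")]
pos_events   = [(1003, "pos", "transaction recorded"),
                (1006, "pos", "transaction recorded")]

# Merge the per-source streams into a single timestamp-ordered timeline
timeline = list(heapq.merge(video_events, pos_events))
```

Once the streams are interleaved this way, an operator can see, for example, which point-of-sale transaction occurred between two detections on camera, which is exactly the kind of cross-system insight no isolated source can provide.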
The focus here is interoperability, where the IP world delivers yet another big benefit. Open protocols and industry standards are again essential, allowing for seamless metadata integration. Massive amounts of data from all kinds of systems will help us gain a faster, deeper and wider understanding of everything that surrounds us.