Video analytics — the automated analysis of terabytes of video content — has a proven track record helping investigators to glean information from surveillance cameras, recognize faces in a crowd, or zoom in on the license plates of suspects. However, researchers know they need more advanced capabilities and software algorithms to go beyond detection and tracking and really understand the relationships between objects in video footage.
Does this open the door for VARs and ISVs? Primary obstacles to the successful use of aggregated surveillance data remain geolocation and processing, both of which provide opportunities for IT solutions providers.
The abundance of video data requires a new system of analytics, with the capability to extract information, generate data sets and identify patterns.
One opportunity is in software development that is in line with advances in video camera technology, according to Jie Yang, program director at the National Science Foundation, who states that there is now much more digital video data in nonstandard formats — petabytes of it — coming from cell phones, digital cameras, tablets, surveillance systems, and unmanned aerial vehicles.
So-called computer vision research, which focuses on image processing, combined with computer graphics and natural language expertise, will move video analysis beyond object detection to help analysts determine the relationships between the objects, Yang says.
One such forward-looking project is the Visual Cortex on Silicon, an NSF-backed research project led by Penn State University, which seeks to design a machine-vision system that operates much like human vision, allowing computers to record and understand visual content much faster and more efficiently than current technologies.
The Intelligence Advanced Research Projects Activity (IARPA) is involved in a five-year research project called Aladdin Video. Using a mix of audio and video extraction, knowledge representation and search technologies, Aladdin processes video, starting with an automatic tagger that looks at the video in the analyst’s queue and creates a sophisticated search index for that information library.
Jill Crisman, IARPA program manager for both Finder and Aladdin Video, explains, “It is sort of like a giant card catalog of information about the videos.” Aladdin applies metatags to online video in order to describe its content and assist in multimedia event detection, creating what Crisman calls “a text document for video.”
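The card-catalog idea described above can be sketched as an inverted index: each metatag maps to the set of videos that carry it, and a query intersects those sets. This is a minimal illustration only; the tag names and video IDs are hypothetical, and Aladdin's actual indexing is far more sophisticated.

```python
from collections import defaultdict

def build_tag_index(videos):
    """Build an inverted index mapping each metatag to the videos that carry it."""
    index = defaultdict(set)
    for video_id, tags in videos.items():
        for tag in tags:
            index[tag].add(video_id)
    return index

def search(index, query_tags):
    """Return videos carrying every tag in the query (simple AND search)."""
    sets = [index.get(tag, set()) for tag in query_tags]
    return set.intersection(*sets) if sets else set()

# Hypothetical tagger output: video IDs mapped to descriptive metatags.
videos = {
    "clip_001": {"crowd", "outdoor", "vehicle"},
    "clip_002": {"indoor", "person", "vehicle"},
    "clip_003": {"crowd", "outdoor", "person"},
}
index = build_tag_index(videos)
print(sorted(search(index, {"crowd", "outdoor"})))  # ['clip_001', 'clip_003']
```

The index is built once over the analyst's queue, so each subsequent query is a fast set intersection rather than a rescan of the video library.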
Another IARPA program, called Finder, is designed to help analysts locate non-geotagged imagery, whether photographs or video. The Aladdin Video program seeks to improve search capabilities for specific events so that analysts can more quickly find the videos most relevant to their needs.
“The goal of the Finder program is to develop tools to help analysts locate where in the world images or video were taken,” Crisman explains.
Ultimately, researchers want to be able to manipulate video the same way they manipulate other data. “We’re looking to exploit pixels and turn them into data that can be combined with (other sources),” said Ken Rice, chief of the ISR integration division at the National Geospatial-Intelligence Agency, at a recent National Institute of Standards and Technology symposium.
One major hurdle is the variety of algorithms currently used to process video. Rice says there could be as many as 20 algorithms to process video coming from Predator unmanned aerial vehicles.
“What we really need is an open framework for video processing,” which should include a way of characterizing what has been done to the pixels during processing, Rice says. That way, experts can factor into their analysis any uncertainty introduced by changes to the video.
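One way to capture the provenance Rice describes is to log each processing step alongside the pixels it modifies, so downstream analysts can see exactly what was done. The sketch below is a simplified assumption of how such a record might work; the step names, parameters, and frame representation are hypothetical, not part of any actual framework.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessedVideo:
    """Video payload plus a provenance log of every transformation applied."""
    frames: list
    history: list = field(default_factory=list)

    def apply(self, name, fn, **params):
        """Apply a processing step to every frame and record its name and parameters."""
        self.frames = [fn(f, **params) for f in self.frames]
        self.history.append({"step": name, "params": params})
        return self

# Hypothetical pipeline; frames are plain numbers standing in for pixel data.
video = ProcessedVideo(frames=[10, 20, 30])
video.apply("brightness", lambda f, gain: f * gain, gain=2)
video.apply("clip", lambda f, max_val: min(f, max_val), max_val=50)

print(video.frames)   # [20, 40, 50]
print(video.history)  # [{'step': 'brightness', ...}, {'step': 'clip', ...}]
```

Because the history travels with the data, an analyst can factor in any uncertainty a step introduced (here, the clipping that discarded information) without having to guess what the pipeline did.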
The eyes may deceive, but advanced research into video analytics is closing the gap between what analysts see and what they know, and it is creating a new avenue for solutions providers.