Artificial Intelligence
(Artificial
intelligence for video surveillance utilizes computer software programs that
analyze the audio and images from video surveillance cameras in order to
recognize humans, vehicles, objects and events.)
Today, there are in-depth discussions about what ‘might be’ possible
with AI, machine and deep learning-based analytics products on the market. Much
of the conversation centres on practical ways to utilize deep learning and
neural networks and how these techniques can improve analytics and
significantly reduce false-positives for important events. When talking to end
users, it becomes clear that many still don't take full
advantage of analytics. In some cases, this can be due to a history of
false-positives with previous generation technology, but in others, it can
simply be a case of not believing that reliable analytics are achievable for
their unique needs. The good news is that with AI, or more accurately, deep
learning and neural networks, we are making great strides towards increasing
accuracy.
In the past, developers tried to define what an object is, what motion
is and what should be considered an interesting motion that we want to track
versus false positive “noise” that should be ignored. Some classic examples are
wind blowing leaves on a tree, rain, snow or a plastic bag floating by. For far
too long, something as simple as motion detection has been plagued by
false-positives generated by wind. Users could reduce the sensitivity to
eliminate false motion in a light breeze, yet still see false positives
triggered in a wind storm. Using neural networks and deep
learning to define objects such as humans, cars, buses and animals means that
traditional analytics can now focus on objects. These new algorithms have been
taught to recognize a human being by seeing thousands of images of humans. Once
an algorithm has learned these examples, it can apply this advanced knowledge
to existing analytics. For example, if a car crosses a line at an entry gate,
that might be acceptable. However, if a person crosses that same line, we might want
an alert. A shadow should be ignored and swaying tree branches shouldn’t be
considered. A car driving in the wrong direction warrants an alert, however
people moving about freely is fine. All of the traditional analytics regarding
motion such as appear / disappear, line crossing, object left behind and
loitering will be more accurate and capable of additional refinement using AI
and the power of object recognition.
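The class-aware filtering described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual analytics API: the detection records, class names, and crossing test are all hypothetical placeholders standing in for the output of a trained object detector.

```python
# Hypothetical sketch: class-aware line-crossing analytics.
# A tracked object only raises an alert if its class is in the alert set
# AND its trajectory crosses the virtual trip line.

def segments_intersect(p1, p2, q1, q2):
    """Return True if segment p1-p2 strictly crosses segment q1-q2."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1 = cross(q1, q2, p1)
    d2 = cross(q1, q2, p2)
    d3 = cross(p1, p2, q1)
    d4 = cross(p1, p2, q2)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

ALERT_CLASSES = {"person"}           # cars may pass; people trigger an alert
LINE = ((0, 5), (10, 5))             # virtual trip line in image coordinates

def check_crossing(track):
    """track: dict with a class label and a list of (x, y) positions."""
    if track["cls"] not in ALERT_CLASSES:
        return False                 # ignore cars, shadows, swaying branches
    pts = track["positions"]
    return any(segments_intersect(a, b, *LINE)
               for a, b in zip(pts, pts[1:]))

car = {"cls": "car", "positions": [(3, 2), (3, 8)]}
person = {"cls": "person", "positions": [(6, 3), (6, 7)]}
print(check_crossing(car))     # False: cars are allowed through the gate
print(check_crossing(person))  # True: a person crossed the line
```

The key point is that the classifier runs first: motion "noise" that is not a recognized object of interest never reaches the crossing test at all.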
Practical Applications and Limits for Security and Surveillance
Machine and deep learning-based analytics are actively being marketed
and productized by many vendors with Hanwha Techwin poised to deliver its first
AI-based cameras this year. Existing data models can be purchased from third
party companies, but the question remains: How accurate are they? Additionally,
many of these new AI-based analytics are offered as ‘post processes’ run on a
server or workstation, when a majority of customer use cases demand real-time
feedback.
It takes a significant amount of computational power for a machine to
“self-learn” in the field, so “controlled learning” is best left to powerful
server farms in R&D laboratories and universities for now. Real-time
analytics also require that video streams be uncompressed to be analysed. While
this might make sense for installations with a few cameras connected to a
server, it’s clearly an unacceptable resource drain to decode (open) each
compressed camera stream when hundreds of cameras are involved. Once an
algorithm and data set are created, the trained model can be ‘packed up’ and
embedded on the edge to perform real-time detection and alerts for the
specific conditions it has been trained to recognize. It will not self-learn, but it can provide a
useful function for recognizing known objects or behaviours with minimum
errors.
Having deep learning analytics on the edge—analysing the image before
it is compressed and sent to storage—is clearly what everyone has in mind when
they imagine the usefulness of such technology to alert staff and make
decisions in real time.
Value Proposition of Deep Learning-based Analytics
The more we automate video processing and decision making, the more we
can save operators from redundant and mundane tasks. Computers capable of
sophisticated analysis, self-learning and basic decision making are much better
and faster at analysing large volumes of data while humans are best utilized
where advanced intelligence, situational analysis and dynamic interactions are
required. Increased value comes from the efficiencies gained when each resource
is used most effectively. The goal is to help operators make better-informed
judgements by providing detailed input and descriptive metadata. A primary
focus for AI in cameras today is to reduce, if not eliminate, false positives
in existing analytics. Powerful object recognition allows us to completely
remove common noise events caused by wind, rain and snow. AI-based cameras can
precisely recognize objects such as vehicles and people, as well as capture
identifying characteristics such as colours (up to two per object), carried
items such as bags, backpacks, hats and glasses and even approximate gender and
age. With object recognition, we can make existing analytics much more
powerful—send an alert when a car crosses a line, while allowing humans to
cross with no alert. By capturing additional descriptive metadata, AI-enabled
cameras will make a forensic search within supported VMS more powerful than ever
before.
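The forensic-search idea above can be illustrated with a toy example. This is a sketch only: the metadata fields mirror the attributes described in the text (object class, up to two colours, carried items), but the field names and search function are hypothetical, not a real VMS schema or API.

```python
# Hypothetical sketch: forensic search over descriptive metadata emitted
# by AI-enabled cameras. Events carry metadata, not pixels, so searching
# is a cheap filter rather than a re-analysis of recorded video.

events = [
    {"time": "09:12", "cls": "person", "colours": ["red", "black"],
     "items": ["backpack"]},
    {"time": "09:15", "cls": "car", "colours": ["white"], "items": []},
    {"time": "09:40", "cls": "person", "colours": ["red"],
     "items": ["bag"]},
]

def search(events, cls=None, colour=None, item=None):
    """Filter recorded events by object class, colour, and carried item."""
    hits = []
    for e in events:
        if cls and e["cls"] != cls:
            continue
        if colour and colour not in e["colours"]:
            continue
        if item and item not in e["items"]:
            continue
        hits.append(e)
    return hits

# "Find every person in red carrying a backpack"
matches = search(events, cls="person", colour="red", item="backpack")
print([e["time"] for e in matches])  # ['09:12']
```

Because the camera attaches this metadata at capture time, an operator can answer "person in red with a backpack, this morning" without re-watching hours of footage.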
We see airport security, subways and other mass transit customers as
early adopters of this technology. Detecting objects left behind is crucial in
the world we live in today. The benefits to retail organizations wanting to
optimize their business operations are equally important. It may even be
possible to rank reaction to products based on learned postures and skeletal
movements as the technology continues to evolve. AI-based cameras enable
systems integrators to sell value far beyond a security system. Integrations
with other systems, even customer relationship management (CRM) systems will be
not only possible, but desirable as a way to enhance metrics for retail
establishments.
What to Expect in the Near Term?
Beyond enhanced accuracy of existing analytics, solutions for the near
future might feature hybrid approaches where on-the-edge hardware combines with
server software to deliver a powerful combination of analysis and early warning
and detection capabilities. Comprehensive facial recognition and comparisons
with large databases will not be practical on the edge anytime soon, so that is
a good example of where servers can do some heavy lifting. Object recognition
and classification is perfectly suited for in-camera analytics—the camera knows
it is a face and can pass that metadata directly to a server, which can further
interpret the unique characteristics and cross-reference them with database entries.
We can easily imagine that AI-based analytics might take the same progression
as traditional analytics. The first implementations were server-based, then
there was a migration to a hybrid approach where some algorithms were in the
camera, but a server was still required for in-depth analysis, and ultimately
the entire analytic process could be hosted on the edge.
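The hybrid split described above can be sketched as two cooperating functions. Everything here is illustrative: the in-camera detector, the toy two-number "embedding", and the nearest-neighbour match are stand-ins for the real edge classifier and server-side database comparison, not any actual product interface.

```python
# Hypothetical sketch of a hybrid edge/server pipeline: the camera only
# classifies ("this region is a face") and forwards compact metadata;
# the server does the heavy cross-referencing against a large database.

# --- edge side: lightweight classification, no database access ---
def camera_detect(frame_id):
    # stand-in for an in-camera detector; emits metadata, not pixels
    return {"frame": frame_id, "cls": "face", "embedding": (0.12, 0.98)}

# --- server side: heavy lifting against a (toy) identity database ---
KNOWN = {"alice": (0.10, 0.99), "bob": (0.95, 0.05)}

def server_match(meta, threshold=0.05):
    """Return the closest database identity, or None if no good match."""
    if meta["cls"] != "face":
        return None
    def sq_dist(name):
        return sum((a - b) ** 2
                   for a, b in zip(KNOWN[name], meta["embedding"]))
    best = min(KNOWN, key=sq_dist)
    return best if sq_dist(best) < threshold else None

meta = camera_detect(frame_id=417)
print(server_match(meta))  # 'alice'
```

The design point is bandwidth and compute: only small metadata records travel from hundreds of cameras to the server, while the database lookup stays where the memory and processing power are.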
The Future Looks Bright for Image Classification
Being able to identify what’s going on in a still or moving image is one
of the most interesting areas of development in machine and deep learning. With
moving images, analysis goes beyond simply classifying objects, to classifying
the behaviours and moods of those in the frame. Already, vendors are talking
about ‘aggression detection’ as a desirable feature, but it’s easy to imagine
that the cost to an organization of a false positive or false negative in such
a scenario could be very high. For that reason, Hanwha is taking a conservative
approach as it develops its own algorithms in-house with teams of experts.
Whether it’s big players like
Facebook, Google, or Nvidia, significant investments are being made in AI and
machine learning to classify images and objects as well as text and speech.
Some of this technology development will trickle down into the security
industry, and much of it will have to be custom developed to suit the needs of surveillance
workflows. It may be okay to mislabel a person's face on Facebook, but security
organizations should not be willing to make such a mistake. This is one of many
reasons why our industry must insist on higher standards.
Customer expectations for AI
and all its variants are understandably high. However, careful consideration is
needed when choosing any AI-based solution.
Thanks to Security Today and to the Hanwha Techwin R&D department.