James Whitebread

Masstech CTO James Whitebread has been looking at AI, and how services such as facial recognition can help find and monetise your video content.

What are AI services?

Artificial Intelligence (AI) services are content enrichment and intelligence services. They vary widely, from services that help to identify people within a video through to services that identify the sentiment of that video. The variety and depth of AI services have grown rapidly over the past 2-3 years.

AI services can be layered together to perform a wider analysis of a given subject. In the media and entertainment space, analysis of a video can provide useful data such as cast, location, object information, sentiment and conversation. This data allows better search and identification of assets for repurposing, and further along the supply chain it could help with user discovery, e.g. deciding what one might watch on an OTT service.
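As a rough illustration of layering, the sketch below runs several analysers over one asset and merges their output into a single metadata record. The analyser functions, facet names and returned values are hypothetical stand-ins, not the API of any real service.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real AI services; each returns labels for one facet.
def detect_faces(asset_id: str) -> List[str]:
    return ["Person A", "Person B"]

def detect_objects(asset_id: str) -> List[str]:
    return ["car", "bridge"]

def detect_sentiment(asset_id: str) -> List[str]:
    return ["positive"]

def enrich(asset_id: str, layers: Dict[str, Callable[[str], List[str]]]) -> Dict[str, object]:
    """Run each analyser in turn and collect its results under a facet name."""
    record: Dict[str, object] = {"asset_id": asset_id}
    for facet, analyser in layers.items():
        record[facet] = analyser(asset_id)
    return record

# Each entry is one "layer"; adding a service means adding one more entry.
layers = {"cast": detect_faces, "objects": detect_objects, "sentiment": detect_sentiment}
record = enrich("asset-001", layers)
print(record)
```

Keeping each service behind the same simple function shape means a new layer can be added without changing the code that assembles the record.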

What are examples of commonly used AI services?

While AI is extremely prevalent today, it is only in the past 2-3 years that the number of service providers has grown. Before this, AI had been in use for many years, most commonly by security services across the world.

One area of great development is the introduction of “pre-trained” services, each trained to perform a highly specific task. For example, in the context of facial recognition, the AI has been trained to identify a human face. The process often starts by locating eyes within a wider image and, on successful detection, identifying other elements of the facial structure before performing deeper analysis and even identifying the face in question. [1]

You’ve possibly already heard of some of the most popular AI services in use today:

Amazon Rekognition [2] and Microsoft Computer Vision [3] are both machine vision services that provide functionality such as object detection, facial detection and more. Speechmatics [4] provides deep voice recognition, allowing customers to understand the context and language of a piece of media, and Google’s Video AI [5] provides video-focused machine vision for deep video analysis, again for items such as object and location detection. Some providers, such as GrayMeta, aggregate a number of these services across multiple service providers.

Image: AI can find and identify people within a video using facial recognition.

How can customers utilise these services?

Many of these services are offered by the major providers as an API or command line interface. Operating them at scale therefore requires some additional knowledge, plus integration with business systems to drive the machine learning/AI services.

Far fewer AI services are available with a fully featured UI. This is often because a direct interface is not a practical way for customers to upload and manage content for processing by AI services at scale. UI-based AI services typically lend themselves to small volumes of content.

This creates a challenge for customers: not only adopting AI services, but then making them part of a larger workflow or pipeline.

What’s the link between customer content and AI services?

Customers of AI services typically have their media content (such as video) in media and/or storage management systems, with content likely being stored in on-premises systems, such as disk or tape storage.

In order to process content through an AI service, content typically needs to be offered up as a proxy video or audio file at a lower, more practical bitrate, or indeed as a still image. This requires the creation of proxies or image files, which must then be passed to the AI service; the resulting data must then be linked back to the original asset and interpreted through an appropriate mechanism.
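To make the proxy step concrete, here is a minimal sketch that builds a command line for creating a low-bitrate proxy. It assumes ffmpeg as the transcoder, which is our choice for illustration and not something named above; the file names, resolution and bitrates are likewise hypothetical defaults.

```python
import shlex
from typing import List

def build_proxy_command(source: str, proxy: str, height: int = 360,
                        video_bitrate: str = "800k",
                        audio_bitrate: str = "96k") -> List[str]:
    """Return an ffmpeg argument list that writes a low-bitrate H.264/AAC proxy."""
    return [
        "ffmpeg", "-i", source,
        "-vf", f"scale=-2:{height}",            # fixed height, width follows aspect ratio
        "-c:v", "libx264", "-b:v", video_bitrate,
        "-c:a", "aac", "-b:a", audio_bitrate,
        proxy,
    ]

# Hypothetical file names; the command is only built and printed, not executed.
cmd = build_proxy_command("master.mxf", "proxy.mp4")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed, and the proxy then passed on to the chosen AI service.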

How do we empower our customers in Kumulate?

One of our beliefs is that services should be brought to where our customers can practically use them. We know that our customers want to use AI to find the appropriate content within their archives more easily, to edit and process content within the archive, and to provide enriched metadata to their own platforms. They then provide the enriched metadata to customers such as those operating OTT platforms, who will use it to greatly improve their customer experience.

Search: All of the data we collect and enrich for our customers is utilised within our search process. Rich metadata on locations and objects, and even technical metadata, is all searchable in the Kumulate platform.
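As a rough sketch of how enriched metadata can become searchable, the example below builds a small inverted index over a few records. It is an illustration of the general idea only, not a description of how Kumulate implements search; the records, field names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical enriched records; field names and values are illustrative only.
assets = [
    {"id": "asset-001", "location": ["London"], "objects": ["bridge", "bus"], "codec": ["h264"]},
    {"id": "asset-002", "location": ["Paris"], "objects": ["bridge"], "codec": ["prores"]},
]

def build_index(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map every lower-cased metadata value to the ids of assets carrying it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for field, values in record.items():
            if field == "id":
                continue
            for value in values:
                index[value.lower()].add(record["id"])
    return index

def search(index: Dict[str, Set[str]], *terms: str) -> List[str]:
    """Return ids of assets that match every term (logical AND)."""
    if not terms:
        return []
    matches = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*matches))

index = build_index(assets)
print(search(index, "bridge"))
print(search(index, "bridge", "London"))
```

Because location, object and technical values all land in the same index, a single query can combine them, such as an object and a location together.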

Display / Use: Once we collect the data from the AI service, not only is the data searchable, but we also present it back on the video timeline, helping customers to understand where objects are detected and where people appear within the video, and displaying automatically created subtitles matched to the video timeline.

Onward Delivery: All of this metadata can be wrapped up, for example, as part of a video distribution workflow. This allows video content to be delivered with enhanced metadata, whether that is a subtitle track or an Amazon X-Ray-style service for OTT platforms.
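As one example of packaging metadata for delivery, the sketch below turns time-coded transcript segments into a SubRip (SRT) subtitle track. The choice of SRT and the sample segments are assumptions made for illustration; a real workflow would take its segments from the speech recognition service and may need a different subtitle format.

```python
from typing import List, Tuple

def to_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) segments."""
    cues = []
    for number, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{number}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{text}\n")
    return "\n".join(cues)  # a blank line separates consecutive cues

# Hypothetical transcript segments, as a speech service might return them.
segments = [(0.0, 2.5, "Welcome back."), (2.5, 6.0, "Here is tonight's top story.")]
srt_text = to_srt(segments)
print(srt_text)
```

The same time-coded segments that drive the timeline display can therefore be reused for delivery without a second pass through the AI service.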

Masstech has worked and continues to work with leading AI service providers such as Microsoft, Amazon, GrayMeta, Veritone and Speechmatics in order to bring enrichment and analysis services to where our customers store and manage their content. Today, multiple customers are already utilising this ability to enrich content through machine vision and voice recognition services.

Contact us to find out more about our AI integrations.


[1] https://searchenterpriseai.techtarget.com/definition/face-detection#:~:text=Face%20detection%20algorithms%20typically%20start,nose%2C%20nostrils%20and%20the%20iris.

[2] https://aws.amazon.com/rekognition/

[3] https://azure.microsoft.com/en-gb/services/cognitive-services/computer-vision/

[4] https://www.speechmatics.com/product/features/

[5] https://cloud.google.com/video-intelligence
