Maximizing the Potential of Legacy Content in New Media Asset Management Deployments

Media asset management systems provide excellent tools for managing new content that you ingest and create, but they are often limited in their ability to manage legacy content that existed before the asset management system was commissioned. The ability to use unstructured metadata, including related documents and closed captions, can provide the asset management system with rich, otherwise-unavailable metadata for legacy content. With very little effort, these additional files can significantly increase the searchable metadata, ensuring that assets can be quickly located and leveraged whenever they are needed.

Introduction

Almost 20 years ago, Bill Gates wrote an essay titled “Content Is King.” He may not have originated the phrase, or meant it to apply strictly to the media industry, but those words resonate strongly with us today.

As we look at the ever-changing media landscape, nobody knows for sure all of the means by which we’ll monetize media distribution in the future. What we do know is that consumer demand for great content shows no sign of abating and that the legacy production archives of media enterprises provide a wealth of content that can be tapped for new, revenue-expanding distribution opportunities. While newly released content has high initial demand among consumers, older content retains considerable value. A survey of the Apple iTunes Top Movie Chart [1] as of May 2015 shows that while the top 50 purchases are dominated by movies released in 2014 and 2015, the next 100 purchases are split almost equally between recent releases and those that were released in 2013 or earlier (Fig. 1). Furthermore, the reuse of archived content in new productions (when rights permit) can offer significant cost and time efficiencies compared to acquiring similar material. Reuse is also vital in productions that rely on historical media to tell their story.

Fig 1. iTunes top 150 movie data as of May 15, 2015, shows that older content maintains considerable viewer interest and revenue opportunities.

For these and many other reasons, it is no surprise that content preservation is of great concern in the media and entertainment industry. While much attention is being given to the development of standards for the archiving of content going forward, the ability to optimally use legacy archived content is equally important. Media asset management (MAM) systems provide excellent tools for managing new content, but a key challenge is how to best manage legacy archives that existed before the asset management system was commissioned. This raises two key questions: 1) How do we efficiently transition legacy digital archives to maximize their potential in a new MAM deployment? And 2) How do we associate the legacy assets’ metadata efficiently so that the legacy assets are easily accessible?

Transitioning Legacy Archives to New MAM

The first step in making better use of legacy, digitally archived media in a new MAM system is getting the old assets into the system. Integration interfaces between MAM systems and storage management systems enable a newly deployed MAM system to directly access existing archived assets without changing the old storage management solution, bypassing the need to transition the archived media itself. While this offers an efficient short-term approach, it precludes people from taking advantage of advances in storage management, such as new technologies for future portability or greater storage density. As such, the deployment of a new asset management system is often accompanied by a change or update of the related storage management system.

Tape access standards, such as the Linear Tape File System, and advanced media exchange file formats and containers, enhance asset portability and interchange, ensuring that content and metadata are readable by a breadth of systems beyond the one that created the asset. But such standards are relatively new, and they bring no benefit to handling the vast amount of media assets that were archived before their adoption. As such, a different approach is required to address legacy archives and leverage any metadata stored in an associated database instead of with the media files.

Traditional approaches to such transitions have involved bulk migration and conversion of all archived assets—exporting the media and metadata from the existing archive management and storage system, performing transcodes and conversions as necessary, and then writing it back to the new system. Such processes can clearly be time consuming, particularly when spanning tens or hundreds of thousands of assets. Additionally, access to the assets may be limited until they are available in the new system.

An optimal approach would provide direct access to legacy database data and existing storage media (such as existing digital tape storage) without requiring prior conversions. Desired or required transformations (such as transcoding or repackaging to different formats, or rewriting to newer tape standards) would be performed on demand as the content is needed. Doing so avoids the extensive time and equipment needed to convert all the assets at once, many of which may not even be needed for months or years. Assets can be automatically transitioned as needed, while the remaining assets can be left as is, or processed as a background task. However, the process must be transparent to operators: they should not need to know whether the content resides on the old or new storage system, nor the details of the storage format.
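The on-demand transition described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `AssetResolver` class, its in-memory dictionaries standing in for the legacy and new storage systems, and the `transform` callable standing in for a transcode or rewrap step are assumptions made for illustration, not the interface of any particular MAM or storage product.

```python
class AssetResolver:
    """Hypothetical resolver that hides where an asset physically lives."""

    def __init__(self, legacy_store, transform=None):
        self.legacy_store = legacy_store  # stand-in for the old archive
        self.new_store = {}               # stand-in for the new storage system
        # Stand-in for a transcode or repackaging step; identity by default.
        self.transform = transform or (lambda data: data)

    def fetch(self, asset_id):
        # Already transitioned: serve from the new storage system.
        if asset_id in self.new_store:
            return self.new_store[asset_id]
        # Not yet transitioned: read from legacy storage, convert, and keep
        # the result in the new system so later requests skip this step.
        data = self.transform(self.legacy_store[asset_id])
        self.new_store[asset_id] = data
        return data

    def migrate_in_background(self, batch_size):
        """Transition up to batch_size untouched assets; return the count."""
        pending = [a for a in self.legacy_store if a not in self.new_store]
        batch = pending[:batch_size]
        for asset_id in batch:
            self.fetch(asset_id)
        return len(batch)
```

An operator only ever calls `fetch`, so the caller never needs to know which storage system currently holds the asset. A scheduler could call `migrate_in_background` during idle periods to work through the remaining assets.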

Efficiently Adding Metadata to Legacy Content

Getting the legacy content into the MAM system is critical, whether it is coming from existing file-based digital archives or being ingested from sources such as video tapes. However, the ability to manage that content effectively, so that assets can be accessed quickly and easily, is as important as getting it into the MAM system in the first place. Like so many other aspects of media management, the key to solving this problem is metadata. More and better metadata not only makes the content easier for someone to find, but also enables the system to better automate processes to preserve, move, transform, and publish the content.

The common perception is that metadata is extremely structured, existing in fields that have strictly defined vocabularies or other rules. However, metadata can also be unstructured, and unstructured metadata has distinct advantages over structured metadata. While structured metadata that is added by the capture device or an upstream process can be extracted inexpensively, manually added structured metadata and derived structured metadata (such as metadata added by speech and face recognition systems) can become extremely expensive to add, either because of the specialized knowledge required or because of the costs of the automated systems. Multiply these direct or indirect costs across the massive volumes of media often involved in the transition of legacy assets, and the expense can be prohibitive.

Unstructured metadata, such as documents or other text-based assets, is an extremely efficient and cost-effective alternative to expensive structured metadata. Essentially, such “big metadata” can be any files that provide extended information on the asset. By taking advantage of the rich information sources that are already available, we can greatly increase the amount of metadata that we can use to describe an asset, and make better use of information that may become important in the future. Let’s examine a few examples of unstructured metadata sources and how they can dramatically enhance the discoverability and usage of legacy (and new) assets.

Example 1: Television News Scripts

For stories within a news broadcast that contained video, the scripts from the newsroom computer system can be associated with the video assets. These scripts may include not only the story text that is read by the presenter on air but also the transcripts of interviews and other sound bites, graphic cues, editing notes, script metadata, and more (Fig. 2).

Fig 2. News scripts as sources of unstructured metadata.

When people need to find this asset, they have a wealth of additional information that they can search, such as the following:

  • Presenter text: The text that the presenter or journalist reads on air, both as an introduction to the video and while the video is being played back
  • Sound bites: The transcription of the interviews or other audio included in the edited video
  • Location: The location that is typically part of the graphic cues contained within the script
  • Interviewee names and titles: The names and descriptions, such as the titles, of the individuals who were interviewed in the story
  • Approval information: The names of the people who approved the script, video, or both
  • Journalist: The name of the journalist who composed the story
  • Air date: The date when the story was aired, which might be different from the date when the video was created or last modified
  • Newscast: The newscast where the story was broadcast

In addition to allowing the news organization to further use or monetize their existing news video content, these data also enable them to reuse existing content for new productions. Historical interviews with a news figure can be used to generate retrospectives on that individual, or archival footage of a location can be used instead of sending a news photographer out to capture new video.
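To make the idea of searching attached script text concrete, the following is a minimal sketch of an inverted index that maps words in an attached document to asset identifiers. The `AttachmentIndex` class, its method names, and the sample asset identifiers are assumptions made for illustration; a production MAM system would typically rely on a full-text search engine rather than code like this.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


class AttachmentIndex:
    """Hypothetical full-text index over documents attached to assets."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of asset ids

    def attach(self, asset_id, text):
        # Index every word of the attached script under the asset's id.
        for term in WORD.findall(text.lower()):
            self._postings[term].add(asset_id)

    def search(self, query):
        # Return the assets whose attachments contain every query term.
        terms = WORD.findall(query.lower())
        if not terms:
            return set()
        matches = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*matches)


index = AttachmentIndex()
index.attach("news-001", "Mayor Smith opens the harbour bridge")
index.attach("news-002", "Harbour ferry schedule changes")
print(sorted(index.search("harbour bridge")))  # prints ['news-001']
```

Because the script text is indexed as is, no one has to decide in advance which fields (presenter text, interviewee names, locations) will matter to a future search.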

Example 2: Sports Match or Game Information

Because sports matches contain hours of footage and a variety of similar shots, they can be time consuming to log. However, much of the information that would be manually logged is accessible through a variety of readily available sources. Box scores and match summaries are available for most major sports leagues going back decades, providing information on game scoring and game time at a minimum (Fig. 3). The increased focus on player analytics has caused an explosion in the amount of game information available, with some game logs tracking each play and each player’s involvement in it.

Fig 3. Sports box scores as sources of unstructured metadata.

The addition of this type of game information has a relatively minor effect on finding game highlights, which are usually easy enough to identify. Where this increased detail has a positive effect is in finding segments that focus on certain personnel, game situations, or peripheral details. For example, if people wanted to find all instances in which a basketball player attempted a shot from a particular area of the court, they could find this information in the game log along with the associated time codes. Or if there was a scandal around an official who had been accused of fixing games, someone could search for that official’s name and retrieve the video for all games that they had been involved in, along with time codes for fouls or penalties that occurred during the games.

Example 3: Production Scripts and Work Orders

Movies and television programs also have scripts whose content can be used to locate the corresponding appropriate asset. Production scripts contain not only the dialogue that is spoken by the actors but also the stage instructions that are followed. The complete script can be associated with the completed program or movie, or sections of the script can be associated with particular video segments, enabling editors to efficiently find the right video segment by searching on the character names, dialogue, or information that might be contained in the stage and production instructions (Fig. 4).

Fig 4. Production scripts and work orders as sources of unstructured metadata.

The budgetary and logistical information generated during the planning phase of a production can also provide useful background data on the program. Internally, it can document the processes and approvals that were used in the creation of the program. More importantly, the logistical information might include the names of the actors who appeared in the production, the locations where the footage was shot, and the crew and equipment that were involved in the capture, editing, and finishing of the program.

Attaching this information to finished material makes it easy to find programs that were directed by a particular individual or programs featuring a particular actor. At a more granular level, an editor who needs coverage shots from a particular location can search across the daily work orders to locate shots from that location.

Example 4: Governmental Meeting Minutes

Recordings of local, regional, or national government bodies can be difficult to log because doing so demands that the person responsible for the task have significant domain knowledge. This expertise is necessary not only because the topics, vocabulary, and rules of order may be foreign to casual viewers but also because it may be difficult to recognize at the time which portions of the session will be interesting in the future. Because almost all government bodies keep detailed minutes, many with transcriptions, these documents can be attached to the video asset and can be used to find anything that was said within the session (Fig. 5).

Fig 5. Government meeting minutes as sources of unstructured metadata.

This is useful not only for finding the session sections that have an easily understood importance but even more so for unearthing key moments whose importance wasn’t recognized at the time. For example, a local politician may suddenly achieve national prominence, driving up interest in speeches that he delivered at a local level. Similarly, a vote or referendum that seemed relatively minor at the time might have longer-term effects, making it more significant in retrospect.

Example 5: Closed Captions or Subtitles

Many countries now mandate that broadcast—and in some cases, online—programming include closed captions. Whether the captions are produced by the content’s creator or by the organization broadcasting the material, closed captioning generally contains the entire transcription of the audio from the program, providing subsequent asset managers with the data and benefits achievable with speech-to-text systems but without all of the processing and added costs incurred by those systems.

For media organizations publishing content in multiple languages, the text for these additional languages would be stored in subtitle or caption files. Attaching a separate subtitle or caption file for each language to the video asset extends the searchability of the asset across more than one language. Because closed captioning and subtitling are referenced to the video’s time code, they have the added benefit of enabling the asset management system to locate the exact spot in the video where the word or phrase was used.
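As a rough illustration of time-coded caption search, the sketch below parses SubRip-style caption text (a cue number, a `start --> end` timing line, then one or more text lines) and reports the start time of every cue that contains a phrase, per language. The function names and sample captions are assumptions made for illustration, and the parser is deliberately simplified; real caption formats carry styling and positioning data that it ignores.

```python
import re

CUE_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})"
)


def parse_cues(caption_text):
    """Return (start, end, text) tuples from SubRip-style caption text."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", caption_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                # Everything after the timing line is the spoken text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


def find_phrase(captions_by_language, phrase):
    """Return (language, start, text) for each cue containing the phrase."""
    needle = phrase.lower()
    hits = []
    for language, caption_text in captions_by_language.items():
        for start, _end, text in parse_cues(caption_text):
            if needle in text.lower():
                hits.append((language, start, text))
    return hits
```

Given one caption file per language attached to the same asset, `find_phrase` returns the language and start time of each match, which a player could use to cue the video to that spot.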

Example 6: Content (Movie, Program, or Book) Summaries

Many programs have a short abstract as part of their structured metadata that was generated by the content creator or the content publisher. This description is used by cable providers, television guides, and other informational systems. However, internet-based services such as the Internet Movie Database can provide a wealth of additional detail that can be easily attached to the content. Because these services often allow consumers to edit their entries, some care must be taken to ensure that the material is accurate.

Example 7: Rights Information

While rights information is sometimes straightforward enough to be entered in a structured metadata field, it can be more complex. The ability to use the asset may depend on a matrix of different criteria:

  • Specific calendar dates
  • Locations
  • Publisher affiliations (network affiliation or ownership)
  • Type of publishing channel (broadcast, Web site, or over-the-top television)

If rights information is attached to the asset, this information becomes searchable. The asset management system may also be able to extract information from the file, validate against this information, or both before publishing the asset.
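One way such a pre-publication check might look is sketched below. The `RightsWindow` record and the `may_publish` function are hypothetical, and the sketch assumes the rights terms have already been extracted from the attached file into these four criteria; real rights agreements are usually more nuanced than a simple matrix lookup.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class RightsWindow:
    """Hypothetical record of one permitted combination of criteria."""
    start: date
    end: date
    territories: FrozenSet[str]
    affiliations: FrozenSet[str]
    channels: FrozenSet[str]


def may_publish(windows, when, territory, affiliation, channel):
    """Return True if any rights window covers all four criteria."""
    return any(
        w.start <= when <= w.end
        and territory in w.territories
        and affiliation in w.affiliations
        and channel in w.channels
        for w in windows
    )


windows = [
    RightsWindow(
        start=date(2015, 1, 1),
        end=date(2015, 12, 31),
        territories=frozenset({"AU", "NZ"}),
        affiliations=frozenset({"network-a"}),
        channels=frozenset({"broadcast", "web"}),
    )
]
print(may_publish(windows, date(2015, 7, 14), "AU", "network-a", "web"))  # prints True
```

A publishing workflow could call `may_publish` before release and hold the asset for manual review whenever the check fails.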

Conclusion

Managing media assets can be an intimidating task, particularly when it includes a large amount of legacy content. Modern media asset portability standards provide future interoperability between systems, but the transition of earlier file-based archives into a new MAM system deployment is most efficiently handled through a direct-access approach in which assets are updated on demand rather than in bulk. Enabling people to subsequently access these assets easily and efficiently requires associating as much metadata as possible with the assets to make them more discoverable. Unstructured metadata provides an efficient way to do so for a variety of uses, significantly increasing the amount of rich, searchable metadata for these assets without requiring extensive manual processes or expensive systems to extract derived metadata.

References

  1. Apple iTunes Top Movie Chart, iTunes application, May 15, 2015.

Presented at the SMPTE Sydney 2015 Technical Conference & Exhibition, 14-17 July 2015. Copyright © 2015 by SMPTE.

 

Savva Mueller, Director of Product Management, is a widely recognized expert in newsroom computer systems. Mueller had more than 15 years of experience in product management, quality assurance, field support, and training with Avid and Tektronix when he joined Masstech in 2012 to lead the Enterprise business unit. He is responsible for driving growth in the media asset management and archiving business, developing product roadmaps, evaluating the product portfolio, and defining feature requirements. Mueller graduated with a degree in mass communications from the University of Wisconsin-La Crosse. He also worked in production and editing at WTMJ-TV and WKBT-TV.