Video Search: Full of Surprises

On Netflix, I recently watched the movie “Kissing Jessica Stein”, in which the main character, a straight woman, has a lesbian relationship. Afterwards, Netflix kept recommending lesbian movies to me. I watched the movie because I like the actress Jennifer Westfeldt, not because I am partial to lesbian movies. But Netflix’s algorithms concluded that I liked lesbian movies.

YouTube also has problems with its algorithms. A search for “Lady Gaga official video” produces less than perfect results. Along with Lady Gaga official videos featuring Lady Gaga, you also get Lady Gaga parodies, an official video by 77 Bombay Street called “I Love Lady Gaga”, and covers of official Lady Gaga videos performed by all and sundry.

I am reminded of that famous Forrest Gump quote: “Life is like a box of chocolates; you never know what you are going to get.” Apparently, so is video search.

How do Video Search Engines Work?

To illustrate two different approaches let’s look at Netflix and YouTube.

The Recommendation Model

Netflix uses a recommendation model. In fact, 75% of the Netflix videos people watch are a result of a recommendation.

Netflix produces these recommendations by feeding metadata into algorithms and combining this with human coding. In an August 2013 interview with Wired magazine, two of Netflix’s top engineers revealed how they control what you watch.

Netflix uses behavioural metadata. It analyzes your browsing, playing and searching activity including the time, date and device you used. This data is fed into algorithms that infer what you like by comparing your activity to others with similar patterns. The assumption is that users with similar patterns will like the same videos.
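That “similar patterns” idea is the heart of collaborative filtering. A minimal sketch of the approach, using made-up users and viewing histories (Netflix’s actual model is far more sophisticated and is not public):

```python
from collections import defaultdict

# Hypothetical viewing histories: user -> set of watched video ids.
# These names and data are illustrative only.
history = {
    "alice": {"stein", "bridesmaids", "carol"},
    "bob":   {"stein", "carol", "blue"},
    "carla": {"bridesmaids", "heat"},
}

def jaccard(a, b):
    """Overlap between two users' viewing sets (0 = nothing shared, 1 = identical)."""
    return len(a & b) / len(a | b)

def recommend(user, history, k=2):
    """Score videos the user hasn't seen by the similarity of the users who have."""
    scores = defaultdict(float)
    for other, videos in history.items():
        if other == user:
            continue
        sim = jaccard(history[user], videos)
        for video in videos - history[user]:
            scores[video] += sim
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("alice", history))  # bob is most similar to alice, so his videos rank first
```

Because alice and bob share two titles, bob’s unseen video outranks carla’s, which is exactly the “users with similar patterns will like the same videos” assumption in action.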

Analyzing user behaviour is an easy thing for computers to do. But describing and analyzing content is something best done by humans. So Netflix has over 40 freelancers who hand tag TV shows and movies. They are film and TV enthusiasts who objectively describe and categorize Netflix content based on detailed criteria.

In the future Netflix plans to use context in its recommendations. For example viewing behavior may change based on time of day, day of week, location and the device used. What you choose to watch on your phone at 5 p.m. on your way home on the bus from work may be quite different from what you watch at home at 10 p.m. on your tablet.
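One way to picture context-aware recommendation is as an extra filtering step layered on top of the candidate list. This is purely a hypothetical sketch of the idea described above; Netflix has not published such a model:

```python
# Hypothetical context rule: commuters on phones get short videos,
# evening viewers on larger screens get long-form content.
def pick_by_context(candidates, device, hour):
    """Filter candidate videos using viewing context (device and time of day)."""
    if device == "phone" and 16 <= hour <= 19:  # the bus ride home
        return [c for c in candidates if c["minutes"] <= 25]
    return [c for c in candidates if c["minutes"] > 25]

candidates = [
    {"title": "sitcom episode", "minutes": 22},
    {"title": "feature film", "minutes": 110},
]

print(pick_by_context(candidates, "phone", 17))   # short content for the commute
print(pick_by_context(candidates, "tablet", 22))  # long-form for the evening
```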

Netflix primarily operates on recommendation. People use search only when Netflix is not able to show them what to watch.

The High Tech Search Model

Google Videos is the largest video search property on the web. And YouTube is the world’s second largest search engine. Google’s vast amounts of search data and its years of experience give it a decided advantage in the world of video search.

Google believes in using technology to solve complex problems. So it is no surprise that Google uses advanced technology to make videos easier to index for search.

Google uses speech recognition technology to make captions and transcripts for videos. This creates text files that Google can then use to index videos for search. However, there are limitations to speech recognition such as poor sound quality, languages not recognized or multiple speakers whose speech overlaps.

Adding videos to its index is a challenging task. Google analyzes the text on the page where a video resides and draws conclusions about the content of the video. If a video is on a page with completely unrelated content then Google will have a hard time knowing what the video is about.
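The text signals described above, whether transcripts, captions, or surrounding page text, all feed the same basic mechanism: an inverted index from terms to videos. A toy sketch, with invented video ids and page text (real web-scale indexing involves far more ranking signals):

```python
import re
from collections import defaultdict

# Hypothetical data: video id -> the text found on its hosting page
# (title, description, caption or transcript). Illustrative only.
pages = {
    "v1": "Lady Gaga official video Bad Romance",
    "v2": "How to cook perfect pasta, a kitchen tutorial",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Inverted index: term -> set of video ids whose page text contains it."""
    index = defaultdict(set)
    for vid, text in pages.items():
        for term in tokenize(text):
            index[term].add(vid)
    return index

def search(index, query):
    """Return videos whose page text contains every query term."""
    matches = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*matches) if matches else set()

index = build_index(pages)
print(search(index, "lady gaga official"))
```

Notice that the index never looks inside the video itself, which is precisely why unrelated or missing page text leaves the search engine guessing.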

Google can analyze the keywords used to describe a video, but if they are inaccurate, missing or unreadable to the search engine, the video may be difficult to index properly. Videos embedded in Flash or JavaScript may not be visible to the search engine at all.

Google tries to overcome some of these challenges by using its knowledge of searches completed by others. You will notice that when you type a search into the search box, a list of terms others have used appears below. But this does not solve many of the problems inherent in video search.

How will video search work in the future?

The problem with video is that it is much more nuanced than text. The content of a video is subjective and requires human judgement to properly categorize.

For a search model (Google), as opposed to a recommendation model (Netflix), a blend of conventional search and social search is needed. Once videos are indexed using keywords and other metadata, they would be further filtered using the behaviour of the crowd. If the videos returned by a search are skipped or left unwatched by a majority of users performing that search, there is a good chance they were not relevant. In the Lady Gaga example, you would expect a majority of users who searched for Lady Gaga official videos to skip the cover versions.

Google will undoubtedly continue to move in this direction in the future.

I am currently working on a project called MondoPlayer, which plays continuous streams of videos from all over the web. It uses a combination of keyword and social search for users who search. I have experienced first hand the enormous challenges of providing satisfying video search results.

So next time you search for a video, you will have a new appreciation for how challenging a task this is.

How do you think video search will evolve?
