viisights has created a unique technology for automatically extracting meaningful data from any kind of video content. Our technology is based on a combination of approaches from fields such as artificial intelligence, computer vision, deep learning and natural language processing. The viisights platform automatically splits any video content into scenes; for each scene, it extracts the textual, vocal and visual elements and applies advanced machine learning algorithms to understand their semantics. All contextual knowledge that is extracted and generated is ranked by its importance and intensity. When a user profile is available to viisights, the personalization engine determines which scenes are most relevant to the user, how and when to target a personal communication, and what content to recommend for the user to watch next.

The platform includes three major components:

  • Video Processing Engine
  • Personalization and User Profiles Engine
  • Product Services


The Video Processing Engine (VPE) automatically analyzes video content or a live stream to understand what it contains. The engine outputs a description in Video Content Descriptive Language (VCDL), which captures what is actually shown in each scene and its atmosphere (e.g. joy, fear).

VPE main sub-components:

  • Video Image/Scene Stripper (VISS)
  • Video Entities Recognition: extracts objects (points of interest, clothes, accessories, drinks, etc.), actions, events and themes from the images selected by the VISS and converts them into text descriptions.

Video Entities Recognition is based on the following technologies:

  • Deep Neural Network (DNN)
  • Support Vector Machines (SVM)
  • Face Detection
  • Speech Recognition
  • Optical Character Recognition (OCR)
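The overview does not say how recognized entities are turned into text. As a minimal sketch, assuming each analyzed image yields a list of labeled detections (the `Detection` type, its `kind` values and the `describe` function are hypothetical), the conversion step could group labels by kind into a short description:

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str  # recognized entity, e.g. "coffee cup" (hypothetical example)
    kind: str   # hypothetical kinds: "object", "action", "event" or "theme"


def describe(detections: list[Detection]) -> str:
    """Group detected labels by kind and join them into one text description."""
    by_kind: dict[str, list[str]] = {}
    for d in detections:
        # setdefault keeps kinds in first-seen order
        by_kind.setdefault(d.kind, []).append(d.label)
    parts = [f"{kind}s: {', '.join(labels)}" for kind, labels in by_kind.items()]
    return "; ".join(parts)
```

For example, `describe([Detection("coffee cup", "object"), Detection("drinking", "action"), Detection("hat", "object")])` returns `"objects: coffee cup, hat; actions: drinking"`. A production system would likely use richer templates or a language model; this only shows the shape of the step.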

Content Similarity Detection:

  • Content Similarity Vector Generator
  • Content Similarity Detection (distance calculation)
  • Used:

    • Internally, as a built-in step for understanding complete scenes and events within the content
    • In search and recommendation use cases
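The distance metric is not named in the overview. A minimal sketch, assuming each clip or scene has already been turned into a numeric similarity vector and using cosine distance as a common choice (the `cosine_distance` and `most_similar` helpers and the clip ids are hypothetical), shows how such vectors could drive the search and recommendation use case:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for vectors pointing the same way, 1.0 for orthogonal ones."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_a * norm_b)


def most_similar(query: list[float], catalog: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k catalog entries closest to the query vector."""
    ranked = sorted(catalog.items(), key=lambda item: cosine_distance(query, item[1]))
    return [clip_id for clip_id, _ in ranked[:k]]
```

With a catalog of `{"clip_a": [1.0, 0.0, 0.0], "clip_b": [0.9, 0.1, 0.0], "clip_c": [0.0, 1.0, 0.0]}`, the query `[1.0, 0.0, 0.0]` with `k=2` ranks `clip_a` and then `clip_b`. At catalog scale, an approximate nearest-neighbor index would replace the full sort.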

Post-processing Logic:

  • Cleansing and Filtering
  • Consolidation and Classification
  • Language Detection
  • Data Enrichment
  • Sentiment Detection
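Only the first two post-processing steps lend themselves to a short illustration. The sketch below assumes the raw output for one scene is a list of `(label, confidence)` pairs collected across its frames. It shows cleansing (normalizing labels), filtering (dropping noise and weak detections) and consolidation (merging repeated labels). The `clean_and_consolidate` function, the 0.6 threshold and the stopword set are all hypothetical. Language detection, enrichment and sentiment detection are not shown.

```python
from collections import defaultdict


def clean_and_consolidate(raw_tags, min_confidence=0.6, stopwords=frozenset({"", "unknown"})):
    """
    raw_tags: iterable of (label, confidence) pairs from the frames of one scene.
    Returns {label: (max_confidence, frame_count)} for the labels that survive filtering.
    """
    consolidated = defaultdict(lambda: [0.0, 0])
    for label, confidence in raw_tags:
        label = label.strip().lower()              # cleansing: normalize whitespace and case
        if label in stopwords or confidence < min_confidence:
            continue                               # filtering: drop noise and weak detections
        entry = consolidated[label]
        entry[0] = max(entry[0], confidence)       # consolidation: keep the strongest evidence
        entry[1] += 1                              # and count the supporting frames
    return {label: (conf, count) for label, (conf, count) in consolidated.items()}
```

For example, the input `[("Guitar", 0.9), ("guitar ", 0.7), ("unknown", 0.99), ("hat", 0.4), ("Stage", 0.8)]` consolidates to `{"guitar": (0.9, 2), "stage": (0.8, 1)}`.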

Category Mapper & Rater:

  • Map on-screen activity to IAB and customized content categories
  • Rate content for brand and content safety
  • Rank entities and categories by their intensity and importance
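A minimal sketch of the mapping and ranking bullets follows. It assumes each extracted entity carries an intensity and an importance value in [0, 1], and it scores a category as the sum of intensity × importance over the entities mapped to it. That scoring rule, the `ENTITY_TO_CATEGORIES` table and the `rate_categories` function are hypothetical. The category names are illustrative labels, not official IAB identifiers, and brand/content-safety rating is not shown.

```python
# Hypothetical mapping from extracted entities to content categories.
ENTITY_TO_CATEGORIES = {
    "guitar": ["Music"],
    "stage": ["Music", "Events"],
    "coffee": ["Food & Drink"],
}


def rate_categories(entities: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """
    entities: {label: (intensity, importance)}, both assumed to be in [0, 1].
    Returns (category, score) pairs sorted from highest to lowest score.
    """
    scores: dict[str, float] = {}
    for label, (intensity, importance) in entities.items():
        for category in ENTITY_TO_CATEGORIES.get(label, []):  # unmapped entities are skipped
            scores[category] = scores.get(category, 0.0) + intensity * importance
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For `{"guitar": (0.9, 1.0), "stage": (0.5, 0.4), "coffee": (0.2, 0.5)}`, the ranking is Music, then Events, then Food & Drink.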


  • Personalization Engine: determines the best targeting parameters for specific users in the context of the video content and scenes they watch
    • Yield Optimization: determines the most relevant targeting data for the user and generates a score that denotes its relevancy. The decision is based on the content currently being viewed, past user performance, user-profile data, content data and campaign data.
  • User Profile Engine: stores and manages user preferences covering contextual data, behavioral data, historical data (e.g. content that was seen, responses to ads/promotions, watching habits) and demographic data, when available
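The overview lists the inputs to yield optimization but not how they are combined. A minimal sketch follows, assuming each input has already been reduced to a normalized signal in [0, 1] and combined by a weighted sum. The linear model, the `Signals` fields, the `WEIGHTS` values and the option names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    # Each signal is assumed to be normalized to [0, 1]; how it is computed is not specified.
    current_content: float   # fit with the content currently being viewed
    past_performance: float  # the user's past response to similar targeting
    profile: float           # fit with user-profile data
    content_data: float      # fit with content-level data
    campaign: float          # campaign priority and fit


# Hypothetical weights; they sum to 1 so the score stays in [0, 1].
WEIGHTS = {"current_content": 0.35, "past_performance": 0.25, "profile": 0.2,
           "content_data": 0.1, "campaign": 0.1}


def relevancy_score(s: Signals) -> float:
    """Weighted sum of the normalized signals."""
    return sum(weight * getattr(s, name) for name, weight in WEIGHTS.items())


def best_target(candidates: dict[str, Signals]) -> str:
    """Return the targeting option with the highest relevancy score."""
    return max(candidates, key=lambda name: relevancy_score(candidates[name]))
```

A weighted sum keeps the score easy to explain and tune. A learned model (e.g. trained on past user performance) could replace `relevancy_score` without changing `best_target`.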


The system receives an inventory of video clips, processes them and produces tags that describe the video content. These tags can belong to several hierarchies, as described below, and serve various purposes: analyzing the inventory, and acting as part of a content recommendation system or a targeting system.

The system analyzes three dimensions in each video clip:
  1. Visual content
  2. Auditory content
  3. Textual content:
    1. Metadata (e.g. content title, description, production data, etc.)
    2. Text extracted from the visual content by OCR

The system creates tagging data of the following types:
  1. Categories describing the content. The categories are based on the IAB taxonomy, extended with customized categories that viisights creates in cooperation with the customer.
  2. Items that appear in the clip, which can be an object, celebrity, POI, category, sentiment, brand, action, event or theme.
  3. Scenes: time slots in the clip with continuous, similar content.
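Scenes are defined above as time slots with continuous, similar content. One simple way to find such slots is sketched below; it is not necessarily the viisights method. It assumes every sampled frame already has a feature vector and starts a new scene whenever similarity to the previous frame drops below a threshold. The `split_into_scenes` function and the 0.8 threshold are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_into_scenes(frame_vectors: list[list[float]], timestamps: list[float],
                      threshold: float = 0.8) -> list[tuple[float, float]]:
    """
    Group consecutive frames into scenes. A new scene starts whenever a frame's
    similarity to the previous frame falls below the threshold.
    Returns a list of (start_time, end_time) time slots.
    """
    if not frame_vectors:
        return []
    scenes = []
    start = timestamps[0]
    for i in range(1, len(frame_vectors)):
        if cosine_similarity(frame_vectors[i - 1], frame_vectors[i]) < threshold:
            scenes.append((start, timestamps[i - 1]))  # close the current scene
            start = timestamps[i]                      # and open a new one
    scenes.append((start, timestamps[-1]))
    return scenes
```

For example, four frames `[[1, 0], [0.95, 0.05], [0, 1], [0.05, 0.95]]` sampled at seconds 0 to 3 yield two scenes, `(0.0, 1.0)` and `(2.0, 3.0)`.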

The main input to the system is the set of video clips. Optionally, the system can receive textual input from the customer's system, e.g. speech-to-text content and first- and third-party metadata (such as content title, description, etc.). To create personalized tagging, the system can accept user profiles as an additional input, enrich them and use them for personalized tagging. If no user profile data is provided, the system creates a new user profile based on the user's contextual and behavioral data.
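The create-or-enrich behavior described above can be sketched as follows. The sketch assumes viewing events arrive as `(category, action)` pairs. The `UserProfile` fields, the event format and the `build_or_enrich` function are hypothetical; a real profile would also hold the historical data listed for the User Profile Engine.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    user_id: str
    contextual: Counter = field(default_factory=Counter)  # categories of watched scenes
    behavioral: Counter = field(default_factory=Counter)  # e.g. "watched", "clicked" counts
    demographic: dict = field(default_factory=dict)       # only if the customer supplies it


def build_or_enrich(user_id: str, events: list[tuple[str, str]],
                    profile: Optional[UserProfile] = None) -> UserProfile:
    """
    events: (category, action) pairs, e.g. ("Music", "watched").
    Creates a new profile from the events if none is supplied;
    otherwise enriches the supplied profile in place.
    """
    if profile is None:
        profile = UserProfile(user_id)
    for category, action in events:
        profile.contextual[category] += 1
        profile.behavioral[action] += 1
    return profile
```

A supplied profile keeps its customer-provided fields (such as demographics) and only gains contextual and behavioral counts. A profile built from scratch contains only what the events reveal.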

Content Analysis Dashboard