Histogram-Based Scene/Shot Detection

Implemented in JavaScript for HTML5 Videos


Introduction: How the algorithm works

The algorithm works on color histograms, which count the number of pixels with a specific red, green and blue intensity. Digital cameras often show brightness histograms to help control exposure. The simplest way to compare two pictures is to subtract, for each intensity value, the pixel count of one image from that of the other. However, this ignores the fact that counts at similar intensities may compensate each other: if an image becomes slightly brighter, its pixels merely move to neighbouring intensity values, yet a bin-by-bin comparison scores this as a large difference.
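The histograms and the naive bin-by-bin comparison can be sketched in JavaScript as follows. This is an illustration, not code taken from ShotDetection v1.0; the function names are hypothetical, and the input is assumed to be RGBA pixel data such as the `data` array of an `ImageData` object read from a canvas. The example at the end shows the weakness described above.

```javascript
// Build separate 256-bin red, green and blue histograms from RGBA pixel data,
// e.g. the `data` array of an ImageData object obtained from a canvas.
function rgbHistograms(rgba) {
  const r = new Array(256).fill(0);
  const g = new Array(256).fill(0);
  const b = new Array(256).fill(0);
  for (let i = 0; i < rgba.length; i += 4) {
    r[rgba[i]]++;
    g[rgba[i + 1]]++;
    b[rgba[i + 2]]++; // rgba[i + 3] is alpha and is ignored
  }
  return { r, g, b };
}

// Simplest comparison: sum of the absolute count differences per intensity.
function naiveDifference(h1, h2) {
  let sum = 0;
  for (let i = 0; i < h1.length; i++) sum += Math.abs(h1[i] - h2[i]);
  return sum;
}

// One-pixel "images": a shift by a single intensity step scores the same
// as a jump from black to white.
const grey100 = rgbHistograms(new Uint8ClampedArray([100, 100, 100, 255]));
const grey101 = rgbHistograms(new Uint8ClampedArray([101, 101, 101, 255]));
const black = rgbHistograms(new Uint8ClampedArray([0, 0, 0, 255]));
const white = rgbHistograms(new Uint8ClampedArray([255, 255, 255, 255]));
console.log(naiveDifference(grey100.r, grey101.r)); // 2
console.log(naiveDifference(black.r, white.r)); // 2
```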

Similar intensity values can be made to compensate each other by computing a weighted average of the intensity values in a given range. Instead of subtracting individual intensity values, the weighted averages of the intensity values can be subtracted. Intensity values close to the one in question are weighted more heavily, with the weights following the shape of a half-circle.
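A minimal sketch of the half-circle weighting follows. It assumes that the weight for a distance d within a radius r is sqrt(1 - (d/r)^2) and that bins beyond the ends of the histogram are simply left out; both are assumptions about the original implementation, and `slidingAverage` is a hypothetical name. With this weight function, a radius of 1 gives the neighbouring bins a weight of 0, so the histogram is returned unchanged.

```javascript
// Weighted sliding average of a histogram. Bins within `radius` (an integer
// number of bins) contribute with the half-circle weight sqrt(1 - (d / radius)^2):
// the bin itself gets weight 1, bins at the border of the radius get weight 0.
function slidingAverage(hist, radius) {
  radius = Math.floor(radius);
  if (radius < 1) return hist.slice(); // no averaging possible
  const n = hist.length;
  const avg = new Array(n).fill(0);
  for (let i = 0; i < n; i++) {
    let weighted = 0;
    let weightSum = 0;
    for (let d = -radius; d <= radius; d++) {
      const j = i + d;
      if (j < 0 || j >= n) continue; // bins outside the histogram are skipped
      const w = Math.sqrt(1 - (d / radius) ** 2);
      weighted += w * hist[j];
      weightSum += w;
    }
    avg[i] = weighted / weightSum; // weightSum >= 1 because d = 0 has weight 1
  }
  return avg;
}

console.log(slidingAverage([0, 0, 3, 0, 0], 1)); // → [0, 0, 3, 0, 0]
console.log(slidingAverage([0, 0, 3, 0, 0], 2)); // the peak is spread to its neighbours
```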

However, subtracting weighted averages from each other alone would not discriminate well, because many similar, already smoothed values are subtracted from one another. The algorithm can be improved by subtracting the weighted average in one image from the actual (unaveraged) intensity value at the same position in the other image. This is done crosswise in both directions, and the maximum of the two values is taken.
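The crossed comparison can be sketched as below. The text does not say whether the maximum is taken per bin or over the summed differences; this sketch assumes the per-bin differences are summed first. The averages are passed in precomputed (for example with a half-circle weighted average), and `crossedDifference` is a hypothetical name.

```javascript
// Crossed comparison: the actual values of one histogram are compared with the
// weighted averages of the other histogram, in both directions, and the larger
// of the two sums is kept. avg1/avg2 are precomputed sliding averages of h1/h2.
function crossedDifference(h1, avg1, h2, avg2) {
  let d12 = 0; // image 1 values against image 2 averages
  let d21 = 0; // image 2 values against image 1 averages
  for (let i = 0; i < h1.length; i++) {
    d12 += Math.abs(h1[i] - avg2[i]);
    d21 += Math.abs(h2[i] - avg1[i]);
  }
  return Math.max(d12, d21);
}
```

Comparing actual values against the other image's averages keeps the difference sensitive to real changes, while the averages still absorb small shifts of intensity.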

In order to limit the influence of steep edges, which would otherwise dominate the result, we subtract the self-variances from these crossed difference values. The self-variance is defined here as the difference between the actual intensity value at a given position and the weighted average at the same position within the same image.
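One way to combine the crossed differences with the self-variances is sketched below. How exactly the program subtracts the self-variance is not specified here, so this is an assumption: per bin, the self-variance of the histogram whose averages are used is subtracted from the crossed difference and the result is clamped at 0. The function names are hypothetical.

```javascript
// Self-variance of one histogram per bin: how far its own value lies from its
// own weighted average there. Large values indicate a steep edge.
function selfVariance(h, avg) {
  return h.map((v, i) => Math.abs(v - avg[i]));
}

// Crossed difference with the self-variance of the histogram whose averages
// are used subtracted per bin (clamped at 0), so steep edges cannot dominate.
function boundedCrossedDifference(h1, avg1, h2, avg2) {
  const sv1 = selfVariance(h1, avg1);
  const sv2 = selfVariance(h2, avg2);
  let d12 = 0;
  let d21 = 0;
  for (let i = 0; i < h1.length; i++) {
    d12 += Math.max(0, Math.abs(h1[i] - avg2[i]) - sv2[i]);
    d21 += Math.max(0, Math.abs(h2[i] - avg1[i]) - sv1[i]);
  }
  return Math.max(d12, d21);
}
```

For two identical histograms the crossed difference and the self-variance are equal in every bin, so the result is 0 regardless of how steep the histogram is.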

There is certainly still a lot that could be improved in this shot detection. We could compare three-dimensional color histograms instead of separate red, green and blue histograms, whose difference values are currently simply added together. Another idea would be to additionally consult an edge map, as produced by convolution filters, alongside the color histograms, or to weight the results of the color and edge histograms by image region. Despite these possible improvements, the existing algorithm works quite well, certainly well enough to be useful in practice.
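The first improvement suggested above, a joint three-dimensional color histogram, could be prototyped along these lines. This is not part of ShotDetection v1.0; the function name and the quantisation to 3 bits per channel (8 × 8 × 8 = 512 bins) are our assumptions, chosen so that the bins are not too sparsely populated.

```javascript
// Quantised 3D color histogram: each pixel is counted in exactly one bin,
// indexed by its (red, green, blue) triple reduced to `bitsPerChannel` bits.
function colorHistogram3D(rgba, bitsPerChannel = 3) {
  const levels = 1 << bitsPerChannel; // bins per channel
  const shift = 8 - bitsPerChannel; // drop the low-order bits
  const hist = new Array(levels ** 3).fill(0);
  for (let i = 0; i < rgba.length; i += 4) {
    const r = rgba[i] >> shift;
    const g = rgba[i + 1] >> shift;
    const b = rgba[i + 2] >> shift;
    hist[(r * levels + g) * levels + b]++;
  }
  return hist;
}
```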

It may seem unusual that the algorithm was implemented in JavaScript. However, this was a requirement of the course for which the program was written. The problem with JavaScript is that the browser takes some time to seek to a new frame, so the video cannot be analysed much faster than it plays back.
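For reference, the usual way to step through an HTML5 video frame by frame is to set `currentTime` and wait for the `seeked` event before drawing the frame to a canvas. The helper below is a sketch of that pattern, not code from the program; `seekTo` and the loop variables in the comment are hypothetical.

```javascript
// Seek a media element to `timeSec` and resolve once the browser reports that
// the new position is ready ('seeked' event). Every analysed frame has to wait
// for this, which limits how fast the analysis can run.
function seekTo(video, timeSec) {
  return new Promise((resolve) => {
    // Register the listener before seeking so the event cannot be missed.
    video.addEventListener("seeked", () => resolve(video.currentTime), { once: true });
    video.currentTime = timeSec;
  });
}

// Typical use inside an async analysis loop (video, canvas, ctx, gapMs assumed):
// for (let t = start; t < video.duration; t += gapMs / 1000) {
//   await seekTo(video, t);
//   ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
//   const data = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
//   // ...compute and compare the histograms of `data`...
// }
```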

Download / Live Test

Download & Live Test:
ShotDetection v1.0
Live Test (best opened in a new tab)
Covered by our gpg-signed software/SHA512SUMS.
Author:
Elmar Stellnberger estellnb@elstel.org
Please sign our Contributor License Agreement if you want to contribute code. Otherwise we cannot incorporate and redistribute your changes here at elstel.org.




User Guide

Use 'Load Video' first to select the video to be analysed. For this feature to work, your browser must not block pop-ups. Alternatively, you can change the default video by hand by editing the <source> tag inside the <video> tag in index.html. Then simply click on 'Analyze Video'; the message 'Please Wait - I am analyzing your video!' is displayed. You can stop the analysis at any time with the 'stop analyzing' button. If you click on 'Analyze Video' again, the analysis continues where it was previously stopped. Once the analysis has been stopped, the 'Analyzation Start Time' slider shows the position up to which the video has already been analysed.

You may also set the analyzation start time by hand with the slider. If you have navigated within the video to the position where you want to start the analysis, simply press 'M0' first to remember the video position internally and then 'R' to restore it into the slider. 'M1', by contrast, remembers the position of the slider rather than that of the video. If you want to re-analyse a given video sequence multiple times with different parameters, use 'R' and 'Clear List' to restore the position and delete all shots detected so far. 'Reset', on the other hand, clears the shot list and moves the analyzation start time slider to the beginning of the video.

The most important analysis parameter is the threshold. If the difference value computed for the current frame exceeds the threshold, a new shot is added to the shot list. There are two controls for the threshold: 'Upper Bound for Threshold' sets the maximum value of the 'Threshold' slider. This was necessary because very different threshold values are needed depending on the value of the 'Sliding Average Size'. The threshold can be set with a precision of one hundredth of a percent.
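The threshold decision itself is simple and can be sketched as follows. The sketch assumes the difference values have already been scaled to a percentage (how the program normalises them is not described here); `detectShots` and the `{ time, diff }` sample format are hypothetical.

```javascript
// Start a new shot wherever the difference for the current frame exceeds the
// threshold. `samples` is an array of { time, diff } with diff in percent.
function detectShots(samples, thresholdPercent) {
  return samples.filter((s) => s.diff > thresholdPercent).map((s) => s.time);
}

const shotTimes = detectShots(
  [{ time: 0, diff: 0.1 }, { time: 1, diff: 2.5 }, { time: 2, diff: 0.3 }],
  1.0
);
console.log(shotTimes); // → [1]
```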

The second most important parameter is the 'Sliding Average Size'. Intensity values are subtracted from the weighted average intensity values of the other frame, so that similar intensity values have a similar effect. The sliding average size gives the radius within which neighbouring intensity values are taken into account for the average. However, the weights are very low near the border of the radius (half-circle). Each histogram distinguishes 256 intensity values per channel (red, green and blue). With a radius of 128, the intensity values at the beginning and at the end of the histogram are still linked to the intensity value in the middle.

The smaller the sliding average size, the higher the threshold needs to be. For larger sliding average sizes the averaging compensates more of the differences, so a smaller threshold is needed. If you uncheck 'Use Sliding Average', the two histograms are simply subtracted and no average is computed. This is equivalent to a sliding average size of one.

Another parameter is the 'Inter-Frame Gap', which for optimal results can be set to the frame duration of the video (40 ms for 25 fps, i.e. 1000/25). A larger inter-frame gap makes the analysis faster, which can be useful for testing, for example. Values of up to 2000 ms (2 seconds) can be useful. Larger values may be used for slowly panning landscapes or for continuously moving animations such as cartoons. Action scenes are not suitable for a large inter-frame gap: you would have to use a very high threshold, which would in turn let many scene changes go unnoticed.
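The arithmetic behind the inter-frame gap is shown below: the gap that visits every frame is the frame duration, and the number of seek-and-compare steps grows inversely with the gap. The function names are hypothetical, and the count assumes one sample at the start plus one per full gap.

```javascript
// Inter-frame gap that visits every frame: the duration of one frame in ms.
function frameDurationMs(fps) {
  return 1000 / fps;
}

// Number of seeks (and histogram comparisons) for a video of `durationSec`
// seconds when sampling every `gapMs` milliseconds, starting at 0.
function sampleCount(durationSec, gapMs) {
  return Math.floor((durationSec * 1000) / gapMs) + 1;
}

console.log(frameDurationMs(25)); // 40
console.log(sampleCount(600, 40)); // 15001 seeks for a 10-minute video
console.log(sampleCount(600, 1000)); // 601 seeks with a 1-second gap
```

Since every seek has to wait for the browser, a 1000 ms gap makes a test run of a 10-minute video roughly 25 times cheaper than analysing every frame.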

A possible drawback of a small inter-frame gap is that gradual changes such as cross-fades cannot be recognized, because there is not much change from one frame to the next. For fades from or to black, the checkbox 'Recognize Black Frames' provides a remedy. To bridge fades, an inter-frame gap of 1000 ms is sufficient, as they rarely last longer than a second.
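One simple way to recognise a black frame from its histogram is sketched below. The text does not say how the program decides that a frame is black, so the criterion and both default values are our assumptions, and `isBlackFrame` is a hypothetical name.

```javascript
// Treat a frame as black when (almost) all pixels are very dark: at least
// `minFraction` of the counts in a 256-bin histogram lie at or below
// `maxIntensity`. Both defaults are illustrative assumptions.
function isBlackFrame(hist, maxIntensity = 16, minFraction = 0.98) {
  let dark = 0;
  let total = 0;
  for (let i = 0; i < hist.length; i++) {
    total += hist[i];
    if (i <= maxIntensity) dark += hist[i];
  }
  return total > 0 && dark / total >= minFraction;
}
```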

An additional step that can be performed before the histogram analysis is 'Align Lucidity', which compensates for differences in brightness (lucidity). For dark images the histogram is concentrated on the left, the side of low intensity values. 'Align Lucidity' takes the range in which most of the histogram values reside and stretches it over the whole histogram. This only works well if this range is wide enough; otherwise there is not enough information and the resulting histogram would be very blocky. Usually the brightness compensation has a positive effect on shot recognition, because histogram analysis always compares values of the same or similar intensity. The threshold used to cut irrelevant parts at the very left and the very right of the histogram is currently hardcoded as lucidity_threshold in the program and can thus only be changed by editing the source code.
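The stretching step of 'Align Lucidity' can be sketched as a linear remapping of the range [lo, hi] onto 0..255. This assumes a simple linear mapping; how the program chooses lo and hi (via lucidity_threshold) is not reproduced here, and `stretchHistogram` is a hypothetical name.

```javascript
// Stretch the part of a 256-bin histogram between bins `lo` and `hi`
// (inclusive) linearly over the full range 0..255. Counts outside [lo, hi]
// are dropped as irrelevant edge regions.
function stretchHistogram(hist, lo, hi) {
  if (hi <= lo) return hist.slice(); // nothing to stretch
  const out = new Array(256).fill(0);
  for (let i = lo; i <= hi; i++) {
    const target = Math.round(((i - lo) * 255) / (hi - lo));
    out[target] += hist[i];
  }
  return out;
}
```

If hi - lo is small, only a few of the 256 output bins receive any counts and the rest stay empty, which is the blocky result mentioned above.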

'Spread-Lucidity Minimum Threshold' sets the lucidity threshold from which on the histograms of sufficiently bright images are expanded over the whole range. With a value of 256, no histogram is expanded unless the image is completely white, so 'Align Lucidity' has no effect. However, during a fade there is an abrupt transition from frames whose histogram is not expanded to frames whose histogram is. The two checkboxes 'to black' and 'from black' under 'Ignore Lucidity Transgression' prevent a new shot from being recognized when an image is dimmed down or up. If the 'No Ignore Threshold' is less than 100%, shot detection is no longer suppressed on dim-up or dim-down when the brightness difference between neighbouring frames is large enough.

However, during a fade, black frame recognition may fail if no frame is entirely black, e.g. when some text remains visible in between. For this case we recommend checking 'from black' under 'Ignore Lucidity Transgression' instead of 'Recognize Black Frames'.
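The 'Ignore Lucidity Transgression' decision could look roughly like the sketch below. It assumes that brightness is the mean intensity (0..255) of a frame and that any decrease or increase counts as dimming down or up; the program's actual criterion may differ, and `ignoreAsDimming` and its option names are hypothetical.

```javascript
// Decide whether a detected change should be ignored because it is only a
// dim-down ('to black') or dim-up ('from black'). Brightness values are assumed
// to be mean intensities (0..255) of the previous and the current frame.
function ignoreAsDimming(prevBrightness, curBrightness, opts) {
  const { toBlack, fromBlack, noIgnoreThresholdPercent } = opts;
  const dimDown = curBrightness < prevBrightness;
  const dimUp = curBrightness > prevBrightness;
  if (!((toBlack && dimDown) || (fromBlack && dimUp))) return false;
  const changePercent = (Math.abs(curBrightness - prevBrightness) / 255) * 100;
  // At 100 % dimming is always ignored; below that, a large enough brightness
  // jump between neighbouring frames is reported as a shot after all.
  return noIgnoreThresholdPercent >= 100 || changePercent < noIgnoreThresholdPercent;
}
```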

'Stabilize Lucidity below Δ' ensures that an area of constant size is cropped at the left and right of the histogram, because cropping areas of varying size would lead to the recognition of many shot changes where there are none. When dimming up or down, however, this area has to change. While 'Spread-Lucidity Minimum Threshold' prevents too small an area from being expanded over the whole histogram, 'Cut-Lucidity Threshold' causes intensity values below 1/16 of the mean value to be cropped.
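The cropping and stabilisation can be sketched as follows. `cropBounds` cuts bins at both ends whose count lies below 1/16 of the mean bin count, as stated above; `stabilizeBounds` is our assumed reading of 'Stabilize Lucidity below Δ' (keep the previous bounds while the new ones differ by less than Δ bins). Both names are hypothetical.

```javascript
// Crop bounds for 'Align Lucidity': the first and last bins whose count reaches
// `fraction` of the mean bin count. At least one bin is always at or above the
// mean, so both loops terminate for fraction <= 1.
function cropBounds(hist, fraction = 1 / 16) {
  const mean = hist.reduce((sum, v) => sum + v, 0) / hist.length;
  const limit = mean * fraction;
  let lo = 0;
  while (hist[lo] < limit) lo++;
  let hi = hist.length - 1;
  while (hist[hi] < limit) hi--;
  return { lo, hi };
}

// Assumed behaviour of 'Stabilize Lucidity below Δ': keep the previous bounds
// while the new ones differ by less than `delta` bins, so that the cropped area
// keeps its size; larger changes (e.g. during a fade) are accepted.
function stabilizeBounds(prev, next, delta) {
  if (prev && Math.abs(next.lo - prev.lo) < delta && Math.abs(next.hi - prev.hi) < delta) {
    return prev;
  }
  return next;
}
```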

If an inter-frame gap of 500 ms or 1000 ms is set, the beginning of a scene can still be detected with higher accuracy by setting a lower value for 'Accuracy'. If the accuracy is finer than the inter-frame gap, the program additionally searches for the beginning of the scene by binary or sequential search. We know of no case where sequential search yields better results than the faster binary search, although it does examine all frames at the given accuracy.
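A sketch of the binary search variant is given below. It assumes exactly one cut lies between the two samples; the callback `differs` stands in for seeking to two positions and comparing their histograms, and both it and `findShotStart` are hypothetical. Binary search needs about log2(gap / accuracy) comparisons, whereas sequential search needs about gap / accuracy.

```javascript
// Narrow down the start of a new shot detected between the samples at tPrev
// (still old shot) and tCur (already new shot), both in ms. `differs(a, b)`
// returns true if the frames at times a and b belong to different shots.
// The interval is halved until it is no wider than `accuracyMs`.
function findShotStart(tPrev, tCur, accuracyMs, differs) {
  let lo = tPrev; // last known position in the old shot
  let hi = tCur; // first known position in the new shot
  while (hi - lo > accuracyMs) {
    const mid = lo + (hi - lo) / 2;
    if (differs(lo, mid)) hi = mid; // the cut lies between lo and mid
    else lo = mid; // the cut lies between mid and hi
  }
  return hi;
}

// Example: a cut at 730 ms, sampled with a 1000 ms gap and 40 ms accuracy.
const cutAt730 = (a, b) => (a < 730) !== (b < 730);
console.log(findShotStart(0, 1000, 40, cutAt730)); // 750, after 5 comparisons
```

A real implementation would additionally snap each probed time to a frame boundary before seeking.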

If you click on a scene thumbnail, the video jumps to the beginning of that shot. The preceding and the following frame can be shown with the buttons '<' and '>', located directly to the right of the video. The step size by which the preceding or following frame is selected is also set by the 'Accuracy' parameter. The whole video can be resized by dragging the bottom-right corner of the video area; the mouse cursor changes there to indicate that dragging is possible. The size of the shot preview thumbnails can be set via 'Icon Size'.

'Minimum Shot Length' is another fine-tuning option. It specifies how long a scene has to last at minimum. Almost all scenes last longer than a second (1000 ms); however, the film Big Buck Bunny contains one scene that lasts only a single frame (40 ms). By setting a minimum shot length you can prevent the program from recognizing multiple shots in quick succession. You may, for example, set it to 1000 ms to allow at most one new shot to be detected per second.
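The minimum shot length acts as a filter on the detected shot starts, which can be sketched like this. It assumes the distance is measured from the previously accepted shot start; `enforceMinShotLength` is a hypothetical name.

```javascript
// Drop detected shot starts that follow the previously accepted shot start by
// less than `minShotLengthMs`. `times` must be sorted ascending (ms).
function enforceMinShotLength(times, minShotLengthMs) {
  const accepted = [];
  for (const t of times) {
    const last = accepted[accepted.length - 1];
    if (last === undefined || t - last >= minShotLengthMs) accepted.push(t);
  }
  return accepted;
}

console.log(enforceMinShotLength([0, 400, 1200, 1900, 3500], 1000)); // → [0, 1200, 3500]
```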

The two checkboxes 'View Histogram' and 'View Histogram of previous frame' can be checked to show, in addition to the shot thumbnail, the red, green and blue histograms of the frame exactly at the start of a new shot and of the frame just before it. That way you can see the data on which the algorithm bases its decisions. 'Align Lucidity' has already been applied to the histograms shown this way. These histograms can explain why a new shot was or was not detected. The feature was also used during development.

Finally, you may want to save the detected shots and their start times to a list, which can be done via 'Save List of All Shots'. Beforehand you can view the list via 'View List of All Shots', where you can also copy it directly to the clipboard. The clipboard can usually be pasted into your favourite text editor with [Ctrl]+[V]. Pop-up windows must not be blocked for this functionality. You may assign a textual name to each shot in the text field to the right of 'Delete this Shot'; this name is then displayed in the shot list. You can manually add or delete shots with 'Create New Shot Here' and 'Delete this Shot'.



Installation Manual

All you need to install the Sliding Average Shot Boundary Detection is a web server to host index.html. If you do not want to enter the file name of the video manually, you will also need PHP (for listdir.php). Create a directory 'ShotDetection' on your web server, copy index.html, listdir.php and the videos into it, and then open the program in your browser via http://localhost/ShotDetection.

The following is a short description of how to configure Apache, the most widely used web server under Linux:

ServerName localhost
DocumentRoot "/home/user/public_html"
AddHandler application/x-httpd-php5 .php

<Directory "/home/user/public_html/">
    Options ExecCGI
    FCGIWrapper /usr/bin/php-cgi
    AddHandler fcgid-script .php
    AllowOverride all
    Order allow,deny
    Allow from all
    Require all granted
</Directory>

NameVirtualHost 127.0.0.1:80
<VirtualHost 127.0.0.1:80>
    DocumentRoot /home/user/public_html
    ServerName localhost:80
</VirtualHost>

NameVirtualHost [::1]:80
<VirtualHost [::1]:80>
    DocumentRoot /home/user/public_html
    ServerName localhost:80
</VirtualHost>

Under Debian it is sufficient to copy the content above into a file under /etc/apache2/sites-enabled/ (with Apache 2.4 the file name must end in .conf). Other distributions may require you to insert this content into httpd.conf or apache2.conf. You will additionally have to install and enable a PHP module for Apache, which can be done as follows:

cd /etc/apache2/mods-enabled
ln -s ../mods-available/php7.0.conf php7.0.conf
ln -s ../mods-available/php7.0.load php7.0.load

or simply (adjust 7.0 to your installed PHP version):

a2enmod php7.0

There are basically two methods to enable PHP: via CGI or via the PHP module of Apache. The shot detection can also be used without PHP. If you have no PHP but do not want to give up the convenience of selecting your video from a list in the GUI, do the following:

ls -1 *.webm *.mkv *.mov *.mp4 >listdir.lis
sed -i "s#listdir.php#listdir.lis#" index.html

The first command has to be executed whenever you copy a new video file into your ShotDetection directory. The second command is only needed once, to switch from PHP to a manual file list (to switch back: sed -i "s#listdir.lis#listdir.php#" index.html). You can also make this change directly in index.html with a text editor. All video files need to reside in the same directory as index.html (if you have specified Options SymLinksIfOwnerMatch, you may also symlink them into it).
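The file written by `ls -1` contains one file name per line. How index.html actually reads listdir.lis is not shown in this document; the sketch below only illustrates parsing that format, and `parseFileList` is a hypothetical name.

```javascript
// Turn the contents of listdir.lis (one file name per line, as written by
// `ls -1`) into an array of file names, skipping empty lines.
function parseFileList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
}

// In the browser, for example:
// fetch("listdir.lis").then((r) => r.text()).then(parseFileList).then(console.log);
console.log(parseFileList("big_buck_bunny.webm\ntrailer.mp4\n")); // → ["big_buck_bunny.webm", "trailer.mp4"]
```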