Main Content

Image Retrieval Using Customized Bag of Features

This example shows how to create a Content Based Image Retrieval (CBIR) system using a customized bag-of-features workflow.

Introduction

Content Based Image Retrieval (CBIR) systems are used to find images that are visually similar to a query image. The application of CBIR systems can be found in many areas such as a web-based product search, surveillance, and visual place identification. A common technique used to implement a CBIR system is bag of visual words, also known as bag of features [1,2]. Bag of features is a technique adapted to image retrieval from the world of document retrieval. Instead of using actual words as in document retrieval, bag of features uses image features as the visual words that describe an image.

Image features are an important part of CBIR systems. These image features are used to gauge similarity between images and can include global image features such as color, texture, and shape. Image features can also be local image features such as speeded up robust features (SURF), histogram of gradients (HOG), or local binary patterns (LBP). The benefit of the bag-of-features approach is that the type of features used to create the visual word vocabulary can be customized to fit the application.

The speed and efficiency of image search is also important in CBIR systems. For example, it may be acceptable to perform a brute force search in a small collection of images of less than a 100 images, where features from the query image are compared to features from each image in the collection. For larger collections, a brute force search is not feasible and more efficient search techniques must be used. The bag of features provides a concise encoding scheme to represent a large collection of images using a sparse set of visual word histograms. This enables compact storage and efficient search through an inverted index data structure.

The Computer Vision Toolbox™ provides a customizable bag-of-features framework to implement an image retrieval system. The following steps outline the procedure:

  1. Select the Image Features for Retrieval

  2. Create a Bag Of Features

  3. Index the Images

  4. Search for Similar Images

In this example, you will go through these steps to create an image retrieval system for searching a flower dataset [3]. This dataset contains about 3670 images of 5 different types of flowers.

Download this dataset for use in the rest of this example.

% Location of the compressed data set
url = 'http://download.tensorflow.org/example_images/flower_photos.tgz';

% Store the output in a temporary folder
downloadFolder = tempdir;
filename = fullfile(downloadFolder,'flower_dataset.tgz');

Note that downloading the dataset from the web can take a very long time depending on your Internet connection. The commands below will block MATLAB for that period of time. Alternatively, you can use your web browser to first download the set to your local disk. If you choose that route, re-point the 'url' variable above to the file that you downloaded.

% Uncompressed data set
imageFolder = fullfile(downloadFolder,'flower_photos');

if ~exist(imageFolder,'dir') % download only once
    disp('Downloading Flower Dataset (218 MB)...');
    websave(filename,url);
    untar(filename,downloadFolder)
end

flowerImageSet = imageDatastore(imageFolder,'LabelSource','foldernames','IncludeSubfolders',true);

% Total number of images in the data set
numel(flowerImageSet.Files)
ans = 3670

Step 1 - Select the Image Features for Retrieval

The type of feature used for retrieval depends on the type of images within the collection. For example, if searching an image collection made up of scenes (beaches, cities, highways), it is preferable to use a global image feature, such as a color histogram that captures the color content of the entire scene. However, if the goal is to find specific objects within the image collections, then local image features extracted around object keypoints are a better choice.

Let's start by viewing one of images to get an idea of how to approach the problem.

% Display a one of the flower images
figure
I = imread(flowerImageSet.Files{1});
imshow(I);

The displayed image is by Mario.

In this example, the goal is to search for similar flowers in the dataset using the color information in the query image. A simple image feature based on the spatial layout of color is a good place to start.

The following function describes the algorithm used to extract color features from a given image. This function will be used as a extractorFcn within bagOfFeatures to extract color features.

type exampleBagOfFeaturesColorExtractor.m
function [features, metrics] = exampleBagOfFeaturesColorExtractor(I) 
% Example color layout feature extractor. Designed for use with bagOfFeatures.
%
% Local color layout features are extracted from truecolor image, I and
% returned in features. The strength of the features are returned in
% metrics.

%   Copyright 2014-2020 The MathWorks, Inc.

[~,~,P] = size(I);

isColorImage = P == 3; 

if isColorImage
    
    % Convert RGB images to the L*a*b* colorspace. The L*a*b* colorspace
    % enables you to easily quantify the visual differences between colors.
    % Visually similar colors in the L*a*b* colorspace will have small
    % differences in their L*a*b* values.
    Ilab = rgb2lab(I);                                                                             
      
    % Compute the "average" L*a*b* color within 16-by-16 pixel blocks. The
    % average value is used as the color portion of the image feature. An
    % efficient method to approximate this averaging procedure over
    % 16-by-16 pixel blocks is to reduce the size of the image by a factor
    % of 16 using IMRESIZE. 
    Ilab = imresize(Ilab, 1/16);
    
    % Note, the average pixel value in a block can also be computed using
    % standard block processing or integral images.
    
    % Reshape L*a*b* image into "number of features"-by-3 matrix.
    [Mr,Nr,~] = size(Ilab);    
    colorFeatures = reshape(Ilab, Mr*Nr, []); 
           
    % L2 normalize color features
    rowNorm = sqrt(sum(colorFeatures.^2,2));
    colorFeatures = bsxfun(@rdivide, colorFeatures, rowNorm + eps);
        
    % Augment the color feature by appending the [x y] location within the
    % image from which the color feature was extracted. This technique is
    % known as spatial augmentation. Spatial augmentation incorporates the
    % spatial layout of the features within an image as part of the
    % extracted feature vectors. Therefore, for two images to have similar
    % color features, the color and spatial distribution of color must be
    % similar.
    
    % Normalize pixel coordinates to handle different image sizes.
    xnorm = linspace(-0.5, 0.5, Nr);      
    ynorm = linspace(-0.5, 0.5, Mr);    
    [x, y] = meshgrid(xnorm, ynorm);
    
    % Concatenate the spatial locations and color features.
    features = [colorFeatures y(:) x(:)];
    
    % Use color variance as feature metric.
    metrics  = var(colorFeatures(:,1:3),0,2);
else
    
    % Return empty features for non-color images. These features are
    % ignored by bagOfFeatures.
    features = zeros(0,5);
    metrics  = zeros(0,1);     
end

Step 2 - Create a Bag Of Features

With the feature type defined, the next step is to learn the visual vocabulary within the bagOfFeatures using a set of training images. The code shown below picks a random subset of images from the dataset for training and then trains bagOfFeatures using the 'CustomExtractor' option.

Set doTraining to false to load a pretrained bagOfFeatures. doTraining is set to false because the training process takes several minutes. The rest of the example uses a pre-trained bagOfFeatures to save time. If you wish to recreate colorBag locally, set doTraining to true and consider Computer Vision Toolbox Preferences to reduce processing time.

doTraining = false;

if doTraining
    %Pick a random subset of the flower images.
    trainingSet = splitEachLabel(flowerImageSet, 0.6, 'randomized');
    
    % Specify the number of levels and branching factor of the vocabulary
    % tree used within bagOfFeatures. Empirical analysis is required to
    % choose optimal values.
    numLevels = 1;
    numBranches = 5000;
    
    % Create a custom bag of features using the 'CustomExtractor' option.
    colorBag = bagOfFeatures(trainingSet, ...
        'CustomExtractor', @exampleBagOfFeaturesColorExtractor, ...
        'TreeProperties', [numLevels numBranches]);
else
    % Load a pretrained bagOfFeatures.
    load('savedColorBagOfFeatures.mat','colorBag');
end

Step 3 - Index the Images

Now that the bagOfFeatures is created, the entire flower image set can be indexed for search. The indexing procedure extracts features from each image using the custom extractor function from step 1. The extracted features are encoded into a visual word histogram and added into the image index.

if doTraining
    % Create a search index.
    flowerImageIndex = indexImages(flowerImageSet,colorBag,'SaveFeatureLocations',false);
else
    % Load a saved index
    load('savedColorBagOfFeatures.mat','flowerImageIndex');
end

Because the indexing step processes thousands of images, the rest of this example uses a saved index to save time. You may recreate the index locally by setting doTraining to true.

Step 4 - Search for Similar Images

The final step is to use the retrieveImages function to search for similar images.

% Define a query image
queryImage = readimage(flowerImageSet,200);

figure
imshow(queryImage)

The displayed image is by RetinaFunk.

% Search for the top 5 images with similar color content
[imageIDs, scores] = retrieveImages(queryImage, flowerImageIndex,'NumResults',5);

retrieveImages returns the image IDs and the scores of each result. The scores are sorted from best to worst.

scores
scores = 5×1

    0.4776
    0.2138
    0.1386
    0.1382
    0.1317

The imageIDs correspond to the images within the image set that are similar to the query image.

% Display results using montage. 
figure
montage(flowerImageSet.Files(imageIDs),'ThumbnailSize',[200 200])

The displayed images are by RetinaFunk, Jenny Downing, Mayeesherr, daBinsi, and Steve Snodgrass.

Conclusion

This example showed you how to customize the bagOfFeatures and how to use indexImages and retrieveImages to create an image retrieval system based on color features. The techniques shown here may be extended to other feature types by further customizing the features used within bagOfFeatures.

References

[1] Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV. (2003) 1470-1477

[2] Philbin, J., Chum, O., Isard, M., A., J.S., Zisserman: Object retrieval with large vocabularies and fast spatial matching. In: CVPR. (2007)

[3] TensorFlow: How to Retrain an Image Classifier for New Categories.