Shallow and Deep Convolutional Networks for Saliency Prediction

Overview

Paper accepted at the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Junting Pan (*) Kevin McGuinness (*) Elisa Sayrol Noel O'Connor Xavier Giro-i-Nieto

(*) Equal contribution

A joint collaboration between:

Insight Centre for Data Analytics, Dublin City University (DCU), Universitat Politecnica de Catalunya (UPC), UPC ETSETB TelecomBCN, UPC Image Processing Group

Abstract

The prediction of salient areas in images has been traditionally addressed with hand-crafted features based on neuroscience principles. This paper, however, addresses the problem with a completely data-driven approach by training a convolutional neural network (convnet). The learning process is formulated as a minimization of a loss function that measures the Euclidean distance of the predicted saliency map with the provided ground truth. The recent publication of large datasets of saliency prediction has provided enough data to train end-to-end architectures that are both fast and accurate. Two designs are proposed: a shallow convnet trained from scratch, and another deeper solution whose first three layers are adapted from another network trained for classification. To the authors' knowledge, these are the first end-to-end CNNs trained and tested for the purpose of saliency prediction.
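
As a rough illustration of the loss described above, the following sketch computes a Euclidean loss between predicted and ground-truth saliency maps. The function name `euclidean_loss`, the flat-list representation of a map, and the 1/(2N) normalization (the convention used by Caffe's Euclidean loss layer) are assumptions made for this example; the exact normalization used in the paper is not stated here.

```python
def euclidean_loss(predicted, ground_truth):
    """Sum of squared pixel differences, divided by 2N for N maps.

    `predicted` and `ground_truth` are lists of maps; each map is a flat
    list of pixel values (a hypothetical layout chosen for this sketch).
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need the same non-zero number of maps")
    total = 0.0
    for pred_map, gt_map in zip(predicted, ground_truth):
        if len(pred_map) != len(gt_map):
            raise ValueError("maps must have the same number of pixels")
        # Squared Euclidean distance between one prediction and its target.
        total += sum((p - g) ** 2 for p, g in zip(pred_map, gt_map))
    return total / (2.0 * len(predicted))


# Toy batch of two 3-pixel maps.
pred = [[0.0, 0.5, 1.0], [0.2, 0.2, 0.2]]
gt = [[0.0, 1.0, 1.0], [0.2, 0.0, 0.2]]
loss = euclidean_loss(pred, gt)
```

The loss is zero only when every predicted map equals its ground truth, which is what the minimization drives towards.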

Publication

Our paper is openly published thanks to the Computer Vision Foundation. An arXiv pre-print is also available.

Please cite with the following BibTeX code:

@InProceedings{Pan_2016_CVPR,
author = {Pan, Junting and Sayrol, Elisa and Giro-i-Nieto, Xavier and McGuinness, Kevin and O'Connor, Noel E.},
title = {Shallow and Deep Convolutional Networks for Saliency Prediction},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}

You may also want to refer to our publication with the more human-friendly Chicago style:

Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel E. O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016.

Models

The two convnets presented in our work can be downloaded from the links provided below each respective figure:

Shallow ConvNet (aka JuntingNet): [Lasagne Model (2.5 GB)]
Deep ConvNet (aka SalNet): [Caffe Model (99 MB)] [Caffe Prototxt]

Our previous winning shallow models for the LSUN Saliency Prediction Challenge 2015 are described in this preprint and available from this other site. That work was also part of Junting Pan's bachelor thesis at the UPC TelecomBCN school in June 2015, whose report, slides and video are available here.

Visual Results

Qualitative saliency predictions

Datasets

Training

As explained in our paper, our networks were trained on the training and validation data provided by SALICON.

Test

Three different datasets were used for testing:

A collection of links to the SALICON and iSUN datasets is available from the LSUN Challenge site.

Software frameworks

Our paper presents two different convolutional neural networks trained with different frameworks. For this reason, different instructions and source code folders are provided.

Shallow Network on Lasagne

The shallow network is implemented in Lasagne, which in turn is built on top of Theano. To install the required version of Lasagne and all the remaining dependencies, run this pip command:

pip install -r https://raw.githubusercontent.com/imatge-upc/saliency-2016-cvpr/master/shallow/requirements.txt

This requirements file was provided by Daniel Nouri.

Deep Network on Caffe

The deep network was developed on Caffe, the framework from the Berkeley Vision and Learning Center (BVLC). You will need to follow these instructions to install Caffe.

Later work

If you are interested in this work, you may also want to check our later work, SalGAN, which offers better performance.

Acknowledgements

We would like to especially thank Albert Gil Moreno and Josep Pujal from our technical support team at the Image Processing Group at the UPC.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work.
The Image Processing Group at the UPC is an SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under grant number SFI/12/RC/2289.

Contact

If you have any general question about our work or code that may be of interest to other researchers, please use the public issues section of this GitHub repo. Alternatively, drop us an e-mail at [email protected].

Comments
  • Problems about training and image and saliency map normalization

    1. Following the parameter settings in the paper and the training prototxt provided by kevinmcguinness in issue #3, I trained the deep net based on VGG_CNN_M, but the net does not converge. My training prototxt is:

        net: "train_deep_sal.prototxt"
        test_iter: 500
        test_interval: 1000
        display: 10
        average_loss: 20
        lr_policy: "step"
        gamma: 0.5
        stepsize: 100
        base_lr: 0.00000013
        momentum: 0.9
        iter_size: 1
        max_iter: 24000
        weight_decay: 0.0005
        snapshot: 5000
        snapshot_prefix: "train_deep_SalNet"
        solver_mode: GPU
        test_initialization: false

       Is it right?
    2. In the training prototxt, the saliency mean is 31 and the scale is 2/255. Does this rescale the saliency map to [-1, 1]? It seems that ([0, 255] - 31) * (2/255) does not match [-1, 1]. Will this mark more regions as salient?
    3. In the post-processing stage, 127 is added to the net output. Why not 31?
    opened by inkfish2016 9
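
The arithmetic behind question 2 above can be checked with a short sketch. The mean of 31 and the scale of 2/255 are the values quoted in the question; the helper `normalize` and the comparison mean of 127.5 (the midpoint of [0, 255]) are assumptions added for illustration and are not taken from the training prototxt.

```python
def normalize(value, mean, scale):
    """Subtract the mean, then multiply by the scale."""
    return (value - mean) * scale


SCALE = 2.0 / 255.0

# Settings quoted in the question: mean 31, scale 2/255.
low_31 = normalize(0, 31.0, SCALE)
high_31 = normalize(255, 31.0, SCALE)

# Hypothetical comparison: subtracting the midpoint of [0, 255] instead.
low_mid = normalize(0, 127.5, SCALE)
high_mid = normalize(255, 127.5, SCALE)

print("mean 31:   ", low_31, high_31)
print("mean 127.5:", low_mid, high_mid)
```

With a mean of 31 the endpoints are -62/255 and 448/255, so the interval has width 2 but is not centred on zero. Only a mean of 127.5 would map [0, 255] onto approximately [-1, 1].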
  • help with the demo code

    Hi, I am not very familiar with Python, so I might be doing something wrong, but I get the same output for every image with the (modified) demo code. The file names and indexes are fine, but the output is identical across images.

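    import glob
    import os

    import scipy.io
    from scipy import misc, ndimage


    def saliencyPredictor(xt, net2, url, outUrl):
        # Predict all saliency maps in one batch; row i is assumed to match the i-th file below.
        y_pred_test = net2.predict(xt)
        i = 0
        for file in glob.glob(url + "*.png"):
            img = misc.imread(file)
            # Each prediction is a flattened 48x48 map.
            tmp = y_pred_test[i].reshape(48, 48)
            blured = ndimage.gaussian_filter(tmp, sigma=3)
            # Resize to the source image size and rescale to [0, 1].
            y = misc.imresize(blured, (img.shape[0], img.shape[1])) / 255.
            # misc.imsave(outUrl + os.path.basename(file), y)
            base = os.path.basename(file)
            scipy.io.savemat(outUrl + os.path.splitext(base)[0] + ".mat", {'y': y})
            i = i + 1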

    opened by yasin06 5
  • Caffe Models and Prototxt

    Are you planning to share your Caffe model and the prototxt for training? Also, can you explain how you feed the ground truth to the network for training?

    opened by cagdasbak 4
  • Ready to make github repo public ?

    Dear @junting @kevinmcguinness @agilmor

    I have been working this morning mostly on the README.md of our recently accepted paper for CVPR 2016. I would like to make this repo public as soon as possible, so I would need you to tell me if you think that everything contained at the moment is enough to reproduce the experiments.

    There are still some formatting issues I can improve, but I think that the basics are there. Also, please check the source code and confirm that everything is in there.

    Please answer this message with your OK as I will wait for it before launching the repo as well as a github project site based on the README.md you can find here.

    What do you think ?

    opened by xavigiro 3
  • Question on two slices in the shallow net

    As I am new to deep learning, I have a basic question about the structure of your shallow net. There are two slices (slice1 and slice2) between the FC layer and the Maxout layer. I'm not clear about what they are. Are they the neurons in the hidden layer of the Maxout layer? I would really appreciate it if you could answer that.

    opened by NUAAXQ 2
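
For readers with the same question, here is a minimal sketch of how a maxout unit with two pieces is commonly computed: the fully connected output is split into two slices of equal size and the element-wise maximum is taken. This is a generic illustration, not the authors' confirmed implementation. The function name `maxout_two_slices` and the choice of pairing the first half with the second half (rather than adjacent units) are assumptions.

```python
def maxout_two_slices(fc_output):
    """Element-wise max over two equal slices of a fully connected output."""
    if len(fc_output) % 2 != 0:
        raise ValueError("the FC output must split into two equal slices")
    half = len(fc_output) // 2
    slice1 = fc_output[:half]   # first candidate activation per output unit
    slice2 = fc_output[half:]   # second candidate activation per output unit
    # Each maxout output keeps the larger of its two candidates.
    return [max(a, b) for a, b in zip(slice1, slice2)]


activations = [0.3, -1.2, 2.0, 0.7, 0.1, -0.5]
pooled = maxout_two_slices(activations)
```

Under this reading, the slices are not extra trainable neurons. They are two views of the same FC output, and the maxout layer halves the number of units.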
  • A Problem with the Deep Model

    Hi, I'm now using the deep model with Caffe. However, when I try to load the net, an error always occurs: Check failed: target_blobs[j]->num() == source_layer.blobs(j).num() (96 vs. 0). I suspect that the deploy file and the model do not correspond. Do they? The deploy file is the one in this GitHub project, and my code is here:

    import caffe
    import glob
    import os
    import numpy as np
    from scipy import misc
    
    inputFolder = r'../../image'
    outputFolder = r'./results'
    modelFile = 'deep_net_model.caffemodel'
    deployFile = 'deep_net_deploy.prototxt'
    
    if not os.path.isdir(outputFolder):
        os.mkdir(outputFolder)
    
    net = caffe.Net(deployFile, modelFile, caffe.TEST)
    

    It stops at caffe.Net().

    opened by kfxw 2
  • Fix to_rgb function

    I found that the demo code does not work well as is, especially for the deep model, so I fixed some minor bugs, mainly in the to_rgb function.

    Anyway, the work is amazing and I was impressed! 👍

    opened by kcy1019 1
  • Regarding JuntingNet_iSUN.pickle input file

    Hi, we are trying to run your code, but we couldn't find the "JuntingNet_iSUN.pickle" input file that is used in demo.py in the shallow folder. Could you please provide a link to the required file? Thank you.

    opened by abhid95 1
  • Re-using the Deep-net Model

    Hey,

    I was trying to re-use your deep-net Caffe model in my own Torch network to compare saliency maps from two separate inputs using a KL divergence loss. (Note that I do not fine-tune your model while training my network; it is simply used for outputting saliency maps and for back-propagating the loss through the gradients with respect to its input.)

    I am curious to know whether it is simply a plug-and-play model (apart from taking care of the input characteristics of a Caffe model, such as the BGR format and input values ranging between 0 and 255), or whether other pre- and post-processing steps are involved in using the model.

    I ask because I see that in the init.py file you apply various pre-processing and post-processing steps to the input image and to the saliency maps (as mentioned in your paper), but I am not clear whether they should be applied at test time as well.

    Thanks in advance for your response to this thread.

    opened by PraveerSINGH 1
  • Regarding to AUC metrics in the Table 5 of paper

    Hi,

    I am reading your paper. In Table 5 you report both AUC_Judd and AUC_Borji.

    However, when we submit results to the official SALICON website, it only returns the AUC_Borji value for the SALICON test set.

    Could you please tell me how you obtained the AUC_Judd value?

    Best.

    opened by gqding 0
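
For context on the metric named in the question above, the sketch below follows the usual definition of AUC-Judd: every saliency value at a fixated pixel is used as a threshold, and the area under the resulting ROC curve is integrated with the trapezoid rule. It does not answer how the authors obtained their numbers; the function name `auc_judd`, the flat-list inputs and the lack of tie-breaking jitter are assumptions made for this example.

```python
def auc_judd(saliency, fixations):
    """AUC-Judd for a flat saliency map and a flat 0/1 fixation map."""
    if len(saliency) != len(fixations):
        raise ValueError("maps must have the same number of pixels")
    n_pixels = len(saliency)
    # Saliency values at fixated pixels, used as thresholds (high to low).
    thresholds = sorted(
        (s for s, f in zip(saliency, fixations) if f), reverse=True
    )
    n_fix = len(thresholds)
    if n_fix == 0 or n_fix == n_pixels:
        raise ValueError("need both fixated and non-fixated pixels")

    tp, fp = [0.0], [0.0]
    for i, thresh in enumerate(thresholds, start=1):
        above = sum(1 for s in saliency if s >= thresh)
        tp.append(i / n_fix)                          # fixations kept
        fp.append((above - i) / (n_pixels - n_fix))   # other pixels kept
    tp.append(1.0)
    fp.append(1.0)

    # Trapezoid rule over the (fp, tp) curve.
    area = 0.0
    for k in range(1, len(tp)):
        area += (fp[k] - fp[k - 1]) * (tp[k] + tp[k - 1]) / 2.0
    return area


fix = [1, 1, 0, 0]
perfect = auc_judd([0.9, 0.8, 0.2, 0.1], fix)
inverted = auc_judd([0.1, 0.2, 0.8, 0.9], fix)
```

Reference implementations typically add a small random jitter to the saliency map so that tied values do not distort the curve; that step is omitted here for brevity.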
  • how to train the deep net

    As a beginner in Caffe and deep learning, I am exploring how to train a deep net. Could you share the training code of the deep CNN for saliency prediction?

    opened by inkfish2016 0
  • DeepNet in Pytorch

    Hi, I am trying to implement the DeepNet architecture in PyTorch. The code seems to work fine, but the results are not as expected. I have followed the prototxt files provided in issues 3 and 9. You can find my implementation at https://github.com/Goutam-Kelam/Visual-Saliency/tree/master/Deep_Net. It would be helpful if you could tell me where my mistake lies. Thank you in advance.

    opened by Goutam-Kelam 7
  • shallow net issue

    Dear Pan,

    I am Sen, a PhD student at the University of Exeter. I have recently been reviewing the literature on saliency prediction and found your work, so I want to reproduce it using the code you provided on GitHub. However, when I train with your code, the training loss stays at zero and the validation loss is also very small. It seems that I am training an already pretrained model. Is there any file missing on GitHub for training your shallow network? Another question concerns the input dimensions: what are the dimensions of X and y in train.py in the shallow folder? Have you applied any preprocessing to the input images and the ground truth? Thank you very much!

    opened by SenHe 0
  • customize new loss function

    Hey, I recently read your paper on image saliency prediction published at CVPR 2016, and I have a question: how can I define a custom loss function for a CNN in Lasagne and Theano? Many thanks in advance for your response.

    opened by kawhi96 0
  • Something about the experiment results of the shallow convolutional networks for saliency prediction

    Dear friends: I have a question about the experimental results of the shallow convolutional network for saliency prediction. I used the shallow network code provided at https://github.com/imatge-upc/saliency-2016-cvpr with the default parameters, but I cannot reproduce the results reported in the paper (Shallow and Deep Convolutional Networks for Saliency Prediction). Almost all the results generated by the shallow network only have a hot spot in the center of the image.

    Some results are available in the attached file. Can you give me some advice about this experiment?
    

    partof the result.zip

    opened by WLTtuantuan 0
Owner
Image Processing Group - BarcelonaTECH - UPC