Shallow and Deep Convolutional Networks for Saliency Prediction

Image Processing Group - BarcelonaTECH - UPC

Last update: Jan 5, 2023

Related tags

Machine Learning saliency-2016-cvpr

Overview


	Paper accepted at 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)


Junting Pan (*)	Kevin McGuinness (*)	Elisa Sayrol	Noel O'Connor	Xavier Giro-i-Nieto

(*) Equal contribution

A joint collaboration between:


Insight Centre for Data Analytics	Dublin City University (DCU)	Universitat Politecnica de Catalunya (UPC)	UPC ETSETB TelecomBCN	UPC Image Processing Group

Abstract

The prediction of salient areas in images has been traditionally addressed with hand-crafted features based on neuroscience principles. This paper, however, addresses the problem with a completely data-driven approach by training a convolutional neural network (convnet). The learning process is formulated as a minimization of a loss function that measures the Euclidean distance between the predicted saliency map and the provided ground truth. The recent publication of large datasets for saliency prediction has provided enough data to train end-to-end architectures that are both fast and accurate. Two designs are proposed: a shallow convnet trained from scratch, and another, deeper solution whose first three layers are adapted from another network trained for classification. To the authors' knowledge, these are the first end-to-end CNNs trained and tested for the purpose of saliency prediction.
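
The loss described in the abstract can be illustrated with a minimal sketch. The function name `euclidean_loss`, the nested-list representation of the maps, and the factor of one half (a common convention for Euclidean losses) are assumptions made for illustration; this is not the authors' training code.

```python
def euclidean_loss(predicted, ground_truth):
    """Half the sum of squared per-pixel differences between two maps."""
    if len(predicted) != len(ground_truth):
        raise ValueError("maps must have the same number of rows")
    total = 0.0
    for pred_row, gt_row in zip(predicted, ground_truth):
        if len(pred_row) != len(gt_row):
            raise ValueError("rows must have the same length")
        for p, g in zip(pred_row, gt_row):
            total += (p - g) ** 2
    # The 0.5 factor is an assumed convention, not taken from the paper.
    return 0.5 * total


# Toy 2x2 maps with values in [0, 1] (hypothetical data).
predicted = [[0.0, 0.5], [1.0, 0.25]]
ground_truth = [[0.0, 1.0], [1.0, 0.75]]
print(euclidean_loss(predicted, ground_truth))  # 0.25
```

Minimizing this quantity over the training set pushes every predicted pixel towards its ground-truth value, which is what makes the approach fully data-driven.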

Publication

Our paper is published as open access thanks to the Computer Vision Foundation. An arXiv pre-print is also available.

Please cite with the following BibTeX code:

@InProceedings{Pan_2016_CVPR,
author = {Pan, Junting and Sayrol, Elisa and Giro-i-Nieto, Xavier and McGuinness, Kevin and O'Connor, Noel E.},
title = {Shallow and Deep Convolutional Networks for Saliency Prediction},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2016}
}

You may also want to refer to our publication with the more human-friendly Chicago style:

Junting Pan, Kevin McGuinness, Elisa Sayrol, Noel E. O'Connor, and Xavier Giro-i-Nieto. "Shallow and Deep Convolutional Networks for Saliency Prediction." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016.

Models

The two convnets presented in our work can be downloaded from the links provided below each respective figure:

Shallow ConvNet (aka JuntingNet)	Deep ConvNet (aka SalNet)

[Lasagne Model (2.5 GB)]	[Caffe Model (99 MB)] [Caffe Prototxt]

Our previous winning shallow models for the LSUN Saliency Prediction Challenge 2015 are described in this preprint and available from this other site. That work was also part of Junting Pan's bachelor thesis at the UPC TelecomBCN school in June 2015, whose report, slides and video are available here.

Visual Results

Datasets

Training

As explained in our paper, our networks were trained on the training and validation data provided by SALICON.

Test

Three different datasets were used for testing:

Test partition of SALICON dataset.
Test partition of iSUN dataset.
MIT300.

A collection of links to the SALICON and iSUN datasets is available from the LSUN Challenge site.

Software frameworks

Our paper presents two different convolutional neural networks trained with different frameworks. For this reason, different instructions and source code folders are provided.

Shallow Network on Lasagne

The shallow network is implemented in Lasagne, which in turn is built on top of Theano. To install the required version of Lasagne and all the remaining dependencies, run the following pip command:

pip install -r https://github.com/imatge-upc/saliency-2016-cvpr/blob/master/shallow/requirements.txt

This requirements file was provided by Daniel Nouri.

Deep Network on Caffe

The deep network was developed over Caffe by Berkeley Vision and Learning Center (BVLC). You will need to follow these instructions to install Caffe.

Posterior work

If you are interested in this work, you may also want to check our later work, SalGAN, which offers better performance.

Acknowledgements

We would like to especially thank Albert Gil Moreno and Josep Pujal from our technical support team at the Image Processing Group at the UPC.


Albert Gil	Josep Pujal


We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GeForce GTX Titan Z and Titan X used in this work.
The Image Processing Group at the UPC is an SGR14 Consolidated Research Group recognized and sponsored by the Catalan Government (Generalitat de Catalunya) through its AGAUR office.
This work has been developed in the framework of the project BigGraph TEC2013-43935-R, funded by the Spanish Ministerio de Economía y Competitividad and the European Regional Development Fund (ERDF).
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under grant number SFI/12/RC/2289.

Contact

If you have any general questions about our work or code that may be of interest to other researchers, please use the public issues section of this GitHub repo. Alternatively, drop us an e-mail at xavier.giro@upc.edu.

Comments

Problems about training and image and saliency map normalization

Following the parameter settings in the paper and the training prototxt provided by kevinmcguinness in issue #3, I trained the deep net based on VGG_CNN_M, but the net does not converge. My training prototxt is:

net: "train_deep_sal.prototxt"
test_iter: 500
test_interval: 1000
display: 10
average_loss: 20
lr_policy: "step"
gamma: 0.5
stepsize: 100
base_lr: 0.00000013
momentum: 0.9
iter_size: 1
max_iter: 24000
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "train_deep_SalNet"
solver_mode: GPU
test_initialization: false

Is it right?

In the training prototxt, the saliency mean is 31 and the scale is 2/255. Can this rescale the saliency map to [-1, 1]? It seems that ([0, 255] - 31) * (2/255) does not match [-1, 1]. Will this mark more regions as salient?

In the post-processing stage, 127 is added to the net output. Why not 31?
opened by inkfish2016 9
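
The arithmetic behind the normalization question above can be checked directly. This sketch assumes the transformation is applied as (pixel - mean) * scale, which is how Caffe data transformations with a mean value and a scale are usually applied; the helper name `normalized_range` is hypothetical.

```python
def normalized_range(mean, scale, low=0.0, high=255.0):
    """Return the (min, max) reached by (pixel - mean) * scale."""
    return ((low - mean) * scale, (high - mean) * scale)


# Values quoted in the issue: mean 31, scale 2/255.
range_min, range_max = normalized_range(31.0, 2.0 / 255.0)
print(round(range_min, 3), round(range_max, 3))  # -0.243 1.757

# For comparison, a mean of 127.5 would centre the range on zero.
centred_min, centred_max = normalized_range(127.5, 2.0 / 255.0)
print(round(centred_min, 3), round(centred_max, 3))  # -1.0 1.0
```

Under this assumption, the quoted settings map [0, 255] to roughly [-0.243, 1.757] rather than [-1, 1], which is consistent with the observation in the question. Whether that offset is intentional is not stated in this thread.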
help with the demo code

Hi, I am not very familiar with Python, so I might be doing something wrong, but I am receiving the same output for every image with the (modified) demo code. The file names and indexes are fine, but the output gives the same result for all the images.

def saliencyPredictor(xt, net2, url, outUrl):
    y_pred_test = net2.predict(xt)
    i = 0
    for file in glob.glob(url + "*.png"):
        img = misc.imread(file)
        tmp = y_pred_test[i].reshape(48, 48)
        blured = ndimage.gaussian_filter(tmp, sigma=3)
        y = misc.imresize(blured, (img.shape[0], img.shape[1])) / 255.
        # misc.imsave(outUrl + os.path.basename(file), y)
        base = os.path.basename(file)
        scipy.io.savemat(outUrl + os.path.splitext(base)[0] + ".mat", {'y': y})
        i = i + 1

opened by yasin06 5
Caffe Models and Prototxt

Are you planning to share your Caffe model and also the prototxt for training? Also, can you explain how you feed the ground truths to the network for training?

opened by cagdasbak 4
Ready to make github repo public ?

Dear @junting @kevinmcguinness @agilmor

I have been working this morning mostly on the README.md of our recently accepted paper for CVPR 2016. I would like to make this repo public as soon as possible, so I would need you to tell me if you think that everything contained at the moment is enough to reproduce the experiments.

There are still some formatting issues I can improve, but I think that the basics are there. Also, please check the source code and confirm that everything is in there.

Please answer this message with your OK as I will wait for it before launching the repo as well as a github project site based on the README.md you can find here.

What do you think?

opened by xavigiro 3
Question on two slices in the shallow net

As I am a newcomer to deep learning, I have a basic question about the structure of your shallow net. There are two slices (slice1 and slice2) between the FC layer and the Maxout layer. I'm not clear about what they are. Are they the neurons in the hidden layer of the Maxout layer? I would really appreciate it if you could answer that.

opened by NUAAXQ 2
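
One common way to build a maxout unit, sketched below, is to split the output of a fully connected layer into two equal slices and keep their element-wise maximum. Whether `slice1` and `slice2` in the shallow net are wired exactly this way is an assumption here, not something confirmed in this thread; the function name `maxout_two_slices` is hypothetical.

```python
def maxout_two_slices(fc_output):
    """Split a vector into two halves and keep the element-wise maximum."""
    if len(fc_output) % 2 != 0:
        raise ValueError("expected an even number of activations")
    half = len(fc_output) // 2
    slice1 = fc_output[:half]   # first half of the FC activations
    slice2 = fc_output[half:]   # second half of the FC activations
    return [max(a, b) for a, b in zip(slice1, slice2)]


# Six hypothetical FC activations become three maxout outputs.
fc_output = [0.2, -1.0, 3.0, 0.5, 0.7, -2.0]
print(maxout_two_slices(fc_output))  # [0.5, 0.7, 3.0]
```

In this reading the slices are not extra neurons: they are two views of the same fully connected output, and the maxout layer halves its size by choosing the larger of each pair.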
A Problem with the Deep Model

Hi, I'm now using the deep model with Caffe. However, when I try to load the net, an error always occurs: Check failed: target_blobs[j]->num() == source_layer.blobs(j).num() (96 vs. 0). I suspect that the deploy file and the model do not correspond. Is that the case? The deploy file is the one in this GitHub project and my code is here:

import caffe
import glob
import os
import numpy as np
from scipy import misc

inputFolder = r'../../image'
outputFolder = r'./results'
modelFile = 'deep_net_model.caffemodel'
deployFile = 'deep_net_deploy.prototxt'

# Create the output folder if it does not exist yet
if not os.path.isdir(outputFolder):
    os.mkdir(outputFolder)

net = caffe.Net(deployFile, modelFile, caffe.TEST)

It stops at caffe.Net().
opened by kfxw 2
Fix to_rgb function

I found that the demo code does not work well as is, especially for the deep model, so I fixed some minor bugs, in particular in the to_rgb function.

Anyway, the work is amazing and I was impressed! 👍

opened by kcy1019 1
Regarding JuntingNet_iSUN.pickle input file

Hi, we are trying to run your code, but we couldn't find the "JuntingNet_iSUN.pickle" input file, which is used in demo.py of the shallow folder. Can you please help us by providing a link to the required file? Thank you.

opened by abhid95 1
Re-using the Deep-net Model

Hey,

I was trying to re-use your deep-net Caffe model in my own Torch network to compare saliency maps from two separate inputs using a KL divergence loss. (Note that I do not fine-tune your model while training my network; it is simply used for outputting saliency maps and backpropagating the loss through the gradients with respect to its input.)

I am curious to know whether it is simply a plug-and-play model (apart from taking care of the input characteristics of a Caffe model, like BGR format and input values ranging between 0 and 255), or whether there are other pre- and post-processing steps involved in using the model.

I ask because I see that in the init.py file you apply various pre-processing and post-processing steps to the input image and to the saliency maps (as mentioned in your paper), but I am not clear whether they are to be applied for test purposes as well.

Thanks in advance for your response to this thread.

opened by PraveerSINGH 1
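
For readers comparing two saliency maps as described in the issue above, a KL divergence can be sketched as follows. Flattening each map into a list, normalizing each map so that it sums to one, and the small `eps` constant are all assumptions made for illustration; this is not a loss taken from the paper or the repository.

```python
import math


def kl_divergence(map_p, map_q, eps=1e-12):
    """KL(P || Q) after normalizing both flattened maps to sum to one."""
    if len(map_p) != len(map_q):
        raise ValueError("maps must have the same number of pixels")
    sum_p = sum(map_p)
    sum_q = sum(map_q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("maps must have a positive sum")
    total = 0.0
    for p, q in zip(map_p, map_q):
        p_norm = p / sum_p
        q_norm = q / sum_q
        if p_norm > 0:
            # eps guards against division by zero where Q has no mass
            total += p_norm * math.log(p_norm / (q_norm + eps))
    return total


# Two hypothetical flattened saliency maps with values in [0, 255].
map_a = [0.0, 128.0, 255.0, 64.0]
map_b = [255.0, 64.0, 0.0, 128.0]
print(kl_divergence(map_a, map_a))  # very close to zero
print(kl_divergence(map_a, map_b))  # positive
```

Because both maps are normalized first, the result does not depend on whether the inputs are stored in [0, 255] or [0, 1], which sidesteps part of the pre- and post-processing question for this particular comparison.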
Regarding the AUC metrics in Table 5 of the paper

Hi,

I am reading your paper. In Table 5 of your paper, you report both AUC_Judd and AUC_Borji.

However, when we submit results to the official SALICON website, it only gives the AUC_Borji value for the SALICON test set.

Could you please tell me how you obtained the AUC_Judd value?

Best.

opened by gqding 0
how to train the deep net

As a beginner in Caffe and deep learning, I am exploring how to train a deep net. Could you share the training code of the deep CNN for saliency prediction?

opened by inkfish2016 0
DeepNet in Pytorch

Hi, I am trying to implement the DeepNet architecture in PyTorch. The code seems to work fine, but the results are not as expected. I have followed the prototxt files provided in issues 3 and 9. You can find my implementation at https://github.com/Goutam-Kelam/Visual-Saliency/tree/master/Deep_Net. It would be helpful if you could tell me where my mistake lies. Thank you in advance.

opened by Goutam-Kelam 7
shallow net issue

Dear Pan,

I am Sen, a PhD student from the University of Exeter. I have recently been doing a literature review on saliency prediction and found your work, so I want to reproduce it using the code you provided on GitHub. However, when I train using your code, the training loss stays at zero and the validation loss is also very small. It seems that I am training an already pretrained model. Is there any file missing on GitHub for training your shallow network? Another question concerns the input dimensions: what are the dimensions of X and y in train.py of the shallow folder? Have you done any preprocessing on the input images and the ground truth? Thank you very much!

opened by SenHe 0
customize new loss function

Hey, I recently read your article on image saliency prediction published at CVPR 2016. I have a question: how can I customize a new loss function for a CNN in Lasagne and Theano specifically? Many thanks in advance for your response.

opened by kawhi96 0
Something about the experiment results of the shallow convolutional networks for saliency prediction

Dear friends: I have a question about the experimental results of the shallow convolutional network for saliency prediction. I used the code of the shallow network provided at https://github.com/imatge-upc/saliency-2016-cvpr with the default parameters, but I cannot reproduce the results reported in the paper (Shallow and Deep Convolutional Networks for Saliency Prediction). Almost all the results generated by the shallow network only have a hot spot in the center of the image.

Some results are available in the attached file. Can you give me some advice about this experiment?

partof the result.zip
opened by WLTtuantuan 0

Owner

Image Processing Group - BarcelonaTECH - UPC

GitHub http://imatge-upc.github.io/saliency-2016-cvpr/
