Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)

Overview

Deeper Depth Prediction with Fully Convolutional Residual Networks

By Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, Nassir Navab.

Contents

  1. Introduction
  2. Quick Guide
  3. Models
  4. Results
  5. Citation
  6. License

Introduction

This repository contains the CNN models trained for depth prediction from a single RGB image, as described in the paper "Deeper Depth Prediction with Fully Convolutional Residual Networks". The provided models are those used to obtain the results reported in the paper on the benchmark datasets NYU Depth v2 and Make3D, for indoor and outdoor scenes respectively. Moreover, the provided code can be used for inference on arbitrary images.

Quick Guide

The trained models are currently provided in two frameworks, MatConvNet and TensorFlow. Please read below for more information on how to get started.

TensorFlow

The code provided in the tensorflow folder requires a working installation of the TensorFlow library (any platform). The model's graph is constructed in fcrn.py, and the corresponding weights can be downloaded using the link below. The implementation is based on ethereon's Caffe-to-TensorFlow conversion tool. predict.py provides sample code for using the network to predict the depth map of an input image; run python predict.py NYU_FCRN.ckpt yourimage.jpg to try it.
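
For readers who want to script this directly, the sketch below shows roughly what predict.py does. It is a rough TensorFlow 1.x sketch, not the repository's exact code: the ResNet50UpProj class and its constructor arguments are taken from models.py as quoted in the discussions below, and the 304x228 input size is likewise an assumption based on those discussions.

    # Rough inference sketch (TensorFlow 1.x); predict.py does essentially this.
    import numpy as np
    import tensorflow as tf
    from PIL import Image
    import models  # provided in the tensorflow folder of this repository

    height, width, channels, batch_size = 228, 304, 3, 1

    # Load the input image and resize it to the network's expected resolution.
    img = Image.open('yourimage.jpg').convert('RGB').resize((width, height))
    img = np.expand_dims(np.asarray(img, dtype=np.float32), axis=0)

    input_node = tf.placeholder(tf.float32, shape=(None, height, width, channels))
    net = models.ResNet50UpProj({'data': input_node}, batch_size, trainable=False)

    with tf.Session() as sess:
        tf.train.Saver().restore(sess, 'NYU_FCRN.ckpt')  # load the trained weights
        depth = sess.run(net.get_output(), feed_dict={input_node: img})
        print(depth.shape)  # predicted depth map, at a lower resolution than the input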

MatConvNet

Prerequisites

The code provided in the matlab folder requires the MatConvNet toolbox for CNNs. A version of the library equal to or newer than 1.0-beta20 must be successfully compiled, either with or without GPU support. Furthermore, modify matconvnet_path = '../matconvnet-1.0-beta20' within evaluateNYU.m and evaluateMake3D.m so that it points to the directory where the library is stored.

How-to

To obtain the predicted depth maps and evaluate on the NYU or Make3D test sets, simply run evaluateNYU.m or evaluateMake3D.m respectively. Please note that all required data and models will then be downloaded automatically (if they do not already exist), and no further intervention is needed beyond setting the options opts and netOpts as preferred. Make sure that you have enough free disk space (up to 5 GB). The predictions will eventually be saved in a .mat file in the specified directory.

Alternatively, one can run DepthMapPrediction.m to manually use a trained model in test mode and predict the depth maps of arbitrary images.

Models

The models are fully convolutional and apply the residual learning idea also to the upsampling layers. Here we provide the fastest variant, in which interleaving of feature maps is used for upsampling; for this reason, a custom layer +dagnn/Combine.m is provided.
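
To illustrate the interleaving idea, the sketch below combines four convolution outputs into a single 2x-upsampled feature map. This is an illustrative NumPy sketch under simplified (unbatched) shapes, not the actual layer code from +dagnn/Combine.m:

    import numpy as np

    def interleave(a, b, c, d):
        # Interleave four (H, W, C) maps into one (2H, 2W, C) map: the values
        # of a, b, c, d land on the (even, even), (even, odd), (odd, even) and
        # (odd, odd) positions of the upsampled output, respectively.
        h, w, ch = a.shape
        out = np.empty((2 * h, 2 * w, ch), dtype=a.dtype)
        out[0::2, 0::2] = a
        out[0::2, 1::2] = b
        out[1::2, 0::2] = c
        out[1::2, 1::2] = d
        return out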

The trained models (namely ResNet-UpProj in the paper) can also be downloaded here:

Results

NEW! The predictions for the validation set of the NYU Depth v2 dataset can also be downloaded here (.mat).
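
To inspect the downloaded predictions in Python, something like the following should work (the file name and the variable names stored inside the .mat file are assumptions; list the keys first):

    import scipy.io

    data = scipy.io.loadmat('NYU_val_predictions.mat')  # hypothetical file name
    print([k for k in data.keys() if not k.startswith('__')])  # stored variables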

In the following tables, we report the results that should be obtained after evaluation, and we compare against other recent methods for depth prediction from a single image. A sketch of the metric definitions follows the tables.

  • Error metrics on NYU Depth v2:

    State of the art on NYU        rel     rms     log10
    Roy & Todorovic (CVPR 2016)    0.187   0.744   0.078
    Eigen & Fergus (ICCV 2015)     0.158   0.641   -
    Ours                           0.127   0.573   0.055
  • Error metrics on Make3D:

    State of the art on Make3D     rel     rms     log10
    Liu et al. (CVPR 2015)         0.314   8.60    0.119
    Li et al. (CVPR 2015)          0.278   7.19    0.092
    Ours                           0.175   4.45    0.072
  • Qualitative results: example predictions are shown in the figure in the repository.
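
The error metrics in the tables above are the standard ones for single-image depth evaluation. A minimal NumPy sketch of their definitions, where pred and gt are hypothetical arrays of predicted and ground-truth depths with invalid pixels already masked out:

    import numpy as np

    def error_metrics(pred, gt):
        rel = np.mean(np.abs(pred - gt) / gt)     # mean absolute relative error
        rms = np.sqrt(np.mean((pred - gt) ** 2))  # root mean squared error
        log10 = np.mean(np.abs(np.log10(pred) - np.log10(gt)))
        return rel, rms, log10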

Citation

If you use this method in your research, please cite:

@inproceedings{laina2016deeper,
        title={Deeper depth prediction with fully convolutional residual networks},
        author={Laina, Iro and Rupprecht, Christian and Belagiannis, Vasileios and Tombari, Federico and Navab, Nassir},
        booktitle={3D Vision (3DV), 2016 Fourth International Conference on},
        pages={239--248},
        year={2016},
        organization={IEEE}
}

License

Simplified BSD License

Copyright (c) 2016, Iro Laina
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments
  • train image

    I am trying to reproduce this paper in Caffe, using the NYU v2 raw depth dataset (about 12K~13K sampled depth images). For your training, did you use filtered depth images (cross bilateral filter or colorization) or the raw depth images? Colorization is a good filtering method, but it is too slow.

    opened by seokhoonboo 23
  • Training details

    Hello!

    I am trying to recreate your results on the NYU depth dataset with PyTorch. I am fairly confident that my network structure, loss function, and data augmentation process are correct, but I am unable to reach a depth image quality similar to your TensorFlow outputs (see the attached images).

    My guess is that the difference is in the training process. I tried to follow your article, but a few details are unclear. You wrote that you gradually reduce the learning rate when you observe plateaus. How do you define a plateau, and what does "gradually" mean in this case?

    To get the results below, I used the SGD optimizer with an initial learning rate of 0.01 and momentum of 0.9, and I halve the learning rate every 7 epochs.
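
    For concreteness, here is the schedule described above next to one common way of operationalizing "reduce on plateau" (a PyTorch sketch; the model is a stand-in, and normally you would pick only one scheduler):

        import torch

        model = torch.nn.Linear(10, 1)  # stand-in for the actual depth network
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

        # Fixed schedule used above: halve the learning rate every 7 epochs.
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.5)

        # Plateau-based alternative: halve the LR once the validation loss has
        # not improved for a few epochs (one way to define a "plateau").
        # scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        #     optimizer, mode='min', factor=0.5, patience=3)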

    Attached images: a test image, the output using your TensorFlow network, and the output using my PyTorch network.

    awaiting response 
    opened by harsanyika 16
  • Output after training

    Hi,

    I am trying to replicate the output of your network. I have created my training code based on the details you provide in the paper (data augmentation, initial learning rate 0.01, pretrained ResNet-50 weights, ignoring the invalid pixels, the inverse Huber (berHu) loss, batch size 16, and 20 epochs). The results I got from the network look like this:

    (attached figure)

    As you can see, the network prediction and the real ground truth look very similar geometrically, but they seem to be at different scales, or the network has learned from very low-resolution images and upsampled them (the black values are 990-999 mm and the brightest ones have values of 199X).

    If anyone can shed some light on this, please do. Best regards.

    opened by Zuriich 14
  • The prediction is not good

    I restored the model that you provide and fine-tuned the layers except for ResNet-50. I use the berHu loss, and I use the AdamOptimizer to minimize it; the learning rate is 0.0001, and after 10000 steps it is 0.000001. The total number of steps is 20000. My loss curve and a prediction from my model are attached. As you can see, the loss value is small, but the prediction is not very good; it is blurry. Can you give me some advice? Thanks! @iro-cp
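
    For reference, the berHu (reverse Huber) loss mentioned above, as a PyTorch sketch; the form L(x) = |x| for |x| <= c and (x^2 + c^2) / (2c) otherwise, with the per-batch threshold c = 0.2 * max|error|, follows the paper, but treat the exact implementation as an assumption:

        import torch

        def berhu_loss(pred, target):
            diff = torch.abs(pred - target)
            c = 0.2 * diff.max()                        # per-batch threshold
            l2_branch = (diff ** 2 + c ** 2) / (2 * c)  # quadratic branch
            return torch.where(diff <= c, diff, l2_branch).mean()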

    opened by Ariel-JUAN 13
  • Fine-tune result not smooth

    Hi, @iro-cp , I want to fine-tune from your nyu checkpoint model for my own dataset.

    This is my fine-tuning loss (berHu, as in your paper); the fine-tuning code is similar to tensorflow-deeplab-resnet. Fine-tuned layers: layer16x_ and ConvPred. Learning rate: 0.001. Batch size: 8. Optimizer: AdamOptimizer.

    TensorBoard shows that the predicted depth is not smooth in planar regions, such as the wall marked with a red circle in the attached images.

    Can you provide some advice? Thank you very much for sharing this great work!

    opened by JackHenry1992 12
  • Question about testing an image

    Hello, I have a small problem when testing an image with predict.py. When I run the command "python predict.py NYU_ResNet-UpProj.npy (my own image)", I get an error like this (screenshot attached):

    How can I fix this problem? Thank you!

    opened by Kn15263ss 8
  • Unpooling indices stored as tf.Variables

    In network.py:

  def prepare_indices(self, before, row, col, after, dims):
      # x0..x3 are index arrays produced by np.meshgrid earlier in the method
      x_0 = tf.Variable(x0.reshape([-1]), name='x_0')
      x_1 = tf.Variable(x1.reshape([-1]), name='x_1')
      x_2 = tf.Variable(x2.reshape([-1]), name='x_2')
      x_3 = tf.Variable(x3.reshape([-1]), name='x_3')
    

    This is the wrong use of tf.Variable (variable nodes are used as a source for values that the net expects to change, typically weights). In this case, the code is just reshaping the indices from np.meshgrid, so these values aren't weights or anything like that. There could be a specific reason these are made tf.Variables that I'm unaware of, but it seems these lines should be:

      x_0 = x0.reshape([-1])
    

    Why this matters: tf.Variable nodes typically store "trainable" values, which must be stored in checkpoints and loaded weight files. Since these are four 4D-flattened-to-1D arrays, and there is a set of them for each up-conversion, this is a lot of data being stored to disk, which must also be loaded from disk (and saved when creating checkpoints). These are basically indices, so no change (learning) is expected; the saving and loading, I propose, is needless.

    Case in point, these variables seem to be the prime contributor to the long load time of the weights file. In predict.py:

      net.load(model_data_path, sess)
    

    takes several minutes on my computer in the current state. Changing prepare_indices() as indicated above reduces the load time by orders of magnitude; however, making this change MIGHT make the new model incompatible with the current weights file, NYU_ResNet-UpProj.npy (I am having trouble making the net work with this change, so more investigation is needed on my end, but I figured I would raise this issue in case others are available to work on resolving it).

    Since this is a non-functional change, I propose the authors try the following:

    1. Remove the tf.Variable nodes as shown above (making them simple operations)
    2. Retrain using identical meta-parameters as in the paper (if the starting weight values are still available)
    3. Compare results pre- and post-change to ensure they generate the same output.

    If the starting weights aren't available, I suppose a full retraining would just need to generate acceptable results.

    opened by rodamn 6
  • Matlab version

    Hi, I found that the websave function does not exist in my MATLAB R2014a, so an error results when websave is used in the .m files. What can I do to solve this problem?

    opened by CatherineYao 6
  • Can't converge when training using TensorFlow

    I am trying to train this model using your TensorFlow code, but it can't converge. I am using 'nyu_depth_v2_labeled.mat'. I adopted the L2 loss for convenience. The raw depths are used, and invalid pixels (where depth is zero) have been excluded from training. I have tried both fixing the ResNet-50 weights and not fixing them.

    The code is as below:

    # Network (the image sizes and training arrays are assumed defined elsewhere)
    import tensorflow as tf
    import models  # the repository's model definitions

    graph = tf.Graph()
    with graph.as_default():
        # Create a placeholder for the input image
        tf_train_dataset = tf.placeholder(tf.float32, shape=(None, img_height, img_width, num_channels))
        tf_train_labels = tf.placeholder(tf.float32, shape=(None, depth_height, depth_width))
    
        # Model.
        # net = models.ResNet50UpProj({'data': tf_train_dataset}, batch_size, trainable=True)
        net_ResNet50 = models.ResNet50({'data': tf_train_dataset}, batch_size, trainable=False)
        layer1_BN = net_ResNet50.get_output()
        net = models.UpProj({'layer1_BN': layer1_BN}, batch_size, trainable=True)
    
        # Training computation.
        output = tf.squeeze(net.get_output(), squeeze_dims=[3])
        # Mask out invalid (zero-depth) pixels; the elementwise comparison must
        # be done with tf.not_equal and cast to float before multiplying.
        valid = tf.cast(tf.not_equal(tf_train_labels, 0), tf.float32)
        loss = tf.reduce_mean(tf.nn.l2_loss((output - tf_train_labels) * valid))
    
        # Optimizer.
        global_step = tf.Variable(0, trainable=False)
        starter_learning_rate = 10**-3
        learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                                   200, 0.8, staircase=True)
        # optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
        momentum = 0.9
        optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss, global_step=global_step)
    
        # Add a scalar summary for the snapshot loss.
        tf.summary.scalar('loss', loss)
    
        # Build the summary Tensor based on the TF collection of Summaries.
        summary = tf.summary.merge_all()
    
    # Train
    with tf.Session(graph=graph) as session:
      tf.global_variables_initializer().run()
      saver = tf.train.Saver()
    
      # Instantiate a SummaryWriter to output summaries and the Graph.
      summary_writer = tf.summary.FileWriter(log_dir, session.graph)
    
      # Load the converted parameters
      print('Loading the model')
      net.load('NYU_ResNet-UpProj.txt',session)
      print("Initialized")
    
      for step in range(num_steps):
        # Pick an offset within the training data, which has been randomized.
        # Note: we could use better randomization across epochs.
        offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_dataset[offset:(offset + batch_size), :]
        batch_labels = train_labels[offset:(offset + batch_size), :]
    
        # Prepare a dictionary telling the session where to feed the minibatch.
        # The key of the dictionary is the placeholder node of the graph to be fed,
        # and the value is the numpy array to feed to it.
        feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
        _, l, predictions = session.run([optimizer, loss, output], feed_dict=feed_dict)
    
        if (step % 10 == 0):
          print("Minibatch loss at step %d: %f" % (step, l))
          print("Minibatch accuracy: %.1f" % accuracy(predictions, batch_labels))
          # Update the events file.
          summary_str = session.run(summary, feed_dict=feed_dict)
          summary_writer.add_summary(summary_str, step)
          summary_writer.flush()
    
      # print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
    
      saver.save(session, model_path)
    
      print('Done!!!')
    

    The loss curve is as follows (attached):

    opened by chenynCV 6
  • Error when running your TensorFlow code (predict.py)

    line 46, in setup
        .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1')
    ValueError: ('stride must be less than or equal to filter size', 'stride: [2x2] filter: [1x1]')

    opened by chenynCV 6
  • simpler/more efficient interleaving for up_conv

    Time comparison on a single image (inference):

        Elapsed time for run: 0:00:02.076628 (pack + reshape)
        Elapsed time for run: 0:00:02.555231 (dynamic stitch)

    disclaimers:

    • I ran on a CPU instead of a GPU, so it's probably worth testing that.
    • I included the timing code so you guys know what I did to time it (just wrapped the session.run)
    • I included a get_incoming_shape since I just felt like lists with integers are easier to deal with than tensorflow's Dimension stuff.
    • Also added code to display the input image. Thought it would be nice to compare.
    opened by jtatusko 5
  • Running predict.py on multiple images

    I get an error every time I try to run the prediction on several images through a Jupyter notebook.

    Do you know how I can fix this?

    I think it would be better to define a load_model function in predict.py that creates the model, loads the weights from the checkpoint, and returns the model, which could later be passed to the run function.

    I'm trying to implement that but the tf.Session() seems to make it a bit complicated.

    opened by Saoussenl 0
  • Matlab; invalid input syntax

    Hello, and thank you for your work!

    Testing your code with the NYU v2 trained TensorFlow models worked great for me! But I'd also like to test your model on outdoor scenes, so I installed MATLAB and linked it to MatConvNet, but I can't get it to run...

    I am trying to test the code with an arbitrary image, so my Command Window statement is as follows:

    DepthMapPrediction my_picture.jpg Make3D_ResNet-UpProj.mat

    Running this the error statement is as follows:

    Error using imresize>parsePreMethodArgs (line 379)
    Invalid input syntax; input image missing from argument list.

    Error in imresize>parseInputs (line 273)
    parsePreMethodArgs(varargin, method_arg_idx, first_param_string_idx);

    Error in imresize (line 152)
    params = parseInputs(args{:});

    Error in DepthMapPrediction (line 40)
    images = imresize(images, net.meta.normalization.imageSize(1:2));

    Does somebody have any advice on where to go from here?

    (Using Matlab R2020b, and matconvnet-1.0-beta25)

    opened by StillZeroo 2
  • What can the depth estimation picture do?

    I am confused: I did get the same depth estimation pictures as yours, but I don't know how I can use them. Can they give the true distance between an object and the camera? If so, can you show me the details?

    opened by Asherchi 4
  • How should the input size be filled

    I see the default input size is 304x228, but to export a TensorFlow SavedModel or a mobile model I need to specify the input size, and pictures are usually not 304x228. I scale them and fill with rgb(255,255,255), but this lowers the resulting accuracy, so how should the input image be adjusted?
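
    One common alternative to padding with a constant color is to scale the image so that it covers the network input and then center-crop. A hypothetical sketch, not from this repository (the 304x228 size comes from the question above):

        from PIL import Image

        def fit_to_input(img, target_w=304, target_h=228):
            # Scale so the image covers the target size, preserving aspect ratio.
            scale = max(target_w / img.width, target_h / img.height)
            resized = img.resize((round(img.width * scale),
                                  round(img.height * scale)))
            # Center-crop to the exact network input size.
            left = (resized.width - target_w) // 2
            top = (resized.height - target_h) // 2
            return resized.crop((left, top, left + target_w, top + target_h))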

    opened by candrwow 0
  • Not getting good result after training

    Actually, I have prepared my own dataset of indoor scenes in my environment and want to train the model on it. I am freezing all layers except the up-projection blocks, and the results are not good. I even trained on a dataset as small as 600 images and achieved 82 percent accuracy, but the results were not visually good; I do not know the reason, maybe you can suggest something. The images I want to train on are approximately 6k. Even the pretrained NYU weights perform better. My settings are:

        batch_size = 32
        learning_rate = 1.0e-3
        momentum = 0.9
        weight_decay = 0.0005
        num_epochs = 70
        optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                                    lr=learning_rate, momentum=momentum, weight_decay=weight_decay)

    and the learning rate is halved every 10 epochs.
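
    The "accuracy" here is presumably the thresholded delta accuracy commonly reported for depth prediction (an assumption on my part); a minimal NumPy sketch:

        import numpy as np

        def delta_accuracy(pred, gt, threshold=1.25):
            # Fraction of pixels whose predicted depth is within a factor
            # `threshold` of the ground truth.
            ratio = np.maximum(pred / gt, gt / pred)
            return np.mean(ratio < threshold)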

    Attached: a training image (MS_LAB_269_unfilled), the validation depth image, and the corresponding RGB image.

    opened by abdur4373 2