Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)

Iro Laina

Last update: Dec 22, 2022

By Iro Laina, Christian Rupprecht, Vasileios Belagiannis, Federico Tombari, Nassir Navab.

Introduction
Quick Guide
Models
Results
Citation
License

Introduction

This repository contains the CNN models trained for depth prediction from a single RGB image, as described in the paper "Deeper Depth Prediction with Fully Convolutional Residual Networks". The provided models are those that were used to obtain the results reported in the paper on the benchmark datasets NYU Depth v2 and Make3D for indoor and outdoor scenes respectively. Moreover, the provided code can be used for inference on arbitrary images.

Quick Guide

The trained models are currently provided in two frameworks, MatConvNet and TensorFlow. Please read below for more information on how to get started.

TensorFlow

The code in the tensorflow folder requires a working installation of the TensorFlow library (any platform). The model's graph is constructed in fcrn.py, and the corresponding weights can be downloaded using the link below. The implementation is based on ethereon's Caffe-to-TensorFlow conversion tool. predict.py provides sample code for using the network to predict the depth map of an input image. To try the code, run python predict.py NYU_FCRN.ckpt yourimage.jpg.

MatConvNet

Prerequisites

The code in the matlab folder requires the MatConvNet toolbox for CNNs. A version of the library equal to or newer than 1.0-beta20 must be compiled successfully, either with or without GPU support. Furthermore, modify matconvnet_path = '../matconvnet-1.0-beta20' within evaluateNYU.m and evaluateMake3D.m so that it points to the path where the library is stored.

How-to

To obtain the predicted depth maps and evaluate on the NYU or Make3D test sets, simply run evaluateNYU.m or evaluateMake3D.m respectively. All required data and models are downloaded automatically (if they do not already exist), and no further user intervention is needed except for setting the options opts and netOpts as preferred. Make sure that you have enough free disk space (up to 5 GB). The predictions are saved in a .mat file in the specified directory.

Alternatively, one could run DepthMapPrediction.m in order to manually use a trained model in test mode to predict the depth maps of arbitrary images.

Models

The models are fully convolutional and apply the residual learning idea to the upsampling layers as well. Here we provide the fastest variant, in which interleaving of feature maps is used for upsampling. For this reason, a custom layer, +dagnn/Combine.m, is provided.
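
As a rough illustration of what interleaving means here, the sketch below merges four equally sized feature maps into one map with twice the height and width. It is a plain-Python toy, not the code in +dagnn/Combine.m: the function name interleave and the assignment of the four maps to even/odd rows and columns are assumptions made for this example only.

```python
def interleave(a, b, c, d):
    """Interleave four H x W maps (lists of lists) into one 2H x 2W map.

    Assumed layout (illustrative only):
      a -> even row, even column    b -> even row, odd column
      c -> odd row,  even column    d -> odd row,  odd column
    """
    height, width = len(a), len(a[0])
    out = [[0] * (2 * width) for _ in range(2 * height)]
    for i in range(height):
        for j in range(width):
            out[2 * i][2 * j] = a[i][j]
            out[2 * i][2 * j + 1] = b[i][j]
            out[2 * i + 1][2 * j] = c[i][j]
            out[2 * i + 1][2 * j + 1] = d[i][j]
    return out
```

Each output pixel is copied from exactly one input map, so the operation is a pure rearrangement with no arithmetic on the values.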

The trained models - namely ResNet-UpProj in the paper - can also be downloaded here:

NYU Depth v2: MatConvNet model, TensorFlow model (.npy), TensorFlow model (.ckpt)
Make3D: MatConvNet model, TensorFlow model (soon)

Results

NEW! The predictions for the validation set of NYU-Depth-v2 dataset can also be downloaded here (.mat).

In the following tables, we report the results that should be obtained after evaluation and also compare to other (most recent) methods on depth prediction from a single image.

Error metrics on NYU Depth v2:

State of the art on NYU	rel	rms	log10
Roy & Todorovic (CVPR 2016)	0.187	0.744	0.078
Eigen & Fergus (ICCV 2015)	0.158	0.641	-
Ours	0.127	0.573	0.055

Error metrics on Make3D:

State of the art on Make3D	rel	rms	log10
Liu et al. (CVPR 2015)	0.314	8.60	0.119
Li et al. (CVPR 2015)	0.278	7.19	0.092
Ours	0.175	4.45	0.072
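
The three column headings are not defined in this README. The sketch below assumes the definitions commonly used in the depth-prediction literature: mean relative error (rel), root mean squared error (rms), and mean absolute log10 error (log10). The function name depth_errors and the rule for skipping non-positive values are assumptions for this example, not the evaluation code in evaluateNYU.m or evaluateMake3D.m.

```python
import math


def depth_errors(pred, gt):
    """Return (rel, rms, log10) for two flat sequences of depth values.

    Pixels are skipped when the ground truth or the prediction is not
    positive, because rel divides by gt and log10 needs positive inputs.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    rel = sum(abs(p - g) / g for p, g in pairs) / n
    rms = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    return rel, rms, log10
```

Plain lists keep the example dependency-free; on full-resolution depth maps an array library would be the practical choice.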

Qualitative results:

Citation

If you use this method in your research, please cite:

@inproceedings{laina2016deeper,
        title={Deeper depth prediction with fully convolutional residual networks},
        author={Laina, Iro and Rupprecht, Christian and Belagiannis, Vasileios and Tombari, Federico and Navab, Nassir},
        booktitle={3D Vision (3DV), 2016 Fourth International Conference on},
        pages={239--248},
        year={2016},
        organization={IEEE}
}

License

Simplified BSD License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Comments

train image

I am trying to reproduce this paper in Caffe using the NYU v2 raw depth dataset, with about 12K to 13K sampled depth images. For training, did you use filtered depth images (cross-bilateral filter or colorization) or the raw depth images? Colorization is a good filtering method, but it is too slow.

opened by seokhoonboo 23

Training details

Hello!

I am trying to recreate your results on the NYU_depth dataset with PyTorch. I am fairly confident that my network structure, loss function, and data augmentation process are correct, but I am unable to reach a depth image quality similar to your TensorFlow outputs (see the attached images).

My guess is that the difference might be in the training process. I tried to work according to your article, but a few details are unclear. You wrote that you gradually reduce the learning rate when you observe plateaus. How do you define a plateau, and what does "gradually" mean in this case?

To get the results below, I used the SGD optimizer with an initial learning rate of 0.01 and a momentum of 0.9, and I halve the learning rate after every 7th epoch.
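
One possible reading of the schedule described above (start at 0.01, halve after every 7th epoch) is sketched below. The function name halved_lr and the 0-based epoch counting are assumptions; this is the poster's schedule, not the plateau-based schedule from the paper.

```python
def halved_lr(epoch, initial_lr=0.01, every=7):
    """Learning rate for a 0-based epoch index, halved once per `every` epochs."""
    return initial_lr * 0.5 ** (epoch // every)
```

Under this reading, epochs 0 to 6 run at 0.01, epochs 7 to 13 at 0.005, and so on.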

some test image:

the output using your Tensorflow network:

the output using my Pytorch network:

opened by harsanyika 16

Output after training

Hi,

I am trying to replicate the output of your network. I have written my training code based on the details you provide in the paper (data augmentation, initial learning rate 0.01, pretrained ResNet-50 weights, ignoring invalid pixels, the inverse Huber loss, batch size 16, and 20 epochs). The results I got from the network look like this:

As you can see, the prediction of the network and the real ground truth look very similar geometrically, but they seem to be on a different scale, or the network has learned from very low-resolution images and upsampled them (the darkest pixels have values of 990-999 mm and the brightest ones have values of 199X).

If anyone can shed some light on this, please do. Best regards.

opened by Zuriich 14

The prediction is not well

I restored the model that you provide and fine-tuned all layers except the ResNet-50 ones. I use the berHu loss and minimize it with AdamOptimizer. The learning rate is 0.0001, and after 10000 steps it is 0.000001; there are 20000 steps in total. My loss curve and a prediction from my model were attached as images. As you can see, the loss value is small, but the prediction is not very good; it is unclear. Can you give me some advice? Thanks! @iro-cp
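
For reference, the berHu (reverse Huber) loss mentioned in these threads can be sketched as follows. The sketch assumes the threshold rule described in the paper, where c is one fifth of the largest absolute residual in the current batch. The function name berhu_loss, the averaging over pixels, and the flat-list input are assumptions for this example, not the authors' training code.

```python
def berhu_loss(pred, target, c_fraction=0.2):
    """Mean berHu loss over two flat sequences of depth values.

    For a residual r and threshold c:
      |r|                   if |r| <= c   (L1 region)
      (r^2 + c^2) / (2 c)   otherwise     (L2 region)
    """
    residuals = [abs(p - t) for p, t in zip(pred, target)]
    if not residuals:
        raise ValueError("empty input")
    c = c_fraction * max(residuals)
    if c == 0.0:
        # All predictions match the targets exactly.
        return 0.0
    total = 0.0
    for r in residuals:
        total += r if r <= c else (r * r + c * c) / (2.0 * c)
    return total / len(residuals)
```

The two branches agree at |r| = c, so the loss is continuous at the switch point.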

opened by Ariel-JUAN 13

Fine-tune result not smooth

Hi @iro-cp, I want to fine-tune your NYU checkpoint model on my own dataset.

This is my fine-tuning loss (berHu, as in your paper). The fine-tuning code is similar to tensorflow-deeplab-resnet.

Fine-tuned layers: layer16x_ and ConvPred
Learning rate: 0.001
Batch size: 8
Optimizer: AdamOptimizer

TensorBoard shows that the depth result is not smooth in planar regions (such as the wall, marked with a red circle).

Can you provide some advice? Thanks very much for sharing this great work!

opened by JackHenry1992 12

Question about test the image

Hello, I have a small problem when I try to test an image with predict.py. When I enter the command "python predict.py NYU_ResNet-UpProj.npy (my own image)", I get an error like this:

How can I fix this problem? Thank you!

opened by Kn15263ss 8

Unpooling indices stored as tf.Variables

In network.py:

def prepare_indices(self, before, row, col, after, dims):
    x_0 = tf.Variable(x0.reshape([-1]), name='x_0')
    x_1 = tf.Variable(x1.reshape([-1]), name='x_1')
    x_2 = tf.Variable(x2.reshape([-1]), name='x_2')
    x_3 = tf.Variable(x3.reshape([-1]), name='x_3')

This is the wrong use of tf.Variable (variable nodes are used as a source for value that the net expects to change, typically weights). In this case, this is just reshaping the indices from np.meshgrid, so these values aren't weights, or anything like that. There could be a specific reason these are made as tf.Variable that I'm unaware of, but it seems these lines should be:

x_0 = x0.reshape([-1])

Why this matters: tf.Variable nodes typically store "trainable" values, which must be stored in checkpoints and loaded weight files. Since these are four 4D-flattened-to-1D arrays and there is a set of these for each up-conversion, this is a lot of data being stored to disk, which must also be loaded from disk (and saved, when creating checkpoints). These are basically indices, so no change (learning) is expected, and I suggest this saving and loading is needless.

Case in point, these variables seem to be the prime contributor to the long load time of the weights file. In predict.py:

net.load(model_data_path, sess)

takes several minutes on my computer in the current state. Changing prepare_indices() as indicated above reduces the load time by orders of magnitude. However, making this change MIGHT make the new model incompatible with the current weights file, NYU_ResNet-UpProj.npy (I am having trouble making the net work with this change, so more investigation is needed on my end, but I figured I would raise this issue in case others are available to work on resolving it).

Since this is a non-functional change, I propose the authors try the following:

Remove the tf.Variable nodes as shown above (making them simple operations)

Retrain using identical meta-parameters as in the paper (if the starting weight values are still available)

Compare results pre- and post-change to ensure they generate the same output.

If the starting weights aren't available, I suppose a full retraining would just need to generate acceptable results.

opened by rodamn 6

Matlab version

Hi, I found that the websave function does not exist in my MATLAB R2014a, so any .m file that calls websave fails with an error. What can I do to solve this problem?

opened by CatherineYao 6

Can't converge when training using TensorFlow

I am trying to train this model using your TensorFlow code, but it does not converge. I am using 'nyu_depth_v2_labeled.mat'. I use the L2 loss for convenience. The raw depths are used. Invalid pixels (where depth is zero) have been excluded from training. I have tried both freezing the ResNet50 layers and leaving them trainable.

The code is as below:

# NetWork
graph = tf.Graph()
with graph.as_default():
    # Create a placeholder for the input image
    tf_train_dataset = tf.placeholder(tf.float32, shape=(None, img_height, img_width, num_channels))
    tf_train_labels = tf.placeholder(tf.float32, shape=(None, depth_height, depth_width))

    # Model.
    # net = models.ResNet50UpProj({'data': tf_train_dataset}, batch_size, trainable=True)
    net_ResNet50 = models.ResNet50({'data': tf_train_dataset}, batch_size, trainable=False)
    layer1_BN = net_ResNet50.get_output()
    net = models.UpProj({'layer1_BN': layer1_BN}, batch_size, trainable=True)

    # Training computation.
    output = tf.squeeze(net.get_output(), squeeze_dims=[3])
    loss = tf.reduce_mean(tf.nn.l2_loss((output-tf_train_labels)*(tf_train_labels!=0)))

    # Optimizer.
    global_step = tf.Variable(0, trainable=False)
    starter_learning_rate = 10**-3
    learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                               200, 0.8, staircase=True)
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    momentum = 0.9
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(loss, global_step=global_step)

    # Add a scalar summary for the snapshot loss.
    tf.summary.scalar('loss', loss)

    # Build the summary Tensor based on the TF collection of Summaries.
    summary = tf.summary.merge_all()

# Train
with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  saver = tf.train.Saver()

  # Instantiate a SummaryWriter to output summaries and the Graph.
  summary_writer = tf.summary.FileWriter(log_dir, session.graph)

  # Load the converted parameters
  print('Loading the model')
  net.load('NYU_ResNet-UpProj.txt',session)
  print("Initialized")

  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]

    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run([optimizer, loss, output], feed_dict=feed_dict)

    if (step % 10 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f" % accuracy(predictions, batch_labels))
      # Update the events file.
      summary_str = session.run(summary, feed_dict=feed_dict)
      summary_writer.add_summary(summary_str, step)
      summary_writer.flush()

  # print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))

  saver.save(session, model_path)

  print('Done!!!')
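
The post mentions excluding zero-depth pixels from an L2 loss. A framework-independent sketch of one way to do that is given below; it is not a fix for the posted code and makes no claim about why that run fails to converge. The function name masked_mse and the choice to average over valid pixels only are assumptions for this example.

```python
def masked_mse(pred, target):
    """Mean squared error over pixels whose target depth is non-zero."""
    valid = [(p, t) for p, t in zip(pred, target) if t != 0]
    if not valid:
        # No valid pixels: report zero loss rather than dividing by zero.
        return 0.0
    return sum((p - t) ** 2 for p, t in valid) / len(valid)
```

Dividing by the number of valid pixels, rather than by the total pixel count, keeps the loss scale independent of how much of each depth map is missing.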

The loss curve is as follows:

opened by chenynCV 6

Error when running your TensorFlow code (predict.py)

line 46, in setup
    .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1')
ValueError: ('stride must be less than or equal to filter size', 'stride: [2x2] filter: [1x1]')

opened by chenynCV 6

simpler/more efficient interleaving for up_conv

Time comparison on a single image (inference):

Elapsed time for run: 0:00:02.076628 (pack + reshape)
Elapsed time for run: 0:00:02.555231 (dynamic stitch)

disclaimers:

I ran on a CPU instead of a GPU, so it's probably worth testing that.

I included the timing code so you guys know what I did to time it (just wrapped the session.run)

I included a get_incoming_shape since I just felt like lists with integers are easier to deal with than tensorflow's Dimension stuff.

Also added code to display the input image. Thought it would be nice to compare.

opened by jtatusko 5

Running predict.py on multiple images

I get an error every time I try to run the prediction on several images through a Jupyter notebook.

Do you know how I can fix this?

I think it would be better to define a function load_model in predict.py that creates the model, loads the weights from the checkpoint, and returns the model, which could then be passed to the run function.

I'm trying to implement that but the tf.Session() seems to make it a bit complicated.
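
The load-once idea suggested above can be sketched without any TensorFlow specifics. Everything here is hypothetical: DepthPredictor is not part of predict.py, and load_model stands for a zero-argument callable that would build the graph, restore the checkpoint, and return a function mapping an image to a depth map (in TensorFlow 1.x that function might close over an open session).

```python
class DepthPredictor:
    """Build the model once, then reuse it for every image."""

    def __init__(self, load_model):
        # load_model: hypothetical callable returning a function image -> depth.
        self._load_model = load_model
        self._predict_fn = None

    def predict(self, image):
        if self._predict_fn is None:
            # Load lazily, on first use only.
            self._predict_fn = self._load_model()
        return self._predict_fn(image)
```

A notebook would create one DepthPredictor and call predict for each image, so the weights are restored a single time.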

opened by Saoussenl 0

Matlab; invalid input syntax

Hello and thank you for your work!

Testing your code for the NYU_v2 trained TensorFlow models worked great for me! But I'd also like to test your model on outdoor scenes. So I installed MATLAB and got it linked to MatConvNet, but I can't get it to run...

I am trying to test the code with an arbitrary image, so my Command Window statement is as follows:

DepthMapPrediction my_picture.jpg Make3D_ResNet-UpProj.mat

Running this, the error statement is as follows:

Error using imresize>parsePreMethodArgs (line 379)
Invalid input syntax; input image missing from argument list.

Error in imresize>parseInputs (line 273)
parsePreMethodArgs(varargin, method_arg_idx, first_param_string_idx);

Error in imresize (line 152)
params = parseInputs(args{:});

Error in DepthMapPrediction (line 40)
images = imresize(images, net.meta.normalization.imageSize(1:2));

Does somebody have any advice on where to go from here?

(Using Matlab R2020b, and matconvnet-1.0-beta25)

opened by StillZeroo 2

What can the depth estimation picture do?

I am confused. I did get the same depth estimation pictures as yours, but I don't know how to use them. Can they give the true distance between an object and the camera? If so, could you show me the details?

opened by Asherchi 4

How should the input size be filled

I see that the default input size is 304x228, but to use this as a TensorFlow SavedModel or a mobile model I need to specify the input size, and my pictures are not always 304x228. I scale them and pad them with rgb(255,255,255), but this lowers the accuracy of the result. How should I adjust the input image?
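
One alternative to padding is to resize while preserving the aspect ratio and then center-crop to the network input. The sketch below only computes the sizes and crop box. It assumes 304 is the width and 228 the height, and the function name fit_to_input is made up for this example; whether this preprocessing matches what the models were trained with is not stated here.

```python
def fit_to_input(width, height, target_w=304, target_h=228):
    """Return (resized_w, resized_h, crop_box) for an aspect-preserving fit.

    The image is scaled so that it covers the target size, then cropped
    around the center. crop_box is (left, top, right, bottom) in the
    coordinates of the resized image.
    """
    scale = max(target_w / width, target_h / height)
    # Guard against rounding ending up one pixel short of the target.
    resized_w = max(target_w, round(width * scale))
    resized_h = max(target_h, round(height * scale))
    left = (resized_w - target_w) // 2
    top = (resized_h - target_h) // 2
    return resized_w, resized_h, (left, top, left + target_w, top + target_h)
```

For a 4:3 image such as 640 by 480 the crop box covers the whole resized image, so nothing is lost; wider or taller images lose a band on two sides instead of gaining artificial white borders.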

opened by candrwow 0

Not getting good result after training

I have prepared my own dataset of indoor scenes in my environment and want to train the model on it. I am freezing all layers except the up-projection blocks, and the result is not good. I even trained on a dataset as small as 600 images and achieved 82 percent accuracy, but the results were not good visually. I do not know the reason; maybe you can suggest something. The full set I want to train on has approximately 6k images. The pretrained NYU weights even perform better. My settings are:

batch_size = 32
learning_rate = 1.0e-3
monentum = 0.9
weight_decay = 0.0005
num_epochs = 70
optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=learning_rate, momentum=monentum, weight_decay=weight_decay)

The learning rate is halved after 10 epochs.

Validation depth image

rgb image

opened by abdur4373 2
