
Venkata Satya Sai Ajay Daliparthi's Projects

comparing-the-performance-effect-of-various-cnn-architectures-on-image-captioning

In our study, we compared the performance effect of four CNN architectures, VGG16, DenseNet121, MobileNet, and ResNet50 (all pre-trained on the ImageNet dataset), as the encoder, while using an LSTM as the decoder for image captioning. The Bilingual Evaluation Understudy (BLEU) metric is used to evaluate the captions generated by the four models. We evaluated the models on the Flickr8K dataset. The mean BLEU scores are VGG16 (0.015), DenseNet121 (0.010), MobileNet (0.013), and ResNet50 (0.024). Experimental results show that using ResNet50 as the CNN encoder yields the highest mean BLEU score, roughly 1.8 to 2.4 times that of more recent state-of-the-art image classification networks such as MobileNet and DenseNet121.
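
As a rough illustration of how a mean BLEU score like those above could be computed, here is a minimal sketch in plain Python. It assumes unsmoothed sentence-level BLEU with uniform weights over 1- to 4-grams, averaged over images; the study does not state which BLEU variant, n-gram order, or smoothing it used, so the function names (`sentence_bleu`, `mean_bleu`) and these choices are illustrative assumptions rather than the authors' evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (assumed variant)."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty, using the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


def mean_bleu(candidates, references_per_image, max_n=4):
    """Average the sentence-level scores over all captioned images."""
    scores = [sentence_bleu(c, refs, max_n)
              for c, refs in zip(candidates, references_per_image)]
    return sum(scores) / len(scores)
```

Because this variant is unsmoothed, a caption that shares no 4-gram with any reference scores zero, which is one way that short generated captions can pull a mean score down sharply.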

pdfnet-pointwise-dense-flow-network-for-urban-scene-segmentation

Using a deep convolutional neural network (CNN) as a feature encoder (or backbone) is the most commonly observed architectural pattern in computer vision methods, and semantic segmentation is no exception. This architectural pattern has two major drawbacks: (i) the networks often fail to capture small classes such as wall, fence, pole, traffic light, traffic sign, and bicycle, which are crucial for autonomous vehicles to make accurate decisions; and (ii) because of their arbitrarily increasing depth, the networks require massive labeled data to converge and additional regularization techniques to reduce the risk of over-fitting. While regularization techniques come at minimal cost, collecting labeled data is an expensive and laborious process. In this work, we address these two drawbacks by proposing a novel lightweight architecture named point-wise dense flow network (PDFNet). In PDFNet, we employ dense, residual, and multiple shortcut connections to allow a smooth gradient flow to all parts of the network. Extensive experiments on the Cityscapes and CamVid benchmarks demonstrate that our method significantly outperforms baselines in capturing small classes and in few-data regimes. Moreover, our method achieves considerable performance in classifying samples from outside the training distribution, evaluated in the Cityscapes-to-KITTI setting.
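
To make the idea of combining dense and residual connections concrete, here is a small NumPy sketch. It is not the PDFNet architecture: the block layout, the use of 1x1 (point-wise) convolutions, and every name in it (`pointwise_conv`, `dense_residual_block`, `growth`) are assumptions made for illustration only.

```python
import numpy as np


def pointwise_conv(x, weight):
    """1x1 convolution: mix channels independently at every pixel.

    x has shape (C_in, H, W); weight has shape (C_out, C_in).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def relu(x):
    return np.maximum(x, 0.0)


def dense_residual_block(x, weights, fuse_weight):
    """Hypothetical block that combines dense and residual connections."""
    features = [x]
    for w in weights:
        # Dense connection: each layer sees the concatenation of all earlier outputs.
        stacked = np.concatenate(features, axis=0)
        features.append(relu(pointwise_conv(stacked, w)))
    # Fuse all features back to the input channel count.
    fused = pointwise_conv(np.concatenate(features, axis=0), fuse_weight)
    # Residual shortcut: the input is added back unchanged.
    return x + fused


rng = np.random.default_rng(0)
channels, growth, layers = 8, 4, 3  # illustrative sizes, not taken from the paper
x = rng.standard_normal((channels, 16, 16))
weights = [0.1 * rng.standard_normal((growth, channels + i * growth))
           for i in range(layers)]
fuse_weight = 0.1 * rng.standard_normal((channels, channels + layers * growth))
y = dense_residual_block(x, weights, fuse_weight)
print(y.shape)
```

The residual shortcut gives the input a direct path to the output, and the dense concatenations give every intermediate layer a direct path to the fusion step, which is the usual argument for why such connections ease gradient flow.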

semantic-segmentation-of-urban-scene-images-using-recurrent-neural-networks

This study investigates the performance effect of using recurrent neural networks (RNNs) for semantic segmentation of urban scene images, with the goal of generating a semantic output map with refined edges. We proposed three deep neural network architectures that use RNNs and evaluated them on the Cityscapes dataset. All three proposed architectures outperformed the baseline and showed improvement in classifying edges. Additionally, we presented a new method for adding RNNs to any existing semantic segmentation network that uses skip connections. PyTorch was the framework selected for this study.

the-ikshana-hypothesis-of-human-scene-understanding

In recent years, deep neural networks (DNNs) have achieved state-of-the-art performance on many computer vision tasks. However, one typical drawback of these DNNs is their requirement for massive labeled data. While few-shot learning methods have addressed this problem through metric-learning and meta-learning techniques, in this work we address it from a neuroscience perspective. We propose a theory named Ikshana to explain how the human brain functions while a person understands an image. Following the Ikshana theory, we propose a novel neural-inspired CNN architecture named IkshanaNet for semantic segmentation. The empirical results demonstrate the effectiveness of our method on few data samples, outperforming several baselines on the Cityscapes and CamVid benchmarks.
