capstone-project-louis-cavaleri's Introduction

Capstone project

Introduction

The goal of this project is to develop a system that can accurately identify and classify fossils, such as dinosaur remains or trace fossils, from images.

Identifying fossils is a time-consuming process that relies on expert knowledge of fossil morphology. It is made harder by the fragmented and degraded nature of many specimens.

The main problem to address in this project is the development of a machine learning model capable of accurately recognising and classifying fossils based on their images.

The data

  • Sources and context of the dataset: The dataset is a collection of fossil images gathered by a web crawler that downloads fossil images from the Internet and automatically exports them into a structured dataset.

    • reduced-FID dataset: I will use the reduced-FID, which contains 60 thousand images across 50 fossil categories and is published on zenodo.org. Links to download the reduced-FID dataset
    • FID dataset: This dataset is used to fill the gaps in the reduced-FID. Links to download the FID dataset.
    • fossil-vs-non-fossil dataset: I used this dataset to remove irrelevant images. fossil-vs-non-fossil.zip

  • Samples of the entries, features, values: The dataset is a reduced version of the Fossil Image Dataset (FID), which contains 415 thousand images.

  • Number of features and samples: The dataset contains 60 thousand RGB images, about 1,200 images for each of the 50 fossil categories.

  • Encoding of the features: The images are stored in subfolders, with each subfolder named according to the common ancestor.

  • Quality of the data: The data is of high quality, with no missing images. However, some images are not relevant or are partly obstructed by text or people.

  • Image formats: The images come in the following formats: BMP, GIF, JPEG, PNG, TIFF.

  • Below I will display a small sample of some images contained in the dataset.

[Sample images, one per category: agnath, amphibian, theropod]
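
The one-subfolder-per-category layout described in the data section can be checked with a short script. The following is a minimal sketch using only the Python standard library, not code taken from the notebooks. The function name `count_images_per_class` and the extension set are assumptions made for illustration; the extensions simply mirror the formats listed above.

```python
from pathlib import Path

# Assumed extension set, mirroring the formats listed above
# (BMP, GIF, JPEG, PNG, TIFF).
IMAGE_EXTENSIONS = {".bmp", ".gif", ".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def count_images_per_class(root):
    """Return {subfolder name: number of image files} for each class folder.

    Hypothetical helper: each direct subfolder of `root` is treated as one
    fossil category, and files are matched by extension only.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # ignore stray files at the top level
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling `count_images_per_class("path/to/reduced-FID")` (the path is a placeholder) returns one entry per category, which makes it easy to spot categories that fall well short of the roughly 1,200 images expected.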

Dependencies to install

imagehash is a package that needs to be installed in the environment:

  • using conda: conda install -c conda-forge imagehash
  • using pip: pip install ImageHash

scikit-image 0.21.0 is a package that needs to be installed in the environment:

  • using pip: pip install scikit-image

split-folders is a package that needs to be installed in the environment:

  • using pip: pip install split-folders

yellowbrick 1.5 is a package that needs to be installed in the environment:

  • using pip: pip install yellowbrick
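
Before opening the notebooks, it can help to confirm that these packages are importable. The sketch below is an illustration using only the standard library and is not part of the project. The helper name `missing_packages` is hypothetical. Note that the import name differs from the pip name for some packages (`skimage` for scikit-image, `splitfolders` for split-folders), and that the check does not verify version numbers.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
REQUIRED = {
    "ImageHash": "imagehash",
    "scikit-image": "skimage",
    "split-folders": "splitfolders",
    "yellowbrick": "yellowbrick",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None  # None means the module is not installed
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```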

Files:

  1. Data exploration and cleaning.ipynb
  2. Remove-irrelevant-images.ipynb
  3. Fossil-classifier.ipynb
