Python class that loads a page of es_page_size documents from ElasticSearch. A PyTorch data loader consumes the page in batches of batch_size documents, and a new page is loaded before the model consumes the last batch at training time.

License: MIT License

Elastic PyTorch Loader

A PyTorch DataLoader for interfacing with ElasticSearch to load documents in batches for training machine learning models.
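The loader fetches one page of es_page_size documents at a time and serves it in batches of batch_size. The sketch below illustrates that flow without ElasticSearch or PyTorch. It is not the package's implementation: FAKE_INDEX, fetch_page, and iter_batches are hypothetical names, and the in-memory list stands in for a real index.

```python
from typing import Iterator, List

# Hypothetical stand-in for an ElasticSearch index: 25 fake documents.
FAKE_INDEX = [{"id": i, "text": f"doc {i}"} for i in range(25)]


def fetch_page(offset: int, es_page_size: int) -> List[dict]:
    """Hypothetical helper: return one page of documents starting at offset."""
    return FAKE_INDEX[offset:offset + es_page_size]


def iter_batches(es_page_size: int, batch_size: int) -> Iterator[List[dict]]:
    """Yield batches of batch_size documents, fetching one page at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, es_page_size)
        if not page:
            break  # index exhausted
        # Slice the current page into consecutive batches.
        for start in range(0, len(page), batch_size):
            yield page[start:start + batch_size]
        offset += len(page)


batches = list(iter_batches(es_page_size=10, batch_size=4))
```

In this sketch a batch never spans two pages, so the last batch of a page can be shorter than batch_size. Whether the package behaves the same way is not stated in this README.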

Installation

You can install the package using pip:

pip install git+https://github.com/iarroyof/elastic_pytorch_loader.git

Usage Examples

Basic Usage

To use the ElasticSearchDataset with a PyTorch DataLoader, follow these steps:

from elastic_pytorch_loader.dataset import ElasticSearchDataset
from torch.utils.data import DataLoader

# Initialize the dataset with your specific parameters
es_dataset = ElasticSearchDataset(
    index='your_index_name',
    es_page_size=1000,
    batch_size=10,
    async_loading=False,
    shuffle=True,
    seed=42
)

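# Create a DataLoader. batch_size=None turns off PyTorch's automatic batching;
# the dataset is already configured with batch_size=10 above.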
data_loader = DataLoader(es_dataset, batch_size=None)

# Iterate over the DataLoader in your training loop
for batch in data_loader:
    # Your training logic here
    pass

Asynchronous Loading

To enable asynchronous loading of data:

# Set async_loading to True when initializing the dataset
es_dataset = ElasticSearchDataset(
    index='your_index_name',
    es_page_size=1000,
    batch_size=10,
    async_loading=True,  # Enable asynchronous data loading
    shuffle=True,
    seed=42
)
# The rest is the same as the basic usage
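One common way to load the next page while the current one is still being consumed is a background thread feeding a bounded queue. The sketch below shows that general technique under stated assumptions: prefetching_pages and fake_fetch are hypothetical names, and this README does not say how async_loading is implemented inside the package.

```python
import queue
import threading
from typing import Callable, Iterator, List, Optional


def prefetching_pages(
    fetch_page: Callable[[int], Optional[List[int]]],
    max_prefetch: int = 1,
) -> Iterator[List[int]]:
    """Yield pages while a background thread fetches the next one.

    fetch_page(page_number) is a hypothetical callable that returns a list
    of documents, or None once there are no more pages.
    """
    buffer: queue.Queue = queue.Queue(maxsize=max_prefetch)

    def worker() -> None:
        page_number = 0
        while True:
            page = fetch_page(page_number)
            buffer.put(page)  # blocks while the buffer is full
            if page is None:
                return  # sentinel delivered, stop fetching
            page_number += 1

    threading.Thread(target=worker, daemon=True).start()

    while True:
        page = buffer.get()  # waits only if the next page is not ready yet
        if page is None:
            return
        yield page


def fake_fetch(page_number: int) -> Optional[List[int]]:
    """Stand-in for an ElasticSearch query: three pages of four documents."""
    if page_number >= 3:
        return None
    start = page_number * 4
    return list(range(start, start + 4))


pages = list(prefetching_pages(fake_fetch))
```

The bounded queue caps how far the worker can run ahead, so memory use stays at roughly max_prefetch pages plus the one being consumed.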

Shuffling Data

Shuffling changes the order in which documents reach the model, which often helps training; passing a seed makes that order reproducible:

# Set shuffle to True and specify a seed for reproducibility
es_dataset = ElasticSearchDataset(
    index='your_index_name',
    es_page_size=1000,
    batch_size=10,
    async_loading=False,
    shuffle=True,  # Enable shuffling
    seed=42        # Seed for the random number generator
)

# The rest is the same as the basic usage
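The effect of the seed can be illustrated with Python's standard library. This is a minimal sketch of seeded shuffling in general, assuming the documents of a page are shuffled in memory; shuffled_page is a hypothetical helper, not part of the package.

```python
import random
from typing import List


def shuffled_page(page: List[int], seed: int) -> List[int]:
    """Return a shuffled copy of page using a dedicated, seeded generator."""
    rng = random.Random(seed)  # private RNG, leaves the global state untouched
    shuffled = list(page)
    rng.shuffle(shuffled)
    return shuffled


page = list(range(10))
first = shuffled_page(page, seed=42)
second = shuffled_page(page, seed=42)
```

Both calls return the same ordering because they start from the same seed, and the original page is left unmodified.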

Replace 'your_index_name' with the name of the ElasticSearch index you are using. The examples above differ only in the async_loading, shuffle, and seed arguments; the DataLoader setup is the same in each case.
