creating a NeMo model

ShabnamRA avatar ShabnamRA commented on June 21, 2024
creating a NeMo model

Comments (1)

ShabnamRA avatar ShabnamRA commented on June 21, 2024

In the modified version below, split() is called without a separator argument, so it splits on any run of whitespace (spaces, tabs, or newlines). This resolves the "ValueError: empty separator" that the tutorial code raised when split was given an empty string as its separator. Modify the tutorial as follows:

# NeMoGPT is the model class defined earlier in the tutorial.
from omegaconf import OmegaConf

class NeMoGPTv2(NeMoGPT):

    def setup_training_data(self, train_data_config: OmegaConf):
        self.vocab = None
        self._train_dl = self._setup_data_loader(train_data_config)

        # Save the vocab into a text file, one token per line, so that
        # split() can recover the tokens later. This assumes no token is
        # itself a whitespace character.
        with open('vocab.txt', 'w') as f:
            for token in self.vocab:
                f.write(f"{token}\n")

        # This is going to register the file into .nemo!
        # When you later use .save_to(), it will copy this file into the tar file.
        self.register_artifact('vocab_file', 'vocab.txt')


    def setup_validation_data(self, val_data_config: OmegaConf):
        # This is going to try to find the same file, and if it fails,
        # it will use the copy in .nemo
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            # Split on whitespace; split() leaves no dangling empty token to drop
            self.vocab = f.read().split()

        self._validation_dl = self._setup_data_loader(val_data_config)


    def setup_test_data(self, test_data_config: OmegaConf):
        # This is going to try to find the same file, and if it fails,
        # it will use the copy in .nemo
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            # Split on whitespace; split() leaves no dangling empty token to drop
            self.vocab = f.read().split()

        self._test_dl = self._setup_data_loader(test_data_config)
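
To check the split() behaviour outside of NeMo, here is a minimal standalone sketch. The helper names write_vocab and read_vocab and the three-word vocab are made up for illustration and are not part of the tutorial or of NeMo. It assumes tokens are written one per line and that no token is itself whitespace.

```python
import os
import tempfile


def write_vocab(path, vocab):
    """Write one token per line (hypothetical helper, not a NeMo API)."""
    with open(path, "w") as f:
        for token in vocab:
            f.write(f"{token}\n")


def read_vocab(path):
    """Read tokens back by splitting on any run of whitespace."""
    with open(path, "r") as f:
        return f.read().split()


# An empty-string separator is rejected by str.split.
try:
    "a b".split("")
    empty_sep_error = None
except ValueError as err:
    empty_sep_error = str(err)

# Round trip through a temporary file.
vocab = ["the", "quick", "fox"]  # illustrative tokens only
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    write_vocab(path, vocab)
    restored = read_vocab(path)

print(empty_sep_error)
print(restored == vocab)
```

If the vocab can contain whitespace tokens (for example a character-level vocab with a space or newline entry), whitespace splitting would drop them, and an explicit non-empty separator string passed to both the write and the split would be the safer choice.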
