Comments (5)

santhoshkolloju commented on June 6, 2024

Give me a few days. I will write a detailed post on how you can run this on your own data.

from abstractive-summarization-with-transfer-learning.

santhoshkolloju commented on June 6, 2024

I have provided the code to generate the TFRecord files.

Vibha111094 commented on June 6, 2024

Do we need to create an empty file at gs://bert_summ/train.tf_record before calling the function 'file_based_convert_examples_to_features'?

callzhang commented on June 6, 2024

Quoting santhoshkolloju: "I have provided the code to generate the tf records file"

Is this the right way?

def get_dataset(processor,
                tokenizer,
                data_dir,
                max_seq_length_src,
                max_seq_length_tgt,
                batch_size,
                mode,
                output_dir,
                is_distributed=False):
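    """
    Args:
        processor: Data preprocessor; must define get_labels and the
            get_train_examples/get_dev_examples/get_test_examples methods.
        tokenizer: The sentence tokenizer. Generally should be a
            SentencePiece model.
        data_dir: The input data directory.
        max_seq_length_src: Max sequence length of the source text.
        max_seq_length_tgt: Max sequence length of the target text.
        batch_size: Mini-batch size.
        mode: `train`, `eval` or `test`.
        output_dir: The directory to save the TFRecords in (unused here,
            because the paths below are hardcoded to a GCS bucket).
        is_distributed: Passed through to file_based_input_fn_builder.
    """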
    #label_list = processor.get_labels()
    if mode == 'train':
        train_examples = processor.get_train_examples(data_dir)
        #train_file = os.path.join(output_dir, "train.tf_record")
        train_file = "gs://bert_summarization/train.tf_record"
        file_based_convert_examples_to_features(
            train_examples, max_seq_length_src, max_seq_length_tgt,
            tokenizer, train_file)
        dataset = file_based_input_fn_builder(
            input_file=train_file,
            max_seq_length_src=max_seq_length_src,
            max_seq_length_tgt=max_seq_length_tgt,
            is_training=True,
            drop_remainder=True,
            is_distributed=is_distributed)({'batch_size': batch_size})
    elif mode == 'eval':
        eval_examples = processor.get_dev_examples(data_dir)
        #eval_file = os.path.join(output_dir, "eval.tf_record")
        eval_file = "gs://bert_summarization/eval.tf_record"
        file_based_convert_examples_to_features(
            eval_examples, max_seq_length_src, max_seq_length_tgt,
            tokenizer, eval_file)
        dataset = file_based_input_fn_builder(
            input_file=eval_file,
            max_seq_length_src=max_seq_length_src,
            max_seq_length_tgt=max_seq_length_tgt,
            is_training=False,
            drop_remainder=True,
            is_distributed=is_distributed)({'batch_size': batch_size})
    elif mode == 'test':
        test_examples = processor.get_test_examples(data_dir)
        #test_file = os.path.join(output_dir, "predict.tf_record")
        test_file = "gs://bert_summarization/predict.tf_record"
        file_based_convert_examples_to_features(
            test_examples, max_seq_length_src, max_seq_length_tgt,
            tokenizer, test_file)
        dataset = file_based_input_fn_builder(
            input_file=test_file,
            max_seq_length_src=max_seq_length_src,
            max_seq_length_tgt=max_seq_length_tgt,
            is_training=False,
            drop_remainder=True,
            is_distributed=is_distributed)({'batch_size': batch_size})
    return dataset
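
The three branches above differ only in the processor method, the record file name, and the is_training flag, and an unrecognized mode reaches `return dataset` with `dataset` never assigned. A minimal sketch of one way to factor that out follows. The helper `resolve_split` and the table `_SPLITS` are hypothetical names introduced for illustration and are not part of the repository; the sketch only resolves names and paths and does not call `file_based_convert_examples_to_features` or `file_based_input_fn_builder`.

```python
import posixpath

# Hypothetical lookup table (not from the repository):
# mode -> (processor method name, record file name, is_training).
# The method and file names mirror the ones used in get_dataset above.
_SPLITS = {
    "train": ("get_train_examples", "train.tf_record", True),
    "eval": ("get_dev_examples", "eval.tf_record", False),
    "test": ("get_test_examples", "predict.tf_record", False),
}


def resolve_split(mode, output_dir):
    """Return (getter_name, record_path, is_training) for the given mode."""
    if mode not in _SPLITS:
        # Fail early instead of falling through with no dataset assigned.
        raise ValueError(f"mode must be one of {sorted(_SPLITS)}, got {mode!r}")
    getter_name, file_name, is_training = _SPLITS[mode]
    # posixpath.join always uses forward slashes, so a gs:// prefix is
    # preserved regardless of the local operating system.
    return getter_name, posixpath.join(output_dir, file_name), is_training


getter_name, record_path, is_training = resolve_split(
    "train", "gs://bert_summarization")
print(getter_name, record_path, is_training)
# prints: get_train_examples gs://bert_summarization/train.tf_record True
```

Inside get_dataset, the examples could then be fetched with `getattr(processor, getter_name)(data_dir)`, and passing the bucket as `output_dir` would make that argument meaningful again instead of hardcoding the path in each branch.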

callzhang commented on June 6, 2024

Quoting santhoshkolloju: "give me few days i will write a detailed post how you can run on your data"

That will be awesome. I have gone ahead and started training with the code modification mentioned above. I would love to read your post and learn more of the details.
