
Comments (3)

guody5 commented on May 18, 2024
>>> from transformers import AutoTokenizer, AutoModel
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
>>> model = AutoModel.from_pretrained("microsoft/codebert-base")
>>> tokens=tokenizer.tokenize("def max(a,b):")
['def', 'Ġmax', '(', 'a', ',', 'b', '):']
>>> tokens=[tokenizer.cls_token]+tokens+[tokenizer.sep_token]
['<s>', 'def', 'Ġmax', '(', 'a', ',', 'b', '):', '</s>']
>>> tokens_ids=tokenizer.convert_tokens_to_ids(tokens)
[0, 9232, 19220, 1640, 102, 6, 428, 3256, 2]
>>> context_embeddings=model(torch.tensor(tokens_ids)[None,:])[0][0]
tensor([[-0.1740,  0.2737,  0.0452,  ..., -0.2411, -0.2950,  0.2668],
        [-1.0550, -0.1229,  0.6714,  ..., -0.5628, -0.1209,  0.4683],
        [-0.9436,  0.3294, -0.0098,  ..., -0.3375, -0.5014,  0.6879],
        ...,
        [-0.3381,  0.4317,  0.4450,  ..., -0.4600, -0.4070,  0.6626],
        [-0.3735, -0.1088,  0.6358,  ..., -0.6854, -0.0860,  0.2248],
        [-0.1740,  0.2744,  0.0457,  ..., -0.2414, -0.2962,  0.2675]],
       grad_fn=<SelectBackward>)
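
A note on reading the token list above: the `Ġ` prefix is how the byte-level BPE vocabulary marks a token that was preceded by a space. The sketch below maps tokens back to text on that basis. The helper name `tokens_to_text` is hypothetical (it is not part of `transformers`), and the simple replacement is only assumed to hold for plain ASCII input like the snippet in this thread.

```python
def tokens_to_text(tokens):
    """Join byte-level BPE tokens back into text.

    Assumption: the input is plain ASCII, so the only special marker to
    undo is 'Ġ', which stands for a leading space.
    """
    return "".join(tokens).replace("Ġ", " ")


tokens = ['def', 'Ġmax', '(', 'a', ',', 'b', '):']
print(tokens_to_text(tokens))  # def max(a,b):
```

For real use, the tokenizer's own `convert_tokens_to_string` method is likely the safer choice, since it also handles non-ASCII bytes.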

from codebert.

cs17b027 commented on May 18, 2024

That's just the token embedding of the code, but I want to embed an NL-PL pair.


guody5 commented on May 18, 2024
>>> from transformers import AutoTokenizer, AutoModel
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
>>> model = AutoModel.from_pretrained("microsoft/codebert-base")
>>> nl_tokens=tokenizer.tokenize("return maximum value")
['return', 'Ġmaximum', 'Ġvalue']
>>> code_tokens=tokenizer.tokenize("def max(a,b): if a>b: return a else return b")
['def', 'Ġmax', '(', 'a', ',', 'b', '):', 'Ġif', 'Ġa', '>', 'b', ':', 'Ġreturn', 'Ġa', 'Ġelse', 'Ġreturn', 'Ġb']
>>> tokens=[tokenizer.cls_token]+nl_tokens+[tokenizer.sep_token]+code_tokens+[tokenizer.sep_token]
['<s>', 'return', 'Ġmaximum', 'Ġvalue', '</s>', 'def', 'Ġmax', '(', 'a', ',', 'b', '):', 'Ġif', 'Ġa', '>', 'b', ':', 'Ġreturn', 'Ġa', 'Ġelse', 'Ġreturn', 'Ġb', '</s>']
>>> tokens_ids=tokenizer.convert_tokens_to_ids(tokens)
[0, 30921, 4532, 923, 2, 9232, 19220, 1640, 102, 6, 428, 3256, 114, 10, 15698, 428, 35, 671, 10, 1493, 671, 741, 2]
>>> context_embeddings=model(torch.tensor(tokens_ids)[None,:])[0]
>>> context_embeddings.shape
torch.Size([1, 23, 768])
>>> context_embeddings[0]
tensor([[-0.1423,  0.3766,  0.0443,  ..., -0.2513, -0.3099,  0.3183],
        [-0.5739,  0.1333,  0.2314,  ..., -0.1240, -0.1219,  0.2033],
        [-0.1579,  0.1335,  0.0291,  ...,  0.2340, -0.8801,  0.6216],
        ...,
        [-0.4042,  0.2284,  0.5241,  ..., -0.2046, -0.2419,  0.7031],
        [-0.3894,  0.4603,  0.4797,  ..., -0.3335, -0.6049,  0.4730],
        [-0.1433,  0.3785,  0.0450,  ..., -0.2527, -0.3121,  0.3207]],
       grad_fn=<SelectBackward>)
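
The pair layout used above, `<s>` NL `</s>` code `</s>`, can be wrapped in a small helper that also keeps the sequence within a length limit. This is a minimal sketch, not code from the CodeBERT repository: the name `build_pair_tokens`, the default `max_length=512`, and the truncation policy (keep the NL tokens first, then fill the remaining room with code tokens) are all assumptions made for illustration.

```python
def build_pair_tokens(nl_tokens, code_tokens, cls_token="<s>",
                      sep_token="</s>", max_length=512):
    """Lay out an NL-PL pair as <s> NL </s> code </s>.

    Assumptions: max_length=512 and the "NL first, code fills the rest"
    truncation policy are illustrative choices, not taken from the thread.
    """
    # Three positions are reserved for the special tokens.
    budget = max_length - 3
    if budget < 0:
        raise ValueError("max_length must be at least 3")
    nl_tokens = nl_tokens[:budget]
    code_tokens = code_tokens[:budget - len(nl_tokens)]
    return [cls_token] + nl_tokens + [sep_token] + code_tokens + [sep_token]


# Token lists copied from the reply above.
nl_tokens = ['return', 'Ġmaximum', 'Ġvalue']
code_tokens = ['def', 'Ġmax', '(', 'a', ',', 'b', '):', 'Ġif', 'Ġa', '>',
               'b', ':', 'Ġreturn', 'Ġa', 'Ġelse', 'Ġreturn', 'Ġb']

tokens = build_pair_tokens(nl_tokens, code_tokens)
print(len(tokens))  # 23
```

The resulting list can be passed to `tokenizer.convert_tokens_to_ids` exactly as in the reply. Its length of 23 matches the middle dimension of the `torch.Size([1, 23, 768])` output shown there.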

