nerzid / variable-recommendation-system

Variable recommendation system for Java

License: Apache License 2.0

Java 100.00%
java software-engineering recommendation-system

variable-recommendation-system's Introduction

variable-recommendation-system

variable-recommendation-system's Issues

Bad and Good Token examples

Variable names are built from tokens. Some tokens are good because they are intuitive. One of the challenges in this project is to define the intuition behind the use of these tokens and make it objective.

Good Tokens:

  • RGB(refers to Red Green Blue; widely used in its abbreviated form in computer vision)
  • num(refers to number)
  • min, max, avg, pow(refers to minimum, maximum, average, power in mathematics)
  • freq(refers to frequency)
  • repo(refers to repository)
  • id(refers to identity)
  • iter(refers to iterator)
  • lat, long(refers to latitude, longitude)
  • histo(refers to histogram)
  • val(refers to value)
  • init(refers to initialize)
  • msg(refers to message)
  • obj(refers to object)
  • attr(refers to attribute)
  • spec(refers to specification)
  • arg(refers to argument)
  • param(refers to parameter)
  • yaml, txt, json(refer to file extensions. This still needs some discussion. Using these tokens in variable names sounds like a bad idea: the file extension may already appear as an option or class name, so repeating it as a token can be unnecessary. Still, it sometimes gives the reader information, especially if the programming language is dynamically typed(e.g. Python) rather than statically typed(e.g. Java))

Bad Tokens:

  • cur(can be anything; the most common use is as an abbreviation for current)
  • orig(refers to original or origin ???)
  • resp(refers to response ???)
  • req(refers to require or requirement or request ???)
  • exc(refers to execute ???)
  • the, a, an(why use these in any name ?)
  • dep(refers to depth ???)
  • mov(refers to move ??? if so, why drop only one letter just to shorten it ?)
  • mem(refers to memory ???)

Undecided Tokens: The fate of these tokens isn't decided yet. We need to discuss them further before we can call them good or bad.

  • str(refers mostly to string, but its frequency is pretty low in well-written projects. The main reason is that if a variable's type is string, the name itself should already imply that. For example, we write String name; rather than String nameStr;. It is intuitive, yes, yet it leads users to write bad, ambiguous variable names.)
  • inc(refers to increment or increase or something else ???)
  • var(refers to variable, but why use this token in a variable name?)
  • comp(refers to component ???)
  • Non-English words(if we don't find anything similar to the given token, what should we do? User-defined names may not be in our dictionary, e.g. EUnit in my master's thesis)

Need a validation method

We need to validate that our recommendation system works. One approach we can try is as follows:

  1. Install WordNet
  2. Using WordNet, collect the tokens that don't have any meaning at all OR have a really low frequency(the current threshold is <100)
  3. Use the recommendation system to generate proper tokens.
  4. Compare each bad token with the token we generate(this will be done by manual labeling)
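
Step 2 above could be sketched roughly as follows. This is only an illustration under assumptions: the `dictionary` set is a hypothetical stand-in for a real WordNet lookup, the frequencies are read as counts from our own token database, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class BadTokenCollector {
    // Threshold mentioned in the issue: fewer than 100 occurrences counts as rare.
    static final int MIN_FREQUENCY = 100;

    /**
     * Returns tokens that are missing from the dictionary OR are rare.
     * `dictionary` is a hypothetical stand-in for a WordNet lookup.
     */
    static List<String> collectSuspectTokens(Map<String, Integer> tokenFrequencies,
                                             Set<String> dictionary) {
        List<String> suspects = new ArrayList<>();
        // TreeMap gives a stable alphabetical order, which helps manual labeling.
        for (Map.Entry<String, Integer> e : new TreeMap<>(tokenFrequencies).entrySet()) {
            boolean unknownWord = !dictionary.contains(e.getKey());
            boolean rare = e.getValue() < MIN_FREQUENCY;
            if (unknownWord || rare) {
                suspects.add(e.getKey());
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measured data.
        Map<String, Integer> freq = Map.of("request", 5400, "cur", 320, "histogram", 40);
        Set<String> dict = Set.of("request", "histogram", "current");
        System.out.println(collectSuspectTokens(freq, dict)); // [cur, histogram]
    }
}
```

The returned list is what would be fed to the recommendation system in step 3 and then labeled by hand in step 4.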

Improve string matching algorithm

Levenshtein and Jaccard don't work well with our recommendation system.
For example, when we give the input cur, it recommends score or other words, but we actually expect the outcome to be current.
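
A small sketch of why plain edit distance prefers score here: every letter that the abbreviation leaves out costs one insertion, so long expansions are penalized. The code is the textbook dynamic-programming Levenshtein distance, not necessarily the implementation used in this project.

```java
final class LevenshteinDemo {
    /** Classic edit distance: insert, delete and substitute each cost 1. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("cur", "score"));   // 3
        System.out.println(levenshtein("cur", "current")); // 4
    }
}
```

Although cur is a prefix of current, the four missing letters make it more distant than score, which shares only c and r with the input.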

Some rules I can think of for abbreviating in programming:

  • Always keep the first letter of the word; never drop it when abbreviating
  • Phonetics matter
  • Some abbreviations drop all vowels(e.g. message -> msg) and others keep them(e.g. preference -> pref).
  • If a letter repeats after the vowels are removed, as in "message" or "quantity", the excess letters are almost always removed(e.g. message -> msg, quantity -> qty).
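
One way these rules could be turned into a filter is sketched below. It encodes only two of them as assumptions: the first letters must match, and the letters of the abbreviation must appear in the word in the same order. Phonetics are not modeled, and the class and method names are hypothetical.

```java
import java.util.Locale;

final class AbbreviationMatcher {
    /**
     * Returns true if `abbr` could abbreviate `word`: both start with the same
     * letter and the letters of `abbr` occur in `word` in order (a subsequence).
     */
    static boolean couldAbbreviate(String abbr, String word) {
        String a = abbr.toLowerCase(Locale.ROOT);
        String w = word.toLowerCase(Locale.ROOT);
        if (a.isEmpty() || w.isEmpty() || a.charAt(0) != w.charAt(0)) {
            return false; // rule: the first letter is never dropped
        }
        int matched = 0;
        for (int j = 0; j < w.length() && matched < a.length(); j++) {
            if (a.charAt(matched) == w.charAt(j)) {
                matched++;
            }
        }
        return matched == a.length();
    }

    public static void main(String[] args) {
        System.out.println(couldAbbreviate("cur", "current"));  // true
        System.out.println(couldAbbreviate("cur", "score"));    // false
        System.out.println(couldAbbreviate("qty", "quantity")); // true
    }
}
```

A filter like this could run before any distance-based ranking, so that words such as score are never offered for cur.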

Types of abbreviations:

  • Use the first part of the word(e.g. preference -> pref)
  • Use the first letters of multiple words(e.g. red green blue -> rgb)
  • Remove the vowels but keep the first letter(e.g. message -> msg. This doesn't mean that all consonants stay: in the example, we kept only one 's')
  • Monosyllabic words(e.g. mount -> mt). I haven't decided yet whether this is used in programming.
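
The first three types can be sketched as small generators. This is a rough illustration with hypothetical names, not the project's code. The vowel-removal variant keeps the first letter and skips a consonant that repeats the previously kept one, which reproduces msg but yields qnty rather than qty for quantity, so real abbreviations clearly follow looser rules. The monosyllabic type is left out.

```java
import java.util.Locale;

final class AbbreviationCandidates {
    /** Type 1: the first `length` letters of the word, e.g. preference -> pref. */
    static String prefix(String word, int length) {
        String w = word.toLowerCase(Locale.ROOT);
        return w.substring(0, Math.min(length, w.length()));
    }

    /** Type 2: the first letter of each word, e.g. red green blue -> rgb. */
    static String acronym(String... words) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (!w.isEmpty()) {
                sb.append(Character.toLowerCase(w.charAt(0)));
            }
        }
        return sb.toString();
    }

    /** Type 3: keep the first letter, drop a/e/i/o/u, skip repeated consonants. */
    static String dropVowels(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.isEmpty()) {
            return w;
        }
        StringBuilder sb = new StringBuilder().append(w.charAt(0));
        for (int i = 1; i < w.length(); i++) {
            char c = w.charAt(i);
            boolean vowel = "aeiou".indexOf(c) >= 0;
            boolean repeat = sb.charAt(sb.length() - 1) == c;
            if (!vowel && !repeat) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(prefix("preference", 4));         // pref
        System.out.println(acronym("red", "green", "blue")); // rgb
        System.out.println(dropVowels("message"));           // msg
        System.out.println(dropVowels("quantity"));          // qnty
    }
}
```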

Remove noise from tokens

I'll list here some of the removals we apply to decrease the noise in our token database.

  • All tokens are lowercased
  • Vowel letters(a, e, i, o, u) are removed. We do this only when searching for a token; the original tokens stay in our database
  • Digits are removed and identical tokens are merged(request1 and request2 are the same token and are merged with the token request. Merging here means summing their frequencies)
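
The three clean-up steps above might look like this in code. It is a minimal sketch with hypothetical class and method names, assuming the token database can be viewed as a map from token to frequency.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class TokenNormalizer {
    /** Lowercases a token and strips digits, so request1 and Request2 become request. */
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[0-9]", "");
    }

    /** Search key only: also drops a, e, i, o, u. Stored tokens keep their vowels. */
    static String searchKey(String token) {
        return normalize(token).replaceAll("[aeiou]", "");
    }

    /** Merges tokens that normalize to the same string by summing their frequencies. */
    static Map<String, Integer> mergeFrequencies(Map<String, Integer> raw) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map.Entry<String, Integer> e : raw.entrySet()) {
            String key = normalize(e.getKey());
            if (!key.isEmpty()) { // a token made only of digits is dropped
                merged.merge(key, e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative counts only.
        Map<String, Integer> raw = Map.of("request", 10, "request1", 3, "Request2", 2);
        System.out.println(mergeFrequencies(raw)); // {request=15}
        System.out.println(searchKey("Message"));  // mssg
    }
}
```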
