
Comments (9)

eawagner avatar eawagner commented on August 22, 2024

I am in favor of breaking up the way types are handled. For Accumulo normalization, I think this is a great idea.

For things such as converting to strings for a JSON representation, we need another way. Currently the two concerns are coupled in the Types normalizers.

Unless these get broken up, I am not comfortable with adding a third-party dependency just to use the Types APIs.

from mango.

cjnolet avatar cjnolet commented on August 22, 2024

Seems like one is for normalization and one is for lexicoding. I don't mind proposing a design change and breaking these up.


eawagner avatar eawagner commented on August 22, 2024

OK, looking at type normalization, we have six main functions on a TypeNormalizer:

resolves -> the class it handles
getAlias -> short name for the class
asString -> readable representation (used only for json)
fromString -> inverse of asString
normalize -> lexicographically sortable representation (used only for accumulo)
denormalize -> inverse of normalize
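For illustration, the six functions above might map onto a Java interface like the following. This is a hypothetical sketch, not mango's actual TypeNormalizer: the generic parameter, the exact signatures, and the LongNormalizer class are assumptions. The sortable form chosen here flips the sign bit and zero-pads the unsigned decimal value to 20 digits so that string order matches numeric order.

```java
// Hypothetical sketch of the TypeNormalizer contract listed above;
// the signatures are assumptions, not mango's real API.
interface TypeNormalizer<T> {
    Class<T> resolves();        // the class it handles
    String getAlias();          // short name for the class
    String asString(T value);   // readable representation (JSON)
    T fromString(String s);     // inverse of asString
    String normalize(T value);  // lexicographically sortable representation (Accumulo)
    T denormalize(String s);    // inverse of normalize
}

// Hypothetical Long implementation.
final class LongNormalizer implements TypeNormalizer<Long> {
    // Number of decimal digits in the largest unsigned 64-bit value.
    private static final int WIDTH = 20;

    public Class<Long> resolves() { return Long.class; }
    public String getAlias() { return "long"; }
    public String asString(Long value) { return Long.toString(value); }
    public Long fromString(String s) { return Long.parseLong(s); }

    // Flip the sign bit so negatives sort before positives, then zero-pad the
    // unsigned decimal value so that string order matches numeric order.
    public String normalize(Long value) {
        String digits = Long.toUnsignedString(value ^ Long.MIN_VALUE);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) sb.append('0');
        return sb.append(digits).toString();
    }

    // Parse the unsigned value back and undo the sign-bit flip.
    public Long denormalize(String s) {
        return Long.parseUnsignedLong(s) ^ Long.MIN_VALUE;
    }
}
```

Note how asString/fromString and normalize/denormalize are two independent pairs living on one class, which is exactly the coupling discussed earlier in the thread.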

It seems to me that if we come up with more generic names for asString/normalize and fromString/denormalize, we can have a common types interface that backs normalizers, encoders, and serializers.

Maybe something along the lines of:
resolves -> the class it handles
getAlias -> short name for the class
encode -> some string representation
decode -> inverse of encode.

Then we could have implementations of each type for serialization, and another set for Accumulo using Lexicoders, the current normalization, or both.
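A minimal sketch of that idea, assuming a single-parameter TypeEncoder<T> interface whose encode/decode work on strings. TypeEncoder, JsonLongEncoder, SortableLongEncoder and the alias-keyed EncoderSets helper are hypothetical names. The sortable form here (sign bit flipped, fixed-width 16-digit hex) is just one possible choice; it is neither the current mango normalization nor an Accumulo Lexicoder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common contract using the proposed generic names.
interface TypeEncoder<T> {
    Class<T> resolves();       // the class it handles
    String getAlias();         // short name for the class
    String encode(T value);    // some string representation
    T decode(String encoded);  // inverse of encode
}

// Serialization (JSON) implementation: plain decimal text.
final class JsonLongEncoder implements TypeEncoder<Long> {
    public Class<Long> resolves() { return Long.class; }
    public String getAlias() { return "long"; }
    public String encode(Long value) { return Long.toString(value); }
    public Long decode(String encoded) { return Long.valueOf(encoded); }
}

// Accumulo-oriented implementation: flip the sign bit and render as
// zero-padded 16-digit hex so that string order matches numeric order.
final class SortableLongEncoder implements TypeEncoder<Long> {
    public Class<Long> resolves() { return Long.class; }
    public String getAlias() { return "long"; }
    public String encode(Long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }
    public Long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}

// Hypothetical helper: one set of encoders per purpose, keyed by alias.
final class EncoderSets {
    static Map<String, TypeEncoder<?>> index(TypeEncoder<?>... encoders) {
        Map<String, TypeEncoder<?>> byAlias = new HashMap<>();
        for (TypeEncoder<?> e : encoders) {
            byAlias.put(e.getAlias(), e);
        }
        return byAlias;
    }
}
```

Because the serialization set and the Accumulo set share one contract, a caller would pick the set that fits its purpose and look up the encoder by alias; neither set needs to know the other exists.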


cjnolet avatar cjnolet commented on August 22, 2024

+1 Per IM conversation with Edward Wagner:

We should probably use bytes for the lexicographically sortable encoders and strings for the pretty-print encoders. We could probably use generics to type the encoder implementation as either String or byte[].
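One way to express that with generics is a second type parameter for the encoded form. Encoder&lt;T, E&gt;, LongStringEncoder and LongBytesEncoder are hypothetical names, and the byte layout (sign bit flipped, 8 bytes big-endian) is an assumed fixed-width scheme, not the encoding Accumulo's LongLexicoder produces. With this layout, comparing the arrays as unsigned bytes (for example with `Arrays.compareUnsigned`, Java 9+) matches numeric order.

```java
import java.nio.ByteBuffer;

// Hypothetical generic encoder: T is the Java type, E is the encoded form
// (String for pretty-printing, byte[] for lexicographically sortable keys).
interface Encoder<T, E> {
    Class<T> resolves();
    String getAlias();
    E encode(T value);
    T decode(E encoded);
}

// Pretty-print encoder for JSON-style output.
final class LongStringEncoder implements Encoder<Long, String> {
    public Class<Long> resolves() { return Long.class; }
    public String getAlias() { return "long"; }
    public String encode(Long value) { return Long.toString(value); }
    public Long decode(String encoded) { return Long.valueOf(encoded); }
}

// Sortable byte encoder: flipping the sign bit and writing big-endian
// (ByteBuffer's default order) makes unsigned byte-wise comparison
// agree with numeric order.
final class LongBytesEncoder implements Encoder<Long, byte[]> {
    public Class<Long> resolves() { return Long.class; }
    public String getAlias() { return "long"; }
    public byte[] encode(Long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }
    public Long decode(byte[] encoded) {
        return ByteBuffer.wrap(encoded).getLong() ^ Long.MIN_VALUE;
    }
}
```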


eawagner avatar eawagner commented on August 22, 2024

Opened Issue #27 to discuss and track the changes to the Types API.


eawagner avatar eawagner commented on August 22, 2024

One issue I have found using the lexicoders is more of a recipes problem: we will need to be more careful with how we construct rows.

Currently we do something like longval + "\u0000" + longval, and to parse it we simply do a string split. This doesn't work with the lexicoders, because an encoded long will contain "\u0000" whenever any of its bytes is zero.
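A small sketch makes the failure concrete. It assumes a fixed-width sortable long encoding (sign bit flipped, 8 bytes big-endian) as an illustrative stand-in for a lexicoder; it is not Accumulo's LongLexicoder, and NaiveRowSplit and its methods are hypothetical. Even a small value such as 5 encodes as 80 00 00 00 00 00 00 05, with six zero bytes, so splitting the row on NUL yields far more than two parts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NaiveRowSplit {
    // Hypothetical fixed-width sortable encoding: flip the sign bit and
    // write the value big-endian into 8 bytes.
    static byte[] encode(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ Long.MIN_VALUE).array();
    }

    // Builds encode(a) + NUL + encode(b), mirroring the current
    // "longval + NUL + longval" row recipe.
    static byte[] row(long a, long b) {
        return ByteBuffer.allocate(17)
                .put(encode(a))
                .put((byte) 0)
                .put(encode(b))
                .array();
    }

    // Naive parse: decode the bytes 1:1 as ISO-8859-1 text and split on NUL.
    // Zero bytes inside each encoded long are treated as separators too.
    static String[] naiveSplit(byte[] row) {
        String s = new String(row, StandardCharsets.ISO_8859_1);
        return s.split("\0", -1);
    }
}
```

For row(5, 7) there are thirteen zero bytes in total (six inside each encoded long plus the separator), so the split returns fourteen parts instead of two.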

This simply means that we need to be more careful parsing rows. For example, extract 8 bytes for the first long, then another 8 for the next.
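Under the same assumed fixed-width 8-byte encoding, building and parsing a row by known offsets could look like this. FixedWidthRow is a hypothetical helper, and the layout (two longs back to back, no separator) is an assumption. A variable-length encoding would need some other way to find the boundary between fields.

```java
import java.nio.ByteBuffer;

final class FixedWidthRow {
    static final int LONG_BYTES = Long.BYTES; // 8

    // Hypothetical row layout: two sortable longs back to back, no separator.
    static byte[] build(long first, long second) {
        return ByteBuffer.allocate(2 * LONG_BYTES)
                .putLong(first ^ Long.MIN_VALUE)
                .putLong(second ^ Long.MIN_VALUE)
                .array();
    }

    // Parse by known offsets (0 and 8) instead of searching for a delimiter,
    // so zero bytes inside the encoded values are harmless.
    static long[] parse(byte[] row) {
        if (row.length != 2 * LONG_BYTES) {
            throw new IllegalArgumentException("expected 16 bytes, got " + row.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(row);
        long first = buf.getLong(0) ^ Long.MIN_VALUE;
        long second = buf.getLong(LONG_BYTES) ^ Long.MIN_VALUE;
        return new long[] {first, second};
    }
}
```

Each field is read with an absolute get at a fixed offset, so parsing never scans the row for a delimiter.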

Just something to keep in mind, but I still want to include this functionality at some point.


cjnolet avatar cjnolet commented on August 22, 2024

Yeah, but at the same time, it would also be more efficient to split the strings using known indexes when possible, rather than having an algorithm (like StringUtils) do a potentially O(n) search through the string.


eawagner avatar eawagner commented on August 22, 2024

Now that these are actually included in Accumulo, does this still make sense to do?

Especially in mango?


cjnolet avatar cjnolet commented on August 22, 2024

Can we close this? I agree; I don't think we need it.
