
mango's People

Contributors

cjnolet, dependabot[bot], eawagner, kvangrae, roshanp, skahmann, tequalsme

mango's Issues

Accumulo type encoders

I'd like to pull the Accumulo Type Encoders back into mango core and maybe call them Lexicographic Type Encoders or something. There are several times I'd like to use them in projects which should not require Accumulo dependencies.
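As a rough illustration of what a dependency-free lexicographic encoder could look like, the sketch below encodes a long so that plain string ordering matches numeric ordering. The class and method names are hypothetical and are not taken from mango's existing encoders.

```java
// Hypothetical sketch, not the actual mango encoder.
final class LexicographicLongEncoder {

    private LexicographicLongEncoder() {}

    // Flipping the sign bit maps the signed range onto the unsigned range in order,
    // so a fixed-width hex string sorts the same way the numbers do.
    static String encode(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Reverses encode(); Long.parseUnsignedLong requires Java 8 or later.
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}
```

Nothing here depends on Accumulo, which is the point of moving such encoders into core.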

Use primitive types over their java.lang wrapper types

This is just for simple maintainability reasons. The two main ones are:

  • It avoids a lot of useless null checking before calling methods.
  • It allows for cleaner code, for example being able to use "==" instead of ".equals(...)".

Bugs caused by mixing primitive and boxed representations are also very hard to identify.
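A small sketch of the difference, using made-up helper names purely for illustration: the primitive version needs no guard, while the boxed version needs both a null check and equals(), and unboxing a null fails at runtime.

```java
// Hypothetical helpers that only illustrate the point above.
final class BoxedVersusPrimitive {

    private BoxedVersusPrimitive() {}

    // A primitive parameter can never be null, so "==" is safe and sufficient.
    static boolean isZero(int value) {
        return value == 0;
    }

    // A boxed parameter may be null, so a guard is needed, and the comparison
    // goes through equals() rather than "==".
    static boolean isZeroBoxed(Integer value) {
        return value != null && value.equals(0);
    }

    // Returns true if auto-unboxing the given value throws a NullPointerException.
    static boolean unboxingFails(Integer value) {
        try {
            int unboxed = value; // auto-unboxing calls value.intValue()
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```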

Entity Store for the Accumulo Recipes

If possible, I'd like to follow the table structure that we wanted RYA to have so that we can use some iterators to perform intersections on the server.

id index
R=type/id
CF=attr name
CQ=attrid\x00attr value

attribute index
R=attribute name\x00type/id\x00attrvalue

value index
R=attribute value\x00type/id\x00attr name
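To make the layout above concrete, here is a minimal sketch that builds the three row keys as plain strings. The class and method names are hypothetical, and the sketch deliberately avoids the Accumulo API; it only shows how the \x00 delimiter separates the parts.

```java
// Hypothetical key builder for the three tables sketched above.
final class EntityKeys {

    // The \x00 byte used as the delimiter in the layout above.
    static final String DELIM = "\u0000";

    private EntityKeys() {}

    // id index: R=type/id
    static String idRow(String type, String id) {
        return type + "/" + id;
    }

    // id index: CQ=attrid\x00attr value
    static String idQualifier(String attrId, String attrValue) {
        return attrId + DELIM + attrValue;
    }

    // attribute index: R=attribute name\x00type/id\x00attr value
    static String attributeRow(String attrName, String type, String id, String attrValue) {
        return attrName + DELIM + idRow(type, id) + DELIM + attrValue;
    }

    // value index: R=attribute value\x00type/id\x00attr name
    static String valueRow(String attrValue, String type, String id, String attrName) {
        return attrValue + DELIM + idRow(type, id) + DELIM + attrName;
    }
}
```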

URI Query string parser utility method

We could pull in extra dependencies to do this, but I checked and it seems Guava doesn't support this out of the box. The best solution for now is a utility class in the mango-core/uri/support package. It will be used for many things, specifically parsing the Pig URI string in the Load/StoreFuncs.
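One possible shape for such a utility, using only the JDK, is sketched below. The class name QueryStringParser and its parse method are assumptions for illustration; repeated parameter names are collected into a list and values are URL-decoded as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical utility; not an existing mango class.
final class QueryStringParser {

    private QueryStringParser() {}

    // Parses "a=1&b=2&a=3" into {a=[1, 3], b=[2]}, preserving first-seen key order.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<String, List<String>>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            List<String> values = params.get(name);
            if (values == null) {
                values = new ArrayList<String>();
                params.put(name, values);
            }
            values.add(value);
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

A caller would pass in the result of java.net.URI.getRawQuery(), so that decoding happens exactly once.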

Merkle tree's hashnode should use an iterative hash

Currently, to compute the Node's hash, a string is continually appended before a hash is generated. This wastes memory and is unnecessary.

For example

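MessageDigest digest = getMd5Digest();
for (Node node : children) {
    // Feed each child hash and a zero-byte separator straight into the digest
    // instead of building one large string first.
    digest.update(node.getHash().getBytes(StandardCharsets.UTF_8));
    digest.update((byte) 0);
}

hash = Hex.encodeHexString(digest.digest());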

or using Guava:

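Hasher hasher = Hashing.md5().newHasher();
for (Node node : children) {
    // putUnencodedChars is the Guava 15+ name; earlier versions call it putString(CharSequence).
    hasher.putUnencodedChars(node.getHash()).putChar('\u0000');
}

hash = Hex.encodeHexString(hasher.hash().asBytes());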

Set up the wiki

There should at least be a home page giving the general purpose of the project.

This should link to the existing wiki pages based on the old readme files.

Upgrade Jackson to version 2.2.2

This is the latest version of Jackson.

This is more than simply upgrading the version. Between version 1.9 and 2.0 several things changed.

The dependencies have changed and all the package names are different.

This will be helpful for several reasons. First, it gives us the latest optimizations made in Jackson. Second, the namespace change is a blessing in disguise: integration with other components won't cause a classloader conflict when working with Jackson. For example, later versions of Hadoop use a 1.0.1 dependency of Jackson, which makes using Jackson difficult.

Make Types interface more generic.

This comes from a discussion in Issue #15.

The default interface will be:
TypeEncoder<T,U>

  • Class resolves();
  • String getAlias();
  • U encode(T value);
  • T decode(U value);

T is the type of class the encoder is handling, and U is the returned encoding type.

There will be two default implementations (for now). The first is for JSON, which will provide the same functionality as the toString() and fromString() functions on the current TypeNormalizer interface. The second is for Accumulo, which will provide the same functionality as the normalize() and denormalize() functions on the current TypeNormalizer interface.
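Written out as Java, the proposed interface might look like the sketch below. Parameterizing resolves() as Class<T> is an assumption (the issue lists it as a raw Class), and BooleanStringEncoder is a made-up implementation that only shows how T and U line up.

```java
// Sketch of the interface described above.
interface TypeEncoder<T, U> {

    // The class this encoder handles (assumed here to be Class<T> rather than a raw Class).
    Class<T> resolves();

    // Short name used to identify the type in encoded form.
    String getAlias();

    U encode(T value);

    T decode(U value);
}

// Hypothetical implementation: encodes a Boolean as a String.
final class BooleanStringEncoder implements TypeEncoder<Boolean, String> {

    @Override
    public Class<Boolean> resolves() {
        return Boolean.class;
    }

    @Override
    public String getAlias() {
        return "boolean";
    }

    @Override
    public String encode(Boolean value) {
        return value.toString();
    }

    @Override
    public Boolean decode(String value) {
        return Boolean.valueOf(value);
    }
}
```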

Move readme information to wiki

There are very useful readme files for several of the current projects. I propose moving these to wiki pages to allow more flexibility in changing the project structure without losing the useful information in the readme.

Use static methods to create Ipv4 objects from various formats

Currently there are 2 recognized types.

One more that should be added is java.net.Inet4Address, and in the future possibly a way to extract an embedded IPv4 address from an IPv6 address.

Instead of adding more constructors, it would be easier to wrap all the logic for generating an IPv4 in static factory methods. For example:

public static IPv4 from(String ipStr) { ... }
public static IPv4 from(Inet4Address inetAddr) { ... }

// Only have one default constructor
public IPv4 (long value) { ... }
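A fuller sketch of that idea follows. The method bodies, the range checks, and the getValue() accessor are assumptions added for illustration; only the from(...) signatures and the single long constructor come from the proposal above.

```java
import java.net.Inet4Address;

// Sketch of the static-factory approach; the bodies are illustrative assumptions.
final class IPv4 {

    private final long value;

    // The only constructor: every factory funnels through it.
    IPv4(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("Not a 32-bit address: " + value);
        }
        this.value = value;
    }

    // Parses dotted-quad notation such as "192.168.0.1".
    static IPv4 from(String ipStr) {
        String[] parts = ipStr.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected four octets: " + ipStr);
        }
        long result = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range: " + part);
            }
            result = (result << 8) | octet;
        }
        return new IPv4(result);
    }

    // Converts the four network-order bytes of an Inet4Address.
    static IPv4 from(Inet4Address inetAddr) {
        long result = 0;
        for (byte b : inetAddr.getAddress()) {
            result = (result << 8) | (b & 0xFF);
        }
        return new IPv4(result);
    }

    long getValue() {
        return value;
    }
}
```

Adding another source format later then means adding one more from(...) overload rather than another constructor.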

BatchScannerWithScanners shouldn't throw an exception on close

If there are no inherent side effects to calling close more than once, then why would there need to be an arbitrary side effect imposed?

Throwing an exception here simply inconveniences users of the API and adds logic to the implementation with no benefit at all.
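A generic sketch of an idempotent close is shown below. It is not the BatchScannerWithScanners code; IdempotentCloseable is a hypothetical wrapper that shows how a second call can become a silent no-op.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: closes the delegate once and ignores later calls.
final class IdempotentCloseable implements Closeable {

    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    IdempotentCloseable(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void close() throws IOException {
        // Only the first caller flips the flag and releases the resource.
        if (closed.compareAndSet(false, true)) {
            delegate.close();
        }
    }
}
```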

Remove mango-jms module

As they are not being actively used within any of the calrissian code base, these classes at a minimum need dedicated documentation to explain their purpose.

I would additionally propose renaming them to be more descriptive of their purpose.
For example:

JmsConnectionFactoryTopicDecorator -> SingleTopicConnectionFactory

I know this proposed name isn't great, but it is far more descriptive of what the intended behavior is.

Add equals() method to Node class.

This can be used to apply optimizations repeatedly until the tree stops changing, like the following:

Node previous = node.clone();
while (true) {
    node.accept(optimizations);
    // Stop once a full pass leaves the tree unchanged.
    if (node.equals(previous)) break;
    previous = node.clone();
}

Don't use mutable static configurations

This is a serious concern especially if you need two configurations in the same JVM.

One culprit of this is the ObjectMapperContext, which lets callers retrieve a single shared ObjectMapper. This is fine if you only have one configuration, but if any caller configures the ObjectMapper, then all other users of it will also get that configuration.

Instead these configurations should be passed in or created locally.

Jms Decorators should be renamed and consolidated.

I would additionally propose renaming them to be more descriptive of their purpose.
For example:

JmsConnectionFactoryTopicDecorator -> SingleTopicConnectionFactory

I know this proposed name isn't great, but it is far more descriptive of what the intended behavior is.

Additionally, most of the behavior in the queue and topic decorators is redundant. We should consolidate the core decorator functions into abstract classes that provide the basic functionality. That way the specific implementations only need to override the functions they are modifying.

Move non-build-related plugins to profiles

For example the rat plugin, sortpom, and the source/javadoc plugins.

These are great, but can drastically slow down the build. Ideally, these would be run inside a CI build (see Issue #33) to catch any problems such as no license headers.

Remove Spring dependency from mango-jms

This class really doesn't need Spring. Additionally, spring-jms brings in a large number of dependencies.

Finally, most of what is being done here doesn't need anything special that Spring has to offer, other than making the code a little easier to handle than the raw JMS API.

Create an IO section in mango

To start off this should include the AbstractIO streams from the blobstore and DeletingFileInputStream from the jms stuff.

Upgrade to Java 7

I understand that there are some limitations here, but 1.6 has gone EOL and this should be a priority.

Several classes are unnecessary

For example

ObjectMapperContext.java -> does no special logic other than to create a generic ObjectMapper.

ByteArray -> does very little that can't be done with standard Java libraries or utilities in Guava.

ValueRange and CidrValueRangeIPv4 -> can both easily be replaced by Guava's range classes.

DateTimeUtils -> the only method in here is already handled by DateNormalizer.

Move jms and uri packages to their own project

I like having these, but they aren't common Java utilities so much as an answer to a specific use case.

I would propose making a project for things such as distributed utilities where these types of things can live.

Move all Mango-core serialization to a Mango-json project

This will remove a dependency on Jackson from mango core. It will also make sense as to where to find json serialization code. Currently these are scattered throughout the packages.

The term serialization is also very misleading as there are several other types of serialization formats, some of which may need to be better supported.

Collapse several of these projects into fewer projects

Most of these have the same or no dependencies and would be easier to utilize if they were in the same artifact.

For example, collect, common, hash, and criteria are all very simple and would be easier to manage and use in one project.

Consider extending java.net.Inet4Address to implement IPv4

This will simply allow for simpler integration with libraries that already provide support for InetAddress.

The reason I suggest extending this class is that the java.net.Inet4Address class is not comparable. This makes it difficult to integrate with Guava's or Mango's Range API which is useful for the current IPv4 implementation. It is also nice to have an efficient way to extract the integer value of the IPv4 which Inet4Address hides by default.

This paradigm would also make it easier to roll out IPv6, as we can simply extend the Inet6Address class.

Adding a more robust Criteria interface.

The current Criteria interface as it stands is decent for doing things such as defining queries and providing simple optimizations to the expression tree.

For use cases where an expression needs to be evaluated quickly, the visitor pattern is not well suited, for example when using criteria to analyze data in a stream.

Ideally the two types of criteria could inhabit the same interface, but that would require some redesign of the current implementation. Alternatively, we can provide a separate implementation for this use case, but this will leave redundant code.

Update Accumulo version

The latest release is version 1.5

If we want to stay on 1.4 then maybe consider going to 1.4.3.

Make type encoders and registry Serializable

This is to allow a configured TypeRegistry to be passed to a distributed system at runtime, for example to an Accumulo iterator or a Storm bolt.
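The round trip this would enable can be sketched with plain Java serialization. SerializableRegistrySketch is a hypothetical stand-in that maps aliases to class names; the real TypeRegistry holds encoders, which would each need to be Serializable as well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a registry, used only to show the round trip.
final class SerializableRegistrySketch implements Serializable {

    private static final long serialVersionUID = 1L;

    private final Map<String, String> aliasToClassName = new HashMap<String, String>();

    void register(String alias, Class<?> type) {
        aliasToClassName.put(alias, type.getName());
    }

    String classNameFor(String alias) {
        return aliasToClassName.get(alias);
    }

    // Serializes the configured registry so it can be shipped to a remote worker.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buffer);
        try {
            out.writeObject(this);
        } finally {
            out.close();
        }
        return buffer.toByteArray();
    }

    // Rebuilds the registry on the receiving side.
    static SerializableRegistrySketch fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return (SerializableRegistrySketch) in.readObject();
        } finally {
            in.close();
        }
    }
}
```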

CLI project is unnecessary

All of the capabilities in the cli project can be found and are better supported in other open source projects such as commons-cli or jcommander.
