calrissian / mango
Common utilities for rapid application development
License: Apache License 2.0
I'd like to pull the Accumulo Type Encoders back into mango core and maybe call them Lexicographic Type Encoders or something. There have been several times I've wanted to use them in projects that should not require Accumulo dependencies.
This is just for simple maintainability reasons. The two main ones being
Bugs caused by mixed-type representations are also very hard to identify.
If possible, I'd like to follow the table structure that we wanted RYA to have so that we can use some iterators to perform intersections on the server.
id index
R=type/id
CF=attr name
CQ=attr id\x00attr value
attribute index
R=attr name\x00type/id\x00attr value
value index
R=attr value\x00type/id\x00attr name
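The three row-key layouts above could be sketched roughly as follows. The class name, parameter names, and delimiter handling are hypothetical illustrations, not an existing Mango or RYA API:

```java
// Hypothetical sketch of building the three index row keys described above.
class IndexKeys {
    private static final char NULL = '\u0000';

    // id index: R=type/id
    static String idIndexRow(String type, String id) {
        return type + "/" + id;
    }

    // attribute index: R=attr name\x00type/id\x00attr value
    static String attributeIndexRow(String attrName, String type,
                                    String id, String attrValue) {
        return attrName + NULL + type + "/" + id + NULL + attrValue;
    }

    // value index: R=attr value\x00type/id\x00attr name
    static String valueIndexRow(String attrValue, String type,
                                String id, String attrName) {
        return attrValue + NULL + type + "/" + id + NULL + attrName;
    }
}
```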
We could pull in extra dependencies to do this. I checked and it seems Guava doesn't support this out of the box. The best solution for now is a utility class in the mango-core/uri/support package. It will be used for many things, specifically parsing of the Pig URI string in the Load/StoreFuncs.
While it may be a general utility for dealing with accumulo, it makes more sense there. This is especially true with the adoption of Issue #55.
Currently, to compute the Node's hash, a string is continually appended before a hash is generated. This wastes memory and is unnecessary.
For example
MessageDigest digest = getMd5Digest();
for (Node node : children) {
    // update the digest incrementally instead of building one large string
    digest.update((node.getHash() + "\u0000").getBytes(StandardCharsets.UTF_8));
}
hash = Hex.encodeHexString(digest.digest());
or using Guava:
Hasher hasher = Hashing.md5().newHasher();
for (Node node : children) {
    hasher.putString(node.getHash(), StandardCharsets.UTF_8).putChar('\u0000');
}
hash = Hex.encodeHexString(hasher.hash().asBytes());
There should at least be a home page giving the general purpose of the project.
This should link to the existing wiki pages based on the old readme files.
It doesn't make sense to have to depend on Accumulo just to have a URI-formatted type linking one entity to another. This should be moved up here, especially since the Entity is in mango-core.
This is the latest version of Jackson.
This is more than simply upgrading the version. Between version 1.9 and 2.0 several things changed.
The dependencies have changed and all the package names are different.
This will be helpful for several reasons. First, it gives us the latest optimizations made in Jackson. Second, the namespace change is a blessing in disguise: it means that integration with other components won't cause classloader conflicts when working with Jackson. For example, later versions of Hadoop use a 1.0.1 dependency of Jackson, which makes using Jackson difficult.
Specifically, the two methods are the same as DigestUtils.md5Hex().
This is from a discussion in Issue #15.
The default interface will be:
TypeEncoder<T,U>
T is the type of class the encoder is handling; U is the returned encoding type.
There will be two default implementations (for now). The first is for JSON, which will provide the same functionality as the current toString() and fromString() functions on the existing TypeNormalizer interface. The second is for Accumulo, which will provide the same functionality as the normalize() and denormalize() functions on that interface.
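A minimal sketch of what this could look like; the method names encode/decode and the example implementation are assumptions for illustration, not the final Mango API:

```java
// Proposed shape of the interface: T is the handled type, U the encoding type.
interface TypeEncoder<T, U> {
    U encode(T value);    // turn a value of type T into its encoded form U
    T decode(U encoded);  // recover the original value from its encoding
}

// Example JSON-style implementation mirroring the toString()/fromString() pair
class IntegerStringEncoder implements TypeEncoder<Integer, String> {
    public String encode(Integer value) { return Integer.toString(value); }
    public Integer decode(String encoded) { return Integer.parseInt(encoded); }
}
```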
There are very useful readme files for several of the current projects. I propose moving these to wiki pages to allow more flexibility in changing the project structure without losing the useful information in the readmes.
Most of the decorators and a lot of the utils are either unnecessary or redundant.
Ideally to 1.5.1 or the new 1.6.0 release
Currently there are 2 recognized types.
One more type that should be added is java.net.Inet4Address, and in the future possibly a way to extract an embedded IPv4 address from an IPv6 address.
Instead of adding more constructors, it would be easier to simply wrap all the logic for generating an IPv4 in various static constructor methods. For example
public static IPv4 from(String ipStr) { ... }
public static IPv4 from(Inet4Address inetAddr) { ... }
//Only have one default constructor
public IPv4(long value) { ... }
If there are no inherent side effects to calling close more than once, then why would there need to be an arbitrary side effect imposed?
Throwing an exception here simply causes an inconvenience to users of the API and adds additional logic to the implementation with no benefit at all.
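An idempotent close() could be sketched like this; the class name and the guard-flag approach are illustrative, not an existing Mango implementation:

```java
import java.io.Closeable;

// Sketch: later calls to close() are no-ops instead of throwing.
class QuietlyCloseable implements Closeable {
    private boolean closed = false;

    @Override
    public void close() {
        if (closed) return;   // no arbitrary side effect on repeat calls
        closed = true;
        // ... release the underlying resource here ...
    }

    public boolean isClosed() { return closed; }
}
```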
As they are not being actively used anywhere in the Calrissian code base, these classes at a minimum need dedicated documentation to explain their purpose.
I would additionally propose renaming them to be more descriptive of their purpose.
For example:
JmsConnectionFactoryTopicDecorator -> SingleTopicConnectionFactory
I know this proposed name isn't great, but it is far more descriptive of what the intended behavior is.
This can be used for applying optimizations like the following:
Node previous = node.clone();
while (true) {
    node.accept(optimizations);        // apply one optimization pass
    if (node.equals(previous)) break;  // no change this pass: fixed point reached
    previous = node.clone();
}
This is a serious concern especially if you need two configurations in the same JVM.
One culprit of this is the ObjectMapperContext, which lets callers retrieve a single shared ObjectMapper. This is fine if you only have one configuration, but if any caller configures the object mapper, every other user sees that configuration as well.
Instead these configurations should be passed in or created locally.
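One way to sketch the local-configuration approach: pass the configuration in so two differently-configured serializers can coexist in one JVM. JsonConfig and EntityWriter below are hypothetical stand-ins for a configured ObjectMapper and its consumer, not Mango classes:

```java
// Hypothetical stand-in for a configured ObjectMapper.
class JsonConfig {
    final boolean indentOutput;
    JsonConfig(boolean indentOutput) { this.indentOutput = indentOutput; }
}

// Each consumer receives its own configuration instead of a global context.
class EntityWriter {
    private final JsonConfig config;   // injected, never pulled from a static
    EntityWriter(JsonConfig config) { this.config = config; }
    boolean indents() { return config.indentOutput; }
}
```

With this shape, two callers with conflicting settings no longer step on each other.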
Additionally, most of the behavior in the queue and topic decorators is redundant. We should consolidate the core decorator functions into abstract classes that provide the basic functionality. That way the specific implementations only need to override the specific functions they are modifying.
The Long and Integer normalizers (and any normalizers that use the same algorithm) don't handle negative numbers correctly.
-5 -> -0000000005
-10 -> -0000000010
This treats -5 as less than -10 lexicographically.
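One common fix is to flip the sign bit of the two's-complement representation so that unsigned lexicographic byte order matches numeric order for all values. This is a sketch of the technique, not Mango's actual implementation:

```java
import java.nio.ByteBuffer;

// Encode longs so unsigned lexicographic byte order equals numeric order.
class LexicographicLong {

    static byte[] encode(long value) {
        // XOR with Long.MIN_VALUE maps MIN_VALUE..MAX_VALUE onto 0..2^64-1
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // unsigned lexicographic comparison, the way Accumulo compares keys
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }
}
```

Under this encoding -10 sorts before -5, and all negatives sort before all non-negatives, as expected.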
This is needed. It would be useful if it could return a CloseableIterable of Iterables.
For example the rat plugin, sortpom, and the source/javadoc plugins.
These are great, but can drastically slow down the build. Ideally, these would be run inside a CI build (see Issue #33) to catch any problems such as missing license headers.
These both can be dangerous if used incorrectly and are not very robust.
We've been discussing the possibility of renaming TupleCollection to something else. Possibly to TupleStore?
This class really doesn't need Spring. Additionally, spring-jms brings in a large number of dependencies.
Finally, most of what is being done here doesn't need anything special that Spring has to offer, other than making the code a little easier to handle than the raw JMS API.
To start off this should include the AbstractIO streams from the blobstore and DeletingFileInputStream from the jms stuff.
I understand that there are some limitations here, but 1.6 has gone EOL and this should be a priority.
For example
ObjectMapperContext.java -> does nothing special other than create a generic ObjectMapper.
ByteArray -> does very little that can't be done with standard Java libraries or utilities in Guava.
ValueRange and CidrValueRangeIPv4 -> can both easily be replaced by Guava's range classes.
DateTimeUtils -> the only method in here is already handled by DateNormalizer.
I like having these, but these aren't common Java utilities so much as an answer to a specific use case.
I would propose making a project for things such as distributed utilities where these types of things can live.
This will remove a dependency on Jackson from mango core. It will also make it clearer where to find JSON serialization code, which is currently scattered throughout the packages.
The term serialization is also very misleading as there are several other types of serialization formats, some of which may need to be better supported.
Most of these have the same or no dependencies and would be easier to utilize if they were in the same artifact.
For example, collect, common, hash, and criteria are all very simple and would be easier to manage and use in one project.
This will simply allow for simpler integration with libraries that already provide support for InetAddress.
The reason I suggest extending this class is that the java.net.Inet4Address class is not comparable. This makes it difficult to integrate with Guava's or Mango's Range API which is useful for the current IPv4 implementation. It is also nice to have an efficient way to extract the integer value of the IPv4 which Inet4Address hides by default.
This paradigm would also make it easier to roll out IPv6, as we can simply extend the Inet6Address class.
This interface already provides a way to evaluate things. Using it also allows seamless integration with other Guava library calls such as Iterables.filter(predicate).
The current Criteria interface as it stands is decent for doing things such as defining queries and providing simple optimizations to the expression tree.
For use cases where evaluation of an expression needs to be processed quickly, the visitor pattern is not well suited, for example when using criteria to analyze data in a stream.
Ideally the two types of criteria could inhabit the same interface, but that would require some redesign of the current implementation. Alternatively, we can provide a separate implementation for this use case, but that would leave redundant code.
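A criteria type that doubles as a predicate could be sketched like this. For a self-contained example, java.util.function.Predicate stands in for Guava's Predicate (which uses apply() instead of test() but is otherwise analogous), and EqualsCriteria is a hypothetical criteria, not a Mango class:

```java
import java.util.function.Predicate;

// A leaf criteria that can be handed directly to filtering APIs for fast,
// streaming evaluation, without walking a visitor over an expression tree.
class EqualsCriteria implements Predicate<String> {
    private final String expected;
    EqualsCriteria(String expected) { this.expected = expected; }

    @Override
    public boolean test(String value) { return expected.equals(value); }
}
```

With Guava the same object could be passed straight into Iterables.filter(iterable, criteria).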
The latest release is version 1.5
If we want to stay on 1.4 then maybe consider going to 1.4.3.
Since the tuples are immutable, they can be reused.
This is to allow the configured TypeRegistry to be passed at run time to a distributed system, for example into an Accumulo iterator or a Storm bolt.
All of the capabilities in the cli project can be found and are better supported in other open source projects such as commons-cli or jcommander.
Even though the value of the map containing the tuples is a Set, the remove(Tuple t) method is acting as though the value is still a tuple. This is an artifact left over from when the tuples were a collection instead of a map.
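A remove that respects the Map-of-Sets layout could look roughly like this. TupleStore and its String-based keys/values are simplified stand-ins for Mango's actual classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: remove from the Set under the key, not the whole map entry.
class TupleStore {
    private final Map<String, Set<String>> tuples = new HashMap<>();

    void put(String key, String value) {
        tuples.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    boolean remove(String key, String value) {
        Set<String> values = tuples.get(key);
        if (values == null) return false;
        boolean removed = values.remove(value);    // remove one tuple from the Set
        if (values.isEmpty()) tuples.remove(key);  // clean up empty entries
        return removed;
    }

    int size(String key) {
        Set<String> values = tuples.get(key);
        return values == null ? 0 : values.size();
    }
}
```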
These may not be the most efficient, but they work and they have been used on several different sub-projects, including ones at my current day job.