calrissian / mango
Common utilities for rapid application development
License: Apache License 2.0
I'd like to pull the Accumulo Type Encoders back into mango core and maybe call them Lexicographic Type Encoders or something. There have been several times I've wanted to use them in projects that should not require Accumulo dependencies.
This is just for simple maintainability reasons. The two main ones being
Bugs caused by mixed-type representations are also very hard to identify.
If possible, I'd like to follow the table structure that we wanted RYA to have so that we can use some iterators to perform intersections on the server.
id index
R=type/id
CF=attr name
CQ=attr id\x00attr value
attribute index
R=attr name\x00type/id\x00attr value
value index
R=attr value\x00type/id\x00attr name
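The three row-key layouts above could be sketched roughly as follows. The class name, parameter names, and delimiter handling are hypothetical illustrations, not an existing Mango or RYA API:

```java
// Hypothetical sketch of building the three index row keys described above.
class IndexKeys {
    private static final char NULL = '\u0000';

    // id index: R=type/id
    static String idIndexRow(String type, String id) {
        return type + "/" + id;
    }

    // attribute index: R=attr name\x00type/id\x00attr value
    static String attributeIndexRow(String attrName, String type,
                                    String id, String attrValue) {
        return attrName + NULL + type + "/" + id + NULL + attrValue;
    }

    // value index: R=attr value\x00type/id\x00attr name
    static String valueIndexRow(String attrValue, String type,
                                String id, String attrName) {
        return attrValue + NULL + type + "/" + id + NULL + attrName;
    }
}
```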
We could pull in extra dependencies to do this. I checked and it seems Guava doesn't support this out of the box. The best solution for now is a utility class in the mango-core/uri/support package. It will be used for many things, specifically parsing of the Pig URI string in the Load/StoreFuncs.
While it may be a general utility for dealing with accumulo, it makes more sense there. This is especially true with the adoption of Issue #55.
Currently, to compute the Node's hash, a string is continually appended before a hash is generated. This wastes memory and is unnecessary.
For example
MessageDigest digest = getMd5Digest();
for (Node node : children) {
    // update the digest incrementally instead of building one large string
    digest.update((node.getHash() + "\u0000").getBytes(StandardCharsets.UTF_8));
}
hash = Hex.encodeHexString(digest.digest());
or using Guava:
Hasher hasher = Hashing.md5().newHasher();
for (Node node : children) {
    hasher.putString(node.getHash(), StandardCharsets.UTF_8).putChar('\u0000');
}
hash = Hex.encodeHexString(hasher.hash().asBytes());
There should at least be a home page giving the general purpose of the project.
This should link to the existing wiki pages based on the old readme files.
It doesn't make sense to have to depend on Accumulo just to have a URI-formatted type linking one entity to another. This should be moved up here, especially since the Entity is in mango-core.
This is the latest version of Jackson.
This is more than simply upgrading the version. Between version 1.9 and 2.0 several things changed.
The dependencies have changed and all the package names are different.
This will be helpful for several reasons. First, it gives us the latest optimizations made in Jackson. Second, the namespace change is a blessing in disguise: it means that integration with other components won't cause classloader conflicts when working with Jackson. For example, later versions of Hadoop use a 1.0.1 dependency of Jackson, which makes using Jackson difficult.
Specifically, the two methods are the same as DigestUtils.md5Hex().
This is from a discussion in Issue #15.
The default interface will be:
TypeEncoder<T,U>
T is the type of class the encoder is handling; U is the returned encoding type.
There will be two default implementations (for now). The first is for JSON, which will provide the same functionality as the current toString() and fromString() functions on the existing TypeNormalizer interface. The second is for Accumulo, which will provide the same functionality as the normalize() and denormalize() functions on that interface.
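A minimal sketch of what this could look like; the method names encode/decode and the example implementation are assumptions for illustration, not the final Mango API:

```java
// Proposed shape of the interface: T is the handled type, U the encoding type.
interface TypeEncoder<T, U> {
    U encode(T value);    // turn a value of type T into its encoded form U
    T decode(U encoded);  // recover the original value from its encoding
}

// Example JSON-style implementation mirroring the toString()/fromString() pair
class IntegerStringEncoder implements TypeEncoder<Integer, String> {
    public String encode(Integer value) { return Integer.toString(value); }
    public Integer decode(String encoded) { return Integer.parseInt(encoded); }
}
```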
There are very useful readme files for several of the current projects. I propose moving these to wiki pages to allow more flexibility in changing the project structure without losing the useful information in the readmes.
Most of the decorators and a lot of the utils are either unnecessary or redundant.
Ideally to 1.5.1 or the new 1.6.0 release
Currently there are 2 recognized types.
One more type that should be added is java.net.Inet4Address, and in the future possibly a way to extract an embedded IPv4 address from an IPv6 address.
Instead of adding more constructors, it would be easier to simply wrap all the logic for generating an IPv4 in various static constructor methods. For example
public static IPv4 from(String ipStr) { ... }
public static IPv4 from(Inet4Address inetAddr) { ... }
//Only have one default constructor
public IPv4(long value) { ... }
If there are no inherent side effects to calling close more than once, then why would there need to be an arbitrary side effect imposed?
Throwing an exception here simply causes an inconvenience to users of the API and adds additional logic to the implementation with no benefit at all.
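An idempotent close() could be sketched like this; the class name and the guard-flag approach are illustrative, not an existing Mango implementation:

```java
import java.io.Closeable;

// Sketch: later calls to close() are no-ops instead of throwing.
class QuietlyCloseable implements Closeable {
    private boolean closed = false;

    @Override
    public void close() {
        if (closed) return;   // no arbitrary side effect on repeat calls
        closed = true;
        // ... release the underlying resource here ...
    }

    public boolean isClosed() { return closed; }
}
```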
As they are not being actively used anywhere in the Calrissian code base, these classes at a minimum need dedicated documentation to explain their purpose.
I would additionally propose renaming them to be more descriptive of their purpose.
For example:
JmsConnectionFactoryTopicDecorator -> SingleTopicConnectionFactory
I know this proposed name isn't great, but it is far more descriptive of what the intended behavior is.
This can be used for applying optimizations like the following:
Node previous = node.clone();
while (true) {
    node.accept(optimizations);        // apply one optimization pass
    if (node.equals(previous)) break;  // no change this pass: fixed point reached
    previous = node.clone();
}
This is a serious concern especially if you need two configurations in the same JVM.
One culprit of this is the ObjectMapperContext, which lets callers retrieve a single shared ObjectMapper. This is fine if you only have one configuration, but if any caller configures the object mapper, every other user sees that configuration as well.
Instead these configurations should be passed in or created locally.
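One way to sketch the local-configuration approach: pass the configuration in so two differently-configured serializers can coexist in one JVM. JsonConfig and EntityWriter below are hypothetical stand-ins for a configured ObjectMapper and its consumer, not Mango classes:

```java
// Hypothetical stand-in for a configured ObjectMapper.
class JsonConfig {
    final boolean indentOutput;
    JsonConfig(boolean indentOutput) { this.indentOutput = indentOutput; }
}

// Each consumer receives its own configuration instead of a global context.
class EntityWriter {
    private final JsonConfig config;   // injected, never pulled from a static
    EntityWriter(JsonConfig config) { this.config = config; }
    boolean indents() { return config.indentOutput; }
}
```

With this shape, two callers with conflicting settings no longer step on each other.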
Additionally, most of the behavior in the queue and topic decorators is redundant. We should consolidate the core decorator functions into abstract classes that provide the basic functionality. That way the specific implementations only need to override the specific functions they are modifying.
The Long and Integer normalizers (and any normalizers that use the same algorithm) don't handle negative numbers correctly.
-5 -> -0000000005
-10 -> -0000000010
This treats -5 as less than -10 lexicographically.
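One common fix is to flip the sign bit of the two's-complement representation so that unsigned lexicographic byte order matches numeric order for all values. This is a sketch of the technique, not Mango's actual implementation:

```java
import java.nio.ByteBuffer;

// Encode longs so unsigned lexicographic byte order equals numeric order.
class LexicographicLong {

    static byte[] encode(long value) {
        // XOR with Long.MIN_VALUE maps MIN_VALUE..MAX_VALUE onto 0..2^64-1
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    // unsigned lexicographic comparison, the way Accumulo compares keys
    static int compare(byte[] a, byte[] b) {
        for (int i = 0; i < a.length && i < b.length; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }
}
```

Under this encoding -10 sorts before -5, and all negatives sort before all non-negatives, as expected.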
This is needed. It would be useful if it could return a CloseableIterable of Iterables.
For example the rat plugin, sortpom, and the source/javadoc plugins.
These are great, but can drastically slow down the build. Ideally, these would be run inside a CI build (see Issue #33) to catch any problems such as missing license headers.
These both can be dangerous if used incorrectly and are not very robust.
We've been discussing the possibility of renaming TupleCollection to something else. Possibly to TupleStore?
This class really doesn't need Spring. Additionally, spring-jms brings in a large number of dependencies.
Finally, most of what is being done here doesn't need anything special that Spring has to offer, other than making the code a little easier to handle than the raw JMS API.
To start off this should include the AbstractIO streams from the blobstore and DeletingFileInputStream from the jms stuff.
I understand that there are some limitations here, but 1.6 has gone EOL and this should be a priority.
For example
ObjectMapperContext.java -> does nothing special other than create a generic ObjectMapper.
ByteArray -> does very little that can't be done with standard Java libraries or utilities in Guava.
ValueRange and CidrValueRangeIPv4 -> can both easily be replaced by Guava's range classes.
DateTimeUtils -> the only method in here is already handled by DateNormalizer.
I like having these, but these aren't common Java utilities so much as an answer to a specific use case.
I would propose making a project for things such as distributed utilities where these types of things can live.
This will remove a dependency on Jackson from mango core. It will also make it clearer where to find JSON serialization code, which is currently scattered throughout the packages.
The term serialization is also very misleading as there are several other types of serialization formats, some of which may need to be better supported.
Most of these have the same or no dependencies and would be easier to utilize if they were in the same artifact.
For example, collect, common, hash, and criteria are all very simple and would be easier to manage and use in one project.
This will simply allow for simpler integration with libraries that already provide support for InetAddress.
The reason I suggest extending this class is that the java.net.Inet4Address class is not comparable. This makes it difficult to integrate with Guava's or Mango's Range API which is useful for the current IPv4 implementation. It is also nice to have an efficient way to extract the integer value of the IPv4 which Inet4Address hides by default.
This paradigm would also make it easier to roll out IPv6, as we can simply extend the Inet6Address class.
This interface already provides a way to evaluate things. Using it also allows seamless integration with other Guava library calls such as Iterables.filter(predicate).
The current Criteria interface as it stands is decent for doing things such as defining queries and providing simple optimizations to the expression tree.
For use cases where evaluation of an expression needs to be processed quickly, the visitor pattern is not well suited, for example when using criteria to analyze data in a stream.
Ideally the two types of criteria could inhabit the same interface, but that would require some redesign of the current implementation. Alternatively, we can provide a separate implementation for this use case, but that would leave redundant code.
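A criteria type that doubles as a predicate could be sketched like this. For a self-contained example, java.util.function.Predicate stands in for Guava's Predicate (which uses apply() instead of test() but is otherwise analogous), and EqualsCriteria is a hypothetical criteria, not a Mango class:

```java
import java.util.function.Predicate;

// A leaf criteria that can be handed directly to filtering APIs for fast,
// streaming evaluation, without walking a visitor over an expression tree.
class EqualsCriteria implements Predicate<String> {
    private final String expected;
    EqualsCriteria(String expected) { this.expected = expected; }

    @Override
    public boolean test(String value) { return expected.equals(value); }
}
```

With Guava the same object could be passed straight into Iterables.filter(iterable, criteria).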
The latest release is version 1.5
If we want to stay on 1.4 then maybe consider going to 1.4.3.
Since the tuples are immutable, they can be reused.
This is to allow the configured TypeRegistry to be passed at run time to a distributed system, for example into an Accumulo iterator or a Storm bolt.
All of the capabilities in the cli project can be found and are better supported in other open source projects such as commons-cli or jcommander.
Even though the value of the map containing the tuples is a Set, the remove(Tuple t) method is acting as though the value is still a tuple. This is an artifact left over from when the tuples were a collection instead of a map.
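A remove that respects the Map-of-Sets layout could look roughly like this. TupleStore and its String-based keys/values are simplified stand-ins for Mango's actual classes:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: remove from the Set under the key, not the whole map entry.
class TupleStore {
    private final Map<String, Set<String>> tuples = new HashMap<>();

    void put(String key, String value) {
        tuples.computeIfAbsent(key, k -> new HashSet<>()).add(value);
    }

    boolean remove(String key, String value) {
        Set<String> values = tuples.get(key);
        if (values == null) return false;
        boolean removed = values.remove(value);    // remove one tuple from the Set
        if (values.isEmpty()) tuples.remove(key);  // clean up empty entries
        return removed;
    }

    int size(String key) {
        Set<String> values = tuples.get(key);
        return values == null ? 0 : values.size();
    }
}
```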
These may not be the most efficient, but they work and they have been used on several different sub-projects, including ones at my current day job.