
liveramp / jack


A set of scripts for generating fully functional Java database models from Ruby's ActiveRecord models and migrations.

Home Page: https://liveramp.github.io/jack/

License: Other

Ruby 6.78% Java 85.74% HTML 6.83% Shell 0.33% Lex 0.32%
orm activerecord-model

jack's Introduction

Jack


Do you use Ruby/Rails and Java in your company? We do. And we're sick and tired of maintaining two different sets of schemas, models, and whatnot!

To that end, we've created Jack (Java ACtive record (+K)). The project consists of:

  • a scheme for defining and managing multiple Rails projects that contribute models
  • a Ruby parser for schema.rb and ActiveRecord models that generates Java code
  • a Java library that provides model-oriented database access and database-wide query on top of standard database connections

Project Organization

A Jack project consists of two things:

  1. The project definition file
  2. One or more Rails projects

Project Definition File

The project definition file is a YAML file that tells Jack where to find your Rails projects and how to generate code. Here's an annotated example:

# This is the class path where you want the top-level generated code to go.
databases_namespace: com.rapleaf.jack.test_project
# Here, we define each of the separate databases for which Jack should
# generate code. Each database is roughly equivalent to a Rails project.
databases:
  -
    # The root namespace for this database. It's nice, but not required, to
    # have this be a subpackage of the 'databases_namespace'.
    root_namespace: com.rapleaf.jack.test_project.database_1
    # What do you want to call this database? Leave out the "_production".
    db_name: Database1
    # The path to the schema.rb in your Rails project.
    schema_rb: database_1/db/schema.rb
    # The path to the app/models dir in your Rails project.
    models: database_1/app/models
    # Tables to be ignored. No code will be generated for them.
    ignored_tables:
      - table_1
      - table_2
  -
    root_namespace: com.rapleaf.jack.test_project.database_2
    db_name: Database2
    schema_rb: database_2/db/schema.rb
    models: database_2/app/models

Rails Projects

Jack supports generating code for an arbitrary number of inter-related Rails 3 projects. If you only have one Rails project, things are easy: just configure your project.yml appropriately.

If you have more than one project, here's the setup we suggest. (We use this ourselves.)

/all_my_databases
  /project.yml
  /rails_project_1
  /rails_project_2
/ruby_project_that_uses_rails_project_2
  /include/rails_project_2              # <= svn external to /all_my_databases/rails_project_2

Running the Generator

Running the Jack generator is easy. From a fresh clone, do the following:

export PATH=$PATH:`pwd`/src/rb

Then, change directories to wherever your project.yml lives and run:

ruby -S jack.rb project.yml /path/for/generated/code

Assuming everything is configured correctly, that's it.

Note: We know that the path thing stinks. We're going to improve this in a future version.

Layout of the Generated Code

The Java code that Jack produces is designed around interfaces so that it is very modular and mockable.

Models

The generated models contain getters and setters for all the fields, as well as getters for detected associations. In contrast to ActiveRecord models, there are no CRUD methods on the Java models.

Model Persistences

This is where the CRUD methods live. The generic base class supports:

  • find
  • find all (with and without conditions)
  • finding by foreign key
  • delete by id or instance
  • delete all
  • save (update)
  • cache manipulators

while there is a unique, per-model interface and implementation that additionally provides:

  • create
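Because the generated layer is built on interfaces, a test can swap in an in-memory persistence instead of a database-backed one. The following is a minimal sketch under assumptions: `User`, `IUserPersistence`, and `InMemoryUserPersistence` are hypothetical names, and the method signatures only approximate the operations listed above. They are not Jack's generated API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: getters and setters only, no CRUD methods.
final class User {
  private final long id;
  private String name;

  User(long id, String name) {
    this.id = id;
    this.name = name;
  }

  long getId() { return id; }
  String getName() { return name; }
  void setName(String name) { this.name = name; }
}

// Hypothetical per-model persistence interface that holds the CRUD operations.
interface IUserPersistence {
  User create(String name);
  User find(long id);
  List<User> findAll();
  boolean save(User user);
  boolean delete(long id);
  void deleteAll();
}

// In-memory implementation, e.g. for unit tests that must not touch a database.
final class InMemoryUserPersistence implements IUserPersistence {
  private final Map<Long, User> rows = new LinkedHashMap<>();
  private long nextId = 1;

  public User create(String name) {
    User user = new User(nextId++, name);
    rows.put(user.getId(), user);
    return user;
  }

  public User find(long id) { return rows.get(id); }

  public List<User> findAll() { return new ArrayList<>(rows.values()); }

  // Only updates rows that already exist, mirroring "save (update)".
  public boolean save(User user) {
    if (!rows.containsKey(user.getId())) {
      return false;
    }
    rows.put(user.getId(), user);
    return true;
  }

  public boolean delete(long id) { return rows.remove(id) != null; }

  public void deleteAll() { rows.clear(); }
}
```

Code that depends only on `IUserPersistence` can then be tested against the in-memory version without a database.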

Databases

You get one Database per database entry in your project.yml. Their main purpose is to provide a collection of all the individual model persistences. You can also execute queries across all models within each database.

All Databases

Finally, there is one overarching Databases class that serves as a collection for all of the databases configured in your project.yml. Generally, this is what you will instantiate, though you can subsequently get the Database or Persistence instance you actually care about and use that.

Download

You can find releases on The Central Repository and snapshots on Sonatype OSSRH (OSS Repository Hosting).

To get snapshots, add the OSSRH snapshot repository. See the guide to using multiple repositories.

<repository>
  <id>ossrh-snapshots</id>
  <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>

License

Copyright 2014 LiveRamp

Licensed under the Apache License, Version 2.0

http://www.apache.org/licenses/LICENSE-2.0

jack's People

Contributors

amaurysabran, atlantis-github-bakohsh5, bpodgursky, briancecker, bryanduxbury, databanana, dwarri, eddiesiegel, henryzhao81, jacobcrofts, johnjelinek, joshk0, kurt-von-laven, lancehc, npoulallion, ongman, pkozikow, pwestling, rfaugeroux, roshan, rupertchen, seancarr, sherifnada, sidoh, takashiyonebayashi, tenzing-shaw, thearchduke, tuliren, tv4fun, twshaw3


jack's Issues

Aggregated model query incompatible with MySQL 5.7.5

ModelQuery#getSelectClause adds the id column to its select clause:

sqlClause.append("id, ");

However, when a query has a GROUP BY clause, this is incompatible with the ONLY_FULL_GROUP_BY SQL mode, which MySQL enables by default as of 5.7.5:

Reject queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on (uniquely determined by) GROUP BY columns.

As of MySQL 5.7.5, the default SQL mode includes ONLY_FULL_GROUP_BY. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default.)
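One way to avoid the error, sketched here under assumptions, is to prepend `id` only when the query has no GROUP BY clause. `SelectClauseBuilder` and its parameters are hypothetical and do not reflect ModelQuery's actual code. MySQL's `ANY_VALUE(id)` would be an alternative when an id is still wanted in grouped results.

```java
import java.util.List;

// Hypothetical sketch, not Jack's ModelQuery implementation.
final class SelectClauseBuilder {
  private SelectClauseBuilder() {}

  static String build(List<String> columns, boolean hasGroupBy) {
    StringBuilder sqlClause = new StringBuilder("SELECT ");
    // Under ONLY_FULL_GROUP_BY, a bare "id" is a nonaggregated column that is
    // usually not in the GROUP BY list, so add it only for non-grouped queries.
    if (!hasGroupBy) {
      sqlClause.append("id, ");
    }
    sqlClause.append(String.join(", ", columns));
    return sqlClause.toString();
  }
}
```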

DB connection hangs when the network changes

A DB connection will hang when the network changes, because we don't enforce a timeout at any level. For a detailed explanation, please refer to this article.

There are several ways to fix it:

  1. Specify a socket timeout. This is easy to implement (modify the connection string), but it is hard to find the optimal value or to use a special setting for legitimate long-running queries.
  2. Enforce a timeout on the transactor. This requires refactoring the transactor implementation (relevant PR: #210).
  3. Call isValid before checking out a connection from the pool (modify this method). This only partially solves the problem: queries that are already running when the network changes will still hang. In addition, IDb methods are currently implemented separately in each implementation, even though many of them are identical and could be extracted to a base class. If we choose this approach, we should do that refactoring as well.
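Option 1 can be sketched as a small URL helper. It assumes the MySQL Connector/J driver, whose `connectTimeout` and `socketTimeout` connection properties are given in milliseconds; `JdbcUrls.withTimeouts` is a hypothetical name, and any values passed to it are placeholders rather than recommendations.

```java
// Hypothetical helper; not part of Jack.
final class JdbcUrls {
  private JdbcUrls() {}

  // Appends Connector/J's connectTimeout and socketTimeout properties
  // (both in milliseconds) to an existing JDBC URL.
  static String withTimeouts(String jdbcUrl, int connectTimeoutMs, int socketTimeoutMs) {
    String separator = jdbcUrl.contains("?") ? "&" : "?";
    return jdbcUrl + separator
        + "connectTimeout=" + connectTimeoutMs
        + "&socketTimeout=" + socketTimeoutMs;
  }
}
```

As option 1 notes, the socket timeout applies to every statement on the connection, so it has to be longer than the longest legitimate query.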

specs/tests for ruby code

It would be good to start unit testing/spec'ing the actual ruby generator classes. We could use much more precise examples to exercise weird edge cases.

add support for transactions

startTransaction() and commitTransaction() would be helpful functions at the database level.

startTransaction could look something like:
Connection conn = dbConn.getConnection();
conn.setAutoCommit(false);
Savepoint savepoint = conn.setSavepoint();

and commitTransaction could look like:
Connection conn = dbConn.getConnection();
conn.commit();
conn.setAutoCommit(true);

rollbackTransaction would be a little more involved.
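A minimal sketch of all three operations on a plain `java.sql.Connection`, assuming the transaction owns that connection for its whole duration. `SimpleTransaction` and its method names are hypothetical. Auto-commit is restored in a `finally` block so that a failed commit or rollback does not leave the connection in manual-commit mode.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Savepoint;

// Hypothetical helper wrapping a plain JDBC connection; not part of Jack's API.
final class SimpleTransaction {
  private final Connection conn;
  private Savepoint savepoint;

  SimpleTransaction(Connection conn) {
    this.conn = conn;
  }

  // Same steps as the startTransaction sketch: disable auto-commit, mark a savepoint.
  void start() throws SQLException {
    conn.setAutoCommit(false);
    savepoint = conn.setSavepoint();
  }

  // Commit, then restore auto-commit even if the commit throws.
  void commit() throws SQLException {
    try {
      conn.commit();
    } finally {
      savepoint = null;
      conn.setAutoCommit(true);
    }
  }

  // Roll back to the savepoint from start() if there is one, otherwise roll back
  // the whole transaction; then restore auto-commit even if the rollback throws.
  void rollback() throws SQLException {
    try {
      if (savepoint != null) {
        conn.rollback(savepoint);
      } else {
        conn.rollback();
      }
    } finally {
      savepoint = null;
      conn.setAutoCommit(true);
    }
  }
}
```

Since the savepoint is taken immediately after auto-commit is disabled, rolling back to it undoes the same work as a full rollback.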

:through associations are created but do not work

an association such as
class Student < ActiveRecord::Base
  has_many :student_classes
  has_many :classes, :through => :student_classes
end

creates the association to Class in the Student Java class; however, it is treated like a regular has_many (i.e., Class is expected to have Student's id as a foreign key).
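For reference, a working `:through` getter has to go through the join table (roughly `SELECT class_id FROM student_classes WHERE student_id = ?`) instead of looking for a `student_id` column on the classes table. The in-memory sketch below shows that lookup; `StudentClassRow` and `ThroughAssociation` are hypothetical names, not generated Jack classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical join-table row, standing in for a generated StudentClass model.
final class StudentClassRow {
  final long studentId;
  final long classId;

  StudentClassRow(long studentId, long classId) {
    this.studentId = studentId;
    this.classId = classId;
  }
}

final class ThroughAssociation {
  private ThroughAssociation() {}

  // Resolves a student's classes via the join rows, rather than expecting
  // a student_id foreign key on the classes table.
  static List<Long> classIdsFor(long studentId, List<StudentClassRow> joinRows) {
    List<Long> classIds = new ArrayList<>();
    for (StudentClassRow row : joinRows) {
      if (row.studentId == studentId) {
        classIds.add(row.classId);
      }
    }
    return classIds;
  }
}
```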

Upgrade commons-pool2 to 2.4.3 to prevent error when creating multiple transactor instances

The EvictionTimer in commons-pool2 version 2.4.2 has the following cancel method:

static synchronized void cancel(TimerTask task) {
  task.cancel();
  _usageCount--;
  if (_usageCount == 0) {
    _timer.cancel();
    _timer = null;
  }
}

It does not wait for the Timer#cancel call to finish. This leads to a race condition when users create multiple transactors in a very short period of time. What happens is:

  1. When object eviction is needed, the GenericObjectPool constructor calls the startEvictor method twice. The second time it is called, it first cancels the initialized evictor and then creates a new one.
final void startEvictor(long delay) {
  synchronized (evictionLock) {
    if (null != evictor) {
      EvictionTimer.cancel(evictor);
      evictor = null;
      evictionIterator = null;
    }
    if (delay > 0) {
      evictor = new Evictor();
      EvictionTimer.schedule(evictor, delay, delay);
    }
  }
}
  2. When multiple transactors are created, the first transactor's construction creates the evictor, cancels it, and creates it again.
  3. The second transactor's construction does the same. However, the cancellation of the timer started during the first transactor's construction may not have completed yet. When that happens and the EvictionTimer#schedule method is called, it throws the following exception:
java.lang.IllegalStateException: Timer already cancelled.
        at java.util.Timer.sched(Timer.java:397)
        at java.util.Timer.schedule(Timer.java:248)
        at org.apache.commons.pool2.impl.EvictionTimer.schedule(EvictionTimer.java:76)
        at org.apache.commons.pool2.impl.BaseGenericObjectPool.startEvictor(BaseGenericObjectPool.java:695)
        at org.apache.commons.pool2.impl.BaseGenericObjectPool.setTimeBetweenEvictionRunsMillis(BaseGenericObjectPool.java:458)
        at org.apache.commons.pool2.impl.GenericObjectPool.setConfig(GenericObjectPool.java:317)
        at org.apache.commons.pool2.impl.GenericObjectPool.<init>(GenericObjectPool.java:117)
        at com.rapleaf.jack.transaction.DbPoolManager.<init>(DbPoolManager.java:73)

A relevant issue is reported in POOL-315 (this issue is based on updated EvictionTimer so the code is not exactly the same).

This is fixed in version 2.4.3: EvictionTimer#cancel runs everything through an executor service and lets users specify a timeout for that executor, so the method returns only after the cancellation has completed. However, 2.4.3 has not been released yet.

For now, just don't create multiple transactors in very short period of time.
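One way to follow this advice is to build a transactor once and share it. The sketch below is a generic, thread-safe lazy holder using double-checked locking on a volatile field. `SharedInstance` is a hypothetical name, and the factory would be whatever code constructs your transactor.

```java
import java.util.function.Supplier;

// Hypothetical helper; wraps whatever code builds the shared instance.
final class SharedInstance<T> implements Supplier<T> {
  private final Supplier<T> factory;
  private volatile T instance;

  SharedInstance(Supplier<T> factory) {
    this.factory = factory;
  }

  // Creates the instance on first use; later calls return the same object.
  @Override
  public T get() {
    T result = instance;
    if (result == null) {
      synchronized (this) {
        result = instance;
        if (result == null) {
          result = factory.get();
          instance = result;
        }
      }
    }
    return result;
  }
}
```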

"updated_at" is not set if the current value in the db is null

Expected

When a model defines an updated_at field of type Long, the value of updated_at should always be set automatically to the current time when the model is modified.

Observed

When the model is loaded, if the updated_at column is null in the database, it will be null in the model's field and it will not be set to the current time when the model is modified.

I think the issue is in AbstractDatabaseModel#updatedAtCanBeHandled. When the field's value is null, the method cannot determine the field's type. A possible solution is to add a getFieldType method that returns the field type's Class.

Possible Workaround

A workaround for this is to explicitly set the value of the updated_at field to 0L. This allows Jack to confirm updated_at is of type Long and it will overwrite the 0 with the current time.
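A minimal sketch of the proposed fix, assuming the model knows each field's declared type from the schema: `updatedAtCanBeHandled` then checks `getFieldType("updated_at")` instead of the runtime class of a possibly null value. `ModelSketch`, `getField`, and `touch` are hypothetical; only the `getFieldType` idea comes from the issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model sketch; Jack's AbstractDatabaseModel is structured differently.
final class ModelSketch {
  private final Map<String, Class<?>> fieldTypes = new HashMap<>();
  private final Map<String, Object> values = new HashMap<>();

  ModelSketch() {
    // The declared type is known from the schema, independent of the current value.
    fieldTypes.put("updated_at", Long.class);
    values.put("updated_at", null); // the column is NULL in the database
  }

  Class<?> getFieldType(String field) {
    return fieldTypes.get(field);
  }

  Object getField(String field) {
    return values.get(field);
  }

  // Checks the declared type rather than inspecting the (possibly null) value.
  boolean updatedAtCanBeHandled() {
    return getFieldType("updated_at") == Long.class;
  }

  // Called on modification: sets updated_at even when it was previously null.
  void touch(long nowMillis) {
    if (updatedAtCanBeHandled()) {
      values.put("updated_at", nowMillis);
    }
  }
}
```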

`GenericConstraint<T> or()` might have unexpected behavior for chained constraints

When writing something like:

constraint1.or(constraint2, constraint3)

The generated SQL statement is the following:

WHERE (constraint1 OR (constraint2) AND (constraint3))

I think we should expect this to be the following:

WHERE (constraint1 OR ((constraint2) AND (constraint3)))

Let me know if I'm doing something wrong. I'll submit a PR for this in the next few minutes.
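The expected grouping can be sketched at the string level: the extra constraints are AND-ed together and wrapped in one pair of parentheses before being OR-ed with the receiver. `OrClause.or` is a hypothetical helper operating on SQL fragments, not Jack's `GenericConstraint`.

```java
import java.util.StringJoiner;

// Hypothetical string-level sketch; Jack's GenericConstraint builds SQL differently.
final class OrClause {
  private OrClause() {}

  // Builds "(first OR ((c2) AND (c3) ...))": the extra constraints are AND-ed
  // and grouped, so the OR applies to the whole group.
  static String or(String first, String... rest) {
    if (rest.length == 0) {
      throw new IllegalArgumentException("or() needs at least one other constraint");
    }
    StringJoiner conjunction = new StringJoiner(" AND ", "(", ")");
    for (String constraint : rest) {
      conjunction.add("(" + constraint + ")");
    }
    return "(" + first + " OR " + conjunction + ")";
  }
}
```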

Support bulk deletion

Would like IModelPersistence to implement

deleteAll(Set ids)

and

deleteAll(Set models)
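One detail to keep in mind: Java cannot declare both `deleteAll(Set<Long> ids)` and `deleteAll(Set<Model> models)` as overloads, because both erase to `deleteAll(Set)` and the pair fails to compile, so one of them needs a different name. The sketch below uses hypothetical names (`deleteAllByIdsSql`, `deleteAllModelsSql`, `HasId`) and builds a single parameterized `DELETE ... WHERE id IN (?, ...)` statement instead of one statement per row. The table name is assumed to come from generated code, not user input.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical interface for any model with a numeric primary key.
interface HasId {
  long getId();
}

// Hypothetical sketch; not IModelPersistence's actual API.
final class BulkDelete {
  private BulkDelete() {}

  // One placeholder per id; the caller binds the ids in the set's iteration order.
  static String deleteAllByIdsSql(String table, Set<Long> ids) {
    if (ids.isEmpty()) {
      throw new IllegalArgumentException("nothing to delete");
    }
    return "DELETE FROM " + table + " WHERE id IN ("
        + String.join(", ", Collections.nCopies(ids.size(), "?")) + ")";
  }

  // Distinct name: an overload taking Set<M> would clash with Set<Long> after erasure.
  static <M extends HasId> String deleteAllModelsSql(String table, Set<M> models) {
    Set<Long> ids = new LinkedHashSet<>();
    for (M model : models) {
      ids.add(model.getId());
    }
    return deleteAllByIdsSql(table, ids);
  }
}
```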

Connection reset is not handled internally, and the ability to reset the connection isn't exposed

When we switch which DB is running as our master, requests to the old one receive an exception telling them to reset their connection (so that they reach the correct master). Currently, Jack doesn't handle this internally (which is fine), but it also exposes no method for resetting the DB connection if the application wants to handle this case itself. DatabaseConnection has a resetConnection() method, which is all we need; it could be exposed via IDb.
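If resetConnection() were exposed through IDb, an application could handle a master switch with a reset-and-retry-once wrapper. In this sketch `ResettableDb` is a hypothetical stand-in for that exposed method, and the `shouldReset` predicate is where you would match the specific error code or SQLState your database returns after a switch (not specified here).

```java
import java.sql.SQLException;
import java.util.function.Predicate;

// Hypothetical stand-in for IDb exposing DatabaseConnection#resetConnection().
interface ResettableDb {
  void resetConnection() throws SQLException;
}

final class RetryOnReset {
  private RetryOnReset() {}

  interface SqlAction<T> {
    T run() throws SQLException;
  }

  // Runs the action; if it fails with an error that calls for a reset,
  // resets the connection once and retries. Other errors propagate unchanged.
  static <T> T run(ResettableDb db, SqlAction<T> action, Predicate<SQLException> shouldReset)
      throws SQLException {
    try {
      return action.run();
    } catch (SQLException e) {
      if (!shouldReset.test(e)) {
        throw e;
      }
      db.resetConnection();
      return action.run();
    }
  }
}
```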

Leaked Connections in TransactorImpl

In the TransactorImpl#query and TransactorImpl#execute methods, the connection.setAutoCommit(!asTransaction) call is not inside the try block.

When the DB is unreachable, the DB#setAutoCommit call fails, but the DB connection object is never released back to the DbPoolManager because the corresponding finally blocks are never executed. As a result, the DbPoolManager's underlying GenericObjectPool leaks these DB connection objects: they are considered "active" forever and prevent future getConnection calls from succeeding, failing with:

com.rapleaf.jack.exception.NoAvailableConnectionException: No available connection; please consider increasing wait time or total connections
    at com.rapleaf.jack.transaction.DbPoolManager.getConnection(DbPoolManager.java:103)
    at com.rapleaf.jack.transaction.TransactorImpl.query(TransactorImpl.java:76)
    at com.rapleaf.jack.transaction.TransactorImpl.query(TransactorImpl.java:45)
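The fix amounts to moving the setAutoCommit call inside the try block. The sketch below uses hypothetical, simplified stand-ins (`PooledConnection`, `ConnectionPool`, `Query`, `TransactorSketch`) rather than the actual `DbPoolManager` and `TransactorImpl` signatures.

```java
// Hypothetical, simplified stand-ins for Jack's connection and pool types.
interface PooledConnection {
  void setAutoCommit(boolean autoCommit) throws Exception;
}

interface ConnectionPool {
  PooledConnection getConnection() throws Exception;

  void returnConnection(PooledConnection connection);
}

interface Query<T> {
  T run(PooledConnection connection) throws Exception;
}

final class TransactorSketch {
  private final ConnectionPool pool;

  TransactorSketch(ConnectionPool pool) {
    this.pool = pool;
  }

  <T> T query(Query<T> query, boolean asTransaction) throws Exception {
    PooledConnection connection = pool.getConnection();
    try {
      // Inside the try block, so a failure here still reaches the finally below.
      connection.setAutoCommit(!asTransaction);
      return query.run(connection);
    } finally {
      // Always hand the connection back; otherwise the pool counts it as active forever.
      pool.returnConnection(connection);
    }
  }
}
```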

Typesafe fields

There should be an alternative to the _Fields enum that enables type-safe code. For example, we could have something like:

public static class _TypesafeFields extends JackFieldSet<Model> {
  public static final JackField<String> NAME = JackField.of(String.class);
  public static final JackField<Long> COUNT = JackField.of(Long.class);
}

which might be used in some kind of fancy type-safe query builder:

Set<Model> results = model.findBuilder()
    .where(Model._TypesafeFields.COUNT, 123L)
    .where(Model._TypesafeFields.NAME, "LiveRamp")
    .runQuery();

the where method might look something like:

public <T> QueryBuilder where(JackField<T> field, T value) {
...
}
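A minimal, self-contained version of the proposal: the method-level type parameter on `where` makes a call like `where(COUNT, "LiveRamp")` a compile-time error, because the field fixes `T` to `Long`. The class names follow the issue, but the column-name argument to `of` and the `whereClause`/`params` accessors are added assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical typed column handle, following the name proposed above.
final class JackField<T> {
  final String column;
  final Class<T> type;

  private JackField(String column, Class<T> type) {
    this.column = column;
    this.type = type;
  }

  static <T> JackField<T> of(String column, Class<T> type) {
    return new JackField<>(column, type);
  }
}

final class QueryBuilder {
  private final List<String> conditions = new ArrayList<>();
  private final List<Object> params = new ArrayList<>();

  // The method-level <T> ties the value's type to the field's type at compile time.
  <T> QueryBuilder where(JackField<T> field, T value) {
    conditions.add(field.column + " = ?");
    params.add(field.type.cast(value));
    return this;
  }

  String whereClause() {
    return String.join(" AND ", conditions);
  }

  List<Object> params() {
    return params;
  }
}
```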
