liveramp / jack

A set of scripts for generating fully functional Java database models from Ruby's ActiveRecord models and migrations.

Home Page: https://liveramp.github.io/jack/

License: Other

Ruby 6.78% Java 85.74% HTML 6.83% Shell 0.33% Lex 0.32%

orm activerecord-model

jack's Introduction

Jack

Do you use Ruby/Rails and Java in your company? We do. And we're sick and tired of maintaining two different sets of schemas, models, and whatnot!

To that end, we've created Jack (Java ACtive record (+K)). The project consists of:

a scheme for defining and managing multiple Rails projects that contribute models
a Ruby parser for schema.rb and ActiveRecord models that generates Java code
a Java library that provides model-oriented database access and database-wide query on top of standard database connections

Project Organization

A Jack project consists of two things:

The project definition file
One or more Rails projects

Project Definition File

The project definition file is a YAML file that tells Jack where to find your Rails projects and how to generate code. Here's an annotated example:

# This is the class path where you want the top-level generated code to go.
databases_namespace: com.rapleaf.jack.test_project
# Here, we define each of the separate databases for which Jack should
# generate code. Each database is roughly equivalent to a Rails project.
databases:
  -
    # The root namespace for this database. It's nice, but not required, to
    # have this be a subpackage of the 'databases_namespace'.
    root_namespace: com.rapleaf.jack.test_project.database_1
    # What do you want to call this database? Leave out the "_production".
    db_name: Database1
    # The path to the schema.rb in your Rails project.
    schema_rb: database_1/db/schema.rb
    # The path to the app/models dir in your Rails project.
    models: database_1/app/models
    # Tables to be ignored. No code will be generated for them.
    ignored_tables:
      table_1
      table_2
  -
    root_namespace: com.rapleaf.jack.test_project.database_2
    db_name: Database2
    schema_rb: database_2/db/schema.rb
    models: database_2/app/models

Rails Projects

Jack supports generating code for an arbitrary number of inter-related Rails 3 projects. If you only have one Rails project, then things are easy - just configure your project.yml appropriately.

If you have more than one project, here's the setup we suggest. (We use this ourselves.)

/all_my_databases
  /project.yml
  /rails_project_1
  /rails_project_2
/ruby_project_that_uses_rails_project_2
  /include/rails_project_2              # <= svn external to /all_my_databases/rails_project_2

Running the Generator

Running the Jack generator is easy. From a fresh clone, do the following:

export PATH=$PATH:`pwd`/src/rb

Then, change directories to wherever your project.yml lives and run:

ruby jack.rb project.yml /path/for/generated/code

Assuming everything is configured correctly, that's it.

Note: We know that the path thing stinks. We're going to improve this in a future version.

Layout of the Generated Code

The Java code that Jack produces is designed around interfaces so that it is very modular and mockable.

Models

The generated models contain getters and setters for all the fields, as well as getters for detected associations. In contrast to ActiveRecord models, there are no CRUD methods on the Java models.

Model Persistences

This is where the CRUD methods live. The generic base class supports:

find
find all (with and without conditions)
finding by foreign key
delete by id or instance
delete all
save (update)
cache manipulators

while there is a unique, per-model interface and implementation that additionally provides:

create

Databases

You get one Database per database entry in your project.yml. Their main purpose is to provide a collection of all the individual model persistences. You can also execute queries across all models within each database.

All Databases

Finally, there is one overarching Databases class that serves as a collection for all of the databases configured in your project.yml. Generally, this is what you will instantiate, though you can subsequently get the Database or Persistence instance you actually care about and use that.
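
To make the model/persistence split concrete, here is a minimal, self-contained sketch. The names User, UserPersistence, and InMemoryUserPersistence are hypothetical and are not Jack's generated API; the sketch only illustrates why keeping CRUD off the model makes the persistence layer easy to mock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: fields with getters and setters only, no CRUD methods.
final class User {
  private final long id;
  private String name;

  User(long id, String name) {
    this.id = id;
    this.name = name;
  }

  long getId() { return id; }
  String getName() { return name; }
  void setName(String name) { this.name = name; }
}

// Hypothetical persistence interface: this is where the CRUD methods live.
interface UserPersistence {
  User create(String name);
  User find(long id);
  boolean save(User user);
  boolean delete(long id);
}

// In-memory stand-in that a test could use instead of a database-backed implementation.
final class InMemoryUserPersistence implements UserPersistence {
  private final Map<Long, User> rows = new HashMap<>();
  private long nextId = 1;

  @Override
  public User create(String name) {
    User user = new User(nextId++, name);
    rows.put(user.getId(), user);
    return user;
  }

  @Override
  public User find(long id) { return rows.get(id); }

  @Override
  public boolean save(User user) {
    if (!rows.containsKey(user.getId())) return false;
    rows.put(user.getId(), user);
    return true;
  }

  @Override
  public boolean delete(long id) { return rows.remove(id) != null; }
}
```

Code written against the UserPersistence interface can be exercised with the in-memory version in unit tests and with a real implementation in production.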

Download

You can find releases on The Central Repository and find snapshots on Sonatype OSSRH (OSS Repository Hosting).

To get snapshots, add the OSSRH snapshot repository. See the guide to using multiple repositories.

<repository>
  <id>ossrh-snapshots</id>
  <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
  <releases>
    <enabled>false</enabled>
  </releases>
</repository>

License

Licensed under the Apache License, Version 2.0

http://www.apache.org/licenses/LICENSE-2.0

jack's Issues

add test case for table_name directive

make "ant test" run the ruby tests and code generator

Aggregated model query incompatible with MySQL 5.7.5

ModelQuery#getSelectClause adds the id column to its select clause (code):

sqlClause.append("id, ");

However, when querying with a group by clause, this is incompatible with the new ONLY_FULL_GROUP_BY mode introduced in MySQL 5.7.5 (reference):

Reject queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on (uniquely determined by) GROUP BY columns.

As of MySQL 5.7.5, the default SQL mode includes ONLY_FULL_GROUP_BY. (Before 5.7.5, MySQL does not detect functional dependency and ONLY_FULL_GROUP_BY is not enabled by default.)
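
One possible direction, sketched under assumptions: only prepend the id column when the query has no GROUP BY clause. SelectClauseSketch and its selectClause method are hypothetical names for illustration and do not mirror the real ModelQuery#getSelectClause signature.

```java
import java.util.List;

final class SelectClauseSketch {
  // Builds a SELECT list; "id" is prepended only for non-grouped queries so
  // that grouped queries stay valid under ONLY_FULL_GROUP_BY.
  static String selectClause(List<String> columns, boolean hasGroupBy) {
    StringBuilder sql = new StringBuilder("SELECT ");
    if (!hasGroupBy) {
      sql.append("id, ");
    }
    sql.append(String.join(", ", columns));
    return sql.toString();
  }
}
```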

Db connection is hung when there is a network change

A db connection will hang when there is a network change. This is because we don't enforce a timeout at any level. For a detailed explanation, please refer to this article.

There are several ways to fix it:

Specify a socket timeout. This is easy to implement (modify the connection string), but it is hard to find the optimal value or to use a special setting for legitimate long-running queries.
Enforce a timeout on the transactor. This requires a refactoring of the transactor implementation (relevant PR: #210).
Call isValid before checking out a connection from the pool (modify this method). This approach only partially solves the problem: queries that are already running when the network changes will still hang. In addition, IDb methods are currently implemented in each implementation, even though many of them are identical and could be extracted to a base class. If we choose this approach, we should consider performing that refactoring as well.
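
As a rough illustration of the second option, a unit of work can be run on a separate thread and abandoned after a deadline. This is a sketch using only java.util.concurrent; TimedExecution and callWithTimeout are hypothetical names and are not part of the transactor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedExecution {
  // Runs the work on a separate thread and gives up after timeoutMillis.
  static <T> T callWithTimeout(Callable<T> work, long timeoutMillis) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<T> future = executor.submit(work);
    try {
      return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Ask the worker to stop; this only interrupts the thread.
      future.cancel(true);
      throw e;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Thread interruption does not necessarily unblock a JDBC call that is stuck on a dead socket, so a real implementation would probably also need to close or abort the underlying connection when the deadline passes.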

Be sensitive to int/long for primary keys

Put use instructions in the README

fields in toString() should be tab or space separated

When printed, there doesn't seem to be any separation between a field's contents and the label of the subsequent field.

Package ruby script into a gem

create() should put the newly-created instance into the find cache

specs/tests for ruby code

It would be good to start unit testing/spec'ing the actual ruby generator classes. We could use much more precise examples to exercise weird edge cases.

add support for transactions

startTransaction() and commitTransaction() would be helpful functions at the database level

startTransaction could look something like:
dbConn.getConnection().setAutoCommit(false);
savepoint = dbConn.getConnection().setSavepoint();

and commitTransaction could look like:
dbConn.getConnection().commit();
dbConn.getConnection().setAutoCommit(true);

rollbackTransaction would be a little more involved
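
One way to sidestep separate start/commit/rollback calls is to wrap the whole unit of work, so that rollback happens automatically on failure. The sketch below uses only standard java.sql.Connection methods; TransactionSketch, SqlWork, and inTransaction are hypothetical names, not existing Jack API.

```java
import java.sql.Connection;
import java.sql.SQLException;

final class TransactionSketch {
  // Hypothetical callback type for the statements that make up one transaction.
  interface SqlWork {
    void run(Connection connection) throws SQLException;
  }

  // Runs the work atomically: commit on success, roll back on failure, and
  // always restore the connection's previous auto-commit setting.
  static void inTransaction(Connection connection, SqlWork work) throws SQLException {
    boolean previousAutoCommit = connection.getAutoCommit();
    connection.setAutoCommit(false);
    try {
      work.run(connection);
      connection.commit();
    } catch (SQLException | RuntimeException e) {
      connection.rollback();
      throw e;
    } finally {
      connection.setAutoCommit(previousAutoCommit);
    }
  }
}
```

A savepoint, as in the snippet above, would only be needed for partial rollback inside a larger transaction; this sketch rolls back the whole unit.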

remove piotr's weird sprucedb hack

:through associations are created but do not work

an association such as
class Student
has_many :classes, :through => :student_class
end

creates the association to Class in the Student Java class; however, it's treated like a regular has_many (i.e. Class is expected to have Student's ids as a foreign key)

Warnings about tables not found, etc. should be more informative

The message "couldn't find a table named my_table" would be much more helpful if it also said where the table is being referenced from.

Keep generated test projects out of the distribution jar

Upgrade commons-pool2 to 2.4.3 to prevent error when creating multiple transactor instances

The EvictionTimer in commons-pool2 version 2.4.2 has the following cancel method:

static synchronized void cancel(TimerTask task) {
  task.cancel();
  _usageCount--;
  if (_usageCount == 0) {
    _timer.cancel();
    _timer = null;
  }
}

It does not wait for the timer#cancel call to finish. This leads to a concurrency issue when users create multiple transactors within a very short period of time. What happens is:

The GenericObjectPool constructor calls the startEvictor method twice when object eviction is needed. The second time it is called, it first cancels the previously initialized evictor and then creates a new one.

final void startEvictor(long delay) {
  synchronized (evictionLock) {
    if (null != evictor) {
      EvictionTimer.cancel(evictor);
      evictor = null;
      evictionIterator = null;
    }
    if (delay > 0) {
      evictor = new Evictor();
      EvictionTimer.schedule(evictor, delay, delay);
    }
  }
}

When multiple transactors are created, the first transactor construction creates the evictor, cancels it, and creates it again.
The second transactor construction does the same. However, the cancellation of the timer started during the construction of the first transactor may not have completed. When that happens and the EvictionTimer#schedule method is called, it throws the following exception:

java.lang.IllegalStateException: Timer already cancelled.
        at java.util.Timer.sched(Timer.java:397)
        at java.util.Timer.schedule(Timer.java:248)
        at org.apache.commons.pool2.impl.EvictionTimer.schedule(EvictionTimer.java:76)
        at org.apache.commons.pool2.impl.BaseGenericObjectPool.startEvictor(BaseGenericObjectPool.java:695)
        at org.apache.commons.pool2.impl.BaseGenericObjectPool.setTimeBetweenEvictionRunsMillis(BaseGenericObjectPool.java:458)
        at org.apache.commons.pool2.impl.GenericObjectPool.setConfig(GenericObjectPool.java:317)
        at org.apache.commons.pool2.impl.GenericObjectPool.<init>(GenericObjectPool.java:117)
        at com.rapleaf.jack.transaction.DbPoolManager.<init>(DbPoolManager.java:73)

A related issue is reported in POOL-315 (that issue is based on an updated EvictionTimer, so the code is not exactly the same).

This is fixed in version 2.4.3, where EvictionTimer#cancel uses an executor service to run everything and lets users specify a timeout value for the executor, so that the method returns only after the cancellation has completed. However, 2.4.3 has not been released yet.

For now, just don't create multiple transactors in a very short period of time.
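
The final symptom can be reproduced in isolation with plain java.util.Timer: scheduling a task on a timer that has already been cancelled throws the same exception as in the stack trace. This sketch shows only that symptom, not the race inside commons-pool2; CancelledTimerDemo is a hypothetical name.

```java
import java.util.Timer;
import java.util.TimerTask;

final class CancelledTimerDemo {
  // Returns the message of the exception thrown when scheduling on a cancelled Timer.
  static String scheduleOnCancelledTimer() {
    Timer timer = new Timer(true); // daemon thread so the JVM can exit
    timer.cancel();
    try {
      timer.schedule(new TimerTask() {
        @Override
        public void run() { }
      }, 10L, 10L);
      return null; // not reached: schedule() rejects cancelled timers
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }
}
```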

"updated_at" is not set if the current value in the db is null

Expected

When a model defines an updated_at field of type Long, the value of updated_at should always be automatically set to the current time when the model is modified.

Observed

When the model is loaded, if the updated_at column is null in the database, it will be null in the model's field and it will not be set to the current time when the model is modified.

I think the issue is within AbstractDatabaseModel#updatedAtCanBeHandled1. When the value of the field is null, it is unable to determine the type of the field. A possible solution is to add a getFieldType method that returns the type's Class.

Possible Workaround

A workaround for this is to explicitly set the value of the updated_at field to 0L. This allows Jack to confirm updated_at is of type Long and it will overwrite the 0 with the current time.
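
The getFieldType idea could look roughly like the following. This is a self-contained sketch, not the real AbstractDatabaseModel: the class UpdatedAtSketch, its maps, and the touch method are hypothetical, and only the getFieldType name comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

final class UpdatedAtSketch {
  // Hypothetical stand-in for a generated model: declared field types are
  // kept separately from the current values, which may be null.
  private final Map<String, Class<?>> fieldTypes = new HashMap<>();
  private final Map<String, Object> values = new HashMap<>();

  UpdatedAtSketch() {
    fieldTypes.put("updated_at", Long.class);
    values.put("updated_at", null); // as loaded from a NULL column
  }

  // The proposed getFieldType: returns the declared type, even for null values.
  Class<?> getFieldType(String field) {
    return fieldTypes.get(field);
  }

  Object getField(String field) {
    return values.get(field);
  }

  // Decides from the declared type rather than from the current value.
  boolean updatedAtCanBeHandled() {
    return Long.class.equals(getFieldType("updated_at"));
  }

  // Called on modification: stamps updated_at when the type allows it.
  void touch(long nowMillis) {
    if (updatedAtCanBeHandled()) {
      values.put("updated_at", nowMillis);
    }
  }
}
```

Because the check no longer inspects the value, a null updated_at is stamped like any other.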

factor TemplateProcessor#render_create_method out into a partial erb template

`GenericConstraint<T> or()` might have unexpected behavior for chained constraints

When writing something like:

constraint1.or(constraint2, constraint3)

The generated SQL statement is the following:

WHERE (constraint1 OR (constraint2) AND (constraint3))

I think we should expect this to be the following:

WHERE (constraint1 OR ((constraint2) AND (constraint3)))

Let me know if I'm doing things wrong. I'll submit a PR for that in the next minutes.
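
For illustration, the expected grouping can be produced by wrapping the AND-ed constraints in their own parentheses before OR-ing them with the first one. This is a string-level sketch with hypothetical names (OrClauseSketch, or), not the actual GenericConstraint implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class OrClauseSketch {
  // Joins the additional constraints with AND, wraps that group in its own
  // parentheses, and only then ORs it with the first constraint.
  static String or(String first, String... others) {
    if (others.length == 0) {
      throw new IllegalArgumentException("at least one constraint is required");
    }
    List<String> wrapped = new ArrayList<>();
    for (String other : others) {
      wrapped.add("(" + other + ")");
    }
    return "(" + first + " OR (" + String.join(" AND ", wrapped) + "))";
  }
}
```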

The table 'schema_info' should be excluded from the generated models

This table doesn't have an id, which prevents generation, and shouldn't be considered regardless.

boolean type fields should expose corresponding is methods, rather than get

put the migration number into the generated file headers

This will make it easy to tell when files are out of date.

Make main script print usage

Support bulk deletion

Would like IModelPersistence to implement

deleteAll(Set ids)

and

deleteAll(Set models)
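
A bulk delete by id could be issued as a single parameterized statement. The sketch below only renders the SQL; BulkDeleteSketch and deleteAllSql are hypothetical names, and how IModelPersistence would bind and execute the statement is left open.

```java
import java.util.Collections;

final class BulkDeleteSketch {
  // Builds a parameterized DELETE for idCount ids; the values are bound
  // separately through PreparedStatement#setLong to avoid SQL injection.
  static String deleteAllSql(String table, int idCount) {
    if (idCount <= 0) {
      throw new IllegalArgumentException("idCount must be positive");
    }
    String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
    return "DELETE FROM " + table + " WHERE id IN (" + placeholders + ")";
  }
}
```

The models overload could collect the ids of the given models and delegate to the same statement.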

implement reasonable toString() on generated models

support save! -like features

connection reset not handled internally and the ability to reset the connection isn't exposed

When we switch which db is running as our master, requests to the old one are given an exception response telling them to reset their connection (so they get the correct master). Currently Jack doesn't handle this internally (which is fine), but there is also no exposed method for resetting the db connection if the application wants to handle this case. DatabaseConnection has a resetConnection() method that is all we need. Potentially it could be exposed via IDb.

Generated create() methods should only use class version of primitives (Integer) when the fields allow nulls

Jack should have something akin to 'bundle install' to ensure all the gems it needs are installed

Modifications of objects are reflected in the cache, but not in the persistence

If an object is returned and is present in the cache, any changes to that object without a call to save are reflected in the cache, but not in the underlying persistence.

It should be decided what the desired behavior is.

replace (or augment) all the find() overloads with a query "builder" style interface

it would be something like myPersistence.orderBy(...).limit(...).findAll()
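
A minimal sketch of what such a fluent interface might look like. FindBuilderSketch and toSql are hypothetical; a real findAll() would execute the statement, whereas this version only renders it so the chaining can be seen.

```java
import java.util.ArrayList;
import java.util.List;

final class FindBuilderSketch {
  private final String table;
  private final List<String> orderBy = new ArrayList<>();
  private Integer limit;

  FindBuilderSketch(String table) { this.table = table; }

  // Each builder method records its argument and returns this for chaining.
  FindBuilderSketch orderBy(String column) {
    orderBy.add(column);
    return this;
  }

  FindBuilderSketch limit(int limit) {
    this.limit = limit;
    return this;
  }

  // Renders the accumulated options as a SELECT statement.
  String toSql() {
    StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
    if (!orderBy.isEmpty()) {
      sql.append(" ORDER BY ").append(String.join(", ", orderBy));
    }
    if (limit != null) {
      sql.append(" LIMIT ").append(limit);
    }
    return sql.toString();
  }
}
```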

Support UPSERT

A common pattern is "Create if this doesn't exist, otherwise update". The conflict is determined as being on a duplicate key. The syntax for upsert differs by database:

INSERT ... ON CONFLICT ... DO UPDATE in PostgreSQL
INSERT ... ON DUPLICATE KEY UPDATE in MariaDB
INSERT ... ON CONFLICT ... DO UPDATE in SQLite
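
To show how the dialects differ in practice, here is a sketch that renders an upsert for a simple two-column table. UpsertSqlSketch, Dialect, and upsert are hypothetical names; the SQL assumes PostgreSQL 9.5+ and SQLite 3.24+, which are the versions that introduced ON CONFLICT ... DO UPDATE.

```java
final class UpsertSqlSketch {
  enum Dialect { POSTGRESQL, MARIADB, SQLITE }

  // Renders an upsert for a two-column table: keyColumn carries the unique
  // constraint and valueColumn is overwritten when the key already exists.
  static String upsert(Dialect dialect, String table, String keyColumn, String valueColumn) {
    String insert = "INSERT INTO " + table
        + " (" + keyColumn + ", " + valueColumn + ") VALUES (?, ?)";
    switch (dialect) {
      case MARIADB:
        return insert + " ON DUPLICATE KEY UPDATE "
            + valueColumn + " = VALUES(" + valueColumn + ")";
      case POSTGRESQL:
      case SQLITE:
        return insert + " ON CONFLICT (" + keyColumn + ") DO UPDATE SET "
            + valueColumn + " = excluded." + valueColumn;
      default:
        throw new IllegalArgumentException("Unsupported dialect: " + dialect);
    }
  }
}
```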

Add a license to the project

Leaked Connections in TransactorImpl

In the TransactorImpl#query and TransactorImpl#execute methods, the connection.setAutoCommit(!asTransaction) call is not contained within the try block (see here and here).

When the DB is unreachable, the DB#setAutoCommit call will fail, but the DB connection object will never be released back to the DbPoolManager because the finally blocks (here and here) are not executed. This causes the DbPoolManager's underlying GenericObjectPool to leak these DB connection objects that will forever be considered "active" and prevent future getConnection calls from succeeding due to:

com.rapleaf.jack.exception.NoAvailableConnectionException: No available connection; please consider increasing wait time or total connections
    at com.rapleaf.jack.transaction.DbPoolManager.getConnection(DbPoolManager.java:103)
    at com.rapleaf.jack.transaction.TransactorImpl.query(TransactorImpl.java:76)
    at com.rapleaf.jack.transaction.TransactorImpl.query(TransactorImpl.java:45)
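
The shape of a fix, sketched with stand-in types: move the setAutoCommit call inside the try block so that the finally clause always returns the connection. Conn, Pool, and execute below are hypothetical simplifications, not the real TransactorImpl or DbPoolManager.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class LeakFreeCheckout {
  // Hypothetical stand-in for the pooled connection.
  interface Conn {
    void setAutoCommit(boolean autoCommit) throws Exception;
  }

  // Hypothetical stand-in for the pool; tracks idle connections only.
  static final class Pool {
    private final Deque<Conn> idle = new ArrayDeque<>();

    Pool(Conn conn) { idle.push(conn); }

    Conn checkOut() { return idle.pop(); }
    void giveBack(Conn conn) { idle.push(conn); }
    int idleCount() { return idle.size(); }
  }

  // setAutoCommit sits inside the try block, so the finally clause returns
  // the connection to the pool even when the database is unreachable.
  static void execute(Pool pool, boolean asTransaction) throws Exception {
    Conn conn = pool.checkOut();
    try {
      conn.setAutoCommit(!asTransaction);
      // ... run the query or execution here ...
    } finally {
      pool.giveBack(conn);
    }
  }
}
```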

Model processing should not fail when corresponding table is not found

I think the model in question should just be ignored and cause a warning, but this should not cause generation to fail.

Typesafe fields

There should be a way to make an alternative to the _Fields enum which enables typesafe code. For example, we can have something like:

public static class _TypesafeFields extends JackFieldSet<Model> {
  public static final JackField<String> NAME = JackField.of(String.class);
  public static final JackField<Long> COUNT = JackField.of(Long.class);
}

which might be used in some kind of fancy type-safe query builder:

Set<Model> results = model.findBuilder()
    .where(Model._TypesafeFields.COUNT, 123L)
    .where(Model._TypesafeFields.NAME, "LiveRamp")
    .runQuery();

the where method might look something like:

public <T> QueryBuilder where(JackField<T> field, T value) {
...
}
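
A compilable version of the idea, under assumptions: JackField here takes a column name in addition to the class (the snippet above passes only the class), and TypesafeQueryBuilder merely records conditions instead of running a query. Both classes are hypothetical sketches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical typed column handle; T is the Java type of the column's values.
final class JackField<T> {
  private final String name;
  private final Class<T> type;

  private JackField(String name, Class<T> type) {
    this.name = name;
    this.type = type;
  }

  static <T> JackField<T> of(String name, Class<T> type) {
    return new JackField<>(name, type);
  }

  String getName() { return name; }
  Class<T> getType() { return type; }
}

final class TypesafeQueryBuilder {
  static final JackField<String> NAME = JackField.of("name", String.class);
  static final JackField<Long> COUNT = JackField.of("count", Long.class);

  private final Map<String, Object> conditions = new LinkedHashMap<>();

  // The method-level <T> ties the value's type to the field's type, so
  // where(COUNT, "oops") is rejected at compile time.
  <T> TypesafeQueryBuilder where(JackField<T> field, T value) {
    conditions.put(field.getName(), value);
    return this;
  }

  Map<String, Object> getConditions() { return conditions; }
}
```

With this shape, the compiler catches type mismatches that the plain _Fields enum would only surface at runtime.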