papeeria-edit-history's People

Contributors

dbarashev, iisuslik43, sashaorlova, xamgore, yanenkoa

papeeria-edit-history's Issues

Fix `CosmasGoogleCloudService.getVersion`

CosmasGoogleCloudService.getVersion is currently broken: whenever it is called to retrieve a version of a file which is not the current one, it throws a com.google.cloud.storage.StorageException with the message PreconditionFailed.

If we log the GCS response, here is the JSON it returns:

{
  "code" : 412,
  "errors" : [ {
    "domain" : "global",
    "location" : "If-Match",
    "locationType" : "header",
    "message" : "Precondition Failed",
    "reason" : "conditionNotMet"
  } ],
  "message" : "Precondition Failed"
}

Use Google Cloud Storage for storing files

Let's add another implementation of CosmasService which uses a real storage option for our file versions: Google Cloud Storage.

Google Cloud Storage is a binary object storage which supports object versioning. It mostly makes sense when the application is running in a Google Cloud Platform datacenter, but it can also be accessed from anywhere in the world. You can find a good overview and tutorials in the docs.

You are mostly interested in reading and writing objects and not interested in managing buckets. I'll create a test bucket for you, and in production we'll use an already created bucket (so please pass the bucket name in the command line args).

As the first step we may ignore versioning altogether and just write and read the latest version of the object, as if there were just one version.

You will also need a credentials file which I will send separately.

Also, please do not throw away the in-memory implementation. We'll need both of them.

Add object versioning support over GCS

Now that we have GCS storage, let's add versioning.

You can read about object versioning in GCS in the docs.

I enabled versioning on the test bucket papeeria-interns-cosmas. Now every write will create a new version, and versions with age > 1 day will be automatically deleted.

We need to take the version arguments in GetVersionRequest into account and add a new type of request/response which lists all available versions of the given object. Note that the version (aka generation) number in GCS is not a sequential ordinal number :)

Create basic GRPC server

So we now have a nice service definition in cosmas.proto. Let's create a basic runnable GRPC server which implements our Cosmas service. Refer to the docs on grpc-java. For now we only need what is called a simple RPC in the docs.

Please write the server code in Kotlin. Although there is no special grpc-kotlin, Java and Kotlin are very interoperable languages.

Maintain ordered list of file versions

User story: we want to show the first N file versions fast, and load the remaining on request.

Technical issue: we can't rely on any ordering of blobs returned by the list method. There is no guarantee that more recent versions come first or vice versa.

Possible solution: keep our own piece of information with the version ordering. Perhaps we can piggyback on the cemetery and fileidmapping objects.
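
A minimal Java sketch of the "keep our own ordering" idea, assuming that "first N" means the N most recent versions. `VersionIndex` and its methods are hypothetical names, generations are plain longs, and persistence (for example inside the cemetery or fileidmapping objects) is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: remembers the order in which generations of each file
// were committed, so the newest N can be returned without listing blobs.
final class VersionIndex {
    // fileId -> generations in commit order (oldest first)
    private final Map<String, List<Long>> generationsByFile = new HashMap<>();

    // Record a newly committed generation for the file.
    synchronized void append(String fileId, long generation) {
        generationsByFile.computeIfAbsent(fileId, k -> new ArrayList<>()).add(generation);
    }

    // Return up to n most recent generations, newest first.
    synchronized List<Long> latest(String fileId, int n) {
        List<Long> all = generationsByFile.getOrDefault(fileId, Collections.<Long>emptyList());
        int from = Math.max(0, all.size() - Math.max(0, n));
        List<Long> page = new ArrayList<>(all.subList(from, all.size()));
        Collections.reverse(page);
        return page;
    }
}
```

The remaining versions can then be loaded on request by asking for a larger page and skipping what has already been shown.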

File id mapping is not persistent

As far as I understand, file id mapping is not persistent, which means that file move + Cosmas restart loses the whole history prior to move.

Create a demo page with ACE editor

The production Papeeria service uses the ACE editor. We're not changing production code yet, but we need a demo UI which we can use for developing UI-related features.

As the very first step, let's create a simple HTML page with an embedded ACE editor. Refer to the ACE docs for examples of such a page.

Please put all HTML, JS and CSS files into the demo folder.

Primitive in-memory implementation of file version storage

Let's expand our Cosmas service and create a primitive in-memory implementation with the following methods:

createVersion
getVersion

Method createVersion saves the file text as a new record in the history. Every file has a string identifier (a long hexadecimal sequence) and the identifier of the project the file belongs to (the project identifier is also a long hexadecimal sequence written as a string). The project id is not needed for file identification, but it may be used later for history parameters (e.g. we may restrict some projects to short file histories and allow longer file histories for other projects).

Besides the identifiers, every createVersion request naturally requires the file contents, which in the general case is a byte array (because there may be binary files).

Method getVersion will in the future accept a version identifier in the request, but for this primitive implementation we'll be very dumb: just ignore the requested version and return the latest one. The response shall return a byte array.

So we need these methods and request/response objects in the protocol buffers, their in-memory implementation and tests, of course.
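
A rough Java sketch of what such a primitive in-memory store could look like. The class name `InMemoryVersionStore` and the plain method signatures are assumptions for illustration. The real service would take the protobuf request/response objects instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory storage: every createVersion appends a new record,
// getVersion ignores any version identifier and returns the latest record.
final class InMemoryVersionStore {
    // fileId -> all saved contents, oldest first
    private final Map<String, List<byte[]>> history = new HashMap<>();

    // projectId is accepted but not used for file identification.
    synchronized void createVersion(String projectId, String fileId, byte[] content) {
        history.computeIfAbsent(fileId, k -> new ArrayList<>()).add(content.clone());
    }

    // Returns the latest contents, or an empty array if the file is unknown.
    synchronized byte[] getVersion(String projectId, String fileId) {
        List<byte[]> versions = history.get(fileId);
        if (versions == null) {
            return new byte[0];
        }
        return versions.get(versions.size() - 1).clone();
    }
}
```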

Verify validity of patch list

We store successive file versions and the list of patches between them. We assume that applying the patches to version N yields version N+1. However, we never check that this is actually true.

We have a couple of options:

  1. Validate the sequence of patches on e.g. commit requests and check that version N + patches = version N+1.
  2. Send only patches from Papeeria FE to Cosmas and reconstruct version N+1 from version N + accumulated patches on commit request.

I think we should proceed with option 2, because it reduces the amount of data exchanged between FE and Cosmas (we send only patches, not the actual file contents). At the same time, we need to apply the accumulated list of patches to version N in either scenario, be it validation or building version N+1.
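
Both options share the "apply accumulated patches to version N" step, which could be sketched in Java as below. The patch format here (offset, number of deleted characters, inserted text) is purely an assumption for illustration, as are the names `TextPatch` and `PatchReplayer`. The real patch strings may look quite different.

```java
import java.util.List;

// Hypothetical patch: replace deleteLength characters at offset with inserted.
final class TextPatch {
    final int offset;
    final int deleteLength;
    final String inserted;

    TextPatch(int offset, int deleteLength, String inserted) {
        this.offset = offset;
        this.deleteLength = deleteLength;
        this.inserted = inserted;
    }
}

final class PatchReplayer {
    // Option 2: rebuild version N+1 by applying patches to version N in order.
    static String apply(String versionN, List<TextPatch> patches) {
        StringBuilder text = new StringBuilder(versionN);
        for (TextPatch p : patches) {
            if (p.offset < 0 || p.deleteLength < 0 || p.offset > text.length() - p.deleteLength) {
                throw new IllegalArgumentException(
                        "Patch at offset " + p.offset + " does not fit text of length " + text.length());
            }
            text.replace(p.offset, p.offset + p.deleteLength, p.inserted);
        }
        return text.toString();
    }

    // Option 1: check that version N + patches really gives version N+1.
    static boolean isConsistent(String versionN, List<TextPatch> patches, String versionN1) {
        try {
            return apply(versionN, patches).equals(versionN1);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```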

Run Cosmas inside Docker container

We can run Cosmas with gradle run. Now let's take a step further and package it into a Docker container.

We need a Dockerfile which packages the Cosmas distribution (the server component in particular) into a Docker image based on e.g. openjdk:8-slim. Please note that we don't want to run gradle in the container; we want to run the already compiled application. gradle distZip will build a ZIP file with a ready-to-run app and put it into the build/distributions directory.

Enhance Cosmas service with a method to send edit patches

The Cosmas GRPC service currently provides methods to save a new file version and to get some specific version. We also want to record what happens between versions: who changes what and at what time. For that purpose we need a method which takes this information and stores it in memory or persistently.

Let's start with the in-memory implementation. You have to add a method to the service declaration in the proto file and implement it in the in-memory implementation.

  • The user who makes the changes is identified by a short string id.
  • The file is identified by a short string id as well.
  • The patch itself is a string, but it can be pretty large. For the purpose of this task the contents of that string don't matter.
  • The modification timestamp is a 64-bit integer.

Write rate limit exceeded when deleting many files

Apparently, when we get a lot of delete file requests in a row (this may happen e.g. when a user deletes a directory), we send the same number of writes to the cemetery, which makes GCS unhappy. We need to batch cemetery writes and/or send write operations in a rate-limited way. In the former case we probably need some way to send a batch of delete requests to Cosmas. Currently we send delete requests to Cosmas one by one even though they arrive as a batch in Papeeria.

06:32:48.204 [grpc-default-executor-871] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:48.311 [grpc-default-executor-866] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:48.345 [grpc-default-executor-889] [INFO] [CosmasGoogleCloudService] <<< forcedFileCommit 
06:32:48.517 [grpc-default-executor-859] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:48.533 [grpc-default-executor-901] [INFO] [CosmasGoogleCloudService] <<< forcedFileCommit 
06:32:48.820 [grpc-default-executor-874] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:49.217 [grpc-default-executor-869] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:49.353 [grpc-default-executor-876] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:49.962 [grpc-default-executor-865] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:50.296 [grpc-default-executor-860] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:50.307 [grpc-default-executor-860] [INFO] [CosmasGoogleCloudService] >>> forcedFileCommit [projectId=XXXXXX, fileId=YYYYYYY]
06:32:50.655 [grpc-default-executor-861] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:50.715 [grpc-default-executor-877] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:51.320 [grpc-default-executor-864] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:51.415 [grpc-default-executor-867] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:52.102 [grpc-default-executor-863] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:32:52.114 [grpc-default-executor-868] [INFO] [CosmasGoogleCloudService] <<< deleteFile 
06:33:45.383 [grpc-default-executor-897] [ERROR] [CosmasGoogleCloudService] StorageException happened at Cosmas 
com.google.cloud.storage.StorageException: The total number of changes to the object papeeria-eu-prod-cosmas-paid/XXXXXXXXXXXXXXXX-cemetery exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:291)
        at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:159)
        at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:156)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
        at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:156)
        at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:137)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$deleteFile$1.invoke(CosmasGoogleCloudService.kt:436)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$deleteFile$1.invoke(CosmasGoogleCloudService.kt:65)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging(CosmasGoogleCloudService.kt:51)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging$default(CosmasGoogleCloudService.kt:41)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.deleteFile(CosmasGoogleCloudService.kt:416)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGrpc$MethodHandlers.invoke(CosmasGrpc.java:929)
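
The batching idea from the description could be sketched in Java roughly like this. `CemeteryWriteBatcher` is a hypothetical name, and the `Consumer` stands in for the single GCS write of the cemetery object. Who calls `flush` (a timer, or the end of a batch delete request) is deliberately left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batcher: collects deleted file ids and turns many delete
// requests into one cemetery write per flush.
final class CemeteryWriteBatcher {
    // Stand-in for "append these records to the cemetery and write it to GCS once".
    private final Consumer<List<String>> cemeteryWriter;
    private final List<String> pendingFileIds = new ArrayList<>();

    CemeteryWriteBatcher(Consumer<List<String>> cemeteryWriter) {
        this.cemeteryWriter = cemeteryWriter;
    }

    // Called for every deleteFile request; does not touch the storage.
    synchronized void deleteFile(String fileId) {
        pendingFileIds.add(fileId);
    }

    // Writes all pending deletions in a single call; returns how many were written.
    synchronized int flush() {
        if (pendingFileIds.isEmpty()) {
            return 0;
        }
        List<String> batch = new ArrayList<>(pendingFileIds);
        pendingFileIds.clear();
        cemeteryWriter.accept(batch);
        return batch.size();
    }
}
```

With this shape, the fifteen deleteFile calls seen in the log above would result in one cemetery write instead of fifteen, provided that flush runs once after them.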

Deleted files cemetery

We need some place for storing information about deleted files.

From the user's perspective, after deleting a file they expect its history to stay available for a while, and they expect some user interface for accessing that history.

From the Papeeria-FE perspective, when a file is gone, it is gone. There is no information about deleted files in the live Papeeria database.

So we may store a sort of "cemetery" in the GCS bucket managed by Cosmas. It might be a single object per project with a list of records like (file_id, file_name, removal_timestamp). When a user deletes a file, we append a new record to the cemetery and write this object to GCS. This information can later be used for building a user interface showing the history of deleted files.
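
As an illustration only, the per-project cemetery could be modelled in Java as below. The names `Tombstone` and `ProjectCemetery` are made up, and serialization to a GCS object is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// One cemetery record: (file_id, file_name, removal_timestamp).
final class Tombstone {
    final String fileId;
    final String fileName;
    final long removalTimestamp;

    Tombstone(String fileId, String fileName, long removalTimestamp) {
        this.fileId = fileId;
        this.fileName = fileName;
        this.removalTimestamp = removalTimestamp;
    }
}

// Hypothetical in-memory view of the single cemetery object of one project.
final class ProjectCemetery {
    private final List<Tombstone> records = new ArrayList<>();

    // Append a record when a file is deleted.
    synchronized void bury(String fileId, String fileName, long removalTimestamp) {
        records.add(new Tombstone(fileId, fileName, removalTimestamp));
    }

    // Records ordered for a "recently deleted files" UI, most recent removal first.
    synchronized List<Tombstone> newestFirst() {
        List<Tombstone> copy = new ArrayList<>(records);
        copy.sort(Comparator.comparingLong((Tombstone t) -> t.removalTimestamp).reversed());
        return copy;
    }
}
```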

Add patch id and GRPC method to delete patch

So now we have some code which can safely delete patches and we need a convenient API for it.

When a user rejects some patch in the browser client, they just click a button on the corresponding changes. After that we need to identify the patch which needs to be deleted and send a request to the Papeeria server and eventually to Cosmas.

We thus need some way of identifying patches and a GRPC method "deletePatch" which accepts the patch identifier, deletes the patch appropriately and returns the resulting text.

Perhaps a pair "file_id, timestamp" or a triple "project_id, file_id, timestamp" is a good patch identifier, provided that all timestamps for the given file are different. However, if they are not, we can build the patch id from the file generation number and patch position, or from the user_id and the MD5 hash of the patch text.
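
The last variant (user_id plus MD5 hash of the patch text) could look like this in Java. The class name `PatchIds` and the `:` separator are assumptions. `MessageDigest` is the standard JDK API.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper building a patch id as "<userId>:<md5 of patch text in hex>".
final class PatchIds {
    static String fromUserAndText(String userId, String patchText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(patchText.getBytes(StandardCharsets.UTF_8));
            // 16 bytes -> 32 hex digits, left-padded with zeros
            String hex = String.format("%032x", new BigInteger(1, digest));
            return userId + ":" + hex;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Note that such an id is only unique if the same user never sends two patches with identical text for the same file, so it may need to be combined with the file generation number.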

Make --help work

If launched with --help, Cosmas prints a stack trace instead of a helpful message. I believe there must be a way to print a nice help text.

build/install/Cosmas/bin/cosmas-server --help
Exception in thread "main" com.xenomachina.argparser.ShowHelpException: Help was requested
	at com.xenomachina.argparser.ArgParser$1.invoke(ArgParser.kt:608)
	at com.xenomachina.argparser.ArgParser$1.invoke(ArgParser.kt:38)
	at com.xenomachina.argparser.OptionDelegate.parseOption(OptionDelegate.kt:61)
	at com.xenomachina.argparser.ArgParser.parseLongOpt(ArgParser.kt:567)
	at com.xenomachina.argparser.ArgParser.access$parseLongOpt(ArgParser.kt:38)
	at com.xenomachina.argparser.ArgParser$parseOptions$2.invoke(ArgParser.kt:493)
	at com.xenomachina.argparser.ArgParser$parseOptions$2.invoke(ArgParser.kt:38)
	at kotlin.SynchronizedLazyImpl.getValue(Lazy.kt:131)
	at com.xenomachina.argparser.ArgParser.getParseOptions(ArgParser.kt)
	at com.xenomachina.argparser.ArgParser.force(ArgParser.kt:448)
	at com.xenomachina.argparser.DefaultKt$default$3.getValue(Default.kt:73)
	at com.xenomachina.argparser.ArgParser$Delegate.getValue(ArgParser.kt:342)
	at com.bardsoftware.papeeria.backend.cosmas.CosmasServerArgs.getPort(CosmasServer.kt)
	at com.bardsoftware.papeeria.backend.cosmas.CosmasServerKt.main(CosmasServer.kt:55)

Separate file history buckets for paid and free projects

We have paid projects (those whose owner is a paying user) and free projects, and we want file versions to have different lifetimes for them, e.g. 24h for free projects and 30d for paid projects. The lifetime is configured per bucket => we need to store file versions in different buckets for paid and free projects.

High CPU usage when underlying GCS starts failing

We had SSL issues with GCS and our requests failed like this:

11:03:18.955 [grpc-default-executor-627] [ERROR] [CosmasGoogleCloudService] Error while applying patches [fileId=0d2974935531a8c1bc3965480efa5a46]
com.google.cloud.storage.StorageException: Remote host closed connection during handshake
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220)
        at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:291)
        at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:159)
        at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:156)
        at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
        at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
        at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
        at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:156)
        at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:137)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.commitFromMemoryToGCS(CosmasGoogleCloudService.kt:231)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.access$commitFromMemoryToGCS(CosmasGoogleCloudService.kt:65)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$commitVersion$1.invoke(CosmasGoogleCloudService.kt:180)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$commitVersion$1.invoke(CosmasGoogleCloudService.kt:65)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging(CosmasGoogleCloudService.kt:51)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging$default(CosmasGoogleCloudService.kt:41)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.commitVersion(CosmasGoogleCloudService.kt:151)
        at com.bardsoftware.papeeria.backend.cosmas.CosmasGrpc$MethodHandlers.invoke(CosmasGrpc.java:921)

At the same time, the CPU and disk usage graphs looked like this:

[images: CPU usage graph, disk usage graph]

Is it possible that the retry policy was misconfigured and we eventually accumulated a big queue of requests which were all retrying (=> failing, logging, etc.)?
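
If the retry policy does turn out to be the cause, one common mitigation is to bound the number of attempts and back off between them. The Java sketch below illustrates that general technique only. It is not the retry mechanism of the GCS client library, and `BoundedRetry` with its parameters is a hypothetical helper.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: at most maxAttempts calls, with capped exponential
// backoff between failures, so a failing backend cannot cause endless retries.
final class BoundedRetry {
    // Delay before the next try: base, 2*base, 4*base, ... but never above cap.
    static long backoffMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 1; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }

    static <T> T call(Callable<T> task, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis(attempt, baseMillis, capMillis));
                }
            }
        }
        // All attempts failed: give up and report the last failure to the caller.
        throw last;
    }
}
```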

Add patch list to the stored file content versions

Here is a summary of the process which we can support now:

Time          t1      t2      t3        t4     t5      t6 
Alice types   "foo"                            "baz"
Bob   types           "bar"             "qux"      
Cosmas does   +patch  +patch  +version  +patch +patch  +version
              "foo"   "bar"   "foobar"  "qux"  "baz"   "foobar quxbaz"

We have versions and we have a separate patch list. However, we need them to be synchronized, because we want to support the following user scenario: Alice takes version produced at time t3 and applies patches one-by-one.

Arguably, the best way to keep a version consistent with the patch list leading to the next version is to store the patch list and the version together in the same structure.

So, the plan: replace the file contents written to GCS on a commit version request with a structure holding the file contents + the accumulated patch list (which might be a protocol buffer serialized to bytes). Then we empty the patch lists for the committed files and start accumulating patches again.
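
A small Java sketch of this plan, using the timeline above. `VersionWithPatches` and `PatchAccumulator` are hypothetical names, patches are plain strings, and the protocol buffer serialization and the GCS write are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// What would be written on commit: contents plus the patches accumulated before it.
final class VersionWithPatches {
    final String content;
    final List<String> patches;

    VersionWithPatches(String content, List<String> patches) {
        this.content = content;
        this.patches = Collections.unmodifiableList(patches);
    }
}

final class PatchAccumulator {
    // fileId -> patches received since the last commit, in arrival order
    private final Map<String, List<String>> pendingPatches = new HashMap<>();

    synchronized void addPatch(String fileId, String patch) {
        pendingPatches.computeIfAbsent(fileId, k -> new ArrayList<>()).add(patch);
    }

    // Bundles the contents with the accumulated patches and empties the patch list.
    synchronized VersionWithPatches commit(String fileId, String content) {
        List<String> patches = pendingPatches.remove(fileId);
        return new VersionWithPatches(content, patches == null ? new ArrayList<String>() : patches);
    }
}
```

In terms of the timeline, the commit at t3 would bundle "foobar" with the patches from t1 and t2, and the commit at t6 would bundle "foobar quxbaz" with the patches from t4 and t5.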

Logging configs

We already use slf4j and logback for logging and we have logback.xml in our resources. What we need in order to use them properly:

  • create an appender which writes everything to the file /var/log/cosmas/cosmas.log and employs a rolling policy
  • create reasonable <logger> records (at least one for CosmasGoogleCloudService) using that new appender
  • update and standardize the logged information. We can use the Mapped Diagnostic Context (MDC) for passing project/file identifiers to the logger and format these key/value pairs using a pattern string in the config. I also suggest making "entry" messages (Get request for commit last version of files...) less wordy. It is okay to just print the method name: LOG.info("commitVersion"). Together with MDC and the pattern, it will look nice in the log file, like this: commitVersion: projectId=123456 fileId= and will be greppable

Buffered "CreateVersion" for a set of files in a project

We want to implement the following scenario to support real-time collaboration of many users in the same project:

  1. Client (read "frontend server") sends file contents to Cosmas without creating a new file version at the moment of sending.
  2. Cosmas saves or updates the actual contents of the file in memory
  3. At some moment -- either when the user hits "Compile" or just once every few minutes -- Cosmas commits all in-memory updates as new versions of the corresponding files.

This way we 1) will not create new versions very frequently and 2) will save a set of possibly related changes in different files at the same moment.

This requires changing the behavior (and most likely the name) of the CreateVersion request -- it should buffer the updates instead of writing them immediately -- and adding a new method CommitVersion which will actually write all buffered updates in the given project.
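
The buffering scenario could be sketched in Java as follows. `BufferedVersionStore` is a hypothetical name, and the `committed` map only stands in for the real storage that CommitVersion would write to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: createVersion only buffers, commitVersion writes
// one new version per buffered file of the given project.
final class BufferedVersionStore {
    // projectId -> (fileId -> latest buffered contents)
    private final Map<String, Map<String, byte[]>> buffered = new HashMap<>();
    // fileId -> committed versions, oldest first (stand-in for the real storage)
    private final Map<String, List<byte[]>> committed = new HashMap<>();

    // Saves or updates the in-memory contents without creating a version.
    synchronized void createVersion(String projectId, String fileId, byte[] content) {
        buffered.computeIfAbsent(projectId, k -> new HashMap<>()).put(fileId, content.clone());
    }

    // Commits all buffered files of the project; returns how many versions were created.
    synchronized int commitVersion(String projectId) {
        Map<String, byte[]> updates = buffered.remove(projectId);
        if (updates == null) {
            return 0;
        }
        for (Map.Entry<String, byte[]> update : updates.entrySet()) {
            committed.computeIfAbsent(update.getKey(), k -> new ArrayList<>()).add(update.getValue());
        }
        return updates.size();
    }

    synchronized int versionCount(String fileId) {
        List<byte[]> versions = committed.get(fileId);
        return versions == null ? 0 : versions.size();
    }
}
```

However many updates of the same file arrive between two commits, only the last buffered contents become a new version.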
