bardsoftware / papeeria-edit-history
License: Apache License 2.0
CosmasGoogleCloudService.getVersion is broken right now: whenever it is called to retrieve a version of a file which is not the current one, it causes a com.google.cloud.storage.StorageException with the message "Precondition Failed".
If we log the GCS response, here's the JSON response it gives:
{
"code" : 412,
"errors" : [ {
"domain" : "global",
"location" : "If-Match",
"locationType" : "header",
"message" : "Precondition Failed",
"reason" : "conditionNotMet"
} ],
"message" : "Precondition Failed"
}
If we force-commit a binary file (when deleting it), we convert its bytes to a UTF-8 string.
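To see why that is a problem: UTF-8 decoding replaces invalid byte sequences with U+FFFD, so a string round trip is lossy for arbitrary binary data. A minimal demonstration:

```kotlin
import java.nio.charset.StandardCharsets

fun main() {
    // Arbitrary binary content: 0x89 and 0xFF are not valid UTF-8 sequences.
    val original = byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47, 0xFF.toByte(), 0x00)

    // Round-trip through a UTF-8 string, as the force-commit path does.
    val roundTripped = String(original, StandardCharsets.UTF_8)
        .toByteArray(StandardCharsets.UTF_8)

    // Invalid sequences were replaced with U+FFFD, so the bytes no longer match.
    println(original.contentEquals(roundTripped)) // false
}
```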
Let's add another implementation of CosmasService which uses a real storage option for our file versions: Google Cloud Storage.
Google Cloud Storage is a binary object storage which supports object versioning. It mostly makes sense when the application is running in a Google Cloud Platform datacenter, but it can also be accessed from anywhere in the world. You can find a good overview and tutorials in the docs.
You are mostly interested in reading and writing objects and not interested in managing buckets. I'll create a test bucket for you and in production we'll use an already created bucket (so please pass the bucket name in the command line args).
For the first step we may ignore versioning altogether and just write and read the latest version of the object, as if there is just one version.
You will also need a credentials file which I will send separately.
Also, please do not throw away in-memory implementation. We'll need both of them.
So now we have GCS storage; let's add versioning.
You can read about object versioning in GCS in the docs.
I enabled versioning on the test bucket papeeria-interns-cosmas. Now every write will create a new version and versions with age > 1 day will be automatically deleted.
We need to consider version arguments in GetVersionRequest and add a new type of request/response which lists all available versions of the given object. Note that the version (aka generation) number in GCS is not a sequential ordinal number :)
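Since generation numbers are not sequential ordinals, the listing code should order versions by their creation time rather than treating generations as 1, 2, 3, ... A hedged sketch; VersionInfo here is a hypothetical stand-in for the metadata the real service would read from the listed blob versions:

```kotlin
// Hypothetical VersionInfo: in the real service these fields would come from
// the generation and timestamp metadata of the listed blob versions.
data class VersionInfo(val generation: Long, val createTimeMillis: Long)

// Return versions newest-first; do not assume generation numbers are
// sequential ordinals or that the storage lists them in any fixed order.
fun orderVersions(versions: List<VersionInfo>): List<VersionInfo> =
    versions.sortedByDescending { it.createTimeMillis }

fun main() {
    val listed = listOf(
        VersionInfo(1524000000123456, 1524000000000),
        VersionInfo(1523990000654321, 1523990000000),
        VersionInfo(1524010000111111, 1524010000000)
    )
    // Newest generation comes out first regardless of listing order.
    println(orderVersions(listed).map { it.generation })
}
```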
So we now have a nice service definition in cosmas.proto. Let's create a basic runnable GRPC server which implements our Cosmas service. Refer to the docs on grpc-java. For now we only need what is called a simple RPC in the docs.
Please write the server code in Kotlin. Although there is no special grpc-kotlin, Java and Kotlin are very interoperable languages.
User story: we want to show the first N file versions fast, and load the remaining on request.
Technical issue: we can't rely on any ordering of blobs returned by the list method. There is no guarantee that more recent versions come first or vice versa.
Possible solution: keep our own piece of information with version ordering. Perhaps we can piggyback on cemetery and fileidmapping objects.
As far as I understand, file id mapping is not persistent, which means that file move + Cosmas restart loses the whole history prior to move.
Production Papeeria service uses ACE editor. We're not yet changing production code, but we need demo UI which we could use for development of UI-related features.
As the very first step, let's create a simple HTML page with embedded ACE editor. Refer to ACE docs for examples of such page.
Please put all HTML, JS and CSS files into the demo folder.
Let's expand our Cosmas service and create a primitive in-memory implementation with the following methods:
- createVersion
- getVersion
Method createVersion saves the file text as a new record in the history. Every file has a string identifier (which is a long hexadecimal sequence) and an identifier of the project the file belongs to (the project identifier is also a long hexadecimal sequence written as a string). The project id is not needed for file identification, but it may be used later for the history parameters (e.g. we may restrict some projects to short file histories and allow longer file histories for other projects).
Besides the identifiers, every createVersion request naturally requires the file contents, which is a byte array in the general case (because there may be binary files).
Method getVersion will in the future accept a version identifier in the request, but for this primitive implementation we'll be very dumb: just ignore the request and return the latest version. The response shall return a byte array.
So we need these methods and request/response objects in the protocol buffers, their in-memory implementation and tests, of course.
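Ignoring the gRPC plumbing, the in-memory part could look roughly like this sketch (the class and method shapes are illustrative, not the real service API):

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Minimal in-memory sketch of createVersion/getVersion with gRPC left out.
class InMemoryCosmas {
    // fileId -> versions, newest last. The project id is accepted but not yet
    // used; it may later drive per-project history parameters.
    private val storage = ConcurrentHashMap<String, MutableList<ByteArray>>()

    fun createVersion(projectId: String, fileId: String, contents: ByteArray) {
        storage.getOrPut(fileId) { mutableListOf() }.add(contents)
    }

    // The primitive implementation ignores any version argument and simply
    // returns the latest version, or null for an unknown file.
    fun getVersion(fileId: String): ByteArray? = storage[fileId]?.lastOrNull()
}
```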
We store subsequent file versions and list of patches between them. We assume that if we apply patches to version N then we get version N+1. However, we do not check if it is really true.
We have a couple of options:
I think we should proceed with option 2, because it reduces the amount of data exchanged between FE and Cosmas (we send only patches, not the actual file contents). At the same time, we need to apply the accumulated list of patches to version N in any scenario, be it validation or the building of version N+1.
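The patch-application step is the same in both scenarios: fold the accumulated patches over version N, in order. A sketch with patches modeled abstractly as functions (the real implementation would presumably use a text-diff library instead):

```kotlin
// Hypothetical patch representation: in the real service a patch would come
// from a text-diff library; here it is just a function from text to text.
typealias Patch = (String) -> String

// Build version N+1 by applying the accumulated patches to version N in order.
fun applyPatches(versionN: String, patches: List<Patch>): String =
    patches.fold(versionN) { text, patch -> patch(text) }

fun main() {
    val patches = listOf<Patch>(
        { it + "bar" },
        { it + " qux" }
    )
    println(applyPatches("foo", patches)) // foobar qux
}
```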
We can run Cosmas with gradle run; now let's take a step further and package it into a docker container.
We need a Dockerfile which would package the Cosmas distribution (the server component in particular) into a docker image based e.g. on openjdk:8-slim. Please notice that we don't want to run gradle in the container; we want to run the already compiled application. gradle distZip will build a ZIP file with a ready-to-run app and put it into the build/distributions directory.
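A sketch of such a Dockerfile, assuming the distribution archive is named Cosmas.zip and that gradle distZip was already run on the host; the actual archive and script names produced by the build may differ:

```dockerfile
# Sketch only: package the pre-built distribution, do not run gradle here.
FROM openjdk:8-slim
RUN apt-get update && apt-get install -y unzip && rm -rf /var/lib/apt/lists/*
# Assumes the host already ran `gradle distZip`; adjust the archive name to
# match the actual output in build/distributions.
COPY build/distributions/Cosmas.zip /opt/
RUN unzip /opt/Cosmas.zip -d /opt && rm /opt/Cosmas.zip
ENTRYPOINT ["/opt/Cosmas/bin/cosmas-server"]
```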
We already have a lot of println statements in the code. Let's replace them with slf4j+logback based logging.
Cosmas GRPC service currently provides methods to save new file version and get some specific version. We also want to record what happens between versions: who changes what at what time. For that purpose we need a method which takes this information and stores it in-memory or persistently.
Let's start with the in-memory implementation. You have to add a method to the service declaration in the proto file and implement it in the in-memory implementation.
The user who makes the changes is identified by a short string id.
The file is identified by a short string id as well.
The patch itself is a string, but it can be pretty large. For the purpose of this task the contents of that string don't matter.
The modification timestamp is a 64-bit integer.
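Putting those fields together, the request message could look roughly like this (message and field names are illustrative, not the real cosmas.proto definitions):

```proto
// Illustrative sketch only; names are assumptions, not the real proto file.
syntax = "proto3";

message CreatePatchRequest {
    string user_id = 1;   // short string id of the user making the change
    string file_id = 2;   // short string id of the file
    string text = 3;      // the patch itself; may be pretty large
    int64 timestamp = 4;  // modification time, 64-bit integer
}

message CreatePatchResponse {
}
```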
As a starting point, let's create a basic GRPC service definition with just a single file with protocol buffers defining request, response and service itself.
Put the file into src/main/proto/cosmas.proto
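A hedged sketch of what such a starting file could contain, with a single simple RPC; all names here are illustrative:

```proto
// Possible starting point for src/main/proto/cosmas.proto; names are
// illustrative, not the real definitions.
syntax = "proto3";

service Cosmas {
    // A "simple RPC" in grpc-java terms: one request in, one response out.
    rpc CreateVersion (CreateVersionRequest) returns (CreateVersionResponse);
}

message CreateVersionRequest {
    string project_id = 1;
    string file_id = 2;
    bytes content = 3;
}

message CreateVersionResponse {
}
```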
Apparently, when we get a lot of delete file requests in a row (this may happen e.g. when a user deletes a directory), we send the same number of writes to the cemetery, which makes GCS unhappy. We need to batch cemetery writes and/or use a rate-limited way of sending write operations. In the former case we probably need some way to send a batch of delete requests to Cosmas. Currently we're sending delete requests to Cosmas one-by-one despite that they come as a batch in Papeeria.
06:32:48.204 [grpc-default-executor-871] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:48.311 [grpc-default-executor-866] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:48.345 [grpc-default-executor-889] [INFO] [CosmasGoogleCloudService] <<< forcedFileCommit
06:32:48.517 [grpc-default-executor-859] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:48.533 [grpc-default-executor-901] [INFO] [CosmasGoogleCloudService] <<< forcedFileCommit
06:32:48.820 [grpc-default-executor-874] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:49.217 [grpc-default-executor-869] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:49.353 [grpc-default-executor-876] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:49.962 [grpc-default-executor-865] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:50.296 [grpc-default-executor-860] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:50.307 [grpc-default-executor-860] [INFO] [CosmasGoogleCloudService] >>> forcedFileCommit [projectId=XXXXXX, fileId=YYYYYYY]
06:32:50.655 [grpc-default-executor-861] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:50.715 [grpc-default-executor-877] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:51.320 [grpc-default-executor-864] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:51.415 [grpc-default-executor-867] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:52.102 [grpc-default-executor-863] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:32:52.114 [grpc-default-executor-868] [INFO] [CosmasGoogleCloudService] <<< deleteFile
06:33:45.383 [grpc-default-executor-897] [ERROR] [CosmasGoogleCloudService] StorageException happened at Cosmas
com.google.cloud.storage.StorageException: The total number of changes to the object papeeria-eu-prod-cosmas-paid/XXXXXXXXXXXXXXXX-cemetery exceeds the rate limit. Please reduce the rate of create, update, and delete requests.
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:291)
at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:159)
at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:156)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:156)
at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:137)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$deleteFile$1.invoke(CosmasGoogleCloudService.kt:436)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$deleteFile$1.invoke(CosmasGoogleCloudService.kt:65)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging(CosmasGoogleCloudService.kt:51)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging$default(CosmasGoogleCloudService.kt:41)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.deleteFile(CosmasGoogleCloudService.kt:416)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGrpc$MethodHandlers.invoke(CosmasGrpc.java:929)
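The batching idea described above could be sketched as pure logic: buffer the tombstones and flush them as a single cemetery write instead of one write per deleted file. All names here are hypothetical, and the real flush would write to GCS rather than call a lambda:

```kotlin
// Illustrative sketch of coalescing cemetery writes. Instead of N writes to
// the same cemetery object (which trips the GCS per-object rate limit),
// buffer the deletes and emit one batched write.
class CemeteryBatcher(private val writeBatch: (List<String>) -> Unit) {
    private val pending = mutableListOf<String>()

    @Synchronized
    fun delete(fileId: String) {
        pending.add(fileId)
    }

    // Called once per incoming batch of delete requests (or on a timer),
    // producing a single write instead of one write per file.
    @Synchronized
    fun flush() {
        if (pending.isNotEmpty()) {
            writeBatch(pending.toList())
            pending.clear()
        }
    }
}

fun main() {
    var writes = 0
    val batcher = CemeteryBatcher { writes++ }
    repeat(15) { batcher.delete("file-$it") }
    batcher.flush()
    println(writes) // 1 write instead of 15
}
```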
We need some place for storing information about deleted files.
From the user's perspective, when they delete a file they expect its history to be available for a while, and they expect some user interface for accessing the history.
From the Papeeria-FE perspective, when a file is gone, it is gone. There is no information about deleted files in the live Papeeria database.
So we may store a sort of "cemetery" in a GCS bucket managed by Cosmas. It might be a single object per project with a list of records like (file_id, file_name, removal_timestamp). When a user deletes a file, we append a new record to the cemetery and write this object to GCS. This information can later be used for building a user interface showing the history of deleted files.
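The per-project cemetery object described above could be modeled roughly like this (names are illustrative; in the real service the structure would be serialized and written back to the Cosmas-managed GCS bucket):

```kotlin
// Sketch of the cemetery record and the append-on-delete operation.
data class CemeteryRecord(
    val fileId: String,
    val fileName: String,
    val removalTimestamp: Long
)

data class Cemetery(val records: MutableList<CemeteryRecord> = mutableListOf()) {
    // Appending a record is all a delete needs to do before the cemetery
    // object is written back to GCS.
    fun bury(fileId: String, fileName: String, timestamp: Long) {
        records.add(CemeteryRecord(fileId, fileName, timestamp))
    }
}
```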
So now we have some code which can safely delete patches and we need a convenient API for it.
When a user rejects some patch in the browser client, they just click a button on the corresponding changes. After that we need to identify the patch which needs to be deleted and send a request to the Papeeria server and eventually to Cosmas.
We thus need some way of patch identification and a GRPC method "deletePatch" which will accept the patch identifier, delete the patch appropriately and return the resulting text.
Perhaps a pair (file_id, timestamp) or a triple (project_id, file_id, timestamp) is a good patch identifier, provided that all timestamps for the given file are different. However, if they are not, then we can build the patch id from the file generation number and the patch position, or from user_id and an MD5 hash of the patch text.
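The fallback scheme based on user_id plus an MD5 hash of the patch text can be computed with the JDK alone; a sketch (the id format "userId:hash" is just an assumption):

```kotlin
import java.security.MessageDigest

// Build a patch identifier from user_id plus an MD5 hash of the patch text,
// one of the fallback schemes for when timestamps collide.
fun patchId(userId: String, patchText: String): String {
    val md5 = MessageDigest.getInstance("MD5")
        .digest(patchText.toByteArray(Charsets.UTF_8))
        .joinToString("") { "%02x".format(it) }
    return "$userId:$md5"
}

fun main() {
    // Same input always yields the same id; different patch texts diverge.
    println(patchId("alice", "@@ -1,3 +1,6 @@"))
}
```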
If launched with --help, Cosmas shows a not really helpful message. I believe there must be a way to print a nice help text.
build/install/Cosmas/bin/cosmas-server --help
Exception in thread "main" com.xenomachina.argparser.ShowHelpException: Help was requested
at com.xenomachina.argparser.ArgParser$1.invoke(ArgParser.kt:608)
at com.xenomachina.argparser.ArgParser$1.invoke(ArgParser.kt:38)
at com.xenomachina.argparser.OptionDelegate.parseOption(OptionDelegate.kt:61)
at com.xenomachina.argparser.ArgParser.parseLongOpt(ArgParser.kt:567)
at com.xenomachina.argparser.ArgParser.access$parseLongOpt(ArgParser.kt:38)
at com.xenomachina.argparser.ArgParser$parseOptions$2.invoke(ArgParser.kt:493)
at com.xenomachina.argparser.ArgParser$parseOptions$2.invoke(ArgParser.kt:38)
at kotlin.SynchronizedLazyImpl.getValue(Lazy.kt:131)
at com.xenomachina.argparser.ArgParser.getParseOptions(ArgParser.kt)
at com.xenomachina.argparser.ArgParser.force(ArgParser.kt:448)
at com.xenomachina.argparser.DefaultKt$default$3.getValue(Default.kt:73)
at com.xenomachina.argparser.ArgParser$Delegate.getValue(ArgParser.kt:342)
at com.bardsoftware.papeeria.backend.cosmas.CosmasServerArgs.getPort(CosmasServer.kt)
at com.bardsoftware.papeeria.backend.cosmas.CosmasServerKt.main(CosmasServer.kt:55)
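If I read the kotlin-argparser docs correctly, the intended pattern is to wrap main in mainBody, which catches ShowHelpException (a SystemExitException) and prints the generated help text instead of leaking a stack trace. A sketch with illustrative option names, not the real CosmasServer code:

```kotlin
import com.xenomachina.argparser.ArgParser
import com.xenomachina.argparser.default
import com.xenomachina.argparser.mainBody

// Sketch only: option names and the default port are assumptions.
class CosmasServerArgs(parser: ArgParser) {
    val port by parser.storing("--port", help = "port to listen on") { toInt() }
        .default(9805)
}

fun main(args: Array<String>) = mainBody {
    // mainBody intercepts ShowHelpException thrown during parsing and prints
    // the formatted help, so --help exits cleanly.
    val parsedArgs = ArgParser(args).parseInto(::CosmasServerArgs)
    println("starting Cosmas on port ${parsedArgs.port}")
}
```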
We have paid projects (those whose owner is a paying user) and free projects and we want file versions to have different lifetime for them, e.g. 24h for free projects and 30d for paid projects. The lifetime is configured per-bucket => we need to store file versions in different buckets for paid and free projects.
We had SSL issues with GCS and our requests failed like this:
11:03:18.955 [grpc-default-executor-627] [ERROR] [CosmasGoogleCloudService] Error while applying patches [fileId=0
d2974935531a8c1bc3965480efa5a46]
com.google.cloud.storage.StorageException: Remote host closed connection during handshake
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:220)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.create(HttpStorageRpc.java:291)
at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:159)
at com.google.cloud.storage.StorageImpl$3.call(StorageImpl.java:156)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:89)
at com.google.cloud.RetryHelper.run(RetryHelper.java:74)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:51)
at com.google.cloud.storage.StorageImpl.internalCreate(StorageImpl.java:156)
at com.google.cloud.storage.StorageImpl.create(StorageImpl.java:137)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.commitFromMemoryToGCS(CosmasGoogleClo
udService.kt:231)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.access$commitFromMemoryToGCS(CosmasGo
ogleCloudService.kt:65)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$commitVersion$1.invoke(CosmasGoogleCl
oudService.kt:180)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService$commitVersion$1.invoke(CosmasGoogleCl
oudService.kt:65)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging(CosmasGoogleCloudService.kt
:51)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudServiceKt.logging$default(CosmasGoogleCloudSe
rvice.kt:41)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGoogleCloudService.commitVersion(CosmasGoogleCloudServic
e.kt:151)
at com.bardsoftware.papeeria.backend.cosmas.CosmasGrpc$MethodHandlers.invoke(CosmasGrpc.java:921)
At the same time the CPU and disk usage graphs looked like this.
Is it possible that the retry policy was configured wrong and we eventually collected a big queue of requests which were all retrying (=> failing, logging, etc.)?
Here is a summary of the process which we can support now:
Time         t1        t2        t3         t4        t5        t6
Alice types  "foo"                                    "baz"
Bob types              "bar"                "qux"
Cosmas does  +patch    +patch    +version   +patch    +patch    +version
             "foo"     "bar"     "foobar"   "qux"     "baz"     "foobar quxbaz"
We have versions and we have a separate patch list. However, we need them to be synchronized, because we want to support the following user scenario: Alice takes version produced at time t3 and applies patches one-by-one.
Arguably the best way to get consistent version and patch list to the next version is to keep patch list and version together in the same structure.
So, the plan: replace file contents which is written to GCS on commit version request with file contents + accumulated patch list structure (which might be a protocol buffer serialized to bytes). Then we empty the patch lists for the committed files and start accumulating patches again.
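The combined structure could be sketched roughly as a protocol buffer like this (illustrative names only, not the real cosmas.proto):

```proto
// Illustrative only: a possible shape for the object written to GCS on
// commit, keeping the version content and its patch history together.
syntax = "proto3";

message Patch {
    string user_id = 1;
    string text = 2;
    int64 timestamp = 3;
}

message FileVersion {
    bytes content = 1;          // full file contents at this version
    repeated Patch patches = 2; // patches accumulated since the previous version
}
```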
We already use slf4j and logback for logging and we have logback.xml in our resources. What do we need to use them properly:
- an appender which writes to /var/log/cosmas/cosmas.log and employs a rolling policy
- <logger> records (at least one for CosmasGoogleCloudService) using that new appender
- log statements like LOG.info("commitVersion"). Together with MDC and the pattern it will nicely look in the log file like this: commitVersion: projectId=123456 fileId= and will be greppable

We want to implement the following scenario to support real-time collaboration of many users in the same project:
This way we 1) will not create new versions very frequently and 2) will save a set of possibly related changes in different files in the same moment.
This requires changing the behavior (and most likely the name) of the CreateVersion request: it should buffer the updates instead of writing them immediately, and adding a new method CommitVersion which will actually write all buffered updates in the given project.
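The buffering behavior described above could be sketched like this (class and method names are illustrative, and the storage write is modeled as a lambda rather than the real GCS call):

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Sketch of buffering updates per project and writing them only on commit.
class BufferingCosmas(
    private val writeToStorage: (String, Map<String, ByteArray>) -> Unit
) {
    private val buffers = ConcurrentHashMap<String, MutableMap<String, ByteArray>>()

    // The former "CreateVersion": just remember the latest update per file.
    fun bufferUpdate(projectId: String, fileId: String, content: ByteArray) {
        buffers.getOrPut(projectId) { ConcurrentHashMap() }[fileId] = content
    }

    // CommitVersion: write all buffered updates of the project in one go.
    fun commitVersion(projectId: String) {
        buffers.remove(projectId)?.let { writeToStorage(projectId, it) }
    }
}

fun main() {
    var commits = 0
    val cosmas = BufferingCosmas { _, files ->
        commits++
        println(files.keys.sorted())
    }
    cosmas.bufferUpdate("p1", "a.tex", "x".toByteArray())
    cosmas.bufferUpdate("p1", "b.tex", "y".toByteArray())
    cosmas.commitVersion("p1")
    println(commits) // 1: both files went out in a single commit
}
```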