
exasol / bucketfs-java


Bucket FS client library in Java

License: MIT License

Java 99.84% Python 0.16%
bucketfs exasol-integration foundation-library java

bucketfs-java's People

Contributors

ckunki, dejanmihajlovic, jakobbraun, kaklakariada, morazow, pj-spoelders, redcatbear


Forkers

rohankumardubey

bucketfs-java's Issues

Support TLS

Situation

BucketFS supports TLS, and using encryption in production is important. The library should support that feature.

Acceptance Criteria

  1. BucketFS Java supports TLS
  2. Handbook updated to explain preconditions (like where the certificates come from)

Fix code smells in 2.0.0

Situation

Version 2.0.0 showed 4 code smells in the SONAR report.

Acceptance Criteria

  • Zero code smells on SONAR.

Distinguish better between error causes

Situation

Method "getHttp()" wraps HTTP status codes into an I/O error. This is not only conceptually wrong, but also prevents proper distinction between error causes.

Acceptance Criteria

  • Download and upload errors are distinguished
  • 404 is reported as "file not found"
  • 403 is reported as "access denied"
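The mapping above can be sketched as follows; the class, method, and message wording are illustrative, not the library's actual API:

```java
// Sketch: map HTTP status codes to distinct error causes instead of
// wrapping everything into a generic I/O error.
// Names and messages are hypothetical, for illustration only.
class StatusMapper {
    static String describeStatus(final int statusCode, final String path) {
        switch (statusCode) {
        case 404:
            return "File not found: " + path;
        case 403:
            return "Access denied: " + path;
        default:
            return "Unexpected HTTP status " + statusCode + " for: " + path;
        }
    }
}
```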

"null/default" in log message

Problem

I think this log message is weird:

INFO: Uploading "file postgresql.jar" to bucket "null/default": "http://localhost:32770/default/postgresql.jar"

Points to fix:

  1. "file postgresql.jar" -> file "postgresql.jar"
  2. "null/default" -> "default"

I'm not sure where the null comes from. I create an object like this and it works fine:

   final WriteEnabledBucket bucket = WriteEnabledBucket.builder() //
           .ipAddress("localhost") //
           .httpPort(32770) //
           .name("default") //
           .writePassword("password") //
           .build();
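A defensive formatting of the bucket path could avoid the "null" prefix when no BucketFS service name was set; the class and method names below are hypothetical, not the library's actual code:

```java
// Sketch: format the "service/bucket" path without a "null" prefix when
// the BucketFS service name was never set. Names are illustrative.
class BucketPathFormatter {
    static String format(final String serviceName, final String bucketName) {
        return (serviceName == null || serviceName.isEmpty())
                ? bucketName
                : serviceName + "/" + bucketName;
    }
}
```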

Creating new buckets

Problem

As a user, I'd like to be able to create new buckets via this library.

Conditional upload

Uploading large files can be slow. To avoid slowing down your tests, BFSJ should be able to check whether the file already exists and compare checksums. It will then only upload the file if the checksums differ.
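The check could look roughly like this; how the remote checksum is obtained is left open, so both sides are passed in as byte arrays for illustration:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of the conditional-upload check: compute a checksum of the local
// content and skip the upload when it matches the remote one.
// The retrieval of the remote content/checksum is assumed to exist elsewhere.
class ConditionalUpload {
    static String sha256(final byte[] content) {
        try {
            final MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content));
        } catch (final NoSuchAlgorithmException exception) {
            throw new IllegalStateException("SHA-256 not available", exception);
        }
    }

    static boolean uploadRequired(final byte[] local, final byte[] remote) {
        return !sha256(local).equals(sha256(remote));
    }
}
```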

Fix CVE-2023-42503 in `org.apache.commons:commons-compress`

Error:  Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (default-cli) on project bucketfs-java: Detected 1 vulnerable components:
Error:    org.apache.commons:commons-compress:jar:1.22:test; https://ossindex.sonatype.org/component/pkg:maven/org.apache.commons/[email protected]?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
Error:      * [CVE-2023-42503] CWE-20: Improper Input Validation (5.5); https://ossindex.sonatype.org/vulnerability/CVE-2023-42503?component-type=maven&component-name=org.apache.commons%2Fcommons-compress&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1

Add specification and design

Acceptance Criteria

  • User requirement specification exists
  • Design document exists
  • Requirement tracing into implementation and tests is green

replace copied classes by classes from exasol-testcontainers

Issue #44 required some interface changes.

bucketfs-java's integration tests use exasol-testcontainers.
In order to test the changed interface, bucketfs-java needed to replicate some of the classes of exasol-testcontainers with a changed implementation.

After

  1. releasing bucketfs-java ✔️
  2. migrating exasol-testcontainers to bucketfs-java version 2.3.0 ✔️
  3. releasing a new version v2 of exasol-testcontainers ⚠️

... the replicated classes can be removed and bucketfs-java can use the original classes from dependency to exasol-testcontainers v2.

This applies to the following classes in bucketfs-java:

  • LogBasedBucketFsMonitor
  • TimestampState
  • TimestampRetriever
  • TimestampStateTest

Adapt to Exasol version 8

Build with Exasol 8 is currently failing.

The goal is to get tests running with Exasol 8.

Error message:

E-BFSJ-11: Unable to list contents of 'EXAClusterOS/' in bucket 'http://localhost:32770/default/': No such file or directory.

Prepare for release with RD 0.5.0

Situation

Release Droid 0.5.0 learned to post announcements on the community channel. For this to work, the project repository needs a couple of modifications.

Acceptance Criteria

  1. Successfully released with 0.5.0
  2. Community draft exists

Sync interval includes upload

Currently, BUCKET_SYNC_TIMEOUT_IN_MILLISECONDS also includes the time the upload takes. For large files over slow internet connections this quickly times out (the current timeout is 60 s).
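A possible fix, sketched with illustrative names: start the synchronization budget only once the upload has finished, so a slow upload does not eat into the sync timeout.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: measure the sync timeout from the moment the upload finished,
// not from the moment the upload started. Names are illustrative.
class SyncTimeout {
    static boolean withinSyncBudget(final Instant uploadFinished, final Instant now,
            final Duration budget) {
        return Duration.between(uploadFinished, now).compareTo(budget) <= 0;
    }
}
```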

Release new major version 3.0.0

Rationale: Breaking changes in 2.6.0 should be clearly indicated by an update of the major version:

  • renamed class BucketFsSerivceConfigurationProvider to BucketFsServiceConfigurationProvider
  • renamed method ReadEnabledBucket.Builder.ipAddress() to host()

Additional actions:

  • Tag release 2.6.0 as "Pre-Release" in GitHub.

Reformat user guide

Currently, the IntelliJ IDEA code formatter breaks the code formatting in the Java listings in the user_guide. After this is fixed, we should reformat the file.

File overwrite error with Exasol 6.2.14

Situation

When running a matrix test against Exasol 6.2.14 the sync check fails with the following error:

2021-03-29 14:05:03.083 [WARNING] Leaving container running since reuse is enabled. Don't forget to stop and remove the container manually using docker rm -f CONTAINER_ID. 
[ERROR] Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 218.208 s <<< FAILURE! - in com.exasol.bucketfs.SyncAwareBucketIT
[ERROR] testReplaceFile{Path}  Time elapsed: 5.61 s  <<< FAILURE!
java.lang.AssertionError: 
Upload number 2: file B
Expected: "abcdeABCDE\n"
     but: was "0123456789\n"
	at com.exasol.bucketfs.SyncAwareBucketIT.testReplaceFile(SyncAwareBucketIT.java:156)

Acceptance Criteria

  • Overwrite test green with Exasol 6.2.14

List Files and Folders Hierarchically

When BucketFS contains multiple files with a common prefix (a "folder"), the list returned by ReadEnabledBucket.parseContentListResponseBody() contains the common folder multiple times.

AC

  • the list returned by ReadEnabledBucket.parseContentListResponseBody() contains each common folder only once
  • folders are represented with a suffix to distinguish files and folders sharing the same name
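Deduplicating the common prefixes and marking folders with a trailing slash could be sketched like this (illustrative names, not the actual implementation):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: reduce a flat path list to unique top-level entries, keeping the
// trailing slash on folders so a file and a folder with the same name stay
// distinguishable. Names are illustrative.
class ListingDedup {
    static List<String> listTopLevel(final List<String> paths) {
        return paths.stream()
                .map(path -> {
                    final int slash = path.indexOf('/');
                    return slash < 0 ? path : path.substring(0, slash + 1);
                })
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }
}
```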

Enable listing buckets

Currently the list operation is implemented in class ReadEnabledBucket.
Listing buckets, in contrast to paths inside a bucket, requires nearly the same implementation.
The source needs to be refactored to move parts of the listing implementation into a separate class to enable reuse.

Extract BucketFS code from exasol-testcontainers

The BucketFS implementation is also required for other execution models, for example a cluster on AWS.

Right now, for such use cases we have to reimplement the uploading and checking code again and again.

It would be very helpful to extract the BucketFS code into a separate module and add an interface for accessing the log file.

One could go even further and extract an ExasolTestInterface with the public API of this package and make it independent of the test backend (Testcontainers / local instance started via docker-compose / cluster on AWS / cluster on Azure / ...).

Fix vulnerabilities in org.apache.commons:commons-compress:jar:1.24.0:test

Error:  Failed to execute goal org.sonatype.ossindex.maven:ossindex-maven-plugin:3.2.0:audit (default-cli) on project bucketfs-java: Detected 1 vulnerable components:
Error:    org.apache.commons:commons-compress:jar:1.24.0:test; https://ossindex.sonatype.org/component/pkg:maven/org.apache.commons/[email protected]?utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
Error:      * [CVE-2024-25710] CWE-835: Loop with Unreachable Exit Condition ('Infinite Loop') (8.1); https://ossindex.sonatype.org/vulnerability/CVE-2024-25710?component-type=maven&component-name=org.apache.commons%2Fcommons-compress&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1
Error:      * [CVE-2024-26308] CWE-770: Allocation of Resources Without Limits or Throttling (7.5); https://ossindex.sonatype.org/vulnerability/CVE-2024-26308?component-type=maven&component-name=org.apache.commons%2Fcommons-compress&utm_source=ossindex-client&utm_medium=integration&utm_content=1.8.1

Remove duplicate classes from dependencies

In exasol/exasol-testcontainers#224 we found that this project contains dependencies that bring the same classes:

[INFO] --- duplicate-finder-maven-plugin:1.5.1:check (default) @ trino-exasol ---
[INFO] Checking compile classpath
[INFO] Checking runtime classpath
[INFO] Checking test classpath
[WARNING] Found duplicate (but equal) classes in [com.exasol:bucketfs-java:3.0.0, com.exasol:exasol-testcontainers:6.5.1]:
[WARNING]   com.exasol.config.BucketConfiguration
[WARNING] Found duplicate (but equal) classes in [jakarta.json:jakarta.json-api:2.1.1, org.glassfish:jakarta.json:2.0.1]:
[WARNING]   jakarta.json.JsonArray
[WARNING]   jakarta.json.JsonArrayBuilder
[WARNING]   jakarta.json.JsonBuilderFactory
[WARNING]   jakarta.json.JsonMergePatch
[WARNING]   jakarta.json.JsonNumber
[WARNING]   jakarta.json.JsonObject
[WARNING]   jakarta.json.JsonObjectBuilder
[WARNING]   jakarta.json.JsonPatch
[WARNING]   jakarta.json.JsonPatchBuilder
[WARNING]   jakarta.json.JsonPointer
[WARNING]   jakarta.json.JsonReader
[WARNING]   jakarta.json.JsonReaderFactory
[WARNING]   jakarta.json.JsonString
[WARNING]   jakarta.json.JsonStructure
[WARNING]   jakarta.json.JsonValue
[WARNING]   jakarta.json.JsonWriter
[WARNING]   jakarta.json.JsonWriterFactory
[WARNING]   jakarta.json.stream.JsonGenerator
[WARNING]   jakarta.json.stream.JsonGeneratorFactory
[WARNING]   jakarta.json.stream.JsonLocation
[WARNING]   jakarta.json.stream.JsonParserFactory
[WARNING] Found duplicate and different classes in [com.exasol:bucketfs-java:3.0.0, com.exasol:exasol-testcontainers:6.5.1]:
[WARNING]   com.exasol.config.BucketFsServiceConfiguration
[WARNING] Found duplicate and different classes in [jakarta.json:jakarta.json-api:2.1.1, org.glassfish:jakarta.json:2.0.1]:
[WARNING]   jakarta.json.EmptyArray
[WARNING]   jakarta.json.EmptyObject
[WARNING]   jakarta.json.Json
[WARNING]   jakarta.json.JsonException
[WARNING]   jakarta.json.JsonValueImpl
[WARNING]   jakarta.json.spi.JsonProvider
[WARNING]   jakarta.json.stream.JsonCollectors
[WARNING]   jakarta.json.stream.JsonGenerationException
[WARNING]   jakarta.json.stream.JsonParser
[WARNING]   jakarta.json.stream.JsonParsingException
[WARNING] Found duplicate classes/resources in test classpath.

SyncAwareBucket: improve strategy to verify file upload

Currently SyncAwareBucket validates the upload of files by searching for a timestamp in the log file (e.g. /exa/logs/cored/bucketfsd.*.log), considering the following constraints:

  • adjust for potentially different time zone settings
  • repeated upload to the same file path, overwriting older versions of the file
  • wait until COS has synchronized the file to all nodes of the cluster
  • accuracy of log file timestamps is only 1 second.

This requires some prerequisites to be met:

  • switch off log rotation to ensure to have only a single log file
  • ensure sufficient permissions to write and read log files

Sample error message in case SyncAwareBucket fails to verify successful upload of a file:

Timeout waiting for object 'replace_me.txt' to be synchronized 
in bucket 'bfsdefault/default' after 2022-08-11T06:52:22.312Z.

To handle repeated upload to the same file path, overwriting older versions of the file:

  • exasol-testcontainers maintains a list of files uploaded in the current session to identify repeated uploads to the same file path.
  • In this case SyncAwareBucket needs to identify the latest upload.
  • As the accuracy of log file timestamps is only 1 second, SyncAwareBucket waits for a second in edge cases.

The current ticket proposes to change SyncAwareBucket:

  • to not look for timestamps anymore
  • but rather to remember the line number in the log file.

This could simplify the detection and remove some constraints and potential sources of mistakes:

  • time zone settings
  • accuracy of log file timestamps
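The proposed line-number approach could be sketched like this; the log format and the matching are illustrative placeholders, not the actual monitor implementation:

```java
import java.util.List;

// Sketch of the proposal: remember how many log lines existed before the
// upload, then only scan lines appended after that point, instead of
// comparing timestamps. Matching on the file name is a placeholder.
class LineBasedMonitor {
    static boolean uploadLogged(final List<String> logLines, final int linesBeforeUpload,
            final String fileName) {
        return logLines.stream()
                .skip(linesBeforeUpload)
                .anyMatch(line -> line.contains(fileName));
    }
}
```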

Wrong error message on failed download

Situation

If a download fails, the error message claims that an upload failed (copy-and-paste error).

com.exasol.bucketfs.BucketAccessException: Unable to upload file "/tmp/junit16962328828293423090" from  URI: http://127.0.0.1:32770/default/test.txt
   at com.exasol.bucketfs.ReadEnabledBucket.downloadFile(ReadEnabledBucket.java:131)

Acceptance Criteria

  • Error message correctly states that download failed.
