Associated blog: https://aws.amazon.com/blogs/machine-learning/use-amazon-sagemaker-feature-store-in-a-java-environment/
This example has been tested on a SageMaker Notebook Instance.
git clone https://github.com/aws-samples/amazon-sagemaker-feature-store-in-java.git
- Install Maven by running the commands below in a new terminal:
cd /opt
sudo wget https://apache.osuosl.org/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
sudo tar xzvf apache-maven-3.6.3-bin.tar.gz
export PATH=/opt/apache-maven-3.6.3/bin:$PATH
- Back in the cloned folder, run the following commands:
git fetch --all
git checkout main
cd Java
mvn compile; mvn exec:java -Dexec.mainClass="com.example.sage.FeatureStoreAPIExample"
Versions in the tested environment (output of java -version and mvn -version):
openjdk 11.0.8-internal 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc..src)
OpenJDK 64-Bit Server VM (build 11.0.8-internal+0-adhoc..src, mixed mode)
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /opt/apache-maven-3.6.3
Java version: 11.0.8-internal, vendor: N/A, runtime: /home/ec2-user/anaconda3/envs/JupyterSystemEnv
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.14.214-118.339.amzn1.x86_64", arch: "amd64", family: "unix"
AWS SageMaker SDK for Java documentation (Doc)
AWS SageMaker Feature Store SDK for Python documentation (Doc)
- Read and infer csv data
- Make feature definitions
- Make feature records
- Delete featureGroups that collide with current featureGroup names
- Create new featureGroups with current definitions
- Ingest (put) records into featureGroups using multi-threading
- List feature groups
- Describe featureGroups and get records
- Delete existing records
- Delete created featureGroups
- Check Offline Store (Optional - code included)
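
The first two steps (reading the CSV and deriving feature definitions) can be illustrated with a small, self-contained sketch. It is not the repository's implementation: the class and method names (FeatureTypeInference, inferValue, inferColumn, inferSchema) are hypothetical, and the regex-based rules are an assumption about how a column might be mapped to one of the three Feature Store feature types (Integral, Fractional, String).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: infers a feature type per CSV column (not the repo's code).
final class FeatureTypeInference {

    // The three feature types Feature Store supports, ordered narrowest to widest.
    enum FeatureType { INTEGRAL, FRACTIONAL, STRING }

    // Assumed rules: plain signed digits are integral; decimals and exponents are fractional.
    private static final Pattern INTEGRAL_PATTERN = Pattern.compile("[+-]?\\d+");
    private static final Pattern FRACTIONAL_PATTERN =
            Pattern.compile("[+-]?(\\d+(\\.\\d*)?|\\.\\d+)([eE][+-]?\\d+)?");

    // Classifies a single cell value.
    static FeatureType inferValue(String value) {
        String v = value.trim();
        if (INTEGRAL_PATTERN.matcher(v).matches()) return FeatureType.INTEGRAL;
        if (FRACTIONAL_PATTERN.matcher(v).matches()) return FeatureType.FRACTIONAL;
        return FeatureType.STRING;
    }

    // A column takes the widest type seen among its non-empty values.
    static FeatureType inferColumn(List<String> values) {
        FeatureType widest = null;
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) continue; // skip missing cells
            FeatureType t = inferValue(v);
            if (widest == null || t.compareTo(widest) > 0) widest = t;
        }
        return widest == null ? FeatureType.STRING : widest; // all-empty column: fall back to STRING
    }

    // rows.get(0) is the header; remaining rows are data. Column order is preserved.
    static Map<String, FeatureType> inferSchema(List<String[]> rows) {
        Map<String, FeatureType> schema = new LinkedHashMap<>();
        if (rows.isEmpty()) return schema;
        String[] header = rows.get(0);
        for (int col = 0; col < header.length; col++) {
            List<String> column = new ArrayList<>();
            for (String[] row : rows.subList(1, rows.size())) {
                if (col < row.length) column.add(row[col]);
            }
            schema.put(header[col].trim(), inferColumn(column));
        }
        return schema;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<String[]> rows = Arrays.asList(
                "customer_id,amount,city".split(","),
                "1,19.99,Seattle".split(","),
                "2,5,Austin".split(","));
        inferSchema(rows).forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}
```

With the sample rows above, customer_id is inferred as INTEGRAL, amount as FRACTIONAL (one value has a decimal point, so the column is widened), and city as STRING. In a real run each inferred type would then be mapped onto a feature definition for the CreateFeatureGroup request.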
- CreateFeatureGroupRequest (Doc)
- ListFeatureGroupsRequest (Doc)
- DescribeFeatureGroupRequest (Doc)
- DeleteFeatureGroupRequest (Doc)
- GetRecordRequest (Doc)
- PutRecordRequest (Doc)
- DeleteRecordRequest (Doc)
- CheckOfflineStorage
The FeatureStoreAPIExample class uses custom utility classes for feature group and record operations:
- CsvIO class
  - readCSVIntoList
- FeatureGroupOperations class
  - createFeatureGroups
  - deleteFeatureGroups
  - getAllFeatureGroups
  - describeFeatureGroups
  - getRecord
  - deleteRecord
  - runFeatureGroupGetTests
  - deleteExistingFeatureGroups
- FeatureGroupRecordOperations class
  - getStringTimeStamp
  - buildRecord
  - makeRecordsList
  - makeColumnDefinitions
  - getDataType
- Ingest class (extends Thread class)
  - getIngestMetrics
  - ingestRecords
  - putRecordsIntoFG
  - getNumIngested
  - deepCopy
  - run
  - batchIngest
- PerfMetrics class
  - percentile
  - startTimer
  - endTimer
  - addInterval
  - addMultiIntervals
  - printMetrics
  - getLatencies
  - getTotalTime
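
The Ingest and PerfMetrics utilities listed above combine thread-based ingestion with latency measurement. The sketch below shows one way such a pattern could look; it is an illustration under assumptions, not the repository's code. The names ThreadedIngestSketch, Worker, batchIngest, and percentile are hypothetical here, records are plain strings, and the putRecord callback stands in for the real Feature Store PutRecord call so that the example runs without AWS credentials or the SDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: slice records across threads and time each put.
final class ThreadedIngestSketch {

    // One worker ingests one slice and records the latency of every put.
    static final class Worker extends Thread {
        private final List<String> slice;
        private final Consumer<String> putRecord;
        final List<Long> latenciesNanos = new ArrayList<>();

        Worker(List<String> slice, Consumer<String> putRecord) {
            this.slice = slice;
            this.putRecord = putRecord;
        }

        @Override
        public void run() {
            for (String record : slice) {
                long start = System.nanoTime();
                putRecord.accept(record); // stand-in for the real PutRecord call
                latenciesNanos.add(System.nanoTime() - start);
            }
        }
    }

    // Splits records into at most numThreads contiguous slices and returns all latencies.
    static List<Long> batchIngest(List<String> records, int numThreads, Consumer<String> putRecord) {
        if (numThreads <= 0) throw new IllegalArgumentException("numThreads must be positive");
        int chunk = (records.size() + numThreads - 1) / numThreads; // ceiling division
        List<Worker> workers = new ArrayList<>();
        for (int start = 0; start < records.size(); start += chunk) {
            int end = Math.min(start + chunk, records.size());
            Worker worker = new Worker(records.subList(start, end), putRecord);
            workers.add(worker);
            worker.start();
        }
        List<Long> all = new ArrayList<>();
        for (Worker worker : workers) {
            try {
                worker.join(); // after join, the worker's latency list is safe to read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for ingest", e);
            }
            all.addAll(worker.latenciesNanos);
        }
        return all;
    }

    // Nearest-rank percentile: smallest value with at least p percent of values at or below it.
    static long percentile(List<Long> values, double p) {
        if (values.isEmpty()) throw new IllegalArgumentException("no values");
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) records.add("record-" + i);
        AtomicInteger ingested = new AtomicInteger();
        // The lambda only counts records; a real callback would send them to Feature Store.
        List<Long> latencies = batchIngest(records, 4, r -> ingested.incrementAndGet());
        System.out.println("ingested=" + ingested.get()
                + " p50(ns)=" + percentile(latencies, 50)
                + " p99(ns)=" + percentile(latencies, 99));
    }
}
```

Each worker keeps its own latency list and the lists are merged only after join, so no locking is needed around the measurements. The shared counter in main uses AtomicInteger because it is updated from several threads at once.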