
aws-solutions / aws-data-lake-solution


A deployable reference implementation, intended to address pain points around conceptualizing data lake architectures, that automatically configures the core AWS services necessary to easily tag, search, share, and govern specific subsets of data across a business or with other external businesses.

Home Page: https://aws.amazon.com/solutions/implementations/data-lake-solution/

License: Apache License 2.0

JavaScript 90.00% Shell 1.52% HTML 7.80% CSS 0.68%

aws-data-lake-solution's People

Contributors

beomseoklee, celijose, georgebearden, shsenior, tomnight


aws-data-lake-solution's Issues

Deprecation warnings when running run-unit-tests.sh

./run-unit-tests.sh

[Init] Clean old dist and node_modules folders

find /home/ec2-user/environment/aws-data-lake-solution/deployment/../source -iname node_modules -type d -exec rm -r {} ; 2> /dev/null
find /home/ec2-user/environment/aws-data-lake-solution/deployment/../source -iname dist -type d -exec rm -r {} ; 2> /dev/null
find ../ -type f -name 'package-lock.json' -delete

[Test] Helper

npm WARN deprecated [email protected]: CoffeeScript on NPM has moved to "coffeescript" (no hyphen)
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: This package is unmaintained. Use @sinonjs/formatio instead
npm WARN deprecated [email protected]: This package has been deprecated in favour of @sinonjs/samsam
npm WARN deprecated [email protected]: no longer maintained
npm WARN deprecated [email protected]: please upgrade to graceful-fs 4 for compatibility with current and future versions of Node.js
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: connect 2.x series is deprecated
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
npm WARN deprecated [email protected]: Use uuid module instead
npm WARN deprecated [email protected]: This module moved to @hapi/hawk. Please make sure to switch over as this distribution is no longer supported and may contain bugs and critical security issues.
npm WARN deprecated [email protected]: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial).
npm WARN deprecated [email protected]: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial).
npm WARN deprecated [email protected]: This module moved to @hapi/sntp. Please make sure to switch over as this distribution is no longer supported and may contain bugs and critical security issues.
npm WARN deprecated [email protected]: This version has been deprecated in accordance with the hapi support policy (hapi.im/support). Please upgrade to the latest version to get the best features, bug fixes, and security patches. If you are unable to upgrade at this time, paid support is available for older versions (hapi.im/commercial).
npm notice created a lockfile as package-lock.json. You should commit this file.
npm WARN [email protected] requires a peer of elasticsearch@^13.2.0 but none is installed. You must install peer dependencies yourself.
npm WARN [email protected] requires a peer of sinon@>=4.0.0 <8.0.0 but none is installed. You must install peer dependencies yourself.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

added 1018 packages from 1662 contributors and audited 4976 packages in 28.993s
found 93 vulnerabilities (17 low, 55 moderate, 21 high)
run npm audit fix to fix them, or npm audit for details

[email protected] test /home/ec2-user/environment/aws-data-lake-solution/source/resources/helper
mocha lib/*.spec.js

"Delete a package" documentation gives conflicting info

http://docs.awssolutionsbuilder.com/data-lake/user-guide/working-with-packages/#delete-a-package for version 2.0 has conflicting information.

The text says:
"Deleting a package will remove it from the data lake, it will not delete any files from Amazon S3."
The screenshot shows:
"Deleting this package will remove this entry from the data lake and delete the dataset files from Amazon S3.

I believe the text reflects earlier Data Lake Solution behavior, and should be corrected ("it will not delete" > "and will delete"), as deleting a package does indeed now remove the files from S3 as well.

API calls suddenly unauthorized.

  1. Deployed Data Lake 2.1 with myself as Data Lake Admin.
  2. Generated an API access key for myself using the web console.
  3. Generated the API secret key for myself using the console.
  4. Deployed a custom Python Lambda function that makes a REST call (using the requests package) to the Data Lake API Gateway using the above keys (yes, I calculate the AWS Version 4 signature). This has worked in the past. My access key in the web console matches my access key in the DynamoDB table 'data-lake-keys'. My secret key generated in the web console does NOT match the secret key listed on my Cognito user. I did see a similar post on non-ADFS deployments that says there is something missing between the API and the authorizer function - but that was back in 2018.

Blank Manifest files generated from Cart

We have several data packages within our Data Lake -- a mix of data packages created with manifest files that point to individual files in S3:

{
    "dataStore": [
        {
            "includePath": "s3://my-bucket/test/test.csv"
        },
        {
            "includePath": "s3://my-bucket/test/test2.csv"
        },
        {
            "includePath": "s3://my-bucket/test/test3.csv"
        }
    ]
}

as well as manifests that just have an include path to a "subfolder" that contains files:

{
    "dataStore": [
        {
            "includePath": "s3://my-bucket/test/"
        }
    ]
}

In both cases the Glue crawler runs successfully. For packages created with a manifest that lists each individual file, we see the individual files listed as tables in the 'Integrations' tab for the data package. For data packages created with a manifest that points to just a "subfolder" within the bucket containing multiple files, a single table appears in the Integrations tab. Exploring this table via the Glue link or the Athena query view suggests it consolidates the records across the three files into a single table, even when the files share only some common fields in their schema and are not completely identical. Is this expected?

However, our real question/issue is that when we add these two data packages to our cart and generate S3 signed URL manifests, what we get are essentially blank manifests, with only the following content:

{"entries":[]}

fs.writeFile throws ERR_INVALID_CALLBACK error

Versions:
NodeJS version: v12.12.0
OS: Mac OS X 10.14.6

The following error occurs while building the solution with the command
./build-s3-dist.sh $DEPLOY_BUCKET $VERSION_CODE

Detailed error message:

fs.js:135
  throw new ERR_INVALID_CALLBACK(cb);
  ^

TypeError [ERR_INVALID_CALLBACK]: Callback must be a function. Received undefined
    at maybeCallback (fs.js:135:9)
    at Object.writeFile (fs.js:1234:14)
    at Object.<anonymous> (/Users/-/Documents/workspace/aws-data-lake-solution/deployment/manifest-generator/app.js:70:4)
    at Module._compile (internal/modules/cjs/loader.js:956:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:973:10)
    at Module.load (internal/modules/cjs/loader.js:812:32)
    at Function.Module._load (internal/modules/cjs/loader.js:724:14)
    at Function.Module.runMain (internal/modules/cjs/loader.js:1025:10)
    at internal/main/run_main_module.js:17:11 {
  code: 'ERR_INVALID_CALLBACK'
}

Fix:

In file aws-data-lake-solution/deployment/manifest-generator/app.js, line 70 should be:
fs.writeFileSync('../dist/data-lake-site-manifest.json', JSON.stringify(_manifest, null, 4));
instead of:
fs.writeFile('../dist/data-lake-site-manifest.json', JSON.stringify(_manifest, null, 4));
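
If the asynchronous API is preferred instead, a minimal alternative sketch (assuming the surrounding app.js keeps its existing fs and _manifest variables) is to pass the callback that newer Node.js versions require:

fs.writeFile('../dist/data-lake-site-manifest.json', JSON.stringify(_manifest, null, 4), function(err) {
    // The callback is mandatory on current Node.js versions; surface any write error.
    if (err) {
        throw err;
    }
});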

Okta federation instructions forget to update Lambda variables

The instructions for federation via Okta (Appendix B in the deployment guide) update the FEDERATED_LOGIN variable in app-variables.js but do not update the Lambda environment variable FEDERATED_LOGIN. As a result, access-validator uses the wrong group source (cognito instead of custom:groups), which causes all kinds of group-related weirdness in the UI.

A quick fix could be to modify the CloudFormation data-lake-deploy.template to pass FederatedLogin: true through to the DataLakeServicesStack only, and re-run a change set.

aws data lake API query reference

Hello

Just like the "CLI command reference" located here, is it possible to have a description and an example for each API?

Maybe you have already created this documentation?

README wrong instruction - point 7

The command in point 7 should be:
aws s3 cp ./dist s3://$DEPLOY_BUCKET/data-lake/$VERSION_CODE --recursive --acl bucket-owner-full-control
instead of:
aws s3 cp ./dist s3://$DEPLOY_BUCKET/data-lake/latest --recursive --acl bucket-owner-full-control

Create search index issue

@shsenior thank you very much for updating the source code.

Following the new update, we tried to deploy the data lake using our own S3 bucket as the artifact repo.

Currently, we are getting some strange errors when initializing the Elasticsearch index for the first time. Any suggestions would be much appreciated.

Here is the section where the CloudFormation template failed:

    DataLakeSearchIndex:
        DependsOn: "DataLakeWebsite"
        Type: "Custom::LoadLambda"
        Properties:
            ServiceToken:
                Fn::GetAtt:
                    - "DataLakeHelper"
                    - "Arn"
            Region:
                - Ref: "AWS::Region"
            clusterUrl: !Join ["", ["https://", !GetAtt DataLakeStorageStack.Outputs.EsCluster, ":80" ]]
            searchIndex: "data-lake"
            customAction: "createSearchIndex"

And the output log we get from CloudWatch is:

REPORT RequestId: ccad318c-6d16-11e7-840a-0bac46c1d377 Duration: 14223.29 ms Billed Duration: 14300 ms Memory Size: 256 MB Max Memory Used: 42 MB
START RequestId: d94fe778-6d16-11e7-866b-db0445fd4b93 Version: $LATEST
2017-07-20T06:44:16.242Z d94fe778-6d16-11e7-866b-db0445fd4b93 Received event:
{
"RequestType": "Create",
"ServiceToken": "arn:aws:lambda:us-east-1:213684302576:function:data-lake-helper",
"ResponseURL": "https://cloudformation-custom-resource-response-useast1.s3.amazonaws.com/arn%3Aaws%3Acloudformation%3Aus-east-1%3A213684302576%3Astack/CGR-Data-Store/30967630-6d14-11e7-aead-500c286f3262%7CDataLakeSearchIndex%7Cde56a193-5075-4286-8e73-a93bb70d5247?AWSAccessKeyId=AKIAJNXHFR7P7YGKLDPQ&Expires=1500540255&Signature=oupbZp2ws4Vl8zDdnudXtzJxinc%3D",
"StackId": "arn:aws:cloudformation:us-east-1:213684302576:stack/CGR-Data-Store/30967630-6d14-11e7-aead-500c286f3262",
"RequestId": "de56a193-5075-4286-8e73-a93bb70d5247",
"LogicalResourceId": "DataLakeSearchIndex",
"ResourceType": "Custom::LoadLambda",
"ResourceProperties": {
"ServiceToken": "arn:aws:lambda:us-east-1:213684302576:function:data-lake-helper",
"customAction": "createSearchIndex",
"clusterUrl": "https://search-data-lake-yp762xsz7qjxfnoe4ddm36pnu4.us-east-1.es.amazonaws.com:80",
"Region": [
"us-east-1"
],
"searchIndex": "data-lake"
}
}

Elasticsearch ERROR: 2017-07-20T06:44:17Z
Error: Request error, retrying
PUT https://search-data-lake-yp762xsz7qjxfnoe4ddm36pnu4.us-east-1.es.amazonaws.com:80/data-lake => write EPROTO 140676374841152:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:794:

at Log.error (/var/task/node_modules/elasticsearch/src/lib/log.js:225:56)
at checkRespForFailure (/var/task/node_modules/elasticsearch/src/lib/transport.js:258:18)
at HttpAmazonESConnector.<anonymous> (/var/task/node_modules/http-aws-es/node6.js:77:11)
at ClientRequest.bound (/var/task/node_modules/elasticsearch/node_modules/lodash/dist/lodash.js:729:21)
at ClientRequest.<anonymous> (/var/task/node_modules/aws-sdk/lib/http/node.js:89:19)
at emitOne (events.js:101:20)
at ClientRequest.emit (events.js:188:7)
at TLSSocket.socketErrorListener (_http_client.js:309:9)
at emitOne (events.js:96:13)
at TLSSocket.emit (events.js:188:7)

Elasticsearch ERROR: 2017-07-20T06:44:17Z
Error: Request error, retrying
PUT https://search-data-lake-yp762xsz7qjxfnoe4ddm36pnu4.us-east-1.es.amazonaws.com:80/data-lake => write EPROTO 140676374841152:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:../deps/openssl/openssl/ssl/s23_clnt.c:794:

at Log.error (/var/task/node_modules/elasticsearch/src/lib/log.js:225:56)
at checkRespForFailure (/var/task/node_modules/elasticsearch/src/lib/transport.js:258:18)
at HttpAmazonESConnector.<anonymous> (/var/task/node_modules/http-aws-es/node6.js:77:11)
at ClientRequest.bound (/var/task/node_modules/elasticsearch/node_modules/lodash/dist/lodash.js:729:21)
at emitOne (events.js:101:20)
at ClientRequest.emit (events.js:188:7)
at TLSSocket.socketErrorListener (_http_client.js:309:9)
at emitOne (events.js:96:13)
at TLSSocket.emit (events.js:188:7)
at onwriteError (_stream_writable.js:346:10)

Elasticsearch WARNING: 2017-07-20T06:44:17Z
Unable to revive connection: https://search-data-lake-yp762xsz7qjxfnoe4ddm36pnu4.us-east-1.es.amazonaws.com:80/

Elasticsearch WARNING: 2017-07-20T06:44:17Z
Unable to revive connection: https://search-data-lake-yp762xsz7qjxfnoe4ddm36pnu4.us-east-1.es.amazonaws.com:80/

Package IDs beginning with a dash can't be deleted properly

I've run into an issue where package IDs starting with '-' can't be fully deleted. Elasticsearch barfs, which results in the package being deleted but still showing up as searchable in a list.

from version 2.1.0

2019-10-19T10:44:25.506Z	678bd893-cc8c-49dd-b4c1-b6de3c98da68	{ body: { package_id: '-n0RdiWKK' },
resource: '/search/index',
httpMethod: 'DELETE',
headers: ......

path: '/data-lake/_search',
query: { q: 'package_id:-n0RdiWKK' },

displayName: 'BadRequest',
message: '[parse_exception] parse_exception: Encountered " "-" "- "" at line 1, column 11.\nWas expecting one of

I assume this could be fixed by specifying a more limited 64-character set for shortid (in content-package.js:125) with shortid.characters(....), or maybe by escaping/quoting the package_id when building the Elasticsearch query (if that's feasible); a rough sketch of both ideas follows below.
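
A minimal sketch of the two workarounds suggested above (nothing here is the project's actual code; the replacement alphabet characters and the query-building helper are assumptions):

const shortid = require('shortid');

// Workaround 1: generate IDs from a custom 64-character alphabet that omits '-',
// so new package IDs can never start with a dash. The substitute characters
// '$' and '@' are an arbitrary assumption, not what the project would pick.
shortid.characters('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$@');

// Workaround 2: quote the package_id when building the Elasticsearch query string,
// so a leading '-' is not interpreted as a query-string operator.
function buildPackageQuery(packageId) {
    return { q: 'package_id:"' + packageId + '"' };
}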

If you bulk-delete via the CLI, you'll be left only with packages beginning with '-', so I assume a dash in the middle still works OK.

Error during deployment: create failed DataLakeKibanaCognito

While deploying to the Frankfurt region, I got an error. I just ran the default CloudFormation template.

While deploying, it gave an error creating the resource with logical ID DataLakeKibanaCognito.

The CloudFormation console showed:
Failed to create resource. https://console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/lambda/data-lake-helper;stream=2020/02/11/[$LATEST]29339c5ea4a94ec58a8afb15c81f1c0f

Any ideas?

Thanks!
Nick.

Installing CLI on Windows

I have used the CloudFormation template to generate the aws-data-lake-solution in my AWS account.

I would like to import several files/folders simultaneously into a package in the Data Lake Console (CloudFront) from my computer. I cannot do this using the console, so I have thought of using the CLI. All the computers in our network are Windows, and there are no instructions on how to install the CLI on Windows. Any help?

Bug report: Unable to create "S3LoggingBucket" due to InvalidBucketAclWithObjectOwnership

Hello,

I'd like to report a bug in the Data Lake Solution v2.2. This can be found on the Service Catalog 'Getting started library', 'Data Lake on AWS'.

The CloudFormation stack fails to create the S3 bucket "S3LoggingBucket", with the following error:

Bucket cannot have ACLs set with ObjectOwnership's BucketOwnerEnforced setting (Service: Amazon S3; Status Code: 400; Error Code: InvalidBucketAclWithObjectOwnership).
Following is the code, which can also be found here - https://github.com/aws-solutions/aws-data-lake-solution/blob/main/deployment/data-lake-deploy.template#L471


    S3LoggingBucket:
        DeletionPolicy: Retain
        Type: AWS::S3::Bucket
        Metadata:
            cfn_nag:
                rules_to_suppress:
                  - id: W35
                    reason: "This S3 bucket is used as the destination for storing access logs"
                  - id: W51
                    reason: "The bucket is not public. When using the CF template in PROD, create a bucket policy to allow only administrators/ auditors access to the bucket"
        Properties:
            BucketName: !Join ["-", [!FindInMap ["SourceCode", "General", "SolutionName"], !Ref "AWS::AccountId", !Ref "AWS::Region", "s3-access-log"]]
            AccessControl: LogDeliveryWrite
            BucketEncryption:
                ServerSideEncryptionConfiguration:
                    - ServerSideEncryptionByDefault:
                        SSEAlgorithm: AES256
            PublicAccessBlockConfiguration:
                BlockPublicAcls: true
                BlockPublicPolicy: true
                IgnorePublicAcls: true
                RestrictPublicBuckets: true

Issue: Because ACLs are enabled (AccessControl: LogDeliveryWrite), Object Ownership must be set to "Bucket owner preferred". It can be added with the following property:

          OwnershipControls:
            Rules:
              - ObjectOwnership: BucketOwnerPreferred

"AccessControl" is actually a legacy property and not recommended any longer for most use cases, except in unusual circumstances where you must control access for each object individually.

Therefore, if the AccessControl property is disabled, the object ownership will be for the bucket owner enforced by default. If we remove "AccessControl" property, the resource is created successfully.

Hope this is helpful! Thank you.

References:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket.html#cfn-s3-bucket-accesscontrol
https://docs.aws.amazon.com/AmazonS3/latest/userguide/about-object-ownership.html

Issue creating package

Hello guys,

After deploying my data lake with the CloudFormation template, I logged in to the Data Lake Console and tried to create a new package.
I supplied the required parameters and everything apparently worked fine, but behind the scenes I got two 502 errors. The console couldn't find crawler metadata and tables.
As far as I can see, nothing was created in AWS Glue, and even if I browse to the package edit screen and try to upload a file or even a manifest, nothing happens and I'm not able to save my package.

I really don't know where to look for a clue. I searched every CloudWatch log entry and found nothing.

Could you help me?

Regards,
Clovis Chedid

S3 prefix

Hi there,

Is there a way to change the S3 packages prefix? I'd like to avoid saving packages in the bucket root and instead, for example, put them under a packages/ prefix.

Any suggestions?

Hyphen character in package name is causing issues.

A hyphen at the start of the package name seems to be causing issues. Are there any workarounds?

user$ datalake get-package-crawler --package-id '-Ugxxxxxx'
error: unknown option `-g'

user$ datalake list-package-tables --package-id '-Ugxxxxxx'
error: unknown option `-g'

We have tried single quotes, double quotes, and prepending with a \ character.

Sample of working queries:

$ datalake get-package-crawler --package-id "a6vxxxxxx"
{"name":"delighted a6vxxxxxx","status":"READY","lastRun":"SUCCEEDED"}

Background of the issue: The package should have been deleted but is still showing up on the console.

Data lake solution - user's permission issue

Hi,

Reference: http://docs.aws.amazon.com/solutions/latest/data-lake-solution/deployment.html

I created a data lake from the CloudFormation template in the Oregon region, and everything seems to have completed successfully from a CloudFormation perspective. As the admin I am able to log in to the Data Lake UI but not able to create a new package; it throws the error below:

Service Error
{"error":{"message":"User is not authorized to perform the requested action."}}
I am also unable to get users and settings information under the Administration tab:
Service error
An unexpected error has occurred while retrieving users.
Service error
An unexpected error has occurred while retrieving settings.

Any suggestions to resolve this issue would be very helpful to us.

Thanks,
Hari.

502 error from /prod/cart

Hi,

After enabling SAML, I am getting a 502 error from the cart service.
I suspect an issue with SAML, as the request is going to Elasticsearch.
Is there any configuration I need to do for Elasticsearch and Kibana after enabling SAML?

Thanks
Ravi S

Script to build the project

Hi,

Is there a script to build the whole project?

To give context, we would like to build the whole project and upload it to our S3 bucket so that it is ready to be deployed.

Many thanks,
Rui

Data Lake CI/CD

We are currently trying to figure out the best way of doing CI/CD with the data lake solution.

Ideally, we'd like to have separate dev and prod environments for the data lake, so we can test changes and add new features. Additionally, it would be really useful if we could deploy individual components of the data lake without impacting the rest of the stack, e.g. deploy changes to the frontend or deploy a new Lambda function.

Are the above CI/CD use cases currently supported by the data lake CloudFormation templates? Do you have any recommendations on how to achieve them?

Cannot see package on dashboard

Hi,

I have launched the AWS Data Lake CloudFormation stack. There was no error. I CAN create a package from a file and from S3 with a manifest as well. Still, after I go back to the dashboard I CANNOT see any package created. The only way to view or edit a package is to open a saved link to it, which is how I CAN see that the package exists in the bucket. I log into AWS Data Lake as admin.

Secondly, when I search I do not receive any results. I guess it is related to the first issue.

Thanks for any hints.

Latest npm package for elasticsearch (16.x) is not working.

Here: https://github.com/awslabs/aws-data-lake-solution/blob/fa800dd1b2339184377742a6dc96b1e53f47b6ff/source/api/services/search/package.json#L11

must be changed to something like:

"elasticsearch": "^15.1.1",

in order to work with the current code.

Or the current code should be changed in order to work with indexes with more than one type (see also https://discuss.elastic.co/t/unable-to-create-index-with-more-that-1-type-in-6-x/106089 and https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch)

Missing API support for /packages

Minor bug. The client-side packageFactory.js makes calls to GET /packages, but this is not actually supported by the API Gateway code in content-package.js. The existing code does support POST /packages but also needs to support GET /packages. The new code would call _package.getPackages(...); a rough sketch follows below.
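
A minimal sketch of the missing branch (the handler structure, the event/ticket parameters, and the cb callback are assumptions based on the description above, not the project's exact code):

// Hypothetical shape of the missing branch in content-package.js:
if (event.resource === '/packages' && event.httpMethod === 'GET') {
    _package.getPackages(event, ticket, function(err, data) {
        // Return the package list to API Gateway, or propagate the error.
        if (err) {
            return cb(err, null);
        }
        return cb(null, data);
    });
}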

Fails to deploy, bucket interpolation bug

The data lake solution fails to deploy, with the following CloudFormation error:

Template bucket referenced by https://s3.amazonaws.com/mybucket-us-east-1/data-lake/latest/data-lake-storage.yaml does not exist.

The deploy instructions tell the user to set DEPLOY_BUCKET to the desired S3 bucket that will host deployment assets, to then build the deployment assets, and finally copy them to the aforementioned bucket.

The build-s3-dist.sh script replaces BUCKET_NAME in the deployment YAML file with the value that the user supplied in DEPLOY_BUCKET. However, the nested stacks (e.g. DataLakeStorageStack) use the following:

!Join ["-", [!FindInMap ["SourceCode", "General", "S3Bucket"], Ref: "AWS::Region"]]

This results in CloudFormation stack creation failure because CloudFormation is looking for the YAML file in a bucket with the region suffix:

s3.amazonaws.com/mybucket-us-east-1/data-lake/latest/data-lake-storage.yaml

when, in fact, the YAML file is in a bucket without the region suffix:

s3.amazonaws.com/mybucket/data-lake/latest/data-lake-storage.yaml

Adding API Access for user not providing Secret Access Key via pop up per docs

Did a fresh install of this via the provided CloudFormation template just a few days ago in a new AWS account. The docs indicate that in order to access the application via the CLI or API you need to enable API access for your users, so I followed the steps to enable API access for myself (as admin). The access key appears in the API Access tab, but I cannot seem to retrieve my secret access key. Per the documentation quoted below, there should be a one-time pop-up (analogous to generating command line credentials in IAM); however, this pop-up is not appearing after clicking "Generate Access Key" (I've tried Chrome and Firefox). I reviewed the developer tools within Chrome and inspected the response payloads for the calls on this screen that might contain this information, but no luck.

  1. Open the Data Lake console.

  2. In the navigation pane, under the My Account menu, select Profile.

  3. Your Data Lake API Endpoint and Access Key are located under the API Access section of your profile.

  4. To generate a Secret Access Key for your account, click on the Generate button under the API Access section of your profile. Once a Secret Access Key has been generated for your account, it will be displayed for a one time download opportunity in the API Credentials pop-up. Your credentials will look something like this:

    Data Lake API Endpoint: samplek19ruh.execute-api.us-east-1.amazonaws.com/Prod
    Data Lake Access Key: SJxiA_EXAMPLEKEY
    Data Lake Secret Access Key: f10e347df150638393502dEXAMPLEKEY

reference: http://docs.awssolutionsbuilder.com/data-lake/api/working-with-api/

Bug report: failed to create "DataLakeHelper" due to "nodejs12.x" which is no longer supported

Hello,

The CloudFormation stack fails to create the DataLakeHelper because the "nodejs12.x" runtime is no longer supported.

The complete error:
Resource handler returned message: "The runtime parameter of nodejs12.x is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs18.x) while creating or updating functions. (Service: Lambda, Status Code: 400, Request ID: b7bf5f63-e293-41f1-9ce5-b8c8f700dd53)" (RequestToken: 2becaa7e-5e0c-91b4-6260-6bb52a9872f2, HandlerErrorCode: InvalidRequest)

(ids modified)

CloudFormation failing

The following resource(s) failed to create: [ConsoleCFDistribution, DataLakeServicesStack, DataLakeSearchIndex]

Need assistance in debugging

cfn_fail

SAML configuration not working.

Hi,
We are using Centrify as the SAML provider, but the federated template is failing with the error below.
I am giving the AD FS Hostname as: https://XXXXXXX-dev.my.centrify.com

2019-05-17T19:06:39.399Z e8f9cc5c-722d-4784-9f01-76d84d63ccdd Failed to create data lake Cognito identity provider:

InvalidParameterException: Non-ok status code 404 returned from remote metadata source https://XXXXXXX-dev.my.centrify.com/FederationMetadata/2007-06/FederationMetadata.xml

2019-05-17T19:06:39.399Z e8f9cc5c-722d-4784-9f01-76d84d63ccdd RESPONSE BODY:

{ "Status": "FAILED", "Reason": "https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/lambda/data-lake-helper;stream=2019/05/17/[$LATEST]67c0ab46e276450da7462b28a9dfa0a7", "PhysicalResourceId": "FederateLogin", "StackId": "arn:aws:cloudformation:us-east-1:006976719545:stack/abc-datalake-federated-template/bf89dfa0-78d2-11e9-a00d-0a5c603a1bba", "RequestId": "8ffe0c4b-4e1e-4e58-9395-c89dab7a7305", "LogicalResourceId": "FederateLogin", "Data": { "Error": "Failed to create data lake Cognito identity provider" }

I know this URL does not exist:
https://XXXXXXX-dev.my.centrify.com/FederationMetadata/2007-06/FederationMetadata.xml

but I don't think Centrify works like MS AD FS.

Maybe I am setting something up wrong.

data-lake-deploy.template does not exist in "dist" folder after build

Hi,

After this step "Build the data lake solution for deployment", I didn't find "data-lake-deploy.template" or "data-lake-deploy-federated.template" in the dist folder.

I tried to copy "data-lake-deploy.template" and "data-lake-deploy-federated.template" to the "dist" folder and then upload them to the S3 bucket. However, I got a "TemplateURL must be an Amazon S3 URL" error when running "Deploy the data lake solution" using CloudFormation.

Here are my environment variables:
export AWS_REGION=us-east-1
export VERSION_CODE=v2.0.0
export DEPLOY_BUCKET=jslake

How do I store Parquet files?

There is no way to upload a folder or link an existing S3 folder to a data lake package's content. My data is in Parquet format. How do I handle this kind of partitioned format if I want to use the Data Lake solution?
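
One possible approach, based on the manifest format shown in the "Blank Manifest files" issue above (the bucket and prefix names here are placeholders, and whether the Glue crawler treats the prefix as a single partitioned table is not confirmed here): reference the partitioned Parquet prefix in a manifest file rather than uploading individual files.

{
    "dataStore": [
        {
            "includePath": "s3://my-bucket/parquet-data/"
        }
    ]
}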

Failed to call createSearchIndex

Hi Team,

We are trying to deploy your data lake solution using CloudFormation. However, we are currently getting a "Custom Resource failed to stabilize in expected time" error in CloudFormation when "Custom::LoadLambda" is trying to call createSearchIndex. Any suggestions would be much appreciated.

Here is the snippet of the CloudFormation template that is causing the problem:

    DataLakeSearchIndex:
        Type: "Custom::LoadLambda"
        Properties:
            ServiceToken:
                Fn::GetAtt:
                    - "DataLakeHelper"
                    - "Arn"
            Region:
                - Ref: "AWS::Region"
            clusterUrl: !Join ["", ["https://", !GetAtt DataLakeStorageStack.Outputs.EsCluster ]]
            searchIndex: "data-lake"
            customAction: "createSearchIndex"

Best wishes,
Rui

Signing API Request to application

I'm having trouble getting API access to the application working. I'm getting Unauthorized responses, so I'm assuming I'm missing something with respect to the signing process for the requests. I am using the provided JS code to generate this signature for my request, and I have also implemented the signing functionality in another language, which produces the same hash output as the provided JS function for the same input data. Could you clarify some of the specifics regarding the steps to sign requests for this application? (Reference: http://docs.awssolutionsbuilder.com/data-lake/api/working-with-api/)

  1. Per the documentation, the endpoint parameter to the signing function should not include the leading "https://" or end in a slash ("/")?
  2. The example apiEndpoint has "Prod" in the URL with a leading capital 'P'; however, when I inspect the network traffic via developer tools, the request URL uses lowercase "prod" for our implementation. Should we use the casing that matches what we see in our deployment?
  3. The apiEndpoint parameter is always the same value for your specific instance of the application, regardless of which endpoint you are signing a request for, e.g. a POST to /packages/new vs. a GET to /cart would both take the same apiEndpoint value as input to the signing function?
  4. The strings "DATALAKE4", "datalake" and "datalake4_request" that appear in the provided function (either to create the keys for the hashing or as values that are hashed at various steps) remain constant for all requests to the application, right? The only deployment-specific information is the accessKey, secretKey, dateStamp and apiEndpoint? (A sketch of my understanding of the key derivation follows this list.)
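
A hedged sketch of that key-derivation step using Node's crypto module (the function name, the order of the HMAC steps, and the exact message that is ultimately signed are assumptions inferred from the constants quoted above and the referenced docs, not verified against the solution's code):

const crypto = require('crypto');

// HMAC-SHA256 helper; returns a Buffer that can be used as the key for the next step.
function hmac(key, msg) {
    return crypto.createHmac('sha256', key).update(msg, 'utf8').digest();
}

// Assumed derivation chain, mirroring SigV4-style key derivation with the
// constants mentioned in point 4 above.
function getSignatureKey(secretKey, dateStamp, apiEndpoint) {
    const kDate = hmac('DATALAKE4' + secretKey, dateStamp);
    const kEndpoint = hmac(kDate, apiEndpoint);       // endpoint without "https://"
    const kService = hmac(kEndpoint, 'datalake');
    return hmac(kService, 'datalake4_request');       // final signing key
}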

Lastly, I noticed that traffic from the web application uses Cognito authentication with a JWT in the "Auth" header - "tk:" instead of "ak". Is it an acceptable access pattern for integrating with the API to hook into Cognito with a custom service user that can retrieve its own JWT before running API calls against the API with this token instead of the signing process?

Thank you!
