
gns-science / nshm-toshi-api


An extensible API where task metadata, and the important input and output files relating to data-intensive science processes, are retained. Custom task schemas can be defined to support task-specific metadata needs.

License: GNU Affero General Public License v3.0

Python 100.00%

nshm-toshi-api's People

Contributors

chrisbc, chrisdicaprio


nshm-toshi-api's Issues

Create skeleton API for demo

As NSHM testers

We want to store and catalogue the test results from test runs on local machines, servers, clusters, etc.

So that we can compare historic outputs, etc

Done when

  • secure with APIKey (for demo only)
  • serverless deploy
  • store and get a tuple of (binary, json_meta) (see the request sketch below)
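A minimal sketch of the demo interaction, assuming an API Gateway endpoint secured with an API key; the endpoint URL, key, and the create_file mutation are placeholders, not the final schema:

import requests

API_URL = "https://example.execute-api.ap-southeast-2.amazonaws.com/demo/graphql"  # placeholder URL
HEADERS = {"x-api-key": "DEMO-ONLY-KEY"}  # API Gateway API key (demo only)

# hypothetical mutation recording the (binary, json_meta) tuple as a file record;
# the binary payload itself would be uploaded separately (e.g. via a presigned POST)
query = """
mutation {
  create_file(file_name: "results.zip", meta: "{}") {
    file_result { id }
  }
}
"""
resp = requests.post(API_URL, headers=HEADERS, json={"query": query})
resp.raise_for_status()
print(resp.json())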

Feature: Inversion Solution support for labelled table relations

The SRM team want to run different hazard analyses and visualise these using maps and plots; eventually these will come from openquake (ref #68). Some analyses will produce quite large tables (especially gridded ones) and will be produced independently; we want to link them as they're produced and retain maximum flexibility/scalability.

Done When:

  • API user can link a table (with type, created, table_id) using a standard Mutation query (see the sketch below)
  • API table-link accepts metadata, so additional data describing the table properties may be collected
  • Add meta-data to Table
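A hedged sketch of the kind of mutation an API user might send; the create_table_relation name, argument names and enum value are illustrative, not the final schema:

# illustrative only: mutation and field names are not the final schema
LINK_TABLE_MUTATION = """
mutation {
  create_table_relation(
    thing_id: "THING_NODE_ID"
    table_id: "TABLE_NODE_ID"
    table_type: HAZARD_GRIDDED
    created: "2021-06-01T00:00:00+00:00"
    meta: [{k: "grid_spacing", v: "0.1"}]
  ) { ok }
}
"""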

Fix node ID uniqueness bug

We expected this, but it wasn't found until we tested in anger on the beavan cluster.

The current approach using S3 object counts is not bulletproof under high load.

Setup CI/CD pipeline using serverless stack

AS NSHM team

we want to configure CI/CD on this project

so that deployments to test and prod environments are robust and environments are stable

Done when

  • we define a branch user-test that is linked to serverless (sls) stage test
  • main branch is linked to sls stage prod
  • typical CI/CD behaviour on each (PR -> test -> merge -> deploy)

possible guide:
deploy using github actions (https://medium.com/better-programming/set-up-a-ci-cd-pipeline-for-aws-lambda-with-github-actions-and-serverless-in-under-5-minutes-fd070da9d143)

Feature: RuptureSet subclass of file

As API users we want to capture the specifics of RuptureSet files

so that they're more easily used in UI & client code.

example http://simple-toshi-ui.s3-website-ap-southeast-2.amazonaws.com/FileDetail/RmlsZToxMjkwOTg0

Done When:

  • API client can create a new RuptureSet
  • API client can fetch an existing RuptureSet
  • API clients should be able to access these objects as Files, using either the File NodeID or the RuptureSet NodeID
  • API search supports RuptureSet
  • Same fields as File, and also...
    • created (date/time)
    • Producer ID (link to writer task)
    • metrics kv list (from the task output)
    • fault_model field (from arguments)
  • same upload/download features as File

Add GeneralTask as new schema type

We want to capture metadata and related inputs/outputs for arbitrary tasks that may not happen often enough to justify automation and/or a custom schema type. We'll call this a GeneralTask, as it may be used for many purposes.

NB: we briefly considered calling this type VersatileEvent.

Attributes are (type sketch below):

  • related files (with reader/writer role)
  • agent_name: the name of the person or process responsible for the task
  • title
  • description
  • created
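Assuming the schema is built with graphene (Python/GraphQL), the type might look roughly like this; a sketch only, with the file relation simplified:

import graphene
from graphene import relay

class GeneralTask(graphene.ObjectType):
    """An arbitrary task with metadata and related input/output files (sketch)."""
    class Meta:
        interfaces = (relay.Node,)

    agent_name = graphene.String(description="the person or process responsible")
    title = graphene.String()
    description = graphene.String()
    created = graphene.DateTime()
    # related files (with reader/writer role) would use a relation/connection
    # type in the real schema; simplified here to a list of file IDs
    files = graphene.List(graphene.ID)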

API errors

Inversion runs:
R2VuZXJhbFRhc2s6ODA0NlFNVTc0 and R2VuZXJhbFRhc2s6ODExOVI0VHhB

In the Batch/Fargate log ...

2022-02-07T18:04:29.516+13:00 self._toshi_api.automation_task.upload_task_file(task_id, java_log_file, 'WRITE')
...
2022-02-07T18:04:29.517+13:00 requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: https://aihssdkef5.execute-api.ap-southeast-2.amazonaws.com/prod/graphql

and in the Toshi API log

Previous request

2022-02-07T18:03:59.483+13:00 SearchManager.index_document https://search-nzshm22-toshi-api-es-prod-cj4taqcgnefophpxzan55xeswa.ap-southeast-2.es.amazonaws.com/toshi_index/_doc/ThingData_81865pDzH_object.json
...
2022-02-07T18:03:59.537+13:00 b'{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse [files]"}],"type":"mapper_parsing_exception","reason":"failed to parse [files]","caused_by":{"type":"illegal_state_exception","reason":"Can\'t get text on a START_OBJECT at 1:78"}},"status":400}'

...
and then

2022-02-07T18:03:59.954+13:00 START RequestId: a9ff3ded-cde6-4ebc-93ce-803742f23d35 Version: $LATEST
2022-02-07T05:04:29.981Z a9ff3ded-cde6-4ebc-93ce-803742f23d35 Task timed out after 30.02 seconds

Feature: Table Schema type so we can collect MFDs etc

We want to collect spreadsheet-like tables

so that they can be associated with tasks or files and consumed easily by UI etc. (A type sketch follows the checklist.)

Done When:

  • table column names are configurable
  • table column types are configurable
  • read / write entire table + row as object
  • Inversion Task accepts mfd_table property
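A sketch of what a configurable Table type could look like, assuming a graphene schema; column names/types as parallel string lists is just one possible design:

import graphene

class Table(graphene.ObjectType):
    """A spreadsheet-like table with configurable columns (sketch)."""
    name = graphene.String()
    column_headers = graphene.List(graphene.String)  # configurable column names
    column_types = graphene.List(graphene.String)    # e.g. "string", "double", "integer"
    rows = graphene.List(graphene.List(graphene.String))  # whole table, row-major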

Slow search with total_count field

e.g. ...


query q1 {
  search(search_term: "inversion") {
    search_result {
      total_count
      edges {
        node {
          __typename
          ... on Node {
            __isNode: __typename
            id
          }
          ... on RuptureGenerationTask {
            created
            id
            duration
            state
            result
            
          }
          ... on GeneralTask {
            description
            title
            created
            children {
              total_count
            }
          }
          ... on File {
            id
            file_name
            file_size
          }
        }
      }
    }
  }
}

is too slow with the total_count field. Shouldn't this be as fast with the field as without it?

Initial beavan cluster rupture generation with API (smoke test)

As SRM Team,

we want to run some real-world smoke testing on beavan,

so we can see how things behave

Done when

  • finalise API schema for opensha Rupture Generation
  • publish python API client in nshm-toshi-client
  • modify client-side automation to support API interface
  • configured on beavan
  • running tests

Feature: dynamodb migration

We want to use dynamodb for a more responsive user experience and better integrity on mutations

  • add a pynamodb model for each data class (File, Thing, etc)
  • on read, try pynamodb first; if not there, read from S3 (see the sketch below)
  • on write, write just to pynamodb
  • show a pattern for adding test coverage
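A sketch of the read path under these assumptions; the model and table names are illustrative:

from pynamodb.models import Model
from pynamodb.attributes import JSONAttribute, UnicodeAttribute

class ToshiObjectModel(Model):  # illustrative: one model per data class (File, Thing, etc)
    class Meta:
        table_name = "ToshiObject"
        region = "ap-southeast-2"
    object_id = UnicodeAttribute(hash_key=True)
    object_content = JSONAttribute()

def get_object(object_id, s3_fallback):
    """Try pynamodb first; fall back to S3 for objects not yet migrated."""
    try:
        return ToshiObjectModel.get(object_id).object_content
    except ToshiObjectModel.DoesNotExist:
        return s3_fallback(object_id)  # legacy S3 read path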

BUG: produced_by_id incorrect on sub-query

Find why this produces two different node IDs, in TEST:

query {
  node(id: "SW52ZXJzaW9uU29sdXRpb246MTU0NC4wdlA4QmQ=") {
    ... on InversionSolution {
      produced_by_id
      produced_by {
        id
      }
    }
  }
}

Incomplete set of objects returned by get_all()

It seems that not all the objects are returned by this S3 API call, in data_s3.base_s3_data.py:

def get_all(self):
    """
    Returns:
        list: a list containing all the objects materialised from the S3 bucket
    """
    task_results = []
    for obj_summary in self._bucket.objects.filter(Prefix='%s/' % self._prefix):
        prefix, task_result_id, _ = obj_summary.key.split('/')
        assert prefix == self._prefix
        task_results.append(self.get_one(task_result_id))
    return task_results

In S3 docs on filtering we see:

The response might contain fewer keys but will never contain more

Note we'll be adding proper relay pagination support soon, but for now let's do this so we can see all the contents.
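If truncation is the cause, explicit pagination with the low-level client would rule it out. A diagnostic sketch (bucket and prefix names are placeholders), noting that boto3 resource collections are supposed to paginate automatically:

import boto3

client = boto3.client("s3")
paginator = client.get_paginator("list_objects_v2")

keys = []
for page in paginator.paginate(Bucket="nshm-toshi-bucket", Prefix="ToshiObject/"):  # placeholders
    keys.extend(obj["Key"] for obj in page.get("Contents", []))
print(len(keys))  # compare against the count returned by get_all()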

BUG: GeneralTask does not support the 'parents' relation

This mutation will fail when the child_id refers to a GeneralTask, as GeneralTask does not support the 'parents' attribute:

mutation new_gt_link {
  create_task_relation(
    parent_id: "R2VuZXJhbFRhc2s6Mg=="
    child_id: "R2VuZXJhbFRhc2s6NA=="
  )    
....

The relation is created, but any query on it via taskrelation will fail.

Suggest we add this attribute so that GeneralTasks can also have parent tasks.

Add HazardAnalysisTask

As SRM Team

we want to configure a HazardAnalysisTask with inputs & outputs, input arguments and run metrics

so the task results can be recorded and available for further analysis

  • Input file (opensha solution)
  • Output (hazard curve(s))
  • task arguments (from NSHMInversionRunner)
  • Metrics - similarity

S3 data read/write consistency

Related to #41: it looks like rapid-fire bursts of updates to a single object, as will occur at the beginning of a cluster job with many sub-tasks, can cause the read consistency to fail. It seems likely that writes are buffered at S3 and reads in close proximity will not immediately 'see' the updated status.

This was found on the beavan cluster with 40 rupture build tasks submitted via run_rupture_sets.py to the TEST API. In this case, while no errors were reported client-side, the parent general task R2VuZXJhbFRhc2s6MjA4ckx0Y3M= has just 22 children instead of the expected 40. All the child tasks, files and relationships have been written correctly.

Done when:

  • confirm the issue is 'avoidable' by inserting small start offsets between tasks, giving S3 enough time to reach consistency. Suggest 200-500ms per task should be plenty.
  • test a write-through cache between the data manager and the S3 read/write operations. This will need to manage its memory footprint (expiring cache objects based on last-access time? see the sketch below).
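A minimal sketch of the second option, assuming a single-process data manager; eviction is by last-access time:

import time

class WriteThroughCache:
    """Expiring write-through cache between the data manager and S3 (sketch)."""

    def __init__(self, s3_read, s3_write, ttl=300.0, max_items=1000):
        self._read, self._write = s3_read, s3_write
        self._ttl, self._max = ttl, max_items
        self._items = {}  # key -> (value, last_access_time)

    def get(self, key):
        hit = self._items.get(key)
        if hit and time.time() - hit[1] < self._ttl:
            self._items[key] = (hit[0], time.time())  # refresh access time
            return hit[0]
        value = self._read(key)  # miss or expired: fall through to S3
        self._store(key, value)
        return value

    def put(self, key, value):
        self._write(key, value)  # write-through: S3 remains authoritative
        self._store(key, value)

    def _store(self, key, value):
        if len(self._items) >= self._max:  # evict the least-recently-accessed entry
            oldest = min(self._items, key=lambda k: self._items[k][1])
            del self._items[oldest]
        self._items[key] = (value, time.time())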

Migration: convert File objects to RuptureSet or InversionSolution objects

We want to migrate File objects to the new types as appropriate, so that users can benefit from the new features on historic objects. (A sketch of the per-object conversion follows the checklist.)

NB: API clients should be able to access these objects as Files using either File NodeID or Subclass NodeID

Done When

  • convert to RuptureSet object when (define criteria [max_jump_dist, fault_model])
  • convert to InversionSolution object when (define criteria [completion_energy, ])
  • for each conversion....
    • delete old ES index entry (until
    • add new ES index entry
    • log object ID and other info for data audit
  • pre conversion
    • copy PROD data to test bucket and validate entire process there, including UI
    • make backup Bucket
    • user comms
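A sketch of the per-object conversion step; the object.json layout and the clazz_name field are assumptions based on how the subclassing appears to work:

import json
import logging

import boto3

def convert_file_object(bucket_name, key, new_clazz):
    """Rewrite one stored File object as a subclass, logging for the data audit."""
    obj_ref = boto3.resource("s3").Bucket(bucket_name).Object(key)
    body = json.loads(obj_ref.get()["Body"].read())
    old_id = body.get("id")
    body["clazz_name"] = new_clazz  # assumed field: "RuptureSet" or "InversionSolution"
    obj_ref.put(Body=json.dumps(body).encode())
    logging.info("converted %s -> %s (%s)", old_id, new_clazz, key)
    # ES re-indexing (delete the old entry, add the new one) not shown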

Feature: Add InversionSolution to schema as File sub-class

As API users we want to capture the specifics of InversionSolution files

so that they're more easily used in UI & client code.

Done When:

  • API client can create a new InversionSolution
  • API client can fetch an existing InversionSolution
  • API clients should be able to access these objects using either the File NodeID or the InversionSolution NodeID
    NOTE that both File.ID and InversionSolution.ID will resolve to the same FileData object, which will be returned cast to the clazzname = InversionSolution.
  • API search supports InversionSolution
  • Same fields as File, and also...
    • created (date/time)
    • Hazard Table ID
    • MFD table ID
    • Producer ID (link to writer task)
    • metrics kv list (from the task output)
  • same upload/download features as File
  • A field to get the object ID as its File superclass (maybe helpful in UI during migrations)

Add InversionTask

As SRM Team

we want to configure an InversionTask with inputs & outputs, input arguments and run metrics

so that these task results are recorded and available for further analysis

  • Input file (opensha ruptureset)
  • Output file (opensha solution)
  • task arguments (from NSHMInversionRunner):
        double totalRateM5 = 5d; // expected number of M>=5's per year TODO: OK? ref David Rhodes/Chris Roland? [KKS, CBC]
        double bValue = 1d; // G-R b-value
        // magnitude to switch from MFD equality to MFD inequality
        double mfdTransitionMag = 7.85; // TODO: how to validate this number for NZ? (ref Morgan Page in USGS/UCERF3) [KKS, CBC]
        double mfdEqualityConstraintWt = 10;
        double mfdInequalityConstraintWt = 1000;
        int mfdNum = 40;
        double mfdMin = 5.05d;
        double mfdMax = 8.95;
        GutenbergRichterMagFreqDist mfd = new GutenbergRichterMagFreqDist(
                bValue, totalRateM5, mfdMin, mfdMax, mfdNum);
        int transitionIndex = mfd.getClosestXIndex(mfdTransitionMag);
        // snap it to the discretization if it wasn't already
        mfdTransitionMag = mfd.getX(transitionIndex);
        Preconditions.checkState(transitionIndex >= 0);
        GutenbergRichterMagFreqDist equalityMFD = new GutenbergRichterMagFreqDist(
                bValue, totalRateM5, mfdMin, mfdTransitionMag, transitionIndex);
        MFD_InversionConstraint equalityConstr = new MFD_InversionConstraint(equalityMFD, null);
        GutenbergRichterMagFreqDist inequalityMFD = new GutenbergRichterMagFreqDist(
                bValue, totalRateM5, mfdTransitionMag, mfdMax, mfd.size() - equalityMFD.size());
        MFD_InversionConstraint inequalityConstr = new MFD_InversionConstraint(inequalityMFD, null);

        constraints.add(new MFDEqualityInversionConstraint(rupSet, mfdEqualityConstraintWt,
                Lists.newArrayList(equalityConstr), null));
        constraints.add(new MFDInequalityInversionConstraint(rupSet, mfdInequalityConstraintWt,
                Lists.newArrayList(inequalityConstr)));

        // weight of entropy-maximization constraint (not used in UCERF3)
        double smoothnessWt = 0;

ISSUE: Elastic Search offline in PROD - toshi_index index is down

Looks like something's happened to our ES search index in PROD. Here's the health panel...
Showing searchable documents = 0, and the main index ('toshi_index') does not exist...

Client error (from API lambda logs):

2021-11-19T10:05:19.436+13:00 {'error': {'root_cause': [{'type': 'index_not_found_exception', 'reason': 'no such index', 'resource.type': 'index_or_alias', 'resource.id': 'toshi_index', 'index_uuid': '_na_', 'index': 'toshi_index'}], 'type': 'index_not_found_exception', 'reason': 'no such index', 'resource.type': 'index_or_alias', 'resource.id': 'toshi_index', 'index_uuid': '_na_', 'index': 'toshi_index'}, 'status': 404}


bug: inversion_solution query for old sub-solution returns wrong ID.

re http://simple-toshi-ui.s3-website-ap-southeast-2.amazonaws.com/GeneralTask/R2VuZXJhbFRhc2s6NjAzOWI5TUNV/Details

To reproduce: go to the above view and open Show Reports. All the pages show the same (incorrect) inversion solution SW52ZXJzaW9uU29sdXRpb246MTY3NDIuMFVrbXJl.

Done when

  • create a test fixture
  • create a test that reproduces the issue - it will fail
  • make the fix and see the new test pass. In the comments, link to this ticket URL.

Add Report Task

We want to capture the publication of various analysis reports (e.g. RupSetDiagnostics) so that the team has immediate/reliable access to these.

Notes:

Attributes

  • report has a type (RupSetDiag, InvSolDiag; later, named-fault-*)
  • zip file of report (so we can re-publish)
  • published location URI
  • created
  • meta {k v}

Optimise: simplify file structure used for file_relation_data

The current design saves these relations independently to unique object.json files. So, for every link we have:

  ObjectA.json   <-> Relationship.json <-> ObjectB.json
   - other props      - Role               - other props

This is a very flexible design, supporting many-to-many relationships and also relationship properties (e.g. ROLE). The price is that the extra file reads/writes make certain API operations overly slow and IO intensive.

Proposed (JSON sketch after the snags list):

  ObjectA.json                      <-> ObjectB.json
   - relatedTo [(objectID, Role)]       - relatedTo [(objectID, Role)]
   - other props                        - other props

Snags:

  • currently the Relationship object is a graphql Node, so what do we have that relies on the ability to resolve its Node ID? If nothing, then there's no client impact
  • old data will need to be migrated
  • need to test the stability and actual performance impacts of this change
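For concreteness, the before/after storage layout might look like this; field names are illustrative:

# current: three objects per link
relationship_json = {"id": "REL_ID", "file_id": "OBJ_A", "thing_id": "OBJ_B", "role": "READ"}

# proposed: relations embedded in each endpoint object, no Relationship.json
object_a_json = {"id": "OBJ_A", "relatedTo": [("OBJ_B", "READ")]}  # plus other props
object_b_json = {"id": "OBJ_B", "relatedTo": [("OBJ_A", "READ")]}  # plus other props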

Feature: Query for InversionSolution for AutomationTask (AT) where AT.id_in: [is0, id1...]

We want users to select given IDs client-side, so we need a way to retrieve just those efficiently. (A resolver sketch follows the queries.)

fragment AT on AutomationTask {
  files {
    edges {
      node {
        file {
          __typename
          ... on InversionSolution {
            id #etc
          }
        }
      }
    }
  }
}

#current option (not really useable in standard relay clients...)
query multi_at {
  node0: node(id: "QXV0b21hdGlvblRhc2s6Mjk1OVZmTlpj") {
    id
    __typename
    ...AT
  }
  #http://simple-toshi-ui.s3-website-ap-southeast-2.amazonaws.com/AutomationTask/QXV0b21hdGlvblRhc2s6Mjk0MmhWck13
  node1: node(id: "QXV0b21hdGlvblRhc2s6Mjk0MmhWck13") {
    id
    __typename
    ...AT
  }
}

#proposed approach
query new_AT_id_in_demo {
  automation_task(id_in: ["QXV0b21hdGlvblRhc2s6Mjk1OVZmTlpj", "QXV0b21hdGlvblRhc2s6Mjk0MmhWck13"]) {
    edges {
      node {
        id
        ...AT
      }
    }
  }
}
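A sketch of how the proposed id_in filter could be resolved server-side, assuming a graphene schema; simplified to return a plain list rather than the relay connection shown above, and the loader is hypothetical:

import graphene
from graphql_relay import from_global_id

class AutomationTask(graphene.ObjectType):  # stub; the real type lives in the API schema
    id = graphene.ID()

def get_automation_task(object_id):  # hypothetical loader over the data layer
    return AutomationTask(id=object_id)

class QueryRoot(graphene.ObjectType):
    automation_tasks = graphene.List(
        AutomationTask, id_in=graphene.List(graphene.ID, required=True))

    def resolve_automation_tasks(self, info, id_in):
        # decode relay global IDs, e.g. "QXV0b21hdGlvblRhc2s6..." -> ("AutomationTask", object_id)
        return [get_automation_task(from_global_id(gid)[1]) for gid in id_in]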

Fully implement simplified Rupture Generation task

We want to complete the work to remove brittle schema fields and replace them with KV objects

Done when:

  • KV attributes for environment (rupt_gen_task)
  • KV attributes for metrics (rupt_gen_task)
  • KV attributes for file_metadata (file)
  • remove old-style rupt_gen_task and rename new implementation

NB: this work is based on the simplified_rupture_gen_schema branch

Add RupturesetDiagnosticsTask

As SRM Team

we want to configure a RupturesetDiagnosticsTask with inputs & outputs, input arguments and run metrics

so the task results can be recorded and available for further analysis

properties:

  • Input file (opensha solution)
  • Output report
  • task arguments (from NSHMInversionRunner)
  • metrics

Feature: Add cloudwatch instrumentation to measure performance

We want to record some metrics to get performance baselines in preparation for dynamodb and other performance related improvements.

Done when:

  • add cloudwatch module and configure ACL in serverless
  • identify and instrument the main API performance points (metrics) - see the sketch below
  • configure a Toshi-API Cloudwatch dashboard to monitor the metrics
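A minimal sketch of emitting one such metric with boto3; the namespace and metric name are placeholders:

import time

import boto3

cloudwatch = boto3.client("cloudwatch")

def timed(metric_name, fn, *args, **kwargs):
    """Run fn and record its latency as a custom CloudWatch metric."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    cloudwatch.put_metric_data(
        Namespace="ToshiAPI",  # placeholder namespace
        MetricData=[{
            "MetricName": metric_name,  # e.g. "get_object_ms"
            "Value": (time.perf_counter() - start) * 1000.0,
            "Unit": "Milliseconds",
        }],
    )
    return result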
