arcticreport's People

Contributors

dvirlar2, jeanetteclark

arcticreport's Issues

Decide on a caching mechanism for query results

The two primary query functions in the package, query_objects and query_version_chains, take 20 minutes and 100 minutes to run, respectively. query_objects returns a data.frame with a row for every object in the Arctic Data Center (ADC). query_version_chains takes the result of query_objects and assigns an arbitrary series identifier to each version chain. The rest of the package slices, summarizes, and plots metrics based on those two tables.
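To make the "arbitrary series identifier" idea concrete, here is a minimal sketch in Python (used only for illustration; the package itself is written in R). It assumes each object record carries an `id` and an `obsoletes` field pointing at the previous version. Those field names and the function `assign_series_ids` are assumptions for this example, not the package's actual implementation.

```python
from typing import Any, Dict, List


def assign_series_ids(objects: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map each object id to an arbitrary integer shared by its version chain.

    Assumes (hypothetically) that each record has an "id" and an optional
    "obsoletes" field naming the version it replaces.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every object to the version it obsoletes, if any.
    for obj in objects:
        find(obj["id"])
        if obj.get("obsoletes"):
            parent[find(obj["id"])] = find(obj["obsoletes"])

    # Number the chains in order of first appearance; the numbers carry no meaning.
    series: Dict[str, int] = {}
    return {
        obj["id"]: series.setdefault(find(obj["id"]), len(series))
        for obj in objects
    }
```

All versions linked through `obsoletes` end up with the same integer, and unrelated objects get different ones.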

Since the functions take so long, it is not practical to run them often. For CI, we could build a status page that runs everything about once a day and fills in tables for the quarterly metrics when those milestones arrive. For local testing or one-off plots, it would help to have a standard way of caching those query results.

Open to any suggestions. The larger of the two tables is about 100 MB when saved to disk.
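One possible shape for such a cache, sketched in Python purely for illustration (the package is R, where `saveRDS`/`readRDS` would play the role of `pickle` here): store the result on disk and reuse it while the file is younger than a chosen tolerance. The function name `cached_query` and its parameters are hypothetical.

```python
import os
import pickle
import time
from typing import Any, Callable


def cached_query(path: str, query_fn: Callable[[], Any], max_age_seconds: float) -> Any:
    """Return the result stored at `path` if it is fresh enough, else re-run the query."""
    if os.path.exists(path):
        age = time.time() - os.path.getmtime(path)
        if age <= max_age_seconds:
            with open(path, "rb") as fh:
                return pickle.load(fh)

    # Cache is missing or stale: run the slow query and store the result.
    result = query_fn()
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(result, fh)
    os.replace(tmp_path, path)  # swap in the new file only once it is fully written
    return result
```

Writing to a temporary file first means an interrupted run should not leave a half-written cache behind.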

make `plot_cumulative_metric` smarter

It works in certain cases, and you can plot the cumulative count or size of either data files or metadata files, but it could be smarter about:

  • units (for the size metric)
  • axis labels
  • where in time the plot starts
  • placement/size of the ADC start line
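For the units item, one option is to pick the display unit from the largest value being plotted. The sketch below is in Python for illustration only and assumes decimal (powers of 1000) units; the helper name `pick_size_unit` is hypothetical and not part of the package.

```python
from typing import Tuple

# Assumption: decimal units (1 KB = 1000 B). Binary units would use 1024 instead.
_UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def pick_size_unit(max_bytes: float) -> Tuple[float, str]:
    """Return (divisor, label) so that max_bytes / divisor is below 1000 where possible."""
    divisor = 1.0
    for unit in _UNITS[:-1]:
        if max_bytes < divisor * 1000:
            return divisor, unit
        divisor *= 1000
    # Anything larger falls through to the biggest unit in the list.
    return divisor, _UNITS[-1]
```

The plotting code could then divide the size column by the divisor and put the label on the y-axis, for example `divisor, label = pick_size_unit(max(sizes))`.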

make `query_objects` faster

On a call, I had the idea that we could make query_objects much faster by keeping the parts of the cache that are still relevant. The following changes would be needed:

  • add dateModified to the fields returned by the query function
  • if a cache is found, filter to keep all objects with a dateModified older than the datetime at runtime
  • query for objects only with a dateModified more recent than the datetime at runtime
  • save the cache
  • if no cache is found, then make sure the datetime you are querying against is very far in the past so that you get all of the objects
  • remove the cache_tolerance parameter
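The steps above can be sketched roughly as follows, in Python for illustration only. This reads "the datetime at runtime" as the time the cache was last written, which is an assumption, and the names `refresh_objects`, `query_modified_since`, and `FAR_PAST` are hypothetical stand-ins rather than the package's API.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]

# With no cache, query from a date far enough back to return every object.
FAR_PAST = datetime(1900, 1, 1, tzinfo=timezone.utc)


def refresh_objects(
    cache: Optional[List[Record]],
    cached_at: Optional[datetime],
    query_modified_since: Callable[[datetime], List[Record]],
) -> List[Record]:
    """Merge cached records with records modified since the cache was written."""
    have_cache = cache is not None and cached_at is not None
    since = cached_at if have_cache else FAR_PAST

    # Keep cached rows that were last modified before the cutoff.
    kept = [r for r in (cache or []) if r["dateModified"] < since]

    # Ask the server only for rows modified at or after the cutoff.
    fresh = query_modified_since(since)

    # A fresh row replaces any cached row with the same id.
    fresh_ids = {r["id"] for r in fresh}
    return [r for r in kept if r["id"] not in fresh_ids] + fresh
```

The caller would then save the merged table together with the new timestamp, so the next run only fetches what changed in between.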

review list of creator names that are removed

The count_creators function has a list of creators that are removed. The comment in the code says:

# Grep-based filters
# Bryce created these (and we can expand these) based upon what I saw in the results
# that looked like organizations or non-persons of some sort or another

We should review this list against the list of unique creators and decide if we want to expand, revise, or altogether remove this list.
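A small helper could make that review easier by showing which unique creators each filter actually catches. The sketch below is in Python for illustration; the patterns are made-up placeholders, not the list in count_creators, and `matches_by_pattern` is a hypothetical name.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical patterns, for illustration only; the package's real list may differ.
EXAMPLE_PATTERNS = [r"\buniversity\b", r"\bcenter\b", r"\bprogram\b"]


def matches_by_pattern(creators: Iterable[str], patterns: List[str]) -> Dict[str, List[str]]:
    """For each regex pattern, list the unique creator names it would remove."""
    unique = sorted(set(creators))
    return {
        pattern: [name for name in unique if re.search(pattern, name, flags=re.IGNORECASE)]
        for pattern in patterns
    }
```

Patterns with no matches are candidates for removal, and names that look like organizations but match nothing suggest where the list might be expanded.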
