cafincubator / midden

A research metadata catalog and metadata editor that integrates into common workflows used in academic research.

License: Creative Commons Zero v1.0 Universal

HTML 31.29% CSS 1.61% C# 65.59% JavaScript 1.29% Batchfile 0.22%
academic data data-catalog data-management data-science metadata research research-data-management

midden's Issues

Error when previewing datasets: map container already initialized

To reproduce:

Go to "Catalog", click magnifying glass to preview dataset, close, click any other magnifying glass to preview second dataset

Error:

crit: Microsoft.AspNetCore.Components.WebAssembly.Rendering.WebAssemblyRenderer[100]
      Unhandled exception rendering component: Map container is already initialized.
      Error: Map container is already initialized.
          at i._initContainer (https://unpkg.com/[email protected]/dist/leaflet.js:5:37578)
          at initialize (https://unpkg.com/[email protected]/dist/leaflet.js:5:26026)
          at new i (https://unpkg.com/[email protected]/dist/leaflet.js:5:2616)
          at Object.t.map (https://unpkg.com/[email protected]/dist/leaflet.js:5:141663)
          at Module.create (https://meta.cafltar.org/geojsonMap.js:23:24)
          at https://meta.cafltar.org/_framework/blazor.webassembly.js:1:3942
          at new Promise (<anonymous>)
          at Object.beginInvokeJSFromDotNet (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:3908)
          at Object.w [as invokeJSFromDotNet] (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:64218)
          at _mono_wasm_invoke_js_blazor (https://meta.cafltar.org/_framework/dotnet.5.0.9.js:1:190800)
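
The trace shows Leaflet refusing to build a second map on a container that still holds the first one. One possible fix, sketched here in TypeScript under assumptions, is to remember the live map for each container and remove it before creating the next. The registry, `createMap`, and `disposeMap` are hypothetical names; the actual geojsonMap.js is not shown in this issue. The only Leaflet behavior relied on is that a map instance has a `remove()` method that destroys it.

```typescript
// Minimal shape of a map instance. Leaflet's map object provides remove(),
// which destroys the map and frees its container.
interface MapLike {
  remove(): void;
}

// Live maps keyed by container id (hypothetical registry, not Midden's code).
const liveMaps = new Map<string, MapLike>();

// Remove and forget the map bound to a container, if there is one.
function disposeMap(containerId: string): void {
  const existing = liveMaps.get(containerId);
  if (existing !== undefined) {
    existing.remove();
    liveMaps.delete(containerId);
  }
}

// Create a map for a container, first disposing any map already bound to it.
function createMap<T extends MapLike>(containerId: string, factory: (id: string) => T): T {
  disposeMap(containerId);
  const created = factory(containerId);
  liveMaps.set(containerId, created);
  return created;
}
```

With Leaflet, the factory would be something like `id => L.map(id)`, and `disposeMap` would be called when the preview closes.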

Rewrite pages and components to have consistent state and business logic

I need to clean up some of the architecture. Right now some "pages" handle no logic, and all the logic, event handling, etc. occurs in the components. For example, the Catalog page has a CatalogViewer. The CatalogViewer handles event updates to State and similar concerns. This forced me to create a second component, ProjectCatalogViewer, which is very similar to the CatalogViewer. But then I have a MetadataView page that handles the event logic and passes a Metadata to a MetadataDetails component.

I'm not being consistent with the way I implement logic.

I think a better way to deal with this might be for the page to handle updating state and pass it along to the component. This would also allow the CatalogViewer and VariableViewer to reference the same underlying List<Metadata> instead of each having to create a subset of the list (in the case of Project-specific or Zone-specific pages).

Proposal: Rewrite CatalogViewer and VariableViewer to only iterate on a List<Metadata> (the variable viewer will still need to create a List<CatalogVariableViewerViewModel> from the List<Metadata>)

Enhance UI for Projects

These are tasks / suggestions that were discussed during a presentation of how Projects were implemented in the latest dev build:

  • Projects page (catalog/projects)
    • Should render projects with markdown instead of simple text. Also set a max height and either use a scroll bar or a "more..." link to expand
    • Show number of datasets per project; don't worry about performance until it becomes an issue. Can add a field like datasetCount in catalog.json that is populated by the collator
  • Strip whitespace when assigning the relationship between projects and datasets via LINQ to avoid input issues
  • Insights page
    • Show datasets without projects and/or projects without datasets to highlight potential issues
  • Catalog page (catalog)
    • Remove tabs for Datasets, Variables, Projects. Instead, make them sub-items, each with its own URL; this makes the back button work better
  • Don't use .mippen; use .md files instead (this has been addressed between that meeting and this note)
  • Create a project editor that is a markdown editor and allows downloading of the project file and loading (like the metadata editor)
    • Add the "Project" editor as a sub-menu to the current "Editor" nav menu. Also add a "Dataset" sub-item under the same parent that links to the current editor
  • Consider a "Variable" tab in the Project-detail pages; this in addition to the current datasets that are listed
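
The whitespace-stripping, dataset-count, and "datasets without projects" items above can be sketched together. This is a minimal TypeScript illustration, not the app's C#/LINQ code; `DatasetRecord`, `ProjectRecord`, and both function names are hypothetical, and matching is assumed to be case-sensitive after trimming.

```typescript
// Hypothetical, simplified shapes; the real catalog.json schema is not shown here.
interface ProjectRecord { name: string; }
interface DatasetRecord { name: string; project: string; }

// Trim so stray leading or trailing whitespace does not break the match.
function normalizeKey(value: string): string {
  return value.trim();
}

// Count datasets per project, matching on trimmed names.
// Projects with no datasets stay in the result with a count of 0.
function countDatasetsPerProject(
  projects: ProjectRecord[],
  datasets: DatasetRecord[],
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const project of projects) {
    counts.set(normalizeKey(project.name), 0);
  }
  for (const dataset of datasets) {
    const key = normalizeKey(dataset.project);
    const current = counts.get(key);
    if (current !== undefined) {
      counts.set(key, current + 1);
    }
  }
  return counts;
}

// Datasets whose project does not match any known project.
function datasetsWithoutProject(
  projects: ProjectRecord[],
  datasets: DatasetRecord[],
): DatasetRecord[] {
  const known = new Set(projects.map(project => normalizeKey(project.name)));
  return datasets.filter(dataset => !known.has(normalizeKey(dataset.project)));
}
```

Projects whose count stays at 0 are the "projects without datasets" case for the Insights page.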

Emphasize zone more when displaying datasets

In the metadata / dataset viewer, it's not clear what zone the data belong to. Zone (and project) is only indicated through the breadcrumb path. It would be good to put more emphasis on zone (and maybe project) when displaying datasets.

Revisit links and pages

We need a deep philosophical/metaphysical/existential discussion on pages, links, and nav.

Midden used to be just datasets but now there are tags, projects, zones, etc. It makes some sense to list these elements but this complicates navigation.

In the 0.2-dev.2 build there are catalog pages for the above elements. But things get weird fast. For example:

catalog/dataset lists all datasets for all zones. catalog/zones lists all the data zones. If you follow a zone link it goes to catalog/zones/{specific-zone} that lists all the datasets in that zone. But shouldn't this be the dataset catalog, just filtered by zone? Something like: catalog/datasets/zones/{specific-zone}? But if we have that, then what does catalog/datasets/zones list? All data for all zones? That's the same as catalog/datasets!

Redesign Home

The Home page in the web app needs some love.

Consider:

  • Icons/graphics for each link (Insights, Catalog, Editor) instead of just text
  • Functional insights like "Recent Updates" to show new datasets
  • Graphics/logos?
  • Help documents? Introduction material (what is Midden?)?

Reevaluate how zones, projects, datasets are displayed

  • Catalog: Put dataset name first in card title, make it a link to the metadata page; remove "View" button
    • Consider adding zone and project id in card body
    • Use tag style for this? Or put in Tag section? Probably put up top, above description, as tags

Error when using a "\" in the item name attribute

An error occurs when loading catalog.json into the web Catalog if a "\" is used in the item name attribute in a midden file. It looks like the editor can create it, but when I tried to view the item in the catalog, I got an error (not able to open the item).
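
The cause is not confirmed in this report. One plausible explanation, shown here as an assumption, is that the backslash reaches catalog.json unescaped, which makes the JSON string invalid. The TypeScript sketch below only demonstrates that general JSON behavior; `tryParse` is a hypothetical helper, and nothing here is Midden's actual code.

```typescript
// The text of this string is {"name":"raw\data"}: a lone backslash followed by
// a character that is not a valid JSON escape.
const invalidJson = '{"name":"raw\\data"}';

// Parse without throwing, reporting success or failure.
function tryParse(text: string): { ok: boolean; value?: unknown } {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// JSON.stringify escapes the backslash, so the serialized text is
// {"name":"raw\\data"} and parses back to the original name.
const validJson = JSON.stringify({ name: "raw\\data" });
```

If this is the cause, the fix belongs wherever the name is serialized, by using a JSON serializer rather than string concatenation.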

Include version in all web app pages

The Midden version is currently only seen in the Editor. Either including it in the header or the footer should work so that it can be visible in Insights, Catalog, and homepage.

Clickable Keywords

Make keywords clickable so you can quickly see other datasets/metadata associated with that keyword. Would also be handy if you can assign projects as keywords on datasets, linking multiple projects to one dataset.

Error: Unhandled exception rendering component: Cannot read properties of null (reading 'removeChild')

Reproduce:

In Catalog, click the preview button, then click the "View Page" button.

The following is the error message:

blazor.webassembly.js:1 
        
       crit: Microsoft.AspNetCore.Components.WebAssembly.Rendering.WebAssemblyRenderer[100]
      Unhandled exception rendering component: Cannot read properties of null (reading 'removeChild')
      TypeError: Cannot read properties of null (reading 'removeChild')
          at e (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:10331)
          at e (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:10303)
          at Object.e [as removeLogicalChild] (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:10303)
          at e.applyEdits (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:33040)
          at e.updateComponent (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:32271)
          at Object.t.renderBatch (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:12134)
          at Object.window.Blazor._internal.renderBatch (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:61913)
          at Object.w [as invokeJSFromDotNet] (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:64435)
          at _mono_wasm_invoke_js_blazor (https://meta.cafltar.org/_framework/dotnet.5.0.9.js:1:190800)
          at wasm_invoke_iiiiii (wasm://wasm/00aba242:wasm-function[5611]:0xdda7f)
Microsoft.JSInterop.JSException: Cannot read properties of null (reading 'removeChild')
TypeError: Cannot read properties of null (reading 'removeChild')
    at e (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:10331)
    at e (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:10303)
    at Object.e [as removeLogicalChild] (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:10303)
    at e.applyEdits (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:33040)
    at e.updateComponent (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:32271)
    at Object.t.renderBatch (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:12134)
    at Object.window.Blazor._internal.renderBatch (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:61913)
    at Object.w [as invokeJSFromDotNet] (https://meta.cafltar.org/_framework/blazor.webassembly.js:1:64435)
    at _mono_wasm_invoke_js_blazor (https://meta.cafltar.org/_framework/dotnet.5.0.9.js:1:190800)
    at wasm_invoke_iiiiii (wasm://wasm/00aba242:wasm-function[5611]:0xdda7f)
   at Microsoft.JSInterop.WebAssembly.WebAssemblyJSRuntime.InvokeUnmarshalled[Int32,RenderBatch,Object,Object](String identifier, Int32 arg0, RenderBatch arg1, Object arg2, Int64 targetInstanceId)
   at Microsoft.JSInterop.WebAssembly.WebAssemblyJSRuntime.InvokeUnmarshalled[Int32,RenderBatch,Object](String identifier, Int32 arg0, RenderBatch arg1)
   at Microsoft.AspNetCore.Components.WebAssembly.Rendering.WebAssemblyRenderer.UpdateDisplayAsync(RenderBatch& batch)
   at Microsoft.AspNetCore.Components.RenderTree.Renderer.ProcessRenderQueue()

Likely an issue with modal component?

Handling of sub/filtered catalogs should be improved

A lot of cleanup can be done with how sub-catalogs (aka filtered catalogs) are handled.

  • URLs should look more like a REST API; /catalog/{zoneName} should be something like /catalog/zones/{zoneName}
    • Issue #71 is related: the URL should be something like /tags/{tagName}
  • Page header can be improved. "Work Zone Catalog" seems awkward. Maybe something like: "Catalog filtered by zone: Work". Probably better options than this.
  • There's a lot of code copy/pasted for these filtered catalogs; see ZoneCatalogMetadataViewer, ProjectCatalogMetadataViewer, CatalogMetadataViewer (and soon TagMetadataViewer). Figure out a way to unify this. Create single component with parameters for what's being filtered? Pass a linq function to component?
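
The "pass a function to the component" idea in the last item could look roughly like this. It is a TypeScript sketch of the pattern only; in the Blazor app the equivalent would be a component parameter holding a predicate over Metadata. `CatalogEntry`, `CatalogFilter`, and the helper names are hypothetical.

```typescript
// Hypothetical, simplified catalog entry.
interface CatalogEntry {
  name: string;
  zone: string;
  project: string;
  tags: string[];
}

// A filter is any predicate over an entry.
type CatalogFilter = (entry: CatalogEntry) => boolean;

// Small factories replace the per-filter viewer components.
const byZone = (zone: string): CatalogFilter => entry => entry.zone === zone;
const byProject = (project: string): CatalogFilter => entry => entry.project === project;
const byTag = (tag: string): CatalogFilter => entry => entry.tags.indexOf(tag) >= 0;

// One viewer would take the full list plus zero or more filters;
// an entry is kept only if every filter accepts it.
function filterCatalog(entries: CatalogEntry[], filters: CatalogFilter[]): CatalogEntry[] {
  return entries.filter(entry => filters.every(accepts => accepts(entry)));
}
```

An empty filter list yields the unfiltered catalog, so a single component could serve the full catalog and the zone, project, and tag views.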

Project links in dataset catalog should link to a filtered catalog by just project

Currently, the dataset catalog shows a "path" where it's "{ZoneName} / {ProjectName}". The ZoneName links to a dataset catalog filtered by zone. The ProjectName links to a dataset catalog filtered by ZoneName AND ProjectName.

It would be more useful, and more intuitive, if the ProjectName links to a dataset catalog filtered ONLY by project.

To make this more intuitive, the formatting of the ZoneName and ProjectName as a "path" should be removed.

Provide context to input fields in the Editor

Currently, there is no context provided for the various input fields within the editor. Consider adding an icon that shows a tooltip or popup when hovered or clicked. This tooltip/popup should give a description of the field and, possibly, simple instructions.

Flesh out wiki

More instruction is needed for users of Midden. Should consider using the Wiki for:

  • Best practices of data organization - Use of data zones, projects, datasets in own directory
  • Catalog/Editor: Setup/installation - Supported platforms (GitHub Pages, Azure Static Web Sites, self hosting)
  • Catalog/Editor: Configuration - Customizing website
  • CLI: Installation and overview
  • CLI: Configuration
  • Example workflow: Use editor to create metadata, download, save to data store, use CLI to collate, update catalog.json, update Midden (if needed)

Error downloading icon-512.png

Midden gives the following error when loading any page:

Error while trying to use the following icon from the Manifest: https://meta.cafltar.org/icon-512.png (Download error or resource isn't a valid image)

Likely a residual reference to the removed image.

Create readme

Currently, there is no readme. One should:

  • Describe project, scope, license, etc.
  • Contribution guidelines (so optimistic!)
  • Explain current features; supported data stores, supported static website hosts
  • Provide some guidance on setup and use
  • Roadmap?

Include "LastUpdated" to metadata and display in Catalog

Some datasets are updated periodically such as timeseries data or drone flights. It would be useful for those browsing the metadata to know when datasets have been updated.

The crawlers could read file metadata and determine when files in the dataset folder were last updated.
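
As a rough TypeScript illustration of that idea, a crawler could take the newest modification time among a dataset's files as its "LastUpdated" value. `CrawledFile` and `lastUpdated` are hypothetical names, and how each data store exposes modification times is not covered here.

```typescript
// Hypothetical record a crawler might produce for each file in a dataset folder.
interface CrawledFile {
  path: string;
  modified: Date;
}

// Return the most recent modification time, or undefined for an empty folder.
function lastUpdated(files: CrawledFile[]): Date | undefined {
  let latest: Date | undefined = undefined;
  for (const file of files) {
    if (latest === undefined || file.modified.getTime() > latest.getTime()) {
      latest = file.modified;
    }
  }
  return latest;
}
```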

One issue is that the catalog will only reflect information accurate to when the catalog itself was last updated. This could cause information to be misleading as data could be updated after the catalog was generated.

[CLI] Specify the data store in the path variable

It is reasonable that a data zone is not necessarily linked 1:1 to the technology of the data store. For example, "raw" data could be in Azure Data Lake, Google Drive, or on an FTP server. Similarly, "scratch" data could be in Dropbox, Drive, OneDrive, etc. It's probably best practice to have a 1:1 relationship, but whether or not Midden should enforce that needs discussion.

Consider adding a prefix to the "datasetPath" variable. e.g. "GoogleWorkspaceSharedDrive//relative/path/to/dataset".
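
A prefixed path like the one above could be split as follows. This TypeScript sketch assumes "//" is the separator, as in the example, and that a path without it has no store prefix; `DatasetLocation` and `parseDatasetPath` are hypothetical names.

```typescript
// Result of splitting a datasetPath into an optional store prefix and a path.
interface DatasetLocation {
  store: string | undefined;
  path: string;
}

// Split on the first "//". A path with no separator, or one that starts with
// the separator, is returned unchanged with no store.
function parseDatasetPath(datasetPath: string): DatasetLocation {
  const separator = "//";
  const index = datasetPath.indexOf(separator);
  if (index <= 0) {
    return { store: undefined, path: datasetPath };
  }
  return {
    store: datasetPath.slice(0, index),
    path: datasetPath.slice(index + separator.length),
  };
}
```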

[Editor] Create a simplified view

In earlier versions of Midden, the number of fields displayed/required depended on the data zone the metadata was for. Now that Midden allows custom data zones, it's difficult to assign required fields (unless this is customizable in the app-config.json file, but that would be a beast to deal with). Now all metadata fields are displayed at all times. This is overwhelming and goes against one of the tenets of Midden (which is basically: get some metadata, with a low barrier, even if it's just a one-sentence description).

As a (temporary?) fix, implement a "complex view" toggle that shows all fields. The default view just shows name, zone, project, description, contact, variables (which only includes name, description, units).

Fix repo oddities

Some goofiness leaked into the repo over time. Fix 'em:

  • Remove Azure static web app yml file (or rename?)
  • Reset catalog.json to correct "template" version instead of CAF version

Add favicon

Midden needs a killer favicon instead of the default

Consider data lineage

If a field like "derivedFrom" is added to the dataset metadata, then a lineage graph could be constructed
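
A minimal TypeScript sketch of how such a graph could be walked, assuming each dataset lists the ids of the datasets it was derived from. The `derivedFrom` shape, `LineageDataset`, and `ancestorsOf` are illustrative assumptions, not part of the current .midden schema.

```typescript
// Hypothetical dataset record with a derivedFrom list of parent dataset ids.
interface LineageDataset {
  id: string;
  derivedFrom: string[];
}

// Collect every upstream dataset id reachable through derivedFrom, sorted.
// The seen set keeps the walk finite even if the metadata contains a cycle.
function ancestorsOf(id: string, datasets: LineageDataset[]): string[] {
  const parents = new Map<string, string[]>();
  for (const dataset of datasets) {
    parents.set(dataset.id, dataset.derivedFrom);
  }
  const seen = new Set<string>();
  const stack: string[] = [...(parents.get(id) ?? [])];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (seen.has(current)) {
      continue;
    }
    seen.add(current);
    for (const parent of parents.get(current) ?? []) {
      stack.push(parent);
    }
  }
  return Array.from(seen).sort();
}
```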

Create Dashboard page

A Dashboard page needs to be created. This will:

  • Provide summaries of the data catalog using figures and statistics (bar chart of datasets per zone, datasets per project, common tags, and so on)
  • Provide suggestions, or insights that guide suggestions; e.g. how many datasets do not have any tags?
  • [Likely much later] Metrics on use; What are most visited datasets?
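
The per-zone and per-project counts and the "no tags" insight reduce to simple aggregations over the catalog. The TypeScript sketch below uses a hypothetical `SummaryDataset` shape and helper names; charting is left out.

```typescript
// Hypothetical, simplified dataset record for summary purposes.
interface SummaryDataset {
  name: string;
  zone: string;
  project: string;
  tags: string[];
}

// Count datasets grouped by an arbitrary key (zone, project, and so on).
function countBy(
  datasets: SummaryDataset[],
  keyOf: (dataset: SummaryDataset) => string,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const dataset of datasets) {
    const key = keyOf(dataset);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Number of datasets that have no tags at all.
function countUntagged(datasets: SummaryDataset[]): number {
  return datasets.filter(dataset => dataset.tags.length === 0).length;
}
```

For example, `countBy(datasets, d => d.zone)` gives the data for a datasets-per-zone bar chart.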

Adopt specific metadata standard?

Currently, the .midden metadata schema has fields that were hand-picked from various metadata standards (ISO 19115, Project Open Data) and chosen as a result of decisions made internally within USDA LTAR. The fields were named using a self-defined naming scheme. This decision was made to simplify the creation of metadata by the researchers and also for convenience of development.

Care was taken to ensure that the metadata were at least mostly compatible for export to ISO 19115, Project Open Data, and EPA guidelines. However, there are advantages of adopting a single standard instead of using a combination of them (and thus no standard). This should be discussed.

Allow uploading of data dictionaries to define variables

It's fairly common for data dictionaries to be already defined within spreadsheets or as separate CSV files. We should allow users to upload data dictionaries instead of making them reproduce the work through the editor.

The functionality should allow various formats. At a minimum, it should support the CAF default: a CSV file with FieldName, Description, and Units columns.
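
A minimal TypeScript sketch of reading that default format. It assumes a header row and no quoted fields (a comma inside a description would need a real CSV parser); `DictionaryVariable` and `parseDataDictionary` are hypothetical names.

```typescript
// Variable definition as described in the simplified editor view:
// name, description, units.
interface DictionaryVariable {
  name: string;
  description: string;
  units: string;
}

// Parse a CSV data dictionary with FieldName, Description, and Units columns.
// Columns may appear in any order; blank lines are skipped.
function parseDataDictionary(csv: string): DictionaryVariable[] {
  const lines = csv.split(/\r?\n/).filter(line => line.trim().length > 0);
  const headerLine = lines[0] ?? "";
  const header = headerLine.split(",").map(cell => cell.trim());
  const nameIndex = header.indexOf("FieldName");
  const descriptionIndex = header.indexOf("Description");
  const unitsIndex = header.indexOf("Units");
  if (nameIndex < 0 || descriptionIndex < 0 || unitsIndex < 0) {
    throw new Error("Expected FieldName, Description, and Units columns");
  }
  return lines.slice(1).map(line => {
    const cells = line.split(",").map(cell => cell.trim());
    return {
      name: cells[nameIndex] ?? "",
      description: cells[descriptionIndex] ?? "",
      units: cells[unitsIndex] ?? "",
    };
  });
}
```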

Consider project-level metadata

Projects in Midden are essential to organization yet there is no metadata to specify project information (purpose, members, start date, end date, whatever). Consider a metadata file to specify project information.

This will allow an "overview" of projects within an organization and better inform potential datasets, grouped methods, results, maybe documents (manuscripts? related literature?)

This will require a separate metadata file extension? Or a top-level specification in a midden file on whether it's a dataset or project. Maybe a new aggregator?

Datasets catalog should support paging

Currently, the entire catalog.json file is loaded into memory and persists throughout the app lifecycle. This isn't sustainable for large catalogs.

At some point, optimization steps should be taken to ease the computational burden. Consider paging and/or streaming.
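
Paging the displayed list could look like the TypeScript sketch below. It only limits what is rendered at once; it does not by itself avoid loading the whole catalog.json into memory, which would need a split or streamed catalog. `Page` and `paginate` are hypothetical names, and pages are 1-based.

```typescript
// One page of results plus enough information to render pager controls.
interface Page<T> {
  items: T[];
  page: number;
  pageCount: number;
}

// Return the requested 1-based page, clamping out-of-range page numbers.
function paginate<T>(items: T[], page: number, pageSize: number): Page<T> {
  if (pageSize < 1) {
    throw new Error("pageSize must be at least 1");
  }
  const pageCount = Math.max(1, Math.ceil(items.length / pageSize));
  const clamped = Math.min(Math.max(1, page), pageCount);
  const start = (clamped - 1) * pageSize;
  return {
    items: items.slice(start, start + pageSize),
    page: clamped,
    pageCount,
  };
}
```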

[CLI] Support paths in Shared Drive crawler

Create a private function that returns file objects for Drive, including parent info, instead of IDs. Use it instead of getFilesnames. Change getFilesnames to return the actual names instead of IDs.
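
Once file objects carry parent info, a path can be rebuilt by walking up the parent chain. The TypeScript sketch below assumes each item has a single optional parent id; `DriveItem` and `buildPath` are hypothetical and do not reflect the crawler's actual code or the Drive API's exact response shape.

```typescript
// Hypothetical file or folder object that includes its parent's id.
interface DriveItem {
  id: string;
  name: string;
  parentId?: string;
}

// Build a "folder/subfolder/file" path by following parentId links upward.
// The walk stops at an item with no known parent; the visited set guards
// against a malformed, cyclic parent chain.
function buildPath(itemId: string, items: DriveItem[]): string {
  const byId = new Map<string, DriveItem>();
  for (const item of items) {
    byId.set(item.id, item);
  }
  const parts: string[] = [];
  const visited = new Set<string>();
  let current = byId.get(itemId);
  while (current !== undefined && !visited.has(current.id)) {
    visited.add(current.id);
    parts.unshift(current.name);
    current = current.parentId === undefined ? undefined : byId.get(current.parentId);
  }
  return parts.join("/");
}
```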

Make it easier to find project info when viewing a dataset

Currently, there are no easy ways to view project information while viewing dataset details. There is no link that goes to the project page (clicking the breadcrumbs goes to a dataset view for the specific zone and project).

Consider adding a link (with respective icons) for the zone and project that the dataset belongs to. Pretty much exactly how the dataset card shows it. Probably under the title.

Could also have a pop-up that displays project info?
