shyentist / fish-r-man

A bundle of analytics tools for fisheries scientists

License: GNU Affero General Public License v3.0

R 81.80% CSS 8.26% TeX 9.93%

dashboard fisheries fisheries-management fisheries-research fishing-effort global-fishing-watch globalfishingwatch marine-biology open-source r r-package shiny

fish-r-man's Introduction

Hi there 👋

My name is Pasquale Buonomo, but people mess it up all the time, so you can either try your luck or call me Pico. I am a human, I have many interests, and I dislike being labeled after just one of them at a time. For the sake of introduction, I prefer describing myself as a marine biologist turned programmer.

I am...

💻 Proficient in: R, HTML, CSS, JavaScript, TypeScript, SQL.
🎣 Maintaining fishRman, an R package for fisheries scientists.
📚 Looking to collaborate on scientific research on fisheries, conservation, ecology, and much more.
🐲 A D&D, Pathfinder, and DURF player and Master. Let's talk!
📫 Reachable via: E-mail | Reddit | LinkedIn

fish-r-man's Issues

Convert button does not enable Visualize button

Most probably the geometry column is called "geometry" instead of "geom" or the other way around. On it.

List of ISO codes linked within the UI, where the filter for vessel nationality is

Linking a list of ISO Alpha-3 codes like this one would considerably improve the experience of users, who would no longer have to google the list themselves to use this feature of the app.

The best way to do this is probably a hyperlink in the Query tab sidebar, where 'Flag' is. Something along the lines of 'Flag (list of ISO Alpha-3 codes)'.

Additions to Handbook

Several additions are needed to make the dataset clearer to a beginner.

Explain the meaning of each column
Clarify how the filters' names match the columns' names in the Query tab (e.g. Latitude == cell_ll_lat)
Explain why some columns are only present in one of the tables
For more details on the origins of the dataset, link to the work of Kroodsma et al. 2018
Show how fishRman can be used to study what the Chinese fleet was doing in Ecuador last summer
A less ambiguous use of the term 'fishing effort' might be needed. At a quick glance, a more neutral 'AIS data' might be better.

Create a second Tab

The second tab must be browsable independently of whether the user has used the first tab (the tabs are not sequential, as in "Only load the second tab when the first has been completed"). Basically the same as GitHub does in "Code-Issues-Pull requests-Actions-Projects..." as you can see right under the repo's name or in the image below (fancy styling is optional).

The first Tab must be named Query, the second Analysis. First Tab is the default Tab when opening the app.

Table with the most common spatial calculations already performed

The table would be just beneath the plot, and would show stats like (for the 100th of degree table)

Area covered by fishing
Area covered by 90% (maybe customizable? both?) of fishing activity
Area covered by vessel transit
Area covered by 90% (maybe customizable? both?) of vessel transit
Area covered by all MMSI
Area covered by 90% (maybe customizable? both?) of MMSI present

For reference, something similar to Indicators 5, 6, and 7, described in Appendix XIII, Table 1 “Definition of environmental indicators to measure the effects of fisheries on the marine ecosystem” of Commission Decision 2010/93/EU

Limit how much data can be loaded onto the app, either from query or from file

Too often the online app crashes because users try to download the entire dataset, or a significant portion of it, and the server shuts down the connection for lack of memory. This spoils the experience of users acting in good faith, and it is on me to avoid that.

Setting shiny.maxRequestSize for uploaded files and running a SELECT COUNT(*) to check the query before running it will most likely be the solution to this issue.
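
A minimal sketch of both guards, assuming a 30 MB upload cap and a one-million-row limit (both values are illustrative, not taken from the app). The helper names `build_count_query()` and `query_is_allowed()` are hypothetical; the table name is the one quoted elsewhere in these issues.

```r
# Cap uploads at 30 MB (Shiny's default is 5 MB). The value is an assumption.
options(shiny.maxRequestSize = 30 * 1024^2)

# Hypothetical row limit above which a query is refused.
max_rows <- 1e6

# Wrap the user's filters into a COUNT(*) query against the same table.
build_count_query <- function(table, where_clause) {
  sprintf("SELECT COUNT(*) AS n FROM `%s` WHERE %s", table, where_clause)
}

# Decide whether the real query may run, given the count returned by the database.
query_is_allowed <- function(n_rows, limit = max_rows) {
  !is.na(n_rows) && n_rows <= limit
}

count_sql <- build_count_query(
  "global-fishing-watch.gfw_public_data.fishing_effort_byvessel_v2",
  "date >= '2012-01-01' AND date < '2012-01-02'"
)
```

The count query would be sent first, and the full query only if `query_is_allowed()` returns TRUE for the number it reports.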

Testing

The app is now ready for some testing. I am trying to use the package shinytest, and I am encountering some issues due to the connection to the database. The rest of the app seems to be testable via snapshot-based tests, so I will probably write down the manual steps for the query, the results of which would have to be checked against the csv files I will upload. From that point onward, the steps will be automated with the snapshot-based method.

Hopefully.

First documentation

There is an obvious need for good, basic documentation explaining in layman's terms (the project is meant to be used by non-programmers) how to query the data, and what the data can be used for. The description of the columns will most likely come from GFW's website, with proper reference (important!), while the instructions on how to query the data must be original content.

Move from the screenshot based testing to Unit testing

Now that much of the code has been made modular, it can be unit tested. This should make the later implementation of automated testing easier.

Populate second tab ('Analysis')

This will be the core of the project and where I expect the largest growth in the long run. Here, several kinds of analysis will be proposed to the users, who can check the ones that suit their interests. Before working on this issue, please contact me directly to work out how best to direct everyone's efforts according to the analytic methods they know.

Preferred analyses are the ones that can be achieved through the use of Global Fishing Watch data as uploaded by the users and little more. All data must be publicly available, for instance:

R packages with geographic areas, measure unit conversions, and such.
Environmental, economic, historical data that can be queried according to the uploaded data (e.g. catch data in the same time span as the uploaded data)

Add the possibility to also plot EEZ, 24 Nautical Miles zones and 12 NM Zones

The data is available at marineregions

This would help to understand the context of the research and where exactly the data point is when close to such borders.

Hyperlinks should open a new Tab

More than once I have lost all the progress (not much to lose, but still) clicking on the hyperlinks, so I would prefer for them to open in a new Tab.

This can be done by adding the target="_blank" and rel="noreferrer noopener" attributes to the <a> elements.
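
As a rough illustration, a small helper (hypothetical name `external_link()`) could build such anchors as plain HTML strings. In a Shiny UI the same attributes could instead be passed to `tags$a()`. The URL is only an example, and no HTML escaping is attempted here.

```r
# Build an <a> element that opens in a new tab and does not expose window.opener.
external_link <- function(url, label) {
  sprintf(
    '<a href="%s" target="_blank" rel="noreferrer noopener">%s</a>',
    url, label
  )
}

link <- external_link("https://globalfishingwatch.org", "Global Fishing Watch")
```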

Deploying fishRman as an "offline" solution

To solve the server active hours cap of shinyapps and BigQuery's traffic limit, I thought I might deploy the app as an executable, standalone version. If one comments out the json references, it should automatically show the user a "Choose which Google Account to use" page, and everything should work as usual.

I found this guide that I will use as soon as I have time.

Add a banner to the first page

Following the second review for JOSS, one useful piece of feedback is to add a banner to the first page, since fishRman will likely be the “first contact” for many fisheries researchers, who learn about it from colleagues who have shared the shinyapps link (not the repo with the documentation).

Something like "This is the fishRman app to help you explore and analyze Global Fishing Watch Data. Learn more about Global Fishing Watch and the data available for querying with these links"

Preferably, clicking on "these links" should open a pop-up like the ones used for Warnings, with a list of links.

Allow the user's authentication for BigQuery

Instead of hard-coding database authentication with the author's billing account, it would be best to allow users to log in to their own Google Cloud accounts. This would allow more monthly traffic, since every user would use their own billing plan. It is also somewhat buggy at the moment, since it requires launching the code in two separate steps: one for the connection, with manual authentication, and one for the app itself.

Detail state of the field in JOSS paper

Following the second review for JOSS, one useful piece of feedback is to detail the current situation for people who would like to use GFW data. Currently, the only official guide seems to be this GFW webpage, which targets users proficient in R.

Implement spatial analysis

Since the data is organised by latitude and longitude, spatial analysis is clearly needed. I am already working on it, starting with the possibility to assess the cumulative distribution of fishing effort, hopefully at a percentage chosen by the users via a slider (or other input).

Update paper according to review feedback

Following the feedback from the first review, the paper must be updated. The feedback is also quoted below.

"In the paper, I think the statement of need and the state of the field could be more detailed; are there citations and descriptions for who is using the data, how they are using it, what issues they are encountering, and how fisheRman helps? Who is the desired user, what do they have to know, how will this integrate into their workflow?"

Add the possibility to only analyze data that is within a certain area

This can be done starting from spatial data, either loaded or converted, and "clipping" (st_contains and st_crop should do the job) against another file that the user must load.

Update to new data available (GFW updated their data)

https://globalfishingwatch.org/data/new-fishing-data-paves-the-way-for-improved-analysis/

GFW has released 2017-2020 data, with more categories. The possible inputs have to be adjusted accordingly.

Beautify the UI

For now, the UI is just an ensemble of fields to interact with. A proper design for the page is needed. To be discussed. Not urgent.

write tests

Write a test for each exported function of the package

New and improved documentation

Before uploading fishRman with its new UI and functionalities, new, complete documentation is needed. One thing that is certain is that the odt file has to go away, in favour of something that can have proper version control. For now, I am thinking about an Rmarkdown file, but maybe an html would be more flexible in case of expansion to a full knowledge base.

All opinions, feedback, and advice are welcome.

ID column not found when clipping data to most GeoPackages

When clicking the checkbox Use only data contained in the area?, the code looks for an 'ID' column to drop in order to only have the 'geom' column present. This was based on the wrong assumption that the column 'ID' was created by the st_read() function and not something that was already present in the geopackage.

I am already working on solving this, probably resorting to a less ambiguous selection of the only column I want rather than dropping the columns I do not want.

Add a "?" symbol beside MMSI input box to let users know about SQL features

A message should appear when "?" is clicked or hovered, saying something like

"Type 123 to search for MMSI number 123, exactly.
Type %123 to search for MMSIs ending with 123.
Type 123% to search for MMSIs starting with 123.
Type %123% to search for MMSIs containing 123."
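
Behind the scenes, the four cases above could map onto an equality test or a LIKE pattern. This is a sketch only: the helper name `mmsi_condition()` is hypothetical, the column name `mmsi` is assumed, and the cast to STRING assumes the column is stored as a number.

```r
# Translate the text typed in the MMSI box into a SQL condition.
mmsi_condition <- function(mmsi) {
  # Accept only digits and the % wildcard before pasting the value into SQL.
  if (!grepl("^[0-9%]+$", mmsi)) {
    stop("MMSI may contain only digits and %")
  }
  if (grepl("%", mmsi, fixed = TRUE)) {
    sprintf("CAST(mmsi AS STRING) LIKE '%s'", mmsi)
  } else {
    sprintf("mmsi = %s", mmsi)
  }
}

exact <- mmsi_condition("123")
ends_with <- mmsi_condition("%123")
```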

Scripting SQL Query constructor

According to the input from the user, the script must build a working SQL query. I am already working on it.
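
A minimal sketch of what such a constructor might look like, assuming only date, latitude, and longitude filters. The function name `build_query()` and its arguments are hypothetical; the column names follow the query quoted elsewhere in these issues. Filters left as NULL are skipped.

```r
# Build a SELECT statement from whichever filters the user has filled in.
build_query <- function(table, date_range = NULL, lat_range = NULL, lon_range = NULL) {
  # c() silently drops the NULLs produced by filters that were not supplied.
  conditions <- c(
    if (!is.null(date_range)) {
      sprintf("date >= '%s' AND date < '%s'", date_range[1], date_range[2])
    },
    if (!is.null(lat_range)) {
      sprintf("cell_ll_lat >= %s AND cell_ll_lat < %s", lat_range[1], lat_range[2])
    },
    if (!is.null(lon_range)) {
      sprintf("cell_ll_lon >= %s AND cell_ll_lon < %s", lon_range[1], lon_range[2])
    }
  )
  sql <- sprintf("SELECT * FROM `%s`", table)
  if (length(conditions) > 0) {
    sql <- paste(sql, "WHERE", paste(conditions, collapse = " AND "))
  }
  sql
}

query <- build_query(
  "global-fishing-watch.gfw_public_data.fishing_effort_byvessel_v2",
  date_range = c("2012-01-01", "2012-01-02"),
  lat_range = c(40, 46)
)
```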

Allow the user to analyze the data they just queried without first downloading and uploading it

As of now, users query in the 'Query' tab, download the data, upload it onto the 'Analysis' tab, and only then are they able to analyze it. I have not yet managed to let users analyze the data they just queried directly, with no download or upload needed, even though the data is already there.

This would make the UX much faster and more enjoyable.

Add a new tab to the app for documentation

As of now, the Handbook is only found in the repo, and only by looking for the pdf among the other files. This is definitely not user-friendly, nor the best user experience. The Handbook should be in a 'Documentation' tab alongside 'Query' and 'Analysis'. The new tab should include a sidebar with a list of documents that can be read, and a main area where the document is displayed.

Apart from the Handbook, the open-source licence and the README file could also be listed, for transparency and to possibly increase involvement and participation.

Background slideshow

Instead of having a static background, it would be pleasant to have a slideshow of pictures relating to fisheries. I am already working on it, and I am having issues with the browser's compatibility. It seems that the CSS's animation and @keyframes method I am using is not supported by Firefox, IE and some others, but I am looking for a work-around.

If anyone knows a better way to do this, please contact me.

Breaking larger functions into smaller ones

At the start of this project, it made sense (to me) to have functions that controlled many "objects" in the application. Now that the code has grown, it has become redundant, and the quality of the code is clearly a mixture of experimenting in Shiny and understanding Shiny.

There are very large functions that assign outputs, update inputs, and so on and so forth. A good deal of code cleanup must be performed before this issue can be considered closed, although there is no objective way to determine a target.

New tests to account for what has changed

Not only redo the ones already in place, but also add one to check whether the EEZ, Contiguous zones, and National seas are consistent.

Update documentation

New paragraphs should be added, explaining the new functionalities such as "crop an area from an uploaded gpkg layer", "cumulative distribution", "download cropped data", and the non-spatial data and summaries being derived from spatial data.

Host Handbook and JOSS paper elsewhere

The Handbook and the JOSS paper, in their doc and pdf forms, are not in line with CRAN submission requirements. They should be hosted elsewhere and linked to in the package.

One option would be a different GitHub repo, to track issues and updates, but to really get the most out of GitHub they will eventually have to be produced in LaTeX, HTML, Markdown, or another trackable format.

The LaTeX files leading to the JOSS paper will also be deleted, since they are no longer needed. Still, they are in the commit history if ever needed for reference.

Enhance descriptive analysis

For now, the summary by date is per day, so I was thinking of adding a summary per month and per year. I will work on it in the week to come.
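
One possible way to derive the coarser summaries is to format the date at the desired granularity and aggregate on it. The sketch below uses base R only; the data frame, the `fishing_hours` column, and the helper name `summarise_by()` are assumptions for illustration.

```r
# Toy data standing in for a queried table (values are made up).
effort <- data.frame(
  date = as.Date(c("2020-01-05", "2020-01-20", "2020-02-03", "2021-02-10")),
  fishing_hours = c(2, 3, 4, 1)
)

# Sum fishing hours per day, month, or year.
summarise_by <- function(data, unit = c("day", "month", "year")) {
  unit <- match.arg(unit)
  fmt <- c(day = "%Y-%m-%d", month = "%Y-%m", year = "%Y")[[unit]]
  data$period <- format(data$date, fmt)
  aggregate(fishing_hours ~ period, data = data, FUN = sum)
}

monthly <- summarise_by(effort, "month")
yearly <- summarise_by(effort, "year")
```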

Too much "buffer" when plotting data spanning over a very small area

This is because I simply set the limits of the plot to the minimum x and y minus 1 and the maximum x and y plus 1. The result is that for areas that are 1 degree x 1 degree, or similarly small, the plot is more buffer than data. I will fix it by making the buffer 5% or 10% of (maximum x - minimum x) and (maximum y - minimum y).
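
The proportional buffer described above can be sketched as follows. The helper name `plot_limits()` is hypothetical, and the sketch does not handle the degenerate case of a single point, where the range (and so the buffer) would be zero.

```r
# Compute plot limits padded by a fraction of the data's extent on each axis.
plot_limits <- function(x, y, buffer = 0.05) {
  x_pad <- diff(range(x)) * buffer
  y_pad <- diff(range(y)) * buffer
  list(
    xlim = range(x) + c(-x_pad, x_pad),
    ylim = range(y) + c(-y_pad, y_pad)
  )
}

# A 1 degree x 1 degree area with a 10% buffer gets 0.1 degrees of padding per side.
lims <- plot_limits(x = c(10, 11), y = c(40, 41), buffer = 0.1)
```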

Make assign() more CRAN-friendly

Using assign() (to bind variables dynamically as the user inputs them) results in the NOTE "no visible binding for global variable" when running devtools::check(). I need to find a way to do this that is 'CRAN-compliant'.
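
One common way around this kind of NOTE, sketched here under the assumption that the dynamically created variables are only looked up by name afterwards, is to keep them in a dedicated environment (or a list) rather than binding them as free variables. The names `user_values`, `set_value()`, and `get_value()` are hypothetical. Declaring the names with utils::globalVariables() is another option, though it only silences the check.

```r
# Store user-defined values in a dedicated environment instead of calling assign().
user_values <- new.env()

# Environments are modified by reference, so the update is visible to later calls.
set_value <- function(name, value) {
  user_values[[name]] <- value
  invisible(value)
}

# Returns NULL when the name has never been set.
get_value <- function(name) {
  user_values[[name]]
}

set_value("fishing_hours_min", 1)
get_value("fishing_hours_min")
```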

Insert a download button for the queried data

The button should let the user download:

A csv file containing the queried table.
A txt file containing metadata such as the SQL query used for retrieving the data, the date of download, an appropriate reference to the Global Fishing Watch data, and an appropriate reference to the fishRman project.
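
A minimal sketch of how the metadata file might be written. The helper name `write_metadata()` and the layout of the file are assumptions, and the two reference lines are placeholders, since the exact citation text is not given here.

```r
# Write a plain-text metadata file to accompany the downloaded csv.
write_metadata <- function(path, sql, download_date = Sys.Date()) {
  writeLines(
    c(
      paste("SQL query:", sql),
      paste("Date of download:", format(download_date)),
      "Data source: Global Fishing Watch (reference to be supplied)",
      "Retrieved with: fishRman (reference to be supplied)"
    ),
    con = path
  )
  invisible(path)
}

metadata_path <- tempfile(fileext = ".txt")
write_metadata(metadata_path, "SELECT * FROM some_table", as.Date("2021-01-01"))
```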

Cumulative distribution at X%

Being able to know and show which areas of the sea are most affected by fishing (e.g. only the top 90% of fishing activity) would be a huge boost to fishRman's usefulness.

I think the best place in the UI for the input would be on the right side of the plot, where the resolution input is located.
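
One possible reading of "top X%" is the smallest set of cells, ranked by fishing hours, that together account for X% of the total. The sketch below implements that reading in base R; the helper name `top_percent_cells()`, the toy data, and the `fishing_hours` column are assumptions, and it presumes at least one row with positive effort.

```r
# Keep the highest-effort cells that together reach `percent` of total fishing hours.
top_percent_cells <- function(data, percent = 90) {
  ordered <- data[order(data$fishing_hours, decreasing = TRUE), ]
  cum_hours <- cumsum(ordered$fishing_hours)
  # Dividing by the last cumulative value makes the final share exactly 100.
  cum_share <- cum_hours / cum_hours[length(cum_hours)] * 100
  # Include the cell that crosses the threshold.
  n_keep <- which(cum_share >= percent)[1]
  ordered[seq_len(n_keep), ]
}

cells <- data.frame(
  cell_ll_lat = c(40, 41, 42, 43),
  cell_ll_lon = c(10, 11, 12, 13),
  fishing_hours = c(50, 30, 15, 5)
)

top90 <- top_percent_cells(cells, percent = 90)
```

With the toy data, the cumulative shares are 50, 80, 95, and 100 percent, so the first three cells are kept at the 90% threshold.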

Ignore fields where inputs are invalid

When the default input of numericRangeInput() or dateRangeInput() is deleted and the query is then run via the Filter button, the app crashes because it never checks the validity of the input. The inputs seem to accept only numbers anyway, and silly queries like ('vessel_hours' > 1 AND 'vessel_hours' < -1) are still accepted, just with no results (of course). The only accepted values I could find that crash the app are NA and NULL.

I am solving this with a check on if (!is.null(input$field) && !is.na(input$field)) rather than nothing or if (field %in% input$filter_columns_ui).
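
A sketch of such a check as a standalone helper (the name `is_valid_input()` is hypothetical). It tests for NULL first, because is.na(NULL) returns an empty logical vector that cannot be used as a condition, and it uses anyNA() so that range inputs, which hold two values, are handled as a whole.

```r
# TRUE only when the input exists, is not empty, and contains no missing values.
is_valid_input <- function(value) {
  !is.null(value) && length(value) > 0 && !anyNA(value)
}

is_valid_input(c(1, 5))   # complete numeric range
is_valid_input(c(1, NA))  # one bound deleted by the user
is_valid_input(NULL)      # input not rendered yet
```

Filters whose input fails the check would simply be left out of the query.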

Improve commentary to the code

A clearer commentary may help programmers who want to join the project understand how the code works.

Prevent attempt to visualize empty dataframes

Convert button is activated at the end of a query and/or at the upload of a csv file with the right column names, but it never checks whether the dataframe to convert is empty. A gpkg file with no entries could also be uploaded. This leads to an Error when trying to visualize the (missing) data via the Visualize button.

Add, at some point of the workflow, a check for the spatial data to have at least one row.
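
The check itself can be very small. A sketch, assuming the data reaches this point as a data frame (objects read with the sf package also inherit from data.frame); the helper name `has_rows()` is hypothetical.

```r
# TRUE when the object is a data frame with at least one row.
has_rows <- function(data) {
  is.data.frame(data) && nrow(data) > 0
}

empty <- data.frame(cell_ll_lat = numeric(0), cell_ll_lon = numeric(0))
filled <- data.frame(cell_ll_lat = 40, cell_ll_lon = 10)
```

The Visualize button would only be enabled when `has_rows()` returns TRUE.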

Add a loading screen

Now, when the users attempt a query that is not instantaneous, they have to trust R is working, and will likely press buttons or reload multiple times. There is the need for a loading screen that prevents any action from the user until the last action has been completed.

Preferably, the loading screen would show an estimate of the time remaining to complete a task, or a percentage.

Querying more than once requires change of Tab

To query more than once, one must visit the Analysis tab after each click of the Filter button.

This happens, probably, because eventReactive() does not function unless its result is observed or requested by another function. I am on it, and will fix this ASAP.

Add info about the time series used in the plot

Having that information somewhere in the plot reduces ambiguity after the png file is downloaded. I have some options in mind:

A title like "GFW data from [start-date] to [end-date]"
A caption under the plot saying "GFW data from [table name] where [SQL query filters as shown in the Query tab]". This would be the least ambiguous option, but also the least user-friendly, since it would look something like "GFW data from global-fishing-watch.gfw_public_data.fishing_effort_byvessel_v2 where date >= '2012-01-01' AND date < '2012-01-02' AND cell_ll_lat >= 40 AND cell_ll_lat < 46 AND cell_ll_lon >= 10 AND cell_ll_lon < 16"

I will think about it. If anyone reads this issue, and has a better solution, please comment below.