northdecoder / nasamining

This project forked from jonroberts/nasamining

Space Apps NYC project to mine NASA datasets to better find interesting connections between data.

License: MIT License

Python 50.65% HTML 33.56% JavaScript 15.79%
keyword-extraction nasa nlp

nasamining's People

Contributors

jonroberts, l0qii, mattl920, northdecoder

nasamining's Issues

Flask app: client search query returning 404

When I click on the client search button this message is shown in the python debug console:

"GET /getCoOccuringKWsGraph?q=galaxy&threshold=-0.9 HTTP/1.1" 404 -

Does a route need to be provided in the app?

Schema dependent mongoose conflicts with external json prescription

The intention of the mongoose library is to create a schema for interacting with the data. However, the data is already defined externally by some other program and the need is to just interact with what is already there. This is in conflict with the way mongoose works.

Maybe adding the mongodb library will work more like the mongojs that was replaced a few commits ago.

Upgrade extract.py for python3

  • change print statements to print( ) function calls
  • add a requirements.txt file listing the python packages the project requires
  • add installation notes for usage of the requirements.txt; i.e. pip install -r requirements.txt

Add module authenticate_to_mongo

Several scripts authenticate to the same database.
Write one script as a module to do the authentication, then include that module in each script as necessary.
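
A minimal sketch of what such a shared module could look like. The function names `mongo_settings` and `get_database`, the environment variable names `MONGO_HOST` and `MONGO_DB`, and the default values are assumptions made for illustration; they are not taken from the project. The sketch leaves out credentials and the SSL certificate.

```python
"""Hypothetical sketch of a shared authentication module."""
import os


def mongo_settings(env=None):
    """Collect connection settings from environment variables.

    MONGO_HOST and MONGO_DB are assumed names, not the project's own.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("MONGO_HOST", "localhost:27017"),
        "db_name": env.get("MONGO_DB", "nasamining"),
    }


def get_database(env=None):
    """Return a database handle; each script calls this one function."""
    # Imported lazily so mongo_settings() works without pymongo installed.
    from pymongo import MongoClient

    settings = mongo_settings(env)
    client = MongoClient(settings["host"])
    return client[settings["db_name"]]
```

Each script would then replace its own connection code with `import authenticate_to_mongo` followed by a single call to the shared function.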

Make naming and path of rootCA.pem in installation.md match what the code requires

In the installation.md file.

.env file must reference rootCA.pem like this

  • FILENAME_SSLCA="rootCA.pem"

directory ~/nasaMining/mongoWork/secrets should contain files:

  • nada.md rootCA.pem

directory ~/nasaMining/frontEnd/secrets should contain an exact copy of the same files as above:

  • nada.md rootCA.pem

Both .env files must be exact copies of each other.

directory ~/nasaMining/mongoWork should contain the file:

  • .env and many others

directory ~/nasaMining/frontEnd should contain the file:

  • .env and several other files and directories

Change reference from 'ngram_keywords' to 'description_ngram_np'

In file /docs/inspecting-the-json-structure.md change

FROM: ngram_keywords
TO: description_ngram_np

Reasoning
The default usage example in script extract.py is
--field description_ngram_np
which creates a key by that name in the json,
therefore that is what should be inspected.

Server code docs.js failing at find

In the application browser enter a keyword into the search field and then click the search button.
In the developer console see the URI generated

GET http://23.239.4.62:3000/getCoOccuringKWsFlat?q=any_keyword_here

see the error message

HTTP/1.1 500 Internal Server Error 128ms

on the server log see

indexPath:  /home/myusernamehere/nasaMining/frontEnd/public/index.html
TypeError: Cannot read property 'find' of undefined
    at exports.getCoOccuringKWsFlat (/home/myusernamehere/nasaMining/frontEnd/routes/docs.js:261:21)

The server code is looking for something that does not exist around docs.js line 261...

Maybe related to mongoose library issue #9.

Dry out insert agency ngram kwds

There are several files in /mongowork that are identical except for the path to the data to be loaded.
For example:

  • insert_energy_ngram_kwds.py
  • insert_defense_ngram_kwds.py
  • insert_nsf_ngram_kwds.py
  • insert_statedept_ngram_kwds.py
  • insert_commerce_ngram_kwds.py
  • insert_epa_ngram_kwds.py
  • insert_usda_ngram_kwds.py
  • insert_state_ngram_kwds.py

Would it be DRYer to have one script that loads each of the paths to the data from a file?
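
One way the single-script idea could be sketched, assuming each data file holds a JSON list of records and the paths are kept one per line in a plain text file. The names `read_data_paths` and `insert_all` and the `insert_records` callback are hypothetical, standing in for whatever the existing insert scripts do with the database.

```python
import json
from pathlib import Path


def read_data_paths(list_file):
    """Return the non-empty, non-comment lines of list_file as paths."""
    paths = []
    for line in Path(list_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def insert_all(list_file, insert_records):
    """Load every JSON file named in list_file and hand it to insert_records.

    insert_records is a placeholder for the database write that the
    per-agency scripts currently perform.
    """
    counts = {}
    for path in read_data_paths(list_file):
        with open(path) as handle:
            records = json.load(handle)
        insert_records(records)
        counts[path] = len(records)
    return counts
```

Adding a new agency would then mean adding one line to the list file rather than copying a script.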

Fix data path help class in extract.py

FROM

          python3 extract.py --input data/nasa.json \\
                             --source data.nasa.gov/data.json \\
                             --output data/nasa_keywords.json \\
                             --field ngram_keywords \\
                             --passes 5 \\
                             --threshold 10 \\
TO

          python3 extract.py --input ../data/nasa.json \\
                             --source data.nasa.gov/data.json \\
                             --output ../data/nasa_keywords.json \\
                             --field ngram_keywords \\
                             --passes 5 \\
                             --threshold 10 \\

where the ../ is added to the data path


To match the field in the extract.py example,
in docs/inspecting-the-json-structure.md change

FROM
new_keywords

  • TO
    ngram_keywords

  • In extract.py, following the existing message " Tokenizing descriptions", add warning
logging.warning( "Depending upon processing power, tokenizing may take up to nine minutes to complete!")

  • In extract.py change the default --field to ngram_keywords

Add helpful comments to buildDB.py

During the build the user needs some feedback in the write loop that the code is still working.

  • Add a dot for every thousand records loaded and report the total at the end.
  • Comment on the path of the .env file used by load_dotenv, because this is a different path-to-file than the one used by the javascript; i.e. next to the .env comment add 'see the help document at ../docs/installation.md#add-environment-variables-file-1'
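
The progress feedback described in the first item could be sketched as follows. The function name `load_with_progress` and the `write_record` callback are hypothetical; the callback stands in for the actual database write in buildDB.py.

```python
import sys


def load_with_progress(records, write_record, every=1000, out=sys.stdout):
    """Write each record, printing a dot per `every` records and a final total."""
    total = 0
    for record in records:
        write_record(record)
        total += 1
        if total % every == 0:
            out.write(".")
            out.flush()  # show the dot immediately rather than at exit
    out.write("\nLoaded {} records\n".format(total))
    return total
```

Flushing after each dot matters because stdout is often line-buffered, so without it the dots might only appear once the loop has finished.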

Change method name from 'db_jsonfromnasa' to 'db_json_from_agency'

The data could be from any agency not necessarily just NASA.

In file ~/mongoWork/authenticate_to_mongo.py
Change the method name
FROM
def db_jsonfromnasa
TO
def db_json_from_agency

Any files that call db_jsonfromnasa need to be updated as well.

egrep -lir --include=*.py "(db_jsonfromnasa)" .
# it looks like just this one file
./mongoWork/buildDB.py

This issue needs to be done before issue #33.

Refactor buildDB.py

  • add authenticate_to_mongo module to buildDB.py
  • add test to handle the possibility of overwriting an existing collection
  • add logger
  • add helper class

Missing favicon.ico

Start the server in directory nasaMining/frontEnd
with node --trace-warnings server.js
Then open the app in a browser with the supplied URL.

See this message in the server log.
Error: ENOENT: no such file or directory, stat '/home/northdecoder/nasaMining/frontEnd/public/favicon.ico'

The favicon is indeed missing. It needs to be created and uploaded.
Consider putting it in the directory public/img

Resource wordnet not found

Ran

python3 keyword_similarity.py

LookupError:


Resource wordnet not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('wordnet')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/wordnet

Searched in:
- '/home/northdecoder/nltk_data'
- '/usr/nltk_data'
- '/usr/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'


Remove the bak directory

It looks like the bak directory has remnants of a first proposal that may no longer be needed. Previous tagged versions in this project have this directory if we need to recover it.

In pair_freq.py change the order of looking for dataset

Since the data is currently in json list format, look for that first, so that the TypeError message is eliminated.

python3 pair_freq.py \
            --input ../data/nasa_keywords.json \
            --field description_ngram_np \
            --output ../data/nasa_np_strengths.json
INFO:root:TypeError:
INFO:root:list indices must be integers or slices, not str
INFO:root:Hint: keyword name `dataset` is not available in the input,
INFO:root:attempting to access the list directly.
INFO:root:Saving 258055 records into a json array to file ../data/nasa_np_strengths.json
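
A minimal sketch of the reordered lookup, assuming the parsed input is either a list of records or a dictionary holding them under the `dataset` key, as the log hint above suggests. The function name `get_records` is hypothetical.

```python
def get_records(data):
    """Return the dataset records from parsed JSON.

    The list form is checked first, so the common case no longer
    triggers a TypeError before falling back.
    """
    if isinstance(data, list):
        return data
    return data["dataset"]
```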

Remove previous mongoDB authentication from all *.py

✔️ Issue #34 must be done first!

Find all *.py that have the previous mongoDB authentication
and then perform the following:

ADD

import authenticate_to_mongo #local module

DELETE

client = MongoClient('proximus.modulusmongo.net:27017')
client.tepO9seb.authenticate('nasahack', 'hacking4nasa')

CHANGE

  • FROM:
    db = client.tepO9seb
  • TO
    db = authenticate_to_mongo.db_json_from_agency()

nodejs: server.js needs upgrading to current syntax

  • in file /nasaMining/frontEnd/server.js

    • nodejs app: express deprecated res.sendfile: Use res.sendFile instead
  • in file /nasaMining/routes/docs.js

    • assign database credentials to current database
    • refactor mongojs login
  • in file .gitignore

    • add the .env file
  • in file /nasaMining/frontEnd/public/index.html

    • change http to https

Reorder installation instruction, promote buildDB.py

Before the webpage server can be started with the instruction node server.js, the mongoDB database must be populated with the preprocessed data.

In file docs/installation.md move the instruction to run buildDB.py before the node instruction.

Pickle file artifacts under version control

Is it mandatory that the pickle files be under git version control?

  • Why are the pickle files created?
  • Which code creates pickle files?
  • Which code uses the pickle files?
    # to start find all python files with pkl in the text
    find ~/nasaMining -name "*.py" -print0 | xargs -0 grep "pkl" | less

Add arguments parsing in buildDB.py

To make the script more flexible, add arguments for:

  • input file path
  • augmented_help
  • environment, ie production or development
  • forcedrop; yes OR no
  • collection_name to build or drop

change:

  • buildDB_help to buildDB_augmented_help which is similar to help in extract.py

  • detect whether input is dictionary or a list and process accordingly
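
The argument list above could be sketched with argparse roughly as follows. The flag spellings, defaults, and help texts are assumptions derived from the bullet names; they are not the project's actual interface.

```python
import argparse


def build_parser():
    """Build a parser for the arguments proposed above (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Load preprocessed JSON into a MongoDB collection."
    )
    parser.add_argument("--input", help="path to the input JSON file")
    parser.add_argument(
        "--augmented_help",
        action="store_true",
        help="print the longer usage notes and exit",
    )
    parser.add_argument(
        "--environment",
        choices=["production", "development"],
        default="development",
    )
    parser.add_argument(
        "--forcedrop",
        choices=["yes", "no"],
        default="no",
        help="drop an existing collection before loading",
    )
    parser.add_argument("--collection_name", help="collection to build or drop")
    return parser
```

Defaulting `--forcedrop` to `no` keeps the destructive behaviour opt-in, which fits the concern about overwriting an existing collection.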

nodejs app: search query returning 404

in the client when I press the search button

I see that the nodejs server logs error message:
Error: ENOENT: no such file or directory, stat '/home/runner/FlusteredCalculatingInstructions/nasaMining/frontEnd/public/getCoOccuringKWsGraph'

the route getCoOccuringKWsGraph does not exist on the server!

In file nasaMining/frontEnd/public/index.html

Change FROM:


        $.ajax({
            url: 'getCoOccuringKWsGraph',

Change TO:

        $.ajax({
            url: 'getCoOccuringKWs',

nodejs app: connection to database is failing

MongoServerSelectionError: connection to nnn.nnn.nnn.nnn:12345 closed at Timeout._onTimeout

Need to add the ssl certificate to the call.

While researching the syntax, I found that mongojs is unmaintained, so the code probably needs to be converted to depend on mongoose.
