northdecoder / nasamining Goto Github PK


This project forked from jonroberts/nasamining


Space Apps NYC project to mine NASA datasets to better find interesting connections between data.

License: MIT License

Python 50.65% HTML 33.56% JavaScript 15.79%

keyword-extraction nasa nlp

nasamining's Issues

in extract.py change input_json to path_to_input_json

Refactor to make the code more readable.

Flask app: client search query returning 404

When I click on the client search button this message is shown in the python debug console:

"GET /getCoOccuringKWsGraph?q=galaxy&threshold=-0.9 HTTP/1.1" 404 -

Does a route need to be provided in the app?

Fix spelling error in inspecting-the-json-structure.md

Change

FROM:
Inspecing the structure of nasa.json
TO:
Inspecting the structure of nasa.json

Change 'master' branch name to 'stable'

To be in line with current trends.

Change from ngram_keywords to description_ngram_np

I get the name now.
description_ngram_np describes exactly where the ngrams came from, and that the ngrams are noun phrases!
Revert back to the original name.

Schema dependent mongoose conflicts with external json prescription

The intention of the mongoose library is to create a schema for interacting with the data. However, the data is already defined externally by some other program and the need is to just interact with what is already there. This is in conflict with the way mongoose works.

Maybe adding the mongodb library will work more like the mongojs that was replaced a few commits ago.

Upgrade extract.py for python3

change print to include ( )
add a requirements.txt file for required python packages for the project
add installation notes for usage of the requirements.txt; ie pip install -r requirements.txt

Create installation instructions

how to:

clone the project
set up environment variables and secrets
open firewall ports and services

Add module authenticate_to_mongo

Several scripts authenticate to the same database.
Write one script as a module to do the authentication, then include that module in each script as necessary.
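A minimal sketch of what such a shared module could look like, assuming the connection settings are already available as environment variables (for example after load_dotenv has read the .env file). The names `mongo_settings`, `get_database`, `MONGO_URI` and `MONGO_DB` are hypothetical; only `FILENAME_SSLCA` and the `secrets` directory come from the installation notes. The `tls` and `tlsCAFile` keyword arguments are PyMongo options.

```python
import os
from pathlib import Path


def mongo_settings(env=os.environ, secrets_dir="secrets"):
    """Collect connection settings from environment variables.

    MONGO_URI and MONGO_DB are hypothetical variable names.
    FILENAME_SSLCA is the variable named in the installation notes.
    """
    ca_name = env.get("FILENAME_SSLCA", "rootCA.pem")
    return {
        "uri": env["MONGO_URI"],
        "db_name": env["MONGO_DB"],
        "ca_file": str(Path(secrets_dir) / ca_name),
    }


def get_database(env=os.environ, secrets_dir="secrets"):
    """Return a database handle; each script imports and calls this once."""
    # Imported lazily so that mongo_settings() works without pymongo installed.
    from pymongo import MongoClient

    settings = mongo_settings(env, secrets_dir)
    client = MongoClient(settings["uri"], tls=True, tlsCAFile=settings["ca_file"])
    return client[settings["db_name"]]
```

Each script would then replace its own login code with a single import and one call to `get_database()`.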

Make naming and path of rootCA.pem in installation.md match what the code requires

In the installation.md file.

.env file must reference rootCA.pem like this

FILENAME_SSLCA="rootCA.pem"

directory ~/nasaMining/mongoWork/secrets should contain files:

nada.md rootCA.pem

directory ~/nasaMining/frontEnd/secrets should contain an exact copy of the same files as above:

nada.md rootCA.pem

Both .env files must be exact copies of each other.

directory ~/nasaMining/mongoWork should contain the file:

.env and many others

directory ~/nasaMining/frontEnd should contain the file:

.env and several other files and directories

Upgrade mongodb package from =3.6.10 to ~4.3.x

TODO

uri-encode connection string with package mongodb-connection-string-url, which depends on whatwg-url ?

Add helper class to keyword.py

Update pythonpath to find the authenticate module

This file
frontEnd/flask/spacetag/controllers/controller1.py
needs to find the imported authenticate module via the
PYTHONPATH. (I guess?)

Figure out where to update the path accordingly.

Change reference from 'ngram_keywords' to 'description_ngram_np'

In file /docs/inspecting-the-json-structure.md change

FROM: ngram_keywords
TO: description_ngram_np

Reasoning
The default usage example in script extract.py is
--field description_ngram_np
which creates a key by that name in the json,
therefore that is what should be inspected.

Server code docs.js failing at find

In the application browser enter a keyword into the search field and then click the search button.
In the developer console see the URI generated

GET http://23.239.4.62:3000/getCoOccuringKWsFlat?q=any_keyword_here

see the error message

HTTP/1.1 500 Internal Server Error 128ms

on the server log see

indexPath:  /home/myusernamehere/nasaMining/frontEnd/public/index.html
TypeError: Cannot read property 'find' of undefined
    at exports.getCoOccuringKWsFlat (/home/myusernamehere/nasaMining/frontEnd/routes/docs.js:261:21)

The server code is looking for something that does not exist around docs.js line 261...

Maybe related to mongoose library issue #9.

Create presentations directory

add presentations/2015 folder, then move *.pptx and *.key files into that folder.

Dry out insert agency ngram kwds

There are several files in /mongowork that are identical except for the path to the data to be loaded
For example:

insert_energy_ngram_kwds.py
insert_defense_ngram_kwds.py
insert_nsf_ngram_kwds.py
insert_statedept_ngram_kwds.py
insert_commerce_ngram_kwds.py
insert_epa_ngram_kwds.py
insert_usda_ngram_kwds.py
insert_state_ngram_kwds.py

Would it be DRYer to have one script that loads each of the paths to the data from a file?
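One possible shape for the single script, as a sketch only. It assumes a hypothetical JSON config file that maps each agency name to the path of its keyword file, and that each keyword file holds a JSON list. `insert_one` is a stand-in for whatever the existing scripts do with the loaded records.

```python
import json


def load_agency_paths(config_file):
    """Read a JSON object mapping agency name -> path of its keyword file."""
    with open(config_file, encoding="utf-8") as handle:
        return json.load(handle)


def insert_all(config_file, insert_one):
    """Call insert_one(agency, records) once per agency in the config file.

    Returns a dict of agency name -> number of records handed to insert_one.
    """
    counts = {}
    for agency, path in load_agency_paths(config_file).items():
        with open(path, encoding="utf-8") as handle:
            records = json.load(handle)
        insert_one(agency, records)
        counts[agency] = len(records)
    return counts
```

With this layout, adding a new agency means adding one line to the config file instead of copying a script.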

Add tests for buildDB.py

Seems like a good place to start adding tests.

augmented help text a04fd84
type of input json loaded which is a list a04fd84
type of input json loaded which is a dictionary 2f67d21
successful write to the database a04fd84

Add absolute path in BuildDB.py to access data files

For portability and testing the data files should be referenced
by the absolute full path, not the shortened relative path.
Addressed by commit 31dd4a0
This issue also affects the authenticate_to_mongo module.
Addressed by commit ff7ab8e
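For illustration, one way to derive an absolute path from the location of the script itself rather than from the current working directory. This is a sketch, not the code in the commits above; `data_file` is a hypothetical helper, and it assumes the data directory sits one level above the script's directory, as the `../data/nasa.json` examples suggest.

```python
from pathlib import Path


def data_file(name, script_path):
    """Return the absolute path of ../data/<name> relative to the script's directory."""
    script_dir = Path(script_path).resolve().parent
    return script_dir.parent / "data" / name
```

Inside a script this would be called as `data_file("nasa.json", __file__)`, which gives the same result no matter which directory the script is started from.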

Do not reference venv in files

Some files have a line near the top like:

#!./venv/bin/python

It is probably not necessary to reference venv

Fix data path help class in extract.py

FROM

          python3 extract.py --input data/nasa.json \\
                             --source data.nasa.gov/data.json \\
                             --output data/nasa_keywords.json \\
                             --field ngram_keywords \\
                             --passes 5 \\
                             --threshold 10 \\

TO

          python3 extract.py --input ../data/nasa.json \\
                             --source data.nasa.gov/data.json \\
                             --output ../data/nasa_keywords.json \\
                             --field ngram_keywords \\
                             --passes 5 \\
                             --threshold 10 \\

where the ../ is added to the data path

To match the field in the extract.py example,
in docs/inspecting-the-json-structure.md change

FROM
new_keywords

TO
ngram_keywords

In extract.py, following the existing message " Tokenizing descriptions", add warning

logging.warning( "Depending upon processing power, tokenizing may take up to nine minutes to complete!")

In extract.py change the default --field to ngram_keywords

Add helpful comments to buildDB.py

During the build the user needs some feedback in the write loop that the code is still working.

Add a dot for every thousand records loaded and report the total at the end.
Comment on the path of the .env file used by load_dotenv, because this is a different path-to-file than is used by the javascript. ie near .env comment add 'see the help document at ../docs/installation.md#add-environment-variables-file-1'
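The progress feedback could be sketched roughly as follows. `write_with_progress` and `write_one` are hypothetical names; `write_one` stands in for the database write inside the existing loop.

```python
import sys


def write_with_progress(records, write_one, every=1000, stream=sys.stdout):
    """Write each record, print a dot per `every` records, then report the total."""
    total = 0
    for record in records:
        write_one(record)
        total += 1
        if total % every == 0:
            stream.write(".")
            stream.flush()  # show the dot immediately instead of at exit
    stream.write(f"\n{total} records loaded\n")
    return total
```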

Change method name from 'db_jsonfromnasa' to 'db_json_from_agency'

The data could be from any agency not necessarily just NASA.

In file ~/mongoWork/authenticate_to_mongo.py
Change method name from
FROM
def db_jsonfromnasa
TO
def db_json_from_agency

Any files that call db_jsonfromnasa need to be updated as well.

egrep -lir --include=*.py "(db_jsonfromnasa)" .
# it looks like just this one file
./mongoWork/buildDB.py

This issue needs to be done before issue #33.

Refactor buildDB.py

add authenticate_to_mongo module to buildDB.py
add test to handle the possibility of overwriting an existing collection
add logger
add helper class

Missing favicon.ico

Start the server in directory nasaMining/frontEnd
with node --trace-warnings server.js
Then open the app in a browser with the supplied URL.

See this message in the server log.
Error: ENOENT: no such file or directory, stat '/home/northdecoder/nasaMining/frontEnd/public/favicon.ico'

The favicon is indeed missing. It needs to be created and uploaded.
Consider putting it in the directory public/img

Resource wordnet not found

Ran

python3 keyword_similarity.py

LookupError:

Resource wordnet not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('wordnet')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/wordnet

Searched in:
- '/home/northdecoder/nltk_data'
- '/usr/nltk_data'
- '/usr/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'

Remove the bak directory

It looks like the bak directory has remnants of a first proposal that may no longer be needed. Previous tagged versions in this project have this directory if we need to recover it.

In pair_freq.py change the order of looking for dataset

Since the data is currently in JSON list format, look for that first; the TypeError message will then be eliminated.

python3 pair_freq.py \
            --input ../data/nasa_keywords.json \
            --field description_ngram_np \
            --output ../data/nasa_np_strengths.json
INFO:root:TypeError:
INFO:root:list indices must be integers or slices, not str
INFO:root:Hint: keyword name `dataset` is not available in the input,
INFO:root:attempting to access the list directly.
INFO:root:Saving 258055 records into a json array to file ../data/nasa_np_strengths.json
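A minimal sketch of checking for the list format first, so that no TypeError has to be raised and logged on the common path. `get_datasets` is a hypothetical helper name; the `dataset` key is the one named in the log hint above.

```python
def get_datasets(data):
    """Return the list of dataset records from either supported input shape."""
    if isinstance(data, list):
        # Current format: the file is already a JSON array of records.
        return data
    if isinstance(data, dict) and "dataset" in data:
        # Older format: the records sit under the `dataset` key.
        return data["dataset"]
    raise ValueError("input is neither a list nor a dict with a 'dataset' key")
```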

Add UML directory in docs

Need some diagrams to clarify workflow.

Remove previous mongoDB authentication from all *.py

✔️ Issue #34 must be done first!

Find all *.py that have the previous mongoDB authentication
and then perform the following:

ADD

import authenticate_to_mongo #local module

DELETE

client = MongoClient('proximus.modulusmongo.net:27017')
client.tepO9seb.authenticate('nasahack', 'hacking4nasa')

CHANGE

FROM:
```
db = client.tepO9seb
```

TO:

db = authenticate_to_mongo.db_json_from_agency()

nodejs: server.js needs upgrading to current syntax

in file /nasaMining/frontEnd/server.js
- nodejs app: express deprecated res.sendfile: Use res.sendFile instead
in file /nasaMining/routes/docs.js
- assign database credentials to current database
- refactor mongojs login
in file .gitignore
- add the .env file
in file /nasaMining/frontEnd/public/index.html
- change http to https

add argument parsing to nasa_kw_pair_freq.py

parse arguments for:

infile
outfile
kw_field

add:

helper class
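A sketch of the argument parsing with the standard argparse module. The flag names `--input`, `--output` and `--field` mirror the pair_freq.py usage shown elsewhere in these issues and are an assumption for this script; the default field value is likewise an assumption.

```python
import argparse


def build_parser():
    """Build the command line parser for the keyword pair frequency script."""
    parser = argparse.ArgumentParser(
        description="Count how often pairs of keywords occur together."
    )
    parser.add_argument("--input", dest="infile", required=True,
                        help="path to the input json file")
    parser.add_argument("--output", dest="outfile", required=True,
                        help="path of the json file to write")
    parser.add_argument("--field", dest="kw_field", default="description_ngram_np",
                        help="name of the keyword field to read")
    return parser
```

The script would call `build_parser().parse_args()` and then read `args.infile`, `args.outfile` and `args.kw_field`.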

How to acquire the data?

In frontEnd/routes/docs.js I found the link
https://data.nasa.gov/data.json
which crashes my browser when I try it there, because it does not return in a reasonable time.

Maybe this link will provide similar information
https://data.nasa.gov/Software/NASA-open-source-code-projects-with-A-I-generated-/3efg-u4v8

actual data:
https://data.nas.nasa.gov/openinnovation/download_data.php?file=/openinnovationdata/catalog.json

Reorder installation instruction, promote buildDB.py

Before the web server can be started with node server.js, the MongoDB database must be populated with the preprocessed data.

In file docs/installation.md move the instruction to run buildDB.py before the node instruction.

Pickle file artifacts under version control

Is it mandatory that the pickle files be under git version control?

Why are the pickle files created?
Which code creates pickle files?

Which code uses the pickle files?

# to start find all python files with pkl in the text
find ~/nasaMining -name "*.py" -print0 | xargs -0 grep "pkl" | less
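The same search can be sketched in Python, which also reports the line number of each match. `files_mentioning` is a hypothetical helper; it does a plain substring match, like the grep call above.

```python
from pathlib import Path


def files_mentioning(root, needle="pkl", pattern="*.py"):
    """Return (path, line number, line) for every line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```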

Add arguments parsing in buildDB.py

To make script more flexible.
add arguments for:

change:

buildDB_help to buildDB_augmented_help which is similar to help in extract.py
detect whether input is dictionary or a list and process accordingly

Add command line argument processing for starting the server

The logger and other runtime parameters will be controlled by the command line arguments something like:

node server.js --loglevel=info --debuglevel=1 --runlevel=[production | development]
# OR
node server.js --help

Reference:

consider docopt:

nodejs app: search query returning 404

in the client when I press the search button

I see that the nodejs server logs error message:
Error: ENOENT: no such file or directory, stat '/home/runner/FlusteredCalculatingInstructions/nasaMining/frontEnd/public/getCoOccuringKWsGraph'

the route getCoOccuringKWsGraph does not exist on the server!

In file nasaMining/frontEnd/public/index.html

Change FROM:


        $.ajax({
            url: 'getCoOccuringKWsGraph',

Change TO:

        $.ajax({
            url: 'getCoOccuringKWs',

nodejs app: connection to database is failing

MongoServerSelectionError: connection to nnn.nnn.nnn.nnn:12345 closed at Timeout._onTimeout

Need to add the ssl certificate to the call.

While researching the syntax, I found that mongojs is unmaintained; the code probably needs to be converted to depend on mongoose.
