northdecoder / nasamining
This project forked from jonroberts/nasamining
Space Apps NYC project to mine NASA datasets to better find interesting connections between data.
License: MIT License
Refactor to make the code more readable.
When I click on the client search button this message is shown in the python debug console:
GET /getCoOccuringKWsGraph?q=galaxy&threshold=-0.9 HTTP/1.1" 404 -
Does a route need to be provided in the app?
Change
FROM:
Inspecing the structure of nasa.json
TO:
Inspecting the structure of nasa.json
To be in line with current trends.
I get the name now.
description_ngram_np
describes exactly where the ngrams came from, and that the ngrams are noun phrases!
Revert back to the original name.
The intention of the mongoose
library is to create a schema for interacting with the data. However, the data is already defined externally by some other program and the need is to just interact with what is already there. This is in conflict with the way mongoose works.
Maybe adding the mongodb
library will work more like the mongojs
that was replaced a few commits ago.
pip install -r requirements.txt
how to:
Several scripts authenticate to the same database.
Write one script as a module to do the authentication, then include that module in each script as necessary.
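A minimal sketch of what such a shared module might look like, assuming PyMongo is the driver. The names mongo_client_kwargs, connect, MONGO_URI, and the default secrets directory are hypothetical; FILENAME_SSLCA is the variable used in the .env file.

```python
"""Shared MongoDB authentication helper (illustrative sketch only)."""
import os


def mongo_client_kwargs(env=None, secrets_dir="secrets"):
    """Build keyword arguments for pymongo.MongoClient from the environment."""
    env = os.environ if env is None else env
    # MONGO_URI is a hypothetical variable name, not taken from the project.
    kwargs = {"host": env.get("MONGO_URI", "mongodb://localhost:27017")}
    ca_file = env.get("FILENAME_SSLCA")
    if ca_file:
        # tls and tlsCAFile are standard PyMongo connection options.
        kwargs["tls"] = True
        kwargs["tlsCAFile"] = os.path.join(secrets_dir, ca_file)
    return kwargs


def connect(env=None):
    """Return a MongoClient built from the settings above.

    pymongo is imported lazily so the module can be imported (and the
    settings inspected) on a machine without the driver installed.
    """
    from pymongo import MongoClient
    return MongoClient(**mongo_client_kwargs(env))
```

Each script could then import this module and call connect() instead of repeating the connection details.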
In the installation.md
file.
.env
file must reference rootCA.pem
like this
FILENAME_SSLCA="rootCA.pem"
directory ~/nasaMining/mongoWork/secrets
should contain files:
nada.md
rootCA.pem
directory ~/nasaMining/frontEnd/secrets
should contain an exact copy of the same files as above:
nada.md
rootCA.pem
Both .env files must be exact copies of each other.
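The requirement that both .env files be exact copies could be checked with a small script. This is only a sketch: the function name env_files_match is hypothetical, and it assumes the mongoWork and frontEnd directories sit directly under the repository root.

```python
import filecmp
from pathlib import Path


def env_files_match(repo_root):
    """Return True when both .env files exist and are byte-for-byte equal."""
    root = Path(repo_root)
    first = root / "mongoWork" / ".env"
    second = root / "frontEnd" / ".env"
    if not (first.is_file() and second.is_file()):
        return False
    # shallow=False forces a content comparison rather than a stat() check.
    return filecmp.cmp(first, second, shallow=False)
```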
directory ~/nasaMining/mongoWork
should contain the file:
.env
and many others
directory ~/nasaMining/frontEnd
should contain the file:
.env
and several other files and directories
TODO
mongodb-connection-string-url, which depends on whatwg-url?
This file
frontEnd/flask/spacetag/controllers/controller1.py
needs to find the imported authenticate module via the
PYTHONPATH. (I guess?)
Figure out where to update the path accordingly.
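One possible way to make the module importable without setting PYTHONPATH is to extend sys.path before the import. This is a sketch under assumptions: the helper name add_module_dir is hypothetical, and the location of authenticate_to_mongo.py in the mongoWork directory is assumed.

```python
import sys
from pathlib import Path


def add_module_dir(directory):
    """Prepend a directory to sys.path so its modules become importable."""
    path = str(Path(directory).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Hypothetical usage near the top of controller1.py, assuming the module
# lives in ~/nasaMining/mongoWork:
#   add_module_dir(Path.home() / "nasaMining" / "mongoWork")
#   import authenticate_to_mongo
```

Setting PYTHONPATH in the shell that starts the Flask app would have the same effect without touching the code.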
In file /docs/inspecting-the-json-structure.md
change
FROM: ngram_keywords
TO: description_ngram_np
Reasoning
The default usage example in script extract.py
is
--field description_ngram_np
which creates a key by that name in the json,
therefore that is what should be inspected.
In the application browser enter a keyword into the search field and then click the search button.
In the developer console see the URI generated
GET http://23.239.4.62:3000/getCoOccuringKWsFlat?q=any_keyword_here
see the error message
HTTP/1.1 500 Internal Server Error 128ms
on the server log see
indexPath: /home/myusernamehere/nasaMining/frontEnd/public/index.html
TypeError: Cannot read property 'find' of undefined
at exports.getCoOccuringKWsFlat (/home/myusernamehere/nasaMining/frontEnd/routes/docs.js:261:21)
The server code is looking for something that does not exist around docs.js
line 261...
Maybe related to mongoose library issue #9.
add presentations/2015 folder, then move *.pptx and *.key files into that folder.
There are several files in /mongowork
that are identical except for the path to the data to be loaded
For example:
Would it be DRYer to have one script that loads each of the paths to the data from a file?
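A rough sketch of the single-script idea, assuming the data paths are kept one per line in a plain text file. The function names and the list-file format are hypothetical, not part of the project.

```python
import json
from pathlib import Path


def read_data_paths(list_file):
    """Return the non-empty, non-comment lines of a path list file."""
    lines = Path(list_file).read_text().splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]


def load_all(list_file, loader):
    """Call loader(path, data) once for every JSON file named in list_file."""
    for path in read_data_paths(list_file):
        with open(path, encoding="utf-8") as handle:
            loader(path, json.load(handle))
```

The loader callback is where the existing per-file database insert logic would go, so the near-identical scripts collapse into one script plus a list of paths.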
Some files have a line near the top like:
#!./venv/bin/python
It is probably not necessary to reference venv
FROM
python3 extract.py --input data/nasa.json \
--source data.nasa.gov/data.json \
--output data/nasa_keywords.json \
--field ngram_keywords \
--passes 5 \
--threshold 10
TO
python3 extract.py --input ../data/nasa.json \
--source data.nasa.gov/data.json \
--output ../data/nasa_keywords.json \
--field ngram_keywords \
--passes 5 \
--threshold 10
where the ../
is added to the data path
To match the field in the extract.py example,
in docs/inspecting-the-json-structure.md
change
FROM
new_keywords
ngram_keywords
logging.warning( "Depending upon processing power, tokenizing may take up to nine minutes to complete!")
ngram_keywords
During the build the user needs some feedback in the write loop that the code is still working.
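One way to give that feedback is to log a progress line every so many records. The sketch below is illustrative only: write_with_progress and its parameters are hypothetical names, and it assumes logging has been configured to show INFO messages (for example with logging.basicConfig(level=logging.INFO)).

```python
import logging


def write_with_progress(records, write_one, every=1000):
    """Write each record, logging a progress line every `every` records."""
    count = 0
    for count, record in enumerate(records, start=1):
        write_one(record)
        if count % every == 0:
            logging.info("wrote %d records so far", count)
    logging.info("finished: wrote %d records", count)
    return count
```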
The data could be from any agency not necessarily just NASA.
In file ~/mongoWork/authenticate_to_mongo.py
Change method name
FROM
def db_jsonfromnasa
TO
def db_json_from_agency
Any files that call db_jsonfromnasa
need to be updated as well.
egrep -lir --include=*.py "(db_jsonfromnasa)" .
# it looks like just this one file
./mongoWork/buildDB.py
This issue needs to be done before issue #33.
authenticate_to_mongo
module to buildDB.py
Start the server in directory nasaMining/frontEnd
with node --trace-warnings server.js
Then open the app in a browser with the supplied URL.
See this message in the server log.
Error: ENOENT: no such file or directory, stat '/home/northdecoder/nasaMining/frontEnd/public/favicon.ico'
The favicon is indeed missing. It needs to be created and uploaded.
Consider putting it in the directory public/img
Ran
python3 keyword_similarity.py
LookupError:
Resource wordnet not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('wordnet')
For more information see: https://www.nltk.org/data.html
Attempted to load corpora/wordnet
Searched in:
- '/home/northdecoder/nltk_data'
- '/usr/nltk_data'
- '/usr/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
It looks like the bak
directory has remnants of a first proposal that may no longer be needed. Previous tagged versions in this project have this directory if we need to recover it.
Since the data is currently in JSON list format, look for that first; the TypeError message will then be eliminated.
python3 pair_freq.py \
--input ../data/nasa_keywords.json \
--field description_ngram_np \
--output ../data/nasa_np_strengths.json
INFO:root:TypeError:
INFO:root:list indices must be integers or slices, not str
INFO:root:Hint: keyword name `dataset` is not available in the input,
INFO:root:attempting to access the list directly.
INFO:root:Saving 258055 records into a json array to file ../data/nasa_np_strengths.json
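A small sketch of checking for the list layout first, rather than catching the TypeError. The function name extract_records is hypothetical; the dataset key is the one named in the log hint above.

```python
def extract_records(data, key="dataset"):
    """Return the list of records from either supported input layout.

    A top-level JSON array is returned as-is; a JSON object is expected
    to hold the records under `key`.
    """
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and key in data:
        return data[key]
    raise ValueError(
        "input must be a JSON array or an object with a %r key" % key
    )
```

Testing the type up front avoids indexing a list with a string, so the TypeError never occurs.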
Need some diagrams to clarify workflow.
Issue #34 must be done first!
Find all *.py
that have the previous mongoDB authentication
and then perform the following:
ADD
import authenticate_to_mongo #local module
DELETE
client = MongoClient('proximus.modulusmongo.net:27017')
client.tepO9seb.authenticate('nasahack', 'hacking4nasa')
CHANGE
FROM
db = client.tepO9seb
TO
db = authenticate_to_mongo.db_json_from_agency()
in file /nasaMining/frontEnd/server.js
in file /nasaMining/routes/docs.js
in file .gitignore
in file /nasaMining/frontEnd/public/index.html
parse arguments for:
add:
In frontEnd/routes/docs.js I found the link
https://data.nasa.gov/data.json
which crashes my browser when I try it there, because it does not return in a reasonable time.
Maybe this link will provide similar information
https://data.nasa.gov/Software/NASA-open-source-code-projects-with-A-I-generated-/3efg-u4v8
actual data:
https://data.nas.nasa.gov/openinnovation/download_data.php?file=/openinnovationdata/catalog.json
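Because a large catalog can stall a browser, it may be easier to stream it straight to disk. This is a sketch using only the standard library; the function name download_to_file and the timeout value are assumptions, and nothing here has been verified against the URLs above.

```python
import shutil
import urllib.request


def download_to_file(url, destination, timeout=60):
    """Stream url to destination in chunks instead of loading it in memory."""
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

The saved file can then be inspected offline, for example with json.load.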
In order to run the webpage server with the instruction node server.js,
the mongoDB database must first be populated with the preprocessed data.
In file docs/installation.md move the instruction to run buildDB.py
before the node instruction.
Is it mandatory that the pickle files be under git version control?
# to start find all python files with pkl in the text
find ~/nasaMining -name "*.py" -print0 | xargs -0 grep "pkl" | less
To make script more flexible.
add arguments for:
change:
buildDB_help to buildDB_augmented_help which is similar to help in extract.py
detect whether input is dictionary or a list and process accordingly
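A possible starting point for the argument handling, using argparse from the standard library. The specific flags here (--input, --loglevel) are hypothetical placeholders, since the list of arguments to add is not given above.

```python
import argparse


def build_parser():
    """Create a command line parser for buildDB.py (illustrative flags only)."""
    parser = argparse.ArgumentParser(
        prog="buildDB.py",
        description="Load preprocessed keyword data into MongoDB.",
    )
    # Both options below are assumptions made for this sketch.
    parser.add_argument("--input", required=True,
                        help="path to the JSON file to load")
    parser.add_argument("--loglevel", default="info",
                        choices=["debug", "info", "warning", "error"],
                        help="logging verbosity (default: info)")
    return parser
```

Running the script with --help would then print the generated usage text, similar in spirit to the help in extract.py.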
The logger and other runtime parameters will be controlled by the command line arguments something like:
node server.js --loglevel=info --debuglevel=1 --runlevel=[production | development]
# OR
node server.js --help
consider docopt:
in the client when I press the search button
I see that the nodejs server logs error message:
Error: ENOENT: no such file or directory, stat '/home/runner/FlusteredCalculatingInstructions/nasaMining/frontEnd/public/getCoOccuringKWsGraph'
the route getCoOccuringKWsGraph does not exist on the server!
In file nasaMining/frontEnd/public/index.html
Change FROM:
$.ajax({
url: 'getCoOccuringKWsGraph',
Change TO:
$.ajax({
url: 'getCoOccuringKWs',
MongoServerSelectionError: connection to nnn.nnn.nnn.nnn:12345 closed at Timeout._onTimeout
Need to add the ssl certificate to the call.
In researching the syntax found that mongojs
is unmaintained, probably need to convert code to depend on mongoose