oturns / geosnap
The Geospatial Neighborhood Analysis Package
Home Page: https://oturns.github.io/geosnap-guide
License: BSD 3-Clause "New" or "Revised" License
I am trying to run the Python program from a different directory, but it fails, as below:
http://173.255.192.133/~suhan/STARS/python/testsr.py
The same script works in the directory Serge set up: http://173.255.192.133/~suhan/cgi-bin/testsr.py
It looks like I need to change the path to the Python interpreter in the first line of the script, but I am not sure which path to enter. Any thoughts?
I read below:
Some of these potential problems are:
The Python script is not marked as executable. When CGI scripts are not executable most web servers will let the user download it, instead of running it and sending the output to the user. For CGI scripts to run properly on Unix-like operating systems, the +x bit needs to be set. Using chmod a+x your_script.py may solve this problem.
On a Unix-like system, the line endings in the program file must be Unix-style line endings. This is important because the web server checks the first line of the script (called the shebang) and tries to run the program specified there. It gets easily confused by Windows line endings (Carriage Return & Line Feed, also called CRLF), so you have to convert the file to Unix line endings (only Line Feed, LF). This can be done automatically by uploading the file via FTP in text mode instead of binary mode, but the preferred way is just telling your editor to save the files with Unix line endings. Most editors support this.
Your web server must be able to read the file, and you need to make sure the permissions are correct. On Unix-like systems, the server often runs as user and group www-data, so it might be worth a try to change the file ownership, or to make the file world-readable by using chmod a+r your_script.py.
The web server must know that the file you’re trying to access is a CGI script. Check the configuration of your web server, as it may be configured to expect a specific file extension for CGI scripts.
On Unix-like systems, the path to the interpreter in the shebang (#!/usr/bin/env python) must be correct. This line calls /usr/bin/env to find Python, but it will fail if there is no /usr/bin/env, or if Python is not in the web server’s path. If you know where your Python is installed, you can also use that full path. The commands whereis python and type -p python could help you find where it is installed. Once you know the path, you can change the shebang accordingly: #!/usr/bin/python.
The file must not contain a BOM (Byte Order Mark). The BOM is meant for determining the byte order of UTF-16 and UTF-32 encodings, but some editors write this also into UTF-8 files. The BOM interferes with the shebang line, so be sure to tell your editor not to write the BOM.
If the web server is using mod_python, mod_python may be having problems. mod_python is able to handle CGI scripts by itself, but it can also be a source of issues.
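The checks in the list above can be automated. Here is a minimal sketch, assuming a local path to the script (the filename is a placeholder), that flags the executable bit, CRLF line endings, a UTF-8 BOM, and a missing shebang:

```python
import os
import stat

def cgi_preflight(path):
    """Return a list of human-readable problems found with a CGI script."""
    problems = []
    raw = open(path, "rb").read()
    # 1. executable bit must be set (chmod a+x)
    if not os.stat(path).st_mode & stat.S_IXUSR:
        problems.append("not executable: run `chmod a+x %s`" % path)
    # 2. Windows line endings confuse the shebang lookup
    if b"\r\n" in raw:
        problems.append("CRLF line endings: convert to LF")
    # 3. a UTF-8 BOM before the shebang breaks interpreter lookup
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a BOM: save without BOM")
    # 4. shebang must be the very first bytes of the file
    if not raw.startswith(b"#!"):
        problems.append("first line is not a shebang (#!...)")
    return problems
```

Run it against the failing script before digging into server configuration; an empty list means the script itself is probably fine and the issue lies with the web server.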
It works in my user directory, so one other thing to check would be to see if eli can add a cgi-bin directory, put a script in there, and test it.
as we continue to abstract osnap into cases beyond the US, it might be useful to provide an API that gives access to international data sources like GADM.
I'm not sure how good the indices are for each country (e.g. FIPS codes aren't included as an attribute for the USA counties on there) so it might cause headaches when users have to attach their own census data, but it's something to consider
others?
OSNAP web interface will need to allow for account generation for users via the web, and isolation of user created data.
This is related to #40.
We need to keep the environment.yml updated for dependencies so devs have a unified env.
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support
If the install needs more than conda, we should document that as well.
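One option worth documenting: detect a usable parquet engine up front so users get a message pointing at the conda fix rather than the bare pandas ImportError. A minimal sketch, assuming the package reads parquet via pandas:

```python
def get_parquet_engine():
    """Return the first available parquet engine, or raise with install advice."""
    for engine in ("pyarrow", "fastparquet"):
        try:
            __import__(engine)
            return engine
        except ImportError:
            continue
    raise ImportError(
        "parquet support requires pyarrow or fastparquet; "
        "try `conda install -c conda-forge pyarrow`")
```

The returned name can be passed straight through as `pd.read_parquet(path, engine=...)`.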
per @weikang9009's suggestion, it would be good to include a mapping of counties to MSAs so that metro-scale analyses can use faster table filters instead of clip operations
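A sketch of how that lookup could be used, with made-up data and hypothetical column names (`county_fips`, `msa_id`): since tract GEOIDs begin with the 5-digit county fips, metro-scale selection becomes a cheap `isin()` filter instead of a geometric clip.

```python
import pandas as pd

# hypothetical county-to-MSA mapping table
county_to_msa = pd.DataFrame({
    "county_fips": ["06073", "06065", "36061"],
    "msa_id":      ["41740", "40140", "35620"],
})

# toy tract table; real GEOIDs embed the county fips in their first 5 digits
tracts = pd.DataFrame({
    "geoid": ["06073000100", "36061000200", "06065000300"],
    "pop":   [4000, 5200, 3100],
})

# select all tracts in one MSA with a plain table filter, no clip needed
counties_in_msa = county_to_msa.loc[county_to_msa.msa_id == "41740", "county_fips"]
in_msa = tracts[tracts.geoid.str[:5].isin(counties_in_msa)]
```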
it would be really useful to have a .tsplot() method on the Dataset class that arranges plots for each available time period on the same figure
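A rough sketch of the arrangement logic, with matplotlib only (the real method would presumably call GeoDataFrame.plot() per panel); the input shape and function name are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def tsplot(frames_by_year, column, ncols=2):
    """frames_by_year: dict mapping year -> (Geo)DataFrame containing `column`."""
    years = sorted(frames_by_year)
    nrows = -(-len(years) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                             squeeze=False)
    for ax, year in zip(axes.flat, years):
        # one panel per time period, all on the same figure
        frames_by_year[year][column].plot(ax=ax)
        ax.set_title(str(year))
    for ax in list(axes.flat)[len(years):]:
        ax.set_visible(False)  # hide any unused panels
    return fig
```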
we need to ensure the integrity of the shape data included in the package to make sure the spatial weights matrices are built correctly
The data.Dataset.data namespace is a bad one (for which I'm responsible). I propose renaming the Dataset class to Community, and restructuring it for clarity.
A community is a collection of neighborhoods with different properties describing its various boundaries (tracts, counties, states, over several time periods) and compositional attributes, such as data from surveys, sensors, or geocoded miscellany.
goals:
something like
from osnap import Community
# attribs are pd.DataFrames
Community.attrib.census
Community.attrib.osm
Community.attrib.misc
# boundaries are gpd.GeoDataFrames
Community.boundaries.units # the currently-set primitive units like geopandas.geometry
Community.boundaries.zipcodes
Community.boundaries.counties
Community.boundaries.tracts_1990
Community.boundaries.tracts_2010
I'm thinking this would simplify class instantiation, and the source argument would move to a method (e.g. Community().from_ltdb(...)). I'm also imagining methods to collapse to a single wide-form geodataframe and instantiate from one.
I'm not particularly wedded to any of the names specifically, but soliciting input on whether folks think this would be a useful direction
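To make the proposal concrete, here is a minimal sketch of the restructured class; the names follow the proposal above but everything is illustrative, not a committed design:

```python
from types import SimpleNamespace

class Community:
    """Sketch: attribute tables and boundary layers as plain namespaces."""

    def __init__(self, attrib=None, boundaries=None):
        # attribs would be pd.DataFrames, boundaries gpd.GeoDataFrames
        self.attrib = SimpleNamespace(**(attrib or {}))
        self.boundaries = SimpleNamespace(**(boundaries or {}))

    @classmethod
    def from_ltdb(cls, census, tracts):
        # the old source='ltdb' argument becomes an alternate constructor
        return cls(attrib={"census": census},
                   boundaries={"units": tracts, "tracts_2010": tracts})
```

Usage would then read `comm = Community.from_ltdb(census, tracts)` followed by `comm.attrib.census` and `comm.boundaries.units`, matching the namespace layout sketched above.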
Documentation should include:
osnap/data/msas.parquet.zip
The following code, taken from the example notebook, failed to run on my machine 😞:
import osnap
import libpysal as lps
import geopandas as gpd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
sns.set_context('notebook')
va = gpd.read_file(lps.examples.get_path('virginia.shp'))
virginia = osnap.data.Dataset(name='Virginia', source='ltdb', boundary=va)
This code raises:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-3-dbd75e222aaa> in <module>
----> 1 virginia = osnap.data.Dataset(name='Virginia', source='ltdb', boundary=va)
~/Desktop/osnap/osnap/data/data.py in __init__(self, name, source, states, counties, add_indices, boundary, **kwargs)
391 if source == "ltdb":
392 _df = pd.read_parquet(
--> 393 os.path.join(_package_directory, "ltdb.parquet.gzip"))
394 elif source == "ncdb":
395 _df = pd.read_parquet(
~/miniconda3/envs/osnap/lib/python3.7/site-packages/pandas/io/parquet.py in read_parquet(path, engine, columns, **kwargs)
286
287 impl = get_engine(engine)
--> 288 return impl.read(path, columns=columns, **kwargs)
~/miniconda3/envs/osnap/lib/python3.7/site-packages/pandas/io/parquet.py in read(self, path, columns, **kwargs)
129 kwargs['use_pandas_metadata'] = True
130 result = self.api.parquet.read_table(path, columns=columns,
--> 131 **kwargs).to_pandas()
132 if should_close:
133 try:
~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in read_table(source, columns, use_threads, metadata, use_pandas_metadata, nthreads)
1072 return fs.read_parquet(source, columns=columns,
1073 use_threads=use_threads, metadata=metadata,
-> 1074 use_pandas_metadata=use_pandas_metadata)
1075
1076 pf = ParquetFile(source, metadata=metadata)
~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/filesystem.py in read_parquet(self, path, columns, metadata, schema, use_threads, nthreads, use_pandas_metadata)
180 use_threads = _deprecate_nthreads(use_threads, nthreads)
181 dataset = ParquetDataset(path, schema=schema, metadata=metadata,
--> 182 filesystem=self)
183 return dataset.read(columns=columns, use_threads=use_threads,
184 use_pandas_metadata=use_pandas_metadata)
~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in __init__(self, path_or_paths, filesystem, schema, metadata, split_row_groups, validate_schema, filters, metadata_nthreads)
858 self.common_metadata_path,
859 self.metadata_path) = _make_manifest(
--> 860 path_or_paths, self.fs, metadata_nthreads=metadata_nthreads)
861
862 if self.common_metadata_path is not None:
~/miniconda3/envs/osnap/lib/python3.7/site-packages/pyarrow/parquet.py in _make_manifest(path_or_paths, fs, pathsep, metadata_nthreads)
1033 if not fs.isfile(path):
1034 raise IOError('Passed non-file path: {0}'
-> 1035 .format(path))
1036 piece = ParquetDatasetPiece(path)
1037 pieces.append(piece)
OSError: Passed non-file path: /Users/alex/Desktop/osnap/osnap/data/ltdb.parquet.gzip
A quick filesystem check indicates that this is because the file the code looks for is not among those populated during the download process:
ipdb> path
'/Users/alex/Desktop/osnap/osnap/data/ltdb.parquet.gzip'
ipdb> import os
ipdb> os.listdir('/Users/alex/Desktop/osnap/osnap/data/')
['cenpy_fetch.py', '__init__.py', '__pycache__', 'README.md', '.gitignore', 'counties.parquet.gzip', 'tracts.parquet.gzip', 'variables.csv', 'msas.parquet.gzip', 'data.py', 'states.parquet.gzip']
I think you have been working on this already, but it would be really nice to add descriptions of all parameters for KMeans in the near future. I was looking at the page below:
https://github.com/knaaptime/osnap/blob/master/examples/02_creating_community_datasets.ipynb
In the example above,
I was just thinking that it would be nice to add something to show the data used on the map such as
sacramento.census[sacramento.census.year==1990]['median_household_income']
after the plot below
sacramento.plot(column='median_household_income', year=1990)
We need to add tests for the following:
- [x] Community plotting (plotting directly is a lot easier with the new structure since geoms are attached, but should we still try and implement top-level plotting?)
- [ ] from_lodes constructor
- [ ] from_geodataframes constructor
- [ ] from_census constructor
The old decennial census downloaders need to be rewritten to support the new and improved cenpy API. We also need to add downloaders for ACS (which will be easier now).
Trying to install on the ucr cluster I'm hitting:
Processing dependencies for osnap==0.1.0
error: networkx 2.2 is installed but networkx<2.0.0 is required by {'region'}
Will investigate. Wanted to flag it here in case others have hit this.
one of the performance bottlenecks on the web interface is that we're using the old CGI interface, which is unsustainable for two reasons:
- it is a script-based interface, so each time a script is called it spawns a new Python interpreter (which means programs can't really be dynamic)
- it is slow
Instead, we should consider a modern web framework like django.
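For contrast with the per-request interpreter spawn under CGI, here is a minimal WSGI sketch (django and every modern Python framework speaks WSGI); the process, its imports, and any loaded data persist across requests. The app body is a placeholder, not the eventual interface:

```python
# Expensive setup (package imports, loading parquet tables) runs once here,
# at process start, and is then reused by every request -- unlike CGI, which
# re-imports everything per request.

def application(environ, start_response):
    """Minimal WSGI callable: one function invoked per request."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OSNAP web interface placeholder"]
```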
Currently there are no instructions for NCDB.
I think it is better to mention that we suggest using Anaconda3 for osnap, since it has been fully tested in Anaconda3.
Currently the instructions say, "The recommended method for installing OSNAP is with anaconda," but they do not specify version 2 or 3.
When I tried to install osnap using the yml file you provide in Anaconda2, I had errors and could not resolve them by myself.
Just wanted to remind you: the next time you make changes, please add the census attributes of osnap.data.Community to doc_build\html\api.html.
During handling of the above exception, another exception occurred:
ModuleNotFoundError Traceback (most recent call last)
in
----> 1 import osnap
2 import libpysal as lps
3 import geopandas as gpd
4 import matplotlib.pyplot as plt
5 import seaborn as sns
~/Dropbox/o/osnap/osnap/__init__.py in
2
3 from . import analytics
----> 4 from . import data
5 from .data import metros
~/Dropbox/o/osnap/osnap/data/__init__.py in
----> 1 from .data import Dataset, metros, read_ltdb, read_ncdb
~/Dropbox/o/osnap/osnap/data/data.py in
13 quilt.install("spatialucr/census")
14 quilt.install("spatialucr/census_cartographic")
---> 15 from quilt.data.knaaptime import census
16 import matplotlib.pyplot as plt
17 import pandas as pd
ModuleNotFoundError: No module named 'quilt.data.knaaptime'
since we're not rounding the ltdb data any more, some of the cluster assignments have changed causing tests to fail
current behavior selects every combination of (2-digit) states + (3-digit) counties. It should also allow lists of 5-digit state-county fips for finer-grained selection, as described here
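A small sketch of the finer-grained selection, with toy data and hypothetical column names (`statefp`, `countyfp`):

```python
import pandas as pd

# toy tract table
tracts = pd.DataFrame({
    "statefp":  ["06", "06", "36"],
    "countyfp": ["073", "065", "061"],
})
# concatenate to a 5-digit state-county geoid
tracts["county_geoid"] = tracts.statefp + tracts.countyfp

# select exactly these two counties, rather than every combination of the
# states {06, 36} with the counties {073, 065, 061}
wanted = ["06073", "36061"]
subset = tracts[tracts.county_geoid.isin(wanted)]
```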
queries on the web-based interface are slow, so it would be useful to add a function that loads all the necessary geoms into memory so they don't need to be created at each instantiation of Community
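One possible shape for this, sketched with a module-level cache; `loader` stands in for whatever actually reads the parquet files:

```python
# geometries loaded once per process, then shared by every Community
_GEOM_CACHE = {}

def get_geoms(layer, loader):
    """Load `layer` with `loader()` on first use, then serve from memory."""
    if layer not in _GEOM_CACHE:
        _GEOM_CACHE[layer] = loader()
    return _GEOM_CACHE[layer]
```

Each Community instantiation would call `get_geoms("tracts", ...)` instead of re-reading the files; only the first call pays the load cost.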
Not sure why the relative path of the LTDB full count file is different from that of the sample file.
For instance, the path to the full count 1980 is "LTDB_Std_1980_fullcount.csv":
fullcount80 = _ltdb_reader(
fullcount_zip, "LTDB_Std_1980_fullcount.csv", year=1980)
while the path for the sample file lives in another folder "ltdb_std_all_sample/ltdb_std_1980_sample.csv":
sample80 = _ltdb_reader(
sample_zip,
"ltdb_std_all_sample/ltdb_std_1980_sample.csv",
dropcols=["pop80sf3", "pop80sf4", "hu80sp", "ohu80sp"],
year=1980,
)
It makes more sense to adopt one form or the other for consistency.
It would be helpful to document the original source for the cbsa definitions as these can change over time and we may need to cross-reference different time slices.
For example, we don't have parquet files for the following:
It makes sense that ltdb does not cover PR areas, but it is less clear why the AZ, OH, and 11640 areas are not in the data.
I am trying to run the example Python code to test whether it is possible to use osnap in web-enabled Python.
Here is the code (osnap_example.py) that I am testing:
///////////////////////////////////////////////////////////////////////////////////////////////
#!/home/suhan/public_html/cgi-bin/anaconda3/envs/osnap/bin/python
import cgitb
cgitb.enable()
import platform
import numpy
pltform = platform.platform()
print("Content-type: text/html\n\n")
print("\n")
print('<div style="width: 100%; font-size: 40px; font-weight: bold; text-align: center;">')
print("Python Script Test Page")
print("CGI Works!")
print("</div>")
print("\n\nPlatform: %s" % pltform)
import osnap
//////////////////////////////////////////////////////////////////////////////////////////////////////
I tried to run this code both in Terminal (image on the left) and on the Web (image on the right).
The code run in the Terminal did not give me any error importing OSNAP (the left image). But when I run the same code on the Web (http://173.255.192.133/~suhan/cgi-bin/osnap/osnap_example.py), it gives me a PermissionError: [Errno 13], so the OSNAP modules cannot be used.
I think this is related to file permission issues; it seems like I need to change the permission level of some files, but I am not sure what to change exactly. It seems to be related to something like "rwxrwxrwx". Is anyone familiar with permission issues in Linux?
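For context on the "rwxrwxrwx" notation: the three triplets are read/write/execute for owner, group, and other. The web server usually runs as a different user, so it needs at least read+execute on the script, which `chmod 755` (rwxr-xr-x) provides. A small sketch of doing that from Python; the function name is illustrative:

```python
import os
import stat

def make_cgi_accessible(path):
    """Grant owner rwx and group/other r-x, i.e. rwxr-xr-x (755)."""
    mode = os.stat(path).st_mode
    mode |= (stat.S_IRWXU                    # owner: rwx
             | stat.S_IRGRP | stat.S_IXGRP   # group: r-x
             | stat.S_IROTH | stat.S_IXOTH)  # other: r-x
    os.chmod(path, mode)
```

Note that every directory on the path to the script (and to the anaconda env it imports from) also needs the execute bit for the server's user, or the PermissionError will persist.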
we could use fuzzywuzzy or similar to add a named-place-based API similar to what cenpy and osmnx do
cenpy.base.Connection('2000sf3') and cenpy.base.Connection('DecennialSF31990') (not 2000sf1 or DecennialSF11990) carry the relevant concept in the census API. Currently, this means building up analogue columns by querying into the pandas dataframe connection.variables:
import cenpy
c2000sf3 = cenpy.base.Connection('2000sf3')
has_ancestry = c2000sf3.variables['concept'].str.lower().apply(lambda x: 'ancestry' in x)
c2000sf3.variables[has_ancestry]['concept']
So, if we can get the column names that correspond, then:
c2000sf3.query(columns=columns, geo_unit='block group',
geo_filter=dict(state='06',county='073', tract='*'))
should work.
the package is currently too big to be hosted on PyPI (must be under 60MB), so we should probably move all spatial data to the spatialucr quilt account
I might be missing something, but based on this line: inflate_cols = ["mhmval", "mrent", "hinc"], I think only these three variables are adjusted for inflation, while other economic variables like per capita income (incpc) are not?
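A sketch of what extending the adjustment could look like, adding incpc to the column list; the deflator values below are made-up placeholders, not the factors the LTDB ingest code actually uses:

```python
import pandas as pd

# placeholder deflators to a common dollar year (NOT real CPI values)
CPI_FACTOR = {1990: 1.98, 2000: 1.50, 2010: 1.07}

def inflate(df, year, cols=("mhmval", "mrent", "hinc", "incpc")):
    """Adjust dollar-denominated columns, now including per capita income."""
    out = df.copy()
    for col in cols:
        if col in out.columns:
            out[col] = out[col] * CPI_FACTOR[year]
    return out
```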
df = df.round(0)
Can we make this optional? Rounding to integers will lead to a loss of information and sometimes users might need the info for the analysis.
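Making it optional could be as simple as a keyword that defaults to the current behavior; the function name here is a stand-in for wherever the rounding actually happens:

```python
import pandas as pd

def clean_ltdb(df, round_values=True):
    """round_values=True keeps today's behavior; False preserves precision."""
    if round_values:
        df = df.round(0)
    return df
```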