Comments (7)
Good start on the formatting. The definitions of each section are used to create correctly formatted dictionary results. Everything is captured in the following notebook:
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/Notebooks/Testing%20data%20definitions.ipynb
from txrrc_data_harvest.
QAQC of the formatting definitions in progress. Need to ensure the fields are being split and formatted correctly before moving on to dataframes and SQL.
Created .py files that work together to test the formatting: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/WorkingFileForTesting.py
Same data parsing can be found in the notebook: https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/Notebooks/Testing%20data%20definitions.ipynb
Starting to map out the dependencies of unique keys in the different sections. Tracking most changes in https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_layouts.py and will need to move the results to the .txt in the definition file and update the definitions in the Jupyter notebook.
Sections 1, 4, 5, 7, 12, 13, 23, 24, 25, 26, 27 have passed QAQC. Sections like 24 will need to be formatted into JSON and added to the preceding record in section 23. Section 22 has a known byte error.
Sections 2, 3, 6, 8, 9, 10, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22, 28 still need QAQC.
Sections 2, 14, 21, 22 will need an additional subroutine to decode the 'WB-OIL-GAS-INFO' field into the appropriate oil or gas components.
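For the section 24 into section 23 step, something like the following could work. This is only a rough sketch: it assumes records arrive in file order as `(section, fields)` pairs, and the function name `nest_children`, the `children` key, and the sample field names are all hypothetical, not anything in the repo.

```python
import json


def nest_children(records, parent_id="23", child_id="24"):
    """Attach each child record to the most recent parent record.

    `records` is an iterable of (section, fields) pairs in file order.
    The collected children are stored on the parent as a JSON string
    under the hypothetical key "children".
    """
    parents = []
    current_parent = None
    for section, fields in records:
        if section == parent_id:
            # Start a new parent with an empty list of children.
            current_parent = dict(fields, children=[])
            parents.append(current_parent)
        elif section == child_id and current_parent is not None:
            current_parent["children"].append(fields)
        # Any other section is ignored in this sketch.
    for parent in parents:
        parent["children"] = json.dumps(parent["children"])
    return parents


# Made-up sample data, only to show the shape of the result.
sample = [
    ("23", {"id": 1}),
    ("24", {"note": "a"}),
    ("24", {"note": "b"}),
    ("23", {"id": 2}),
]
nested = nest_children(sample)
```

Storing the children as a JSON string keeps one row per section 23 record, so the result still fits a flat table.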
This is still ongoing, but the bytes rewrite is taking priority to capture the full decimal digits for lat-long and coordinates in section 13.
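To illustrate the idea of working from bytes, here is a minimal sketch of decoding an unsigned numeric field with an implied decimal point into a `Decimal`, which keeps every digit exactly. It assumes the field is plain EBCDIC digits readable with Python's `cp037` codec and that the number of decimal places comes from the layout definition. The function name `decode_implied_decimal` and the sample bytes are made up for illustration, and signed or packed (Comp-3) fields are not handled here.

```python
from decimal import Decimal


def decode_implied_decimal(raw, decimals, codec="cp037"):
    """Decode unsigned EBCDIC digits with an implied decimal point.

    `decimals` is the number of digits to the right of the implied
    decimal point (assumed to come from the layout definition).
    Returns None for a blank field.
    """
    digits = raw.decode(codec).strip()
    if not digits:
        return None
    # Shift the decimal point left without going through float.
    return Decimal(int(digits)).scaleb(-decimals)


# Made-up example: the digits 311234567 with 7 implied decimal places.
sample_bytes = bytes.fromhex("F3F1F1F2F3F4F5F6F7")
latitude = decode_implied_decimal(sample_bytes, 7)
```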
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/WorkingFileForTesting.py now working with:
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_main_bytes.py
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_layouts_bytes.py
https://github.com/mlbelobraydi/TXRRC_data_harvest/blob/master/dbf900_formats_bytes.py
WorkingFileForTesting.py also captures the unique keys and places the values in the appropriate dataframes.
All sections are being read into dataframes and output to CSV files. Format testing is still ongoing, so QAQC of the layouts and formats still needs to be completed.
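The key handling can be pictured roughly as below. This is a sketch of the general idea, not the code in WorkingFileForTesting.py: it uses plain lists of dicts and the standard `csv` module in place of pandas dataframes, and it assumes a root section `"01"` whose hypothetical `api` field is the key carried forward to the records that follow it. All names and sample values are made up.

```python
import csv
import io
from collections import defaultdict


def group_by_section(records, root_section="01", key_field="api"):
    """Group (section, fields) pairs into one table per section.

    The key from the most recent root record is copied onto every row,
    so rows in different tables can be joined later.
    """
    tables = defaultdict(list)
    current_key = None
    for section, fields in records:
        if section == root_section:
            current_key = fields[key_field]
        tables[section].append({key_field: current_key, **fields})
    return tables


def to_csv_text(rows):
    """Write rows (dicts sharing the same keys) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample records, only to show the flow.
sample = [
    ("01", {"api": "4200100001"}),
    ("02", {"status": "A"}),
    ("01", {"api": "4200100002"}),
    ("02", {"status": "B"}),
]
tables = group_by_section(sample)
section_02_csv = to_csv_text(tables["02"])
```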
Related Issues (18)
- Identify the codec for the IBM mainframe files that can be read in Python HOT 2
- Create definition libraries for all 28 sections in the dbf900.ebc.gz file HOT 7
- supporting systems with limited memory. HOT 2
- Longitude not being recognized as negative HOT 4
- Change the dbf900_formats section to work with bytes instead of string to keep lat-long accuracy. HOT 2
- Comp-3 function HOT 4
- Preserve original entry of date information along with conversion
- Oil Production Layout HOT 3
- bad add: Comp-3 Function HOT 1
- Request for how to setup to start helping with development. This should be in the wiki and have links to things like Anaconda, Github, and basic Python resources. HOT 5
- Gas Production Layout HOT 2
- Testing file to read data from gas ebcdic file to pandas dataframes HOT 2
- Workflow documentation? HOT 1
- TXRRC file location changes HOT 2
- Request: Add support for working with Polars (in addition to Pandas) HOT 2
- Request: Add support for installing via `pip` by publishing to PyPi HOT 1
- python struct format generated from Cobol copybooks at RRC