mlbelobraydi / txrrc_data_harvest
Script for accessing and organizing oil and gas well data from the Texas Railroad Commission
License: The Unlicense
Describe the bug
The TXRRC is no longer using an FTP server, so the documentation and code are out of date.
To Reproduce
Attempt to download or connect to any file on the FTP server; the connection fails.
Expected behavior
Connections should succeed.
Additional context
It might be good to have a config file that points to the file locations, so that a single change there cascades to all code that depends on them.
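A minimal sketch of that idea, assuming a hypothetical config.json; the key names and URLs below are placeholders for illustration, not the RRC's actual locations:

```python
import json

def load_config(path="config.json"):
    """Read the data-location config so scripts never hard-code URLs.

    Example config.json contents (placeholder URLs):
    {
        "wellbore_ebc": "https://example.org/dbf900.ebc.gz",
        "oil_ledger_ebc": "https://example.org/olf001l.ebc.gz"
    }
    """
    with open(path) as f:
        return json.load(f)
```

Every script would then ask the config for its source path instead of embedding the old FTP address, so a future move by the RRC only requires editing one file.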
Polars is a Python dataframe library that is faster than pandas and better suited for very large data. It would be awesome to get native Polars support in this project!
The formats, layout, and main modules need to be adjusted to work with bytes.
Now that the definitions are complete, a notebook needs to be created to test the process of turning the .ebc file into usable data that can be formatted as JSON or SQL tables. This task is to create a prototype of that process in a notebook.
An initial notebook and working file have been created for the oil and gas layouts. These need to be vetted and tested to ensure the values pulled from their respective files are correct.
Doing this may take at least a bit of refactoring into a normal Python module structure, renaming some files, etc. Overall, it would be an upgrade to this project though!
Thanks for making and maintaining this wonderful project!
Is your feature request related to a problem? Please describe.
The production data has several fields that are COMP-3 (packed decimal).
Describe the solution you'd like
In a packed field, the number of bytes is smaller than the number of digits it encodes (two digits per byte, plus a sign nibble). The pic_signed function does not account for this, so an additional function needs to be created.
Describe alternatives you've considered
I tried to find a way to modify pic_signed, but it isn't possible; COMP-3 will require a new function.
Additional context
Information on Comp-3 can be found here
http://www.3480-3590-data-conversion.com/article-packed-fields.html
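As a hedged sketch, based on the packed-field description in the article above (not the project's final code), a COMP-3 decoder could look like this:

```python
def unpack_comp3(raw: bytes, decimal: int = 0):
    """Decode an IBM COMP-3 (packed decimal) field.

    Every nibble except the last holds one decimal digit; the final
    nibble is the sign (0xD = negative, 0xC = positive, 0xF = unsigned).
    `decimal` is the number of implied decimal places from the copybook.
    """
    digits = []
    for byte in raw:
        digits.append(byte >> 4)     # high nibble
        digits.append(byte & 0x0F)   # low nibble
    sign_nibble = digits.pop()       # last nibble is the sign
    val = 0
    for d in digits:
        val = val * 10 + d
    if sign_nibble == 0xD:
        val = -val
    return val / 10 ** decimal if decimal else val
```

For example, the two bytes 0x12 0x3D unpack to digits 1, 2, 3 with a 0xD sign nibble, i.e. -123; this is why the byte count is always smaller than the digit count.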
Request for instructions on how to set up an environment to start helping with development. This should be in the wiki and include links to things like Anaconda, GitHub, and basic Python resources.
Identify the packages and methods that allow the raw file (e.g. ftp://ftpe.rrc.texas.gov/shfwba/dbf900.ebc.gz) to be read in Python for conversion to other formats or for manipulation.
Nice work so far! I was thinking it might be helpful to provide some documentation detailing the order in which the scripts should be run, as well as an overview of what each script does (outside of the comments in each notebook). This would make it easier for folks to pick it up and run with it. Looking forward to digging in and seeing what this is capable of.
Organization of Gas Production Layout
Is your feature request related to a problem? Please describe.
I wanted to generate Python structs for the COBOL copybooks and, for the computational numeric fields, emit the hex for signed/unsigned handling of specific fields. I am part way there, but I wanted to bring this to your attention to see if you think this would be useful. This way, no one would need to hand-code parsing of the structures.
Describe the solution you'd like
I would like to be able to use the copybook in a full COBOL program, parse the data division, and generate struct formats so that each section can be parsed directly in Python without hand-coding the parsing lengths, as seems to be the direction now. I am working on the Oil Ledger files, with the copybook defined in the Oil Ledger PDF.
Describe alternatives you've considered
I considered writing a copybook parser myself, but a Cobol84.g4 grammar file exists for ANTLR4, so I can just use that and generate a Listener to walk the symbol table and produce the struct formats.
Additional context
I am adding unit tests to make sure the code works as I tweak it.
I would like to integrate this into your repo and contribute to that.
My main interest is in parsing out as much oil/gas well data as possible so I can continue my machine learning project, which will look for aberrations in wells' production data over time.
Capturing the oil production layout
Creating a file that has the definitions of all 28 sections.
@skylerbast, I'll be pushing the bytes version of the code, and I'm not sure whether the values are signed or unsigned. It now captures the last digits of the value, but I'm not sure the section below is working correctly.
If the penultimate nibble == 0xD, then the number is negative. Otherwise,
it is either positive or unsigned.
val = (val * (-1 if signed_raw[-1] >> 4 == 0xD else 1)) / 10**decimal
Would it be possible to chat with you about how this is supposed to work?
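For context, here is a minimal, illustrative reimplementation of zoned-decimal decoding consistent with the sign test quoted above. The function name matches the project's pic_signed, but the body is an assumption based on the standard EBCDIC zoned format, not the repo's actual code:

```python
def pic_signed(raw: bytes, decimal: int = 0):
    """Decode an EBCDIC zoned-decimal (PIC S9...) field.

    Each byte's low nibble is one digit; the high nibble (zone) of the
    LAST byte carries the sign: 0xD means negative, 0xC or 0xF means
    positive/unsigned. That zone is the penultimate nibble of the whole
    field, which is what `signed_raw[-1] >> 4 == 0xD` is testing.
    """
    val = 0
    for byte in raw:
        val = val * 10 + (byte & 0x0F)
    sign = -1 if raw[-1] >> 4 == 0xD else 1
    return sign * val / 10 ** decimal if decimal else sign * val
```

So b"\xF1\xF2\xD3" (EBCDIC "12L", an overpunched 3) decodes to -123, while b"\xF1\xF2\xF3" ("123") decodes to +123.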
Is your feature request related to a problem? Please describe.
No
Describe the solution you'd like
Currently the script opens and decodes the entire file in memory. This can cause issues on systems with limited memory (<8 GB RAM). It may be better to read parts of the file and release memory as you go, keeping more memory free.
Describe alternatives you've considered
opening and reading by line
decoding as necessary
writing results to disk and not holding it in memory
Additional context
Any changes will need to be tested with limited memory.
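A sketch of the streaming alternative, assuming the .ebc files are fixed-length-record EBCDIC data shipped gzipped (record_len varies per layout, so it is a parameter here):

```python
import gzip

def iter_records(path: str, record_len: int):
    """Yield fixed-length records from a gzipped .ebc file one at a
    time, so only a single record is ever held in memory instead of
    the whole decoded file."""
    with gzip.open(path, "rb") as f:
        while True:
            record = f.read(record_len)
            if len(record) < record_len:  # EOF or trailing partial record
                break
            yield record
```

Downstream code can then decode and write each record (or a small batch) to disk before requesting the next, which keeps peak memory flat regardless of file size.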
Is your feature request related to a problem? Please describe.
Dates are not always formatted with valid numbers, which breaks datetime conversion. The original data needs to be preserved so that any useful information in the entry is available for manual correction.
Describe the solution you'd like
Preserve the original value along with the datetime conversion.
Describe alternatives you've considered
If we keep the nulls, the original data also needs to be added back in. Is it possible to parse and correct out-of-range months and days, with a flag column to distinguish actual vs. estimated dates?
Additional context
This is important for completions tracking: a month and/or year is better than nothing. DSTs need to be linked to the right open section.
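A minimal sketch of the flag-column idea, assuming dates arrive as YYYYMMDD strings with a plausible year (the function name and clamping policy are illustrative, not a decided design):

```python
import calendar
from datetime import date

def parse_rrc_date(raw: str):
    """Return (raw, parsed_date, estimated).

    The original string is always preserved. Out-of-range months/days
    are clamped to 1 so at least the year (and month, when valid)
    survive, and `estimated` marks the result as a best guess rather
    than an actual date.
    """
    year = int(raw[0:4])
    month = int(raw[4:6])
    day = int(raw[6:8])
    estimated = False
    if not 1 <= month <= 12:
        month, estimated = 1, True
    last_day = calendar.monthrange(year, month)[1]
    if not 1 <= day <= last_day:
        day, estimated = 1, True
    return raw, date(year, month, day), estimated
```

Loading all three values into columns (raw_date, parsed_date, is_estimated) would let completions tracking use the partial information while flagging rows that still need manual review.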