Predict deferment on synthetic oil-well data.
#og.xlsx is the original data file containing the synthetic oil-well data.
#og.py is the script that cleans the original Excel file (og.xlsx).
During the data preprocessing phase we:
- Performed exploratory data analysis.
- Searched for NaN values and other missing data.
- Renamed columns to have no spaces to make querying easier.
- Dropped the 3 unnecessary / insignificant columns (Name, Type, Casing B Pressure (no data)).
- Sliced the dataframe from the row where 'Volume' production began (ogclean.iloc[54425:,]) so the data starts when production started. (TODO: change this index once we finally pick it.)
- Replaced all negative 'FlowlinePressure' values with 0 (a negative reading meant the sensor was either off or the pressure was 0).
- Converted all the data to float format for statistical analysis.
- Performed statistical analysis to better understand our data.
- Plotted the data to visualize it over time.
- Checked the correlation between our variables.
- Saved our cleaned dataframe to the Excel file 'ogclean.xlsx'.
- Categorized each observation as HUM, DEF, or NORM.
- Recategorized our data as DEF or NOT DEF to serve as the Y response for our neural network.
- Implemented a time lag for our DEF or NOT DEF predictor because our neural network needs to predict 1 day in advance.
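
The cleaning and time-lag steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up dataframe, not the actual contents of og.py. The function names (clean, add_lagged_target), the 'Label' column, the 'DEF_next_day' column, and the steps_per_day parameter are all hypothetical. It also assumes that "(no data)" is a remark about the Casing B Pressure column rather than part of its name, and that one row corresponds to one day unless steps_per_day says otherwise. The rules used to assign HUM, DEF, or NORM are not shown here, so the labels in the example are filled in by hand.

```python
import pandas as pd


def clean(raw: pd.DataFrame, start_row: int = 0) -> pd.DataFrame:
    """Apply the cleaning steps from the list above (hypothetical helper)."""
    df = raw.copy()
    # Remove spaces from column names to make querying easier.
    df.columns = [str(c).replace(" ", "") for c in df.columns]
    # Drop the unneeded columns (names as they look after renaming).
    # errors="ignore" skips any of them that are not present.
    df = df.drop(columns=["Name", "Type", "CasingBPressure"], errors="ignore")
    # Keep only the rows from the start of production onward.
    df = df.iloc[start_row:].copy()
    # Negative flowline pressure readings are replaced with 0.
    df["FlowlinePressure"] = df["FlowlinePressure"].clip(lower=0)
    # Convert everything that is left to float.
    return df.astype(float)


def add_lagged_target(df: pd.DataFrame, label_col: str = "Label",
                      steps_per_day: int = 1) -> pd.DataFrame:
    """Add a 0/1 target that says whether the label one day later is DEF."""
    out = df.copy()
    # DEF -> 1, everything else (HUM, NORM) -> 0, i.e. NOT DEF.
    is_def = (out[label_col] == "DEF").astype(int)
    # shift(-n) moves later values up, so each row sees the label n rows ahead.
    out["DEF_next_day"] = is_def.shift(-steps_per_day)
    # The last n rows have no known future label, so they are dropped.
    return out.dropna(subset=["DEF_next_day"])


# Tiny made-up example (not real well data).
raw = pd.DataFrame({
    "Name": ["w1", "w1", "w1", "w1"],
    "Type": ["oil", "oil", "oil", "oil"],
    "Casing B Pressure": [None, None, None, None],
    "Flowline Pressure": [-5, 10, -1, 12],
    "Volume": [0, 3, 4, 5],
})

# Production starts at row 1 in this toy example.
clean_df = clean(raw, start_row=1)

# Labels are assigned by hand here; the real categorization rules are not shown.
labelled = clean_df.assign(Label=["NORM", "DEF", "HUM"])
lagged = add_lagged_target(labelled, steps_per_day=1)
print(lagged)
```

If the real data has more than one row per day, steps_per_day would need to be set to the number of rows in a day so that the target still looks exactly 1 day ahead.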