
pecos's People

Contributors

cedricleroy, kaklise, mikofski, wholmgren


pecos's Issues

Analysis with timezone aware dataframes

If you are working with timezone-aware dataframes, there is an issue retaining the timezone across the entire pecos analysis chain. The timezone is not stored properly in the test results (pm.test_results['Start Date'] and pm.test_results['End Date']). It looks like this is a known pandas bug. In pecos, this can lead to a mismatch in analyses that use the test results. I'm working on a fix and should have that committed soon, along with some stronger tests for analyses that use timezones.

`check_delta` function - working example

Hello @kaklise ,
I am trying to understand the intent and actual functioning of the check_delta function in the monitoring module. Here is an example that I put together:

import pandas as pd
from pecos import monitoring

# Constants
UPPER_LIMIT_RANGE_FILTER = 1.2
LOWER_LIMIT_RANGE_FILTER = 0.5

# Create a random data frame
df = pd.DataFrame(data=[["2015-01-01 00:00:00", -0.76, 2, 2, 1.2],
                        ["2015-01-01 01:00:00", -0.73, 2, 4, 1.1],
                        ["2015-01-01 02:00:00", -0.71, 2, 4, 1.1],
                        ["2015-01-01 03:00:00", -0.68, 2, 32, 1.1],
                        ["2015-01-01 04:00:00", -0.65, 2, 2, 1.0]],
                  columns=['DateTime', 'column1', 'column2', 'column3', 'column4'])

# Set the index
df["DateTime"] = pd.to_datetime(df["DateTime"])
df.set_index("DateTime", inplace=True)

# Run delta flag
flags = pd.DataFrame()
for station in df.columns:
    rating = 2.5
    dead_jump_flag_check = monitoring.check_delta(
        data=df[[station]],
        bound=[LOWER_LIMIT_RANGE_FILTER * rating,
               UPPER_LIMIT_RANGE_FILTER * rating],
        key=None,
        window=3600,
        direction=None,
        min_failures=1
    )
    if flags.empty:
        flags = dead_jump_flag_check['mask']
    else:
        flags = pd.merge(left=flags, right=dead_jump_flag_check['mask'], on="DateTime")
print(flags)

My interpretation of the existing functionality doesn't line up with what I am seeing in this example. The function should flag stuck data or abrupt changes; per the function's description, the difference between the max and min values within a rolling window is used for flagging.

In the above example, I am using window=3600 and min_failures=1, so even if a single timestamp fails the condition, the flag should be triggered. I would appreciate it if you could explain the following:

  1. What are the max and min values (the delta) within a rolling window when the rolling window is 1 hour and the data is hourly? Are the bound values even used in this calculation?
  2. Looking at the dummy data set above, specifically column2 and column3, I would think both should trigger the flag; however, the result does not make sense to me. Here is the result:
                     column1  column2  column3  column4
DateTime                                               
2015-01-01 00:00:00     True     True     True     True
2015-01-01 01:00:00    False    False    False    False
2015-01-01 02:00:00    False    False    False    False
2015-01-01 03:00:00    False    False    False    False
2015-01-01 04:00:00    False    False    False    False
  • Why is row 1 flagged?

  • Why are column2 and column3 not flagged despite having stuck values and values that jump around?

    Would appreciate some clarity.

Best
Uday
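A pandas-only sketch may shed light on question 1 (this is my reading of the docstring, not pecos's actual implementation): with a 3600 s window on hourly data, each rolling window contains only a single point, so max minus min is always 0.

```python
import pandas as pd

# column3 from the example above, on an hourly index
index = pd.date_range("2015-01-01", periods=5, freq="h")
s = pd.Series([2, 4, 4, 32, 2], index=index)

# A time-based window of 3600 s covers (t - 3600s, t], which holds only
# the current point when the data is hourly, so the delta is always 0.
delta = s.rolling("3600s").max() - s.rolling("3600s").min()
print(delta.tolist())  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

With a wider window (e.g. "7200s") each window would span two points and the 2 -> 32 jump in column3 would produce a nonzero delta.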

check_timestamp with irregular time stamps

Some data logging systems record data at irregular time stamps. It would be helpful to me if check_timestamp could handle this case. I'm not sure what the best way to implement it would be. Maybe something like this?

In [36]: index = pd.DatetimeIndex([pd.Timestamp('20161017 00:05:00'), pd.Timestamp('20161017 02:03:00'), pd.Timestamp('20161017 02:50:00')])

In [37]: s = pd.Series([0, 2, 3], index=index)

In [43]: s.resample('3600s').count() == 0
Out[43]:
2016-10-17 00:00:00    False
2016-10-17 01:00:00     True
2016-10-17 02:00:00    False
Freq: 3600S, dtype: bool
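Another possible direction, flagging gaps directly from the index differences instead of resampling (just a suggestion, not an existing pecos API):

```python
import pandas as pd

# Same irregular index as above; flag any step larger than one hour
index = pd.DatetimeIndex(['2016-10-17 00:05:00', '2016-10-17 02:03:00',
                          '2016-10-17 02:50:00'])
gaps = index.to_series().diff() > pd.Timedelta('1h')
print(gaps.tolist())  # [False, True, False]
```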

jinja2 and pandas requirements

Two requirements issues:

  1. pecos needs jinja2 but it's not included in the 'install_requires' list.
  2. Is the pandas >= 0.18 requirement actually necessary for pecos? I have a mess of code that's tied to 0.17. I ran the pecos test suite using pandas 0.17 and it didn't error, so I'm wondering if there are any other known issues.

Thanks!

matplotlib doesn't support jpg on macosx

The example files all use jpg output files, but there is apparently a bug in matplotlib handling of jpg on macosx. Can I change output files to png to make the examples run on every platform? Is this something Travis CI should catch if there were plots in the test files?

Unable to install pecos via pip

I am unable to install pecos via the pip command. Is this a known problem regarding the README file? Can anyone offer any advice, specifically for the pip command?


Pandas 2.0: Dataframe.append() deprecated

The PerformanceMonitoring class defines _append_test_results() to append QC results to the PerformanceMonitoring object. This function, which is called throughout the class, relies on pd.DataFrame.append(), which is now deprecated.
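For reference, the usual migration is to replace append() with pd.concat(); a minimal sketch (the column names here are illustrative, not necessarily pecos's actual test_results schema):

```python
import pandas as pd

# Illustrative columns, not necessarily pecos's actual test_results schema
results = pd.DataFrame({'Variable Name': ['A'], 'Error Flag': ['Missing data']})
new_row = pd.DataFrame({'Variable Name': ['B'], 'Error Flag': ['Data > upper bound']})

# Before (removed in pandas 2.0):
# results = results.append(new_row, ignore_index=True)
# After:
results = pd.concat([results, new_row], ignore_index=True)
print(results['Variable Name'].tolist())  # ['A', 'B']
```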

Contract reference value as benchmark

I just read the presentation at
https://pvpmc.sandia.gov/download/5289/

Questions:

  1. for what type of system is pecos currently developed?
  2. In commercial settings, contractual reference values, e.g. guaranteed PR, guaranteed yield factor, or guaranteed availability, are used as benchmarks to check whether systems perform as expected and the O&M contractor complies with its obligations. Are you planning to add such functionality?

As an example, we could generate expected values based on a TMY input and a simple system modelled with pvlib, then validate the measured performance against those values.

examples: jinja2.exceptions.TemplateNotFound: monitoring_report.html

The example is missing some templates:

(pecos_trial) C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\pecos\examples\pv>python pv_example.py
Reading Campbell Scientific CSV file Baseline6kW_2015_11_11.dat
INFO:pecos.io:Reading Campbell Scientific CSV file Baseline6kW_2015_11_11.dat
Reading Campbell Scientific CSV file MET_2015_11_11.dat
INFO:pecos.io:Reading Campbell Scientific CSV file MET_2015_11_11.dat
Check timestamp

[...]

INFO:pecos.io:Writing test results csv file Results\Baseline_System_test_results.csv
Writing HTML report
INFO:pecos.io:Writing HTML report
Traceback (most recent call last):
  File "pv_example.py", line 128, in <module>
    'Baseline System, Performance Monitoring Report', config)
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\lib\site-packages\pecos-0.1.2-py3.5.egg\pecos\io.py", line 240, in write_monitoring_report
    html_string = _html_template_monitoring_report(content, title, logo, encode)
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\lib\site-packages\pecos-0.1.2-py3.5.egg\pecos\io.py", line 337, in _html_template_monitoring_report
    template = env.get_template('monitoring_report.html')
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\lib\site-packages\jinja2\environment.py", line 812, in get_template
    return self._load_template(name, self.make_globals(globals))
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\lib\site-packages\jinja2\environment.py", line 774, in _load_template
    cache_key = self.loader.get_source(self, name)[1]
  File "C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\lib\site-packages\jinja2\loaders.py", line 235, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: monitoring_report.html

(pecos_trial) C:\Users\%USER%\AppData\Local\Continuum\Anaconda3\envs\pecos_trial\pecos\examples\pv>

get_clock_time() method

Hey!
I've been setting my time filter without using this method, because it tends to be pretty slow (especially for large data). For the simple example time filter, the timeit on my system is:
%timeit: 298 ms per loop

My workaround is to set the time filter using the datetime library and df.index.time. I could do a pull request illustrating this in an example, if you are interested. My timeit for the simple example time filter is:
%timeit: 6.55 ms per loop

Volker
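A minimal sketch of the kind of workaround described above (the column name and clock-time bounds are made up for illustration):

```python
import datetime
import pandas as pd

index = pd.date_range('2015-01-01', periods=24, freq='h')
df = pd.DataFrame({'A': range(24)}, index=index)

# Boolean time filter built directly from df.index.time: keep 06:00-18:00
start, end = datetime.time(6, 0), datetime.time(18, 0)
time_filter = pd.Series((df.index.time >= start) & (df.index.time < end),
                        index=df.index)
print(int(time_filter.sum()))  # 12
```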

Drop system name from PerformanceMonitoring objects

I'm working on changes that will drop 'system name' from PerformanceMonitoring objects. The user can currently supply a system name when adding a DataFrame or a translation dictionary to the analysis. This has created some unnecessary complexity and assumptions about column names in the code. If the user wants to add DataFrames from multiple systems where 1) repeated column names are a concern or 2) they just want to keep a record of the system, I suggest appending the system name to column names before adding it to the PerformanceMonitoring object. System name will also be dropped from the test results table.

Divide by zero error in check_timestamp

When trying to apply the check_timestamp function to high-frequency data (i.e. 50 kHz, which equals 0.00002 s), the function produces a zero-division error. I have uploaded a sample of the data so you can reproduce the error if you need. Let me know if there is any way I can help debug this issue. Thanks!


50khz_data.csv

Use jinja2 for HTML templates instead of manually creating them

The current use of strings to build HTML is fragile and redundant in places. Using an existing, mature, robust Python HTML templating library would make maintenance, development, and collaboration easier.

I recommend jinja2
http://jinja.pocoo.org/
https://pypi.python.org/pypi/Jinja2
https://github.com/pallets/jinja

In a nutshell: HTML templates are rendered by jinja2, which replaces jinja2 markup with the desired content. The markup is embedded directly in the HTML and is a modified subset of Python. Python variables are passed to the jinja2 renderer along with the template and expanded in the output according to the markup in the template. Base templates can be "included" or "extended" to remove content repeated across multiple pages, such as headers and footers, scripts, and CSS.
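A minimal jinja2 sketch of template inheritance (the template names and contents here are hypothetical, not pecos's actual report templates):

```python
from jinja2 import Environment, DictLoader

# Two in-memory templates: a base layout and a page that extends it
env = Environment(loader=DictLoader({
    'base.html': '<html><body>{% block content %}{% endblock %}</body></html>',
    'report.html': ('{% extends "base.html" %}'
                    '{% block content %}<h1>{{ title }}</h1>{% endblock %}'),
}))

# The title variable is expanded into the inherited base layout
html = env.get_template('report.html').render(title='Monitoring Report')
print(html)  # <html><body><h1>Monitoring Report</h1></body></html>
```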

make tables look nicer

An easy improvement to make the tables look nicer is to use the datatables.net javascript plugin in the HTML dashboard template, by applying the following patch:

--- c:\users\mmikof~1\appdata\local\temp\meld-tmp4qxlpp
+++ C:\Users\mmikofski\Documents\Projects\pecos\pecos\io.py
@@ -320,6 +320,7 @@
     <title>$title</title>
     <meta charset="UTF-8" />
     </head>
+    <body>
     <table border="0" width="100%">
     <col style="width:70%">
     <col style="width:30%">
@@ -396,6 +397,7 @@
     datestr = date.strftime('%m/%d/%Y')
     template = template + pecos.__version__ + ", " + datestr
     template = template + """
+    </body>
     </html>"""

     template = Template(template)
@@ -415,7 +417,10 @@
     template = template + """
     </title>
     <meta charset="UTF-8" />
+    <!-- datatables.net -->
+    <link rel="stylesheet" type="text/css" href="https://cdn.datatables.net/1.10.11/css/jquery.dataTables.css">
     </head>
+    <body>
     <table border="0" width="100%">
     <col style="width:70%">
     <col style="width:30%">
@@ -437,7 +442,7 @@
     template = template + title
     template = template + """
     </H2>
-    <table border="1" class="dataframe">
+    <table id="myTable" border="1" class="dataframe display">
     <thead>
     <tr>
     <th></th>"""
@@ -496,9 +501,15 @@
     datestr = date.strftime('%m/%d/%Y')
     template = template + pecos.__version__ + ", " + datestr
     template = template + """
+    <!-- jQuery (necessary for datatables.net) -->
+    <script src="https://code.jquery.com/jquery-1.12.3.min.js"></script>  
+    <script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.11/js/jquery.dataTables.js"></script>
+    <script>
+        $(document).ready(function(){
+        $('#myTable').DataTable();
+    });
+    </script>
+    </body>
     </html>"""

-    template = template + """
-    </html>"""
-    
     return template

Example of check_timestamp

@kaklise :
Katherine, hope you are doing well. Could you kindly post an example of how to use the check_timestamp function? Here is what I tried:

import pandas as pd
from random import random
from pecos import monitoring

# Build a random data set
ts_index = pd.date_range('1/1/2000', periods=1000, freq='T')
v1 = [random() for i in range(1000)]
v2 = [random() for i in range(1000)]
v3 = [random() for i in range(1000)]
ts_df = pd.DataFrame({'v1':v1,'v2':v2,'v3':v3},index=ts_index)

# Test for timestamps  
t = monitoring.check_timestamp(
    data=ts_df,
    frequency=3600
)
print(t["mask"])

Since the index is the time series for the entire data frame:

  1. Shouldn't the result just have one column indicating True or False?
  2. Reading through the documentation, the function should test for:
  • monotonicity (since I am using the default, it should reindex)

  • missing timestamps (it should fill in any missing timestamps)

  • duplicates (it should drop any duplicates, retaining the first occurrence)

    After performing these actions, I am not sure how to interpret the booleans it returns, which are the size of the data frame. Could you kindly provide an example clarifying this?
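For what it's worth, here is a pandas-only sketch of the kind of comparison I understand check_timestamp to perform (my reading of the docs, not pecos's implementation): the observed index is compared against an expected regular index, one boolean per expected timestamp.

```python
import pandas as pd

# Observed index with one missing minute (00:02)
index = pd.DatetimeIndex(['2000-01-01 00:00', '2000-01-01 00:01',
                          '2000-01-01 00:03'])

# Expected regular index at 60 s frequency
expected = pd.date_range(index[0], index[-1], freq='60s')
present = expected.isin(index)
print(present.tolist())  # [True, True, False, True]
```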

Quality control tests clarification

Pecos seems to be a very interesting & useful tool for developing a meteo data QC workflow.

I would like to test it on some historical data but have some questions on use:

The pv example config file Baseline_config.yml contains several limit values.

Just for clarification:

  1. Were these limits developed out of experience?
  2. Or were these developed based on certain literature, e.g. WMO, FAO, NREL, others?
  3. Is only the PV performance checked, or are all input values (temperature, wind speed, humidity) checked as well?

Maybe the reason for my question is that I cannot see the output due to #20.

Change proposed for send_email

I'm not sure if anyone is using the pecos function 'send_email' to send reports, but I plan to update the function to use more flexible python packages (smtplib and email). This will make the function compatible with a wider range of email servers and operating systems. The updated function will no longer support the Outlook option as it is written, let me know if that's a problem.

replace mutables in function default arguments

A handful of pecos functions/methods use mutable default arguments. This can lead to unexpected behavior, though I don't think the way pecos currently uses them causes a problem. Just something I noticed while trying to become more familiar with the library.

http://docs.python-guide.org/en/latest/writing/gotchas/#mutable-default-arguments

check_range, check_increment, evaluate_string, write_monitoring_report, plot_colorblock

Another reason to consider changing them is to set a good example for less experienced people.
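The classic illustration of the gotcha (generic Python, not pecos code):

```python
def append_item(item, items=[]):      # the default list is created once, at def time
    items.append(item)
    return items

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  <- state leaks between calls

def append_item_safe(item, items=None):  # the usual fix: sentinel default
    if items is None:
        items = []
    items.append(item)
    return items

print(append_item_safe(1))  # [1]
print(append_item_safe(2))  # [2]
```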

`import pecos` fails due to ModuleNotFoundError: No module named 'pytest'

Trying to import pecos fails after pip install pecos with traceback:

>>> import pecos
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\anaconda3\envs\pecos_pip_install_test\Lib\site-packages\pecos\__init__.py", line 3, in <module>
    from pecos import io
  File "C:\ProgramData\anaconda3\envs\pecos_pip_install_test\Lib\site-packages\pecos\io.py", line 10, in <module>
    import pecos.graphics
  File "C:\ProgramData\anaconda3\envs\pecos_pip_install_test\Lib\site-packages\pecos\graphics.py", line 19, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'

This line in pecos/pecos/graphics.py seems to be the source of the issue.

I have created a pull request to fix this issue.

Key not applied in check_custom_static

Custom checks are applied to the entire dataset despite a key being specified. If I replace the following section of the check_custom_static function:

    # Function that operates on the entire dataset and returns a mask and
    # metadata for the entire dataset
    mask, metadata = quality_control_func(self.df) 

by this:

    # Function that operates on the data for the specified key and returns
    # a mask and metadata for that subset
    df = self._setup_data(key)
    mask, metadata = quality_control_func(df)

it seems to be working.

compatibility with pvlib 0.4

I see that pecos.pv.basic_pvlib_performance_model uses older function signatures for pvlib's sapm and singlediode. It would be straightforward to update these for pvlib's 0.4 api change, possibly with a version check to retain compatibility. However, it seems to me that pvlib 0.4's ModelChain already accomplishes what this function sets out to do and maybe the function can be deprecated? I ran into the issue when trying to run the pv_example.py script.

On a related note, I love the yaml files in the pecos examples and I would like to see pvlib gain the ability to make a Location and PVSystem from a yaml file. That might make it slightly easier to integrate the pvlib changes into the pv_example.py script.

Increment version

The version number (0.1.2) should be incremented, as the current code includes changes (the timezone work) that are not present in the 0.1.2 release.

functional approach fails

**Update: I noticed that the pip version is 0.1.7 while the stable version on GitHub and ReadTheDocs is 0.1.8.**

The functional approach as documented in the "Framework" of the documentation shows the following:

results = pecos.monitoring.check_range(data, [-3,3])

Here is my attempt at doing the exact same thing in multiple different ways:
** ATTEMPT 1 **

# Import required packages 
import numpy as np
import pandas as pd
import re
from pandas.util import testing as tm
import pecos

# Set Seed and generate a dataframe for testing 
tm.N, tm.K = 15, 3
np.random.seed(256)
ts_df = tm.makeTimeDataFrame(freq='D')

# Reset Index
ts_df = ts_df.reset_index()

# Call check_range as shown in the documentation 
check_range_test = pecos.monitoring.check_range(ts_df,[-1,1])

It generates the following exception:
module 'pecos.monitoring' has no attribute 'check_range'

** ATTEMPT 2 **

# Import required packages 
import numpy as np
import pandas as pd
import re
from pandas.util import testing as tm
from pecos import monitoring 

# Set Seed and generate a dataframe for testing 
tm.N, tm.K = 15, 3
np.random.seed(256)
ts_df = tm.makeTimeDataFrame(freq='D')

# Reset Index
ts_df = ts_df.reset_index()

# Call check_range as shown in the documentation 
check_range_test = monitoring.check_range(ts_df,[-1,1])

It generates the following exception:
module 'pecos.monitoring' has no attribute 'check_range'

Can you kindly point out what I am doing wrong? Additionally, do you have any further documentation on the functional approach beyond what is on ReadTheDocs?

TimeStamps on Console output

I would be interested in seeing a timestamp along with the console output while this runs.
Related to this: when I run the pecos program, there appears to be a longer delay in creating the custom.png images than the test_result.png images. Is this normal?


Using pandas 0.24 causes a bug

I used the original code simple_example.py and the original file simple.xlsx from the examples, in an environment with pandas 0.24. The system showed:
TypeError: Invalid comparison between dtype=float64 and Timedelta

I changed pandas to 0.23 and the problem was solved.

Posting this for people who have encountered the same error.

Improve efficiency for the 'check_delta' QC test

The 'check_delta' QC test checks the difference between the max and min within a rolling window. This test is not very efficient for large data sets (>100,000 points) because it uses df.rolling().apply() to find the locations of the min and max.
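One possible direction, under the assumption that only the delta value (not the min/max locations needed for metadata) is required: pandas's built-in rolling max/min are implemented in compiled code, so computing the delta without apply() is much faster.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.normal(size=10_000),
              index=pd.date_range('2020-01-01', periods=10_000, freq='s'))

# Vectorized delta (fast); the slow pattern it replaces would be
# s.rolling('3600s').apply(lambda w: w.max() - w.min())
delta = s.rolling('3600s').max() - s.rolling('3600s').min()
print(bool((delta >= 0).all()))  # True
```

Finding the timestamps where the min/max occur would still need a separate pass, so this only covers part of what the current implementation returns.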
