bluesky / bluesky-browser

Search and browse saved and live data from bluesky.

License: Other

Python 100.00%

bluesky-browser's Introduction

Bluesky Browser

This prototype has been archived. Development of the visualization components is proceeding in the new project [bluesky-mpl](https://github.com/bluesky/bluesky-mpl). Development of the search GUI is proceeding in the project [Xi-cam.gui](https://github.com/synchrotrons/Xi-cam.gui/pulls).

A library of Qt widgets for searching saved bluesky data and viewing document streams either live or from disk.

This is a prototype that may be fully rewritten, abandoned, or moved into other libraries.

Launching the demo

Create a custom conda environment.

conda create -n bluesky_browser python=3 \
    bluesky jsonschema matplotlib ophyd pyqt \
    pyzmq qtpy suitcase-jsonl tornado traitlets  \
    -c lightsource2-tag
conda activate bluesky_browser

Clone and install.

git clone https://github.com/NSLS-II/bluesky-browser
cd bluesky-browser
pip install -e .

Run the demo.

bluesky-browser --demo

The above generates example data in a temporary directory and launches a Qt application to browse that data. It supposes there are two catalogs of data, abc and xyz, which could be from two instruments or perhaps "raw" data and "processed" data from the same instrument. The catalogs may be searched by date range or any custom Mongo query. Clicking on a search result pulls up a new tab with a more detailed view. There are two viewing areas to facilitate comparing data. Right-click and drag a tab to move it between areas.
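As a rough illustration of the date-range search, the query can be written as a Mongo-style filter on the `time` key, matching the shape that appears in the demo's console output (`{'time': {'$gte': ..., '$lt': ...}}`). The helper name `date_range_query` is hypothetical and not part of bluesky-browser.

```python
from datetime import datetime, timezone


def date_range_query(since, until):
    """Build a Mongo-style query matching runs whose time is in [since, until)."""
    return {"time": {"$gte": since.timestamp(), "$lt": until.timestamp()}}


# Example dates are arbitrary; timezone-aware datetimes avoid local-time ambiguity.
since = datetime(2019, 4, 25, tzinfo=timezone.utc)
until = datetime(2019, 5, 9, tzinfo=timezone.utc)
query = date_range_query(since, until)
print(query)
```

A dictionary of this shape is the kind of text that could also be typed into the custom query box.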

To customize and extend this, generate a configuration file

bluesky-browser --generate-config

and edit it. The bluesky-browser will automatically discover and apply the configuration file if it is located in the current directory where bluesky-browser is run. (In the future we will add a proper search path with other standard locations.)

Intended Scope

Search saved data from any databroker Catalog (backed by MongoDB or JSONL or ....).
View and compare data from runs. Use "hints" as defaults to guide how to view a given run, and let the user adjust from there.
Perform basic plot manipulations, not rising to the level of a full data analysis GUI (e.g. no nonlinear curve-fitting) but enabling some interactive tuning to provide a useful view of the data.
View live data streaming in from the RunEngine (via some message bus).
Be extensible, providing for the possibility of views that are specific to a beamline or instrument.

Current Features

Search multiple Catalogs (e.g. multiple beamlines) for saved data and sort search results.
View selected search results in individual tabs or "over-plotted" in one tab.
View Header, baseline readings, and line plots from saved or streaming data.
"Over-plot" arbitrary groups of Runs, including saved data, streaming data, or a mix of both.

Roadmap

Get feature parity with Best-Effort Callback.
- Table
- Grid
- PeakStats
Add image stack viewer.
Enable the user to change what is plotted interactively. (The hints become just a default.)
Add a way to run just the viewer part against live data (from RE).
Add a "Summary" widget to the top of the Header tab.
Add integration with suitcase for file export, starting with CSV.
Add context menus (right click) as an alternative way to do overplotting, etc.
Support "progressive search", iteratively refining search results.

bluesky-browser's People

mrakitin cj-wright danielballan eliotgann whishei tacaswell jklynch dylanmcreynolds

bluesky-browser's Issues

Add context menus (right-click) on search results.

This should provide the same options that the buttons below provide: open in new tab, open in existing tab.

Best way to have callback span multiple starts

We may want a single callback to span multiple runs, for instance if we take a single shot per RE call but want to combine the shots in one scatter plot. Currently a callback clears/refreshes itself upon each start document.

Expected Behavior

Have a way to specify that the callback should not clear/refresh when a start document arrives.

Current Behavior

The callback refreshes on every start document.

Possible Solution

Have a dedicated clearing mechanism, which may or may not be called (potentially using a separate 'clear' document/callback message)

Context

Often we make many start documents even though we wish the data to be visualized together. E.g., a scan on a grid with many samples may get a start document per sample (so as to preserve the searchability of the sample data), but we want to visualize the max intensity per grid position as a whole.
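A minimal sketch of the proposed behaviour, assuming the usual `(name, doc)` callback signature: the callback keeps accumulating across start documents and resets only through an explicit `clear()` call. The class name `AccumulatingCallback` and the keys `motor` and `det` are illustrative, not an existing bluesky API.

```python
class AccumulatingCallback:
    """Collect event data across runs; reset only when clear() is called."""

    def __init__(self, x_key, y_key):
        self.x_key = x_key
        self.y_key = y_key
        self.points = []
        self.run_uids = []

    def __call__(self, name, doc):
        if name == "start":
            # Record the new run but keep the points from earlier runs.
            self.run_uids.append(doc["uid"])
        elif name == "event":
            data = doc["data"]
            self.points.append((data[self.x_key], data[self.y_key]))

    def clear(self):
        """Explicit reset, decoupled from the arrival of a start document."""
        self.points.clear()
        self.run_uids.clear()


cb = AccumulatingCallback("motor", "det")
for uid, x, y in [("run-1", 0.0, 1.5), ("run-2", 1.0, 2.5)]:
    cb("start", {"uid": uid})
    cb("event", {"data": {"motor": x, "det": y}})
    cb("stop", {"run_start": uid})
print(cb.points)
```

Whether `clear()` is triggered by a button, a dedicated document type, or a message on the bus is left open here, as in the proposal.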

Instead of hiding buttons (e.g. Open...) that don't apply, disable them.

A good suggestion by @jklynch

Error arising from unexpected integer-typed datum_ids

From SST via @mrakitin

Traceback (most recent call last):
  File "c:\users\greateyes\event-model\event_model\__init__.py", line 501, in event
    datum_doc = self._datum_cache[datum_id]
KeyError: 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\greateyes\intake-bluesky\intake_bluesky\core.py", line 463, in read_partition
    self.filler('event', event)
  File "c:\users\greateyes\event-model\event_model\__init__.py", line 599, in __call__
    return super().__call__(name, doc, validate)
  File "c:\users\greateyes\event-model\event_model\__init__.py", line 71, in __call__
    output_doc = getattr(self, name)(doc)
  File "c:\users\greateyes\event-model\event_model\__init__.py", line 507, in event
    raise err_with_key from err
event_model.UnresolvableForeignKeyError: Event with uid fb5bfc8e-597d-4e23-8620-1650a3fb348a refers to unknown Datum datum_id 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\greateyes\bluesky-browser\bluesky_browser\viewer\viewer.py", line 263, in run
    for name, doc in self.entry().read_canonical():
  File "c:\users\greateyes\intake-bluesky\intake_bluesky\core.py", line 434, in read_canonical
    for name, doc in self.read_partition(i):
  File "c:\users\greateyes\intake-bluesky\intake_bluesky\core.py", line 467, in read_partition
    if '/' in datum_id:
TypeError: argument of type 'int' is not iterable
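The final TypeError comes from applying `'/' in datum_id` to an integer. A type check before the membership test would at least avoid that secondary failure. The helper below is a hypothetical sketch, not the fix adopted in intake-bluesky, and it does not address the underlying unresolved datum.

```python
def is_path_like_datum_id(datum_id):
    """Return True only for string datum_ids that contain a '/'."""
    return isinstance(datum_id, str) and "/" in datum_id


print(is_path_like_datum_id(1))
print(is_path_like_datum_id("resource-uid/0"))
```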

Avoid redundant calls to entry.get() when sending to Viewer.

Per some debug printing with @jklynch, we saw that when an entry is sent to the Viewer, its BlueskyRun is instantiated ~4 times. Reducing that to 1 should speed things up.
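One way to avoid the repeated instantiation is to cache the result of the first `get()` call. The sketch below uses stand-in classes (`CountingEntry`, `CachedEntry`) that are purely illustrative; the real entry objects come from intake and may already offer their own caching.

```python
class CountingEntry:
    """Stand-in for a catalog entry; counts how often get() is called."""

    def __init__(self):
        self.calls = 0

    def get(self):
        self.calls += 1
        return {"run": "expensive object"}


class CachedEntry:
    """Wrap an entry so the underlying get() runs at most once."""

    def __init__(self, entry):
        self._entry = entry
        self._run = None

    def get(self):
        if self._run is None:
            self._run = self._entry.get()
        return self._run


raw_entry = CountingEntry()
cached = CachedEntry(raw_entry)
first = cached.get()
second = cached.get()
print(raw_entry.calls)
```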

Demo opens a ton of headers at once?

The demo keeps opening new header tabs and focusing on them, which is a little annoying as I was trying to look at one header and it keeps moving.

Interact with RemoteDispatcher and Publisher?

Would it be possible to interact with the message buses? That way data could be sent onto the bus for analysis and received back from the bus for plotting. This could also support live data?

Make header table columns configurable

It might be nice to have a configurable header table, different beamlines might need different pieces of metadata to find interesting data.

Numerical columns should sort by value not str(value)

Specifically, the Transient Scan ID sorts 1, 10, 2, 3, etc.

One potential solution: https://wiki.python.org/moin/PyQt/Sorting%20numbers%20in%20columns
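The symptom can be reproduced outside Qt. The sketch below shows string ordering versus an ordering that compares numeric cells by value; `numeric_sort_key` is a hypothetical helper, and in a Qt table the same idea would be applied inside the item or proxy model's comparison.

```python
def numeric_sort_key(value):
    """Sort numbers by value; fall back to string comparison for other cells."""
    try:
        return (0, float(value), "")
    except (TypeError, ValueError):
        return (1, 0.0, str(value))


scan_ids = ["1", "10", "2", "3"]
by_string = sorted(scan_ids)
by_value = sorted(scan_ids, key=numeric_sort_key)
print(by_string)
print(by_value)
```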

BUG: Only functions defined at module global level are safe to pickle.

I'm opening this issue about a bug I have just fixed in case it is useful for later reference. A user tried to test bluesky-browser --demo on Windows and reported a traceback that included this line:

AttributeError: Can't pickle local object 'stream_example_data.<locals>.run_proxy'

See https://stackoverflow.com/a/36995008/1221924 for some explanation of what was wrong. Not sure why this works on Linux and not Windows, but the fix in 418aabe is a clear improvement in any case.
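A small reproduction of the pickling limitation follows. One plausible explanation for the platform difference is that multiprocessing starts child processes with the spawn method on Windows, which pickles the process target, while the fork method that was the Linux default at the time does not. The helper `can_pickle` is illustrative only.

```python
import pickle


def make_local_function():
    def run_proxy():  # defined inside another function, so it is a "local object"
        return "proxy running"
    return run_proxy


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        return False
    return True


print(can_pickle(make_local_function()))  # function nested in another function
print(can_pickle(len))                    # function importable by its name
```

Functions are pickled by reference to their importable name, which is why moving the function to module level resolves the error.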

Save buttons on plots

It might be helpful to save out plots as files for report writing.

Issue running the demo with latest releases

The demo doesn't run unless, after pip installing bluesky-browser, you also run pip install bluesky --upgrade and pip install event-model --upgrade. We should resolve this.

Suggestions on the DEMO so far

Opening an issue to mention a few points:

Overall I like the interface, and the layout. It is pretty intuitive and would require almost no 'teaching' of the users (the sign of a good UI !-) ). I list a few areas that I like and/or think could be improved.

I like the idea in the roadmap of making the search input boxes and search result row columns configurable, as different things are more/less important for different techniques.
I assume the 'matplotlib figures' will be the output of our 'plotting factories', if not we should consider that.
I think the 'custom query' part needs work. Expecting users to know (or learn) the Mongo query language, which is not intuitive for people who don't have a coding background, will cause issues and lead to significant complaints from users.
- My suggestion here is to follow the lead of a lot of retail websites, where queries are entered one at a time with subsequent queries being applied on top of the previous one. Of course a 'clear queries' button will also be needed, as will a list of currently applied queries.
- With this approach you could have simple search inputs like plan_name = scan as the input which is a far more intuitive solution.
I am not convinced that opening a new 'tab' every time you click on a search result should be the default behaviour. I think this will quickly lead to a huge number of tabs being created.
- I see that it is useful to be able to have more than one tab open, to compare different runs, but think a better solution is a checkbox that says 'keep tab open' which prevents replacing the contents.
I would also add to the default tab factory a 'description' tab which includes a short (3-4 line) description built-up from metadata that provides a bit more info than the search results row but a little less than the detail of the header and baseline. something like below:
- making this configurable would also be a great extension.
- I suggest this as I have regularly had feedback that the Header has too much information, so something that shows the important metadata in an at a glance structure would be helpful, and should probably be the tab that is on top automatically.

Scan_ID: XXX, Uid: XXXXXX, Date: **********************
plan type:  'count'
detectors; [detector list]

An 'export to file' button (like the 'Copy UID to clipboard' button) would also be useful IMO, but accept that this would also have to be somewhat configurable (what type of file.....).

As this is an early prototype I am not listing all of the minor interface issues I have seen, but just FYI there are issues with column sizing etc. which we will need to address before going 'live' with this, to avoid those issues overshadowing the good work here.
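The stacked-query suggestion above (simple inputs such as plan_name = scan, applied one on top of another, with a 'clear queries' button) could be sketched as a list of simple filters combined with Mongo's `$and` operator. The class `QueryStack` and the metadata key `sample` are hypothetical.

```python
class QueryStack:
    """Hold simple key/value filters and combine them into one Mongo-style query."""

    def __init__(self):
        self.filters = []

    def add(self, key, value):
        """Apply one more filter on top of the existing ones."""
        self.filters.append({key: value})

    def clear(self):
        """Equivalent of a 'clear queries' button."""
        self.filters.clear()

    def combined(self):
        """Return a single query dict representing all applied filters."""
        if not self.filters:
            return {}
        if len(self.filters) == 1:
            return dict(self.filters[0])
        return {"$and": [dict(f) for f in self.filters]}


stack = QueryStack()
stack.add("plan_name", "scan")
stack.add("sample", "Cu")  # 'sample' is an assumed metadata key
combined_query = stack.combined()
print(combined_query)
```

The `filters` list doubles as the list of currently applied queries that the GUI would display.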

Header tree doesn't support arrow keys

When I click on the start document I was hopeful that I could then use the right arrow key to expand it and the down arrow key to move among records.

Add integration with suitcase for file export, starting with CSV.

User selects search results and clicks Export. This opens up a dialog box.
Dialog box contains tabs (maybe tabs along the side?), one tab per format.
Each format can be activated on/off and configured (directory, file_prefix, custom options).
Click Go.

We'll want on-disk persistence of these settings so that they retain their previous state from session to session. We can reuse traitlets to do this:

In [35]: class C(traitlets.HasTraits):
    ...:     a = traitlets.Int(6)
    ...:     b = traitlets.Unicode('hello')
    ...:     def serialize(self):
    ...:         return {name: getattr(self, name) for name in self.trait_names()}
    ...:     def deserialize(self, d):
    ...:         for key, value in d.items():
    ...:             setattr(self, key, value)
    ...:     def __repr__(self):
    ...:         return f'<a={self.a} b={self.b}>'
    ...:     

In [36]: c = C()

In [37]: c
Out[37]: <a=6 b=hello>

In [38]: c.a = 100

In [39]: d = c.serialize()

In [40]: d
Out[40]: {'a': 100, 'b': 'hello'}

In [41]: c.b = 'goodbye'

In [42]: c
Out[42]: <a=100 b=goodbye>

In [43]: c.deserialize(d)

In [44]: c
Out[44]: <a=100 b=hello>

Edit: The idea is that we could write d to disk as workspace.json, a snapshot of how that user left the workspace. This is distinct from configuration.
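A minimal sketch of that persistence step, using only the standard library: the dictionary returned by `serialize()` is written to workspace.json and merged over defaults when read back. The function names `save_workspace` and `load_workspace` are hypothetical, and a temporary directory stands in for wherever the file would actually live.

```python
import json
import tempfile
from pathlib import Path


def save_workspace(state, path):
    """Write the serialized widget state to disk as JSON."""
    Path(path).write_text(json.dumps(state, indent=2))


def load_workspace(path, defaults):
    """Return saved state merged over defaults; defaults alone if no file exists."""
    path = Path(path)
    restored = dict(defaults)
    if path.exists():
        restored.update(json.loads(path.read_text()))
    return restored


with tempfile.TemporaryDirectory() as tmp:
    workspace_file = Path(tmp) / "workspace.json"
    save_workspace({"a": 100, "b": "hello"}, workspace_file)
    restored = load_workspace(workspace_file, defaults={"a": 6, "b": "hello"})
print(restored)
```

The restored dictionary could then be passed to `deserialize()` from the session above.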

First point: I appear to have found a bug. After running the browser I can repeatably cause the window to crash (close) with the following steps:

rescaling to see all of the columns
double clicking on scan_id 5
selecting catalog 'xyz'
clicking on scan_id 1
see the output of the command line below for this entire process

A few parts I really like.

The 'open' and 'Add to tab' buttons that appear when a scan is selected in the search results.
The way that the different 'plots' appear as different sub tabs (which keeps things neat)
The 'open overplotted' button, and the layout of the different .
The way in an 'overplotted' tab that the different added runs are all shown as different windows in the 'Header' and 'Baseline' Tabs.

I know this will receive some push back, but I think it is a very important point to note. Labelling tabs and/or figures in a plot based on 'Unique ID' will be wildly unpopular on the floor.

This is because the unique id, although guaranteed to be unique, is not an easily remembered/understood number. I am not aware of anyone currently using bluesky who references runs by UID; they all use the more human-readable scan_id.
As we are now talking of allowing different catalogs to be used together, which makes the scan_id even less unique, I recommend labelling the tabs and plots using a combination of catalog name and scan_id (or date and scan_id) instead of UID, for tabs and 'scan_ID[UID]' for plots.
On a second note here, I am not a great fan of the word 'Transient' in front of 'Scan ID' in the list of search results, but am not willing to fall on my sword on that one !-).

I think another few bugs occur for the over-plotting tabs.

The baseline tab does not automatically update with new entries like the 'Header' tab.
New plot tabs are not created when a subsequent scan has a different 'detector'.
Trying to search with a query while an overplotted tab is open, or even once the 'overplotting' tab is available crashes the GUI.

alwalter@awalter01:~/NSLS-II/bluesky-browser$ bluesky-browser --demo
/home/alwalter/miniconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Serializing example data into directory /tmp/tmpqf4dbhct
Demo Proxy is listening on port 40423 and publishing to 37747.
Demo acquisition has started.
Query {'time': {'$gte': 1556192475, '$lt': 1557402075}} -> 6 results
New streaming Run: uid='b20bbaf7-20b2-40f9-91b4-5ca83cf82a50'
Streaming Run ignored because Streaming is disabled.
Query {'time': {'$gte': 1556192475, '$lt': 1557402075}} -> 1 results
dimensions: [[['time'], 'primary']]
plot det against time
Reloaded search results
Query {'time': {'$gte': 1556192475, '$lt': 1557402075}} -> 7 results
New streaming Run: uid='67050616-305b-4037-bb33-a43d2a2eec1f'
Streaming Run ignored because Streaming is disabled.
dimensions: [[['motor'], 'primary']]
plot det against motor
Query {'time': {'$gte': 1556192475, '$lt': 1557402075}} -> 1 results
Reloaded search results
Traceback (most recent call last):
File "/home/alwalter/NSLS-II/bluesky-browser/bluesky_browser/search.py", line 181, in emit_selected_result
for row in sorted(self.selected_rows)])
File "/home/alwalter/NSLS-II/bluesky-browser/bluesky_browser/search.py", line 181, in
for row in sorted(self.selected_rows)])
IndexError: list index out of range
Aborted
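The IndexError suggests that row indexes selected before a reload or catalog switch may outlive the result list they pointed into. That is an assumption drawn from the traceback, not a confirmed diagnosis. Under that assumption, a defensive filter such as the hypothetical helper below would drop stale indexes before they are used.

```python
def valid_selected_rows(selected_rows, n_results):
    """Keep only selected row indexes that still exist in the current results."""
    return [row for row in sorted(selected_rows) if 0 <= row < n_results]


# Row 5 was selected while 7 results were shown; the new result list has 1 entry.
still_valid = valid_selected_rows({5, 0}, n_results=1)
print(still_valid)
```

Clearing the selection whenever the results are replaced would be an alternative to filtering.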

crashed the application

Followed installation guidelines (revised per PR #49) on RHEL7 x86_64.

Application started and presented runs from some (demo?) databroker. Good.
Lots of console traceback output as I moved mouse over the window.
Double clicked on ScanID 1
Tab opened in right-side panel: shows Header, Baseline, ...
Visual inspection of document content (in the different tabs) looks fine.
Crashed the application when I clicked the [x] box on the run's tab in right side panel.

This report on the console:

Traceback (most recent call last):
  File "/home/oxygen18/JEMIAN/sandbox/bluesky-browser/bluesky_browser/viewer/viewer.py", line 207, in close_tab
    self.parent().close_run_viewer(widget)
AttributeError: 'QSplitter' object has no attribute 'close_run_viewer'
Aborted

Very reproducible.
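The AttributeError indicates that `self.parent()` returned a QSplitter. Qt reparents a widget when it is added to a splitter, so the direct parent is not necessarily the object that defines `close_run_viewer`. One defensive option is to walk up the parent chain until an ancestor with that method is found. The classes below are plain-Python stand-ins for the Qt objects and are purely illustrative.

```python
class Node:
    """Stand-in for a QObject: only tracks its parent."""

    def __init__(self, parent=None):
        self._parent = parent

    def parent(self):
        return self._parent


class OuterViewer(Node):
    """Stand-in for the widget that actually owns close_run_viewer."""

    def __init__(self):
        super().__init__()
        self.closed = []

    def close_run_viewer(self, widget):
        self.closed.append(widget)


def find_ancestor_with(obj, method_name):
    """Return the nearest ancestor that has the named attribute, or None."""
    current = obj.parent()
    while current is not None:
        if hasattr(current, method_name):
            return current
        current = current.parent()
    return None


outer = OuterViewer()
splitter = Node(parent=outer)      # plays the role of the QSplitter
tab_area = Node(parent=splitter)   # plays the role of the tab container
owner = find_ancestor_with(tab_area, "close_run_viewer")
owner.close_run_viewer("run tab")
print(outer.closed)
```

Holding an explicit reference to the owner, or emitting a Qt signal on close, would avoid depending on the widget hierarchy at all.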

Issues at SMI

(/opt/conda_envs/collection-2019-3.0.1-smi) xf12id@xf12id-ws1:~$ bluesky-browser smi-catalog.yml 
/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/dask/config.py:168: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}
/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  defaults = yaml.load(f)
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-xf12id'
Traceback (most recent call last):
  File "/opt/conda_envs/collection-2019-3.0.1-smi/bin/bluesky-browser", line 12, in <module>
    sys.exit(main())
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/bluesky_browser/main.py", line 128, in main
    app = build_app(args.catalog, zmq_address=args.zmq_address)
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/bluesky_browser/main.py", line 141, in build_app
    menuBar=app.main_window.menuBar)
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/bluesky_browser/main.py", line 36, in __init__
    catalog=catalog)
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/bluesky_browser/search.py", line 87, in __init__
    self.set_selected_catalog(0)
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/bluesky_browser/search.py", line 170, in set_selected_catalog
    self.selected_catalog = self.catalog[name]()
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/intake/catalog/entry.py", line 78, in __call__
    s = self.get(**kwargs)
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/intake/catalog/local.py", line 275, in get
    plugin, open_args = self._create_open_args(user_parameters)
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/intake/catalog/local.py", line 256, in _create_open_args
    % self._driver)
ValueError: No plugins loaded for this entry: bluesky-mongo-normalized-catalog
A listing of installable plugins can be found at https://intake.readthedocs.io/en/latest/plugin-directory.html .
Exception ignored in: <function SearchState.__del__ at 0x7fd80f55a2f0>
Traceback (most recent call last):
  File "/opt/conda_envs/collection-2019-3.0.1-smi/lib/python3.7/site-packages/bluesky_browser/search.py", line 158, in __del__
AttributeError: 'SearchState' object has no attribute 'reload_thread'
(/opt/conda_envs/collection-2019-3.0.1-smi) xf12id@xf12id-ws1:~$ cat smi-catalog.yml 
sources:
  xyz:
    description: SMI catalog
    driver: bluesky-mongo-normalized-catalog
    container: catalog
    args:
      metadatastore_db: mongodb://xf12id-ca1:27017/datastore
      asset_registry_db: mongodb://xf12id-ca1:27017/filestore
    metadata:
      beamline: SMI

Version:

bluesky-browser           0.1.0a2                  py37_0    nsls-ii-2019C3.0.1
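The secondary "Exception ignored" message appears to be a consequence of `__init__` failing before `reload_thread` was assigned, so `__del__` then touches an attribute that does not exist. A defensive `__del__` can be sketched as below. `SearchStateSketch` is a hypothetical stand-in, and this does not address the primary ValueError, which reports that no plugin is loaded for the driver named in the catalog file.

```python
class SearchStateSketch:
    """Stand-in showing a __del__ that tolerates a partially built object."""

    def __init__(self, fail=False):
        if fail:
            # Mimic a failure that happens before reload_thread is assigned.
            raise ValueError("No plugins loaded for this entry")
        self.reload_thread = object()
        self.stopped = False

    def stop(self):
        self.stopped = True

    def __del__(self):
        # getattr with a default avoids AttributeError on half-initialized objects.
        if getattr(self, "reload_thread", None) is not None:
            self.stop()


try:
    SearchStateSketch(fail=True)
except ValueError as err:
    failure = str(err)
print(failure)
```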