Optional High Level charts API built on top of Bokeh
bokeh / bkcharts
License: BSD 3-Clause "New" or "Revised" License
bkcharts is currently unmaintained. For actively developed, supported, very high-level charting on top of Bokeh, see the HoloViews project: http://holoviews.org/
Using bokeh 0.10.0
For data streaming, the key is to update the data source, and the plot then updates with the latest data. But how can the same thing be achieved for a chart (e.g. a bar chart)?
Is there any attribute on charts through which the data (data frame or x/y) can be updated?
Kindly advise.
Color values do not seem to be mapping as expected to the given palette on bokeh (0.11.1) with Python 3. I would expect lowest (blue) values to be displayed under the normally distributed data.
import numpy as np
import seaborn as sns
from bokeh.plotting import show, output_file
from bokeh.charts import HeatMap
sz = 1000
gamma = list(np.random.gamma(1, 1, size=sz))
normal = list(np.random.normal(size=sz))
cols = gamma + normal
x = ['gamma']*sz + ['normal']*sz
y = list(range(sz)) + list(range(sz))
data = {'x': x, 'y': y, 'color': cols}
palette = sns.diverging_palette(220, 20, n=9).as_hex()
heatmap = HeatMap(data, x='x', y='y', values='color', stat=None, palette=palette)
output_file('tmp.html')
show(heatmap)
sns.palplot(palette)
sns.plt.show()
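For reference, the expectation in the HeatMap report above (lowest values get the first palette color) corresponds to a plain linear mapping. The sketch below is not how HeatMap is implemented; `map_to_palette` is a hypothetical helper and the three-color palette is made up for illustration.

```python
def map_to_palette(values, palette):
    """Map each value linearly onto a palette: lowest value -> first color."""
    lo, hi = min(values), max(values)
    span = hi - lo
    last = len(palette) - 1
    colors = []
    for v in values:
        # Position of v within [0, 1]; a constant column maps to the first color.
        frac = (v - lo) / span if span else 0.0
        # Pick the nearest palette entry for that position.
        colors.append(palette[int(round(frac * last))])
    return colors


# Illustrative blue -> white -> red palette (an assumption, not the seaborn one).
palette = ["#0000ff", "#ffffff", "#ff0000"]
colors = map_to_palette([-2.0, 0.0, 2.0, 1.9], palette)
```

Comparing the colors a chart actually draws against a mapping like this is one way to narrow down where the palette order gets lost.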
I have a Pandas Series.
PC                  999
PlayStation 3       815
Xbox 360            789
iPhone              505
Wii                 301
PlayStation 4       277
Nintendo 3DS        225
Xbox One            208
Nintendo DSi        200
PlayStation Vita    155
other               540
dtype: int64
I pass this Series to Donut
from bokeh.charts import Donut
from bokeh.io import output_file, show
from bokeh.layouts import gridplot
# series is a variable that I created
chart = Donut(series)
output_file("chart.html")
grid = gridplot([
[chart]
])
show(grid)
I got this chart:
The order is wrong.
Shouldn't the order of the values be the same as their order in the Series?
I'm running Bokeh 0.11.0 on Mac OS X 10.10.4 with Python 3.5. I have the following code in a Jupyter Notebook:
from bokeh.charts import Bar, show, output_notebook
import pandas as pd
output_notebook()
data_dict = { 'numstudents' : [43, 22, 1,
2, 54, 9,
18, 10, 5,
14, 12, 3,
15, 11, 1,
14, 8, 2],
'language' : ['Matlab','Matlab','Matlab',
'C/C++','C/C++','C/C++',
'Java','Java','Java',
'HTML/CSS','HTML/CSS','HTML/CSS',
'Python','Python','Python',
'Javascript','Javascript','Javascript'],
'skill_level' : ['Beginner','Intermediate','Expert',
'Beginner','Intermediate','Expert',
'Beginner','Intermediate','Expert',
'Beginner','Intermediate','Expert',
'Beginner','Intermediate','Expert',
'Beginner','Intermediate','Expert']
}
data_df = pd.DataFrame(data_dict)
p = Bar(data_df,
values='numstudents',
label='language',
stack='skill_level',
legend='top_right',
title="ECEn 360 Student Self-Reported Programming Skills",
tooltips=[('Students:', '@numstudents'), ('Language:', '@language')]
)
show(p)
The plot builds and displays just fine. The problem I am having is with the tooltip information as illustrated in the upper left of the plot image below. Note that to the right of "Students:" is "???" rather than the number of students at that skill level for that language (in this case it should be the number 9). My understanding from the documentation is that '@numstudents' should refer to the column 'numstudents', but the tooltip doesn't pick up the value from the column as it should, and as it correctly does in the case of the 'language' column.
Would be nice to be able to create charts using DataSource like this:
from bokeh.models.sources import AjaxDataSource
from bokeh.charts import Line
source = AjaxDataSource(data_url='http://localhost:5050/data')
Line(source, index='x', title="Lines", ylabel='y_label')
The documentation for chart defaults seems to be missing.
http://bokeh.pydata.org/en/latest/docs/reference/charts.html#module-bokeh.charts points to http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html#userguide-charts-defaults which points back to the first link to get a list of the Chart Defaults. I can't seem to actually find the list on either of those links, however.
Right now the charts assume a dataframe (e.g. charts.Bar), but wanting to plot a pandas.Series is also very common. Not sure how much the builders would have to change to support this... It seems like they should just be able to pull the index and values off the Series, instead of looking for them in the args passed to the charting function.
See this notebook: https://notebooks.anaconda.org/birdsarah/scatter-chart-select
The selection is not selecting the items I expect it to inside the box. Other items are remaining selected while some are deselected.
Here is the plot with nothing selected:
With selection box in top left:
And selection box in bottom right:
To my eye, some dots outside the selection box are selected.
When I click on a rectangle I get the message "pressed" 3 times!
I was expecting to see it only once.
Does anyone have an explanation for this behavior?
from bokeh.charts import Bar, show
from bokeh.models import TapTool, CustomJS
data = {'ID': [0, 1, 2], 'Total': [5.0, 20.0, 1.0]}
p = Bar(data, label='ID', values='Total', width=400, tools='tap')
tap = p.select(dict(type=TapTool))
tap.callback = CustomJS(code='''alert("pressed");''')
show(p)
Test case:
df = {
'field1': {
0: u'Dreili\u0146i',
1: u'Zasulauks',
2: u'Jaunciems',
3: u'Vec\u0101\u0137i',
4: u'Zolit\u016bde'
},
'field2': {
0: 1393.4975669099756,
1: 940.10734463276833,
2: 616.83713611329665,
3: 1674.2025627044709,
4: 1068.1402252322382
}}
p = Bar(df, 'field1', values='field2')
Produces exception:
Traceback (most recent call last):
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1836, in __call__
return self.wsgi_app(environ, start_response)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1820, in wsgi_app
response = self.make_response(self.handle_exception(e))
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1403, in handle_exception
reraise(exc_type, exc_value, tb)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
response = self.full_dispatch_request()
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/flask_debugtoolbar/__init__.py", line 125, in dispatch_request
return view_func(**req.view_args)
File "/Users/yuri/work/ssguru/web/public/flats.py", line 41, in bar_price_mean_by_district
ylabel="Mean price")
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/builders/bar_builder.py", line 311, in Bar
chart = create_and_build(BarBuilder, data, **kw)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/builder.py", line 67, in create_and_build
chart.add_builder(builder)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/chart.py", line 149, in add_builder
builder.create(self)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/builder.py", line 518, in create
chart.add_renderers(self, renderers)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/chart.py", line 144, in add_renderers
self.renderers += renderers
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/core/property_containers.py", line 18, in wrapper
result = func(*args, **kwargs)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/core/property_containers.py", line 77, in __iadd__
return super(PropertyValueList, self).__iadd__(y)
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/builders/bar_builder.py", line 204, in yield_renderers
x_label=self._get_label(group['label']),
File "/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/builder.py", line 487, in _get_label
return str(raw_label)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
Other high-level charts are possibly affected.
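The traceback above ends in `str(raw_label)`, which on Python 2 forces an ASCII encode of unicode labels. A minimal sketch of a label helper that avoids this is shown below. It is an illustration under assumptions, not the fix that was applied in Bokeh: the standalone `get_label` function and its single-item unwrapping rule are hypothetical.

```python
import sys


def get_label(raw_label):
    """Return a text label without forcing an ASCII encode."""
    # Assumption: group labels may arrive as single-item tuples or lists.
    if isinstance(raw_label, (tuple, list)) and len(raw_label) == 1:
        raw_label = raw_label[0]
    # The text type is `unicode` on Python 2 and `str` on Python 3.
    text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821
    if isinstance(raw_label, text_type):
        return raw_label
    if isinstance(raw_label, bytes):
        # Assumption: byte strings are UTF-8 encoded.
        return raw_label.decode("utf-8")
    return text_type(raw_label)


label = get_label(u'Dreili\u0146i')
```

The point of the design is that text is passed through untouched and only non-text values are converted, so no implicit codec is ever involved.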
These examples should be equivalent:
from collections import OrderedDict
from bokeh.charts import Area, show, output_file, defaults, vplot
import pandas as pd
defaults.width = 400
defaults.height = 400
# create some example data
data = OrderedDict(
samples=[x + 2 for x in range(14)],
python=[2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111],
pypy=[12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130],
jython=[22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160],
)
output_file(filename="area.html")
area1 = Area(
data, title="Area Chart",
x='samples', y=['python', 'pypy', 'jython'], ylabel='memory', legend="top_left"
)
df = pd.DataFrame(data)
df.set_index('samples', inplace=True)
area2 = Area(
df, title="Area Chart",
x='index', y=['python', 'pypy', 'jython'], ylabel='memory', legend="top_left"
)
show(vplot(area1, area2))
Currently, charts produces a synthetic index even if one already exists. Discovered this while answering #2918.
Current output is this:
An example illustrates this:
from collections import OrderedDict
from bokeh.charts import Scatter
from bokeh.plotting import show, output_file
output_file('demo.html')
data = OrderedDict([
('a',[[1,1]]),
('b',[[2,2], [3,3]]),
('c',[[4,4], [5,5], [6,6]])
])
scatter = Scatter(data,xlabel='x',ylabel='y',legend='top_left')
show(scatter)
The resulting scatter plot will only show the first point of b and c. Hence the plot shows three points, (1,1), (2,2) and (4,4), instead of six points, as one might expect:
The same thing occurs when passing a pandas DataFrame().groupby() object as input. I can work around it easily enough by adding NaN entries at the end of the shorter vectors, but it would be nice if it just worked.
I'm running bokeh 0.9.1.
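The NaN-padding workaround mentioned above can be sketched in plain Python as follows. `pad_columns` is a hypothetical helper, and using a pair of NaNs as the fill value for the pair-shaped data is an assumption; whether Scatter accepts the padded result is not verified here.

```python
from collections import OrderedDict


def pad_columns(data, fill):
    """Return a copy of `data` with every column padded to the longest length."""
    longest = max(len(col) for col in data.values())
    return OrderedDict(
        (name, list(col) + [fill] * (longest - len(col)))
        for name, col in data.items()
    )


nan = float("nan")
data = OrderedDict([
    ('a', [[1, 1]]),
    ('b', [[2, 2], [3, 3]]),
    ('c', [[4, 4], [5, 5], [6, 6]]),
])
# Assumption: a missing point is represented by a pair of NaNs.
padded = pad_columns(data, fill=[nan, nan])
```

After padding, every column has three entries, so no real point is dropped just because another column is shorter.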
Expected behavior:
When I use a TimeSeries chart with a custom x axis and pass a unicode string to the 'x', 'ylabel' or 'title' parameters, the x scale in the chart is correct.
Observed behavior:
When I use a TimeSeries chart with a custom x axis and pass a unicode string to the 'x', 'ylabel' or 'title' parameters, the x scale is shifted by one position.
from bokeh.charts import TimeSeries
from bokeh.charts import show
from bokeh.charts import output_file
from collections import OrderedDict
df = OrderedDict([(u'Years', ['2005-12-31', '2006-12-31', '2007-12-31', '2008-12-31', '2009-12-31', '2010-12-31', '2011-12-31', '2012-12-31', '2013-12-31', '2014-12-31']), (u'Italy', [0.3358, 0.3121, 0.3704, 0.4668, 0.3766, 0.3385, 0.3804, 0.5319, 0.7068, 0.5174]), (u'France', [0.6582, 0.617, 0.6183, 0.7542, 0.7463, 0.6677, 0.7465, 0.7718, 0.956, 0.906])])
y_labels = [u'Italy', u'France']
# Issue with the 'x', 'title' and 'y' parameters of TimeSeries:
# if I pass unicode strings, the x axis
# is shifted by one step.
# Python 2.7.10, Bokeh 0.11.1
# I have to convert to plain strings
# x_label = 'Years'
# y_label = 'Euro/Lt'
x_label = u'Years'
y_label = u'Euro/Lt'
line = TimeSeries(df, x=x_label, y=y_labels, ylabel=y_label, legend='top_left', dash=y_labels, color=y_labels)
output_file('ts.html')
show(line)
When I try to display a "flat" TimeSeries (i.e. all y values equal), I encounter the following bugs:
- the line is not displayed (except, apparently, if y=[1,1,1...])
- the y axis on the left side shows one tick with the value None instead of the numerical value
Tested in a clean virtualenv with bokeh 0.11.1 and 0.12.4.
Minimal example showing cases with y=0,0,0 (no line, None) / y=1,1,1 (line displayed, None) / y=1,2,1 (normal behavior):
from bokeh.charts import TimeSeries, show, output_file, vplot
from datetime import datetime as dt
tsline = TimeSeries(
{'x': [dt(2017,1,1), dt(2017,1,2), dt(2017,1,3)],
'y': [0.,0.,0.] },
x='x', y='y',
)
tsline2 = TimeSeries(
{'x': [dt(2017,1,1), dt(2017,1,2), dt(2017,1,3)],
'y': [1.,1.,1.] },
x='x', y='y',
)
tsline3 = TimeSeries(
{'x': [dt(2017,1,1), dt(2017,1,2), dt(2017,1,3)],
'y': [1.,2.,1.] },
x='x', y='y',
)
output_file("ts.html")
show(vplot(tsline, tsline2, tsline3))
System information:
debian sid, example tested with firefox/chromium
$ bokeh info
Python version : 2.7.13 (default, Dec 18 2016, 20:19:42)
IPython version : 5.1.0
Bokeh version : 0.12.4
BokehJS static path : ~/.virtualenvs/bokeh/local/lib/python2.7/site-packages/bokeh/server/static
node.js version : v6.9.4
npm version : 3.10.10
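One plausible explanation for the flat-series behavior above, not confirmed by the report, is a zero-width y range. Under that assumption, a range helper would need a special case for constant data. The sketch below is illustrative only; `padded_range` and its default values are hypothetical and are not taken from Bokeh.

```python
def padded_range(values, pad_fraction=0.25, min_pad=0.5):
    """Return (start, end) for an axis range that never has zero width."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Flat series: pad by a fraction of the magnitude,
        # falling back to min_pad when the value itself is zero.
        pad = abs(lo) * pad_fraction or min_pad
    else:
        pad = span * pad_fraction
    return lo - pad, hi + pad


flat_zero = padded_range([0.0, 0.0, 0.0])
flat_one = padded_range([1.0, 1.0, 1.0])
normal = padded_range([1.0, 2.0, 1.0])
```

Padding that is purely proportional to the value would collapse to zero for y=0,0,0 but not for y=1,1,1, which would match the difference between the first two cases in the report.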
http://bokeh.pydata.org/en/latest/docs/reference/charts.html#chart-options links to http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html#userguide-charts-defaults which seems to link back to the first link - it's actually a link to the page but it's not clear.
I've been updating my scripts to take advantage of the improved charts interface in 0.10 and can't figure out how to control the order of grouped bars. I've checked the documentation/tutorials but couldn't find any information on this, sorry if I've missed it. Can someone point me in the right direction please?
Here is an example of the kind of data that I'm working with, where I'm plotting numbers for consecutive years. As you can see from the plot, I can't get the grouped bars to be arranged in the same order as the years are listed in the legend.
from bokeh.charts import Bar, output_file, show
import pandas as pd
data = pd.DataFrame({'Group' :['A','A','A','A','B','B','B','B','C','C','C','C','D','D','D','D'],
'Year' :['2009','2010','2011','2012','2009','2010','2011','2012','2009','2010','2011','2012','2009','2010','2011','2012'],
'Height':[1,2,3,4,2,2,2,2,3,0,3,0,4,3,2,1]})
fig = Bar(data, label='Group', values='Height', group='Year', legend='top_right')
output_file('chartsBarTest.html')
show(fig)
Here's how to try:
In [31]: df = [{ "city" : u'R\u012bga', "district" : "centrs", "rooms" : 3, "area" : 110, "project" : u'R\u012bga' }]
In [32]: p = Bar(df,
....: label='district',
....: stack='project')
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/_attributes.py:78: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
df = df.sort(columns=columns)
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-32-01587b1931a7> in <module>()
1 p = Bar(df,
2 label='district',
----> 3 stack='project')
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/builder/bar_builder.pyc in Bar(data, label, values, color, stack, group, agg, xscale, yscale, xgrid, ygrid, continuous_range, **kw)
97 kw['y_range'] = y_range
98
---> 99 return create_and_build(BarBuilder, data, **kw)
100
101
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/_builder.pyc in create_and_build(builder_class, *data, **kws)
62 chart_kws = { k:v for k,v in kws.items() if k not in builder_props}
63 chart = Chart(**chart_kws)
---> 64 chart.add_builder(builder)
65 chart.start_plot()
66
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/_chart.pyc in add_builder(self, builder)
132 def add_builder(self, builder):
133 self._builders.append(builder)
--> 134 builder.create(self)
135
136 def add_ranges(self, dim, range):
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/_builder.pyc in create(self, chart)
301 if chart is None:
302 chart = Chart()
--> 303 chart.add_renderers(self, renderers)
304
305 # handle ranges after renders, since ranges depend on aggregations
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/_chart.pyc in add_renderers(self, builder, renderers)
127
128 def add_renderers(self, builder, renderers):
--> 129 self.renderers += renderers
130 self._renderer_map.extend({ r._id : builder for r in renderers })
131
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/builder/bar_builder.pyc in _yield_renderers(self)
224 color=group['color'],
225 fill_alpha=self.fill_alpha,
--> 226 stack_label=self.get_label(group['stack']),
227 dodge_label=self.get_label(group['group']),
228 **group_kwargs)
/Users/yuri/anaconda2/lib/python2.7/site-packages/bokeh/charts/_builder.pyc in get_label(raw_label)
292 raw_label = raw_label[0]
293
--> 294 return str(raw_label)
295
296 def create(self, chart=None):
UnicodeEncodeError: 'ascii' codec can't encode character u'\u012b' in position 1: ordinal not in range(128)
Similar to color, I'm looking for a hook for changing the size dimension on Scatter. One spelling that seems to fit in with the current API would be something like:
s = Scatter(df, marker=marker(columns=['my_column_to_size_on'], size=[3,6,9,12,15]))
I'm using Python 2.7 and Bokeh 0.12.4 on Ubuntu 14.04. I have a data frame like so:
         msrp  price
compact   1.0    1.0
sedan     2.0    3.0
suv       3.0    5.0
sport     4.0    7.0
made this way:
import pandas as pd
from bokeh.charts import Histogram, output_file, show
s = pd.Series([1,2,3,4], index=['compact', 'sedan', 'suv', 'sport'], dtype='float64')
s2 = pd.Series([1,3,5,7], index=['compact', 'sedan', 'suv', 'sport'], dtype='float64')
df = pd.DataFrame({'msrp': s, 'price': s2})
output_file('test.html')
p = Histogram(df['msrp'], title='Test')
show(p)
When I run this, I get the following error:
ValueError: expected an element of either Column(Float), Column(Int), Column(String), Column(Date), Column(Datetime) or Column(Bool), got 0 2
dtype: int64
This is puzzling because when I examine the msrp series, I get:
>>> df['msrp']
compact 1.0
sedan 2.0
suv 3.0
sport 4.0
Name: msrp, dtype: float64
Note that dtype reads as a float. What am I doing wrong? I should note that all other chart types work properly.
UPDATE
The example in the docs doesn't work either:
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df['hp'], title='Test')
Same error. Is this a known issue? If so, the docs should be updated...
In the current Chart Heatmap, the default colormap doesn't represent a scale of values:
Those colors feel as if they were set at random; they don't carry a scale or order that makes the density or value behind each of them obvious, as in the employment example:
I believe that the current chart.Heatmap might be confusing and not very useful to users. I'd suggest changing the palette to a scale-aware one.
In [236]: cat('asd', 'dfe')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-236-60732eba4df3> in <module>()
----> 1 cat('asd', 'dfe')
/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/attributes.pyc in cat(columns, cats, sort, ascending, **kwargs)
399 kwargs['ascending'] = ascending
400
--> 401 return CatAttr(**kwargs)
/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/attributes.pyc in __init__(self, **kwargs)
317
318 def __init__(self, **kwargs):
--> 319 super(CatAttr, self).__init__(**kwargs)
320
321 def _setup_iterable(self):
/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/charts/attributes.pyc in __init__(self, columns, df, iterable, default, items, **properties)
108 properties['items'] = items
109
--> 110 super(AttrSpec, self).__init__(**properties)
111
112 if self.default is None and self.iterable is not None:
/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/core/properties.pyc in __init__(self, **properties)
699
700 for name, value in properties.items():
--> 701 setattr(self, name, value)
702
703 def __setattr__(self, name, value):
/Users/yuri/anaconda2/envs/ssguru/lib/python2.7/site-packages/bokeh/core/properties.pyc in __setattr__(self, name, value)
720
721 raise AttributeError("unexpected attribute '%s' to %s, %s attributes are %s" %
--> 722 (name, self.__class__.__name__, text, nice_join(matches)))
723
724 def set_from_json(self, name, json, models=None):
AttributeError: unexpected attribute 'cats' to CatAttr, possible attributes are ascending, attr_map, attrname, bins, columns, data, default, items, iterable or sort
Hello,
Plotting a time series doesn't work "out of the box".
Here is what I'm doing in an IPython notebook:
import pandas_datareader.data as web
df = web.DataReader("GOOG", "yahoo")
ts = df['Close']
from bokeh.io import output_notebook
from bokeh.charts import show
from bokeh.charts import TimeSeries
output_notebook()
p = TimeSeries(ts)
show(p)
And what I get:
There are in fact 3 issues / enhancements:
- ts.name is 'Close', and since it is neither a blank string nor None, it should be used instead of 'value'
- ts.index.name is 'Date', and since it is neither a blank string nor None, it should be used instead of 'index'
- ts.index is:
DatetimeIndex(['2010-01-04', '2010-01-05', '2010-01-06', '2010-01-07',
'2010-01-08', '2010-01-11', '2010-01-12', '2010-01-13',
'2010-01-14', '2010-01-15',
...
'2016-06-20', '2016-06-21', '2016-06-22', '2016-06-23',
'2016-06-24', '2016-06-27', '2016-06-28', '2016-06-29',
'2016-06-30', '2016-07-01'],
dtype='datetime64[ns]', name='Date', length=1636, freq=None)
Kind regards
bokeh.version : '0.11.0'
pandas.version : '0.17.1'
When making a BoxPlot chart with categories generated by the Pandas qcut() function, the first category (data and label) is shifted to the last position on the x axis.
Such a plot can be seen by running the following code:
import numpy as np
import pandas as pd
signal = np.random.normal(0.5,0.1,900)
instr = np.random.poisson(0.5,100)
sample = np.concatenate((signal,instr),axis=0)
np.random.shuffle(sample)
df = pd.DataFrame({'sample':sample})
nbins = 10
df['quantil'] = pd.qcut(df['sample'],nbins)
from bokeh.io import show,output_file
from bokeh.charts import BoxPlot
output_file('boxplot.html')
p = BoxPlot(df,values='sample',label='quantil')
show(p)
I made a notebook available in case it helps: http://nbviewer.jupyter.org/github/chbrandt/pynotes/blob/master/issues/bokeh/boxplot_report.ipynb
Tnx
In Bokeh 0.11.0 the Charts API (e.g. HeatMap) always places the xaxis below and the yaxis on the left. See Chart.create_axes().
This should be configurable, so that the xaxis can be placed above and the yaxis on the right.
I was wondering if it would be possible to add the 'source' parameter to Bokeh.Charts objects. This would be helpful in connecting plot interactions to the .charts objects. Right now, I have to create a boxplot manually with the .plotting objects and the script is quite tedious. It would be nice if the .charts BoxPlot() class just let users update its source so interactivity can be done on that object.
When I run this code:
import pandas as pd
from bokeh.charts import Histogram
df = pd.DataFrame([[0, 'A'], [0, 'A'], [1, 'A'],
[0, 'B'], [0, 'B'], [1, 'B']],
columns=['level', 'unit'])
hist = Histogram(df, 'level', group='unit')
I get this error:
AttributeError: unexpected attribute 'width' to HistogramGlyph, similar attributes are bin_width
I would expect instead to get an actual graph.
My environment:
>>> bokeh.__base_version__
'0.11.0'
>>> bokeh.__version__
'0.11.0'
>>> bokeh.print_function
_Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)
I initially found this error when running in a jupyter notebook on Firefox 44.0.2 running on a MacBook Pro running OS X Yosemite 10.10.5, but I get the same error running plain python from the command prompt. It's Python 3.4.4 |Anaconda 2.5.0 (x86_64)| in either case.
Currently there are some initialization parameters you can pass when creating a Figure, such as background_fill_color and toolbar_location, which raise an error when passed while creating a chart, such as a Histogram. It would be easier for new users if the display-related parameters that are accepted when creating a Figure also worked when creating a Chart. This would lower the cognitive load when switching between Charts and lower-level plotting using Figures.
I made a Non-ribbon Chord Diagram a couple of days ago, and I want to make it available to all. The Non-ribbon Chord Diagram, or DotChord, will support two data dimensions: the relationship between nodes and the "relevance" of the links.
Right now the "relevance" is pictured as a variation in opacity. I will add the preprocessing necessary to make this "relevance" automated if desired.
My design proposal is to keep it as similar to the chord as possible.
I would love to include more data dimensions, but I'd like to hear what you think about it first.
Any feedback or ideas are more than welcome! 😁 Now is a great moment to discuss it.
Photo to prove it.
bokeh.version : '0.11.0'
pandas.version : '0.17.1'
When I try to do a BoxPlot using categorical data generated by the Pandas 'cut' method, the code crashes with ValueError: items in new_categories are not the same as in old categories. The exception is raised by the pandas/core/categorical.py code, but I suspect the source of the error is in Bokeh. The reason I think it starts in Bokeh (bokeh/charts/data_source.groupby()) is that with the pandas.DataFrame builtin boxplot() function everything runs properly.
The following code should reproduce the error:
import numpy as np
import pandas as pd
signal = np.random.normal(0.5,0.1,900)
instr = np.random.poisson(0.5,100)
sample = np.concatenate((signal,instr),axis=0)
np.random.shuffle(sample)
df = pd.DataFrame({'sample':sample})
nbins = 10
bins = np.linspace(0,1,nbins)
df['bins'] = pd.cut(df['sample'],bins)
from bokeh.charts import BoxPlot
p = BoxPlot(df,values='sample',label='bins')
If instead of bokeh.charts.BoxPlot we use Pandas' boxplot, everything works fine:
df.boxplot(column='sample',by='bins')
Am I doing something wrong?
If it's a bug, possibly related to pandas-dev/pandas#10505?
ps: I made a notebook available in case it helps. The notebook has more steps than the strictly necessary ones (above), though: http://nbviewer.jupyter.org/github/chbrandt/pynotes/blob/master/issues/bokeh/boxplot_report.ipynb
Tnx
The Charts documentation needs improvement, especially regarding the arguments and attributes shared across charts. For instance, it's not easy to find information about chart legends in the user guide, and the only mention of this parameter is in the Scatter Chart section, although it is a generic argument that can be used with any type of chart.
As discussed on the mailing list: it is possible (e.g. via import from Excel) to create data frames where the column names are integers instead of strings.
Currently bokeh throws a ValueError on the attempt to create a chart with non-string column names.
The test code with a try-except-clause to handle the error:
'''
Test case: a DataFrame imported from Excel throws an error with bokeh 0.12.2.
Column names are not explicitly set as text (and are automatically
treated as numbers) in Excel.
The values are imported as integers into the DataFrame.
When this DataFrame is used e.g. to create a line chart, bokeh throws a
ValueError.
This error is handled with a try-except clause that converts the column
names to strings on ValueError.
'''
if __name__ == '__main__':
    import pandas
    from bokeh.io import output_server, show
    from bokeh.charts import Line
    import subprocess
    import time
    hdf_name = 'test.xlsx'
    excel_df = pandas.read_excel(io = hdf_name)
    print('excel_df: \n{}'.format(excel_df))
    args = ['python', '-m', 'bokeh', 'serve']
    p = subprocess.Popen(args)
    time.sleep(1) # wait for the server to run
    output_server('test')
    try:
        p1 = Line(excel_df, title="DataFrame from Excel")
    except ValueError as e: # raised if column values are e.g. integers
        print('ValueError on column_values: \n{}'.format(excel_df.columns.values))
        column_values = [str(i) for i in excel_df.columns.values] # make strings
        excel_df = pandas.DataFrame(data = excel_df.values, columns = column_values)
        print('Converted column_values to strings: \n{}'.format(excel_df.columns.values))
        p1 = Line(excel_df, title="DataFrame from Excel")
    show(p1)
Output:
excel_df:
1 2
0 100 0
1 50 50
2 0 100
column_values before ValueError:
[1 2]
column_values as strings:
['1' '2']
2016-10-04 18:50:43,656 Starting Bokeh server version 0.12.2
2016-10-04 18:50:43,664 Starting Bokeh server on port 5006 with applications at paths ['/']
2016-10-04 18:50:43,664 Starting Bokeh server with process id: 9260
2016-10-04 18:50:43,711 WebSocket connection opened
2016-10-04 18:50:43,712 ServerConnection created
2016-10-04 18:50:43,834 WebSocket connection closed: code=1000, reason='closed'
INFO:bokeh.client._connection:Connection closed by server
2016-10-04 18:50:44,092 200 GET /?bokeh-session-id=test (::1) 13.01ms
2016-10-04 18:50:44,770 WebSocket connection opened
2016-10-04 18:50:44,770 ServerConnection created
It would be nice if bokeh would handle these cases internally.
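The conversion that the report asks bokeh to do internally amounts to turning every column name into a string before the chart is built. A minimal sketch on a plain mapping of columns is shown below; `stringify_column_names` is a hypothetical helper, and the duplicate check is an added assumption about how name collisions should be treated.

```python
from collections import OrderedDict


def stringify_column_names(columns):
    """Return a copy of `columns` whose keys are all strings."""
    renamed = OrderedDict()
    for name, values in columns.items():
        key = str(name)
        # Assumption: 1 and '1' colliding after conversion is an error.
        if key in renamed:
            raise ValueError("duplicate column name after conversion: %r" % key)
        renamed[key] = values
    return renamed


# Same shape as the Excel data in the output above.
excel_columns = OrderedDict([(1, [100, 50, 0]), (2, [0, 50, 100])])
converted = stringify_column_names(excel_columns)
```

With a real DataFrame, `excel_df.rename(columns=str)` expresses the same idea and keeps the index and dtypes, unlike rebuilding the frame from `excel_df.values`.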
from a discussion on the mailing list:
https://groups.google.com/a/continuum.io/forum/#!topic/bokeh/Lq5LX6yH-fU
Currently bokeh can have hierarchical indexes on plots, but they are slotted for each tick, see in this example:
import pandas as pd
import datetime
from bokeh.charts import Bar, output_notebook, show
output_notebook()
data = {
'month': ['2016-01-01','2016-02-01', '2016-01-01'],
'work_location': ['site_a', 'site_a', 'site_b'],
'nps': [50.0, 33.3, -25.0]
}
df = pd.DataFrame(data)
p = Bar(df, values='nps', label=['work_location', 'month'])
show(p)
On the left is the current, on the right the axis is parsed visually so it is easier/quicker to interpret and see the groups.
Here is an additional example:
http://news.infragistics.com/cfs-filesystemfile.ashx/__key/CommunityServer.Discussions.Components.Files/265/4718.Capture.PNG
Again, just floating the idea as I think it lends visual power to the graph in making the content of the axis (when it is hierarchical like this) easier to parse than the current look.
Was requested to create an issue, from the mailing list:
Can CustomJS be used with the charts interface in 0.10? What would I use for the source? Could I pass a dataframe to the callback ?
The short answer is "yes" but the longer answer is that it's probably not as convenient as it could be. In particular, passing the data frame will not work, you'd have to poke around and get the right data source and pass that in. Suggestions for simple and approachable spellings are welcome.
Any example?
I'm afraid I don't have an example at hand. However, one way to find objects inside a collection of Bokeh models is to use the .select method to query the object graph. You can see more info here:
http://bokeh.pydata.org/en/latest/docs/reference/models/plots.html#bokeh.models.plots.Plot.select
I'm using bokeh.plotting with the circle glyph renderer and noticed that HoverTool does not seem to handle column names that contain parentheses or spaces well.
An example from an IPython Notebook is below:
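import pandas as pd
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show
fake_data = {'ID': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],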
'Total': [0.0, 20.0, 0.0, 6.0, 8.0, 0.0, 8.0, 5.0, 8.0, 1.0, 2.0],
'Column With Spaces': ['a', 'b', 'c','d', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
'Column_Without_Spaces': ['a', 'b', 'c','d', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
'Column_With_Parenthese(s)': ['a', 'b', 'c','d', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
'Column_Without_Parentheses': ['a', 'b', 'c','d', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
}
df = pd.DataFrame(fake_data)
source = ColumnDataSource(data=df)
hover = HoverTool(
tooltips = [
('ID', '@ID'),
('Total', '@Total'),
('Column With Spaces', '@Column With Spaces'),
('Column_Without_Spaces', '@Column_Without_Spaces'),
('Column_With_Parenthese(s)', '@Column_With_Parenthese(s)'),
('Column_Without_Parentheses', '@Column_Without_Parentheses')
]
)
p = figure(plot_width=400, plot_height=400, title=None, tools=[hover])
p.circle('ID', 'Total', size=10, source=source)
output_notebook()
show(p)
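For reference, Bokeh's tooltip syntax documents a braced form, `@{column name}`, for field names that contain spaces or other special characters. The helper below is a minimal sketch of building tooltip tuples that way; `tooltip_field` is a hypothetical name, and whether the braced form resolves every case reported here (parentheses in particular) on the Bokeh version used in this report has not been verified.

```python
def tooltip_field(column):
    """Return a tooltip field spec for `column` (hypothetical helper)."""
    # Names made only of letters, digits and underscores can use the short form.
    if column.replace("_", "").isalnum():
        return "@" + column
    # Anything else (spaces, parentheses, ...) is wrapped in braces.
    return "@{" + column + "}"


columns = ["ID", "Column With Spaces", "Column_With_Parenthese(s)"]
tooltips = [(name, tooltip_field(name)) for name in columns]
```

The resulting `tooltips` list has the same shape as the one passed to HoverTool in the example above.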
The issues as I see them:
- HoverTool does not flag that column names with parentheses or spaces are problematic
- HoverTool is not displaying them.
This is not supported at all at the moment. The proposal is to support data updates by basically wiping all the renderers and recreating them. Although that is still "expensive", it is a big win: it avoids recreating the actual Chart, and therefore avoids recreating the canvas on the JS side and the consequent annoying flickering.
We merged @grromrell's PR #5037 before tests were added. Should be fairly simple to add some python side tests for the util class.
I'm trying to use Bokeh to plot a Pandas dataframe with a DateTime column containing years and a numeric one. If the DateTime is specified as x, the behaviour is as expected (years in the x-axis). However, if I use set_index to turn the DateTime column into the index of the dataframe and then only specify the y in the TimeSeries, I get time in milliseconds in the x-axis. A minimal example:
import numpy as np
import pandas as pd
from bokeh.charts import TimeSeries, output_file, show
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = TimeSeries(test,x='datetime',y='foo')
show(fig)
output_file('fig2.html')
test = test.set_index('datetime')
fig2 = TimeSeries(test,y='foo')
show(fig2)
I would expect the same picture with both approaches. Also, after the second plot, the dataframe contains an extra column, index.
This happens in Bokeh 0.11 and Pandas 0.16.2. Browser is Chromium 48 in Linux.
Cheers!!
Related suggestions:
This might be a clearer logic:
if len(self.tools) == 0:
    # if no tools customization let's create the default tools
    if tools is True:
        tools = DEFAULT_TOOLS
    elif tools is False:
        tools = []
    tool_objs = _process_tools_arg(self, tools)
    self.add_tools(*tool_objs)
Also "Only adds tools if given boolean and does not already have tools added to self." seems a bit obscure and complicated.
Also, this should be a private method, as it's not intended for users to use.
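The branching in this suggestion can be checked in isolation with a small standalone function. This is only a sketch: `resolve_tools` is a hypothetical name, the `DEFAULT_TOOLS` value is a placeholder, and the real method would go on to call `_process_tools_arg` and `add_tools` as in the snippet above.

```python
# Placeholder value, assumed for illustration only.
DEFAULT_TOOLS = "pan,wheel_zoom,box_zoom,save,reset"


def resolve_tools(existing_tools, tools):
    """Return the tools spec to process, or None if nothing should be added."""
    if len(existing_tools) > 0:
        # The chart already has customized tools; leave them alone.
        return None
    if tools is True:
        return DEFAULT_TOOLS
    if tools is False:
        return []
    # Any other value (e.g. a comma-separated string) is passed through.
    return tools
```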
Bokeh 0.11.0
Consider the histogram example
from bokeh.charts import Histogram, output_file, show
from bokeh.sampledata.autompg import autompg as df
p = Histogram(df, values='hp', color='navy', title="HP Distribution")
output_file("histogram_color.html")
show(p)
The above example works, but if I use
p = Histogram(df['hp'], color='navy', title="HP Distribution")
I get the error
ValueError: cannot label index with a null key
Is this a known issue?
It seems that the high-level plots are missing the ability to render the name of the series when hovering over it. This is incredibly useful for very high density data where simple colouring and legends may not be appropriate. For example, consider you have 100 lines or stacked areas on a chart, a legend is fairly useless.
Here's how I've solved this problem for a stacked area (with help from the folks on the Gitter), but given that I need to do the same for TimeSeries (it seems) etc, I feel this is an omission as an option for the high level charts.
https://gist.github.com/stringfellow/1f0a7ab327de326c805e23966fb975ec
When users want to create a bar graph, scatter chart, etc., they frequently need to aggregate the data. Right now it's a two-step process: aggregate the data via aggregate or pivot_table, then create a chart. It would be great if you could do it in one shot, à la Excel Pivot Charts. For example:
Scatter(products, x='Year', y='Product Type', function='count')
or
Scatter(products, x='Year', y='Product Type', function='sum', aggregate_field='Revenue')
Although you could trick out this feature to handle much more complex cases, I wouldn't bother – the vast majority of the time users only need to do a simple aggregation.
A suggestion from one of my colleagues on choosing a name for the parameter specifying which function to use to aggregate the data:
I would personally suggest something like “Function”, “Perform” or “Calculate”. When you think about how this would be spoken, you’d say
- I want to run the sum function on flowers
- I want to calculate the sum of flowers
- I want to perform the sum operation on flowers
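To make the requested behaviour concrete, here is a minimal pure-Python sketch of the aggregation step that such a one-shot call would perform internally. The `aggregate` helper and the sample `products` rows are hypothetical; only the `function` and `aggregate_field` parameter names come from the proposal above, and nothing here is part of the charts API.

```python
from collections import defaultdict


def aggregate(rows, x, y, function="count", aggregate_field=None):
    """Group rows by their (x, y) values and apply a simple aggregation."""
    if function not in ("count", "sum"):
        raise ValueError("unsupported function: %r" % function)
    if function == "sum" and aggregate_field is None:
        raise ValueError("'sum' needs an aggregate_field")
    totals = defaultdict(int)
    for row in rows:
        key = (row[x], row[y])
        # Count rows per group, or add up the chosen field per group.
        totals[key] += 1 if function == "count" else row[aggregate_field]
    return dict(totals)


# Made-up sample data for illustration.
products = [
    {"Year": 2015, "Product Type": "Widget", "Revenue": 100},
    {"Year": 2015, "Product Type": "Widget", "Revenue": 50},
    {"Year": 2016, "Product Type": "Gadget", "Revenue": 70},
]
counts = aggregate(products, x="Year", y="Product Type", function="count")
revenue = aggregate(products, x="Year", y="Product Type",
                    function="sum", aggregate_field="Revenue")
```

Restricting the helper to `count` and `sum` mirrors the point made above that simple aggregations cover the vast majority of cases.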
Following on from #1780, #1836, #1851, #1852
How do people feel about a HorizontalBar variation of Bar, where the categorical axis runs on the y (and the bars run Horizontally)?
One small naming concern is a potential confusion between:
Bar
)
I'm not sure what people use for references on this stuff. I have a few books lying around, but have only picked stuff up informally, so I have no idea what "correct" is or should be.
But it seems that maybe being explicit and just having HorizontalBar and VerticalBar (which would imply a name change for Bar) might be useful.
In my toy project with data shape (8033, 9) I see top 2 calls in profiler:
Calls          Total Time (ms)  Per Call (ms)  Cumulative Time (ms)  Per Call (ms)  Function
171290         642.655          0.0038         1472.878              0.0086         /bokeh/core/properties.py:923(validate)
211867/112752  293.256          0.0014         2114.824              0.0188         /bokeh/core/properties.py:240(is_valid)
Taking about a minute in sum. I guess that a pandas data frame already does some type checking, so there's no reason to validate twice. Rough estimates give about 2 validate() calls per data cell. Is it possible to reduce the number of unnecessary validations?
Dear Bokeh Developers,
I would like to discuss the (changed) default behaviour of displaying the legend in the barplot of Bokeh. Currently, the legend seems to be displayed by default even if there is only one data source, as shown below:
(from http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html)
Technically this is wrong, because all labels have the same colour. Also, the legend is redundant, as the x-axis already displays the labels.
This behaviour breaks the first four example plots of the tutorial http://bokeh.pydata.org/en/latest/docs/user_guide/charts.html
Therefore I would like to suggest to change the default behaviour for displaying the legend in the barplot to either False or even better to only display it when it makes sense (more than one color / type).
Best,
Tom
I am using Bokeh to plot a kind of precomputed barchart/histogram where I give as input the x labels and their corresponding counts:
from bokeh.charts import Bar, output_file, show
data = {
'labels': ('EN', 'PD', 'AD'),
'counts': (10, 20, 30),
}
p = Bar(data, values='counts', label='labels')
output_file("bar.html")
show(p)
Unfortunately Bokeh 0.11 sorts the x labels in lexical order, while I want to keep the given order. This was not the case in Bokeh 0.10.
Other minor issue: I have to clear the x and y axis labels, which are in English and irrelevant in my case (e.g. "sum(counts)").
Hopefully a picture is worth a thousand words:
Notice the tooltip. It has an erroneous "bonus" tooltip under it with labels "index, data(x,y), canvas(x,y)". If I pick a point more on the left of the plot (so the tooltip appears to the right of the data point) I instead get the values for this erroneous "bonus" tooltip.
This is caused by inlining the tooltip into the bokeh.charts.Scatter() call:
from bokeh.charts import Scatter
from bokeh.sampledata.autompg import autompg
s = Scatter(data=autompg[autompg.make.isin('ford volkswagen honda'.split())],
x='yr', y='mpg', color='make',
height=400, width=800, title='Fuel efficiency of selected vehicles from 1970-1982',
tools='hover, box_zoom, lasso_select, save, reset',
tooltips = [
('Make','@make'),
('MPG', '@mpg'),
('hp', '@hp')])
There shouldn't be a second set of tooltips underneath the intended tool tip.
See above code
See above image
bokeh info command:
Python version : 3.5.2 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:52:12)
IPython version : 5.1.0
Bokeh version : 0.12.4
BokehJS static path : /Users/ijstokes/anaconda/envs/dspyr/lib/python3.5/site-packages/bokeh/server/static
node.js version : (not installed)
npm version : (not installed)
See notebook here: http://nbviewer.jupyter.org/github/birdsarah/bokeh-miscellany/blob/master/set%20range%20on%20chart.ipynb
This correctly sets the x_range to what the user wants:
x_range = Range1d(20, 30)
p = Scatter(df, x='mpg')
p.x_range = x_range
But this does not:
x_range = Range1d(20, 30)
p = Scatter(df, x='mpg', x_range=x_range)
I've checked for Scatter and Bar - I'm guessing it's true for all Charts
The documentation of bokeh.charts.BoxPlot claims that label and values are optional and I get the impression that supplying a simple one-dimensional dataset is sufficient to create a BoxPlot.
Turns out I'm wrong, took me quite some time to figure out why the example works but my code didn't.
Apparently the reason is that it only works with a DataFrame having at least two columns, one being the values and one the labels.
Admittedly the documentation states:
Create a BoxPlot chart [...] from table-like data.
Still, I think it should be possible to use it with 1D datasets.
I'd expect bokeh to add some tag for label and value by itself in that case. Be it None, 0, 1, N/A or whatever. Maybe it does... as far as I can see, there are no python errors. It fails on the JavaScript side though. In Firefox, on first trial it says TypeError: _ is null and afterwards Error: Error rendering Bokeh model: could not find tag with id: ...
Working example code with output_notebook:
from bokeh.charts import BoxPlot, show, output_notebook
from bokeh.layouts import row
from bokeh.sampledata.autompg import autompg as df
box = BoxPlot(df, values='mpg', label='cyl', plot_width=400)
box2 = BoxPlot(df, values='mpg', label='cyl', color='cyl',
title="MPG Box Plot by Cylinder Count", plot_width=400)
output_notebook()
show(row(box, box2))
Non-working example code with 1D data-set:
from bokeh.charts import BoxPlot, show, output_notebook
from bokeh.layouts import row
from bokeh.sampledata.autompg import autompg as df
box = BoxPlot(df['mpg'], plot_width=400)
output_notebook()
show(box)
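Since the report above notes that BoxPlot only works with table-like data holding both a values column and a label column, one possible workaround is to wrap the 1D data into such a table with a constant label. The sketch below only builds the table as a dict of lists; `as_labeled_table` and its parameter names are hypothetical, the sample numbers are made up, and passing the result to BoxPlot has not been tested here.

```python
def as_labeled_table(values, label="all", values_name="values", label_name="label"):
    """Wrap 1D data into a two-column, table-like dict (hypothetical helper)."""
    values = list(values)
    # Every row gets the same label so the data forms a single group.
    return {values_name: values, label_name: [label] * len(values)}


table = as_labeled_table([18.0, 15.0, 16.0, 17.0], values_name="mpg")
# Assumed usage, untested: BoxPlot(table, values="mpg", label="label")
```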
Notebook with test code and output (.txt is actually .ipynb but GitHub didn't accept):
BoxPlotTest.txt
Notebook Server OS:
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux kernel: 3.10.0-327.13.1.el7.x86_64
Server's Python Version:
2.7.5
Server's Python Modules:
PipFreeze.txt
Client OS:
Windows 10 x64 (10.0.10586 Build 10586)
Client Browsers (tried both):
Google Chrome Version 56.0.2924.87 (64-bit)
Firefox Version 51.0.1 (32-Bit)