kanaries / pygwalker
PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis
Home Page: https://kanaries.net/pygwalker
License: Apache License 2.0
This is an issue from reddit
Traceback (most recent call last):
File "E:\py\test-pygwalker\main.py", line 15, in <module>
gwalker = pyg.walk(df)
File "E:\py\test-pygwalker\venv\lib\site-packages\pygwalker\gwalker.py", line 91, in walk
js = render_gwalker_js(gid, props)
File "E:\py\test-pygwalker\venv\lib\site-packages\pygwalker\gwalker.py", line 65, in render_gwalker_js
js = gwalker_script() + js
File "E:\py\test-pygwalker\venv\lib\site-packages\pygwalker\base.py", line 15, in gwalker_script
gwalker_js = "const exports={};const process={env:{NODE_ENV:\"production\"} };" + f.read()
File "E:\Python\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 511737: character maps to <undefined>
Even loading df = pd.DataFrame(data={'a':[1]}) causes this problem.
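The traceback suggests the bundled JavaScript file is opened in text mode without an explicit encoding, so Python falls back to the locale code page (cp1252 here). A minimal sketch of the likely remedy, assuming the bundle is UTF-8 encoded; `read_bundle` and the temporary file are hypothetical stand-ins, not pygwalker's actual code:

```python
import os
import tempfile


def read_bundle(path: str) -> str:
    """Read a text asset as UTF-8 regardless of the platform's locale encoding."""
    # Passing encoding explicitly avoids the cp1252/gbk default on Windows.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Demo: byte 0x8d (the one named in the traceback) is valid inside UTF-8
# sequences but undefined in cp1252, so a locale-dependent read can fail.
sample = 'const label = "\u010d";'  # U+010D encodes to bytes C4 8D in UTF-8
with tempfile.NamedTemporaryFile("wb", suffix=".js", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))

try:
    bundle_text = read_bundle(tmp.name)
finally:
    os.remove(tmp.name)
```

The same reasoning would apply to any other locale code page that cannot decode the bundle's bytes.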
Dear Kanaries Team,
Thank you very much for your amazing Pygwalker project!
I would like to kindly ask you to also publish it on conda-forge. By doing that, I think you will get many more users. There is already a feedstock available for it:
https://github.com/conda-forge/pygwalker-feedstock/pulls
Could you please update it to the latest version or make each release arrive on conda-forge as well? Thank you very much in advance.
Currently, using pyg.walk(df) on a Jupyter Notebook with a dark theme renders a white widget in which most of the text has such low contrast that it is effectively invisible.
I would love to use pygwalker in my project, which currently serves data analytics through a simple Flask server. Is there an easy way to do this already?
Hi, is there a way to display the data labels/values when we plot bar charts? Thanks in advance.
Thanks for creating pygwalker. I just started playing with it, and it looks great!
When trying the following snippet:
import pandas as pd
import pygwalker as pyg
df = pd.DataFrame([{"a":"\tb"}])
gwalker = pyg.walk(df)
I get this error:
Javascript error adding output!
SyntaxError: JSON.parse: bad control character in string literal at line 1 column 24 of the JSON data
See your browser Javascript console for more details.
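The error message suggests that a raw tab character reached `JSON.parse`, which is not allowed inside a JSON string literal. The sketch below illustrates the distinction using only Python's `json` module. How pygwalker embeds the payload in its JavaScript template is not shown in the report, so the double-encoding step is only an assumption about one possible fix:

```python
import json

records = [{"a": "\tb"}]

# json.dumps escapes the tab as the two characters backslash + "t".
payload = json.dumps(records)

# If that text is pasted into a JavaScript string literal, the JS parser turns
# the escape back into a raw tab before JSON.parse sees it. Python's strict
# parser rejects a raw control character in the same way:
raw_tab_json = '[{"a": "' + "\t" + 'b"}]'
try:
    json.loads(raw_tab_json)
    raw_tab_rejected = False
except json.JSONDecodeError:
    raw_tab_rejected = True

# Encoding the payload a second time yields a valid string literal whose
# value is the original JSON text (assumed fix, not pygwalker's actual code).
js_string_literal = json.dumps(payload)
round_tripped = json.loads(json.loads(js_string_literal))
```

An alternative under the same assumption would be to emit the payload as a JavaScript object literal rather than as a string passed to `JSON.parse`.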
A DataFrame in the format of

| x_x | y_y |
|---|---|
| 1 | 3 |
| 2 | 4 |

will work fine.
The following column naming will result in visible but "unusable" data.

| x.x | y.y |
|---|---|
| 1 | 3 |
| 2 | 4 |
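Until dotted names are handled, one possible workaround is to rename the columns before walking. The helper below is a hypothetical sketch that works on plain lists of names; the choice of underscore as the replacement is an assumption based on the working example above:

```python
def sanitize_column_names(names):
    """Replace dots in column names with underscores, keeping names unique."""
    seen = set()
    result = []
    for name in names:
        candidate = str(name).replace(".", "_")
        # Append a numeric suffix if the replacement collides with another name.
        base, n = candidate, 1
        while candidate in seen:
            candidate = f"{base}_{n}"
            n += 1
        seen.add(candidate)
        result.append(candidate)
    return result


renamed = sanitize_column_names(["x.x", "y.y", "x_x"])
```

With pandas, the result could be assigned back via `df.columns = sanitize_column_names(df.columns)` before calling `pyg.walk(df)`.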
Thank you for this wonderful data analysis project! It is going to be quite helpful for a lot of non-technical people in our lab. I do have a suggestion for a feature, which may or may not be feasible, but integrating Polars support for this tool would be a wonderful addition. Its programming interface is quite similar to that of Pandas, but has quite a few differences in order to optimize the library for speed and performance. It makes working with a lot of our very large datasets in python much quicker.
I got the error "Object of type date is not serializable" for a column of type dbdate.
cited from Reddit user's comment
Since you already added graphic walker, you could also add this to https://vega.github.io/vega-lite/ecosystem.html.
Dear Kanaries Team,
Thank you for such an amazing project, it is very useful!
One feature that would be very helpful is to be able to pre-set multiple charts / tabs from a loaded config file. Right now, one has to manually upload the config file to get the charts configured.
We are now building many many dashboards with your project, and having to re-configure charts each time we reload is a huge blocker.
Thank you very much in advance.
When I'm running pygwalker from a jupyter notebook, any time I reload the cell that runs pygwalker, I have to go back in and adjust data types (quantitative, ordinal, etc). For data where pygwalker makes assumptions that I don't want, that means fussing with the data tab any time I reload the data. Would be great to be able to specify those in the .walk call or elsewhere.
This is a wonderful library.
I am very excited after a long time.
It seems to be enough to get an overview of the dataset.
Thank you.
Thanks also for the dark mode update.
I have only one request.
When in dark mode, the text color is not visible, perhaps because it is black. I would appreciate it if you could improve it.
I have a dataset with network data which includes the Event Time, Source IP and Destination IP. When I create a chart with Event time as the column and Source IP as the row the graph is displayed. But when I select the Destination IP, I get a smiley face and the program hangs and exits from Jupyter Notebook.
Thank you for developing an awesome program!
Related to other requests, I would like to ask about support for HoloViz Panel (https://panel.holoviz.org/).
Please let me know if it is better to ask the Panel developers about it.
Thanks in advance.
Is it possible to add support for Streamlit?
I think this is possible using the components API from Streamlit and a Jinja template that combines the HTML and JavaScript into one file.
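The idea of combining markup, library code and data into one page can be sketched as follows. To keep the example dependency-free it uses `string.Template` from the standard library instead of Jinja, and the template, element id and `render_single_page` helper are all hypothetical rather than pygwalker's real template:

```python
import json
from string import Template

# Hypothetical single-file page. The element id, the inlined library text and
# the final initialisation call are placeholders, not pygwalker's real template.
PAGE = Template(
    '<div id="$gid"></div>\n'
    "<script>$library_js</script>\n"
    "<script>const props = $props_json; /* init call goes here */</script>\n"
)


def render_single_page(gid, library_js, props):
    """Inline markup, library code and data into one HTML string."""
    # Escape "</" so that data containing "</script>" cannot end the script tag.
    props_json = json.dumps(props).replace("</", "<\\/")
    return PAGE.substitute(gid=gid, library_js=library_js, props_json=props_json)


page = render_single_page(
    "gw-demo", "/* library source */", {"dataSource": [{"a": "</script>"}]}
)
```

In Streamlit, a string like `page` could then be handed to `streamlit.components.v1.html(page, height=...)`.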
CODE:
import pandas as pd
import pygwalker as pyg
df = pd.read_csv(r'E:\VSCODE\QSPR_loss_cal\2.csv')
gwalker = pyg.walk(df)
ERROR:
UnicodeDecodeError Traceback (most recent call last)
Cell In[13], line 5
2 import pygwalker as pyg
4 df = pd.read_csv(r'E:\VSCODE\QSPR_loss_cal\2.csv')
----> 5 gwalker = pyg.walk(df)
File ~\AppData\Roaming\Python\Python38\site-packages\pygwalker\gwalker.py:84, in walk(df, gid, **kwargs)
79 props = {
80 'dataSource': to_records(df),
81 'rawFields': raw_fields(df),
82 }
83 html = render_gwalker_html(gid)
---> 84 js = render_gwalker_js(gid, props)
86 display(HTML(html))
87 display(Javascript(js))
File ~\AppData\Roaming\Python\Python38\site-packages\pygwalker\gwalker.py:65, in render_gwalker_js(gid, props)
63 walker_template = jinja_env.get_template("walk.js")
64 js = walker_template.render(gwalker={'id': gid, 'props': json.dumps(props, cls=DataFrameEncoder)} )
---> 65 return gwalker_script() + js
File ~\AppData\Roaming\Python\Python38\site-packages\pygwalker\base.py:15, in gwalker_script()
13 if gwalker_js is None:
14 with open(os.path.join(HERE, 'templates', 'graphic-walker.umd.js'), 'r') as f:
---> 15 gwalker_js = "const process={env:{NODE_ENV:"production"} };" + f.read()
16 return gwalker_js
UnicodeDecodeError: 'gbk' codec can't decode byte 0x94 in position 357182: illegal multibyte sequence
I'm quite enjoying this, but I have come across a serious data interpretation problem.
I have a dataframe read like so:
df = pd.read_sql_query(query,conn, parse_dates=['date'])
However, the datetime values are all relatively high resolution ones (i.e., 5-15 second samples over many days), and the X axis nearly always shows only "2023-02" instead of showing the date (or the hour if I'm looking at the last 24 hours).
Can we get a way to change the X-axis label resolution, or (even better), a stepwise automatic scale to format those datetimes according to the dataset granularity?
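A stepwise automatic scale of the kind requested could look roughly like this. It is only a sketch: the thresholds and format strings are arbitrary assumptions, and `pick_time_format` is a hypothetical helper, not part of pygwalker or Graphic Walker:

```python
from datetime import datetime, timedelta


def pick_time_format(timestamps):
    """Pick a strftime pattern from the time span covered by the data."""
    span = max(timestamps) - min(timestamps)
    if span <= timedelta(hours=1):
        return "%H:%M:%S"
    if span <= timedelta(days=2):
        return "%m-%d %H:%M"
    if span <= timedelta(days=90):
        return "%Y-%m-%d"
    return "%Y-%m"


# Ten-second samples covering just under one hour.
samples = [datetime(2023, 2, 1, 12, 0, 0) + timedelta(seconds=10 * i) for i in range(360)]
fmt = pick_time_format(samples)
labels = [t.strftime(fmt) for t in samples[:2]]
```

A real implementation would probably base the choice on the visible axis range rather than on the whole dataset, so that zooming changes the label resolution.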
Hi,
Congrats on the pygwalker release. I'm the maintainer of VegaFusion, which is an open source project that provides server-side scaling for Vega visualizations by automatically extracting Vega transforms and evaluating them on the server. This makes it possible to scale many Vega/Vega-Lite visualizations to millions of rows as long as they include some form of aggregation.
I haven't looked at the architecture of pygwalker, but it might be fairly straightforward to integrate VegaFusion and enable pygwalker to support larger data sets. Let me know if you're interested in talking through details!
Noticed an issue from https://discord.com/channels/987366424634884096/1057481447541325885
I'm trying out pygwalker and I've noticed that some of my fields land in the 'blue' portion of the field list, which appear to be treated as buckets, while some land in the 'green' portion of the field list, which look like they're treated as numbers. In the dataframe I'm loading, one of my fields is in the blue bucket category and the other is in the green number category. Both fields are int64 data types with no nulls.
How should I understand this behavior and how can I modify it?
UPD: It does look like I can drag the field from blue to green, but even if I choose 'Sum' it is still treated as a bucket, so what should look like a bar chart instead resembles a heatmap.
Often when biologists draw a plot to compare data, statistical analysis along with the statistical annotation of the p-value comparison on the graph is a crucial step. Can this feature be implemented? If so, and if there is help needed, I would be more than happy to help out with this feature.
Here is an example:
In current versions (pygwalker<=0.1.7), the generated HTML can be extremely large when walking large data frames.
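One way to see the effect, and a possible stopgap, is to measure the serialized payload and cap the number of rows before walking. Sampling changes what the charts show, so this is a sketch of a workaround rather than a fix; `sample_records` is a hypothetical helper and the record layout is made up for the demonstration:

```python
import json
import random


def sample_records(records, max_rows, seed=0):
    """Return at most max_rows records, chosen at random but kept in order."""
    if len(records) <= max_rows:
        return list(records)
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(records)), max_rows))
    return [records[i] for i in keep]


records = [{"id": i, "value": i * 0.5} for i in range(10_000)]
full_size = len(json.dumps(records))        # characters embedded in the page
sampled = sample_records(records, max_rows=1_000)
sampled_size = len(json.dumps(sampled))
```

With pandas, `df.sample(n=..., random_state=...)` serves the same purpose before the frame is passed to `pyg.walk`.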
Generate code from GUI interactions; State restoration & Undo
Generating Pandas Code
For each edit you make to the Mitosheet, Mito generates pandas code below that corresponds to this edit, and puts this code directly below the mitosheet in the next code cell.
Rerunning an analysis
When you run mitosheet.sheet(), Mito will automatically generate a unique ID to store the set of edits made to this mitosheet. This ID will appear as an automatically generated analysis_to_replay parameter to the mitosheet.sheet() function call.
As long as you pass this analysis_to_replay parameter to the mitosheet.sheet() call, Mito will attempt to replay that analysis to the mitosheet. Replaying an analysis means applying the same edits that you did in Mito again.
Since Mito will try and apply the same edits when an analysis_to_replay parameter is passed, differently structured datasets might make these edits invalid and Mito will error. For example, if you change the location of the file that you imported in an analysis, and then attempt to replay this analysis, it will fail (as it can no longer find the file to import).
If you want to start a fresh mitosheet, simply make a new mitosheet.sheet() call in a new code cell.
graphic-walker now supports a new parameter: dark = 'light' | 'dark' | 'media'
<GraphicWalker dark="light" />
<GraphicWalker dark="dark" />
<GraphicWalker dark="media" /> // auto detect OS theme
In my opinion, Excel users have the edge over Python users when it comes to graphs, but it's getting closer with packages like this! A common type of graph that my audiences like is a graph with a table of its values aligned underneath.
I thought charts were always easier to look at. However, some members of the audience prefer the numbers!
This type of chart is shareable and can be used in meetings and/or presentations without having to hover the mouse over it.
Example:
Hello, thanks for creating and maintaining this package. Sadly, when I try to render the HTML I just get <IPython.core.display.HTML object> as output.
I have tried with:
! pip install git+https://github.com/Kanaries/pygwalker@main
!pip install 'pygwalker>=0.1.4a0'
!pip install pygwalker
All cases showed the same result. Any suggestions?
Thanks
How do you make a histogram chart of one attribute (column)? (without pre-calculating the histogram data before putting the data into pygwalker, of course)
I fiddled with the UI for a while but couldn't find a way.
If it's not possible right now, I'd like it to be implemented.
Thanks
Users can work with pygwalker in (offline) Jupyter Notebooks
Does pygwalker already work w/ micropip and JupyterLite?
%pip install pygwalker
Analyzing data by period is useful for analyzing data by e.g. quarter, which is common in financial objectives. I think that a potential solution to using a 'Period' column type is to convert to datetime in the backend and warn the user that this has occurred:
df['period_temp'] = df['period'].astype('datetime64[ns]')
Ideally, the original values would be stored as a map and assigned as labels in a time series plot but this sounds complex.
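The mapping idea can be sketched without pandas. The example assumes quarter labels of the form `2023Q1`; `quarter_start` is a hypothetical helper, and the dictionary plays the role of the label map described above:

```python
from datetime import date


def quarter_start(label):
    """Convert a label like '2023Q1' to the first day of that quarter."""
    year_text, quarter_text = label.upper().split("Q")
    quarter = int(quarter_text)
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"not a valid quarter label: {label!r}")
    return date(int(year_text), 3 * (quarter - 1) + 1, 1)


periods = ["2023Q1", "2023Q2", "2023Q4"]

# Plot against the dates, but keep the original labels for display.
label_by_date = {quarter_start(p): p for p in periods}
```

In pandas, a Period column can be converted with `df['period'].dt.to_timestamp()`, and the same kind of dictionary could be built from the original and converted columns.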
Love this package btw, really pleased this is finally here, been looking for incentives to entice excel users out of their bubble for a long time and this might be the boon!
When I turn on the aggregation function, it turns my plot into a single point, which is not very helpful. Is there a way to see the average of one field while leaving the other field as raw values? This is especially helpful when I have categorical data.
Hi maintainers,
Hope this message finds you in good health. I wanted to ask whether it's possible to add support for Plotly Dash. Dash is useful for creating interactive web-based dashboards, and it could benefit from pygwalker's data exploration features.
Thanks
File "lib\site-packages\pygwalker\utils\render.py", line 25, in default
return json.JSONEncoder.default(self, obj)
File "lib\json\encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Decimal is not JSON serializable
A possible solution to add to render.py is shown below. Note that there will be some loss of precision, but I am not sure how else to handle this.
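import datetime
import decimal
import json

class DataFrameEncoder(json.JSONEncoder):
    def default(self, obj):
        # Dates and times are serialized as their string form.
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return str(obj)
        # Decimal becomes float, which may lose precision.
        if isinstance(obj, decimal.Decimal):
            return float(obj)
        return json.JSONEncoder.default(self, obj)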
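If the precision loss matters, one alternative is to serialize Decimal values as strings. This is only a sketch of the trade-off: `encode_extra` is a hypothetical hook, and a front end receiving strings would presumably treat the field as text rather than as a number:

```python
import json
from decimal import Decimal


def encode_extra(obj):
    """Fallback for json.dumps: keep Decimal values exact by emitting text."""
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


value = Decimal("0.1000000000000000055511151231257827")

# Converting to float rounds to the nearest binary double.
as_float = json.dumps({"v": float(value)})

# Converting to str keeps every digit.
as_text = json.dumps({"v": value}, default=encode_extra)
```

Which behaviour is preferable depends on whether the column is meant to be aggregated numerically or merely displayed.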
I am using JupyterLab. I encountered an encoding problem when importing a data set. The column names of the DataFrame contain Chinese characters, but they are displayed as Unicode escapes. Strangely, the field values also contain Chinese characters, yet those are displayed normally.
How can I display two series that share an axis together in one view instead of in two separate views?
For example, a dataset with the same date X-axis but two attributes.
Great work!
Thanks for sharing. But there are some problems I can't solve. How can I change the font size of the X and Y axes and the title?
I have a double MultiIndex dataframe, i.e. a MultiIndex in both the index and the column names.
When passing it to pygwalker I get:
File ~/.local/lib/python3.10/site-packages/pygwalker/utils/fname_encodings.py:4, in fname_encode(fname)
3 def fname_encode(fname: str):
----> 4 return base64.b64encode(bytes(fname, 'utf-8')).decode()
TypeError: encoding without a string argument
Is this because pygwalker does not support a double MultiIndex?
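The TypeError is consistent with a column name that is a tuple rather than a string, which is what MultiIndex columns yield. A possible workaround is to flatten the names first. `flatten_name` and `encode_field_name` below are hypothetical helpers that mirror the base64 step visible in the traceback, not pygwalker's own functions:

```python
import base64


def flatten_name(name, sep="_"):
    """Join the levels of a tuple column name into one string."""
    if isinstance(name, tuple):
        return sep.join(str(part) for part in name)
    return str(name)


def encode_field_name(name):
    """Base64-encode a column name after flattening it to a string."""
    return base64.b64encode(flatten_name(name).encode("utf-8")).decode()


# bytes(("sales", "2023"), "utf-8") would raise the TypeError from the report.
encoded = encode_field_name(("sales", "2023"))
decoded = base64.b64decode(encoded).decode("utf-8")
```

With pandas, the column names could be flattened via `df.columns = [flatten_name(c) for c in df.columns]`, and a row MultiIndex could be moved into ordinary columns with `df.reset_index()`.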
Thank you so much for this package, I've been looking for something like this for ages. Some features from Tableau I would like to see:
1 - Force table format, as I very much like it for basic data exploration.
2 - Crosstab to excel, to download the data being shown in the viz to Excel, CSV or something.
Also, reinforcing #11, support for streamlit would very much be appreciated and would increase the potential of this package by a lot.
Hi, thanks for the great work with PyGWalker!
We are considering using it in a project but noticed that you added an automatic update check that is triggered when the library is first imported. Commit: to:feat:reminder to update
Please reconsider if this check is really necessary, as I believe the update mechanism provided by pip is the preferred mechanism for checking for updates.
In addition to this, this is also a privacy issue for some users, so removing this check would help with adoption.
You can run PyGWalker on Pyodide! The following notebook works on VSCode thanks to the joyceerhl.vscode-pyodide extension.
https://github.com/davidgasquez/datalab/blob/main/notebooks/2023-03-14-Pyodide.ipynb
What would it take to bring this to Databricks Notebooks ?
Is there support for use with zeppelin notebooks?
Hello!
It is a wonderful module. I am a beginner in Django. Is it possible to use this application inside an HTML div tag? (The file is uploaded through the front end and then processed in the background to return the graphic display and editor to the front end.) Thanks.
It would be really great to be able to export and import a description of the graph for later reuse, like it is done in Vega.
This also relates to [Feat] Force data types in code #70.
Besides removing the need to set up data types every time, this would let PyGWalker users export the predefined setup of their plots in its entirety.