tableau / document-api-python
Create and modify Tableau workbook and datasource files
Home Page: https://tableau.github.io/document-api-python/
License: MIT License
All connections in 10.0+ are now of type 'federated', and this breaks the current introspection of dbname, server, and username.
Example:
<datasources>
  <datasource caption='xy (testv1)' inline='true' name='federated.02estec0ldx7s41fmcg871k5lvm7' version='10.0'>
    <connection class='federated'>
      <named-connections>
        <named-connection caption='mysql55.test.tsi.lan' name='mysql.1ioglmu0aggmqh1bbib8i1r6hs2f'>
          <connection class='mysql' dbname='testv1' odbc-native-protocol='yes' port='3306' server='mysql55.test.tsi.lan' source-charset='' username='test' />
        </named-connection>
      </named-connections>
    </connection>
  </datasource>
</datasources>
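A minimal sketch of how the nested layout above could be introspected, using only the standard library rather than the Document API itself. The helper name `inner_connections` is hypothetical, and the sample XML is a trimmed copy of the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the 10.0 example above.
DATASOURCE_XML = """
<datasource caption='xy (testv1)' inline='true' name='federated.02estec0ldx7s41fmcg871k5lvm7' version='10.0'>
  <connection class='federated'>
    <named-connections>
      <named-connection caption='mysql55.test.tsi.lan' name='mysql.1ioglmu0aggmqh1bbib8i1r6hs2f'>
        <connection class='mysql' dbname='testv1' port='3306' server='mysql55.test.tsi.lan' username='test' />
      </named-connection>
    </named-connections>
  </connection>
</datasource>
"""


def inner_connections(datasource):
    """Return the connections that carry dbname/server/username.

    For a 'federated' wrapper, these are the nested connections;
    otherwise it is the top-level connection itself.
    (Hypothetical helper, not part of the library.)
    """
    outer = datasource.find('connection')
    if outer is None:
        return []
    if outer.get('class') == 'federated':
        return outer.findall('./named-connections/named-connection/connection')
    return [outer]


datasource = ET.fromstring(DATASOURCE_XML)
for conn in inner_connections(datasource):
    print(conn.get('class'), conn.get('server'), conn.get('dbname'), conn.get('username'))
```

The same helper still works on pre-10.0 files, where the top-level connection holds the attributes directly.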
(copied from developer forum post)
Java would be the next option, as it has strong support via JAXB for schema-validated XML. On the other hand, a pure JavaScript implementation would open up some interesting possibilities as well.
To better support all of the required features, I am splitting up the logical and the physical model (in the hope that the physical model can be generated from XSD). This is ongoing work, but I wanted an issue to track it while I continue working on it.
I'm hoping to have it done soon though.
(copied from developer forum post)
Another use case for the Document API is the programmatic creation of new worksheets and dashboards. This has actually been done before for and by Tableau customers but not widely publicized in the community. I could envision a fluent API for sheet creation.
I'd love to be able to do something like the Voyager project (Data Voyager) and turn a TDS into a TWB by creating all the vizzes that make sense for the dataset. This would mean a lot of "XML hacking," though, even with the aforementioned SDK. It'd be great if we could say "sheet.setDatasource(xyzDatasource).onColumns('some dimension').onColumns('some measure', 'running_sum').onRows('some dimension').asBarChart();" or similar.
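To make the fluent idea concrete, here is a rough Python sketch. Everything in it is hypothetical: `SheetBuilder` and its methods do not exist in the Document API, and the builder only records intent in a dictionary; it does not generate any workbook XML.

```python
class SheetBuilder:
    """Hypothetical fluent builder; it records a spec and writes no XML."""

    def __init__(self, name):
        self.spec = {'name': name, 'datasource': None,
                     'columns': [], 'rows': [], 'mark': None}

    def set_datasource(self, datasource_name):
        self.spec['datasource'] = datasource_name
        return self  # returning self is what makes the calls chainable

    def on_columns(self, field, aggregation=None):
        self.spec['columns'].append((field, aggregation))
        return self

    def on_rows(self, field, aggregation=None):
        self.spec['rows'].append((field, aggregation))
        return self

    def as_bar_chart(self):
        self.spec['mark'] = 'bar'
        return self


sheet = (
    SheetBuilder('Sales by Region')
    .set_datasource('xyzDatasource')
    .on_columns('some dimension')
    .on_rows('some measure', 'running_sum')
    .as_bar_chart()
)
print(sheet.spec)
```

Keeping the spec separate from XML generation would also leave room for validating a sheet before anything is written.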
Enable removing repository location (see below) when migrating to another data source.
(copied from developer forum post)
I realize that unpacking/repacking a TWBX is trivial, but it's also boilerplate stuff we have to do in every solution unless it becomes core capability.
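For reference, the boilerplate in question looks roughly like the sketch below, which relies only on a .twbx being a ZIP archive. The function name `read_workbook_xml` is made up for illustration, and the package is built in memory so the example runs without real Tableau files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_workbook_xml(twbx):
    """Return the parsed .twb found inside a .twbx (path or file-like object)."""
    with zipfile.ZipFile(twbx) as archive:
        twb_names = [n for n in archive.namelist() if n.lower().endswith('.twb')]
        if not twb_names:
            raise ValueError('no .twb file found in the package')
        with archive.open(twb_names[0]) as handle:
            return ET.parse(handle)


# Build a tiny stand-in package in memory (not a real workbook).
package = io.BytesIO()
with zipfile.ZipFile(package, 'w') as archive:
    archive.writestr('demo.twb', "<workbook version='10.0' />")
    archive.writestr('Data/demo.tde', b'placeholder')

tree = read_workbook_xml(package)
print(tree.getroot().tag, tree.getroot().get('version'))
```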
Possibly others
I think we should take another look at how the code is currently organized and see if we can't implement a few suggestions from the community (http://docs.python-guide.org/en/latest/writing/structure/#structure-of-the-repository is a great guide).
Off the top of my head, things I found odd:
Thoughts?
Happens in the latest commit on the development branch
Traceback (most recent call last):
  File "party.py", line 4, in <module>
    wb = Workbook('Contact Us Form - Tasks.twbx')
  File "/Users/tdoyle/Documents/py/envs/party/lib/python3.5/site-packages/tableaudocumentapi/workbook.py", line 49, in __init__
    self._workbookRoot, self._datasource_index
  File "/Users/tdoyle/Documents/py/envs/party/lib/python3.5/site-packages/tableaudocumentapi/workbook.py", line 148, in _prepare_worksheets
    datasource.fields[column_name].add_used_in(worksheet_name)
  File "/Users/tdoyle/Documents/py/envs/party/lib/python3.5/site-packages/tableaudocumentapi/multilookup_dict.py", line 66, in __getitem__
    return dict.__getitem__(self, key)
KeyError: '[Account]'
Sending workbook via email since it's internal and I can't just toss it to GH :)
Need to support changing connection string on data server datasources
We need to get everything into PyPI so that it's as easy as "pip install document-api-python" to get going.
We'll also need to update the README with the preferred installation steps.
Opening an issue to track -- do we want to document the API reference in the README, in a dedicated docs folder, on a hosted site (readthedocs.org), or on the Tableau developer site?
Right now we've got decent docstrings for most public functions and fairly clear function names -- but as the project grows it'd be good to know where we want to take the docs.
I have the following requirements:
/cc @DataRoberts
Take an existing TDS file (say, for a PostgreSQL DB) and change the connection to a MySQL/SQL Server/etc. DB connection by changing the connection attributes: class (class='postgres' to class='mysql'), server name, dbname, port, username, auth-mode, sslmode, schema, single-node, and workgroup-auth-mode.
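A sketch of what that attribute rewrite could look like when done directly on the XML with the standard library. The helper `retarget_connection` and the sample hosts are assumptions for illustration. Changing the class alone is probably not sufficient for a real migration, since schema and relation details can differ between database types.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified .tds content.
TDS_XML = """
<datasource inline='true' version='9.3'>
  <connection class='postgres' dbname='sales' port='5432' server='pg.example.com' username='report' />
</datasource>
"""


def retarget_connection(datasource, attributes):
    """Overwrite the given attributes on every non-federated connection.

    `attributes` is a plain dict because 'class' is a Python keyword
    and cannot be passed as a keyword argument.
    """
    changed = 0
    for conn in datasource.iter('connection'):
        if conn.get('class') == 'federated':
            continue  # wrapper element; settings live on the nested connections
        for name, value in attributes.items():
            conn.set(name, value)
        changed += 1
    return changed


datasource = ET.fromstring(TDS_XML)
retarget_connection(datasource, {
    'class': 'mysql',
    'server': 'mysql.example.com',
    'port': '3306',
})
print(ET.tostring(datasource, encoding='unicode'))
```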
I can successfully create a .twb file via the Document API, but attempting to publish it to my Tableau Server via Tabcmd results in an unexpected error:
Bad request
unexpected error occurred opening the packaged workbook.
Attached is the template workbook created in Tableau Desktop (superstore_sales.twb) and one of the workbooks created from that template via the Document API (superstore_sales_arizona.twb)
(copied from developer forum post)
Validate that the document is correct (valid XML) and matches the known format of the Tableau document for the target product version, then validate the document in terms of acceptable Tableau content (are the data connection class types valid, are the columns added to the rows/columns shelves actually present in the datasource, etc.). In other words, both syntactic and semantic validation.
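The two stages could be sketched as follows. This is only an illustration: the syntactic stage checks well-formedness, not conformance to a schema, and `KNOWN_CLASSES` is a made-up sample, not the list of connection classes Tableau actually accepts.

```python
import xml.etree.ElementTree as ET

# Illustrative only; not the real list of classes Tableau accepts.
KNOWN_CLASSES = {'federated', 'mysql', 'postgres', 'sqlserver'}


def validate_document(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    # Stage 1: syntactic check (is this well-formed XML at all?).
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return ['not well-formed XML: {}'.format(error)]

    # Stage 2: one example of a semantic check (connection class values).
    problems = []
    for conn in root.iter('connection'):
        dbclass = conn.get('class')
        if dbclass not in KNOWN_CLASSES:
            problems.append('unknown connection class: {!r}'.format(dbclass))
    return problems


print(validate_document("<datasource><connection class='mysql' /></datasource>"))
print(validate_document("<datasource><connection class='oracle9' /></datasource>"))
```

Further semantic checks, such as confirming that shelf columns exist in the datasource, would slot into the second stage in the same way.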
Can we add the port attribute to connection? This might be useful before the full document API is implemented. Some datasource changes need to change the port in addition to the server, etc. Some changes are a bit more involved but could still include a port change. For example, migrating from AWS RDS Postgres to Redshift might mean changing the port from 5432 to 5439. A change of database variant could require other changes, of course, such as the schema in the relation.
Most projects that accept contributors include a contributing.md file which outlines the processes to follow for contributing to the project. While we have the greater tableau.github.io that includes much of that information, calling out procedures locally would be useful for discoverability. We should also include a link back to tableau.github.io's contributing page.
Hello everyone,
I have the following requirements:
Since the current version of document-api-python doesn't seem to support this, I'm in the process of forking this repository and implementing those features. I have currently implemented a first version of requirements 1 and 2, and will probably have time for the calculated fields and polishing (aka tests) at some point next week.
Changes are currently in the branch "feature_modify_fields" in my fork. I'd love to hear some feedback about the implementation and the general direction I'm going with this fork. If there is any interest in those features, I'd be happy to start the internal process of getting the CLA underway.
Greetings from Germany
Hi,
I’m interested in using the Document API to update the connection type for .twbx files. I would like to update the following fields for a Hortonworks Hadoop Hive connection type:
• Server Name
• Authentication
• Realm
• Host FQDN
• Service Name
If this is not possible with the current version, I would like to submit this as a feature request. These are only the fields that I need, but all field options on the Hortonworks connector would be helpful as well.
It would be great if I could do this not only locally, but also with a workbook that is published on Tableau Server. The Tableau Server UI doesn’t allow me to edit the authentication type for a data source and I haven’t seen a way to do this with tabcmd.
Also, I tried to access the Examples link at the bottom of the Github page (https://github.com/tableau/document-api-python) and it seems like the link is dead.
Thanks for your help!
Alex
For twb/tds
/cc @DataRoberts
Enable updating. These are in 'groupfilter' node.
/cc @DataRoberts
I'm changing a workbook's datasource and then republishing to Server. However, the Data Source Name in Vizportal shows the old name. I've tried changing the datasource.caption to no avail. The only other place the ds name shows up in my workbook's XML is in worksheets.table.view.datasources:
<table>
  <view>
    <datasources>
      <datasource caption='<OLD CAPTION DATASOURCE>' name='<DS NAME>' />
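One possible workaround, sketched with the standard library on a simplified, hypothetical workbook: set the caption on every `datasource` element that shares the name, including the per-worksheet references. Whether this changes what Vizportal displays has not been verified here.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical workbook layout.
WORKBOOK_XML = """
<workbook>
  <datasources>
    <datasource caption='Old Caption' name='federated.abc' />
  </datasources>
  <worksheets>
    <worksheet name='Sheet 1'>
      <table>
        <view>
          <datasources>
            <datasource caption='Old Caption' name='federated.abc' />
          </datasources>
        </view>
      </table>
    </worksheet>
  </worksheets>
</workbook>
"""


def set_caption_everywhere(workbook, datasource_name, new_caption):
    """Set the caption on every <datasource> element with the given name."""
    updated = 0
    for element in workbook.iter('datasource'):
        if element.get('name') == datasource_name:
            element.set('caption', new_caption)
            updated += 1
    return updated


workbook = ET.fromstring(WORKBOOK_XML)
count = set_caption_everywhere(workbook, 'federated.abc', 'New Caption')
print(count, 'datasource elements updated')
```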
The current implementation of the Field object isn't very friendly to the end user: it relies on dynamic properties, so code completion doesn't work. It needs to be refactored to make the API easier to use.
I've run the sample code to change the database in a workbook. I can change the Server name, the Database name, the User name, and the Port, but there does not seem to be any way to change the database table. This is what I can do.
sourceWB.datasources[i].connections[0].server = 'myServer'
sourceWB.datasources[i].connections[0].dbname = 'myDbName'
sourceWB.datasources[i].connections[0].username = 'myUserName'
What I'd like to be able to do is specify a new database table as well. Of course this is a little more complicated if I'm joining multiple tables but all I need is to be able to iterate through the tables in each connection and change them. I don't need to be able to change the joins.
I want to do something like this:
for i in range(len(sourceWB.datasources[0].connections[0].tables)):
    sourceWB.datasources[0].connections[0].tables[i] = 'tableName'
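Until something like that exists, a rough stand-in is to edit the XML directly. The sketch below assumes tables appear as `relation` elements with `type='table'` and a `table` attribute; that layout, the sample XML, and the helper name are assumptions, and other parts of a real file may also reference the table name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified datasource; the relation layout is an assumption.
DATASOURCE_XML = """
<datasource>
  <connection class='mysql' dbname='testv1' server='mysql55.test.tsi.lan'>
    <relation name='orders' table='[orders]' type='table' />
  </connection>
</datasource>
"""


def rename_tables(datasource, mapping):
    """Rewrite the table attribute of type='table' relations using mapping."""
    renamed = []
    for relation in datasource.iter('relation'):
        if relation.get('type') != 'table':
            continue  # joins and custom SQL relations are left alone
        old = relation.get('table')
        if old in mapping:
            relation.set('table', mapping[old])
            renamed.append((old, mapping[old]))
    return renamed


datasource = ET.fromstring(DATASOURCE_XML)
print(rename_tables(datasource, {'[orders]': '[orders_v2]'}))
```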
(copied from developer forum post)
Being able to tie the curiously named "Tableau SDK" aka Extract API into this project would help us complete the circle of offering data => TWBX capabilities for customers. To that end, we'd need to be able to manipulate the packaged files through the API, even though this just means managing files in a ZIP. This is actually slightly more complicated, though, as we need to make sure the references in the TWB/TDS stay correct even as the files are replaced.
The files in packaged workbooks include: Images, Data, Shadow Extracts (though I have no idea what we'd do with those), ummm...and I guess you could lump Shapes into that list (though those are base64-encoded and included in the TWB itself). If those were addressable through the API, it'd be a real timesaver.
Enable modification of custom SQL in data sources
/cc @DataRoberts
For localization, the caption attribute of columns can be used. This blog post has some background.
(copied from developer forum post)
The ability to target a specific version of the software is a must, especially for OEM customers. Therefore the API should expose different interfaces based on the version of the workbook / datasource we're targeting.
I managed to trigger this when downloading workbooks from alpo.
This was done against the HEAD of a local copy of #45
`Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 263, in iterfind
selector = _cache[cache_key]
KeyError: (".//metadata-record[@Class='column'][local-name='[Today's Date]']", None)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "runner.py", line 30, in
fields = wb_model.datasources[0].fields
File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 135, in fields
self._fields = self._get_all_fields()
File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 141, in _get_all_fields
return collections.OrderedDict([(k, v) for k, v in column_objects])
File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 141, in
return collections.OrderedDict([(k, v) for k, v in column_objects])
File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 140, in
for xml in self._datasourceTree.findall('.//column'))
File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 17, in _mapping_from_xml
metadata_record = root_xml.find(xpath)
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 649, in find
return self._root.find(path, namespaces)
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 298, in find
return next(iterfind(elem, path, namespaces), None)
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 277, in iterfind
selector.append(ops[token[0]](next, token))
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 233, in prepare_predicate
raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate`
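The field name in the failing XPath contains an apostrophe ([Today's Date]), which ends the quoted string in the predicate early. A sketch of one way around it, assuming the goal is to find a `metadata-record` by its `local-name` child: do the comparison in Python instead of inside the XPath. The helper name and sample XML are illustrative and not taken from the library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified datasource fragment.
SAMPLE = """
<datasource>
  <connection>
    <metadata-records>
      <metadata-record class='column'>
        <local-name>[Today's Date]</local-name>
      </metadata-record>
    </metadata-records>
  </connection>
</datasource>
"""


def find_metadata_record(root, local_name):
    """Return the first column metadata-record with the given local-name.

    The name is compared in Python, so quotes in it never reach the
    XPath parser.
    """
    for record in root.iter('metadata-record'):
        if record.get('class') != 'column':
            continue
        if record.findtext('local-name') == local_name:
            return record
    return None


root = ET.fromstring(SAMPLE)
record = find_metadata_record(root, "[Today's Date]")
print(record is not None)
```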
Add support for calculated fields
/cc @DataRoberts
To manage metadata documentation for fields in Tableau data sources more efficiently, I need the ability to read the information in the element for a column so that I can write that information to a database. Conversely, I need to be able to write information back out to this element after reading it from a database. I will also need to be able to determine what table a field comes from. Below is an example of the data I want to be able to read/write.
<column datatype='string' name='[STATUS]' role='dimension' type='nominal'>
  <desc>
    <formatted-text>
      <run bold='true' fontsize='10'>Case Mix Index VAL</run>
      <run fontcolor='#686868'> The MSDRG weight for inpatients with charges > 0. Does not include inpatient rehab, normal newborns or MSDRG 999.</run>
      <run bold='true' fontcolor='#297a98'> HSP_ACCT_MULT_DRGS.DRG_WEIGHT (HAR 651)</run>
    </formatted-text>
  </desc>
</column>
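A minimal sketch of that round trip on the raw XML, using only the standard library. The helper names are hypothetical, the sample is a shortened version of the column above, and writing back produces a single unstyled run, so the original formatting attributes are lost.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example column above.
COLUMN_XML = """
<column datatype='string' name='[STATUS]' role='dimension' type='nominal'>
  <desc>
    <formatted-text>
      <run bold='true' fontsize='10'>Case Mix Index VAL</run>
      <run fontcolor='#686868'> The MSDRG weight.</run>
    </formatted-text>
  </desc>
</column>
"""


def read_description(column):
    """Concatenate the text of all runs in the column's description."""
    runs = column.findall('./desc/formatted-text/run')
    return ''.join(run.text or '' for run in runs)


def write_description(column, text):
    """Replace the column's description with one plain, unstyled run."""
    existing = column.find('desc')
    if existing is not None:
        column.remove(existing)
    desc = ET.SubElement(column, 'desc')
    formatted = ET.SubElement(desc, 'formatted-text')
    run = ET.SubElement(formatted, 'run')
    run.text = text


column = ET.fromstring(COLUMN_XML)
print(read_description(column))
write_description(column, 'Updated from the metadata database')
print(read_description(column))
```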
Certain datasources allow initial SQL to be passed. Being able to modify it programmatically would be extremely useful.
Enable updating look & feel of workbook
/cc @DataRoberts
I am trying to open a simple .tds file to check the field data types and count, as in the sample script, but after the script prints the total number of fields and the first field's data type, it exits with this error:
if field.description:
AttributeError: 'Field' object has no attribute 'description'
Make connection plural and always return a list of them.
It would be very useful to be able to create workbook or data source files, not only modify existing ones.
Refactor things to clean this up
(From my HEAD, which includes #62)
Name                                      Stmts   Miss  Cover
-------------------------------------------------------------
tableaudocumentapi/__init__.py                6      0   100%
tableaudocumentapi/connection.py             29      1    97%
tableaudocumentapi/datasource.py             85      0   100%
tableaudocumentapi/field.py                  83      7    92%
tableaudocumentapi/multilookup_dict.py       45     10    78%
tableaudocumentapi/workbook.py               55      2    96%
tableaudocumentapi/xfile.py                  65      3    95%
test/__init__.py                              2      0   100%
test/bvt.py                                 208      1    99%
test/test_datasource.py                      64      0   100%
test/test_multidict.py                       38      1    97%
test/test_workbook.py                         9      0   100%
(via beta feedback)
Replace data source -- we have QA and prod data sources and it would be good to be able to change the data source a worksheet uses programmatically
https://help.github.com/articles/configuring-pull-request-merge-squashing/
Is this something we want to do for the document-api repo?
To accomplish this I was getting feedback on PRs until they were done, and then squashing and resubmitting in a new PR, but this seems like a much cleaner way to keep the discussion in one place?
As a first step towards automatically generating a datasource from scratch, I require the ability to create -elements.
I'm trying to figure out how to replace an extract in a packaged workbook with a different extract that has the same table definition. I can use the Tableau SDK to create a new .tde, but it looks like there aren't python methods to replace an extract in a packaged workbook.
I'm imagining this would look something like this:
from tableaudocumentapi import Workbook
sourceWB = Workbook('workbook.twbx')
sourceWB.datasources[0].connections[0].extract = "MY-NEW-EXTRACT.tde"
# OR: sourceWB.datasources[0].connections[0].extract.package('new-extract.tde')
sourceWB.save_as('new_workbook.twbx')
I can replace an extract in a .twb file by hacking the xml and changing references, but the "save_as" method doesn't actually package the workbook so that the connection is no longer dependent on a .tde file.
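As a stopgap outside the Document API, the packaged file can be rewritten as a plain ZIP archive. The sketch below copies a package and swaps the bytes of one member while keeping its name, so references inside the .twb do not need to change. The function `replace_member`, the member paths, and the in-memory stand-in package are all illustrative assumptions.

```python
import io
import zipfile


def replace_member(source, target, member_name, new_bytes):
    """Copy a zip package to target, swapping the contents of one member."""
    replaced = False
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, 'w', zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == member_name:
                dst.writestr(info.filename, new_bytes)
                replaced = True
            else:
                # Reuse the original ZipInfo so names and metadata carry over.
                dst.writestr(info, src.read(info.filename))
    if not replaced:
        raise KeyError(member_name)
    return target


# Build a tiny stand-in package in memory (not a real workbook or extract).
original = io.BytesIO()
with zipfile.ZipFile(original, 'w') as archive:
    archive.writestr('demo.twb', '<workbook />')
    archive.writestr('Data/Extracts/demo.tde', b'old extract bytes')

result = replace_member(original, io.BytesIO(),
                        'Data/Extracts/demo.tde', b'new extract bytes')
with zipfile.ZipFile(result) as check:
    print(check.namelist())
```

With real files, `source` and `target` would be paths to the old and new .twbx, and `new_bytes` the contents of the new .tde.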
Enable updating images. Handle base64 encoding.
/cc @DataRoberts
From conversations with @t8y8, we need the ability to get the list of columns in a datasource.
(via beta feedback)
Populate parameter values -- we have some workbooks where we have to refresh the list of values each parameter has. This would allow us to implement a poor man's "dynamic parameter" that can be populated from a DB table
Certain datasources have query bands, and it would be extremely helpful to be able to modify these programmatically.
When I try to load a workbook in which some field descriptions contain accented characters (like Ñ or á), it throws a UnicodeDecodeError.
I will now submit a PR that provides a solution for Python 2.