tableau / document-api-python

Create and modify Tableau workbook and datasource files

Home Page: https://tableau.github.io/document-api-python/

License: MIT License

Python 99.84% Shell 0.16%

document-api-python's Issues

Connection won't work for federated data sources

All connections in 10.0+ are now of type 'federated', which breaks the current introspection of dbname, server, and username.

Example:

  <datasources>
    <datasource caption='xy (testv1)' inline='true' name='federated.02estec0ldx7s41fmcg871k5lvm7' version='10.0'>
      <connection class='federated'>
        <named-connections>
          <named-connection caption='mysql55.test.tsi.lan' name='mysql.1ioglmu0aggmqh1bbib8i1r6hs2f'>
            <connection class='mysql' dbname='testv1' odbc-native-protocol='yes' port='3306' server='mysql55.test.tsi.lan' source-charset='' username='test' />
          </named-connection>
        </named-connections>
      </connection>
    </datasource>
  </datasources>
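
One possible way to handle this, sketched with only the standard library: look under named-connections when the top-level connection is 'federated', and fall back to the top-level connection otherwise. The helper name iter_physical_connections is hypothetical and is not part of the library.

```python
import xml.etree.ElementTree as ET

# The example from this issue, embedded so the sketch is self-contained.
SAMPLE = """
<datasources>
  <datasource caption='xy (testv1)' inline='true' name='federated.02estec0ldx7s41fmcg871k5lvm7' version='10.0'>
    <connection class='federated'>
      <named-connections>
        <named-connection caption='mysql55.test.tsi.lan' name='mysql.1ioglmu0aggmqh1bbib8i1r6hs2f'>
          <connection class='mysql' dbname='testv1' odbc-native-protocol='yes' port='3306' server='mysql55.test.tsi.lan' source-charset='' username='test' />
        </named-connection>
      </named-connections>
    </connection>
  </datasource>
</datasources>
"""


def iter_physical_connections(datasource):
    """Return the connection elements that carry server/dbname/username.

    Hypothetical helper: for a federated datasource these are nested under
    named-connections; for older files the top-level connection is used.
    """
    top = datasource.find('connection')
    if top is None:
        return []
    if top.get('class') == 'federated':
        return top.findall('./named-connections/named-connection/connection')
    return [top]


root = ET.fromstring(SAMPLE)
connections = iter_physical_connections(root.find('datasource'))
for conn in connections:
    print(conn.get('class'), conn.get('server'), conn.get('dbname'), conn.get('username'))
```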

Support for Other Languages

(copied from developer forum post)

Java would be the next option, as it has strong support via JAXB for schema-validated XML. On the other hand, a pure JavaScript implementation would open up some interesting possibilities as well.

Remodeling the object models

To better support all of the required features, I am splitting the logical model from the physical model (in the hope that the physical model can be generated from XSD). This is ongoing work, but I wanted an issue to track it while I continue working on it.

I'm hoping to have it done soon though.

Build-A-Sheet Workshop

(copied from developer forum post)

Another use case for the Document API is the creation of new worksheets and dashboards programmatically. This has actually been done before for and by Tableau customers but not widely publicized in the community. I could envision a fluent API for sheet creation.

I'd love to be able to do something like the Voyager project (Data Voyager) and turn a TDS into a TWB by creating all the vizzes that make sense for the dataset. This would mean a lot of "XML hacking," though, even with the aforementioned SDK. It'd be great if we could say "sheet.setDatasource(xyzDatasource).onColumns('some dimension').onColumns('some measure', 'running_sum').onRows('some dimension').asBarChart();" or similar.
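
To make the fluent idea concrete, here is a minimal Python sketch of such a builder. Everything in it (the SheetBuilder class, its method names, and the dictionary it produces) is hypothetical; it only collects a sheet description as plain data and does not generate any Tableau XML.

```python
class SheetBuilder:
    """Hypothetical fluent builder that records a sheet spec as plain data."""

    def __init__(self, name):
        self.spec = {'name': name, 'datasource': None,
                     'columns': [], 'rows': [], 'mark': None}

    def set_datasource(self, datasource_name):
        self.spec['datasource'] = datasource_name
        return self  # returning self is what allows chaining

    def on_columns(self, field, calculation=None):
        self.spec['columns'].append((field, calculation))
        return self

    def on_rows(self, field, calculation=None):
        self.spec['rows'].append((field, calculation))
        return self

    def as_bar_chart(self):
        # Terminal call: fix the mark type and hand back the finished spec.
        self.spec['mark'] = 'bar'
        return self.spec


sheet = (SheetBuilder('Sheet 1')
         .set_datasource('xyzDatasource')
         .on_columns('some dimension')
         .on_columns('some measure', 'running_sum')
         .on_rows('some dimension')
         .as_bar_chart())
print(sheet)
```

Translating such a spec into worksheet XML would be the hard part and is not attempted here.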

Update Project Organization

I think we should take another look at how the code is currently organized and see if we can't implement a few suggestions from the community (http://docs.python-guide.org/en/latest/writing/structure/#structure-of-the-repository is a great guide).

Off the top of my head, things I found odd:

  • There was no tests file or folder (But I added one :) )
  • Document API is a high level folder that I think can un-exist, so that setup.py is at the top level of the git repo
  • tableaudocumentapi should be tableau-document-api for readability
  • Do we want a different file per class for each of the models?
    -- An alternative might be to have them all in a models.py file, or at least group them into a models subpackage so it becomes 'from models.connection import Connection' etc.
  • Examples aren't runnable from a git clone -- you have to install first, it'd be nice if we moved the structure around so that we could just run them... we could do sys.path manipulation (ugly, but works), or something similar

Thoughts?

KeyError throwing when opening workbook

Happens in the latest commit on the development branch

Traceback (most recent call last):
  File "party.py", line 4, in <module>
    wb = Workbook('Contact Us Form - Tasks.twbx')
  File "/Users/tdoyle/Documents/py/envs/party/lib/python3.5/site-packages/tableaudocumentapi/workbook.py", line 49, in __init__
    self._workbookRoot, self._datasource_index
  File "/Users/tdoyle/Documents/py/envs/party/lib/python3.5/site-packages/tableaudocumentapi/workbook.py", line 148, in _prepare_worksheets
    datasource.fields[column_name].add_used_in(worksheet_name)
  File "/Users/tdoyle/Documents/py/envs/party/lib/python3.5/site-packages/tableaudocumentapi/multilookup_dict.py", line 66, in __getitem__
    return dict.__getitem__(self, key)
KeyError: '[Account]'

Sending workbook via email since it's internal and I can't just toss it to GH :)

Publish to PyPI

We need to get everything into PyPI so that it's as easy as "pip install document-api-python" to get going.

We'll also need to update the README with the preferred installation steps.

API Documentation

Opening an issue to track -- do we want to try and document the API reference in the README, in a dedicated docs folder, on some hosted site (readthedocs.org), or on the tableau developer site?

Right now we've got decent doc strings for most public functions and fairly clear function names -- but as it grows it'd be good to know where we want to take the docs.

Enable connection class retargeting

Take an existing TDS file (say, for a PostgreSQL DB) and change its connection to a MySQL/SQL Server/etc. DB connection by changing the connection attributes: class (from class='postgres' to class='mysql'), server name, dbname, port, username, auth-mode, sslmode, schema, single-node, workgroup-auth-mode.
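
A rough sketch of what retargeting could look like at the XML level, using only the standard library. The retarget function and the sample attribute values are made up for illustration; a real file may need more than attribute changes (relations, quoting, data types), so treat this as a starting point only.

```python
import xml.etree.ElementTree as ET

# Made-up minimal datasource; real TDS files contain much more.
TDS = ("<datasource>"
       "<connection class='postgres' dbname='sales' port='5432' "
       "server='pg.example.com' sslmode='require' username='report' />"
       "</datasource>")


def retarget(connection, new_class, set_attrs=None, drop_attrs=()):
    """Hypothetical helper: switch the class and adjust related attributes."""
    connection.set('class', new_class)
    for name in drop_attrs:
        connection.attrib.pop(name, None)  # ignore attributes that are absent
    for name, value in (set_attrs or {}).items():
        connection.set(name, value)


root = ET.fromstring(TDS)
conn = root.find('connection')
retarget(conn, 'mysql',
         set_attrs={'server': 'mysql.example.com', 'port': '3306'},
         drop_attrs=('sslmode',))
print(ET.tostring(root, encoding='unicode'))
```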

Tabcmd publish with .twb created via Document API

I can successfully create a .twb file via the Document API, but attempting to publish it to my Tableau Server via Tabcmd results in an unexpected error:

Bad request
unexpected error occurred opening the packaged workbook.

Attached is the template workbook created in Tableau Desktop (superstore_sales.twb) and one of the workbooks created from that template via the Document API (superstore_sales_arizona.twb)

superstore_twbs.zip

Validation at the document level and at the "Tableau" level

(copied from developer forum post)

Validate that the document is correct (valid XML) and matches the known format of the Tableau document for the target product version, then validate the document in terms of acceptable Tableau content (are the data connection class types valid, are the columns added to the rows/columns shelves actually present in the datasource, etc.). In other words, both syntactic and semantic validation.
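
The two levels could be sketched roughly as follows. The validate function is hypothetical, and KNOWN_CLASSES is an illustrative placeholder set, not an authoritative list of Tableau connection classes. Schema validation against a per-version format is left out.

```python
import xml.etree.ElementTree as ET

# Assumption: a tiny placeholder set, for illustration only.
KNOWN_CLASSES = {'federated', 'mysql', 'postgres', 'sqlserver'}


def validate(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    # Syntactic level: is it well-formed XML at all?
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ['not well-formed XML: {}'.format(exc)]

    # Semantic level: are the connection classes ones we recognize?
    errors = []
    for conn in root.iter('connection'):
        cls = conn.get('class')
        if cls not in KNOWN_CLASSES:
            errors.append('unknown connection class: {!r}'.format(cls))
    return errors


print(validate("<datasource><connection class='mysql' /></datasource>"))
print(validate("<datasource><connection class='nope' /></datasource>"))
print(validate("<datasource>"))
```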

Add port attribute to Connection object

Can we add the port attribute to Connection? This might be useful before the full document API is implemented. Some datasource changes need to change the port in addition to the server, etc. For example, migrating from AWS RDS Postgres to Redshift might mean changing the port from 5432 to 5439. A change of database variant could require other changes of course, such as the schema in the relation.

Add a Contributing.md file

Most projects that accept contributors include a contributing.md file which outlines the processes to follow for contributing to the project. While we have the greater tableau.github.io that includes much of that information, calling out procedures locally would be useful for discoverability. We should also include a link back to tableau.github.io's contributing page.

Updating field values and adding aliases/calculated fields

Hello everyone,

I have the following requirements:

  • The ability to modify the caption, datatype, role and type of a field (aka ) inside a .tds-file.
  • The ability to modify and add aliases (the kind of alias) to a given field inside a .tds-file.
  • The ability to modify and add calculated fields inside a .tds-file.

Since the current version of document-api-python doesn't seem to support this, I'm in the process of forking this repository and implementing those features. I have currently implemented a first version of requirements 1 and 2, and will probably have time for the calculated fields and polishing (aka tests) at some point next week.

Changes are currently in the branch "feature_modify_fields" in my fork. I'd love to hear some feedback about the implementation and the general direction I'm going with this fork. If there is any interest in those features, I'd be happy to start the internal process of getting the CLA underway.

Greetings from Germany

Updating Hortonworks (Hadoop) connection attributes

Hi,

I’m interested in using the Document API to update the connection type for .twbx files. I would like to update the following fields for a Hortonworks Hadoop Hive connection type:
• Server Name
• Authentication
• Realm
• Host FQDN
• Service Name

If this is not possible with the current version, I would like to submit this as a feature request. These are only the fields that I need, but all field options on the Hortonworks connector would be helpful as well.

It would be great if I could do this not only locally, but also with a workbook that is published on Tableau Server. The Tableau Server UI doesn’t allow me to edit the authentication type for a data source and I haven’t seen a way to do this with tabcmd.

Also, I tried to access the Examples link at the bottom of the Github page (https://github.com/tableau/document-api-python) and it seems like the link is dead.

Thanks for your help!

Alex

Datasource needs to support changing the caption which is not currently supported.

I'm changing a workbook's datasource and then republishing to Server. However, the Data Source Name in Vizportal shows the old name. I've tried changing the datasource.caption to no avail. The only other place the ds name shows up in my workbook's XML is in worksheets.table.view.datasources:

<table>
        <view>
          <datasources>
            <datasource caption='<OLD CAPTION DATASOURCE>' name='<DS NAME>' />
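
As a workaround sketch until the library supports this, the caption could be updated on every datasource element that shares the same name, including the references under worksheets. The sample workbook and the set_caption_everywhere helper are made up; whether Server picks up the new name from these attributes alone is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up minimal workbook with one datasource referenced by one worksheet.
TWB = """
<workbook>
  <datasources>
    <datasource caption='Old Caption' name='ds1' />
  </datasources>
  <worksheets>
    <worksheet name='Sheet 1'>
      <table>
        <view>
          <datasources>
            <datasource caption='Old Caption' name='ds1' />
          </datasources>
        </view>
      </table>
    </worksheet>
  </worksheets>
</workbook>
"""


def set_caption_everywhere(root, ds_name, new_caption):
    """Hypothetical helper: update the caption on every matching element."""
    count = 0
    for ds in root.iter('datasource'):
        if ds.get('name') == ds_name:
            ds.set('caption', new_caption)
            count += 1
    return count


root = ET.fromstring(TWB)
updated = set_caption_everywhere(root, 'ds1', 'New Caption')
print(updated, 'datasource elements updated')
```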

Change database table in a workbook

I've run the sample code to change the database in a workbook. I can change the Server name, the Database name, the User name, and the Port, but there does not seem to be any way to change the database table. This is what I can do.
sourceWB.datasources[i].connections[0].server = 'myServer'
sourceWB.datasources[i].connections[0].dbname = 'myDbName'
sourceWB.datasources[i].connections[0].username = 'myUserName'

What I'd like to be able to do is specify a new database table as well. Of course this is a little more complicated if I'm joining multiple tables but all I need is to be able to iterate through the tables in each connection and change them. I don't need to be able to change the joins.
I want to do something like this:

for i in range(0, len(sourceWB.datasources[0].connections[0].tables)):
      sourceWB.datasources[0].connections[0].tables[i] = 'tableName'
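
Until something like a tables collection exists, one option is to edit the XML directly. The sketch below assumes that tables appear as relation elements with type='table' and a table attribute; that layout, the sample XML, and the rename_tables helper are assumptions for illustration, so check them against your own file first.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the relation layout is an assumption about the file format.
XML = """
<datasource>
  <connection class='mysql' dbname='myDbName' server='myServer' username='myUserName'>
    <relation name='orders' table='[orders]' type='table' />
  </connection>
</datasource>
"""


def rename_tables(connection, mapping):
    """Hypothetical helper: swap table references according to mapping."""
    changed = 0
    for relation in connection.iter('relation'):
        if relation.get('type') != 'table':
            continue  # leave joins and other relation types untouched
        old = relation.get('table')
        if old in mapping:
            relation.set('table', mapping[old])
            changed += 1
    return changed


root = ET.fromstring(XML)
conn = root.find('connection')
changed = rename_tables(conn, {'[orders]': '[orders_archive]'})
print(ET.tostring(root, encoding='unicode'))
```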

Support for Files in Packaged Files

(copied from developer forum post)

Being able to tie the curiously named "Tableau SDK" aka Extract API into this project would help us complete the circle of offering data => TWBX capabilities for customers. To that end, we'd need to be able to manipulate the packaged files through the API, even though this just means managing files in a ZIP. This is actually slightly more complicated, though, as we need to make sure the references in the TWB/TDS stay correct even as the files are replaced.

The files in packaged workbooks include: Images, Data, Shadow Extracts (though I have no idea what we'd do with those), ummm...and I guess you could lump Shapes into that list (though those are base64-encoded and included in the TWB itself). If those were addressable through the API, it'd be a real timesaver.

Versioning

(copied from developer forum post)

The ability to target a specific version of the software is a must, especially for OEM customers. Therefore the API should expose different interfaces based on the version of the workbook / datasource we're targeting.

Exception when building xpath query

I managed to trigger this when downloading workbooks from alpo.

This was done against the HEAD of a local copy of #45

Traceback (most recent call last):
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 263, in iterfind
    selector = _cache[cache_key]
KeyError: (".//metadata-record[@Class='column'][local-name='[Today's Date]']", None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "runner.py", line 30, in
    fields = wb_model.datasources[0].fields
  File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 135, in fields
    self._fields = self._get_all_fields()
  File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 141, in _get_all_fields
    return collections.OrderedDict([(k, v) for k, v in column_objects])
  File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 141, in
    return collections.OrderedDict([(k, v) for k, v in column_objects])
  File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 140, in
    for xml in self._datasourceTree.findall('.//column'))
  File "/Users/tdoyle/Documents/py/document-api-python/tableaudocumentapi/datasource.py", line 17, in _mapping_from_xml
    metadata_record = root_xml.find(xpath)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 649, in find
    return self._root.find(path, namespaces)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 298, in find
    return next(iterfind(elem, path, namespaces), None)
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 277, in iterfind
    selector.append(ops[token[0]](next, token))
  File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementPath.py", line 233, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate
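
The predicate appears to fail because the field name itself contains an apostrophe ([Today's Date]), which ends the quoted string early. One way around it, sketched below, is to skip the string-built predicate and compare the text in Python instead. The sample XML and the find_metadata_record helper are made up for illustration and are not the fix that landed in the library.

```python
import xml.etree.ElementTree as ET

# Made-up sample with a field name that contains an apostrophe.
XML = """
<datasource>
  <metadata-record class='column'>
    <local-name>[Today's Date]</local-name>
    <remote-name>today</remote-name>
  </metadata-record>
  <metadata-record class='column'>
    <local-name>[Account]</local-name>
    <remote-name>account</remote-name>
  </metadata-record>
</datasource>
"""


def find_metadata_record(root, local_name):
    """Hypothetical helper: match on local-name without building an XPath
    predicate, so quotes in the field name cannot break the query."""
    for record in root.iter('metadata-record'):
        if (record.get('class') == 'column'
                and record.findtext('local-name') == local_name):
            return record
    return None


root = ET.fromstring(XML)
record = find_metadata_record(root, "[Today's Date]")
print(record.findtext('remote-name'))
```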

Need the ability to read from and write to the <desc> element within .tds files

To manage metadata documentation about fields in Tableau data sources more efficiently, I need the ability to read the information in the <desc> element for a column so that I can write that information to a database. Conversely, I need to be able to write information back out to this element after reading the information stored in a database. I will also need to be able to determine what table a field comes from. Below is an example of the data I want to be able to read/write.

  <column datatype='string' name='[STATUS]' role='dimension' type='nominal'>
    <desc>
      <formatted-text>
        <run bold='true' fontsize='10'>Case Mix Index VAL</run>
        <run fontcolor='#686868'>&#10;The MSDRG weight for inpatients with charges &gt; 0. Does not include inpatient rehab, normal&#10;newborns or MSDRG 999.</run>
        <run bold='true' fontcolor='#297a98'>&#10;&#10;HSP_ACCT_MULT_DRGS.DRG_WEIGHT (HAR 651)</run>
      </formatted-text>
    </desc>
  </column>
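
A minimal sketch of reading and writing this element with the standard library, using the column above as input. The read_description and write_description helpers are hypothetical. Reading joins the text of all run elements; writing replaces them with a single unformatted run, so bold and color attributes are lost.

```python
import xml.etree.ElementTree as ET

XML = """
<column datatype='string' name='[STATUS]' role='dimension' type='nominal'>
  <desc>
    <formatted-text>
      <run bold='true' fontsize='10'>Case Mix Index VAL</run>
      <run fontcolor='#686868'>&#10;The MSDRG weight for inpatients with charges &gt; 0. Does not include inpatient rehab, normal&#10;newborns or MSDRG 999.</run>
      <run bold='true' fontcolor='#297a98'>&#10;&#10;HSP_ACCT_MULT_DRGS.DRG_WEIGHT (HAR 651)</run>
    </formatted-text>
  </desc>
</column>
"""


def read_description(column):
    """Return the plain text of the column's description, or ''."""
    runs = column.findall('./desc/formatted-text/run')
    return ''.join(run.text or '' for run in runs)


def write_description(column, text):
    """Replace the description with one unformatted run (formatting is lost)."""
    desc = column.find('desc')
    if desc is None:
        desc = ET.SubElement(column, 'desc')
    for child in list(desc):
        desc.remove(child)
    formatted = ET.SubElement(desc, 'formatted-text')
    ET.SubElement(formatted, 'run').text = text


column = ET.fromstring(XML)
description = read_description(column)
print(description)
```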

Coverage Report

(From my HEAD, which includes #62)

Name                                   Stmts   Miss  Cover
----------------------------------------------------------
tableaudocumentapi/__init__.py             6      0   100%
tableaudocumentapi/connection.py          29      1    97%
tableaudocumentapi/datasource.py          85      0   100%
tableaudocumentapi/field.py               83      7    92%
tableaudocumentapi/multilookup_dict.py    45     10    78%
tableaudocumentapi/workbook.py            55      2    96%
tableaudocumentapi/xfile.py               65      3    95%
test/__init__.py                           2      0   100%
test/bvt.py                              208      1    99%
test/test_datasource.py                   64      0   100%
test/test_multidict.py                    38      1    97%
test/test_workbook.py                      9      0   100%

Replace Datasource

(via beta feedback)

Replace data source -- we have QA and prod data sources and it would be good to be able to change the data source a worksheet uses programmatically

Replace Extract in Packaged Workbook

I'm trying to figure out how to replace an extract in a packaged workbook with a different extract that has the same table definition. I can use the Tableau SDK to create a new .tde, but it looks like there aren't python methods to replace an extract in a packaged workbook.

I'm imagining this would look something like this:

from tableaudocumentapi import Workbook

sourceWB = Workbook('workbook.twbx')
sourceWB.datasources[0].connections[0].extract = "MY-NEW-EXTRACT.tde"
#OR sourceWB.datasources[0].connections[0].extract.package('new-extract.tde')

sourceWB.save_as('new_workbook.twbx')

I can replace an extract in a .twb file by hacking the xml and changing references, but the "save_as" method doesn't actually package the workbook so that the connection is no longer dependent on a .tde file.
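
Since a packaged workbook is a ZIP archive, a stopgap is to copy the archive and swap the extract's bytes while keeping the member name the same, so the references inside the .twb stay valid. The replace_member helper and the member path used in the demo are made up; look up the real extract path inside your own .twbx before trying this.

```python
import os
import tempfile
import zipfile


def replace_member(src_path, dst_path, member_name, new_bytes):
    """Copy a ZIP archive to dst_path, replacing one member's contents."""
    replaced = False
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, 'w', zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == member_name:
                dst.writestr(info.filename, new_bytes)
                replaced = True
            else:
                dst.writestr(info.filename, src.read(info.filename))
    return replaced


# Demo with a throwaway archive standing in for a real .twbx.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'workbook.twbx')
dst = os.path.join(workdir, 'new_workbook.twbx')
with zipfile.ZipFile(src, 'w') as zf:
    zf.writestr('workbook.twb', '<workbook />')
    zf.writestr('Data/Extracts/extract.tde', b'old extract bytes')

replaced = replace_member(src, dst, 'Data/Extracts/extract.tde', b'new extract bytes')
with zipfile.ZipFile(dst) as zf:
    names = zf.namelist()
    new_data = zf.read('Data/Extracts/extract.tde')
print(replaced, names)
```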

Get Columns

From conversations with @t8y8 , we need the ability to get the list of columns in a datasource.
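
As a rough illustration of what "get columns" could return, the sketch below lists every column element in a datasource with the standard library. The sample XML and the list_columns helper are made up and do not describe the library's eventual API.

```python
import xml.etree.ElementTree as ET

# Made-up minimal datasource with two columns.
TDS = """
<datasource>
  <column datatype='string' name='[STATUS]' role='dimension' type='nominal' />
  <column datatype='integer' name='[Account]' role='measure' type='quantitative' />
</datasource>
"""


def list_columns(root):
    """Return (name, datatype, role) for every column element."""
    return [(col.get('name'), col.get('datatype'), col.get('role'))
            for col in root.findall('.//column')]


columns = list_columns(ET.fromstring(TDS))
for name, datatype, role in columns:
    print(name, datatype, role)
```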

Parameter Values

(via beta feedback)

Populate parameter values -- we have some workbooks where we have to refresh the list of values each parameter has. This would allow us to implement a poor man's "dynamic parameter" that can be populated from a DB table
