lbnl-eta / adapter

The Adapter software is developed at the Energy Efficiency Standards Department and it provides a convenient data table loader from various formats such as xlsx, csv, db (sqlite database), and sqlalchemy. Its main feature is the ability to convert data tables identified in one main and optionally one or more additional input files into database tables and Pandas DataFrames for downstream usage in any compatible software.

Python 100.00%

adapter's People

Contributors

0xd5dc, evaneill, lyralan, milicag, rhosbach, taburke

Forkers

lyralan

adapter's Issues

Add example to readme on how to handle multiple platforms

Add an example below the current similar examples:

To automatically convert paths between platforms, for example if you are using a VPN connection to access input data files, use the os_mapping argument:

from adapter.i_o import IO

input_loader = IO(
    <fullpath_to_the_main_input_file>,
    os_mapping={'win32': 'C:', 'darwin': '/Volumes/A', 'linux': '/media/A'},
)
data = input_loader.load()
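
To illustrate what such a mapping implies, here is a minimal sketch of prefix translation between platforms. It is not Adapter's implementation: the function name `convert_path` and its matching rules are assumptions made for illustration, and only the mapping dictionary comes from the example above.

```python
import sys


def convert_path(path, os_mapping, platform=None):
    """Swap a known platform root in `path` for the root of `platform`.

    Hypothetical helper: Adapter's actual conversion logic may differ.
    """
    platform = platform or sys.platform
    target_root = os_mapping[platform]  # KeyError if the platform is unmapped
    sep = "\\" if platform == "win32" else "/"

    # Try longer roots first so that a short root cannot shadow a longer one.
    for root in sorted(os_mapping.values(), key=len, reverse=True):
        if not path.startswith(root):
            continue
        rest = path[len(root):]
        # Require a separator right after the root ("/media/AB" is not "/media/A").
        if rest and rest[0] not in "\\/":
            continue
        parts = [p for p in rest.replace("\\", "/").split("/") if p]
        return sep.join([target_root.rstrip("\\/")] + parts)

    return path  # no known root found, leave the path unchanged


mapping = {"win32": "C:", "darwin": "/Volumes/A", "linux": "/media/A"}
linux_path = convert_path(r"C:\data\inputs\main.xlsx", mapping, platform="linux")
```

Passing `platform` explicitly makes the conversion testable on any machine; by default the sketch falls back to `sys.platform`.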

xlwings incompatible with Linux

Unfortunately, xlwings could not run successfully on Linux because of missing dependencies:
ModuleNotFoundError: No module named 'aem'

29_openpyxl_keep_vba

keep_vba is set to True in Adapter's Excel class (line), which invokes some zipfile methods and raises a ValueError in a new connector PR (link).

To solve this issue, just change keep_vba to False, which is openpyxl's default. As far as I know, we only read non-visual data from Excel when running Python tools (we don't normally have images in input files), so setting keep_vba to True has no advantage over False. Could @0xd5dc set this in the currently open PR, since it's just a one-line change? I can also make another PR if we treat it as a separate problem.

PS: keep_vba controls whether any Visual Basic elements (images and charts) are preserved; by default they are not. If preserved, they are not editable (source).

win path issue X: vs X:\

import os

dir1 = os.path.join(
    "X:",  # will get converted for a given OS
    "First_Level",
    "Second_Level",
    "Third_Level",
    "input",
)
# On Windows, dir1 is r"X:First_Level\Second_Level\Third_Level\input"
# (no backslash after the drive letter).
dir2 = r"X:\First_Level\Second_Level\Third_Level\input"

dir1 and dir2 are not equal.
Adapter should handle both cases.
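
One possible way to treat both spellings alike is sketched below. It uses the standard-library `ntpath` module so the behaviour can be reproduced on any OS. The helper name `ensure_drive_root` is hypothetical, and the sketch assumes that a drive-relative path such as `X:First_Level` is always meant as `X:\First_Level`.

```python
import ntpath  # Windows path semantics, importable on every platform


def ensure_drive_root(path):
    """Insert the missing backslash after a drive letter, if needed.

    Hypothetical helper, not part of Adapter. On Windows "X:folder" strictly
    means "folder relative to the current directory on drive X"; this sketch
    assumes the caller always intends the drive root instead.
    """
    drive, tail = ntpath.splitdrive(path)
    if drive and tail and not tail.startswith(("\\", "/")):
        tail = "\\" + tail
    return ntpath.normpath(drive + tail)


# ntpath.join reproduces what os.path.join does on Windows.
dir1 = ntpath.join("X:", "First_Level", "Second_Level", "Third_Level", "input")
dir2 = r"X:\First_Level\Second_Level\Third_Level\input"
same = ensure_drive_root(dir1) == ensure_drive_root(dir2)
```

Normalising both inputs before comparing or mapping them keeps the rest of the path handling unaware of which spelling the user chose.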

Simplify `comm.tools.user_select_file()` method

The comm.tools.user_select_file() method is unnecessarily complex, and it relies on pywin32 on Windows. The same functionality can be achieved with the tkinter library (as is already implemented for the OSX platform).

Using tkinter should make this method more robust and maintainable in the future.
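
A tkinter-only version could look roughly like the sketch below. The signature is an assumption and may not match the current `user_select_file()`; the `ask` parameter is a hypothetical hook added only so the function can be exercised without opening a real dialog.

```python
def user_select_file(title="Select input file", initialdir=None, ask=None):
    """Prompt for a file and return its path, or None if the user cancels.

    Sketch only: the parameters are assumptions, not the existing signature.
    `ask` can replace the dialog with any callable (useful in tests).
    """
    options = {"title": title}
    if initialdir is not None:
        options["initialdir"] = initialdir

    if ask is not None:
        path = ask(**options)
    else:
        # Imported lazily so headless environments can still import the module.
        import tkinter
        from tkinter import filedialog

        root = tkinter.Tk()
        root.withdraw()  # hide the empty main window behind the dialog
        try:
            path = filedialog.askopenfilename(parent=root, **options)
        finally:
            root.destroy()

    # A cancelled dialog yields an empty value; report that as None.
    return path or None
```

Because tkinter ships with most Python installations, this removes the platform branch and the pywin32 import altogether.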

Reduce Excessive Logging Output

The current logging implementation in our project generates an excessive amount of log messages, leading to log files becoming cluttered and difficult to analyze. This issue aims to address this problem by implementing a more streamlined and efficient logging strategy.
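
As one possible starting point, the sketch below raises the threshold of a package logger and drops immediately repeated messages. The logger name `"adapter"` and both helper names are assumptions; the issue does not say which loggers or messages cause the clutter.

```python
import logging


class DuplicateFilter(logging.Filter):
    """Drop a record that repeats the previous level and message."""

    def __init__(self):
        super().__init__()
        self._last = None

    def filter(self, record):
        key = (record.levelno, record.getMessage())
        if key == self._last:
            return False  # same as the previous record: suppress it
        self._last = key
        return True


def configure_quiet_logging(logger_name="adapter", level=logging.WARNING):
    """Raise the threshold and de-duplicate consecutive messages.

    Note: a filter attached to a logger only sees records logged directly
    on that logger, not records propagated from child loggers.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    if not any(isinstance(f, DuplicateFilter) for f in logger.filters):
        logger.addFilter(DuplicateFilter())
    return logger
```

Downstream tools that need the verbose output back can lower the level again with `logging.getLogger("adapter").setLevel(logging.DEBUG)`.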

reformat folder name with timestamp

Use the adapter to generate folder names with a timestamp in two formats: a short format (the default) and a long format.
For example:

long format                                  short format
prefix_2022_07_25-13h_50m                    abbr.ver._220725_1350
product_version_branch_2022_08_02_14h_01m    p100_220802_1401
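
The two formats can be sketched with `datetime.strftime`. The function name `timestamped_name` and the `style` parameter are hypothetical, and the long pattern follows the first example row (a hyphen between date and time).

```python
from datetime import datetime

# strftime patterns inferred from the examples in this issue.
FORMATS = {
    "short": "%y%m%d_%H%M",       # e.g. 220725_1350
    "long": "%Y_%m_%d-%Hh_%Mm",   # e.g. 2022_07_25-13h_50m
}


def timestamped_name(prefix, when=None, style="short"):
    """Return `prefix` followed by a timestamp in the requested style."""
    when = when or datetime.now()
    return f"{prefix}_{when.strftime(FORMATS[style])}"


stamp = datetime(2022, 7, 25, 13, 50)
short_name = timestamped_name("p100", stamp)                # short is the default
long_name = timestamped_name("prefix", stamp, style="long")
```

Accepting the timestamp as an argument keeps the function deterministic in tests, while callers that omit it get the current time.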

Update user_select_file to remove unnecessary dependencies

user_select_file tries to import either win32ui, win32con, or tkinter, depending on sys.platform. The win32ui and win32con imports seem to induce a dependency on pywin32==225, which I see included in the setup.py of other tools that use this function, and that is undesirable. It appears that using later versions of pywin32 can result in an error on Windows.

I imagine it is possible nowadays to find a single package (native or otherwise) that supports file prompt dialogs across OSX, Windows, and Linux.

This is low priority because it works as-is, but if managing dependencies for these imports is already a touch cumbersome, it can only get worse in the future.

Test input file path - readin for functional test modules

We have some repeated code in the functional tests to handle variation between operating systems, and the Adapter should be able to make these repeated sections obsolete.

This issue is to ensure that:

  • Adapter IO load can load data from an input file on any OS;
  • all the error checking and logging that is currently handled in the functional tests occurs in the Adapter;
  • corresponding issues are created on our repos to replace the code that this makes obsolete in the functional tests of the in-house packages.

Db always downloads all tables

all_dict_of_dfs = self.db.tables2dict(close=True)

The tables2dict method doesn't take table_names, so it requests every table from the database no matter which tables the caller asked to load.

Updating this could greatly speed up requests for an individual table (or a small subset of tables) in most cases.

The error for a requested table that does not exist could be moved down into tables2dict as well.
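
A minimal sketch of the idea, using only the standard-library `sqlite3` module: the function takes an optional `table_names` argument, validates it, and reads only the requested tables. The signature is an assumption, and the sketch returns lists of row tuples, whereas Adapter would build Pandas DataFrames.

```python
import sqlite3


def tables2dict(conn, table_names=None):
    """Load only the requested tables from an SQLite connection.

    Sketch under assumptions: returns {table name: list of row tuples}.
    """
    cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    available = [row[0] for row in cursor]

    if table_names is None:
        wanted = available  # previous behaviour: load everything
    else:
        missing = [name for name in table_names if name not in available]
        if missing:
            raise KeyError(f"Tables not found in database: {missing}")
        wanted = list(table_names)

    # Names were validated against sqlite_master, so quoting them is enough.
    return {
        name: conn.execute(f'SELECT * FROM "{name}"').fetchall()
        for name in wanted
    }
```

With this shape, a caller asking for one table issues one SELECT, and the missing-table check lives next to the code that knows which tables exist.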

Update sqlalchemy and/or pandas dependency

The current setup.py pins the sqlalchemy requirement at 1.4.29. With pandas ≥2.2.0 (which setup.py does not restrict), this results in some adapter dependents no longer working.

It seems to make sense to commit to having dependents of adapter become compatible with pandas ≥2.2.0, and then commit to updating adapter to require pandas≥2.2.0 and newer sqlalchemy.

Related to this issue, which is a matter of newer versions of pandas requiring newer versions of openpyxl.

update_excel

The latest openpyxl (v3.1.0, released on 2023-01-31) raises the following error when Adapter loads an Excel file with named ranges (line 77 in adapter):

AttributeError: 'DefinedNameDict' object has no attribute 'definedName'

An update in Adapter may be needed to make it compatible with newer openpyxl. An easy solution is to change line 77 to

all_input_ranges = {object_range for object_range in self.wb.defined_names}

However, this only works for users on v3.1.0, and the change breaks backward compatibility for users with older openpyxl versions.

Also, it may be good to check whether there are any named ranges or tables in the input file if users specify kind='ranges' or kind='tables'.

Update openpyxl dependency

The current setup.py fixes the openpyxl dependency at 3.0.9 because later versions break one very specific segment of code in adapter.to_python here.

I think this could be fixed with:

if hasattr(self.wb.defined_names, "definedName"):
    # This case is for openpyxl <3.1.0
    all_input_ranges = {object_range.name for object_range in self.wb.defined_names.definedName}
else:
    # This case is for openpyxl ≥3.1.0
    all_input_ranges = set(self.wb.defined_names.keys())

This change got around the error in my local environment, but it should be tried and run through the adapter tests. If all seems good, then I'd advise just getting rid of the version requirement on openpyxl.
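
Both branches can be exercised without installing two openpyxl versions by using stand-in objects, as in the sketch below. The helper name `defined_range_names` and the stand-in workbooks are hypothetical: they only mimic the two attribute layouts described in this issue and are not real openpyxl objects.

```python
from types import SimpleNamespace


def defined_range_names(wb):
    """Return the set of defined-name strings for either openpyxl layout."""
    names = wb.defined_names
    if hasattr(names, "definedName"):
        # Layout described above for openpyxl <3.1.0: a list of objects with .name
        return {item.name for item in names.definedName}
    # Layout described above for openpyxl >=3.1.0: a dict-like keyed by name
    return set(names.keys())


# Stand-in for an old-style workbook (list-like container).
old_wb = SimpleNamespace(
    defined_names=SimpleNamespace(
        definedName=[SimpleNamespace(name="inputs"), SimpleNamespace(name="prices")]
    )
)
# Stand-in for a new-style workbook (dict-like container).
new_wb = SimpleNamespace(defined_names={"inputs": object(), "prices": object()})

old_names = defined_range_names(old_wb)
new_names = defined_range_names(new_wb)
```

A test along these lines would still need to be complemented by runs against real workbooks under each openpyxl version before the pin is dropped.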
