charl-potgieter / powerqueryconsolidator

Function for consolidating various file types in Power Query (Excel or Power BI)

License: MIT License

Power Query Consolidator

Overview

Power Query Consolidator aims to ease the process of consolidating local, non-database files, such as CSV files or spreadsheets, before importing them into Excel or Power BI.
A data schema is utilised for the following purposes:

  • Standardise field names across data sources
  • Control the fields selected in each source
  • Specify field types
  • Enable the creation of calculated columns via the data schema


"Installation"

Only two Power Query functions need to be incorporated into custom Power BI or Excel projects: fn_Consolidation and fn_FieldNamesDataVersusSchemaCheck. It is nevertheless recommended to download the Excel files with example data to see how they work:

  • Download the latest zip file here https://github.com/charl-potgieter/PowerQueryConsolidator/releases/latest
  • Unzip in your folder of choice
  • Update the Power Query parameter "Example_DataRootFolder" to contain the file path of the ExampleData root folder
  • Choose one of the 3 data sets using the Power Query parameter Example_SelectedDataSet
  • Consolidated output should then appear in the Example_Consolidation query
  • Helper queries Example_FieldsInDataNotSchema and Example_FieldsInSchemaNotInData list any mismatched fields between the schema file and the data files


Parameters: fn_Consolidation


Parameter           Description
DataAccessFunction  Any function that takes parameters for folder and file name and returns a table of data
SourceFolder        The path containing the data to be consolidated
SchemaFilePath      The path of the data schema file, in Excel format as per ExampleDataSchema.xlsx (included in the zip file download of the latest release in this repository)
DataSourceName      The name of the data source listed in the schema file
FilterFromValue     (optional) Utilised to filter files in the data source folder
FilterToValue       (optional) Utilised to filter files in the data source folder
IsDevMode           (optional) Boolean value. If set to true, only one file of data is returned, restricted to 100 rows
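
As a rough illustration, a call might look like the sketch below. It assumes that fn_Consolidation is already loaded in the workbook, that its arguments are passed positionally in the order listed above, and that the data access function receives the folder path first and the file name second. The function fn_ReadCsv and all paths are hypothetical placeholders and are not part of this repository.

```powerquery
let
    // Hypothetical data access function: reads one CSV file and promotes its first row to headers.
    // Assumes FolderPath ends with a backslash, as the "Folder Path" column of Folder.Files does.
    fn_ReadCsv = (FolderPath as text, FileName as text) as table =>
        let
            Source = Csv.Document(File.Contents(FolderPath & FileName), [Delimiter = ",", Encoding = 65001]),
            Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
        in
            Promoted,

    // Argument order is assumed to follow the parameter table above
    Consolidated = fn_Consolidation(
        fn_ReadCsv,                                 // DataAccessFunction
        "C:\ExampleData\Sales\",                    // SourceFolder (placeholder path)
        "C:\ExampleData\ExampleDataSchema.xlsx",    // SchemaFilePath (placeholder path)
        "Sales",                                    // DataSourceName (placeholder name)
        null,                                       // FilterFromValue (optional, not used here)
        null,                                       // FilterToValue (optional, not used here)
        true                                        // IsDevMode: one file, restricted to 100 rows
    )
in
    Consolidated
```

Setting IsDevMode to true while building a model keeps refreshes quick; switching it to false (or omitting it) should then return the full consolidation.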


Parameters: fn_FieldNamesDataVersusSchemaCheck


Parameter           Description
DataAccessFunction  As per fn_Consolidation
SourceFolder        As per fn_Consolidation
SchemaFilePath      As per fn_Consolidation
DataSourceName      As per fn_Consolidation
DirectionToCheck    Takes one of the following text inputs: "Fields in data not in schema" or "Fields in schema not in data"
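
A minimal sketch of running the check in both directions follows. It assumes fn_FieldNamesDataVersusSchemaCheck is loaded in the workbook and accepts its arguments positionally in the order listed above. The shape of the returned value is not documented here, so the sketch simply collects both results in a record. fn_ReadCsv, the paths, and the data source name are hypothetical.

```powerquery
let
    // Hypothetical data access function (folder path first, file name second)
    fn_ReadCsv = (FolderPath as text, FileName as text) as table =>
        Table.PromoteHeaders(Csv.Document(File.Contents(FolderPath & FileName))),

    // Placeholder locations; replace with real paths
    SourceFolder = "C:\ExampleData\Sales\",
    SchemaFilePath = "C:\ExampleData\ExampleDataSchema.xlsx",

    // The two accepted DirectionToCheck values, as given in the table above
    InDataNotSchema = fn_FieldNamesDataVersusSchemaCheck(
        fn_ReadCsv, SourceFolder, SchemaFilePath, "Sales", "Fields in data not in schema"),
    InSchemaNotData = fn_FieldNamesDataVersusSchemaCheck(
        fn_ReadCsv, SourceFolder, SchemaFilePath, "Sales", "Fields in schema not in data")
in
    [InDataNotSchema = InDataNotSchema, InSchemaNotData = InSchemaNotData]
```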


The data schema file


The data schema file needs to be in the format of ExampleDataSchema.xlsx, included in this repository, with the following fields:
  • FieldName representing the column header to be generated in the output table
  • FieldTypeAsText representing the field type (as listed in the DataTypes tab of the schema file)
  • One column for each data source representing a folder path containing source files. The column header represents the DataSourceName listed in the parameters above. The data in this column is either the original column name in the data source file or a calculated column formula (refer below)
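
To make the layout concrete, the sketch below builds a small in-memory table with the same shape as a schema sheet. It is purely illustrative: the real schema is an Excel file, the data source names (Sales, Returns) and column names are invented, and the type strings are placeholders that should be replaced with the values listed in the DataTypes tab.

```powerquery
let
    // Illustrative stand-in for the schema sheet; all names and values are hypothetical
    ExampleSchema = #table(
        type table [FieldName = text, FieldTypeAsText = text, Sales = text, Returns = text],
        {
            // FieldName,     FieldTypeAsText, column in "Sales" files, column in "Returns" files
            {"Customer",      "type text",     "CustName",             "Customer Name"},
            {"Amount",        "type number",   "FirstAmount",          "Amt"},
            // Calculated column entries are wrapped in <> brackets
            {"DoubleAmount",  "type number",   "<[FirstAmount] * 2>",  "<[Amt] * 2>"}
        }
    )
in
    ExampleSchema
```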


Calculated columns


Calculated columns can be captured in the schema using <> brackets. For example, entering <[FirstAmount] * 2> in the schema file, where [FirstAmount] is an existing column in the source data, evaluates to the following M code: Table.AddColumn(ExistingTable, "FieldNamePerSchemaFile", each [FirstAmount] * 2, FieldTypePerSchemaFile)
See the example data files and schema for worked examples.
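
The generated step can be tried in isolation with a small in-memory table. In this sketch the table contents, the output column name DoubleAmount, and the type are hypothetical stand-ins for what the schema file would supply.

```powerquery
let
    // Stand-in for the source data: a single numeric column
    ExistingTable = #table(type table [FirstAmount = number], {{10}, {25}}),

    // Equivalent of the schema entry <[FirstAmount] * 2> with FieldName "DoubleAmount"
    WithCalculated = Table.AddColumn(ExistingTable, "DoubleAmount", each [FirstAmount] * 2, type number)
in
    WithCalculated  // DoubleAmount holds 20 and 50
```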

Referencing source folder and file metadata in calculated columns

Source folder and file metadata can be referenced in calculated columns, in the same way as described above, using the fields listed below. Note that the prefix "PQ.Consol." is added to the standard field names returned by the Power Query function Folder.Files() to avoid potential name conflicts with columns in the underlying data files.

  • [PQ.Consol.Content]
  • [PQ.Consol.Name]
  • [PQ.Consol.Extension]
  • [PQ.Consol.Date accessed]
  • [PQ.Consol.Date modified]
  • [PQ.Consol.Date created]
  • [PQ.Consol.Attributes]
  • [PQ.Consol.Folder Path]
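
As a sketch of how one of these fields might be used, the example below derives a column from the source file name. The sample table, the file names, and the output column SourceFile are hypothetical. Following the calculated column convention above, the corresponding schema entry would presumably be written as <Text.BeforeDelimiter([PQ.Consol.Name], ".")>. The sketch uses the quoted identifier form #"PQ.Consol.Name", which is valid M for any field name.

```powerquery
let
    // Stand-in for consolidated data that already carries the file name metadata column
    ExistingTable = #table(
        type table [#"PQ.Consol.Name" = text, FirstAmount = number],
        {{"Sales_2023.csv", 10}, {"Sales_2024.csv", 25}}
    ),

    // Keep the part of the file name before the first "." as a source label
    WithSourceFile = Table.AddColumn(
        ExistingTable,
        "SourceFile",
        each Text.BeforeDelimiter([#"PQ.Consol.Name"], "."),
        type text
    )
in
    WithSourceFile  // SourceFile holds "Sales_2023" and "Sales_2024"
```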
