DSV-Analyser

Description:

Analyses CSV and TSV files so they can be mapped to a database format. The analysis returns:

  1. The number of columns (with field titles on demand)
  2. Each column's maximum field length in bytes
  3. Each column's possible data types:
  • TEXT - contains any data
  • NUMBER - contains only digits
  • DOUBLE - contains only digits with a decimal separator
  All these types are enumerated in DSV_TYPES.

The possible data types and field lengths are detected on the fly, by reading the file byte by byte.
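
The byte-by-byte detection described above can be sketched roughly as follows. This is only an illustration, not the library's actual code: the names scan, ColumnStats and FieldType are hypothetical, the flag values simply mirror the ones listed in the Usage section, and quoting/enclosure is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mirror of the README's type flags (0x01, 0x02, 0x04).
enum FieldType : std::uint8_t { TEXT = 0x01, NUMBER = 0x02, DOUBLE = 0x04 };

struct ColumnStats {
    std::size_t maxLength = 0;  // longest field seen in this column, in bytes
    std::uint8_t types = 0;     // OR-ed FieldType flags seen in this column
};

// Scans `data` one byte at a time and updates per-column statistics
// whenever a field ends (at a field delimiter or at a newline).
std::vector<ColumnStats> scan(const std::string& data, char fieldDelim, char decimalDelim) {
    std::vector<ColumnStats> columns;
    std::size_t col = 0, length = 0, decimals = 0;
    bool hasDigit = false;
    bool onlyNumeric = true;  // false once a byte is neither a digit nor the decimal separator

    auto closeField = [&]() {
        if (columns.size() <= col) columns.resize(col + 1);
        ColumnStats& c = columns[col];
        if (length > c.maxLength) c.maxLength = length;
        if (length > 0) {  // empty fields do not contribute a type
            if (onlyNumeric && hasDigit && decimals == 0) c.types |= NUMBER;
            else if (onlyNumeric && hasDigit && decimals == 1) c.types |= DOUBLE;
            else c.types |= TEXT;
        }
        length = 0; decimals = 0; hasDigit = false; onlyNumeric = true;
    };

    for (char ch : data) {
        if (ch == fieldDelim) { closeField(); ++col; }
        else if (ch == '\n') { closeField(); col = 0; }
        else if (ch == '\r') { /* skip the CR of a CRLF line ending */ }
        else {
            ++length;
            if (ch >= '0' && ch <= '9') hasDigit = true;
            else if (ch == decimalDelim) ++decimals;
            else onlyNumeric = false;
        }
    }
    if (length > 0 || col > 0) closeField();  // last line without a trailing newline
    return columns;
}
```

For the input "1,2.5,abc\n22,3,x\n" with ',' and '.' as delimiters, the second column ends up with both the NUMBER and DOUBLE flags set, because it holds "2.5" in one row and "3" in the other.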

Usage:

DSV_Analyser obj(filepath, fields_delimiter, decimal_delimiter);

The instance should be initialized with filepath (the full path to the file), fields_delimiter (one char: a comma for CSV, '\t' for TSV, and so on) and decimal_delimiter (a dot or a comma, depending on your file).

obj.Analyse(hasTitles);

The Analyse() method opens the file and starts the analysis. hasTitles is a boolean argument that is true by default. Setting it to true stores the titles in the Columns vector; otherwise the titles in Columns are left empty after Analyse() completes.

After the analysis, obj.Columns gives access to the results. obj.Columns is a vector of DSV_FieldInfo structures: DSV_FieldInfo { Title, Length, Type }. obj.Columns[index].Type returns all the types that were found in the column at index during the analysis, as a combination of these flags:

0x01 - TEXT
0x02 - NUMBER
0x04 - DOUBLE

So to check whether a given column contains any DOUBLE values, use a bitmask:

obj.Columns[index].Type & DSV_TYPES::DOUBLE_TYPE
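
As a small illustration of working with these flags, the sketch below picks the widest type found in a column. The definitions are stand-ins so the snippet compiles on its own: only DOUBLE_TYPE and the field names Title, Length and Type come from this README, while TEXT_TYPE, NUMBER_TYPE, the member types, hasType and widestType are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in definitions; in real code these come from the library.
// TEXT_TYPE and NUMBER_TYPE are assumed names (only DOUBLE_TYPE appears in the README).
enum DSV_TYPES : std::uint8_t { TEXT_TYPE = 0x01, NUMBER_TYPE = 0x02, DOUBLE_TYPE = 0x04 };

struct DSV_FieldInfo {
    std::string Title;
    std::size_t Length;  // member types are assumptions
    std::uint8_t Type;   // OR-ed DSV_TYPES flags
};

// True if the column contained at least one value of type `t`.
bool hasType(const DSV_FieldInfo& field, DSV_TYPES t) {
    return (field.Type & t) != 0;
}

// Picks the widest type able to hold every value seen in the column:
// any TEXT value forces TEXT, otherwise any DOUBLE value forces DOUBLE.
// A column with no flags at all (only empty fields) is treated as TEXT.
DSV_TYPES widestType(const DSV_FieldInfo& field) {
    if (field.Type == 0 || hasType(field, TEXT_TYPE)) return TEXT_TYPE;
    if (hasType(field, DOUBLE_TYPE)) return DOUBLE_TYPE;
    return NUMBER_TYPE;
}
```

A column whose Type is 0x06 (NUMBER and DOUBLE both seen) would be reported as DOUBLE_TYPE by this helper, which is one reasonable choice when mapping the column to a database type.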

dsv-analyser's Issues

DSV field+type detection

Implement a class that detects fields in a DSV file separated by a delimiter (,.;\t). It should have basic type detection implemented [TEXT, INTEGER, DOUBLE].

Fix ReadHeader() enclosed detection

Implement enclosure detection in the ReadHeader() method (#3).
Reason: the first line (the header line) with column names can be enclosed even if no column titles are declared.

Interpolation in quotes for CSV

Implement detection of interpolation in the fields of a CSV file. Interpolation for CSV must be implemented according to RFC 4180.
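
For reference, RFC 4180 lets a field be enclosed in double quotes, in which case it may contain the delimiter, and a literal double quote inside it is written as two double quotes. A minimal sketch of splitting one record under those rules is shown below; splitRecord is a hypothetical helper, not part of this project, and it deliberately ignores line breaks inside quoted fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a single record into fields, honouring RFC 4180 style quoting:
// a field may be wrapped in double quotes, and "" inside it means one literal quote.
std::vector<std::string> splitRecord(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;

    for (std::size_t i = 0; i < line.size(); ++i) {
        const char ch = line[i];
        if (inQuotes) {
            if (ch == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    current += '"';  // escaped quote: consume both characters
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += ch;  // delimiters are plain data inside quotes
            }
        } else if (ch == '"' && current.empty()) {
            inQuotes = true;  // opening quote at the start of a field
        } else if (ch == delim) {
            fields.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    fields.push_back(current);  // the last field has no trailing delimiter
    return fields;
}
```

With this sketch, the record a,"b,c","say ""hi""" splits into three fields: a, then b,c, then say "hi".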

Implement Analyse() in thread

Run Analyse() in a thread; meanwhile, the printProgressBar function, running in another thread, should show information about the current progress of the Analyse method.
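
One possible shape for this, sketched with std::thread and std::atomic, is shown below. It is not the project's implementation: analyseWithProgress is a hypothetical stand-in that only counts lines, and the printProgressBar signature is assumed, since the issue names the function but not its parameters.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <thread>

// Assumed signature: prints a one-line percentage to stderr.
void printProgressBar(std::size_t done, std::size_t total) {
    const std::size_t percent = (total == 0) ? 100 : done * 100 / total;
    std::cerr << '\r' << percent << '%' << std::flush;
}

// Hypothetical stand-in for Analyse(): counts newlines byte by byte in a worker
// thread while a second thread reports how many bytes have been processed.
std::size_t analyseWithProgress(const std::string& data) {
    std::atomic<std::size_t> processed{0};
    std::atomic<bool> done{false};
    std::size_t lines = 0;  // written only by the worker, read after join()

    std::thread worker([&] {
        for (char ch : data) {
            if (ch == '\n') ++lines;
            processed.fetch_add(1, std::memory_order_relaxed);
        }
        done.store(true, std::memory_order_release);
    });

    std::thread progress([&] {
        while (!done.load(std::memory_order_acquire)) {
            printProgressBar(processed.load(std::memory_order_relaxed), data.size());
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
        printProgressBar(data.size(), data.size());  // final 100% update
        std::cerr << '\n';
    });

    worker.join();
    progress.join();
    return lines;
}
```

Updating the atomic counter on every byte keeps the sketch short; a real implementation would more likely publish progress once per block or per line to keep the overhead down.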
