slotix / dsv-analyser

Analyse data types of the CSV/TSV fields just in time

C++ 100.00%

dsv-analyser's Introduction

DSV-Analyser

Description:

Analyses CSV and TSV files so they can be mapped to a database format. The analysis returns:

The number of columns (with field titles on demand)
Each column's maximum field length in bytes
Each column's possible data types:

TEXT - contains any data
NUMBER - contains only digits
DOUBLE - contains only digits with a decimal separator

All of these types are enumerated in DSV_TYPES.

The possible data types and field lengths are detected just in time, by reading the file byte by byte.
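To make the byte-by-byte idea concrete, here is a minimal sketch of a streaming pass that tracks the maximum length and the OR-ed type flags for each column. It is not the library's implementation: the names SketchType, SketchColumn and analyseStream are hypothetical, and treating empty fields as TEXT and skipping blank lines are assumptions made for the example. Quoted fields and signed numbers are not handled.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical flags for this sketch only (values chosen as distinct bits).
enum SketchType : unsigned { SK_TEXT = 0x01, SK_NUMBER = 0x02, SK_DOUBLE = 0x04 };

struct SketchColumn {
    std::size_t length = 0;  // longest field seen in this column, in bytes
    unsigned types = 0;      // OR-ed SketchType flags of every field seen
};

// Reads the stream one byte at a time and updates per-column statistics
// as soon as each field ends; no row is ever buffered.
std::vector<SketchColumn> analyseStream(std::istream& in, char fieldDelim, char decimalDelim) {
    std::vector<SketchColumn> columns;
    std::size_t col = 0, len = 0, separators = 0;
    bool digits = false, other = false, rowHasData = false;

    // Classifies the field that just ended and folds it into its column.
    auto closeField = [&]() {
        if (col >= columns.size()) columns.resize(col + 1);
        unsigned type = SK_TEXT;  // assumption: empty fields count as TEXT
        if (!other && digits && separators == 0) type = SK_NUMBER;
        else if (!other && digits && separators == 1) type = SK_DOUBLE;
        if (len > columns[col].length) columns[col].length = len;
        columns[col].types |= type;
        len = 0; separators = 0; digits = false; other = false;
    };

    char c;
    while (in.get(c)) {
        if (c == '\r') continue;  // tolerate CRLF line endings
        if (c == fieldDelim) {
            closeField();
            ++col;
            rowHasData = true;
        } else if (c == '\n') {
            if (rowHasData || len > 0) closeField();  // skip blank lines
            col = 0;
            rowHasData = false;
        } else {
            ++len;
            if (c >= '0' && c <= '9') digits = true;
            else if (c == decimalDelim) ++separators;
            else other = true;
        }
    }
    if (rowHasData || len > 0) closeField();  // last row without a newline
    return columns;
}
```

Calling analyseStream on a std::istringstream holding two rows such as "1,2.5,abc" and "22,3,x" yields three columns, where the second column carries both the NUMBER and the DOUBLE flag.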

Usage:

DSV_Analyser obj(filepath, fields_delimiter, decimal_delimiter);

The instance should be initialized with filepath (the full path to the file), fields_delimiter (a single char: a comma for CSV, '\t' for TSV, and so on) and decimal_delimiter (a dot or a comma, depending on your file).

obj.Analyse(hasTitles);

The Analyse() method opens the file and starts the analysis. hasTitles is a boolean argument that is true by default. When it is true, the titles are stored in the Columns vector; otherwise the titles in Columns are left empty after Analyse() completes.

After the analysis, the results are available through obj.Columns, a vector of DSV_FieldInfo structures: DSV_FieldInfo { Title, Length, Type }. obj.Columns[index].Type returns all the types that were found in the column at index during the analysis:

0x01 - TEXT
0x02 - NUMBER
0x04 - DOUBLE

So to check whether a selected column contains any DOUBLE values, apply a bitmask:

obj.Columns[index].Type & DSV_TYPES::DOUBLE_TYPE
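The bitmask check can be wrapped in a small helper. The sketch below uses stand-in declarations placed in a sketch namespace so that it compiles without the library. The names DSV_TYPES, DSV_FieldInfo, Title, Length, Type and DOUBLE_TYPE follow this README, while the member types, the TEXT_TYPE and NUMBER_TYPE spellings, and the hasType and suggestSqlType helpers are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Stand-in for the library's enum; values as listed in this README.
enum DSV_TYPES : unsigned { TEXT_TYPE = 0x01, NUMBER_TYPE = 0x02, DOUBLE_TYPE = 0x04 };

// Stand-in for the library's structure; member types are assumed.
struct DSV_FieldInfo {
    std::string Title;
    std::size_t Length;
    unsigned Type;
};

// True when the column saw at least one value of the given type.
inline bool hasType(const DSV_FieldInfo& column, DSV_TYPES type) {
    return (column.Type & type) != 0;
}

// Hypothetical mapping from the detected flags to a database column type:
// any TEXT value forces a string column, any DOUBLE value forces a
// floating-point column, otherwise the column holds only integers.
inline std::string suggestSqlType(const DSV_FieldInfo& column) {
    if (hasType(column, TEXT_TYPE) || column.Type == 0)
        return "VARCHAR(" + std::to_string(column.Length) + ")";
    if (hasType(column, DOUBLE_TYPE)) return "DOUBLE";
    return "INTEGER";
}

}  // namespace sketch
```

With the real library, the same test would presumably be written against obj.Columns[index] directly. The SQL type names are only one possible mapping.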

Additional Information:

dsv-analyser's Issues

DSV field+type detection

Implement a class which detects fields in a DSV file separated by a delimiter (,.;\t). It should have basic type detection implemented [TEXT, INTEGER, DOUBLE].

Fix ReadHeader() enclosed detection

Implement enclosure detection in the ReadHeader() method (#3).
Reason: the first line (the header line) with column names can be enclosed even if no column titles are declared.

Interpolation in quotes for CSV

Implement detection of interpolation in the fields of a CSV file. Interpolation for CSV must be implemented according to RFC 4180.
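Assuming this issue refers to the quoting rules of RFC 4180 (a field may be enclosed in double quotes, and a double quote inside such a field is escaped by doubling it), a minimal sketch of splitting one record could look as follows. The function name splitRecord is hypothetical, and the sketch expects a complete record, so quoted fields that contain line breaks must already be joined by the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits one record into fields, honouring RFC 4180 style quoting:
// delimiters inside "..." are literal, and "" inside quotes means one ".
std::vector<std::string> splitRecord(const std::string& record, char delim = ',') {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;

    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < record.size() && record[i + 1] == '"') {
                    current += '"';  // doubled quote: one literal quote
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                current += c;  // delimiters are literal inside quotes
            }
        } else if (c == '"' && current.empty()) {
            inQuotes = true;  // opening quote at the start of a field
        } else if (c == delim) {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // last field has no trailing delimiter
    return fields;
}
```

The sketch is lenient with malformed input: a missing closing quote simply runs to the end of the record instead of raising an error.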

Implement Analyse() in thread

Run Analyse() in a thread; meanwhile, the printProgressBar function, running in another thread, should show information about the current progress of the Analyse method.
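One possible shape for this, sketched with std::thread and a shared std::atomic counter: the worker publishes how many bytes it has processed, and a second thread polls that counter to print a percentage. Only the name printProgressBar comes from the issue. Its signature here, the analyseWorker stand-in (which does no real parsing) and the runWithProgress wrapper are assumptions.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical stand-in for the analysis loop: "processes" totalBytes
// bytes and publishes its position through the shared counter.
void analyseWorker(std::size_t totalBytes, std::atomic<std::size_t>& done) {
    for (std::size_t i = 0; i < totalBytes; ++i) {
        done.store(i + 1, std::memory_order_relaxed);
    }
}

// Polls the counter and rewrites a percentage until the work is complete.
void printProgressBar(std::size_t totalBytes,
                      const std::atomic<std::size_t>& done,
                      std::ostream& out) {
    std::size_t current = 0;
    do {
        current = done.load(std::memory_order_relaxed);
        const std::size_t percent = totalBytes ? current * 100 / totalBytes : 100;
        out << '\r' << percent << '%' << std::flush;
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    } while (current < totalBytes);
    out << '\n';
}

// Runs the worker and the reporter concurrently and waits for both.
std::size_t runWithProgress(std::size_t totalBytes, std::ostream& out) {
    std::atomic<std::size_t> done{0};
    std::thread worker(analyseWorker, totalBytes, std::ref(done));
    std::thread reporter(printProgressBar, totalBytes, std::cref(done), std::ref(out));
    worker.join();
    reporter.join();
    return done.load();
}
```

An atomic counter keeps the two threads decoupled without a mutex. The reporter only ever reads, so relaxed memory ordering is enough for a progress display.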

Data storage format for information about columns

Implement a structure for gathering information about columns:

TITLE - string, 0 to 256 bytes
LENGTH - the maximum length, in bytes, of the values analysed in the DSV fields
TYPE - implemented in #1

Unit tests using GOOGLE TEST

Implement unit tests with the Google Test library.
