hosseinmoein / dataframe

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage

License: BSD 3-Clause "New" or "Revised" License

C++ 98.85% Makefile 0.44% Shell 0.01% CMake 0.43% Python 0.19% C 0.08%
ai cpp data-analysis data-science dataframe financial-data-analysis financial-engineering heterogeneous-data large-data machine-learning multidimensional-data numerical-analysis pandas polars statistical statistical-analysis tensor tensorboard trading-algorithms trading-strategies

dataframe's People

Contributors

alexandre-p-j, bplaa-yai, edouardberthe, enricodetoma, gjacquenot, hosseinmoein, jchen8tw, jmakov, jmarrec, justinjk007, nikbomb, schorrm, spaceim, spotys, theirix, thekvs, yssource

dataframe's Issues

lack of const methods

Hi,
some methods of the library classes could be const. Without const methods, it is hard to use const instances of those classes or references to const.
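
To illustrate the request, here is a minimal sketch using a hypothetical `Table` class (an assumption for illustration, not a DataFrame type). Read-only accessors marked `const` can be called through a `const Table&`; without the qualifier, `sum` below would not compile.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container used only for illustration; not part of DataFrame.
class Table {
public:
    void append(double v) { values_.push_back(v); }

    // const-qualified: callable on const objects and const references.
    std::size_t size() const { return values_.size(); }

    // Without the trailing const, this could not be called via `const Table&`.
    double at(std::size_t i) const { return values_.at(i); }

private:
    std::vector<double> values_;
};

// Takes a const reference, so only const member functions are usable here.
double sum(const Table& t) {
    double total = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        total += t.at(i);
    return total;
}
```

Any function that accepts its argument by const reference, as `sum` does, depends on the accessors being const-qualified.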

Load DataFrame from Arrow Table/csv file

Here is my early code for reading a CSV file into a DataFrame via Apache Arrow. Before I flesh out all the data types, I wanted to check whether this approach looks good. Also, is there something like a pretty-print for DataFrames?

// ArrowCsv.cpp
// SBW 2020.04.07

#include <cassert>
#include <cstdint>
#include <iostream>
#include <memory>
#include <numeric>
#include <string>
#include <vector>

#include "arrow/api.h"
#include "arrow/filesystem/localfs.h"
#include "arrow/csv/api.h"
#include "arrow/result.h"

// SBW 2020.04.03 Attach Arrow table to DataFrame.
#define LIBRARY_EXPORTS
#include <DataFrame/DataFrame.h>
using namespace hmdf;
typedef StdDataFrame<unsigned long> MyDataFrame;

using namespace std;
using namespace arrow;

template<typename I, typename  H>
bool TableToDataFrame(const Table& Tbl, DataFrame<I, H>& Df)
{
	int64_t Rows = Tbl.num_rows();
	int Cols = Tbl.num_columns();
	for (int c = 0; c < Cols; c++)
	{
		auto f = Tbl.field(c);
		const string& Name = f->name();
		int TypeId = f->type()->id();
		switch (TypeId)
		{
		case Type::STRING:
		{
			std::vector<string>& vec = Df.create_column<string>(Name.c_str());
			vec.assign(Rows, "");
			auto pChArray = Tbl.column(c);
			int NChunks = pChArray->num_chunks();
			int i = 0;
			for (int n = 0; n < NChunks; n++)
			{
				auto pArray = pChArray->chunk(n);
				int64_t ArrayRows = pArray->length();
				auto pTypedArray = std::static_pointer_cast<arrow::StringArray>(pArray);
				// const string* pData = pTypedArray->raw_values();
				for (int j = 0; j < ArrayRows; j++)
					vec[i++] = pTypedArray->GetString(j);
			}
			break;
		}
		case Type::FLOAT:
		{
			std::vector<float>& vec = Df.create_column<float>(Name.c_str());
			vec.assign(Rows, 0.0);
			auto pChArray = Tbl.column(c);
			int NChunks = pChArray->num_chunks();
			int i = 0;
			for (int n = 0; n < NChunks; n++)
			{
				auto pArray = pChArray->chunk(n);
				int64_t ArrayRows = pArray->length();
				auto pTypedArray = std::static_pointer_cast<arrow::FloatArray>(pArray);
				const float* pData = pTypedArray->raw_values();
				for (int j = 0; j < ArrayRows; j++)
					vec[i++] = pData[j];
			}
			break;
		}
		case Type::DOUBLE:
		{
			std::vector<double>& vec = Df.create_column<double>(Name.c_str());
			vec.assign(Rows, 0.0);
			auto pChArray = Tbl.column(c);
			int NChunks = pChArray->num_chunks();
			int i = 0;
			for (int n = 0; n < NChunks; n++)
			{
				auto pArray = pChArray->chunk(n);
				int64_t ArrayRows = pArray->length();
				auto pTypedArray = std::static_pointer_cast<arrow::DoubleArray>(pArray);
				const double* pData = pTypedArray->raw_values();
				for (int j = 0; j < ArrayRows; j++)
					vec[i++] = pData[j];
			}
			break;
		}
		default:
			assert(false); // unknown type
		}
	}

	return(true);
}

int main(int argc, char *argv[])
{
	auto fs = make_shared<fs::LocalFileSystem>();
	auto r_input = fs->OpenInputStream("c:/temp/Test.csv");

	auto pool = default_memory_pool();
	auto read_options = arrow::csv::ReadOptions::Defaults();
	auto parse_options = arrow::csv::ParseOptions::Defaults();
	auto convert_options = arrow::csv::ConvertOptions::Defaults();

	auto r_table_reader = csv::TableReader::Make(pool, r_input.ValueOrDie(),
		read_options, parse_options, convert_options);
	auto r_read = r_table_reader.ValueOrDie()->Read();
	auto pTable = r_read.ValueOrDie();

	PrettyPrintOptions options{0};
	arrow::PrettyPrint(*pTable, options, &std::cout);
	//arrow::PrettyPrint(*pTable->schema(), options, &std::cout);

	// SBW 2020.04.03 Attach Arrow table to DataFrame.
	MyDataFrame df;
	// df_read.read("c:/temp/Test.csv");
	TableToDataFrame(*pTable, df);
	df.write<std::ostream, int, unsigned long, double, std::string>(std::cout);

	return 0;
}
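
The three `case` branches above repeat the same chunk-flattening loop. One way to factor it out is sketched below. To keep the example self-contained, plain `std::vector` chunks stand in for Arrow's chunked arrays, and `flatten_chunks` is a hypothetical helper name, not an Arrow or DataFrame function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: copies a column stored as several chunks into one
// contiguous vector. std::vector<T> stands in for an Arrow array chunk here.
template <typename T>
std::vector<T> flatten_chunks(const std::vector<std::vector<T>>& chunks) {
    std::size_t total = 0;
    for (const auto& chunk : chunks)
        total += chunk.size();

    std::vector<T> out;
    out.reserve(total);  // one allocation instead of repeated growth
    for (const auto& chunk : chunks)
        out.insert(out.end(), chunk.begin(), chunk.end());
    return out;
}
```

With real Arrow data, the inner copy would read from each typed chunk instead, as the loops in the posted code do.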

Questions about using the libDataSci.a in Ubuntu?

Thank you for implementing such a powerful tool.

I successfully compiled and ran the tests on Ubuntu 18.04.2, and libDataSci.a was generated in the folders Linux.GCC64 and Linux.GCC64D.

But I don't know how to use this static library.
I copied the header files from the include folder and libDataSci.a into the same folder,
created a new .c file with #include "DataFrame.h", and tried to run the example code from the project homepage.
I used the g++ compiler, and the command was
"g++ testDF.c -L. -lDataSci".

The output message is
"testDF.c: In function ‘int main(int, char**)’:
testDF.c:13:9: error: ‘DataFrame’ does not name a type
typedef DataFrame<unsigned long, std::vector> MyDataFrame;
^~~~~~~~~
......
"
It seems that I did not use libDataSci.a correctly.

Could you please add some more basic usage instructions to the tutorial for people who are not familiar with C++ and Linux?

Thanks for your time.

Support fillna?

Is there any plan to support fillna, as in pandas? For example:

  • df.fillna(0)
  • df.fillna(method='ffill')

Thanks.
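
For reference, the two pandas calls above correspond to the following operations on a single column. This is a minimal sketch on a plain `std::vector<double>`, with NaN marking missing values. `fill_value` and `fill_forward` are hypothetical free functions written for illustration, not DataFrame's API.

```cpp
#include <cmath>
#include <vector>

// Replace every NaN with a constant, like pandas df.fillna(value).
void fill_value(std::vector<double>& col, double value) {
    for (double& x : col)
        if (std::isnan(x))
            x = value;
}

// Propagate the last valid observation forward, like fillna(method='ffill').
// Leading NaNs stay NaN because there is nothing to carry forward yet.
void fill_forward(std::vector<double>& col) {
    bool have_last = false;
    double last = 0.0;
    for (double& x : col) {
        if (std::isnan(x)) {
            if (have_last)
                x = last;
        } else {
            last = x;
            have_last = true;
        }
    }
}
```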

Query interface examples

Could you please add examples of how to query/select? E.g. how would a pandas query like the one below look in DataFrame?

df_filtered = df[df.something > 5]
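
As a library-independent sketch of what that pandas expression does, the code below first computes the row positions where a predicate holds on one column, then gathers those rows from any column of the same length. `select_rows` and `take` are hypothetical names, and plain `std::vector` columns are assumed; this is not DataFrame's query interface.

```cpp
#include <cstddef>
#include <vector>

// Row positions where pred(column[i]) is true.
template <typename T, typename Pred>
std::vector<std::size_t> select_rows(const std::vector<T>& column, Pred pred) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < column.size(); ++i)
        if (pred(column[i]))
            rows.push_back(i);
    return rows;
}

// Gather the selected rows from any column of the same length.
template <typename T>
std::vector<T> take(const std::vector<T>& column,
                    const std::vector<std::size_t>& rows) {
    std::vector<T> out;
    out.reserve(rows.size());
    for (std::size_t r : rows)
        out.push_back(column.at(r));  // at() throws if a row is out of range
    return out;
}
```

Calling `select_rows(something, [](int v) { return v > 5; })` and then `take` on each column reproduces the filter row by row.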

Unit tests fail on Windows when compiled with MinGW-W64

$ ctest.exe
Test project C:/Users/expadmin/Work/DataFrame/build
    Start 1: dataframe_tester
1/5 Test #1: dataframe_tester .................***Failed    0.05 sec
    Start 2: vectors_tester
2/5 Test #2: vectors_tester ...................   Passed    0.01 sec
    Start 3: date_time_tester
3/5 Test #3: date_time_tester .................***Failed    0.02 sec
    Start 4: obj_vector_tester
4/5 Test #4: obj_vector_tester ................   Passed    0.01 sec
    Start 5: obj_vector_erase_tester
5/5 Test #5: obj_vector_erase_tester ..........   Passed    0.01 sec

60% tests passed, 2 tests failed out of 5

Total Test time (real) =   0.10 sec

The following tests FAILED:
          1 - dataframe_tester (Failed)
          3 - date_time_tester (Failed)
Errors while running CTest
$ g++.exe --version
g++.exe (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 8.1.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Also, please consider adding a MinGW build to AppVeyor.

static member vectors_ in HeteroVector class causes linker error

Hi,
the decision to define a static member in a .tcc file looks problematic.
If I suppress inclusion of HeteroVector.tcc by defining HMDF_DO_NOT_INCLUDE_TCC_FILES, I get "unresolved external symbols".
If I allow inclusion of HeteroVector.tcc, I get "already defined" errors for vectors_ in other translation units.

Is the library supposed to be used in some other way?
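
One general way to avoid both failure modes, assuming a C++17 compiler, is to define the static member `inline` inside the class, so the header can be included in every translation unit. The sketch below uses a hypothetical `Registry` class with a `storage_` member. It is not the library's HeteroVector and may not match how the library resolved this.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical class for illustration; not DataFrame's HeteroVector.
class Registry {
public:
    static void set(const std::string& key, int value) { storage_[key] = value; }
    static int get(const std::string& key) { return storage_.at(key); }

private:
    // C++17 inline variable: this is a definition, and the header may be
    // included in many translation units without "already defined" errors.
    inline static std::unordered_map<std::string, int> storage_{};
};
```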

Drop shared option for Windows, porting with conan supported.

Hi @hosseinmoein,
I was porting this project so that it is supported in conan-center-index.

However, I tried to access a symbol that wasn't exported, and the DataFrame project doesn't support __declspec(dllexport)/__declspec(dllimport). Even forcing CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS=ON doesn't work.

Actually, I have very little knowledge of how to compile with Visual Studio.
So please take a look at conan-io/conan-center-index#542, and consider supporting Conan in the next release.
Thanks.

Concat two dataframes

I was looking for something similar to pd.concat, but I was unable to find it. Is there any way to achieve that? If not, what would you suggest as an alternative?
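
As a library-independent sketch of what a row-wise concat involves, the code below appends the rows of one frame to another and pads columns missing on either side with NaN, mirroring the default outer-join behaviour of pd.concat. `Frame` and `concat_rows` are hypothetical stand-ins, and the sketch assumes every column within a frame has the same length and all columns hold doubles.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a data frame: column name -> column values.
using Frame = std::map<std::string, std::vector<double>>;

std::size_t row_count(const Frame& f) {
    return f.empty() ? 0 : f.begin()->second.size();
}

// Row-wise concatenation; a column missing from one side is padded with NaN.
Frame concat_rows(const Frame& top, const Frame& bottom) {
    const std::size_t n_top = row_count(top);
    const std::size_t n_bottom = row_count(bottom);
    const double nan = std::nan("");

    Frame out;
    for (const auto& kv : top)
        out[kv.first] = kv.second;
    // emplace only inserts when the column is absent from `top`.
    for (const auto& kv : bottom)
        out.emplace(kv.first, std::vector<double>(n_top, nan));

    for (auto& kv : out) {
        auto it = bottom.find(kv.first);
        if (it != bottom.end())
            kv.second.insert(kv.second.end(), it->second.begin(), it->second.end());
        else
            kv.second.resize(n_top + n_bottom, nan);  // pad the missing rows
    }
    return out;
}
```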

Prefix include folder with DataFrame

It would be safer to integrate this into projects if the include folder itself contained a DataFrame/ folder, e.g.
include/DataFrame/DataFrame.h instead of include/DataFrame.h. This way, headers in this folder won't collide with headers from other packages.

In short, it's better to have

#include "DataFrame/DataFrame.h" (or #include <DataFrame/DataFrame.h>) rather than
#include "DataFrame.h" (or #include <DataFrame.h>)

I don't know how to install it in CLion on Windows

I've just downloaded the files, but I don't know how to install or load the library.
I tried to run the project in CLion
and got a folder named cmake-build-debug containing bin.
After entering bin, I ran dataframe_performance.exe.
However, it crashed at once and said

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

So would you please tell me how to install it on Windows?

C2061 compile error in DataFrame.h

I downloaded the source and ran cmake to generate VS 2017 project files. ALL_BUILD ran fine except the one error shown below in the text and screenshot. The error is fixed if I change the diff_func return type from size_type to DataFrame::size_type. Just wanted to verify this is correct?

Severity Code Description Project File Line Suppression State
Error C2061 syntax error: identifier 'size_type' dataframe_tester_2 f:\dev\dataframe-master\include\dataframe\dataframe.h 277

[Screenshot of the error]

Problem with CMakeLists.txt

CMake Error at CMakeLists.txt:107 (set_target_properties):
set_target_properties Can not find target to add properties to: Dataframe

CMake Error at CMakeLists.txt:113 (target_include_directories):
Cannot specify include directories for target "Dataframe" which is not
built by this project.

CMake Error at CMakeLists.txt:117 (install):
install TARGETS given target "Dataframe" which does not exist in this
directory.

-- Copying files for testing
-- Copying files for testing - done
-- Configuring incomplete, errors occurred!

Exceptions thrown in dataframe_tester

I've built DataFrame with VS 2017 on Windows 10. A number of exceptions are thrown when I run dataframe_tester.exe.

The first occurs here; I think this test is wrong, since the size of dvec is 1:

dvec = df3.get_column<double> ("dbl_col");
dvec2 = df3.get_column<double> ("dbl_col_2");
assert(dvec.size() == 1);
// SBW throws exception.
// assert(dvec[5] == 2.2345);

Other exceptions occur during the following tests; I stopped testing at this point:

// SBW throws exception.
// test_shifting_up_down();
test_rotating_up_down();
test_dataframe_with_datetime();
test_dataframe_friend_plus_operator();
test_dataframe_friend_minus_operator();
test_dataframe_friend_multiplies_operator();
test_dataframe_friend_divides_operator();
test_fill_missing_values();
test_fill_missing_fill_forward();
test_fill_missing_fill_backward();
// SBW throws exception.
// test_fill_missing_fill_linear_interpolation();
test_drop_missing_all_no_drop();
// SBW throws exception.
// test_drop_missing_all_2_drop();
// SBW throws exception.
// test_drop_missing_any();
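
The first failure reflects a general property of `std::vector`: `operator[]` does not check bounds, so reading `dvec[5]` from a one-element vector is undefined behaviour, whereas `at()` throws `std::out_of_range`. The sketch below shows a bounds-checked read. `checked_get` is a hypothetical helper, not part of the tester or the library, and it requires C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Returns the element at `i`, or std::nullopt when `i` is out of range.
// vector::at() performs the bounds check that operator[] skips.
std::optional<double> checked_get(const std::vector<double>& v, std::size_t i) {
    try {
        return v.at(i);
    } catch (const std::out_of_range&) {
        return std::nullopt;
    }
}
```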

Need help building for iOS

I want to build this project for iOS in Xcode. I was able to build for macOS and generate a .dylib using CMake. Can I build it for iOS using CMake with small modifications, or can I build it from source? Are there any dependencies?

Need any help with refactoring and other simple stuff?

Hi,
I am somewhat experienced in Python, but I have limited experience in C++.
However, the project seems really interesting and I want to contribute.
Is there any work suitable for a newbie, such as writing docs, refactoring,
writing simple functions, or implementing simple interfaces?
A road map and some sort of contribution guide would be really helpful.

Thread safety?

First of all, thanks for this nice library!

I'm having an issue which seems to be related to thread safety. I understand that the library is not thread safe, as in "multiple threads cannot access the same DataFrame". In my case, however, multiple threads cannot even work simultaneously on their own DataFrames.
I'm getting (frequent) segfaults with the following test case:

#include <boost/thread.hpp>
#include <DataFrame/DataFrame.h>

#define VECSIZE 10000000

using namespace hmdf;

typedef StdDataFrame<unsigned int> MyDataFrame;

void do_work()
{
    MyDataFrame df;
    std::vector<int> vec;

    for (int i = 0; i < VECSIZE; i++)
        vec.push_back(i);
    
    df.load_data(
		MyDataFrame::gen_sequence_index(0, VECSIZE, 1),
		std::make_pair("col1", vec)
		 );

    for (int i = 0; i < VECSIZE; i++)
        int j = df.get_column<int>("col1")[i];
}

int main(int argc, char **argv)
{
    boost::thread t1(do_work);
    boost::thread t2(do_work);
    boost::thread t3(do_work);
    boost::thread t4(do_work);
    boost::thread t5(do_work);
    boost::thread t6(do_work);
    boost::thread t7(do_work);
    boost::thread t8(do_work);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    t5.join();
    t6.join();
    t7.join();
    t8.join();
}

If that matters, I'm linking boost_thread and DataFrame as static libraries on a Linux (Fedora 30) platform.
A single-threaded variant (or no thread at all) does not exhibit any issue.
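
One possible explanation, offered as an assumption rather than a diagnosis: if column storage lives in a static container shared by all instances (the static `vectors_` member discussed in another issue on this page hints at such a design), then threads working on separate DataFrames would still touch shared state. The sketch below uses a hypothetical `SharedStore` class to show that pattern, with a mutex guarding the shared map. It assumes C++17 and is not the library's code.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for per-instance data kept in one static map.
class SharedStore {
public:
    // Every instance writes into the same static map, so the map itself must
    // be guarded even when each thread only uses its own instance.
    void push(int v) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_[this].push_back(v);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = data_.find(this);
        return it == data_.end() ? 0 : it->second.size();
    }

private:
    inline static std::mutex mutex_{};
    inline static std::unordered_map<const SharedStore*, std::vector<int>> data_{};
};

// Runs n_threads workers, each filling its own SharedStore, and returns the
// total number of stored elements.
std::size_t run_workers(int n_threads, int per_thread) {
    std::vector<SharedStore> stores(n_threads);
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([&stores, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                stores[t].push(i);
        });
    for (auto& th : threads)
        th.join();

    std::size_t total = 0;
    for (const auto& s : stores)
        total += s.size();
    return total;
}
```

Without the lock, concurrent insertions into the shared map would be a data race even though no two threads share a `SharedStore` instance.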

C1001: Internal compiler error

Hello,

I am currently running into the following errors when including dataframe.h:

1>D:\cpplib\df\include\DataFrame\Vectors\VectorPtrView.h(59,1): warning C4251: 'hmdf::VectorPtrView::vector_': class 'std::vector<T*,std::allocator<T*>>' needs to have dll-interface to be used by clients of class 'hmdf::VectorPtrView'
1>D:\cpplib\df\include\DataFrame\Vectors\VectorPtrView.h(57): message : see declaration of 'std::vector<T*,std::allocator<T*>>'
1>D:\cpplib\df\include\DataFrame\Vectors\VectorPtrView.h(686): message : see reference to class template instantiation 'hmdf::VectorPtrView' being compiled
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(114,1): warning C4251: 'hmdf::HeteroPtrView::views_': class 'std::unordered_map<const hmdf::HeteroPtrView*,hmdf::VectorPtrView,std::hash<const hmdf::HeteroPtrView >,std::equal_to<const hmdf::HeteroPtrView >,std::allocator<std::pair<const hmdf::HeteroPtrViewconst ,hmdf::VectorPtrView>>>' needs to have dll-interface to be used by clients of struct 'hmdf::HeteroPtrView'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(113): message : see declaration of 'std::unordered_map<const hmdf::HeteroPtrView *,hmdf::VectorPtrView,std::hash<const hmdf::HeteroPtrView *>,std::equal_to<const hmdf::HeteroPtrView >,std::allocator<std::pair<const hmdf::HeteroPtrViewconst ,hmdf::VectorPtrView>>>'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(119,10): warning C4251: 'hmdf::HeteroPtrView::clear_function_': class 'std::function<void (hmdf::HeteroPtrView &)>' needs to have dll-interface to be used by clients of struct 'hmdf::HeteroPtrView'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(116): message : see declaration of 'std::function<void (hmdf::HeteroPtrView &)>'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(123,10): warning C4251: 'hmdf::HeteroPtrView::copy_function_': class 'std::function<void (const hmdf::HeteroPtrView &,hmdf::HeteroPtrView &)>' needs to have dll-interface to be used by clients of struct 'hmdf::HeteroPtrView'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(120): message : see declaration of 'std::function<void (const hmdf::HeteroPtrView &,hmdf::HeteroPtrView &)>'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(127,10): warning C4251: 'hmdf::HeteroPtrView::move_function_': class 'std::function<void (hmdf::HeteroPtrView &,hmdf::HeteroPtrView &)>' needs to have dll-interface to be used by clients of struct 'hmdf::HeteroPtrView'
1>D:\cpplib\df\include\DataFrame\Vectors\HeteroPtrView.h(124): message : see declaration of 'std::function<void (hmdf::HeteroPtrView &,hmdf::HeteroPtrView &)>'
1>D:\cpplib\df\include\DataFrame\Utils\DateTime.h(178,41): warning C4251: 'hmdf::DateTime::dt_init_': class 'hmdf::DateTime::DT_initializer' needs to have dll-interface to be used by clients of class 'hmdf::DateTime'
1>D:\cpplib\df\include\DataFrame\Utils\DateTime.h(176): message : see declaration of 'hmdf::DateTime::DT_initializer'
1>D:\cpplib\df\include\DataFrame\Utils\FixedSizeString.h(106,11): error C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1>D:\cpplib\df\include\DataFrame\Utils\FixedSizeString.h(115,11): error C4996: 'strncpy': This function or variable may be unsafe. Consider using strncpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1>D:\cpplib\df\include\DataFrame\Utils\FixedSizeString.h(126,11): error C4996: 'strcat': This function or variable may be unsafe. Consider using strcat_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1>D:\cpplib\df\include\DataFrame\Utils\FixedSizeString.h(217,29): error C4996: 'vsprintf': This function or variable may be unsafe. Consider using vsprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1>D:\cpplib\df\include\DataFrame\Utils\FixedSizeString.h(230,15): error C4996: 'vsprintf': This function or variable may be unsafe. Consider using vsprintf_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1>D:\cpplib\df\include\DataFrame\DataFrame.h(98,1): warning C4251: 'hmdf::DataFrame<I,H>::data_': class 'std::vector<H,std::allocator<_Other>>' needs to have dll-interface to be used by clients of class 'hmdf::DataFrame<I,H>'
1>D:\cpplib\df\include\DataFrame\DataFrame.h(71): message : see declaration of 'std::vector<H,std::allocator<_Other>>'
1>D:\cpplib\df\include\DataFrame\DataFrame.h(2461): message : see reference to class template instantiation 'hmdf::DataFrame<I,H>' being compiled
1>D:\cpplib\df\include\DataFrame\DataFrame.h(100,1): warning C4251: 'hmdf::DataFrame<I,H>::column_tb_': class 'std::unordered_map<hmdf::DataFrame<I,H>::ColNameType,std::vector<H,std::allocator<_Other>>::size_type,std::hash<hmdf::VirtualString>,std::equal_to<hmdf::DataFrame<I,H>::ColNameType>,std::allocator<std::pair<const hmdf::DataFrame<I,H>::ColNameType,std::vector<H,std::allocator<_Other>>::size_type>>>' needs to have dll-interface to be used by clients of class 'hmdf::DataFrame<I,H>'
1>D:\cpplib\df\include\DataFrame\DataFrame.h(94): message : see declaration of 'std::unordered_map<hmdf::DataFrame<I,H>::ColNameType,std::vector<H,std::allocator<_Other>>::size_type,std::hash<hmdf::VirtualString>,std::equal_to<hmdf::DataFrame<I,H>::ColNameType>,std::allocator<std::pair<const hmdf::DataFrame<I,H>::ColNameType,std::vector<H,std::allocator<_Other>>::size_type>>>'
1>D:\cpplib\df\include\DataFrame\DataFrame.h(291,1): fatal error C1001: Internal compiler error.
1>(compiler file 'msc1.cpp', line 1532)

I used the CMake GUI to configure and generate the project, and then built it with VS 2019. I'm unsure whether the error is due to my installation procedure or to the library itself. Thank you!

What should I do after CMake on Windows? My IDE is Visual Studio 2012

After running CMake, I built dataframe_tester in Visual Studio 2012 and got many errors. I don't know what happened.
1 > -- -- -- -- -- - has started to generate: project: dataframe_tester, configuration: Debug Win32 -- -- -- -- -- -
1 > dataframe_tester. Cc
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview.h (25): error C2873: "value_type" : symbols cannot be used in using declarations
1> c:\users\flame\downloads\dataframe-master\include\ vectorview.h (494): see the reference to instantiating "HMDF ::VectorView" for the class template being compiled
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. (before "=")
1>c\users\ flames \downloads\dataframe-master\include\VectorView. before
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview.h (26): error C2873: "size_type" : symbol cannot be used in the using declaration
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. (before "=")
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. before
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview.h (27): error C2873: "pointer" : symbol cannot be used in using declaration
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. (before "=")
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. before
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview.h (28): error C2873: "const_pointer" : symbol cannot be used in using declaration
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. (before "=")
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. before
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview.h (29): error C2873: "const_pointer_const" : symbol cannot be used in the using declaration
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. (before "=")
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. before
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview. h(30): error C2873: "reference" : symbol cannot be used in using declaration
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. (before "=")
1>c:\users\ flames \downloads\dataframe-master\include\VectorView. before
1>c:\users\ flames \downloads\dataframe-master\include\ vectorview. h(31): error C2873: "const_reference" : symbol cannot be used in using declaration...........
[Screenshot of the errors]

Project doesn't build with MinGW compiler on Windows

$ cmake -G "MinGW Makefiles"  ..
-- The C compiler identification is GNU 8.1.0
-- The CXX compiler identification is GNU 8.1.0
-- Check for working C compiler: C:/Program Files (x86)/mingw-w64/i686-8.1.0-posix-dwarf-rt_v6-rev0/mingw32/bin/gcc.exe
-- Check for working C compiler: C:/Program Files (x86)/mingw-w64/i686-8.1.0-posix-dwarf-rt_v6-rev0/mingw32/bin/gcc.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: C:/Program Files (x86)/mingw-w64/i686-8.1.0-posix-dwarf-rt_v6-rev0/mingw32/bin/g++.exe
-- Check for working CXX compiler: C:/Program Files (x86)/mingw-w64/i686-8.1.0-posix-dwarf-rt_v6-rev0/mingw32/bin/g++.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
CMake Error at CMakeLists.txt:128 (set_target_properties):
  set_target_properties Can not find target to add properties to: DataFrame


CMake Error at CMakeLists.txt:136 (target_include_directories):
  Cannot specify include directories for target "DataFrame" which is not
  built by this project.


CMake Error at CMakeLists.txt:141 (install):
  install TARGETS given target "DataFrame" which does not exist.


-- Copying files for testing
-- Copying files for testing - done
-- Configuring incomplete, errors occurred!

Undefined reference to `hmdf::HeteroVector::clear()'

When I tried to smoke test my code, I found that even trying to declare a dataframe resulted in an error at compile time. Am I missing a header file?

I'm running Debian 10 and am not using an IDE - I'm simply using g++ (I will create a Makefile once my code makes it past the smoke test). Below is the command I am using and the corresponding error.

username@system:/tmp/smoke_test$ g++ test_dataframe.cpp -I DataFrame/include/ -std=gnu++17 -o test_dataframe
/usr/bin/ld: /tmp/ccjU2Oy0.o: in function hmdf::HeteroVector::~HeteroVector()': test_dataframe.cpp:(.text._ZN4hmdf12HeteroVectorD2Ev[_ZN4hmdf12HeteroVectorD5Ev]+0x14): undefined reference to hmdf::HeteroVector::clear()'
collect2: error: ld returned 1 exit status

test_dataframe.cpp.txt

Adding /bigobj flag in CMakeLists.txt will help

...
if (MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /bigobj")
add_definitions(-D _CRT_SECURE_NO_WARNINGS)
add_library(${LIBRARY_TARGET_NAME} STATIC ${${LIBRARY_TARGET_NAME}_SRC})
endif(MSVC)
...
Without /bigobj, I get a linker failure with MSVC 2017.

Help with installation on Windows

Hi,

I read the instructions in the Readme section, but I do not know what to do at step 4 in the command prompt:

step 1) mkdir [Debug | Release]
step 2) cd [Debug | Release]
step 3) cmake -DCMAKE_BUILD_TYPE=[Debug | Release] ..
step 4) make
step 5) make install

I have managed to generate the build files, as you can see here.

[Screenshot: df]

But neither "make" nor "make install" works from the command line.

[Screenshot: df2]

I am a bit confused about what I should do next to add your library to my Visual Studio Community C++ project, so that I can use your DataFrame class to manipulate tabular data from a CSV file.

Thanks in advance.

Regards,

Error in compile test file

I'm not sure whether something is wrong with my installation of DataFrame.
Following the installation instructions (cmake, make, make install), I successfully installed DataFrame. After installation, iTerm2 showed the output below:

Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/lib/libDataFrame.1.0.0.dylib
-- Installing: /usr/local/lib/libDataFrame.dylib
-- Up-to-date: /usr/local/include/DataFrame
-- Installing: /usr/local/include/DataFrame/DataFrameMLVisitors.h
-- Up-to-date: /usr/local/include/DataFrame/MMap
-- Installing: /usr/local/include/DataFrame/MMap/ObjectVector.tcc
-- Installing: /usr/local/include/DataFrame/MMap/MMapFile.h
-- Installing: /usr/local/include/DataFrame/MMap/MMapSharedMem.h
-- Installing: /usr/local/include/DataFrame/MMap/FileDef.h
-- Installing: /usr/local/include/DataFrame/MMap/ObjectVector.h
-- Installing: /usr/local/include/DataFrame/MMap/MMapBase.h
-- Installing: /usr/local/include/DataFrame/DataFrameStatsVisitors.h
-- Installing: /usr/local/include/DataFrame/DataFrameTypes.h
-- Up-to-date: /usr/local/include/DataFrame/Utils
-- Installing: /usr/local/include/DataFrame/Utils/DateTime.h
-- Installing: /usr/local/include/DataFrame/Utils/FixedSizeString.h
-- Installing: /usr/local/include/DataFrame/Utils/ThreadGranularity.h
-- Up-to-date: /usr/local/include/DataFrame/Vectors
-- Installing: /usr/local/include/DataFrame/Vectors/HeteroVector.h
-- Installing: /usr/local/include/DataFrame/Vectors/HeteroView.h
-- Installing: /usr/local/include/DataFrame/Vectors/HeteroPtrView.tcc
-- Installing: /usr/local/include/DataFrame/Vectors/VectorPtrView.h
-- Installing: /usr/local/include/DataFrame/Vectors/HeteroPtrView.h
-- Installing: /usr/local/include/DataFrame/Vectors/HeteroView.tcc
-- Installing: /usr/local/include/DataFrame/Vectors/VectorView.h
-- Installing: /usr/local/include/DataFrame/Vectors/HeteroVector.tcc
-- Installing: /usr/local/include/DataFrame/DataFrameFinancialVisitors.h
-- Up-to-date: /usr/local/include/DataFrame/Internals
-- Installing: /usr/local/include/DataFrame/Internals/RandGen.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_opt.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_read.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_standalone.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_set.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_misc.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_shift.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_join.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_functors.h
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_write.tcc
-- Installing: /usr/local/include/DataFrame/Internals/DataFrame_get.tcc
-- Installing: /usr/local/include/DataFrame/RandGen.h
-- Installing: /usr/local/include/DataFrame/DataFrameOperators.h
-- Installing: /usr/local/include/DataFrame/DataFrame.h
-- Installing: /usr/local/lib/pkgconfig/DataFrame.pc
-- Installing: /usr/local/lib/cmake/DataFrame/DataFrameConfigVersion.cmake
-- Installing: /usr/local/lib/cmake/DataFrame/DataFrameConfig.cmake
-- Installing: /usr/local/lib/cmake/DataFrame/DataFrameTargets.cmake
-- Installing: /usr/local/lib/cmake/DataFrame/DataFrameTargets-release.cmake

Then I tried to compile a .cc file from the Test directory, such as dataframe_performance.cc, using the command clang++ dataframe_performance.cc -o dataframe_performance --std=c++17. However, I got the errors listed below:

Undefined symbols for architecture x86_64:
  "hmdf::HeteroVector::clear()", referenced from:
      hmdf::HeteroVector::~HeteroVector() in dataframe_performance-540be3.o
  "hmdf::HeteroVector::HeteroVector(hmdf::HeteroVector&&)", referenced from:
      void std::__1::allocator<hmdf::HeteroVector>::construct<hmdf::HeteroVector, hmdf::HeteroVector>(hmdf::HeteroVector*, hmdf::HeteroVector&&) in dataframe_performance-540be3.o
  "hmdf::HeteroVector::HeteroVector(hmdf::HeteroVector const&)", referenced from:
      void std::__1::allocator<hmdf::HeteroVector>::construct<hmdf::HeteroVector, hmdf::HeteroVector const&>(hmdf::HeteroVector*, hmdf::HeteroVector const&) in dataframe_performance-540be3.o
  "hmdf::HeteroVector::HeteroVector()", referenced from:
      std::__1::vector<double, std::__1::allocator<double> >& hmdf::DataFrame<long, hmdf::HeteroVector>::create_column<double>(char const*) in dataframe_performance-540be3.o
  "hmdf::DateTime::add_months(long)", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::add_seconds(long)", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::add_nanoseconds(long)", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::add_days(long)", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::add_years(long)", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::DateTime(char const*, hmdf::DT_DATE_STYLE, hmdf::DT_TIME_ZONE)", referenced from:
      hmdf::DataFrame<long, hmdf::HeteroVector>::gen_datetime_index(char const*, char const*, hmdf::time_frequency, long, hmdf::DT_TIME_ZONE) in dataframe_performance-540be3.o
  "hmdf::DateTime::diff_seconds(hmdf::DateTime const&) const", referenced from:
      hmdf::DataFrame<long, hmdf::HeteroVector>::gen_datetime_index(char const*, char const*, hmdf::time_frequency, long, hmdf::DT_TIME_ZONE) in dataframe_performance-540be3.o
  "hmdf::DateTime::date() const", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::time() const", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
  "hmdf::DateTime::compare(hmdf::DateTime const&) const", referenced from:
      hmdf::operator<(hmdf::DateTime const&, hmdf::DateTime const&) in dataframe_performance-540be3.o
  "hmdf::DateTime::long_time() const", referenced from:
      void hmdf::_generate_ts_index_<long>(std::__1::vector<long, std::__1::allocator<long> >&, hmdf::DateTime&, hmdf::time_frequency, long) in dataframe_performance-540be3.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Is there something wrong with my compilation?

Is there any plan to add a data adapter for CSV or a database?

Hi, thanks for sharing your great code. I am really impressed with the date-time precision it supports, which is much better than pandas.

Is there any plan to add adapters to import data from CSV files or a database?

This should be tagged as a feature request, but I'm not able to tag it myself, sorry.

C API

Hi,

Recently I wanted to write something similar to pandas in C. I was starting (I have been working on that for a few weeks), when I thought someone would have done something similar already, and then I found this.

Do you think it could be ported to C, or that an interface could at least be written so that C code can link to the C++ code? If so, maybe I could help you.

Kind regards,
Alex.
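
The second option, a C interface that links against the C++ code, usually takes the form of an extern "C" layer over an opaque handle. The sketch below is illustrative only: the hmdf_c_* names are hypothetical, and a std::vector<double> stands in for a DataFrame column.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// C++ side: the object that C code only ever sees through a pointer.
struct hmdf_c_column {
    std::vector<double> values;
};

extern "C" {

// Returns nullptr on allocation failure; exceptions must not cross into C.
hmdf_c_column *hmdf_c_column_create(void) {
    return new (std::nothrow) hmdf_c_column;
}

// Returns 0 on success, -1 on failure.
int hmdf_c_column_push(hmdf_c_column *col, double value) {
    if (col == nullptr)
        return -1;
    try {
        col->values.push_back(value);
        return 0;
    } catch (...) {
        return -1;
    }
}

// Number of stored values; 0 for a null handle.
std::size_t hmdf_c_column_size(const hmdf_c_column *col) {
    return col ? col->values.size() : 0;
}

// Releases the handle; passing nullptr is allowed.
void hmdf_c_column_destroy(hmdf_c_column *col) {
    delete col;
}

}  // extern "C"
```

A C header for this layer would expose only typedef struct hmdf_c_column hmdf_c_column; plus the function prototypes, so that C callers treat the handle as opaque.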

dllexport dynamic library on MSVC

Hi, @hosseinmoein
While making a Conan package for DataFrame, I found that building DataFrame as a dynamic library fails on MSVC, so the Conan package does not support the dynamic library on MSVC either. The static library, however, works on all platforms and compilers.

  • Why it fails to build a dynamic library on MSVC

It seems I did not completely fix the problem of building a dynamic library on MSVC, although I tried to in Feature/declspec dllexport dllimport.

  1. Cause

I think Feature/declspec dllexport dllimport passed the AppVeyor tests because of add_definitions(-DLIBRARY_EXPORTS), which applies to every target, both when building the library and when building the tests. I'm afraid the failure on MSVC would reappear if target_compile_definitions were used instead of add_definitions.

  2. Workaround on MSVC

Add this to your own project:

if(MSVC)
  add_definitions(-DLIBRARY_EXPORTS)
endif()

  3. Continue ...

I tried an experiment to figure out the dllexport problem at https://github.com/yssource/DataFrame/blob/cmake-demo-shared-static/CMakeLists.txt, but it does not work yet. I have very little knowledge of MSVC and no access to a Windows development environment.

As far as I know, -DLIBRARY_EXPORTS is needed when building DataFrame, but it should not be needed when using DataFrame (as a workaround for now, MSVC users need to add it to their own CMakeLists).
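
The asymmetry described above (define LIBRARY_EXPORTS while building the library, leave it undefined while using it) is the usual export-macro pattern on MSVC. A minimal sketch follows, assuming a hypothetical macro name HMDF_API_SKETCH and a hypothetical function sketch_version; DataFrame's actual macro name and header layout may differ.

```cpp
// Export-macro sketch. LIBRARY_EXPORTS comes from the discussion above;
// HMDF_API_SKETCH and sketch_version() are hypothetical names.
#if defined(_MSC_VER)
#  if defined(LIBRARY_EXPORTS)
     // Building the DLL itself: export the symbols.
#    define HMDF_API_SKETCH __declspec(dllexport)
#  else
     // Consuming the DLL: import the symbols.
#    define HMDF_API_SKETCH __declspec(dllimport)
#  endif
#else
   // GCC and Clang export symbols by default, so the macro expands to nothing.
#  define HMDF_API_SKETCH
#endif

// Declaration as it would appear in a public header.
HMDF_API_SKETCH int sketch_version();

// Definition as it would appear in a library source file, which on MSVC
// is compiled with -DLIBRARY_EXPORTS.
int sketch_version() { return 1; }
```

With this layout the definition is attached to the library target only, which is what target_compile_definitions with PRIVATE scope would express in CMake; consumers then see the dllimport branch without defining anything.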

Maybe @justinjk007 can help a lot.

dataframe_performance stack overflow

When I try to run the test sample ./dataframe_performance, it seems to crash.
The stack from the core dump looks like this:

Program received signal SIGSEGV, Segmentation fault.
hmdf::DateTime::maketime_ (this=<error reading variable: Cannot access memory at address 0xffffff7ffff8>, ltime=<error reading variable: Cannot access memory at address 0xffffff7ffff0>)
at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:973
973 DateTime::EpochType DateTime::maketime_ (struct tm &ltime) const noexcept {
#0 hmdf::DateTime::maketime_ (this=<error reading variable: Cannot access memory at address 0xffffff7ffff8>,
ltime=<error reading variable: Cannot access memory at address 0xffffff7ffff0>) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:973
#1 0x000000000041f3a4 in hmdf::DateTime::time (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:662
#2 0x000000000041f260 in hmdf::DateTime::sec (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:615
#3 0x00000000004200a4 in hmdf::DateTime::maketime_ (this=0xffffffffe5c8, ltime=...) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:975
#4 0x000000000041f3a4 in hmdf::DateTime::time (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:662
#5 0x000000000041f260 in hmdf::DateTime::sec (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:615
#6 0x00000000004200a4 in hmdf::DateTime::maketime_ (this=0xffffffffe5c8, ltime=...) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:975
................repeat.................
#130963 0x000000000041f3a4 in hmdf::DateTime::time (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:662
#130964 0x000000000041e95c in hmdf::DateTime::compare (this=0xffffffffe5c8, rhs=...) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:348
#130965 0x0000000000405870 in hmdf::operator< (lhs=..., rhs=...) at /home/user/dataframe/DataFrame-1.9.0/include/DataFrame/Utils/DateTime.h:343
#130966 0x00000000004074e8 in hmdf::DataFrame<long, hmdf::HeteroVector>::gen_datetime_index (start_datetime=0x420c58 "01/01/1970", end_datetime=0x420c48 "08/15/2019",
t_freq=hmdf::time_frequency::secondly, increment=1, tz=hmdf::DT_TIME_ZONE::LOCAL) at /home/user/dataframe/DataFrame-1.9.0/include/DataFrame/Internals/DataFrame_set.tcc:250
#130967 0x0000000000403fd8 in main (argc=1, argv=0xffffffffe988) at /home/user/dataframe/DataFrame-1.9.0/test/dataframe_performance.cc:45

(gdb) f 0
#0 hmdf::DateTime::maketime_ (this=<error reading variable: Cannot access memory at address 0xffffff7ffff8>,
ltime=<error reading variable: Cannot access memory at address 0xffffff7ffff0>) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:973
973 DateTime::EpochType DateTime::maketime_ (struct tm &ltime) const noexcept {
(gdb) f 1
#1 0x000000000041f3a4 in hmdf::DateTime::time (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:662
662 const_cast<DateTime *>(this)->time_ = maketime_ (ltime);
(gdb) f 2
#2 0x000000000041f260 in hmdf::DateTime::sec (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:615
615 const_cast<DateTime *>(this)->breaktime_ (this->time (), nanosec ());
(gdb) f 3
#3 0x00000000004200a4 in hmdf::DateTime::maketime_ (this=0xffffffffe5c8, ltime=...) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:975
975 ltime.tm_sec = sec ();
(gdb) f 4
#4 0x000000000041f3a4 in hmdf::DateTime::time (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:662
662 const_cast<DateTime *>(this)->time_ = maketime_ (ltime);
(gdb) f 130966
#130966 0x00000000004074e8 in hmdf::DataFrame<long, hmdf::HeteroVector>::gen_datetime_index (start_datetime=0x420c58 "01/01/1970", end_datetime=0x420c48 "08/15/2019",
t_freq=hmdf::time_frequency::secondly, increment=1, tz=hmdf::DT_TIME_ZONE::LOCAL) at /home/user/dataframe/DataFrame-1.9.0/include/DataFrame/Internals/DataFrame_set.tcc:250
250 while (start_di < end_di)
(gdb) l
245 break;
246 default:
247 throw NotFeasible ("ERROR: gen_datetime_index()");
248 }
249
250 while (start_di < end_di)
251 generate_ts_index(index_vec, start_di, t_freq, increment);
252
253 return (index_vec);
254 }
(gdb) f 130965
#130965 0x0000000000405870 in hmdf::operator< (lhs=..., rhs=...) at /home/user/dataframe/DataFrame-1.9.0/include/DataFrame/Utils/DateTime.h:343
343 return (lhs.compare (rhs) < 0);
(gdb) f 130964
#130964 0x000000000041e95c in hmdf::DateTime::compare (this=0xffffffffe5c8, rhs=...) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:348
348 const EpochType t = this->time() - rhs.time();
(gdb) f 130963
#130963 0x000000000041f3a4 in hmdf::DateTime::time (this=0xffffffffe5c8) at /home/user/dataframe/DataFrame-1.9.0/src/Utils/DateTime.cc:662
662 const_cast<DateTime *>(this)->time_ = maketime_ (ltime);
(gdb)

=========================
Some information about my server:

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/aarch64-linux-gnu/7.3.0/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release -with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,fortran,lto --enable-plugin --enable-initfini-array --disable-libgcj --without-isl --without-cloog --enable-gnu-indirect-function --build=aarch64-linux-gnu --with-stage1-ldflags=' -Wl,-z,relro,-z,now' --with-boot-ldflags=' -Wl,-z,relro,-z,now' --with-multilib-list=lp64
Thread model: posix
gcc version 7.3.0 (GCC)

uname -a

Linux arm1 4.19.36-vhulk1907.1.0.h619.eulerosv2r8.aarch64 #1 SMP Mon Jul 22 00:00:00 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux

ulimit -a

core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2054314
max locked memory (kbytes, -l) 2097152
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2054314
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

cat /proc/cpuinfo

processor : 0
BogoMIPS : 200.00
cpu MHz : 2600.000
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd01
CPU revision : 0
........repeat......
processor : 95
BogoMIPS : 200.00
cpu MHz : 2600.000
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
CPU implementer : 0x48
CPU architecture: 8
CPU variant : 0x1
CPU part : 0xd01
CPU revision : 0

free -g

total used free shared buff/cache available
Mem: 501 279 217 0 4 215
Swap: 0 0 0

Input/output formats

Hi!

It would be nice to support reading and writing from/to the standardised "comma separated" and "tab separated" formats used in R and Pandas (Python).

Great work!

why timestamp diff?

https://github.com/hosseinmoein/DataFrame/blob/master/test/dataframe_tester.cc#L2183-L2192

I'm in Shanghai ("TZ=GMT-08", // "Asia/Shanghai").
Running the following test, I find that the values your test expects are 13 hours ahead of mine.

  • source
idx_vec1 = MyDataFrame::gen_datetime_index("01/01/2018",
                                           "12/31/2022",
                                           time_frequency::hourly);


assert(idx_vec1[0] == 1514782800);
assert(idx_vec1[1] == 1514786400);
assert(idx_vec1[2] == 1514790000);
  • Output of my local test
 idx_vec1[0] 1514736000 # Note: my result is 1514736000, but yours is 1514782800, a difference of 13 hours.
 idx_vec1[1] 1514739600
 idx_vec1[2] 1514743200

#+begin_src python :results output
  import numpy as np
  import datetime

  print("My results ....")

  ts = 1514736000
  dt = datetime.datetime.fromtimestamp(ts)
  print(dt)

  print("While your results ....")

  ts = 1514782800
  dt = datetime.datetime.fromtimestamp(ts)
  print(dt)
#+end_src

#+RESULTS:
: My results ....
: 2018-01-01 00:00:00
: While your results ....
: 2018-01-01 13:00:00
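
The 13-hour gap is what one would see if "01/01/2018" is interpreted as local midnight: midnight at UTC+8 and midnight at UTC-5 are 13 hours apart. A minimal sketch of that arithmetic follows. The helpers days_from_civil and local_midnight_epoch are hypothetical names, not part of DataFrame, and the idea that the expected values in the test were produced in a UTC-5 zone is an inference from the numbers, not something the test states.

```cpp
#include <cstdint>

// Days between 1970-01-01 and the given proleptic Gregorian date.
constexpr std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= (m <= 2) ? 1 : 0;  // shift the year so that it starts in March
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);             // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Epoch seconds of 00:00:00 local time on the given date, for a zone that is
// utc_offset_hours ahead of UTC (negative means behind UTC).
constexpr std::int64_t local_midnight_epoch(std::int64_t y, unsigned m, unsigned d,
                                            int utc_offset_hours) {
    return days_from_civil(y, m, d) * 86400
           - static_cast<std::int64_t>(utc_offset_hours) * 3600;
}
```

With these definitions, local_midnight_epoch(2018, 1, 1, 8) is 1514736000 and local_midnight_epoch(2018, 1, 1, -5) is 1514782800, which are the two values compared above.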

cmake install, preserve directory hierarchy for PUBLIC_HEADER.

At present, cmake installs all PUBLIC_HEADER files into "${CMAKE_INSTALL_INCLUDEDIR}/${LIBRARY_TARGET_NAME}" without preserving the original directory hierarchy.
For example,

Actual result:
/usr/local/include/DataFrame/MMapSharedMem.h
Expected result:
/usr/local/include/DataFrame/MMap/MMapSharedMem.h.

In my opinion, the file structure should be kept as it is, with some changes in CMakeLists.txt instead.
For example, the PUBLIC_HEADER parts could be replaced with something like the following:

install(DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/include/${PROJECT_NAME}" # source directory
        DESTINATION "include" # target directory
        FILES_MATCHING # install only matched files
        PATTERN "*.h"   # header files (PATTERN takes a glob, one per option)
        PATTERN "*.tcc" # template implementation files
)

Do you agree with me?

Help, c++ syntax.

    // (..., visit_impl_help_<std::decay_t<T>, TYPES>(visitor)); // C++17
    using expander = int[];
    (void) expander { 0, (visit_impl_help_<T, TYPES>(visitor), 0) ... };

I am confused by this syntax. I think you are using list initialization, but I don't understand what (void) is for here.

It is equivalent to this, isn't it?
int v[] = { 0, (visit_impl_help_<T, TYPES>(visitor), 0) ... };

Although this is not the proper place to ask, I have searched Google and Stack Overflow a lot and am still confused about it.

Thank you in advance.
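
For reference, the idiom in question can be reproduced in isolation. The sketch below uses hypothetical names (record, for_each_arg) rather than the library's visit_impl_help_. It shows that the array is only a vehicle for evaluating the pack expansion in order, and that the (void) cast marks the temporary array as intentionally discarded so that the compiler does not warn about an unused value. A named int v[] would evaluate the same expressions but leave an unused variable behind.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the per-type helper: records each value as text.
template<typename T>
void record(std::vector<std::string> &out, const T &value) {
    out.push_back(std::to_string(value));
}

// Calls record() once per argument, in order, without a C++17 fold expression.
template<typename... Ts>
std::vector<std::string> for_each_arg(const Ts &... args) {
    std::vector<std::string> out;
    using expander = int[];

    // Each element is (record(...), 0): the comma operator runs the call and
    // yields 0. The leading 0 keeps the array non-empty for an empty pack.
    // Braced initialization guarantees left-to-right evaluation.
    (void) expander { 0, (record(out, args), 0) ... };
    return out;
}
```

In C++17 the same effect is written as the fold expression shown in the commented-out line of the original snippet.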

Writing DataFrame in csv file and loading it in Pandas

I used one of the test examples for the following scenario:

  1. Write DataFrame content to csv/json file
  2. Load that file using python pandas.

I expected to get the same DataFrame in both cases.

Here is the code I used on the C++ side.

    std::vector<unsigned long> idx = {123450,
                                      123451,
                                      123452,
                                      123453,
                                      123454,
                                      123455,
                                      123456,
                                      123457,
                                      123458,
                                      123459,
                                      123460,
                                      123461,
                                      123462,
                                      123466};
    std::vector<double>        d1  = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    std::vector<double>        d2  = {8, 9, 10, 11, 12, 13, 14, 20, 22, 23, 30, 31, 32, 1.89};
    std::vector<double> d3 = {15, 16, 17, 18, 19, 20, 21, 0.34, 1.56, 0.34, 2.3, 0.1, 0.89, 0.45};
    std::vector<int>    i1 = {22, 23, 24, 25, 99, 100, 101, 3, 2};
    MyDataFrame         df;

    df.load_data( std::move( idx ),
                  std::make_pair( "col_1", d1 ),
                  std::make_pair( "col_2", d2 ),
                  std::make_pair( "col_3", d3 ),
                  std::make_pair( "col_4", i1 ) );

    std::ofstream outfile;
    outfile.open( "file.json", std::ios_base::in | std::ios_base::trunc );
    df.write<std::ostream, double, int>( outfile, false, io_format::json );

Once the file is loaded using pandas, the following table is obtained:

[image: screenshot of the resulting pandas table]

Is this expected behaviour?

Prefix header problem, a deeper look

I found a problem: unlike on Linux, no installation happens on Windows, since there is no standard place to move the files to as there is on Unix platforms. So on Linux #include <DataFrame/DataFrame.h> will work, but on Windows only #include <DataFrame.h> will, because on Windows external targets look for header files in DataFrame/include. To get the same behaviour with #include <DataFrame/DataFrame.h> on both, we should change the structure of the include directory to have an additional DataFrame subdirectory.

@hosseinmoein

Sort by multiple columns in ascending or descending order

I am wondering whether the library can be improved to support more kinds of sorting.
Currently, a data frame can be sorted by only one column. In many cases we want to sort by col1 in ascending order first, then by col2 in descending order, and so on.
May I know whether this request can be implemented?
Many thanks.

:-)
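
To make the request concrete, the ordering described above (first key ascending, second key descending) can be sketched with the standard library alone. This is not DataFrame's API; sort_permutation is a hypothetical helper that returns the row order, which could then be applied to every column.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns row indices ordered by col1 ascending, ties broken by col2 descending.
// Assumes both columns have the same length.
std::vector<std::size_t> sort_permutation(const std::vector<int> &col1,
                                          const std::vector<double> &col2) {
    std::vector<std::size_t> order(col1.size());
    std::iota(order.begin(), order.end(), std::size_t{0});

    // stable_sort keeps the original order of rows that tie on both keys.
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         if (col1[a] != col1[b])
                             return col1[a] < col1[b];  // first key: ascending
                         return col2[a] > col2[b];      // second key: descending
                     });
    return order;
}
```

Sorting an index permutation rather than the data itself keeps all columns aligned, since the same order can be applied to each of them afterwards.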

Default column types as class template parameters

The class is being used like this now:

df.make_consistent<int, double, std::string>();
df.get_data_by_idx<int, double, std::string>
df.sort<double, int, double, std::string>

Notice how <int, double, std::string> is repeated every time. I find that these template parameters on the methods bloat the code too much. How about moving them into DataFrame's template parameters, so that we can declare a typedef once and forget about them?
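
One way to picture the proposal without changing the library is a thin wrapper that fixes the column types once. The sketch below is illustrative only: TypedFrame is a hypothetical name, and DemoFrame is a stand-in with a member template shaped like the calls above, not the real DataFrame class.

```cpp
#include <cstddef>

// Stand-in for a frame whose methods take the column types as template arguments.
struct DemoFrame {
    std::size_t last_type_count = 0;

    template<typename... Ts>
    void make_consistent() { last_type_count = sizeof...(Ts); }
};

// Hypothetical wrapper: the column types are named once, on the class.
template<typename Frame, typename... ColumnTypes>
class TypedFrame {
public:
    explicit TypedFrame(Frame &frame) : frame_(frame) {}

    // Forwards to the member template with the stored type list.
    void make_consistent() { frame_.template make_consistent<ColumnTypes...>(); }

private:
    Frame &frame_;
};
```

A single alias such as using MyTyped = TypedFrame<DemoFrame, int, double, long>; would then carry the type list, and calls like typed.make_consistent() need no template arguments.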

GPU support?

Are you considering GPU support, at least for CUDA? If you don't want to struggle with raw CUDA, GPU support could be built on an existing library such as dlib or something similar.

No definition of `reverse_iterator`

Sorry to disturb you. I'm trying to learn C++ by reading your beautiful code.
While reviewing HeteroView.h, I found the line template<typename T> using reverse_iterator = typename VectorView<T>::reverse_iterator;.
However, VectorView.h does not contain a definition of reverse_iterator. Am I misunderstanding something?

Support column rolling?

Is there any plan to support rolling operations on columns?
For instance, something like what pandas does in the following code.


#+begin_src python :results output
  import numpy
  import pandas

  x = numpy.array([0, 1, 2, 3, 4])
  s = pandas.Series(x)

  print(s.rolling(3).min())
  print(s.rolling(3).max())
  print(s.rolling(3).mean())
  print(s.rolling(3).std())
#+end_src

#+RESULTS:
#+begin_example
0    NaN
1    NaN
2    0.0
3    1.0
4    2.0
dtype: float64
0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
dtype: float64
0    NaN
1    NaN
2    1.0
3    2.0
4    3.0
dtype: float64
0    NaN
1    NaN
2    1.0
3    1.0
4    1.0
dtype: float64
#+end_example
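
For comparison, the rolling-window semantics shown in the pandas output (NaN until the window is full, then one value per position) can be sketched in plain C++. rolling_mean is a hypothetical free function, not part of DataFrame.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Rolling mean over a fixed window. Positions that do not yet have a full
// window are filled with NaN, as in the pandas output above.
std::vector<double> rolling_mean(const std::vector<double> &data, std::size_t window) {
    std::vector<double> out(data.size(), std::numeric_limits<double>::quiet_NaN());
    if (window == 0 || window > data.size())
        return out;

    double sum = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        sum += data[i];                   // value entering the window
        if (i >= window)
            sum -= data[i - window];      // value leaving the window
        if (i + 1 >= window)
            out[i] = sum / static_cast<double>(window);
    }
    return out;
}
```

Keeping a running sum makes each step constant time; rolling min and max need a different structure (for example a monotonic deque) to reach the same cost.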
