cmu-db / noisepage

Self-Driving Database Management System from Carnegie Mellon University

License: MIT License

noisepage's Introduction

NoisePage is a relational database management system developed by the Carnegie Mellon Database Group. The research goal of the NoisePage project is to develop high-performance system components that support autonomous operation and optimization as a first-class design principle.

Key Features

Integrated machine learning components to forecast, model, and plan the system's behavior.
Postgres compatible wire-protocol, SQL, and catalogs.
Apache Arrow compatible in-memory columnar storage.
Lock-free multi-version concurrency control.
Just-in-time query compilation using LLVM.
Vectorized execution using relaxed-operator fusion (ROF).
100% Open-Source (MIT License)

Quickstart

The NoisePage project is built and tested on Ubuntu 20.04. No other environments are officially supported.

git clone https://github.com/cmu-db/noisepage.git
cd noisepage
sudo ./script/installation/packages.sh
mkdir build
cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DNOISEPAGE_USE_JEMALLOC=ON -DNOISEPAGE_UNITY_BUILD=ON ..
ninja noisepage
./bin/noisepage

You can now connect to NoisePage using the Postgres client psql.

psql -h localhost -U noisepage -p 15721

Additional Notes:

If you have less than 16 GB of RAM, use -DNOISEPAGE_UNITY_BUILD=OFF in the cmake command above.
If you know what you're doing, install the prerequisite packages from ./script/installation/packages.sh manually.

For Developers

Please see the docs.

Contributing

If you are a current student at CMU,

See the New Student Guide.
Consider enrolling in one of the database courses.

Contributions from non-CMU students are also welcome!

noisepage's Issues

Sanitizer complaint about spdlog

Reported by Tianyu
Seen on macOS in release mode only. Unable to reproduce on Linux.
Seen when running data_table_test with debug logging merged in and one log point,
with the CMake option -DTERRIER_USE_ASAN=ON set via the CLion CMake build settings.

Sanitizer output below

2018-08-16 13:30:59.584554-0400 atos[39050:1230856] Provided dSYM: [/Users/Tianyu/Desktop/PoopDish/build/relwithdebinfo/libterrier_shared.dylib.dSYM/Contents/Resources/DWARF/libterrier_shared.dylib] does not match symbol owner 0x7fab18d0e300
Provided dSYM: [/Users/Tianyu/Desktop/PoopDish/build/relwithdebinfo/libterrier_shared.dylib.dSYM/Contents/Resources/DWARF/libterrier_shared.dylib] does not match symbol owner 0x7fab18d0e300
2018-08-16 13:30:59.688673-0400 data_table_test[39047:1230815] =================================================================
2018-08-16 13:30:59.688778-0400 data_table_test[39047:1230815] ==39047==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00010d6296c0 at pc 0x000109babdf9 bp 0x7ffee61e8bb0 sp 0x7ffee61e8ba8
2018-08-16 13:30:59.688804-0400 data_table_test[39047:1230815] READ of size 1 at 0x00010d6296c0 thread T0t
2018-08-16 13:30:59.688817-0400 data_table_test[39047:1230815] pc_0x109babdf8###func_spdlog::details::full_formatter::format(spdlog::details::log_msg const&, tm const&, fmt::v5::basic_memory_buffer<char, 500ul, std::__1::allocator<char> >&)###file_memory###line_3137###obj_(libterrier_shared.dylib:x86_64+0x64df8)
2018-08-16 13:30:59.688838-0400 data_table_test[39047:1230815] pc_0x109b4de7f###func_spdlog::pattern_formatter::format(spdlog::details::log_msg const&, fmt::v5::basic_memory_buffer<char, 500ul, std::__1::allocator<char> >&)###file_pattern_formatter.h###line_565###obj_(libterrier_shared.dylib:x86_64+0x6e7f)
2018-08-16 13:30:59.688850-0400 data_table_test[39047:1230815] pc_0x109bbd376###func_spdlog::sinks::stdout_sink<spdlog::details::console_stdout, spdlog::details::console_mutex>::log(spdlog::details::log_msg const&)###file_stdout_sinks.h###line_40###obj_(libterrier_shared.dylib:x86_64+0x76376)
2018-08-16 13:30:59.688863-0400 data_table_test[39047:1230815] pc_0x109b4bb8f###func_spdlog::logger::sink_it_(spdlog::details::log_msg&)###file_logger_impl.h###line_300###obj_(libterrier_shared.dylib:x86_64+0x4b8f)
2018-08-16 13:30:59.688883-0400 data_table_test[39047:1230815] pc_0x109a930b4###func_void spdlog::logger::log<>(spdlog::level::level_enum, char const*)###file_logger_impl.h###line_81###obj_(data_table_test:x86_64+0x10007d0b4)
2018-08-16 13:30:59.689002-0400 data_table_test[39047:1230815] pc_0x109a1aa0d###func_terrier::DataTableTests_SimpleInsertSelect_Test::TestBody()###file_logger_impl.h###line_117###obj_(data_table_test:x86_64+0x100004a0d)
2018-08-16 13:30:59.689023-0400 data_table_test[39047:1230815] pc_0x109aafcdd###func_void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)###file_gtest.cc###line_2402###obj_(data_table_test:x86_64+0x100099cdd)
2018-08-16 13:30:59.689039-0400 data_table_test[39047:1230815] pc_0x109aafbde###func_testing::Test::Run()###file_gtest.cc###line_2474###obj_(data_table_test:x86_64+0x100099bde)
2018-08-16 13:30:59.689054-0400 data_table_test[39047:1230815] pc_0x109ab0f7d###func_testing::TestInfo::Run()###file_gtest.cc###line_2656###obj_(data_table_test:x86_64+0x10009af7d)
2018-08-16 13:30:59.689068-0400 data_table_test[39047:1230815] pc_0x109ab1916###func_testing::TestCase::Run()###file_gtest.cc###line_2774###obj_(data_table_test:x86_64+0x10009b916)
2018-08-16 13:30:59.689082-0400 data_table_test[39047:1230815] pc_0x109aba0d6###func_testing::internal::UnitTestImpl::RunAllTests()###file_gtest.cc###line_4649###obj_(data_table_test:x86_64+0x1000a40d6)
2018-08-16 13:30:59.689097-0400 data_table_test[39047:1230815] pc_0x109ab9c2d###func_bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)###file_gtest.cc###line_2402###obj_(data_table_test:x86_64+0x1000a3c2d)
2018-08-16 13:30:59.689111-0400 data_table_test[39047:1230815] pc_0x109ab9b56###func_testing::UnitTest::Run()###file_gtest.cc###line_4257###obj_(data_table_test:x86_64+0x1000a3b56)
2018-08-16 13:30:59.689125-0400 data_table_test[39047:1230815] pc_0x109acc1a0###func_main###file_gtest.h###line_2233###obj_(data_table_test:x86_64+0x1000b61a0)
2018-08-16 13:30:59.689138-0400 data_table_test[39047:1230815] pc_0x7fff6a3c6014###func_start###file_<null>###line_-434211552###obj_(libdyld.dylib:x86_64+0x1014)
2018-08-16 13:30:59.689151-0400 data_table_test[39047:1230815]
2018-08-16 13:30:59.689162-0400 data_table_test[39047:1230815] Address 0x00010d6296c0 is located in stack of thread T0 at offset 64 in frame
2018-08-16 13:30:59.689176-0400 data_table_test[39047:1230815] pc_0x109ba5d7f###func_spdlog::details::full_formatter::format(spdlog::details::log_msg const&, tm const&, fmt::v5::basic_memory_buffer<char, 500ul, std::__1::allocator<char> >&)###file_pattern_formatter.h###line_464###obj_(libterrier_shared.dylib:x86_64+0x5ed7f)
2018-08-16 13:30:59.689208-0400 data_table_test[39047:1230815]
2018-08-16 13:30:59.689219-0400 data_table_test[39047:1230815] This frame has 1 object(s):
2018-08-16 13:30:59.689233-0400 data_table_test[39047:1230815] [32, 64) 'i.i' <== Memory access at offset 64 overflows this variable
2018-08-16 13:30:59.689245-0400 data_table_test[39047:1230815] HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
2018-08-16 13:30:59.689257-0400 data_table_test[39047:1230815] (longjmp and C++ exceptions *are* supported)
2018-08-16 13:30:59.689269-0400 data_table_test[39047:1230815] SUMMARY: AddressSanitizer: stack-buffer-overflow memory:3137 in spdlog::details::full_formatter::format(spdlog::details::log_msg const&, tm const&, fmt::v5::basic_memory_buffer<char, 500ul, std::__1::allocator<char> >&)
2018-08-16 13:30:59.689287-0400 data_table_test[39047:1230815] Shadow bytes around the buggy address:
2018-08-16 13:30:59.689300-0400 data_table_test[39047:1230815] 0x100021ac5280: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
2018-08-16 13:30:59.689313-0400 data_table_test[39047:1230815] 0x100021ac5290: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
2018-08-16 13:30:59.689326-0400 data_table_test[39047:1230815] 0x100021ac52a0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
2018-08-16 13:30:59.689338-0400 data_table_test[39047:1230815] 0x100021ac52b0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
2018-08-16 13:30:59.689349-0400 data_table_test[39047:1230815] 0x100021ac52c0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
2018-08-16 13:30:59.689361-0400 data_table_test[39047:1230815] =>0x100021ac52d0: f1 f1 f1 f1 00 00 00 00[f3]f3 f3 f3 00 00 00 00
2018-08-16 13:30:59.689373-0400 data_table_test[39047:1230815] 0x100021ac52e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2018-08-16 13:30:59.689386-0400 data_table_test[39047:1230815] 0x100021ac52f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2018-08-16 13:30:59.689398-0400 data_table_test[39047:1230815] 0x100021ac5300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2018-08-16 13:30:59.689410-0400 data_table_test[39047:1230815] 0x100021ac5310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2018-08-16 13:30:59.689422-0400 data_table_test[39047:1230815] 0x100021ac5320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2018-08-16 13:30:59.689435-0400 data_table_test[39047:1230815] Shadow byte legend (one shadow byte represents 8 application bytes):
2018-08-16 13:30:59.689447-0400 data_table_test[39047:1230815] Addressable: 00
2018-08-16 13:30:59.689458-0400 data_table_test[39047:1230815] Partially addressable: 01 02 03 04 05 06 07
2018-08-16 13:30:59.689470-0400 data_table_test[39047:1230815] Heap left redzone: fa
2018-08-16 13:30:59.689483-0400 data_table_test[39047:1230815] Freed heap region: fd
2018-08-16 13:30:59.689494-0400 data_table_test[39047:1230815] Stack left redzone: f1
2018-08-16 13:30:59.689504-0400 data_table_test[39047:1230815] Stack mid redzone: f2
2018-08-16 13:30:59.689514-0400 data_table_test[39047:1230815] Stack right redzone: f3
2018-08-16 13:30:59.689524-0400 data_table_test[39047:1230815] Stack after return: f5
2018-08-16 13:30:59.689533-0400 data_table_test[39047:1230815] Stack use after scope: f8
2018-08-16 13:30:59.689543-0400 data_table_test[39047:1230815] Global redzone: f9
2018-08-16 13:30:59.689553-0400 data_table_test[39047:1230815] Global init order: f6
2018-08-16 13:30:59.689563-0400 data_table_test[39047:1230815] Poisoned by user: f7
2018-08-16 13:30:59.689575-0400 data_table_test[39047:1230815] Container overflow: fc
2018-08-16 13:30:59.689587-0400 data_table_test[39047:1230815] Array cookie: ac
2018-08-16 13:30:59.689597-0400 data_table_test[39047:1230815] Intra object redzone: bb
2018-08-16 13:30:59.689608-0400 data_table_test[39047:1230815] ASan internal: fe
2018-08-16 13:30:59.689632-0400 data_table_test[39047:1230815] Left alloca redzone: ca
2018-08-16 13:30:59.689644-0400 data_table_test[39047:1230815] Right alloca redzone: cb
2018-08-16 13:30:59.689656-0400 data_table_test[39047:1230815]

Unable to pinpoint where the fault occurs from this output. Possibly
fmt_helper.h, around line 44, where it takes the timestamp and converts it into printable form.
format_int i is of size 32, but this is the object size. The output buffer is of size 500.

glossary and architectural design doc for the storage engine

It occurred to me during recent interactions that we are not particularly good with names. Some terms are also overloaded (e.g. offset) to the point of being meaningless. We should take a pass, when we have time, to standardize these names and write them down in a glossary. Several big offenders:

offsets. offset should refer to only the location of a tuple slot within a block. Use column id for columns, and projection_list_index for positions of attributes in a row
redo and undo. Maybe we should just rename them to "after_image" and "before_image" to avoid confusion.
DeltaRecord. Maybe should be "UndoRecord". I still think (@mbutrovich disagrees) that delta implies two way (old -> new), whereas what we have on the transaction undo buffer is just a bunch of before-images.

All of the above items are open to discussion.

Add GarbageCollector benchmark

Look at #149 to see how we measured the performance of the GarbageCollector, and then implement a Google Benchmark that reflects this.

check-clang-tidy, clang-tidy possibly broken

make check-clang-tidy is producing errors (e.g. /repo/src/include/common/concurrent_map.h:95:24: error: no member named 'emplace' in ...) without failing the build. It is producing an exit code of 0 (both inside and outside Docker images), which is why Travis doesn't fail it.

There is a related issue here, but attempts at defining TBB_USE_GLIBCXX_VERSION and __TBB_CPP11_VARIADIC_TEMPLATES_PRESENT have not solved the issue. The gist is that it can't detect support for C++11 features.

Garbage Collection

A simple garbage collector that runs on its own thread and consumes the out-of-scope transaction contexts from the transaction manager should suffice at this point. Correctness first, optimization later.
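
As a rough illustration of that design, here is a minimal sketch. Every name in it (TransactionContext, TransactionManager::Retire, DrainCompleted, GarbageCollector::RunOnce) is hypothetical and not taken from the codebase, and it assumes the caller can supply the begin timestamp of the oldest running transaction. A context is freed only once it committed before that timestamp, so no running transaction can still need it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

// All names below are hypothetical stand-ins, not the actual Terrier classes.
using timestamp_t = uint64_t;

struct TransactionContext {
  timestamp_t begin_ts;
  timestamp_t commit_ts;
};

class TransactionManager {
 public:
  // Called by a worker thread once a transaction has finished.
  void Retire(std::unique_ptr<TransactionContext> txn) {
    std::lock_guard<std::mutex> guard(latch_);
    completed_.push_back(std::move(txn));
  }

  // Hands every finished context to the caller and empties the internal queue.
  std::deque<std::unique_ptr<TransactionContext>> DrainCompleted() {
    std::lock_guard<std::mutex> guard(latch_);
    std::deque<std::unique_ptr<TransactionContext>> drained;
    drained.swap(completed_);
    return drained;
  }

 private:
  std::mutex latch_;
  std::deque<std::unique_ptr<TransactionContext>> completed_;
};

class GarbageCollector {
 public:
  explicit GarbageCollector(TransactionManager *txn_manager) : txn_manager_(txn_manager) {
    assert(txn_manager_ != nullptr);
  }

  // One collection pass; the GC thread would call this in a loop.
  // Returns the number of contexts freed in this pass.
  size_t RunOnce(timestamp_t oldest_active_ts) {
    for (auto &txn : txn_manager_->DrainCompleted()) pending_.push_back(std::move(txn));
    size_t freed = 0;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if ((*it)->commit_ts < oldest_active_ts) {
        it = pending_.erase(it);  // destroying the unique_ptr frees the context
        freed++;
      } else {
        ++it;
      }
    }
    return freed;
  }

  size_t PendingCount() const { return pending_.size(); }

 private:
  TransactionManager *txn_manager_;
  std::deque<std::unique_ptr<TransactionContext>> pending_;
};
```

A dedicated thread would call RunOnce repeatedly with a short sleep between passes. Only the hand-off queue is shared, so the latch is held just long enough to swap it out.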

Bringing in Index

We want to start bringing in the bw_tree as our index. This will entail:

adding bwtree.h/.cpp to the third_party folder
modifying the wrapper bwtree_index.h so that it no longer inherits from index.h, plus any other adjustments we need to make it work with Terrier
discussing any other dependencies you find before bringing them in

Read-only Transactions don't need commit timestamps

Commit timestamps for read-only transactions don't really exist. We can easily identify a read-only transaction in the commit method by checking whether its undo buffer is empty. We then get to skip most of the commit and logging logic: just remove the transaction from the list of running transactions and hand it to the garbage collector, returning its begin timestamp as the commit timestamp. (They still need a timestamp if we are running Serializable.)
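
The fast path could look roughly like the sketch below. The types and method names are hypothetical, the running-transaction list and garbage collector hand-off are reduced to a comment, and the Serializable case is not handled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; not the actual Terrier types.
using timestamp_t = uint64_t;

struct UndoRecord {};

struct TransactionContext {
  timestamp_t begin_ts;
  std::vector<UndoRecord> undo_buffer;
  // A transaction that never wrote anything has an empty undo buffer.
  bool IsReadOnly() const { return undo_buffer.empty(); }
};

class TransactionManager {
 public:
  TransactionContext Begin() { return TransactionContext{time_++, {}}; }

  timestamp_t Commit(TransactionContext *txn) {
    assert(txn != nullptr);
    if (txn->IsReadOnly()) {
      // Fast path: nothing to log and no undo records to stamp, so the global
      // counter is left untouched and the begin timestamp is returned instead.
      // (Removal from the running list and hand-off to the GC are elided.)
      return txn->begin_ts;
    }
    // Slow path: take a fresh commit timestamp (logging and stamping elided).
    return time_++;
  }

  timestamp_t CurrentTime() const { return time_.load(); }

 private:
  std::atomic<timestamp_t> time_{0};
};
```

The saving comes from not touching the shared timestamp counter or the log on the read-only path, which is the part a benchmark would need to quantify.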

Support the theoretical speedup with benchmark numbers.

Compact logical delete column

Right now the logical delete column is an 8-byte column with nothing in the tuple, and we use its null bitmap to denote logical deletion. (Always the second column)

This is because we used the first column (version vector)'s bitmap to denote whether a slot is taken or not. It is probably better to push that bitmap (slot validity) into the block header, and use the version vector's bitmap for logical delete.
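
To make the proposed layout concrete, the sketch below models the two bitmaps with std::bitset. The class, the slot count, and the bit polarity (a set bit in the version vector's bitmap means "not logically deleted") are assumptions made for illustration; the issue does not specify them.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical block size, chosen only for illustration.
constexpr size_t kSlotsPerBlock = 64;

enum class SlotState { kFree, kLive, kLogicallyDeleted };

class BlockBitmaps {
 public:
  // Claims a slot; returns false if it is already taken.
  bool Allocate(size_t slot) {
    assert(slot < kSlotsPerBlock);
    if (slot_taken_.test(slot)) return false;
    slot_taken_.set(slot);
    version_present_.set(slot);
    return true;
  }

  // Logical delete: the slot stays allocated, only the presence bit is cleared.
  void LogicalDelete(size_t slot) {
    assert(slot < kSlotsPerBlock && slot_taken_.test(slot));
    version_present_.reset(slot);
  }

  // Physically releases the slot for reuse.
  void Free(size_t slot) {
    assert(slot < kSlotsPerBlock);
    slot_taken_.reset(slot);
    version_present_.reset(slot);
  }

  SlotState State(size_t slot) const {
    assert(slot < kSlotsPerBlock);
    if (!slot_taken_.test(slot)) return SlotState::kFree;
    return version_present_.test(slot) ? SlotState::kLive : SlotState::kLogicallyDeleted;
  }

 private:
  std::bitset<kSlotsPerBlock> slot_taken_;       // would live in the block header
  std::bitset<kSlotsPerBlock> version_present_;  // the version vector column's null bitmap
};
```

With this split, the dedicated 8-byte logical delete column is no longer needed, since the three slot states are fully described by the two bitmaps.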

CI Improvements

The plan for our CI setup is:
Travis:

Jenkins:

DEBUG: make unittest (-DTERRIER_USE_ASAN=ON)
DEBUG: make unittest (-DTERRIER_GENERATE_COVERAGE=ON)
RELEASE: make unittest (-DBUILD_WARNING_LEVEL=Production)
RELEASE: make runbenchmark (-DBUILD_WARNING_LEVEL=Production)
performance and memory
coveralls

TODO (me): quick script to fail build if coverage drops by 5%

Coveralls is wonky

e.g. Although it runs bitmap_test when generating coverage files, it does not pick up the calls to Flip in bitmap.h.

We currently suspect optimizations like inlining to be the issue, though we're already running it in debug (O0, fno-inline). We will need to dig more to find out.

This appears to be an issue with gcov (see .gcov files generated with make coveralls_generate). Possibly related to optimizations, inlining, and/or templating.

Tried:

-fno-inline -fno-inline-small-functions -fno-default-inline
-fno-elide-constructors

Can't quite get it working for now.

Debug logging

I've looked at what we've added into the repo for debug logging. The recommendation of how we expect developers to use the logging features was not at all clear. Like everyone else, I can read code and figure this out, but was looking for the "new developers guide to logging". We don't have that yet.

It raises the question of whether we even have agreement and understanding of what we need. Possibly what we've added is not the right solution.

So, let's discuss requirements and use cases.

The style of debug logging we have in Peloton, is widely used. Many implementations pair the LOG calls with multiple back end implementations, so you can log to stdout, files or network sockets. We use it only for stdout, so it is a more structured way of doing printfs. So, what is wrong with what we have in Peloton?

We have not applied consistency to what we log and when we log it.
At LOG_INFO, we don't log much, but some of what is logged is not useful.
At LOG_DEBUG it is a free for all. Some log points provide useful information, others don't. Often the log information is only useful to the developer who implemented it or needs additional knowledge of the code path to make any sense of the output.
At LOG_TRACE, the volume of output is so high, it is IMO practically useless. LOG_TRACE is inconsistently used to trace function calls. There are better ways of doing this.
There is no way to reduce the noise level and zero in on the component you are developing or debugging. You get all LOG_DEBUG messages or none.

Questions

What do we want to get from debug logging?
What set of features do we want?
What controls / knobs do we want?

Opinions

Debug logging as a structured way of printing information is OK.
Having multiple back ends would be nice, but is not required. Just printing is usable.
We need guidelines on what is appropriate in the different log levels.
We need guidelines on what to put in log messages, and enforce this in code reviews.
In addition to the log level to be output, we need to be able to control logging "per module". This could be:
Per "module", e.g. per namespace
Per module, where it is some other module notion that we encode
Per file.
Compile-time selection of what is logged is usable. Run-time enable / disable of module logging is desired, possibly required.
Control of logging via API, at run time, would be very useful. When debugging, this would allow logging to turn on only when some interesting condition or state is entered.
Control of logging via GDB is useful. Control via some admin user interface is better, but is unlikely in the near future.
You should be able to completely eliminate overhead when not using it. The Peloton logging does this by providing an empty macro body when you turn off a log level.
GCC can compile code that includes hooks for function entry / exit, according to the documentation. Function tracing should use this facility as a compile variant, rather than us manually inserting function entry / exit LOG points.
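
For reference, the GCC facility mentioned in the last point is the -finstrument-functions flag, which makes the compiler call two user-supplied hooks around every instrumented function. The sketch below shows one possible pair of hooks; the indentation counter and the output format are illustrative choices, not something the project has adopted.

```cpp
#include <cassert>
#include <cstdio>

// Current call depth, used only to indent the trace. thread_local keeps
// concurrent threads from corrupting each other's depth.
static thread_local int g_call_depth = 0;

extern "C" {
// GCC (and Clang) emit calls to these two hooks when a translation unit is
// compiled with -finstrument-functions. The attribute prevents the hooks
// themselves from being instrumented, which would recurse forever.
__attribute__((no_instrument_function)) void __cyg_profile_func_enter(void *this_fn,
                                                                      void *call_site) {
  std::fprintf(stderr, "%*senter %p (called from %p)\n", g_call_depth * 2, "", this_fn,
               call_site);
  g_call_depth++;
}

__attribute__((no_instrument_function)) void __cyg_profile_func_exit(void *this_fn,
                                                                     void *call_site) {
  assert(g_call_depth > 0);
  g_call_depth--;
  std::fprintf(stderr, "%*sexit  %p (called from %p)\n", g_call_depth * 2, "", this_fn,
               call_site);
}
}
```

The hooks receive raw addresses, so the trace has to be symbolized afterwards, for example with addr2line. Without the compiler flag the hooks are simply never called, which gives the zero-overhead default asked for above.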

I was hoping that easylogging would provide a suitably lightweight framework, which could be wrapped to provide a small, controlled subset that fit our needs. I can't tell if it does or not.

Alternatively, if we were rolling our own, a possible approach is as follows. It is very basic and very minimal, and just slightly extends what we've been using.

Outline

Start with the Peloton logging macros, i.e. keep the notion of:
1. LOG_INFO
2. LOG_DEBUG
  levels etc.
Document what is appropriate for each level.
Document and enforce what goes into log messages. The purpose here is to avoid useless log points being inserted.
Extend the macros to include a module. For the moment let's assume modules map roughly to a namespace, but possibly finer granularity, defined by us.
A log point might look like:
LOG_DEBUG(MODULE_NETWORK_MARSHALLING, "output binary %d", var);
If we could live with 64 modules max, the modules could be encoded as bits, e.g.
#define MODULE_NETWORK_MARSHALLING 0x1
#define MODULE_NETWORK_PARSING 0x2
and so on
If we want more than 64 modules, then you do a lookup rather than a bit mask, so it is marginally more complex.
Log macros would then be roughly:
if (log_enabled_var & MODULE_XXX) then LOG_DEBUG_NO_MODULE(string, varargs)
Couple with some additional macros and / or APIs to control logging at runtime.
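
The outline above could be sketched as follows. This is only an illustration under stated assumptions: the two module names come from the example above, while g_log_enabled_modules, EnableLogging, DisableLogging, and LOG_DEBUG_DISABLED are names invented here, and output goes to stderr via fprintf rather than through any particular logging library.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdio>

// One bit per module, so a uint64_t mask supports at most 64 modules.
#define MODULE_NETWORK_MARSHALLING (UINT64_C(1) << 0)
#define MODULE_NETWORK_PARSING (UINT64_C(1) << 1)

// Run-time switch: a module logs only while its bit is set (hypothetical name).
static std::atomic<uint64_t> g_log_enabled_modules{0};

inline void EnableLogging(uint64_t modules) {
  assert(modules != 0 && "expected at least one module bit");
  g_log_enabled_modules.fetch_or(modules);
}

inline void DisableLogging(uint64_t modules) {
  assert(modules != 0 && "expected at least one module bit");
  g_log_enabled_modules.fetch_and(~modules);
}

#ifdef LOG_DEBUG_DISABLED
// Compile-time switch: the log point disappears entirely, arguments included.
#define LOG_DEBUG(module, ...) ((void)0)
#else
#define LOG_DEBUG(module, ...)                                              \
  do {                                                                      \
    if (g_log_enabled_modules.load(std::memory_order_relaxed) & (module)) { \
      std::fputs("[DEBUG] ", stderr);                                       \
      std::fprintf(stderr, __VA_ARGS__);                                    \
      std::fputc('\n', stderr);                                             \
    }                                                                       \
  } while (0)
#endif
```

A log point then reads LOG_DEBUG(MODULE_NETWORK_MARSHALLING, "output binary %d", var). Because the arguments sit inside the if, a disabled module costs one relaxed load and a branch, and the format arguments are never evaluated.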

Now, this is the familiar approach. The disadvantage is that printing to stdout is not cheap. It slows down the system significantly and therefore changes timing. Problems that occur at high load when running at high performance sometimes (frequently?) can't be debugged this way.

Therefore it may be necessary, at some point, to supplement debug logging with "high performance tracing". It may be a long way in the future. I note it here just to raise awareness that debug logging is not the end of the story.
This is an old technique for debugging device drivers, kernel modules, etc. The outline is:

Create a "large" memory buffer. Used as a circular buffer.
Insert log points that insert items into the memory buffer. Typically binary information only. Log point identifier, values to save for the log point.
When needed, after some event of interest, programmatically dump the buffer.
Use associated utility to decode the contents of the buffer.
This is of course, extremely fast but not so easy to use.
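
A minimal sketch of such a trace buffer is shown below. The record layout (a log point identifier plus one 64-bit value) and all names are assumptions for illustration. It also simplifies concurrency: writers claim slots with an atomic counter, but two writers that are a full buffer length apart could still touch the same slot, and Dump is meant to be called only when writers are quiet.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical binary record: which log point fired and one value it saved.
struct TraceRecord {
  uint32_t point_id;
  uint64_t value;
};

template <size_t N>
class TraceBuffer {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  // Log point: claim the next sequence number and overwrite the oldest slot.
  void Record(uint32_t point_id, uint64_t value) {
    const uint64_t seq = next_.fetch_add(1, std::memory_order_relaxed);
    records_[seq & (N - 1)] = TraceRecord{point_id, value};
  }

  // Returns the surviving records, oldest first, for offline decoding.
  std::vector<TraceRecord> Dump() const {
    const uint64_t end = next_.load();
    const uint64_t begin = end > N ? end - N : 0;
    assert(begin <= end);
    std::vector<TraceRecord> out;
    for (uint64_t seq = begin; seq < end; seq++) out.push_back(records_[seq & (N - 1)]);
    return out;
  }

 private:
  std::array<TraceRecord, N> records_{};
  std::atomic<uint64_t> next_{0};
};
```

Recording is one atomic increment and one small store, with no formatting and no I/O, which is why this style disturbs timing far less than printing. The price is the separate decoding step mentioned above.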

Back to debug logging. Comments?

tuple_access_strategy_test times out on Ubuntu 18.04 Docker

(Docker setup guide available on our wiki)

When running make test on both my local machine and Travis, the Ubuntu 18.04 Docker image gets stuck on tuple_access_strategy_test. I have let it run locally for over an hour.

tuple_access_strategy_test however completes in ~70 seconds on OSX.

jemalloc and ASAN / Valgrind don't get along

ASAN and Valgrind currently do not catch any memory issues in tests.

As the title suggests, linking in jemalloc results in some wonky issues where ASAN and Valgrind stop being able to detect leaks. The Internet suggests that either jemalloc doesn't expose the necessary interface or the dynamic linking interferes with the instrumentation of the two tools. A quick search did not turn up any widely accepted explanation or fix.

See #56 for more information.

Writeup comments for DataTable tests

Forgot to do that in #72

DataTable consistency checker

We enforce a lot of storage invariants with TERRIER_ASSERTs at runtime throughout the codebase, but it would be helpful to have a function that we can invoke at any time in tests that, given a DataTable pointer, can verify the consistency of the table and its version chains (if they exist).

Some example invariants regarding ordering of version chain:

Uncommitted UndoRecords should not appear after Committed UndoRecords
If an Insert UndoRecord is present, it should be the last element in the version chain
If a Delete UndoRecord is present, it should be the first element in the version chain
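
The three ordering invariants above can be checked in a single pass over a version chain. The sketch below works on a simplified, hypothetical model (a vector of records ordered from the head of the chain to its tail, each carrying only a type and a committed flag) rather than on the real DataTable structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical model of an undo record.
enum class UndoType { kInsert, kUpdate, kDelete };

struct UndoRecord {
  UndoType type;
  bool committed;
};

// `chain` is ordered from the head (newest record) to the tail (oldest).
// Returns an empty string if all invariants hold, otherwise a description
// of the first violation found.
std::string CheckVersionChain(const std::vector<UndoRecord> &chain) {
  bool seen_committed = false;
  for (size_t i = 0; i < chain.size(); i++) {
    const UndoRecord &rec = chain[i];
    if (!rec.committed && seen_committed) {
      return "uncommitted record appears after a committed record";
    }
    seen_committed = seen_committed || rec.committed;
    if (rec.type == UndoType::kInsert && i + 1 != chain.size()) {
      return "insert record is not the last element";
    }
    if (rec.type == UndoType::kDelete && i != 0) {
      return "delete record is not the first element";
    }
  }
  return "";
}
```

Returning a message instead of asserting lets a test report which invariant failed. A real checker would walk the version pointers of every slot in the table and apply a check like this to each chain.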

I think we're reaching a point of stability on the DataTable API and internal BlockLayout (after #174 goes in) that we could start implementing this. When it's done, you'll have a good understanding of how the storage layer is organized. We can discuss more invariants as we think of them.

If this class is written in test/include/util/storage_test_util.h it should be made a friend of the DataTable class so it can access the private fields.

Package cleanup

The minimum dependencies appear to be:

clang-format-6.0
clang-tidy-6.0
cmake
git
g++-7
libjemalloc-dev
libjsoncpp-dev
libtbb-dev
libz-dev
llvm-6.0

TODO:

Rewrite packages.sh
Rewrite as brew file
Rewrite Dockerfile (need to manually set /usr/bin/c++)

make lint issues in codebase

We'll need to resolve these before make lint can be a gating check on Travis.

/terrier/src/common/logger.cpp:103:  Namespace should be terminated with "// namespace terrier"  [readability/namespace] [5]
/terrier/src/include/common/statistics.h:32:  You don't need a ; after a }  [readability/braces] [4]
Done processing /terrier/src/common/logger.cpp
/terrier/src/common/statistics.cpp:19:  Missing space before {  [whitespace/braces] [5]
/terrier/src/common/statistics.cpp:19:  Extra space before ( in function call  [whitespace/parens] [4]
Done processing /terrier/src/include/common/statistics.h
Done processing /terrier/src/common/statistics.cpp
/terrier/src/include/common/concurrent_bitmap.h:53:  Using C-style cast.  Use reinterpret_cast<uint8_t *>(...) instead  [readability/casting] [4]
/terrier/src/include/common/typedefs.h:90:  You don't need a ; after a }  [readability/braces] [4]
Done processing /terrier/src/include/common/concurrent_bitmap.h
/terrier/src/include/common/typedefs.h:73:  Add #include <utility> for move  [build/include_what_you_use] [4]
/terrier/src/include/common/typedefs.h:155:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
Done processing /terrier/src/include/common/typedefs.h
/terrier/src/include/storage/storage_defs.h:94:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
Done processing /terrier/src/include/storage/storage_defs.h
/terrier/src/include/common/concurrent_map.h:95:  Add #include <utility> for make_pair  [build/include_what_you_use] [4]
/terrier/src/include/common/concurrent_map.h:144:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
Done processing /terrier/src/include/common/concurrent_map.h
/terrier/src/include/storage/tuple_access_strategy.h:69:  Do not use unnamed namespaces in header files.  See https://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Namespaces for more information.  [build/namespaces] [4]
/terrier/src/include/common/json_serializable.h:24:  You don't need a ; after a }  [readability/braces] [4]
Done processing /terrier/src/include/common/json_serializable.h
/terrier/src/include/storage/tuple_access_strategy.h:186:  Add #include <utility> for move  [build/include_what_you_use] [4]
Done processing /terrier/src/include/storage/tuple_access_strategy.h
/terrier/src/include/common/logger.h:35:  Found C++ system header after other header. Should be: logger.h, c system, c++ system, other.  [build/include_order] [4]
/terrier/src/include/common/logger.h:36:  Found C++ system header after other header. Should be: logger.h, c system, c++ system, other.  [build/include_order] [4]
/terrier/src/include/common/logger.h:58:  Missing username in TODO; it should look like "// TODO(my_username): Stuff."  [readability/todo] [2]
/terrier/src/include/common/logger.h:59:  Should have a space between // and comment  [whitespace/comments] [4]
/terrier/src/include/util/string_util.h:24:  For a static/global string constant, use a C style string instead: "static const char GETINFO_SPACER[]".  [runtime/string] [4]
/terrier/src/include/util/string_util.h:26:  For a static/global string constant, use a C style string instead: "static const char GETINFO_DOUBLE_STAR[]".  [runtime/string] [4]
/terrier/src/include/util/string_util.h:28:  For a static/global string constant, use a C style string instead: "static const char GETINFO_LONG_ARROW[]".  [runtime/string] [4]
/terrier/src/include/util/string_util.h:30:  For a static/global string constant, use a C style string instead: "static const char GETINFO_SINGLE_LINE[]".  [runtime/string] [4]
/terrier/src/include/util/string_util.h:32:  For a static/global string constant, use a C style string instead: "static const char GETINFO_THICK_LINE[]".  [runtime/string] [4]
/terrier/src/include/util/string_util.h:34:  For a static/global string constant, use a C style string instead: "static const char GETINFO_HALF_THICK_LINE[]".  [runtime/string] [4]
Done processing /terrier/src/include/common/logger.h
Done processing /terrier/src/include/util/string_util.h
/terrier/src/storage/tuple_access_strategy.cpp:37:  Namespace should be terminated with "// namespace storage"  [readability/namespace] [5]
/terrier/src/storage/tuple_access_strategy.cpp:38:  Namespace should be terminated with "// namespace terrier"  [readability/namespace] [5]
Done processing /terrier/src/storage/tuple_access_strategy.cpp
/terrier/src/include/common/macros.h:168:  Extra space before [  [whitespace/braces] [5]
/terrier/src/include/common/macros.h:170:  Extra space before [  [whitespace/braces] [5]
/terrier/src/include/common/macros.h:183:  Could not find a newline character at the end of the file.  [whitespace/ending_newline] [5]
Done processing /terrier/src/include/common/macros.h
/terrier/src/include/common/object_pool.h:4:  Include the directory when naming .h files  [build/include] [4]
/terrier/src/util/string_util.cpp:151:  Using C-style cast.  Use static_cast<int>(...) instead  [readability/casting] [4]
/terrier/src/util/string_util.cpp:203:  Namespace should be terminated with "// namespace terrier"  [readability/namespace] [5]
/terrier/src/util/string_util.cpp:174:  Add #include <vector> for vector<>  [build/include_what_you_use] [4]
Done processing /terrier/src/util/string_util.cpp
Total errors found: 18
/terrier/src/include/common/object_pool.h:86:  Add #include <utility> for move  [build/include_what_you_use] [4]
Done processing /terrier/src/include/common/object_pool.h
Total errors found: 16

Set up Jenkins

Set up Jenkins for Terrier.
As stated in Peloton PR #1458, re-purpose the MemSQL machines. These can be the starting point for systems used for Terrier. Machines currently used for Peloton can be migrated over to Terrier as activity increases.

Ubuntu 18.04 only.

C++ 17 (= gcc7, clang 6)
LLVM 6.0
cmake 3.2

At the moment the repo has just a skeleton setup in it. Matt Butrovich has been setting up the CMake components.

There are no code review requirements, so feel free to commit a Jenkinsfile and update it as needed.

Move Constants out of common_defs

Create a new constants.h

Design adjustments from Peloton

A number of things were discovered during the work on codegen support for index scans that have not been merged into Peloton. These will need to be addressed correctly in Terrier. This issue describes them so that new designs can address them or, if code is ported from Peloton, so that they can be corrected in the new context.

Contents

Text vs. binary marshaling for network communication.
Plan object contents should not store execution state
Run time query state
PerformBinding issues

Descriptions

Text vs. binary marshaling.
The Postgres protocol JDBC driver performs some optimizations at run time. These are not immediately obvious from using JDBC and must be handled correctly in the implementation.

See https://jdbc.postgresql.org/documentation/head/server-prepare.html
To summarize the information:

The driver will use server side prepared statements after a configurable number of calls to the prepared statement. The transition is controlled by the prepareThreshold connection property. The default value is 5.
When the threshold is crossed, the driver switches from text marshaling of responses to binary marshaling of responses.

There should be a single instance of the marshaling code, in the network communication layer. In Peloton, it is down in the executor layer, with a second instance being required in the codegen execution pathway.

Further, this behavior requires that unit tests exercise both text and binary marshaling.
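
As a small illustration of a single marshaling routine that serves both formats, the sketch below encodes a 32-bit integer. The function and enum names are hypothetical. The encodings themselves follow the Postgres wire protocol, where format code 0 means text and 1 means binary, and a binary int4 is four bytes in network (big-endian) byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Postgres format codes: 0 = text, 1 = binary.
enum class WireFormat : int16_t { kText = 0, kBinary = 1 };

// Hypothetical helper living in the network layer: the only place that
// knows how an int4 looks on the wire in either format.
std::string MarshalInt32(int32_t value, WireFormat format) {
  if (format == WireFormat::kText) {
    return std::to_string(value);  // decimal digits, e.g. "258"
  }
  // Binary: four bytes, most significant byte first.
  const uint32_t bits = static_cast<uint32_t>(value);
  std::string out(4, '\0');
  out[0] = static_cast<char>((bits >> 24) & 0xFF);
  out[1] = static_cast<char>((bits >> 16) & 0xFF);
  out[2] = static_cast<char>((bits >> 8) & 0xFF);
  out[3] = static_cast<char>(bits & 0xFF);
  return out;
}
```

Because the format is a parameter, a unit test can run the same value through both paths, which covers the switch the driver makes once prepareThreshold is crossed.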

Plan objects
Plan instances should contain only invariant information. No per invocation state should be stored in the plan. For example:

INSERT into tbl VALUES (1); The value inserted is a constant and may be stored in the plan.
INSERT into tbl VALUES (?); This form is used by prepared statements. The plan is re-used with differing values, which should not be stored in the plan instance.

Run time query state
Peloton's ExecutorContext stores any invocation specific state.

To summarize this and the prior item:

Invariant information can be part of the plan object
Any information that is specific to an execution instance of a query, must be in the ExecutorContext.
Any code imported from Peloton may need review and fixes for this issue.
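
The split between invariant plan contents and per-execution state could be sketched as follows. `InsertPlan` and `ExecutionContext` here are hypothetical stand-ins, not Peloton's classes, and a single 64-bit integer value is assumed to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical types for illustration only.
// Per-invocation state: the parameter values bound for one execution.
struct ExecutionContext {
  std::vector<int64_t> params;
};

// Invariant plan: holds either a constant known at planning time, or the
// index of a parameter that is looked up in the context at execution time.
class InsertPlan {
 public:
  static InsertPlan FromConstant(int64_t value) { return InsertPlan(value, 0); }
  static InsertPlan FromParameter(std::size_t index) { return InsertPlan(std::nullopt, index); }

  // const: executing never mutates the plan, so one plan can be reused
  // across executions with different parameter values.
  int64_t ValueToInsert(const ExecutionContext &ctx) const {
    return constant_.has_value() ? *constant_ : ctx.params.at(param_index_);
  }

 private:
  InsertPlan(std::optional<int64_t> constant, std::size_t index)
      : constant_(constant), param_index_(index) {}

  std::optional<int64_t> constant_;
  std::size_t param_index_;
};
```

`INSERT into tbl VALUES (1)` would map to `FromConstant(1)`, while `INSERT into tbl VALUES (?)` would map to `FromParameter(0)` with the actual value supplied through the context on each execution.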

PerformBinding
PerformBinding is called on the plan tree, prior to execution of the plan. Amongst other things, this sets up information required for lower level plan nodes to communicate their output to higher plan nodes. For example, in the codegen path in Peloton, the ais (i.e. attribute names etc.) need to be initialized. These identify the columns to be passed.

The "PerformBinding" process in Peloton still has bugs in the codegen path when dealing with server-side prepared statements and with non-trivial plan trees. PerformBinding suffers from:

Appending to initialized state, when the state has already been set once. This results in failure. See Peloton issue: https://github.com/cmu-db/peloton/issues/1394
Omitting initialization in non-trivial plan trees.

In summary:

The PerformBinding calls need review, if ported
Clean separation between invariant state and run time state, as noted in the earlier points, should be implemented and would be part of the solution.
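
One way to avoid the "append to already initialized state" failure is to have binding produce a fresh result on each call rather than mutate the plan. The sketch below assumes this approach; `ScanPlan`, `Binding`, and this `PerformBinding` signature are hypothetical and much simpler than Peloton's real code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; Peloton's real PerformBinding has a different shape.
struct ScanPlan {
  std::vector<std::string> output_columns;  // invariant, fixed at planning time
};

struct Binding {
  std::vector<std::string> attributes;  // rebuilt for every execution
};

// Builds a fresh Binding on each call instead of appending to state stored
// in the plan, so calling it once per prepared-statement execution is safe.
Binding PerformBinding(const ScanPlan &plan) {
  Binding binding;
  binding.attributes = plan.output_columns;
  return binding;
}
```

Taking the plan by const reference makes the invariant/run-time separation visible in the signature: binding cannot leave anything behind in the plan.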

namespace fixes

Everything should be in the namespace of its subdir, including common. This does not need to extend to sub-sub-dirs.
May need to write a Python script for checking namespaces.

Cleanup week!

It's been a month of rapid engineering by a very small team, and now we need a few days to reflect on what we've accomplished and get the codebase ready for a lot more people to work on it. To that end, we're going to complete the following next week:

Performance counters and stats

Migrated from earlier discussions. See #30.

Replicated below (from @pervazea ):

I don't think the design in this PR is suitable for building upon. In other words, I'm saying we should not merge this PR. So, rather than comment on the implementation, I'm going to step back to design, requirements and use cases. I don't have answers for everything.

Questions

Do we want statistics only in debug mode or also in release mode? IMO even in release mode you need stats for monitoring and understanding what the system is doing. Depending on the amount of overhead in debug mode, the behavior can be quite a bit different in release mode. Further, it would be possible and reasonable to feed stats into the self-learning portion, if we so chose.
How are stats collected, reported, used? IMO the current design is lacking in this area.
How are stats aggregated?
Do we need to clear stats? Normally yes, you'd want a method to clear stats. You can cheat and build in something to collect and diff stats. Not as nice though.
Do we want only counters, or also support for time durations? One could usefully start with just the former.

Requirements

Cost of keeping stats should be minimized.
If stats are "disabled" or compiled out, the resulting overhead on the system should be zero.
Stats should be available in human friendly form as well as in API / machine consumable form.
A mechanism should be provided for determining available stats.
A mechanism should be provided for dumping selected stats.
It is preferred that stats aggregation is part of the design and built into the system rather than bolted on externally.

Examples of stats usage

Codegen cache. Stats kept for this might include:
1. No. of inserts into cache
2. No. of deletions from the cache
3. Lookup in cache
4. Found in cache (so one can determine cache hit rate)

There would (I assume) be a single instance of the codegen cache class, so no stats aggregation would be needed.

Network module stats. First, my assumption would be that the system has multiple network threads, so aggregation of stats would be needed. Each network thread or network class instance might keep:
1. No. of requests received
2. No. of requests completed
3. No. of JDBC requests
4. No. of prepared statements handled
5. Text mode / marshaling request
6. Binary mode / marshaling request

A possible scenario might be that one has a test that uses prepared statements:

Run the test
Query for and dump out aggregate network stats. These display that we successfully execute the expected number of JDBC, prepared statement requests. All of them text mode.
Clear the stats.
Modify the test and increase the number of prepared statement invocations. Run the test.
Dump network aggregate stats again. Expected result would be that now you see some number of text mode requests and some number of binary mode requests.

Design outline
How stats might be implemented. Not necessarily complete or optimal.

Declaring and keeping stats

Stats variables are defined in the class declaration. (Omit discussion of public, private, friend etc). These should be separated from the other declarations, so it is clear what variables are stats versus what is internal state. Examples would be needed on how declarations should look.
Stats are maintained by the class methods. Increment might be simply, var++. Therefore if you have multiple instances of a class, they each keep their own stats.
Class constructor
1. Would need to initialize stats (e.g. to zero)
2. Would need to register stats with "stats registry". See later.
Class destructor
1. Would need to synchronize with stats collection, prior to completing destructor. Details to be fleshed out.
We'd need to group stats logically in some way. Could be:
1. stats per class, e.g. codegen_cache_stats, network_stats, worker_stats etc.
2. Could have some notion of module. This adds some layering / complexity.

Stats collection

There needs to be a way of getting stats. If there is a central registry of stats, stats collection could iterate or lookup the registered stats and retrieve the desired set. Where multiple instances exist, collection would aggregate the stats for you.
Stats collection needs to be made thread safe. Need to ensure that stats are not deallocated while the collector is trying to read them, therefore synchronization is needed between class destructors and stats collection. Possibly sync on class constructors too.
The goal is that keeping stats should have absolutely minimal overhead, so you can afford to have stats be always on. Collecting stats can be more expensive. Collection only done on demand, so it is OK if this is a bit expensive. It is OK if synchronization is needed here, as long as it doesn't affect stats keeping in normal operation.

Stats registry
Haven't thought through all the details...

You need some way to ask, what stats does the system have?
Retrieve and give me back a set of stats of type "x".
Upon creating a class instance, it would register the stats it keeps. Maybe register the type and address of its stats.
When a class is destroyed, it would de-register its stats structure.
This might be a suitable map structure...
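
A rough sketch of the registry idea for the network stats example, under several assumptions: all names (`NetworkStats`, `NetworkStatsRegistry`, `NetworkWorker`) are hypothetical, only two counters are kept, and relaxed atomics are used for the counters so that a collector on another thread can read them. The registry mutex is taken only on register, deregister, and collect, never on the increment path.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical names throughout; this is a design sketch, not Terrier code.

// Plain copy of the counters handed back to whoever collects stats.
struct NetworkStatsSnapshot {
  uint64_t requests_received = 0;
  uint64_t requests_completed = 0;
};

// Live counters owned by one instance. Relaxed atomics keep the increment
// cheap while still letting a collector read them from another thread.
struct NetworkStats {
  std::atomic<uint64_t> requests_received{0};
  std::atomic<uint64_t> requests_completed{0};
};

class NetworkStatsRegistry {
 public:
  void Register(const NetworkStats *stats) {
    std::lock_guard<std::mutex> guard(mutex_);
    live_.insert(stats);
  }
  // Taking the mutex here means a destructor cannot finish while
  // Aggregate() is still reading the stats being removed.
  void Deregister(const NetworkStats *stats) {
    std::lock_guard<std::mutex> guard(mutex_);
    live_.erase(stats);
  }
  // Sums the counters of every currently registered instance.
  NetworkStatsSnapshot Aggregate() {
    std::lock_guard<std::mutex> guard(mutex_);
    NetworkStatsSnapshot total;
    for (const NetworkStats *stats : live_) {
      total.requests_received += stats->requests_received.load(std::memory_order_relaxed);
      total.requests_completed += stats->requests_completed.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  std::mutex mutex_;
  std::set<const NetworkStats *> live_;
};

class NetworkWorker {
 public:
  explicit NetworkWorker(NetworkStatsRegistry *registry) : registry_(registry) {
    registry_->Register(&stats_);
  }
  ~NetworkWorker() { registry_->Deregister(&stats_); }
  NetworkWorker(const NetworkWorker &) = delete;
  NetworkWorker &operator=(const NetworkWorker &) = delete;

  void HandleRequest() {
    stats_.requests_received.fetch_add(1, std::memory_order_relaxed);
    stats_.requests_completed.fetch_add(1, std::memory_order_relaxed);
  }

 private:
  NetworkStatsRegistry *registry_;
  NetworkStats stats_;
};
```

In this sketch the counters of a destroyed instance disappear from later aggregates. Whether they should instead be folded into a running total on deregistration is one of the details still to be decided.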

Change MultiThreadedTestUtil to use a thread pool

Some funky thing is going on where C++ threads are not garbage-collected after being joined in long-running tests, resulting in a slow but noticeable "leak".

This is not an issue for unit tests. This is an issue if we want to scale up the same tests to do stress/soak/fuzz nightly. We should reuse threads with a pool inside the test util class. (or use a harness)
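
A minimal sketch of a fixed-size pool the test util could reuse across iterations, so that threads are created once instead of once per run. The `ThreadPool` class and its method names are hypothetical; this is not the MultiThreadedTestUtil interface.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch; not the actual MultiThreadedTestUtil interface.
class ThreadPool {
 public:
  explicit ThreadPool(std::size_t num_threads) {
    for (std::size_t i = 0; i < num_threads; i++) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains any remaining tasks, then joins every worker.
  ~ThreadPool() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      shutdown_ = true;
    }
    task_cv_.notify_all();
    for (std::thread &worker : workers_) worker.join();
  }

  ThreadPool(const ThreadPool &) = delete;
  ThreadPool &operator=(const ThreadPool &) = delete;

  void Submit(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      tasks_.push(std::move(task));
      pending_++;
    }
    task_cv_.notify_one();
  }

  // Blocks until every submitted task has finished running.
  void WaitUntilIdle() {
    std::unique_lock<std::mutex> lock(mutex_);
    idle_cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        task_cv_.wait(lock, [this] { return shutdown_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutting down and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
      std::lock_guard<std::mutex> guard(mutex_);
      if (--pending_ == 0) idle_cv_.notify_all();
    }
  }

  std::mutex mutex_;
  std::condition_variable task_cv_;
  std::condition_variable idle_cv_;
  std::queue<std::function<void()>> tasks_;
  std::size_t pending_ = 0;
  bool shutdown_ = false;
  std::vector<std::thread> workers_;
};
```

A long-running test would construct one pool, then call `Submit` for each worker body and `WaitUntilIdle` at the end of every iteration.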

Column access for DataTable

Implement a column access API for DataTable that returns a raw column

CMake changes for Clang on Linux and GCC on Mac

Currently I believe that our CMake infrastructure makes some assumptions about platform and compiler correspondence: if you're on macOS you're building with Clang, and if you're on Linux you're building with GCC.

There's no reason we have to be this strict, since we should be able to build with GCC on macOS and with Clang on Linux.

Tests for ProjectedRow, DeltaRecord, and StorageUtils

We kind of didn't care much about them because they are relatively simple and tested in data table anyway.

However, I've been making changes to their structures and realized that the change is split in several places and easy to miss. Maybe we should have a small test suite for them, just like MemorySafety test for TupleAccessStrategy, to make mistakes easier to spot down the line.

Port Peloton Plan Node Code

We need to bring over the plan node classes from Peloton into the new repo:

https://github.com/cmu-db/peloton/tree/master/src/include/planner

There are some fixes that we need to do with the original code:

Remove all pointers to database objects (e.g., DataTable). Everything should be replaced with object identifiers.
Replace SerializeTo and DeserializeFrom with the new JSON serialization code.
https://github.com/cmu-db/peloton/blob/master/src/include/planner/abstract_plan.h#L123
Ideally I would like to rename all of the objects from *Plan to *PlanNode, but I am open for discussion about this.
We should comment out the PerformBinding and VisitParameters methods for now.
We are also going to have to bring in the Expression code. We should clean this up to remove the dependency on the old value type system. We should discuss this with @pmenon to find out what needs.

Add build-essential to packages.sh

As per conversation yesterday, add build-essential to packages.sh

Remove global functions

We've been writing some global functions out of convenience. It gets hairy once headers start to be used in most places. We should remove them and make a rule to avoid them when possible. It is probably better practice to group them into utility classes anyway.

Rename null bitmap to something else

Maybe presence bitmap? validity bitmap?

Reason behind this:

0 = null and 1 = not null, that sounds more like a presence bit to me.
columns can have nullptrs (version vector). Then, saying that "a column is null" becomes ambiguous, at least in the storage layer (concept of pointers should go away above it. )

Write Ahead Logging

While the logging mechanism will not be complete until execution engine is in, it is possible to build most of the code with DataTable and concurrency control already in.

We need:

RedoRecord, mirroring the ones we use for undo in transaction manager
LogManager (I hate the word Manager), in charge of flushing buffers and signaling buffers flushed
I/O code to serialize RedoRecords out to a file. This includes inlining of varlen fields (how such fields should be denoted is an open question)

Intermittent Jenkins failures

Been seeing this off and on since Jenkins was enabled:

[terrier_PR-117-OQYACMCYLGTWIB5WRVRHBKS7GCQCONHDWA3HAAJLOOZ4ZLYOLUFQ] Running shell script
+ docker inspect -f . ubuntu:bionic
.
Failed to run image 'ubuntu:bionic'. Error: docker: Error response from daemon: connection error: desc = "transport: dial unix /var/run/docker/containerd/docker-containerd.sock: connect: connection refused".

Here's a recent PR with this error on one of the machines:
http://jenkins.db.cs.cmu.edu:8080/blue/organizations/jenkins/terrier/detail/PR-117/8/pipeline/9/

The lambdas in TransactionTestUtil are leaking (clang only?)

According to leak check on macOS, it looks like there's a malloc call every time a lambda is invoked in our large_garbage_collector_test that is then not freed. I suspect it's similar to the issue reported here:

https://stackoverflow.com/questions/15197281/clang-generates-executable-that-leaks-memory-about-stdfunction-and-lambda

We're not seeing leaks under GCC tests on CI, so the only concern at this point would be the ability to perform long-running tests on Macs, which isn't a high priority right now.

Cleanup TAS tests

Because TAS tests were written before the DataTable, there's duplicate logic. Although this is low priority, we should eventually clean this up and use the newer implementations. TAS test util as a class should be eventually merged into storage test util.

Bring in codegen's type system

Enable runtime/references check in make lint

We have currently suppressed this check, but it might be worth enabling based on this:
https://google.github.io/styleguide/cppguide.html#Reference_Arguments

If we enable it, these are what we'd have to fix:

/src/include/common/typedefs.h:130:  Is this a non-const reference? If so, make const or use a pointer: t &expected  [runtime/references] [2]
/src/include/common/typedefs.h:134:  Is this a non-const reference? If so, make const or use a pointer: t &expected  [runtime/references] [2]
Done processing /src/include/common/typedefs.h
/src/include/storage/data_table.h:30:  Is this a non-const reference? If so, make const or use a pointer: BlockStore &store  [runtime/references] [2]
/src/include/common/concurrent_queue.h:38:  Is this a non-const reference? If so, make const or use a pointer: T &dest  [runtime/references] [2]
Done processing /src/include/common/concurrent_queue.h
Done processing /src/include/storage/data_table.h
/src/include/storage/tuple_access_strategy.h:205:  Is this a non-const reference? If so, make const or use a pointer: TupleSlot &slot  [runtime/references] [2]
Total errors found: 1
Done processing /src/include/storage/tuple_access_strategy.h
/src/include/util/string_util.h:128:  Is this a non-const reference? If so, make const or use a pointer: std::string &str  [runtime/references] [2]
Done processing /src/include/util/string_util.h
Total errors found: 5
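
For reference, the kind of change the check asks for looks like the sketch below. The two functions are made-up examples, not the flagged signatures from the log above.

```cpp
#include <cstdint>

// Flagged by runtime/references: the call site `IncrementByReference(x)`
// gives no hint that x is modified. NOLINT suppresses the warning here.
inline void IncrementByReference(uint32_t &value) { value++; }  // NOLINT(runtime/references)

// Accepted form: the output parameter is a pointer, so the caller has to
// write `IncrementByPointer(&x)`, which makes the mutation visible.
inline void IncrementByPointer(uint32_t *value) { (*value)++; }
```

Fixing the reported sites would mean converting each flagged parameter to the pointer form (or to a const reference where it is not actually written) and updating the callers.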

Add Delete() to DataTable

Constructor needs to guarantee that the first 2 columns are reserved for version pointer and then logical delete
Modify Select() to return a boolean: false if not present or deleted, true if visible and ProjectedRow has been populated with data
Modify Update() to not allow writes to the first 2 column_ids (hidden from user). Maybe can't assert this if I want to call Update() from Delete() to flip that column's null bit.
Add Delete(transaction::TransactionContext *txn, TupleSlot slot) method
Modify tests to support new semantics
Add new tests to verify functionality

Solidify benchmarking infrastructure and have reference numbers

Once we have dedicated benchmark machines, we should start formalizing a suite we want to run for new PRs, as well as reference numbers that we think make sense (TAS performance, DataTable throughput, and others).

TupleId

Currently the first column of a DataTable is an 8-byte attr that is used as a pointer to the next version in the delta chain, while its null bit is used to represent "presence". Basically, that's what the DataTable uses to find empty slots.

We want to add a feature that the second column is an 8-byte attr that is used as a TupleId (unique, suitable for use as a primary key) while its null bit is used to represent logically deleted.

storage::TupleAccessStrategy::Allocate is inefficient

Right now we cmpxchg every single bit (8 times on a byte) linearly until successful. Realistically we can test for bits 64 at a time and only cmpxchg on the ones that tested false.
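
A sketch of the word-at-a-time idea, assuming the bitmap can be viewed as an array of `std::atomic<uint64_t>` and that a set bit means "slot taken". `AllocateSlot` and `kNoSlot` are hypothetical names, the bit numbering (least significant bit first) may not match the real layout, and `__builtin_ctzll` is a GCC/Clang builtin.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the allocation bitmap; not the actual
// TupleAccessStrategy code. Requires GCC or Clang for __builtin_ctzll.
constexpr uint32_t kNoSlot = UINT32_MAX;

// Claims the first unset bit among num_words 64-bit words and returns its
// index, or kNoSlot when every bit is already set.
inline uint32_t AllocateSlot(std::atomic<uint64_t> *words, uint32_t num_words) {
  for (uint32_t w = 0; w < num_words; w++) {
    uint64_t current = words[w].load(std::memory_order_relaxed);
    // One 64-bit load tests 64 slots; a full word is skipped without any CAS.
    while (current != UINT64_MAX) {
      const auto bit = static_cast<uint32_t>(__builtin_ctzll(~current));
      const uint64_t desired = current | (uint64_t{1} << bit);
      // On failure compare_exchange_weak reloads `current`, so the loop
      // retests the same word with the up-to-date value.
      if (words[w].compare_exchange_weak(current, desired)) return w * 64 + bit;
    }
  }
  return kNoSlot;
}
```

With this shape a compare-and-swap is attempted only on a bit that was just observed to be free, rather than on every bit in sequence.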

CMake cleanup

At some point, we should decide what we want to keep/remove (e.g. make lint).

Transaction layer randomized testing

Large, randomized multithreaded testing for the transaction layer to ensure correctness

Enhance ObjectPool for runtime changes and pre-allocation

The object pool implementation now is basic and simply keeps a queue around for reuse up to a certain limit. We want 2 things:

The reuse limit should be tunable. We might want to increase the reuse limit when memory usage goes up, and free memory when it goes back down.
When the queue is empty and we are calling malloc, we only do it one at a time for now. It might make sense to put several objects onto the queue at the same time (maybe contiguous in memory?).
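
A sketch of the first item, a reuse limit that can be changed at run time. `TunablePool` and its methods are hypothetical and do not reflect the current ObjectPool interface. It uses a mutex-guarded vector for brevity and does not attempt the batched, contiguous pre-allocation from the second item.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch; the real ObjectPool interface may differ.
template <typename T>
class TunablePool {
 public:
  explicit TunablePool(std::size_t reuse_limit) : reuse_limit_(reuse_limit) {}
  ~TunablePool() {
    for (T *obj : free_) delete obj;
  }
  TunablePool(const TunablePool &) = delete;
  TunablePool &operator=(const TunablePool &) = delete;

  // Hands out a cached object if one exists, otherwise allocates a new one.
  // Cached objects are returned as-is, without being reset.
  T *Get() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (free_.empty()) return new T();
    T *obj = free_.back();
    free_.pop_back();
    return obj;
  }

  // Caches the object for reuse, or frees it if the cache is at its limit.
  void Release(T *obj) {
    std::lock_guard<std::mutex> guard(mutex_);
    if (free_.size() < reuse_limit_) {
      free_.push_back(obj);
    } else {
      delete obj;
    }
  }

  // Changes the limit at run time; shrinking frees cached objects that
  // exceed the new limit.
  void SetReuseLimit(std::size_t limit) {
    std::lock_guard<std::mutex> guard(mutex_);
    reuse_limit_ = limit;
    while (free_.size() > reuse_limit_) {
      delete free_.back();
      free_.pop_back();
    }
  }

  std::size_t CachedCount() {
    std::lock_guard<std::mutex> guard(mutex_);
    return free_.size();
  }

 private:
  std::mutex mutex_;
  std::size_t reuse_limit_;
  std::vector<T *> free_;
};
```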

New scenarios for large_transaction_test (& w/ GC)

We have large scale tests that generate randomized workloads to run against the storage layer. The general framework is in place, and it has helped us find many bugs, but we have discovered that certain bugs only appear under certain configurations (e.g. high abort rates, large number of read-only transactions, way too many threads).

Currently we only have one test scenario for each of the test cases, with numbers that largely don't mean anything. The goal would be to come up with a variety of meaningful configurations that will help us smoke out different classes of bugs.

The tests themselves are under test/storage/large_garbage_collector_test.cpp and test/transaction/large_transaction_test.cpp. The small framework we use for them can be found in test/util/transaction_test_util.cpp and test/include/util/transaction_test_util.h. The tunable parameters exposed are documented and the existing test cases should serve as good examples.

Come up with a configuration, describe the scenario it models, and if you are curious, verify that it can find bugs by injecting the kind of bug you think this would lead to into the transaction system. This is also a good opportunity to read the related pieces of the system and familiarize yourself with the codebase and toolchain.

Take a scenario and comment below so others don't work on the same thing.

Rewrite SpinLatch wait

@mbutrovich is suggesting that _mm_pause() will perform terribly on newer platforms.

We can probably replace that with newer language features.
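
One possible reading of "newer language features" is a latch built on `std::atomic_flag` that yields to the scheduler while waiting. This is an assumption, not a decided design, and whether it actually outperforms `_mm_pause()` would need to be benchmarked.

```cpp
#include <atomic>
#include <thread>

// Hypothetical replacement sketch using only standard C++11 facilities.
class SpinLatch {
 public:
  // Spins until the flag is acquired, yielding the CPU between attempts.
  void Lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }

  // Returns true if the latch was free and is now held by the caller.
  bool TryLock() { return !flag_.test_and_set(std::memory_order_acquire); }

  void Unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};
```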

64-bit alignment for RawConcurrentBitmap

In RawConcurrentBitmap we assumed that bits_ is 64-bit aligned at 0.

This is no longer the case in ProjectedRowInit in delta_record.h.

We have two options:

add a check to RawConcurrentBitmap. In theory, this is at most 7 more 1-byte loads. In practice, adding the alignment check decreases the number of flips by some 20-60 million items/s when you benchmark FirstUnsetPos from 0.
make sure it is the case by modifying delta_record. I have not tried this yet.

I will keep messing with it over the weekend.
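
For the first option, the alignment check itself is small. The helper names below are hypothetical; the second helper computes how many leading bytes would have to be handled one at a time before 64-bit word accesses become aligned, which is where the "at most 7" comes from.

```cpp
#include <cstdint>

// Hypothetical helpers; not part of RawConcurrentBitmap.

// True if the address is aligned to 8 bytes (64 bits).
inline bool IsAligned64(const void *ptr) {
  return reinterpret_cast<uintptr_t>(ptr) % 8 == 0;
}

// Number of leading bytes (0 to 7) before the next 8-byte boundary.
inline uint32_t BytesUntilAligned64(const void *ptr) {
  return static_cast<uint32_t>((8 - reinterpret_cast<uintptr_t>(ptr) % 8) % 8);
}
```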

Schema

SqlTable will be responsible for translating the SQL notion of schema to the storage layer's BlockLayout. We'll likely need to bring in Peloton's representation of schema. We also want to add the notion of layout_version to the DataTable level.

SqlTable

tuple_id discussion
- whether we need it to represent logical deletes. Not really
- whether having a layer of indirection has any benefits (in indexes, logging, elsewhere) Probably not
- secondary indexes, logical or physical? Physical, but HyPer did logical, so we should ask them.
Should SqlTable own indexes or oids? Own
- might make sense to just have SqlTable own indexes until we have Catalog, and make decision then
Column access? (immutable flag / synopsis needed) Punt
Recycling can be written now that logical deletes are useful
- this includes a potential optimization to the DataTable to reuse blocks. It didn't make sense before because we effectively never removed tuples.

Revisit scale numbers (iterations, etc.) on Google Tests

When #61 is in, we should be fast enough to remove the hacky throttle we put on TAS tests.