scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

Home Page: http://scylladb.com

License: GNU Affero General Public License v3.0

C++ 78.93% Hack 0.05% Thrift 0.15% Shell 0.34% Python 19.50% GAP 0.41% CMake 0.33% Assembly 0.06% Dockerfile 0.01% Ragel 0.01% Nix 0.02% Rust 0.11% Lua 0.07% C 0.01%
nosql c-plus-plus scylla seastar cassandra database cpp

scylladb's Introduction

Scylla


What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring very recent versions of a C++20 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see HACKING.md.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode flag is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (these checks are not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

  • The community forum and Slack channel are for users to discuss configuration, management, and operations of the open-source ScyllaDB.
  • The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

scylladb's People

Contributors

alecco, amnonh, annastuchlik, argenet, asias, avikivity, bhalevy, cvybhu, deexie, denesb, duarten, elcallio, espindola, gleb-cloudius, kbr-scylla, kostja, michoecho, nyh, patjed41, pdziepak, penberg, piodul, psarna, raphaelsc, syuu1228, tchaikov, tgrabiec, vladzcloudius, wmitros, xemul


scylladb's Issues

WARNING: closing file in reactor thread during exception recovery

Please fix or quiet this warning. It is too noisy.

WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery

Incorrect error code returned when consistency_level cannot be met

Urchin returns code=0000 [Server error]; Origin returns code=1000 [Unavailable exception].

1. Create a 3-node cluster.

2. Run the following via cqlsh:

CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} ;

CREATE TABLE keyspace1.standard1 (
key blob PRIMARY KEY,
"C0" blob,
"C1" blob,
"C2" blob,
"C3" blob,
"C4" blob
) WITH compression = {};

3. Stop node 3.

4. Connect via cqlsh and run:

consistency all
insert into keyspace1.standard1 (key) values (0x01)

On Urchin you get

cqlsh> consistency all
Consistency level set to ALL.
cqlsh> insert into keyspace1.standard1 (key,c0) values (0x01,0x02)
... ;
code=2200 [Invalid query] message="Unknown identifier c0"
cqlsh> insert into keyspace1.standard1 (key,C0) values (0x01,0x02) ;
code=2200 [Invalid query] message="Unknown identifier c0"
cqlsh> insert into keyspace1.standard1 (key) values (0x01) ;
<ErrorMessage code=0000 [Server error] message="Cannot achieve consistency level">
cqlsh>

On origin you get

cqlsh> consistency all
Consistency level set to ALL.
cqlsh> insert into keyspace1.standard1 (key) values (0x01) ;
Traceback (most recent call last):
File "/home/shlomi/cassandra/bin/cqlsh", line 980, in perform_simple_statement
rows = self.session.execute(statement, trace=self.tracing_enabled)
File "/home/shlomi/cassandra/bin/../lib/cassandra-driver-internal-only-2.1.3.zip/cassandra-driver-2.1.3/cassandra/cluster.py", line 1295, in execute
result = future.result(timeout)
File "/home/shlomi/cassandra/bin/../lib/cassandra-driver-internal-only-2.1.3.zip/cassandra-driver-2.1.3/cassandra/cluster.py", line 2799, in result
raise self._final_exception
Unavailable: code=1000 [Unavailable exception] message="Cannot achieve consistency level ALL" info={'required_replicas': 3, 'alive_replicas': 2, 'consistency': 'ALL'}

This is failing the DTEST consistency_test.py - test_simple_strategy (modified version available on the slivne/urchin-dtest branch consistency_test_simple_strategy)
nosetests -v consistency_test.py:TestAvailability.test_simple_strategy

Queries do not return rows that contain only the partition key

Using Head 98c38a7

On Urchin

Connected to test at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.2.0 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> CREATE KEYSPACE keyspace3 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> CREATE TABLE keyspace3.standard2 ( key text PRIMARY KEY, C0 text, C1 text, C2 text, C3 blob, C4 text);
cqlsh> insert into keyspace3.standard2 (key,c0) values ('2','2');
cqlsh> insert into keyspace3.standard2 (key) values ('1');
cqlsh> select * from keyspace3.standard2;

key | c0 | c1 | c2 | c3 | c4
-----+----+------+------+------+------
2 | 2 | null | null | null | null

(1 rows)
cqlsh> select * from keyspace3.standard2 where key='1';

key | c0 | c1 | c2 | c3 | c4
-----+----+----+----+----+----

(0 rows)
cqlsh>

On Origin

shlomi@localhost~/urchin (master)$ ~/cassandra/bin/cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.0.0-SNAPSHOT | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh> CREATE KEYSPACE keyspace3 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> CREATE TABLE keyspace3.standard2 ( key text PRIMARY KEY, C0 text, C1 text, C2 text, C3 blob, C4 text);
cqlsh> insert into keyspace3.standard2 (key) values ('1');
cqlsh> insert into keyspace3.standard2 (key,c0) values ('2','2');
cqlsh> select * from keyspace3.standard2;

key | c0 | c1 | c2 | c3 | c4
-----+------+------+------+------+------
2 | 2 | null | null | null | null
1 | null | null | null | null | null

(2 rows)

cqlsh> select * from keyspace3.standard2 where key='1';

key | c0 | c1 | c2 | c3 | c4
-----+------+------+------+------+------
1 | null | null | null | null | null

(1 rows)

rpc/messaging_service handler is always executed on cpu0

I tried this:
server$ ./message --listen-address 127.0.0.100
client$ ./message --listen-address 127.0.0.200 --server 127.0.0.100

There are 2 CPUs available on the system.
I ran the client 1000+ times. Debug messages show the handler is always executed on cpu0.
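
Presumably the expectation is that RPC handlers get spread across shards rather than always landing on cpu0. Below is a minimal, self-contained sketch of dispatching work to a chosen shard with Seastar's smp::submit_to (the header paths and this_shard_id() are assumed from a recent Seastar; this is illustrative only, not the messaging_service code):

    #include <seastar/core/app-template.hh>
    #include <seastar/core/reactor.hh>
    #include <seastar/core/smp.hh>
    #include <iostream>

    int main(int argc, char** argv) {
        seastar::app_template app;
        return app.run(argc, argv, [] {
            // Pick a target shard (here simply the last one) instead of
            // implicitly handling everything on shard 0.
            unsigned target = seastar::smp::count - 1;
            return seastar::smp::submit_to(target, [] {
                std::cout << "handler running on shard "
                          << seastar::this_shard_id() << "\n";
            });
        });
    }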

We don't encode error messages properly for CQL binary protocol

This makes cassandra-stress error out like this:

ERROR 20:31:50 Exception in response
io.netty.handler.codec.DecoderException: org.apache.cassandra.transport.messages.ErrorMessage$WrappedException: java.lang.IndexOutOfBoundsException: readerIndex(53) + length(2) exceeds writerIndex(53): SlicedByteBuf(ridx: 53, widx: 53, cap: 53/53, unwrapped: UnpooledUnsafeDirectByteBuf(ridx: 62, widx: 62, cap: 64))
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:99) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]
Caused by: org.apache.cassandra.transport.messages.ErrorMessage$WrappedException: java.lang.IndexOutOfBoundsException: readerIndex(53) + length(2) exceeds writerIndex(53): SlicedByteBuf(ridx: 53, widx: 53, cap: 53/53, unwrapped: UnpooledUnsafeDirectByteBuf(ridx: 62, widx: 62, cap: 64))
    at org.apache.cassandra.transport.messages.ErrorMessage.wrap(ErrorMessage.java:256) ~[main/:na]
    at org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:273) ~[main/:na]
    at org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:235) ~[main/:na]
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) [netty-all-4.0.23.Final.jar:4.0.23.Final]
    ... 17 common frames omitted
Caused by: java.lang.IndexOutOfBoundsException: readerIndex(53) + length(2) exceeds writerIndex(53): SlicedByteBuf(ridx: 53, widx: 53, cap: 53/53, unwrapped: UnpooledUnsafeDirectByteBuf(ridx: 62, widx: 62, cap: 64))
    at io.netty.buffer.AbstractByteBuf.checkReadableBytes(AbstractByteBuf.java:1175) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.readShort(AbstractByteBuf.java:589) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.buffer.AbstractByteBuf.readUnsignedShort(AbstractByteBuf.java:597) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at org.apache.cassandra.transport.CBUtil.readConsistencyLevel(CBUtil.java:186) ~[main/:na]
    at org.apache.cassandra.transport.messages.ErrorMessage$1.decode(ErrorMessage.java:80) ~[main/:na]
    at org.apache.cassandra.transport.messages.ErrorMessage$1.decode(ErrorMessage.java:43) ~[main/:na]
    at org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:247) ~[main/:na]
    ... 19 common frames omitted

That's because it expects to read the consistency level field of the READ_TIMEOUT error message, but we don't write it in transport/server.cc.

The relevant code in Origin is in org.apache.cassandra.transport.messages.ErrorMessage.
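
For reference, the CQL binary protocol (v3) encodes an ERROR body as [int code][string message] followed by code-specific fields; for READ_TIMEOUT (0x1200) the client then reads [consistency (short)][received (int)][blockfor (int)][data_present (byte)], and for UNAVAILABLE (0x1000) it reads [consistency][required (int)][alive (int)]. A minimal sketch of serializing such a body (illustrative only, not the transport/server.cc serializer):

    #include <cstdint>
    #include <string>
    #include <vector>

    static void put_short(std::vector<uint8_t>& out, uint16_t v) {
        out.push_back(v >> 8);
        out.push_back(v & 0xff);
    }

    static void put_int(std::vector<uint8_t>& out, uint32_t v) {
        out.push_back(v >> 24);
        out.push_back((v >> 16) & 0xff);
        out.push_back((v >> 8) & 0xff);
        out.push_back(v & 0xff);
    }

    static void put_string(std::vector<uint8_t>& out, const std::string& s) {
        put_short(out, static_cast<uint16_t>(s.size()));
        out.insert(out.end(), s.begin(), s.end());
    }

    // READ_TIMEOUT error body: code, message, then the extra fields the
    // driver's ErrorMessage decoder expects (starting with the consistency level).
    std::vector<uint8_t> make_read_timeout_error(uint16_t consistency,
                                                 uint32_t received,
                                                 uint32_t blockfor,
                                                 bool data_present) {
        std::vector<uint8_t> body;
        put_int(body, 0x1200);                    // READ_TIMEOUT error code
        put_string(body, "Operation timed out");  // [string] message
        put_short(body, consistency);             // omitting this is what trips the decoder
        put_int(body, received);
        put_int(body, blockfor);
        body.push_back(data_present ? 1 : 0);
        return body;
    }

    int main() {
        auto body = make_read_timeout_error(0x0005 /* ALL */, 2, 3, false);
        return body.size() == 36 ? 0 : 1;         // 4 + 2+19 + 2 + 4 + 4 + 1 bytes
    }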

Log directory

Logs should be written to /var/log/scylla/system.log, not /var/log/urchin.log,
to match the Origin convention.

"storage_service: fail to update schema_version" errors

On current head

commit 150c28e
Author: Gleb Natapov [email protected]
Date: Tue Jul 21 18:04:50 2015 +0300

storage_proxy: do not ignore connection errors

Do nothing about them for now. Read will eventually fail on timeout.

running a cluster of 3 nodes with --default-log-level debug

running the following toward node 1 via cqlsh

CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} ;

CREATE TABLE keyspace1.standard1 (
key blob PRIMARY KEY,
"C0" blob,
"C1" blob,
"C2" blob,
"C3" blob,
"C4" blob
) WITH compression = {};

I get on nodes 2 and 3:

SS::on_change endpoint=127.0.0.1
storage_service:: Update ep=127.0.0.1, state=2, value=d1902779-779e-397e-9f5e-0729dc667086
storage_service: fail to update schema_version for 127.0.0.1: boost::exception_detail::clone_implboost::exception_detail::error_info_injector<boost::bad_any_cast > (boost::bad_any_cast: failed conversion using boost::any_cast)

@penberg I am not sure if this is from enabling logging (which means it was there all the time) or from something else

TCP issues with dpdk

Playing with Scylla in dpdk mode using the following methods causes TCP connections
to pause. It's visible with tcpdump on the client: a SYN is sent without a
response. After some time the situation resolves itself (future connections succeed).

In order to reproduce:
- Run ./tools/bin/cassandra-stress write -mode simplenative cql3 prepared -rate threads=500 -node 1.1.1.1
- In parallel run the examples/perf program: https://github.com/cloudius-systems/urchin/wiki/Using-example-perf.c-from-the-cpp-driver
- Stop/start them a bit until connections get stuck.
- Use telnet to SERVER 9042 and see that sometimes it doesn't respond.

The issue does not exist in posix mode.

storage_proxy::get_live_sorted_endpoints - bad selection of target replicas with RF > 1 in case of failures

Head at commit 8ba5d19
Author: Avi Kivity [email protected]
Date: Tue Jul 21 12:19:54 2015 +0300

db: avoid ubsan false-positive in query_state move constructor

The value is moved before initialization due to a do_with().  It's harmless,
but better to silence the warning.

I have added a printout in get_live_sorted_endpoints to debug the case of failing to read after a node has been killed.

storage_proxy::get_read_executor calls:

std::vector<gms::inet_address> storage_proxy::get_live_sorted_endpoints(keyspace& ks, const dht::token& token) {
    auto& rs = ks.get_replication_strategy();
    std::vector<gms::inet_address> eps = rs.get_natural_endpoints(token);
    boost::range::remove_if(eps, std::not1(std::bind1st(std::mem_fn(&gms::failure_detector::is_alive), &gms::get_local_failure_detector())));
    for (auto ep: eps) {
        std::cout << "(" << ep << "," << gms::get_local_failure_detector().is_alive(ep) << ")";
    }
    std::cout << "\n";
    // DatabaseDescriptor.getEndpointSnitch().sortByProximity(FBUtilities.getBroadcastAddress(), liveEndpoints);
    return eps;
}

The following is printed out
(127.0.0.3,1)(127.0.0.2,0)(127.0.0.1,0)

To reproduce:

  1. start a 3 node cluster

  2. run the following via cqlsh
    CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'} ;

CREATE TABLE keyspace1.standard1 (
key blob PRIMARY KEY,
"C0" blob,
"C1" blob,
"C2" blob,
"C3" blob,
"C4" blob
) WITH compression = {};

  3. run the following via cqlsh
    insert into keyspace1.standard1 (key) values (0x01);
    insert into keyspace1.standard1 (key) values (0x02);
    insert into keyspace1.standard1 (key) values (0x03);
    insert into keyspace1.standard1 (key) values (0x04);
    insert into keyspace1.standard1 (key) values (0x05);
    insert into keyspace1.standard1 (key) values (0x06);
    insert into keyspace1.standard1 (key) values (0x07);
    insert into keyspace1.standard1 (key) values (0x08);
    insert into keyspace1.standard1 (key) values (0x09);

  4. kill node 1 and node 2

  5. run http://127.0.0.3:10000/gossiper/endpoint/live/ - wait till you get a response holding 127.0.0.3 only

  6. connect with cqlsh 127.0.0.3
    run
    select * from keyspace1.standard1 where key=0x08;

you will get:
(127.0.0.3,1)(127.0.0.2,0)(127.0.0.1,0)

The printout should only contain the live node, 127.0.0.3.
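
A likely explanation, consistent with the fix commit quoted in a later issue below ("remove_if() does not really removes anything", Fixes #33), is that boost::range::remove_if only moves the surviving elements to the front and returns an iterator; the dead endpoints are never actually erased from eps. A minimal sketch of the erase-remove idiom that does shrink the container (plain std::remove_if and a hypothetical is_alive stand in for the real failure detector):

    #include <algorithm>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> eps = {"127.0.0.3", "127.0.0.2", "127.0.0.1"};
        // Hypothetical liveness check standing in for gms::failure_detector::is_alive().
        auto is_alive = [](const std::string& ep) { return ep == "127.0.0.3"; };

        // remove_if alone only partitions the vector and returns the new logical end;
        // the erase() call is what actually drops the dead endpoints.
        eps.erase(std::remove_if(eps.begin(), eps.end(),
                                 [&](const std::string& ep) { return !is_alive(ep); }),
                  eps.end());

        for (const auto& ep : eps) {
            std::cout << ep << "\n";   // prints only 127.0.0.3
        }
    }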

Sometimes fails to insert values into a table on a 2-node cluster

I used the following code to create a keyspace and a table and then insert values into it.

Sometimes it works perfectly fine; sometimes values cannot be inserted.

future<> stream_session::test(distributed<cql3::query_processor>& qp) {
    if (utils::fb_utilities::get_broadcast_address() == inet_address("127.0.0.1")) {
        auto tester = make_shared<timer<lowres_clock>>();
        tester->set_callback ([tester, &qp] {
            seastar::async([&qp] {
                sslog.debug("================ STREAM_PLAN TEST ==============");
                auto cs = service::client_state::for_external_calls();
                service::query_state qs(cs);
                auto opts = make_shared<cql3::query_options>(cql3::query_options::DEFAULT);
                qp.local().process("CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };", qs, *opts).get();
                sslog.debug("CREATE KEYSPACE = KS DONE");
                qp.local().process("CREATE TABLE ks.tb ( key text PRIMARY KEY, C0 text, C1 text, C2 text, C3 blob, C4 text);", qs, *opts).get();
                sslog.debug("CREATE TABLE = TB DONE");
                qp.local().process("insert into ks.tb (key,c0) values ('1','1');", qs, *opts).get();
                sslog.debug("INSERT VALUE DONE: 1");
                qp.local().process("insert into ks.tb (key,c0) values ('2','2');", qs, *opts).get();
                sslog.debug("INSERT VALUE DONE: 2");
                qp.local().process("insert into ks.tb (key,c0) values ('3','3');", qs, *opts).get();
                sslog.debug("INSERT VALUE DONE: 3");
                qp.local().process("insert into ks.tb (key,c0) values ('4','4');", qs, *opts).get();
                sslog.debug("INSERT VALUE DONE: 4");
                qp.local().process("insert into ks.tb (key,c0) values ('5','5');", qs, *opts).get();
                sslog.debug("INSERT VALUE DONE: 5");
                qp.local().process("insert into ks.tb (key,c0) values ('6','6');", qs, *opts).get();
                sslog.debug("INSERT VALUE DONE: 6");
            }).then([] {
                sleep(std::chrono::seconds(3)).then([] {
                    sslog.debug("================ START STREAM  ==============");
                    auto sp = stream_plan("MYPLAN");
                    auto to = inet_address("127.0.0.2");
                    std::vector<query::range<token>> ranges = {query::range<token>::make_open_ended_both_sides()};
                    std::vector<sstring> cfs{"tb"};
                    sp.transfer_ranges(to, to, "ks", std::move(ranges), std::move(cfs)).execute();
                });
            });
        });
        tester->arm(std::chrono::seconds(10));
    }
    return make_ready_future<>();
}


diff --git a/main.cc b/main.cc
index 5e48af7..4569c54 100644
--- a/main.cc
+++ b/main.cc
@@ -156,6 +156,8 @@ int main(int ac, char** av) {
                 }).then([api_port] {
                     std::cout << "Seastar HTTP server listening on port " << api_port << " ...\n";
                 });
+            }).then([&qp] {
+                return streaming::stream_session::test(qp);
             }).or_terminate();
         });
     });

Output when it works:

Start Storage service ...
WARNING: Not implemented: COMPACT_TABLES
Messaging server listening on ip 127.0.0.1 port 7000 ...
Populating Keyspace system
WARNING: Not implemented: RANGE_QUERIES
WARNING: Not implemented: VALIDATION
WARNING: Not implemented: INDEXES
Start gossiper service ...
WARNING: Not implemented: GOSSIP
CQL server listening on port 9042 ...
Thrift server listening on port 9160 ...
Seastar HTTP server listening on port 10000 ...
================ STREAM_PLAN TEST ==============
WARNING: Not implemented: AUTH
WARNING: Not implemented: METRICS
WARNING: Not implemented: PERMISSIONS
CREATE KEYSPACE = KS DONE
CREATE TABLE = TB DONE
WARNING: Not implemented: TRIGGERS
INSERT VALUE DONE: 1
INSERT VALUE DONE: 2
INSERT VALUE DONE: 3
INSERT VALUE DONE: 4
INSERT VALUE DONE: 5
INSERT VALUE DONE: 6
================ START STREAM  ==============

Output when it fails:

Start Storage service ...
WARNING: Not implemented: COMPACT_TABLES
Messaging server listening on ip 127.0.0.1 port 7000 ...
Populating Keyspace system
WARNING: Not implemented: RANGE_QUERIES
WARNING: Not implemented: VALIDATION
WARNING: Not implemented: INDEXES
Start gossiper service ...
WARNING: Not implemented: GOSSIP
CQL server listening on port 9042 ...
Thrift server listening on port 9160 ...
Seastar HTTP server listening on port 10000 ...
================ STREAM_PLAN TEST ==============
WARNING: Not implemented: AUTH
WARNING: Not implemented: METRICS
WARNING: Not implemented: PERMISSIONS
CREATE KEYSPACE = KS DONE
CREATE TABLE = TB DONE
WARNING: Not implemented: TRIGGERS

cmd:

killall -9 scylla
rm -rf /home/asias/src/cloudius-systems/urchin/tmp/{1..4}/*
rm -f  /tmp/out{1..4}
./build/release/scylla --logger-log-level stream_session=debug -c 1 -m 128M --rpc-address 127.0.0.1 --listen-address 127.0.0.1 --seed-provider-parameters seeds=127.0.0.1 --datadir `pwd`/tmp/1 --commitlog-directory `pwd`/tmp/1 >/tmp/out1 2>&1 &
./build/release/scylla --logger-log-level stream_session=debug -c 1 -m 128M --rpc-address 127.0.0.2 --listen-address 127.0.0.2 --seed-provider-parameters seeds=127.0.0.1 --datadir `pwd`/tmp/2 --commitlog-directory `pwd`/tmp/2 >/tmp/out2 2>&1 &

Following the "Schema pull" merge, gossiper stops removing killed nodes from gossiper::live_endpoints

  1. start a cluster of 3 nodes (you can enable gossip logging with --default-log-level trace)
  2. kill first node
  3. the other two nodes will report (after 30 seconds)

FatClient 127.0.0.1 has been silent for 30000ms, removing from gossip

you can check the live_endpoints via

curl -X GET --header "Accept: application/json" "http://127.0.0.2:10000/gossiper/endpoint/live/"

with commit

commit b737a85f0834a86024e2fe5b53a2ec6d704b1262
Author: Gleb Natapov <[email protected]>
Date:   Tue Jul 21 17:51:47 2015 +0300

    storage_proxy: fix get_live_sorted_endpoints()

    remove_if() does not really removes anything.

    Fixes #33

this works

when applying

commit a547b881a7264d445012a8fce59c187aac444b8a

 Merge "Schema pull" from Pekka

"This series enables the schema pull functionality. It's used to
synchronize schema at node startup for schema changes that happened in
the cluster while the node was down.

Node 1:

  # Node 2 is not running.

  [penberg@nero apache-cassandra-2.1.7]$ ./bin/cqlsh --no-color 127.0.0.1
  Connected to Test Cluster at 127.0.0.1:9042.
  [cqlsh 5.0.1 | Cassandra 2.2.0 | CQL spec 3.2.0 | Native protocol v3]
  Use HELP for help.
  cqlsh> CREATE KEYSPACE keyspace3 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
  cqlsh> SELECT * FROM system.schema_keyspaces;

   keyspace_name | durable_writes | strategy_class                             | strategy_options
  ---------------+----------------+--------------------------------------------+----------------------------
       keyspace3 |           True |                             SimpleStrategy | {"replication_factor":"1"}
          system |           True | org.apache.cassandra.locator.LocalStrategy |                         {}

  (2 rows)
  cqlsh> SELECT key, schema_version FROM system.local;

   key   | schema_version
  -------+--------------------------------------
   local | c3a18ddc-80c5-3a25-b82d-57178a318771

  (1 rows)

Node 2:

  # Node 2 is started.

  [penberg@nero apache-cassandra-2.1.7]$ ./bin/cqlsh --no-color 127.0.0.2
  Connected to Test Cluster at 127.0.0.2:9042.
  [cqlsh 5.0.1 | Cassandra 2.2.0 | CQL spec 3.2.0 | Native protocol v3]
  Use HELP for help.
  cqlsh> SELECT * FROM system.schema_keyspaces;

   keyspace_name | durable_writes | strategy_class                             | strategy_options
  ---------------+----------------+--------------------------------------------+----------------------------
       keyspace3 |           True |                             SimpleStrategy | {"replication_factor":"1"}
          system |           True | org.apache.cassandra.locator.LocalStrategy |                         {}

  (2 rows)
  cqlsh> SELECT key, schema_version FROM system.local;

   key   | schema_version
  -------+--------------------------------------
   local | c3a18ddc-80c5-3a25-b82d-57178a318771

  (1 rows)"

This stops working: node1 is not removed from node2 and node3 (there is no printout, and the live_endpoints of node2 keeps node1 forever).

@penberg can you please check this out.

This seems to be the cause of the regression between http://jenkins.cloudius-systems.com:8080/job/urchin-dtest/label=master/211/ and http://jenkins.cloudius-systems.com:8080/job/urchin-dtest/label=master/212/, where simple_cluster_driver_test.TestSimpleCluster.simple_rf_3_consistency_level_tests started to fail (again).

I know we have many gaps with killing a node and are missing some not-yet-converted code, but this seemed to already be working.

db::system_keyspace::local_cache Assertion `_instances.empty()' failed

scylla: /home/asias/src/cloudius-systems/urchin/seastar/core/sharded.hh:227: seastar::sharded::~sharded() [with Service = db::system_keyspace::local_cache]: Assertion `_instances.empty()' failed.

I saw this when scylla was killed.

We need to shut down db::system_keyspace::local_cache on exit.
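
A minimal sketch of the usual Seastar pattern for this - stopping a sharded<> service before the program exits so that ~sharded() sees an empty _instances. The local_cache struct below is a hypothetical stand-in for db::system_keyspace::local_cache, and the header paths are assumed:

    #include <seastar/core/app-template.hh>
    #include <seastar/core/future.hh>
    #include <seastar/core/sharded.hh>

    // Hypothetical stand-in service; any sharded service must expose stop().
    struct local_cache {
        seastar::future<> stop() { return seastar::make_ready_future<>(); }
    };

    int main(int argc, char** argv) {
        seastar::app_template app;
        seastar::sharded<local_cache> cache;
        return app.run(argc, argv, [&cache] {
            return cache.start().then([] {
                // ... use the service ...
                return seastar::make_ready_future<>();
            }).finally([&cache] {
                // Stopping the service clears _instances on every shard, so the
                // assertion in ~sharded() does not fire on exit.
                return cache.stop();
            });
        });
    }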

Logging should use prefix notation

All log messages should use one of the prefixes
"TRACE", "DEBUG", "INFO", "WARN", "ERROR" and "FATAL".
Currently there are messages with no prefix at all, which makes them harder to parse.

Broken pipe when killing a client

When running scylla head (095c2f2) using posix and testing
it against the datastax cpp driver with example/perf, I got a broken pipe when the client was killed with Ctrl-C.

To compile the client, download https://github.com/datastax/cpp-driver,
build it (cmake), then build the perf example: gcc perf.c -o p -lcassandra -luv -L../../build -I../../include/

Results:

WARNING: exceptional future ignored of type 'std::system_error': Error system:32 (Broken pipe)

Program received signal SIGPIPE, Broken pipe.
0x00007ffff419d530 in writev () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff419d530 in writev () from /lib64/libc.so.6
#1  0x00000000004bfa9e in writev (iovcnt=<optimized out>, iov=<optimized out>, this=<optimized out>) at ./core/posix.hh:261
#2  operator() (__closure=__closure@entry=0x7fffffffd0f0) at ./core/reactor.hh:1161
#3  apply (args=args@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x66d56a>, 
    func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x66d558>) at ./core/apply.hh:34
#4  apply<pollable_fd::write_some(net::packet&)::<lambda()> > (
    args=args@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x66d5ad>, 
    func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x66d599>) at ./core/apply.hh:42
#5  futurize<future<unsigned long> >::apply<pollable_fd::write_some(net::packet&)::{lambda()#1}>(pollable_fd::write_some(net::packet&)::{lambda()#1}&&, std::tuple<>&&) (func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x6b359f>, 
    args=args@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x6b3a46>) at ./core/future.hh:988
#6  0x00000000004bfbb7 in future<>::then<future<unsigned long>, pollable_fd::write_some(net::packet&)::{lambda()#1}, future<unsigned long> future<>::then<{lambda()#1}, future<unsigned long> >({lambda()#1}&&)::{lambda(future_state<>&&)#1}>(future<>::then&&, future<unsigned long> future<>::then<{lambda()#1}, future<unsigned long> >({lambda()#1}&&)::{lambda(future_state<>&&)#1}&&) (this=this@entry=0x7fffffffd0e0, 
    func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x6b3c0c>, 
    param=param@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x6b3c22>) at ./core/future.hh:625
#7  0x00000000004bf252 in then<pollable_fd::write_some(net::packet&)::<lambda()>, future<long unsigned int> > (
    func=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x5c9149, DIE 0x6b023e>, this=0x7fffffffd0e0) at ./core/future.hh:753
#8  pollable_fd::write_some (this=<optimized out>, p=...) at ./core/reactor.hh:1169
#9  0x00000000004bf2ec in pollable_fd::write_all (this=<optimized out>, p=...) at ./core/reactor.hh:1180
#10 0x00000000004b8437 in net::posix_data_sink_impl::put (this=0x600000a4f150, p=...) at net/posix-stack.cc:166
#11 0x00000000006b1902 in put (p=..., this=0x600000aa3448) at /home/dor/src/urchin/seastar/core/iostream.hh:100
#12 output_stream<char>::write (this=this@entry=0x600000aa3448, p=...) at /home/dor/src/urchin/seastar/core/iostream-impl.hh:65
#13 0x00000000006a2407 in write (msg=..., this=0x600000aa3448) at /home/dor/src/urchin/seastar/core/iostream-impl.hh:49
#14 operator() (__closure=__closure@entry=0x7fffffffd260) at transport/server.cc:624
#15 apply (args=<optimized out>, func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x2c91d99, DIE 0x2dd658d>)
    at /home/dor/src/urchin/seastar/core/apply.hh:34
#16 apply<cql_server::connection::write_response(shared_ptr<cql_server::response>)::<lambda()> > (args=<optimized out>, 
    func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x2c91d99, DIE 0x2dd65fb>)
    at /home/dor/src/urchin/seastar/core/apply.hh:42
#17 futurize<future<> >::apply<cql_server::connection::write_response(shared_ptr<cql_server::response>)::<lambda()> >(<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x2c91d99, DIE 0x2dd663b>, <unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x2c91d99, DIE 0x2dd664e>) (func=func@entry=<unknown type in /home/dor/src/urchin/build/release/scylla, CU 0x2c91d99, DIE 0x2dd663b>, args=<optimized out>)
    at /home/dor/src/urchin/seastar/core/future.hh:988

Unable to connect with Cassandra 2.1.0 cqlsh

When I try to connect with cqlsh from Cassandra 2.1.0, I get the following error:

[penberg@nero apache-cassandra-2.1.0]$ ./bin/cqlsh 127.0.0.1
Connection error: ('Unable to connect to any servers', {'127.0.0.1': KeyError('column_aliases',)})

Cassandra 2.1.7 works OK.

ridiculously large test executables

Urchin's tests' executables are ridiculously large - between 1/4 and 1/2 a gigabyte (!!) each.

We cannot continue this way - we're starting now to avoid writing new tests (or artificially merging them into existing tests, or using other ugly workarounds) just to save disk space. My Urchin build tree is now almost 20 gigabytes in size...

Some of this size can be explained by C++'s inflated debug information - stripping a 400 MB test brings it down to 150 MB. But this is still huge, and completely unjustified.

Seastar's tests are also pretty big (up to 30 MB) but nothing close to the Urchin tests.

I think we need to understand:

  1. Why our individual object files (.o) are so large. Some of them are 100 MB in size!
  2. What makes our debug information so gigantic (might be related to previous issue - e.g., very long identifier names?).
  3. We can use a static library (.a) so the sstable test will not bring in rpc's code (for example) and make the test much smaller.
  4. We can create an Urchin shared library (.so) and use that when linking the tests, so the individual test executables will (hopefully) come out very small.

Option 4 (shared library) is probably what we'll need to do: it will work around all the other problems without spending any effort on fixing them. I.e., even if we end up with a single huge shared library with huge debug information in it, at least we'll have just one copy of it, and not in every test executable.

If, despite solution 4 above, executables are still large, it means we're getting tons of code through the header files. This would need to be fixed (if it is a problem at all).

Excessive DHCP printout on new lease

I got this on AWS with dpdk and pci device assignment.
The VM was up for a while so I don't know whether there was a period without
networking. It worked all the time for me, but it is worth looking into and worth checking
the log level of these printouts:

DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP sending discover
DHCP timeout

Log commitlog and compaction progress

We need it to be like origin:

INFO 12:27:51 Enqueuing flush of compactions_in_progress: 164 (0%) on-heap, 0 (0%) off-heap
INFO 12:27:51 Writing Memtable-compactions_in_progress@1174094719(8 serialized bytes, 1 ops, 0%/0% of on/off-heap limit)
INFO 12:27:51 Completed flushing /data/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-4-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1436271859142, position=8145773)
INFO 12:27:51 Compacting [SSTableReader(path='/data/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-2-Data.db'), SSTableReader(path='/data/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-1-Data.db'), SSTableReader(path='/data/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-3-Data.db'), SSTableReader(path='/data/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-4-Data.db')]
INFO 12:27:51 Compacted 4 sstables to [/data/var/lib/cassandra/data/keyspace1/standard1-34b16ba024a311e588eab980f3a13a97/keyspace1-standard1-ka-11,]. 1,028,147,528 bytes to 281,000,000 (~27% of original) in 61,605ms = 4.350012MB/s. 3,658,888 total partitions merged to 1,000,000. Partition merge counts were {3:341112, 4:658888, }
INFO 12:27:51 Compacted 4 sstables to [/data/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-5,]. 427 bytes to 42 (~9% of original) in 13ms = 0.003081MB/s. 4 total partitions merged to 1. Partition merge counts were {2:2, }

SIGABRT when creating a table with COMPACT STORAGE

When trying to create a compact storage table on the following commit:

commit 095c2f2920b2dd592fe6b57c5436c0b6163829bc
Merge: 73dfa66 45b4471
Author: Avi Kivity <[email protected]>
Date:   Sat Jul 25 17:47:40 2015 +0300

    Merge "Fixes for partition_range model" from Tomasz

    "range::is_wrap_around() will not work with current ring_position, because it
    relies on total ordering. Same for range::contains(). Currently ring_position
    is weakly ordered. This series fixes this problem by making ring_position
    totally ordered.

    Another problem fixed by this series is handling of wrap-around ranges. In
    Origin, ]x; x] is treated as a wrap around range covering whole ring."

create keyspace ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1};
CREATE TABLE ks.tb ( key int primary key, c0 int ) WITH COMPACT STORAGE;

I am getting

scylla: cql3/statements/create_table_statement.cc:133: void cql3::statements::create_table_statement::add_column_metadata_from_aliases(schema_builder&, std::vector<basic_sstring<signed char, unsigned int, 31u> >, const std::vector<shared_ptr<const abstract_type> >&, column_kind): Assertion `aliases.size() == types.size()' failed.

Program received signal SIGABRT, Aborted.
0x0000003b096348c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install jsoncpp-0.6.0-0.14.rc2.fc21.x86_64 libcom_err-1.42.12-1.fc21.x86_64 lz4-r127-1.fc21.x86_64 openssl-libs-1.0.1k-1.fc21.x86_64
(gdb) where
#0  0x0000003b096348c7 in raise () from /lib64/libc.so.6
#1  0x0000003b0963652a in abort () from /lib64/libc.so.6
#2  0x0000003b0962d46d in __assert_fail_base () from /lib64/libc.so.6
#3  0x0000003b0962d522 in __assert_fail () from /lib64/libc.so.6
#4  0x00000000006c3e18 in cql3::statements::create_table_statement::add_column_metadata_from_aliases (this=this@entry=0x600000108f40, builder=..., aliases=std::vector of length 0, capacity 0, 
    types=std::vector of length 1, capacity 1 = {...}, kind=kind@entry=column_kind::clustering_key) at cql3/statements/create_table_statement.cc:133
#5  0x00000000006c4f17 in cql3::statements::create_table_statement::apply_properties_to (this=this@entry=0x600000108f40, builder=...) at cql3/statements/create_table_statement.cc:122
#6  0x00000000006c5053 in cql3::statements::create_table_statement::get_cf_meta_data (this=this@entry=0x600000108f40) at cql3/statements/create_table_statement.cc:108
#7  0x00000000006c51e2 in operator() (__closure=<optimized out>) at cql3/statements/create_table_statement.cc:82
#8  apply (args=<optimized out>, func=<optimized out>) at /home/shlomi/urchin/seastar/core/apply.hh:34
#9  apply<cql3::statements::create_table_statement::announce_migration(distributed<service::storage_proxy>&, bool)::<lambda()> > (args=<optimized out>, func=<optimized out>)
    at /home/shlomi/urchin/seastar/core/apply.hh:42
#10 apply<cql3::statements::create_table_statement::announce_migration(distributed<service::storage_proxy>&, bool)::<lambda()> > (args=<optimized out>, func=<optimized out>)
    at /home/shlomi/urchin/seastar/core/future.hh:971
#11 then<future<>, cql3::statements::create_table_statement::announce_migration(distributed<service::storage_proxy>&, bool)::<lambda()>, future<T>::then(Func&&) [with Func = cql3::statements::create_table_statement::announce_migration(distributed<service::storage_proxy>&, bool)::<lambda()>; Result = future<>; T = {}]::<lambda(future_state<>&&)> > (param=<optimized out>, func=<optimized out>, this=0x7fffffffca00)
    at /home/shlomi/urchin/seastar/core/future.hh:625
#12 then<cql3::statements::create_table_statement::announce_migration(distributed<service::storage_proxy>&, bool)::<lambda()>, future<> > (func=<optimized out>, this=0x7fffffffca00)
    at /home/shlomi/urchin/seastar/core/future.hh:753
#13 cql3::statements::create_table_statement::announce_migration (this=0x600000108f40, proxy=..., is_local_only=<optimized out>) at cql3/statements/create_table_statement.cc:92
#14 0x00000000006d59f7 in cql3::statements::schema_altering_statement::execute0 (this=<optimized out>, proxy=..., state=..., options=..., is_local_only=is_local_only@entry=false)
    at cql3/statements/schema_altering_statement.cc:50
#15 0x00000000006d5fcc in cql3::statements::schema_altering_statement::execute (this=<optimized out>, proxy=..., state=..., options=...) at cql3/statements/schema_altering_statement.cc:55
#16 0x00000000007a82ef in cql3::query_processor::process_statement (this=this@entry=0x6000000ca960, statement=..., query_state=..., options=...) at cql3/query_processor.cc:113
#17 0x00000000007ab248 in cql3::query_processor::process (this=0x6000000ca960, query_string=..., query_state=..., options=...) at cql3/query_processor.cc:95
#18 0x000000000067ee0c in cql_server::connection::process_query (this=this@entry=0x60000009d110, stream=stream@entry=7, buf=...) at transport/server.cc:433
#19 0x0000000000680b58 in operator() (buf=<error reading variable: access outside bounds of object referenced via synthetic pointer>, __closure=<optimized out>) at transport/server.cc:368
#20 apply (args=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2bcbd12>, func=<optimized out>) at /home/shlomi/urchin/seastar/core/apply.hh:34
#21 apply<cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>::<lambda(temporary_buffer<char>)>, temporary_buffer<char> > (
    args=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2b6b14a>, func=<optimized out>) at /home/shlomi/urchin/seastar/core/apply.hh:42
#22 apply<cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>::<lambda(temporary_buffer<char>)>, temporary_buffer<char> > (
    args=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2b6b192>, func=<optimized out>) at /home/shlomi/urchin/seastar/core/future.hh:971
#23 then<future<>, cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>::<lambda(temporary_buffer<char>)>, future<T>::then(Func&&) [with Func = cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>::<lambda(temporary_buffer<char>)>; Result = future<>; T = {temporary_buffer<char>}]::<lambda(future_state<temporary_buffer<char> >&&)> > (param=<optimized out>, func=<optimized out>, this=0x7fffffffcfd0) at /home/shlomi/urchin/seastar/core/future.hh:625
#24 then<cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>::<lambda(temporary_buffer<char>)>, future<> > (func=<optimized out>, 
    this=0x7fffffffcfd0) at /home/shlomi/urchin/seastar/core/future.hh:753
#25 cql_server::connection::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>::operator()(<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2bcbf7a>) const (
    __closure=__closure@entry=0x60000068a510, v=v@entry=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2bcbf7a>) at transport/server.cc:387
#26 0x00000000006845fd in apply<cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>, future<std::experimental::optional<cql_binary_frame_v3> > > (
    func=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2b6b58c>) at /home/shlomi/urchin/seastar/core/future.hh:977
#27 operator()<future_state<std::experimental::optional<cql_binary_frame_v3> > > (state=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x2a203c0, DIE 0x2bdaf20>, __closure=0x60000068a4f0)
    at /home/shlomi/urchin/seastar/core/future.hh:636
#28 continuation<future<T>::then(Func&&, Param&&) [with Ret = future<>; Func = cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>; Param = future<T>::then_wrapped(Func&&) [with Func = cql_server::connection::process_request()::<lambda(future<std::experimental::optional<cql_binary_frame_v3> >&&)>; Result = future<>; T = {std::experimental::optional<cql_binary_frame_v3>}]::<lambda(future_state<std::experimental::optional<cql_binary_frame_v3> >&&)>; T = {std::experimental::optional<cql_binary_frame_v3>}; futurize_t<Ret> = future<>]::<lambda(auto:1&&)>, std::experimental::optional<cql_binary_frame_v3> >::run(void) (this=0x60000068a4d0) at /home/shlomi/urchin/seastar/core/future.hh:333
#29 0x000000000043d9e2 in reactor::run_tasks (this=this@entry=0x6000001f7000, tasks=..., quota=<optimized out>) at core/reactor.cc:1037
#30 0x0000000000460634 in reactor::run (this=0x6000001f7000) at core/reactor.cc:1134
#31 0x00000000004a6626 in app_template::run(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffd7d0, ac=ac@entry=7, av=av@entry=0x7fffffffda18, 
    func=func@entry=<unknown type in /home/shlomi/urchin/build/release/scylla, CU 0x464954, DIE 0x4fb81e>) at core/app-template.cc:104
---Type <return> to continue, or q <return> to quit---
#32 0x000000000041749c in main (ac=7, av=0x7fffffffda18) at main.cc:169

Support for UTF8 / String validation is broken: Client Drivers report "Cannot decode string as UTF8"

On head:

Running: ~/apache-cassandra-2.1.3/tools/bin/cassandra-stress write n=100000 -mode cql3 simplenative prepared -rate threads=10

produces

java.lang.RuntimeException: org.apache.cassandra.transport.ProtocolException: Cannot decode string as UTF8
at org.apache.cassandra.transport.SimpleClient.execute(SimpleClient.java:201)
at org.apache.cassandra.transport.SimpleClient.prepare(SimpleClient.java:167)
at org.apache.cassandra.stress.operations.predefined.CqlOperation$SimpleClientWrapper.createPreparedStatement(CqlOperation.java:347)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:75)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:107)
at org.apache.cassandra.stress.operations.predefined.CqlOperation.run(CqlOperation.java:253)
at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:299)
Caused by: org.apache.cassandra.transport.ProtocolException: Cannot decode string as UTF8
at org.apache.cassandra.transport.messages.ErrorMessage$1.decode(ErrorMessage.java:56)
at org.apache.cassandra.transport.messages.ErrorMessage$1.decode(ErrorMessage.java:43)
at org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:247)
at org.apache.cassandra.transport.Message$ProtocolDecoder.decode(Message.java:235)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
java.lang.RuntimeException: org.apache.cassandra.transport.ProtocolException: Cannot decode string as UTF8

After bisecting, the culprit is:

commit 4273c6d
Author: Paweł Dziepak [email protected]
Date: Thu Jul 9 13:41:35 2015 +0200

transport: verify that strings are valid utf8

Signed-off-by: Paweł Dziepak <[email protected]>

@pdziepak can you please have a look

It seems to be generated by the validate function:

void validate_utf8(sstring_view s) {
    try {
        boost::locale::conv::utf_to_utf<char>(s.data(), boost::locale::conv::stop);
    } catch (const boost::locale::conv::conversion_error& ex) {
        throw transport::protocol_exception("Cannot decode string as UTF8");
    }
}
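
One thing worth checking (an observation, not a confirmed diagnosis): the single-pointer overload of boost::locale::conv::utf_to_utf treats its argument as a null-terminated string, while the view passed in need not be null-terminated, so the conversion can scan bytes past the end of the string. A self-contained sketch of the bounded (begin, end) overload, written against std::string_view rather than Scylla's sstring_view:

    #include <boost/locale/encoding_errors.hpp>
    #include <boost/locale/encoding_utf.hpp>
    #include <iostream>
    #include <stdexcept>
    #include <string_view>

    void validate_utf8(std::string_view s) {
        try {
            // Bound the conversion to [data, data + size) instead of relying on
            // a terminating NUL that may not be there.
            boost::locale::conv::utf_to_utf<char>(s.data(), s.data() + s.size(),
                                                  boost::locale::conv::stop);
        } catch (const boost::locale::conv::conversion_error&) {
            throw std::runtime_error("Cannot decode string as UTF8");
        }
    }

    int main() {
        validate_utf8("keyspace1");                         // valid UTF-8, no throw
        try {
            validate_utf8(std::string_view("\xff\xfe", 2)); // invalid lead bytes
        } catch (const std::runtime_error& e) {
            std::cout << e.what() << "\n";
        }
    }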

Create Table without compression does not set compression_parameters to be '{}'

In Urchin

cqlsh> CREATE KEYSPACE keyspace3 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> CREATE TABLE keyspace3.standard2 ( key blob PRIMARY KEY, C0 blob, C1 blob, C2 blob, C3 blob, C4 blob) with compression = {};
cqlsh> CREATE TABLE keyspace3.standard3 ( key blob PRIMARY KEY, C0 blob, C1 blob, C2 blob, C3 blob, C4 blob) with compression = {'sstable_compression': ''};
cqlsh> select compression_parameters from system.schema_columnfamilies where keyspace_name='keyspace3' and columnfamily_name='standard3';

compression_parameters

             null\n

(1 rows)
cqlsh> select compression_parameters from system.schema_columnfamilies where keyspace_name='keyspace3' and columnfamily_name='standard2';

compression_parameters

             null\n

(1 rows)
cqlsh>

In Origin:

CREATE TABLE keyspace3.standard2 ( key blob PRIMARY KEY, C0 blob, C1 blob, C2 blob, C3 blob, C4 blob) with compression = {};

cqlsh> select compression_parameters from system.schema_columnfamilies where keyspace_name='keyspace3' and columnfamily_name='standard2';

compression_parameters

                 {}

cqlsh> CREATE TABLE keyspace3.standard3 ( key blob PRIMARY KEY, C0 blob, C1 blob, C2 blob, C3 blob, C4 blob) with compression = {'sstable_compression': ''};
cqlsh> select compression_parameters from system.schema_columnfamilies where keyspace_name='keyspace3' and columnfamily_name='standard3';

compression_parameters

                 {}

This causes cassandra-stress with a user profile to throw:

ERROR 13:36:25 Error parsing schema options for table ....
Cluster.getMetadata().getKeyspace("keyspace2").getTable("standard1").getOptions() will return null
java.lang.IllegalArgumentException: Not a JSON map: null

at com.datastax.driver.core.SimpleJSONParser.parseStringMap(SimpleJSONParser.java:77) ~[cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at com.datastax.driver.core.TableMetadata$Options.<init>(TableMetadata.java:598) ~[cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at com.datastax.driver.core.TableMetadata.build(TableMetadata.java:128) ~[cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at com.datastax.driver.core.Metadata.buildTableMetadata(Metadata.java:187) [cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at com.datastax.driver.core.Metadata.rebuildSchema(Metadata.java:147) [cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:374) [cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at com.datastax.driver.core.Cluster$Manager$8.run(Cluster.java:2085) [cassandra-driver-core-2.1.7-SNAPSHOT.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_31]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_31]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_31]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]

Sharding causes keyspace to disappear

When executing the same Scylla instance with 1 processor, the keyspace and table exist.
When executing it with 16 processors, the keyspace is gone. If I run the same Scylla
again on the same data with 1 processor, it comes back to life again.

I used the cpp driver example/perf again.
The schema is the following:
CREATE KEYSPACE examples
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

USE examples;

CREATE TABLE songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
tags set<text>,
data blob
);

cmdline: build/release/scylla --network-stack posix --collectd 0 --collectd-address 1.1.1.2:25826 --datadir /data --commitlog-directory /data -m 32G --default-log-level debug --options-file config/scylla.yaml --smp 16

Configfile (standard): uses 1.1.1.1 adress and /data directory.

system.local cluster_name does not match the Origin default

Running cassandra-stress twice toward a node with a user profile throws an exception:

ERROR 10:17:04 Error creating pool to localhost/127.0.0.1:9042
com.datastax.driver.core.ClusterNameMismatchException: [localhost/127.0.0.1:9042] Host localhost/127.0.0.1:9042 reports cluster name 'urchin' that doesn't match our cluster name 'Test Cluster'. This host will be ignored.
    at com.datastax.driver.core.Connection.checkClusterName(Connection.java:246) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.Connection.initializeTransport(Connection.java:162) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.Connection.<init>(Connection.java:113) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.PooledConnection.<init>(PooledConnection.java:32) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.Connection$Factory.open(Connection.java:521) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.SingleConnectionPool.<init>(SingleConnectionPool.java:76) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.HostConnectionPool.newInstance(HostConnectionPool.java:35) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.SessionManager.replacePool(SessionManager.java:239) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.SessionManager.access$400(SessionManager.java:39) ~[cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.SessionManager$3.call(SessionManager.java:272) [cassandra-driver-core-2.1.2.jar:na]
    at com.datastax.driver.core.SessionManager$3.call(SessionManager.java:264) [cassandra-driver-core-2.1.2.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_31]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_31]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_31]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_31]

We do not set the cluster_name according to Origin's defaults.

Checking the java-driver code:


    private ListenableFuture<Void> checkClusterName(ProtocolVersion protocolVersion, final Executor executor) {
        final String expected = factory.manager.metadata.clusterName;

        // At initialization, the cluster is not known yet
        if (expected == null)
            return MoreFutures.VOID_SUCCESS;

        DefaultResultSetFuture clusterNameFuture = new DefaultResultSetFuture(null, protocolVersion, new Requests.Query("select cluster_name from system.local"));
        try {
            write(clusterNameFuture);
            return Futures.transform(clusterNameFuture,
                new AsyncFunction<ResultSet, Void>() {
                    @Override
                    public ListenableFuture<Void> apply(ResultSet rs) throws Exception {
                        Row row = rs.one();
                        String actual = row.getString("cluster_name");
                        if (!expected.equals(actual))
                            throw new ClusterNameMismatchException(address, actual, expected);
                        return MoreFutures.VOID_SUCCESS;
                    }
                }, executor);
        } catch (Exception e) {
            return Futures.immediateFailedFuture(e);
        }
    }

Missing initialization of system.schema_columnfamilies.key_validator field

JavaDriver requires the field to exist

Applying this hack allows the driver to boot.

shlomi@localhost~/urchin (master)$ git diff
diff --git a/db/legacy_schema_tables.cc b/db/legacy_schema_tables.cc
index 463210c..b0d90a2 100644
--- a/db/legacy_schema_tables.cc
+++ b/db/legacy_schema_tables.cc
@@ -1024,6 +1024,7 @@ std::vector<const char*> ALL { KEYSPACES, COLUMNFAMILIES, COLUMNS, TRIGGERS, USE
adder.add("compaction_strategy_class", table.compactionStrategyClass.getName());
adder.add("compaction_strategy_options", json(table.compactionStrategyOptions));
#endif

+    m.set_clustered_cell(ckey, "key_validator", sstring("org.apache.cassandra.db.marshal.UTF8Type"), timestamp);
     const auto& compression_options = table->get_compressor_params();
     m.set_clustered_cell(ckey, "compression_parameters", json::to_json(compression_options.get_options()), timestamp);
     m.set_clustered_cell(ckey, "default_time_to_live", table->default_time_to_live().count(), timestamp)
    

Missing initialization of system.schema_columnfamilies fields

CREATE KEYSPACE keyspace3 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

CREATE TABLE keyspace3.standard1 ( key blob PRIMARY KEY, C0 blob, C1 blob, C2 blob, C3 blob, C4 blob);

cqlsh> select * from system.schema_columnfamilies where keyspace_name='keyspace3' and columnfamily_name='standard1';

keyspace_name | columnfamily_name | bloom_filter_fp_chance | caching | cf_id | comment | compaction_strategy_class | compaction_strategy_options | comparator | compression_parameters | default_time_to_live | default_validator | dropped_columns | gc_grace_seconds | is_dense | key_validator | local_read_repair_chance | max_compaction_threshold | max_index_interval | memtable_flush_period_in_ms | min_compaction_threshold | min_index_interval | read_repair_chance | speculative_retry | subcomparator | type
---------------+-----------------------+------------------------+---------+--------------------------------------+---------+---------------------------+-----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+----------------------+-------------------------------------------+-----------------+------------------+----------+------------------------------------------+--------------------------+--------------------------+--------------------+-----------------------------+--------------------------+--------------------+--------------------+-------------------+---------------+------
keyspace3 | standard1 | 0.01 | null | 7f0455a0-2093-11e5-bae1-000000000000 | | null | null | org.apache.cassandra.db.marshal.UTF8Type | {"sstable_compression":"org.apache.cassandra.io.compress.LZ4Compressor"}\n | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null

On Origin

cqlsh> select * from system.schema_columnfamilies where keyspace_name='keyspace3' and columnfamily_name='standard1';

keyspace_name | columnfamily_name | bloom_filter_fp_chance | caching | cf_id | comment | compaction_strategy_class | compaction_strategy_options | comparator | compression_parameters | default_time_to_live | default_validator | dropped_columns | gc_grace_seconds | is_dense | key_validator | local_read_repair_chance | max_compaction_threshold | max_index_interval | memtable_flush_period_in_ms | min_compaction_threshold | min_index_interval | read_repair_chance | speculative_retry | subcomparator | type
---------------+-------------------+------------------------+---------------------------------------------+--------------------------------------+---------+-----------------------------------------------------------------+-----------------------------+-----------------------------------------------------------------------------------------+--------------------------------------------------------------------------+----------------------+-------------------------------------------+-----------------+------------------+----------+-------------------------------------------+--------------------------+--------------------------+--------------------+-----------------------------+--------------------------+--------------------+--------------------+-------------------+---------------+----------
keyspace3 | standard1 | 0.01 | {"keys":"ALL", "rows_per_partition":"NONE"} | 47af5860-20a3-11e5-a733-a30be8570b1e | | org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy | {} | org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type) | {"sstable_compression":"org.apache.cassandra.io.compress.LZ4Compressor"} | 0 | org.apache.cassandra.db.marshal.BytesType | null | 864000 | False | org.apache.cassandra.db.marshal.BytesType | 0.1 | 32 | 2048 | 0 | 4 | 128 | 0 | 99.0PERCENTILE | null | Standard

(1 rows)
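
For reference, these are the values Origin fills in (taken from the Origin row above) that we currently return as null; a fix would write them into system.schema_columnfamilies at table creation time. The map below is purely illustrative, not Scylla code:

#include <map>
#include <string>

// Defaults as reported by Origin in the row above; Scylla returns null for
// all of them today. Illustrative only.
const std::map<std::string, std::string> origin_schema_columnfamilies_defaults = {
    {"caching",                     "{\"keys\":\"ALL\", \"rows_per_partition\":\"NONE\"}"},
    {"compaction_strategy_class",   "org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy"},
    {"compaction_strategy_options", "{}"},
    {"default_validator",           "org.apache.cassandra.db.marshal.BytesType"},
    {"gc_grace_seconds",            "864000"},
    {"key_validator",               "org.apache.cassandra.db.marshal.BytesType"},
    {"local_read_repair_chance",    "0.1"},
    {"max_compaction_threshold",    "32"},
    {"max_index_interval",          "2048"},
    {"memtable_flush_period_in_ms", "0"},
    {"min_compaction_threshold",    "4"},
    {"min_index_interval",          "128"},
    {"read_repair_chance",          "0"},
    {"speculative_retry",           "99.0PERCENTILE"},
    {"type",                        "Standard"},
};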

SIGSEGV when running multiple cassandra-stress with user profile and smp

I get the following when running multiple cassandra-stress clients with a user profile (which means that we are doing create_table via CQL and not via Thrift).

on monster

sudo rm -Rf /data2/shlomi/tmp/* ; sudo gdb --args build/release/seastar --datadir /data2/shlomi/tmp --commitlog-directory /data2/shlomi/tmp --smp 6 --network-stack native --dpdk-pmd --dhcp 0 --host-ipv4-addr 192.168.10.101 --netmask-ipv4-addr 255.255.255.0 --seed-provider-parameters seeds=127.0.0.1

on godzilla
sleep 1
taskset -c 1,17 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 2,18 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 3,19 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 4,20 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 5,21 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 6,22 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 7,23 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 8,24 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 9,25 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 10,26 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 11,27 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 12,28 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &
taskset -c 13,29 ~/cassandra/tools/bin/cassandra-stress user duration=2m profile=./urchin-qa-internal/simple_test_no_compression.yaml 'ops(insert=1)' -mode cql3 native -rate threads=16 -node 192.168.10.101 &

Before the SIGSEGV there are some warnings:
WARNING: exceptional future ignored of type 'query::no_such_column': keyspace_name
WARNING: exceptional future ignored of type 'query::no_such_column': keyspace_name
WARNING: exceptional future ignored of type 'query::no_such_column': keyspace_name

which I will try to catch and open a new bug for.

[Switching to Thread 0x7ffde39fb700 (LWP 42787)]
find (__k=..., this=0x6040004b16c8) at /usr/include/c++/4.9.2/bits/hashtable.h:1321
1321          __node_type* __p = _M_find_node(__n, __k, __code);
Missing separate debuginfos, use: debuginfo-install boost-date-time-1.55.0-8.fc21.x86_64 boost-filesystem-1.55.0-8.fc21.x86_64 boost-program-options-1.55.0-8.fc21.x86_64 boost-system-1.55.0-8.fc21.x86_64 boost-test-1.55.0-8.fc21.x86_64 boost-thread-1.55.0-8.fc21.x86_64 cryptopp-5.6.2-5.fc21.x86_64 hwloc-libs-1.10.0-1.fc21.x86_64 jsoncpp-0.6.0-0.14.rc2.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-15.fc21.x86_64 libaio-0.3.110-4.fc21.x86_64 libcom_err-1.42.12-4.fc21.x86_64 libgcc-4.9.2-6.fc21.x86_64 libpciaccess-0.13.3-0.3.fc21.x86_64 libselinux-2.3-9.fc21.x86_64 libstdc++-4.9.2-6.fc21.x86_64 libtool-ltdl-2.4.2-31.fc21.x86_64 libxml2-2.9.1-7.fc21.x86_64 lz4-r128-2.fc21.x86_64 numactl-libs-2.0.9-4.fc21.x86_64 openssl-libs-1.0.1k-6.fc21.x86_64 pcre-8.35-11.fc21.x86_64 snappy-1.1.1-3.fc21.x86_64 thrift-0.9.1-13.fc21.1.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 yaml-cpp-0.5.1-4.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) where
#0  find (__k=..., this=0x6040004b16c8) at /usr/include/c++/4.9.2/bits/hashtable.h:1321
#1  find (__x=..., this=0x6040004b16c8) at /usr/include/c++/4.9.2/bits/unordered_map.h:578
#2  query::result_set_row::get_data_value (this=0x6040004b16c0, column_name=...) at ./query-result-set.hh:46
#3  0x0000000000556379 in get<basic_sstring<char, unsigned int, 15u> > (column_name=..., this=<optimized out>) at ./query-result-set.hh:56
#4  query::result_set_row::get_nonnull<basic_sstring<char, unsigned int, 15u> > (this=<optimized out>, column_name=...) at ./query-result-set.hh:64
#5  0x00000000007ddd5d in db::legacy_schema_tables::create_table_from_table_row_and_column_rows (builder=..., table_row=..., serialized_column_definitions=...) at db/legacy_schema_tables.cc:1240
#6  0x00000000007e24ef in _ZZN6futureIISt4pairIK13basic_sstringIcjLj15EE13lw_shared_ptrIN5query10result_setEEEEE4thenIS4_IK6schemaEZN2db20legacy_schema_tables27create_table_from_table_rowERN7service13storage_proxyERKNS5_14result_set_rowEEUlT_E_ZNS9_4thenISN_S_IISD_EEEET0_OT_EUlO12future_stateIIS8_EEE_EEN8futurizeISR_E4typeEOSQ_OT1_ENUlOSM_E_clISU_EEDaSS_ () at db/legacy_schema_tables.cc:1197
#7  0x00000000009528a2 in reactor::run_tasks (this=this@entry=0x604000119000, tasks=..., quota=<optimized out>) at core/reactor.cc:1025
#8  0x0000000000974db4 in reactor::run (this=0x604000119000) at core/reactor.cc:1122
#9  0x000000000097b7a6 in smp::<lambda()>::operator()(void) const (__closure=0x60000005fe00) at core/reactor.cc:1772
#10 0x00000000009539c9 in dpdk_thread_adaptor (f=<optimized out>) at core/reactor.cc:1637
#11 0x0000000000be7e13 in eal_thread_loop ()
#12 0x00007ffff425f52a in start_thread () from /lib64/libpthread.so.0
#13 0x00007ffff3f9b22d in clone () from /lib64/libc.so.6

Provide support/workaround for thrift prepare_cql3_query

cassandra-stress user profiles support multiple modes (thrift, cql3).

As a simple workaround, I had to change cassandra-stress: it creates both a thrift and a cql3 connection and then, depending on the transport, decides to use one or the other, and our thrift support is lacking.

shlomi@localhost~/cassandra/tools/stress (pekka_unitests)$ git diff src/org/apache/cassandra/stress/StressProfile.java
diff --git a/tools/stress/src/org/apache/cassandra/stress/StressProfile.java b/tools/stress/src/org/apache/cassandra/stress/StressProfile.java
index e223fbd..f744c21 100644
--- a/tools/stress/src/org/apache/cassandra/stress/StressProfile.java
+++ b/tools/stress/src/org/apache/cassandra/stress/StressProfile.java
@@ -365,6 +365,7 @@ public class StressProfile implements Serializable

                 JavaDriverClient client = settings.getJavaDriverClient();
                 String query = sb.toString();

+/*
try
{
thriftInsertId = settings.getThriftClient().prepare_cql3_query(query, Compression.NONE);
@@ -373,6 +374,8 @@ public class StressProfile implements Serializable
{
throw new RuntimeException(e);
}
+*/

+                thriftInsertId = 0;
                 insertStatement = client.prepare(query);
             }
         }
    

We need to provide a better workaround, as we cannot change cassandra-stress.

Client error when running a cassandra-stress read

Happened on AWS with thrift

Running READ with 500 threads for 5000000 iteration
Failed to connect over JMX; not collecting these stats
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
java.io.IOException: Operation x0 on key(s) [30503337373039503231]: Data returned was not validated

at org.apache.cassandra.stress.Operation.error(Operation.java:216)
at org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:188)
at org.apache.cassandra.stress.operations.predefined.ThriftReader.run(ThriftReader.java:46)
at org.apache.cassandra.stress.StressAction$Consumer.run(StressAction.java:317)

--api-dir does not work

Adding a printout in file_interaction_handler::read that prints the file_name produces the following.

It seems the api-dir path is not concatenated as a prefix for all lookups:

/home/shlomi/urchin/swagger-ui/dist//index.html
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/css/typography.css
/home/shlomi/urchin/swagger-ui/dist/css/reset.css
/home/shlomi/urchin/swagger-ui/dist/css/screen.css
/home/shlomi/urchin/swagger-ui/dist/css/print.css
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/lib/jquery-1.8.0.min.js
/home/shlomi/urchin/swagger-ui/dist/lib/jquery.slideto.min.js
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/lib/jquery.wiggle.min.js
/home/shlomi/urchin/swagger-ui/dist/lib/jquery.ba-bbq.min.js
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/lib/handlebars-2.0.0.js
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/lib/underscore-min.js
/home/shlomi/urchin/swagger-ui/dist/lib/backbone-min.js
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/swagger-ui.js
/home/shlomi/urchin/swagger-ui/dist/lib/highlight.7.3.pack.js
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/lib/marked.js
/home/shlomi/urchin/swagger-ui/dist/lib/swagger-oauth.js
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/images/logo_small.png
/home/shlomi/urchin/swagger-ui/dist/fonts/droid-sans-v6-latin-700.woff2
WARNING: closing file in reactor thread during exception recovery
WARNING: closing file in reactor thread during exception recovery
/home/shlomi/urchin/swagger-ui/dist/fonts/droid-sans-v6-latin-regular.woff2
WARNING: closing file in reactor thread during exception recovery
api/api-doc/storage_service.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/commitlog.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/gossiper.json
api/api-doc/column_family.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/failure_detector.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/messaging_service.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/storage_proxy.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/cache_service.json
api/api-doc/collectd.json
api/api-doc/endpoint_snitch_info.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/storage_service.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/compaction_manager.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/commitlog.json
api/api-doc/gossiper.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/column_family.json
api/api-doc/hinted_handoff.json
api/api-doc/failure_detector.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
api/api-doc/messaging_service.json
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
WARNING: exceptional future ignored of type 'std::system_error': Error system:2 (No such file or directory)
/home/shlomi/urchin/swagger-ui/dist/images/favicon-16x16.png
WARNING: closing file in reactor thread during exception recovery
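
A likely fix, sketched below with a hypothetical helper (this is not the actual seastar/Scylla code), is to resolve every relative lookup such as api/api-doc/storage_service.json against the configured --api-dir before opening the file:

#include <string>

// Sketch only: prepend the configured --api-dir to relative file names so the
// api-doc JSON files are found regardless of the working directory.
std::string resolve_api_path(const std::string& api_dir, const std::string& file_name) {
    if (!file_name.empty() && file_name.front() == '/') {
        return file_name;                  // already absolute, leave it alone
    }
    std::string dir = api_dir;
    if (!dir.empty() && dir.back() != '/') {
        dir += '/';
    }
    return dir + file_name;                // e.g. <api-dir>/api/api-doc/gossiper.json
}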

crash with recent rpc patches

With these patches committed:

2ec7535 rpc: allow handler to receive non default constructable types
dfa0f1c rcp: put client into an error state when connection is broken
0a47b5f rpc: return exception as a future instead of throwing in client send path
2e54725 rpc: allow handler to return a type without default constructor

[asias@hjpc urchin]$ ./message --listen-address 127.0.0.100

[asias@hjpc urchin]$ gdb --args ./message --listen-address 127.0.0.200 --server 127.0.0.100

(gdb) r
Starting program: /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message --listen-address 127.0.0.200 --server 127.0.0.100
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff67ff700 (LWP 5728)]
[New Thread 0x7ffff61ff700 (LWP 5729)]
[New Thread 0x7ffff5bff700 (LWP 5730)]
Messaging server listening on port 7000 ...
=============TEST START===========
Sending to server ....
=== test_gossip_digest ===

Program received signal SIGSEGV, Segmentation fault.
std::_Rb_tree<gms::inet_address, std::pair<gms::inet_address const, gms::endpoint_state>, std::_Select1st<std::pair<gms::inet_address const, gms::endpoint_state> >, std::less<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >::_M_erase (this=this@entry=0x600000063438, __x=0x1) at /usr/include/c++/4.9.2/bits/stl_tree.h:1245
1245              _M_erase(_S_right(__x));

#0  std::_Rb_tree<gms::inet_address, std::pair<gms::inet_address const, gms::endpoint_state>, std::_Select1st<std::pair<gms::inet_address const, gms::endpoint_state> >, std::less<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >::_M_erase (this=this@entry=0x600000063438, __x=0x1) at /usr/include/c++/4.9.2/bits/stl_tree.h:1245
#1  0x00000000005e3dcc in std::_Rb_tree<gms::inet_address, std::pair<gms::inet_address const, gms::endpoint_state>, std::_Select1st<std::pair<gms::inet_address const, gms::endpoint_state> >, std::less<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >::_M_erase (this=this@entry=0x600000063438, __x=0x600000063700) at /usr/include/c++/4.9.2/bits/stl_tree.h:1245
#2  0x00000000005e9355 in clear (this=0x600000063438) at /usr/include/c++/4.9.2/bits/stl_tree.h:908
#3  _M_move_assign (__x=..., this=0x600000063438) at /usr/include/c++/4.9.2/bits/stl_tree.h:1081
#4  std::map<gms::inet_address, gms::endpoint_state, std::less<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >::operator=(std::map<gms::inet_address, gms::endpoint_state, std::less<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >&&) (this=this@entry=0x600000063438, 
    __x=__x@entry=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x33deec2>) at /usr/include/c++/4.9.2/bits/stl_map.h:311
#5  0x00000000006017ea in operator= (this=0x600000063420) at ./gms/gossip_digest_ack.hh:36
#6  operator() (buf=<error reading variable: access outside bounds of object referenced via synthetic pointer>, __closure=0x7fffffffd590) at ./message/messaging_service.hh:164
#7  apply_helper<auto net::serializer::operator()<gms::gossip_digest_ack>(input_stream<char>&, gms::gossip_digest_ack&, std::enable_if<(!std::is_integral<gms::gossip_digest_ack>::value)&&(!std::is_enum<gms::gossip_digest_ack>::value), void*>::type)::{lambda(temporary_buffer<char>)#1}::operator()(temporary_buffer<char>)::{lambda(temporary_buffer<char>)#1}, std::tuple<temporary_buffer<char> >&&, std::integer_sequence<unsigned long, 0ul> >::apply({lambda(temporary_buffer<char>)#1}&&, std::tuple) (func=func@entry=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b6c5b>, 
    args=args@entry=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b72e1>) at ./core/apply.hh:34
#8  0x0000000000601b46 in apply<net::serializer::operator()(input_stream<char>&, T&, std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*>)::<lambda(temporary_buffer<char>)> mutable [with T = gms::gossip_digest_ack]::<lambda(temporary_buffer<char>)>, temporary_buffer<char> > (args=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e5a4>, 
    func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e590>) at ./core/apply.hh:42
#9  apply<net::serializer::operator()(input_stream<char>&, T&, std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*>)::<lambda(temporary_buffer<char>)> mutable [with T = gms::gossip_digest_ack]::<lambda(temporary_buffer<char>)>, temporary_buffer<char> > (args=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e5ec>, 
    func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e5d9>) at ./core/future.hh:685
#10 future<temporary_buffer<char> >::then<future<>, auto net::serializer::operator()<gms::gossip_digest_ack>(input_stream<char>&, gms::gossip_digest_ack&, std::enable_if<(!std::is_integral<gms::gossip_digest_ack>::value)&&(!std::is_enum<gms::gossip_digest_ack>::value), void*>::type)::{lambda(temporary_buffer<char>)#1}::operator()(temporary_buffer<char>)::{lambda(temporary_buffer<char>)#1}, futurize<std::result_of<{lambda(temporary_buffer<char>)#1} (temporary_buffer<char>&&)>::type>::type future<temporary_buffer<char> >::then<{lambda(temporary_buffer<char>)#1}>(futurize&&)::{lambda(future_state<temporary_buffer<char> >&&)#1}>(auto net::serializer::operator()<gms::gossip_digest_ack>(input_stream<char>&, gms::gossip_digest_ack&, std::enable_if<(!std::is_integral<gms::gossip_digest_ack>::value)&&(!std::is_enum<gms::gossip_digest_ack>::value), void*>::type)::{lambda(temporary_buffer<char>)#1}::operator()(temporary_buffer<char>)::{lambda(temporary_buffer<char>)#1}&&, futurize<std::result_of<{lambda(temporary_buffer<char>)#1} (temporary_buffer<char>&&)>::type>::type future<temporary_buffer<char> >::then<{lambda(temporary_buffer<char>)#1}>(futurize&&)::{lambda(future_state<temporary_buffer<char> >&&)#1}&&) (this=this@entry=0x7fffffffd5e0, 
    func=func@entry=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b7d97>, 
    param=param@entry=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b7dad>) at ./core/future.hh:468
#11 0x000000000060203a in then<net::serializer::operator()(input_stream<char>&, T&, std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*>)::<lambda(temporary_buffer<char>)> mutable [with T = gms::gossip_digest_ack]::<lambda(temporary_buffer<char>)> > (func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e64d>, this=0x7fffffffd5e0)
    at ./core/future.hh:530
#12 operator() (buf=<error reading variable: access outside bounds of object referenced via synthetic pointer>, __closure=0x7fffffffd670) at ./message/messaging_service.hh:166
#13 apply (args=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b899c>, 
    func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b898a>) at ./core/apply.hh:34
#14 apply<net::serializer::operator()(input_stream<char>&, T&, std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*>) [with T = gms::gossip_digest_ack; std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*> = void*]::<lambda(temporary_buffer<char>)>, temporary_buffer<char> > (
    args=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e723>, 
    func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e70f>) at ./core/apply.hh:42
#15 apply<net::serializer::operator()(input_stream<char>&, T&, std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*>) [with T = gms::gossip_digest_ack; std::enable_if_t<((! std::is_integral<_Tp>::value) && (! std::is_enum<_Tp>::value)), void*> = void*]::<lambda(temporary_buffer<char>)>, temporary_buffer<char> > (
    args=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e76b>, 
    func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x339e758>) at ./core/future.hh:685
#16 future<temporary_buffer<char> >::then<future<>, auto net::serializer::operator()<gms::gossip_digest_ack>(input_stream<char>&, gms::gossip_digest_ack&, std::enable_if<(!std::is_integral<gms::gossip_digest_ack>::value)&&(!std::is_enum<gms::gossip_digest_ack>::value), void*>::type)::{lambda(temporary_buffer<char>)#1}, futurize<std::result_of<{lambda(temporary_buffer<char>)#1} (temporary_buffer<char>&&)>::type>::type future<temporary_buffer<char> >::then<{lambda(temporary_buffer<char>)#1}>(std::result_of&&)::{lambda(future_state<temporary_buffer<char> >&&)#1}>(auto net::serializer::operator()<gms::gossip_digest_ack>(input_stream<char>&, gms::gossip_digest_ack&, std::enable_if<(!std::is_integral<gms::gossip_digest_ack>::value)&&(!std::is_enum<gms::gossip_digest_ack>::value), void*>::type)::{lambda(temporary_buffer<char>)#1}&&, futurize<std::result_of<{lambda(temporary_buffer<char>)#1} (temporary_buffer<char>&&)>::type>::type future<temporary_buffer<char> >::then<{lambda(temporary_buffer<char>)#1}>(std::result_of&&)::{lambda(future_state<temporary_buffer<char> >&&)#1}&&) (this=0x7fffffffd690, 
    func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x34b945c>, param=<optimized out>) at ./core/future.hh:468
#17 0x000000000060236e in _ZZN3rpc10unmarshallILm0EN3net10serializerEZNS_10unmarshallIS2_IN3gms17gossip_digest_ackEEEE6futureIIEERT_R12input_streamIcEOSt5tupleIIDpRT0_EEEUlmE_IS5_EEENSt9enable_ifIXneT_stDpT2_ES7_E4typeERT0_SC_OSD_IIDpRSL_EEOT1_ENUlvE_clEv () at ./core/future.hh:530
#18 0x000000000060276c in auto rpc::wait_for_reply<future<gms::gossip_digest_ack>, net::serializer, net::messaging_verb>(rpc::protocol<net::serializer, net::messaging_verb>::client&, long, future<>, std::enable_if<!std::is_same<future<gms::gossip_digest_ack>, rpc::no_wait_type>::value, void*>::type)::{lambda(rpc::rcv_reply<net::serializer, net::messaging_verb, future<gms::gossip_digest_ack> >&, rpc::protocol<net::serializer, net::messaging_verb>::client&, long)#2}::operator()(rpc::rcv_reply<net::serializer, net::messaging_verb, future<gms::gossip_digest_ack> >&, rpc::protocol<net::serializer, net::messaging_verb>::client&, long) () at ./core/apply.hh:34
#19 0x000000000060297a in rpc::protocol<net::serializer, net::messaging_verb>::client::reply_handler<rpc::rcv_reply<net::serializer, net::messaging_verb, future<gms::gossip_digest_ack> >, auto rpc::wait_for_reply<future<gms::gossip_digest_ack>, net::serializer, net::messaging_verb>(rpc::protocol<net::serializer, net::messaging_verb>::client&, long, future<>, std::enable_if<!std::is_same<future<gms::gossip_digest_ack>, rpc::no_wait_type>::value, void*>::type)::{lambda(rpc::rcv_reply<net::serializer, net::messaging_verb, future<gms::gossip_digest_ack> >&, rpc::protocol<net::serializer, net::messaging_verb>::client&, long)#2}>::operator()(rpc::protocol<net::serializer, net::messaging_verb>::client&, long) (this=<optimized out>, client=..., msg_id=<optimized out>) at ./rpc/rpc.hh:112
#20 0x00000000005eebbd in rpc::protocol<net::serializer, net::messaging_verb>::client::client(rpc::protocol<net::serializer, net::messaging_verb>&, ipv4_addr)::{lambda(connected_socket)#1}::operator()(connected_socket) const::{lambda()#2}::operator()()::{lambda()#1}::operator()() const () at ./rpc/rpc_impl.hh:579
#21 0x00000000005eef4e in apply (args=<optimized out>, func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x338e0a2>) at ./core/apply.hh:34
#22 apply<rpc::protocol<Serializer, MsgType>::client::client(rpc::protocol<Serializer, MsgType>&, ipv4_addr)::<lambda(connected_socket)>::<lambda()> mutable [with Serializer = net::serializer; MsgType = net::messaging_verb]::<lambda()> > (args=<optimized out>, func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x338e0e3>) at ./core/apply.hh:42
#23 apply<rpc::protocol<Serializer, MsgType>::client::client(rpc::protocol<Serializer, MsgType>&, ipv4_addr)::<lambda(connected_socket)>::<lambda()> mutable [with Serializer = net::serializer; MsgType = net::messaging_verb]::<lambda()> > (args=<optimized out>, func=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x30d9590, DIE 0x338e123>) at ./core/future.hh:685
#24 _ZZN6futureIIEE4thenIS0_ZZZN3rpc8protocolIN3net10serializerENS4_14messaging_verbEE6clientC4ERS7_9ipv4_addrENKUl16connected_socketE_clESB_ENUlvE0_clEvEUlvE_ZNS0_4thenISE_EEN8futurizeINSt9result_ofIFT_vEE4typeEE4typeEOSI_EUlO12future_stateIIEEE_EENSG_ISI_E4typeEOT0_OT1_ENUlOT_E_clISQ_EEDaSO_ (__closure=0x600000067850, state=<optimized out>) at ./core/future.hh:479
#25 0x000000000061b1a2 in reactor::run_tasks (this=this@entry=0x600000249000, tasks=..., quota=<optimized out>) at core/reactor.cc:823
#26 0x000000000063241a in reactor::run (this=0x600000249000) at core/reactor.cc:919
#27 0x0000000000651070 in app_template::run(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffdef0, ac=ac@entry=5, av=av@entry=0x7fffffffe138, 
    func=func@entry=<unknown type in /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message, CU 0x3d51565, DIE 0x3df0e68>) at core/app-template.cc:104
#28 0x0000000000416f70 in main (ac=5, av=0x7fffffffe138) at tests/urchin/message.cc:217
Starting program: /home/asias/src/cloudius-systems/urchin/build/release/tests/urchin/message --listen-address 127.0.0.200 --server 127.0.0.100
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff67ff700 (LWP 5728)]
[New Thread 0x7ffff61ff700 (LWP 5729)]
[New Thread 0x7ffff5bff700 (LWP 5730)]

Program received signal SIGSEGV, Segmentation fault.
std::_Rb_tree<gms::inet_address, std::pair<gms::inet_address const, gms::endpoint_state>, std::_Select1st<std::pair<gms::inet_address const, gms::endpoint_state> >, std::less<gms::inet_address>, std::allocator<std::pair<gms::inet_address const, gms::endpoint_state> > >::_M_erase (this=this@entry=0x600000063438, __x=0x1) at /usr/include/c++/4.9.2/bits/stl_tree.h:1245
1245          _M_erase(_S_right(__x));

With the native stack, while running a cassandra-stress load, trying to add additional clients sometimes produces an exception

apache-cassandra-2.1.3/tools/bin/cassandra-stress write duration=200m -mode cql3 native -node 192.168.10.101 -rate threads=16 &

It seems it is simpler to reproduce this (or a very similar issue, according to the stack trace) by:

  1. run urchin via gdb
    (sudo rm -Rf /data2/jenkins/cassandra/*; sudo gdb --args build/release/seastar --network-stack native --dpdk-pmd --dhcp 0 --host-ipv4-addr 192.168.10.101 --netmask-ipv4-addr 255.255.255.0 --gw-ipv4-addr 192.168.10.185 --datadir /data2/jenkins/cassandra --commitlog-directory /data2/jenkins/cassandra --collectd 1 --collectd-address 10.0.0.163:25826 --thrift-port 9160 --rpc-address 192.168.10.101 --seed-provider-parameters seeds=127.0.0.1 --smp 4 --collectd-hostname monster-shlomi)
  2. run cassandra-stress
  3. pkill the cassandra-stress

Usually you will get the following:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffde49fd700 (LWP 37272)]
0x00007ffff3ecf8d7 in raise () from /lib64/libc.so.6
(gdb) where
#0 0x00007ffff3ecf8d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff3ed153a in abort () from /lib64/libc.so.6
#2 0x00007ffff3ec847d in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff3ec8532 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000040fb2a in future_state<>::set_exception(std::__exception_ptr::exception_ptr) (ex=..., this=0x6020111261c8) at ./core/future.hh:302
#5 0x000000000044dcf4 in set_exception (ex=..., this=0x6020111261c8) at ./core/future.hh:304
#6 promise<>::set_exception(std::__exception_ptr::exception_ptr) (this=this@entry=0x602000029e60, ex=...) at ./core/future.hh:429
#7 0x00000000009c5bef in net::tcpnet::ipv4_traits::tcb::abort_reader (this=0x602000029c00) at net/tcp.hh:1537
#8 0x00000000009cda65 in abort_reader (this=) at net/tcp.hh:771
#9 close_read (this=0x60200011e828) at net/tcp.hh:1892
#10 net::tcpnet::ipv4_traits::connection::~connection (this=0x60200011e828, __in_chrg=) at net/tcp.hh:768
#11 0x00000000009cdcd4 in ~native_connected_socket_impl (this=0x60200011e820, __in_chrg=) at net/native-stack-impl.hh:70
#12 net::native_connected_socket_implnet::tcp<net::ipv4_traits >::~native_connected_socket_impl (this=0x60200011e820, __in_chrg=) at net/native-stack-impl.hh:70
#13 0x0000000000622e6b in operator() (this=, __ptr=) at /usr/include/c++/4.9.2/bits/unique_ptr.h:76
#14 ~unique_ptr (this=0x60200003bd18, __in_chrg=) at /usr/include/c++/4.9.2/bits/unique_ptr.h:236
#15 ~connected_socket (this=0x60200003bd18, __in_chrg=) at ./core/reactor.hh:204
#16 cql_server::connection::~connection (this=0x60200003bd10, __in_chrg=) at transport/server.cc:130
#17 0x0000000000622ee4 in ~shared_ptr_count_for (this=0x60200003bd00, __in_chrg=) at ./core/shared_ptr.hh:316
#18 shared_ptr_count_for<cql_server::connection>::~shared_ptr_count_for (this=0x60200003bd00, __in_chrg=) at ./core/shared_ptr.hh:316
#19 0x0000000000612016 in ~shared_ptr (this=0x602005582ea8, __in_chrg=) at ./core/shared_ptr.hh:388
#20 ~ (this=0x602005582ea0, __in_chrg=) at transport/server.cc:247
#21 ~ (this=0x602005582e80, __in_chrg=) at ./core/future.hh:624
#22 ~continuation (this=0x602005582e70, __in_chrg=) at ./core/future.hh:329
#23 continuation<future::then(Func&&, Param&&) [with Ret = void; Func = cql_server::do_accepts(int)::<lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>; Param = future::then_wrapped(Func&&) [with Func = cql_server::do_accepts(int)::<lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>; Result = future<>; T = {}]::<lambda(future_state<>&&)>; T = {}; futurize_t = future<>]::<lambda(auto:1&&)> >::~continuation(void) (this=0x602005582e70, __in_chrg=) at ./core/future.hh:329
#24 0x0000000000930f5b in operator() (this=, __ptr=0x602005582e70) at /usr/include/c++/4.9.2/bits/unique_ptr.h:76
#25 reset (__p=0x602005582e70, this=) at /usr/include/c++/4.9.2/bits/unique_ptr.h:344
#26 reactor::run_tasks (this=this@entry=0x602000119000, tasks=..., quota=) at core/reactor.cc:1010
#27 0x0000000000950bc4 in reactor::run (this=0x602000119000) at core/reactor.cc:1106
#28 0x00000000009575b6 in smp::<lambda()>::operator()(void) const (__closure=0x60000005f700) at core/reactor.cc:1747
#29 0x0000000000932189 in dpdk_thread_adaptor (f=) at core/reactor.cc:1621
#30 0x0000000000bc1f03 in eal_thread_loop ()
#31 0x00007ffff425f52a in start_thread () from /lib64/libpthread.so.0
#32 0x00007ffff3f9b22d in clone () from /lib64/libc.so.6

(gdb)

Original issue stack trace

seastar: ./core/future.hh:302: void future_state<>::set_exception(std::__exception_ptr::exception_ptr): Assertion `_u.st == state::future' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff3ecf8d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install boost-date-time-1.55.0-8.fc21.x86_64 boost-filesystem-1.55.0-8.fc21.x86_64 boost-program-options-1.55.0-8.fc21.x86_64 boost-system-1.55.0-8.fc21.x86_64 boost-test-1.55.0-8.fc21.x86_64 boost-thread-1.55.0-8.fc21.x86_64 cryptopp-5.6.2-5.fc21.x86_64 hwloc-libs-1.10.0-1.fc21.x86_64 jsoncpp-0.6.0-0.14.rc2.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-15.fc21.x86_64 libaio-0.3.110-4.fc21.x86_64 libcom_err-1.42.12-4.fc21.x86_64 libgcc-4.9.2-6.fc21.x86_64 libpciaccess-0.13.3-0.3.fc21.x86_64 libselinux-2.3-9.fc21.x86_64 libstdc++-4.9.2-6.fc21.x86_64 libtool-ltdl-2.4.2-31.fc21.x86_64 libxml2-2.9.1-7.fc21.x86_64 lz4-r128-2.fc21.x86_64 numactl-libs-2.0.9-4.fc21.x86_64 openssl-libs-1.0.1k-6.fc21.x86_64 pcre-8.35-11.fc21.x86_64 snappy-1.1.1-3.fc21.x86_64 thrift-0.9.1-13.fc21.1.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 yaml-cpp-0.5.1-4.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) where
#0 0x00007ffff3ecf8d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff3ed153a in abort () from /lib64/libc.so.6
#2 0x00007ffff3ec847d in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff3ec8532 in __assert_fail () from /lib64/libc.so.6
#4 0x000000000040fb2a in future_state<>::set_exception(std::__exception_ptr::exception_ptr) (ex=..., this=0x60000b8e2b68) at ./core/future.hh:302
#5 0x000000000044dcf4 in set_exception (ex=..., this=0x60000b8e2b68) at ./core/future.hh:304
#6 promise<>::set_exception(std::__exception_ptr::exception_ptr) (this=this@entry=0x60000c19aa60, ex=...) at ./core/future.hh:429
#7 0x00000000009c5bef in net::tcpnet::ipv4_traits::tcb::abort_reader (this=0x60000c19a800) at net/tcp.hh:1537
#8 0x00000000009cda65 in abort_reader (this=) at net/tcp.hh:771
#9 close_read (this=0x600006273df8) at net/tcp.hh:1892
#10 net::tcpnet::ipv4_traits::connection::~connection (this=0x600006273df8, __in_chrg=) at net/tcp.hh:768
#11 0x00000000009cdcd4 in ~native_connected_socket_impl (this=0x600006273df0, __in_chrg=) at net/native-stack-impl.hh:70
#12 net::native_connected_socket_implnet::tcp<net::ipv4_traits >::~native_connected_socket_impl (this=0x600006273df0, __in_chrg=) at net/native-stack-impl.hh:70
#13 0x00000000006bd7b7 in operator() (this=, __ptr=) at /usr/include/c++/4.9.2/bits/unique_ptr.h:76
#14 ~unique_ptr (this=, __in_chrg=) at /usr/include/c++/4.9.2/bits/unique_ptr.h:236
#15 ~connected_socket (this=, __in_chrg=) at ./core/reactor.hh:204
#16 ~connection (this=0x60000177fd80, __in_chrg=) at thrift/server.cc:80
#17 operator() (__closure=, f=) at thrift/server.cc:156
#18 futurize::apply<thrift_server::do_accepts(int)::<lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>, future<> >(<unknown type in /home/shlomi/urchin/build/release/seastar, CU 0x3f4fa56, DIE 0x4069d02>) (func=func@entry=<unknown type in /home/shlomi/urchin/build/release/seastar, CU 0x3f4fa56, DIE 0x4069d02>) at ./core/future.hh:954
#19 0x00000000006bd8e7 in operator()<future_state<> > (state=<unknown type in /home/shlomi/urchin/build/release/seastar, CU 0x3f4fa56, DIE 0x406a56d>, __closure=0x600004b60da0) at ./core/future.hh:626
#20 continuation<future::then(Func&&, Param&&) [with Ret = void; Func = thrift_server::do_accepts(int)::<lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>; Param = future::then_wrapped(Func&&) [with Func = thrift_server::do_accepts(int)::<lambda(connected_socket, socket_address)> mutable::<lambda(future<>)>; Result = future<>; T = {}]::<lambda(future_state<>&&)>; T = {}; futurize_t = future<>]::<lambda(auto:1&&)> >::run(void) (this=0x600004b60d90) at ./core/future.hh:333
#21 0x0000000000930f52 in reactor::run_tasks (this=this@entry=0x600000242000, tasks=..., quota=) at core/reactor.cc:1009
#22 0x0000000000950bc4 in reactor::run (this=0x600000242000) at core/reactor.cc:1106
#23 0x0000000000976396 in app_template::run(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffdf30, ac=ac@entry=30, av=av@entry=0x7fffffffe178,

func=func@entry=<unknown type in /home/shlomi/urchin/build/release/seastar, CU 0x8a2ce19, DIE 0x8ac3c8e>) at core/app-template.cc:104

#24 0x00000000004174f8 in main (ac=30, av=0x7fffffffe178) at main.cc:117
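
For context, the assertion `_u.st == state::future' means the promise's state was already resolved when set_exception() ran, i.e. abort_reader() is failing a read whose future has already been set. A toy illustration of the invariant (simplified types, not the seastar code):

#include <cassert>
#include <exception>

struct toy_future_state {
    enum class state { future, result, exception } st = state::future;

    void set_result() {
        assert(st == state::future);       // a state may only be resolved once
        st = state::result;
    }
    void set_exception(std::exception_ptr) {
        assert(st == state::future);       // this is the assert that fires here
        st = state::exception;
    }
};

int main() {
    toy_future_state s;
    s.set_result();                        // e.g. the read already completed
    // Calling s.set_exception(...) now would trip the assert and abort,
    // which is what the connection teardown path hits in this report.
}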

query::no_such_column is thrown when running in smp with multiple cassandra-stress trying to create tables in parallel

Running a single node with smp and multiple cassandra-stress clients that use a user profile (trying to create the table in parallel), we get:

Urchin and cassandra-stress are executed as in #25

query::no_such_column is thrown in:

void create_table_from_table_row_and_column_rows(schema_builder& builder, const query::result_set_row& table_row, const schema_result::mapped_type& serialized_column_definitions)
1239        {
1240            auto ks_name = table_row.get_nonnull<sstring>("keyspace_name");

Using the following patch

-Subproject commit 3b5e1551b3dbe0f6b6dc4cc5ce7f5fd951984945
+Subproject commit 3b5e1551b3dbe0f6b6dc4cc5ce7f5fd951984945-dirty
diff --git a/query-result-set.hh b/query-result-set.hh
index 9f35eca..47f21c7 100644
--- a/query-result-set.hh
+++ b/query-result-set.hh
@@ -40,12 +40,17 @@ public:
     bool has(const sstring& column_name) const {
         return _cells.count(column_name) > 0;
     }
+
+    void throw_no_such_column(const sstring& column_name) const throw (no_such_column)  {
+        throw no_such_column(column_name);
+    }
+
     // Look up a deserialized row cell value by column name.
     const data_value&
     get_data_value(const sstring& column_name) const throw (no_such_column) {
         auto it = _cells.find(column_name);
         if (it == _cells.end()) {
-            throw no_such_column(column_name);
+            throw_no_such_column(column_name);
         }
         return it->second;
     }

I am able to catch it

Breakpoint 5, query::result_set_row::throw_no_such_column (this=this@entry=0x6040004bcac0, column_name=...) at ./query-result-set.hh:44
44      void throw_no_such_column(const sstring& column_name) const throw (no_such_column)  {
Missing separate debuginfos, use: debuginfo-install boost-date-time-1.55.0-8.fc21.x86_64 boost-filesystem-1.55.0-8.fc21.x86_64 boost-program-options-1.55.0-8.fc21.x86_64 boost-system-1.55.0-8.fc21.x86_64 boost-test-1.55.0-8.fc21.x86_64 boost-thread-1.55.0-8.fc21.x86_64 cryptopp-5.6.2-5.fc21.x86_64 hwloc-libs-1.10.0-1.fc21.x86_64 jsoncpp-0.6.0-0.14.rc2.fc21.x86_64 keyutils-libs-1.5.9-4.fc21.x86_64 krb5-libs-1.12.2-15.fc21.x86_64 libaio-0.3.110-4.fc21.x86_64 libcom_err-1.42.12-4.fc21.x86_64 libgcc-4.9.2-6.fc21.x86_64 libpciaccess-0.13.3-0.3.fc21.x86_64 libselinux-2.3-9.fc21.x86_64 libstdc++-4.9.2-6.fc21.x86_64 libtool-ltdl-2.4.2-31.fc21.x86_64 libxml2-2.9.1-7.fc21.x86_64 lz4-r128-2.fc21.x86_64 numactl-libs-2.0.9-4.fc21.x86_64 openssl-libs-1.0.1k-6.fc21.x86_64 pcre-8.35-11.fc21.x86_64 snappy-1.1.1-3.fc21.x86_64 thrift-0.9.1-13.fc21.1.x86_64 xz-libs-5.1.2-14alpha.fc21.x86_64 yaml-cpp-0.5.1-4.fc21.x86_64 zlib-1.2.8-7.fc21.x86_64
(gdb) where
#0  query::result_set_row::throw_no_such_column (this=this@entry=0x6040004bcac0, column_name=...) at ./query-result-set.hh:44
#1  0x00000000005564fd in get_data_value (column_name=..., this=0x6040004bcac0) at ./query-result-set.hh:53
#2  get<basic_sstring<char, unsigned int, 15u> > (column_name=..., this=0x6040004bcac0) at ./query-result-set.hh:61
#3  query::result_set_row::get_nonnull<basic_sstring<char, unsigned int, 15u> > (this=0x6040004bcac0, column_name=...) at ./query-result-set.hh:69
#4  0x00000000007dde2d in db::legacy_schema_tables::create_table_from_table_row_and_column_rows (builder=..., table_row=..., serialized_column_definitions=...) at db/legacy_schema_tables.cc:1240
#5  0x00000000007e25df in _ZZN6futureIISt4pairIK13basic_sstringIcjLj15EE13lw_shared_ptrIN5query10result_setEEEEE4thenIS4_IK6schemaEZN2db20legacy_schema_tables27create_table_from_table_rowERN7service13storage_proxyERKNS5_14result_set_rowEEUlT_E_ZNS9_4thenISN_S_IISD_EEEET0_OT_EUlO12future_stateIIS8_EEE_EEN8futurizeISR_E4typeEOSQ_OT1_ENUlOSM_E_clISU_EEDaSS_ () at db/legacy_schema_tables.cc:1197
#6  0x0000000000952a62 in reactor::run_tasks (this=this@entry=0x600000243000, tasks=..., quota=<optimized out>) at core/reactor.cc:1025
#7  0x0000000000974f74 in reactor::run (this=0x600000243000) at core/reactor.cc:1122
#8  0x0000000000999936 in app_template::run(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffe0f0, ac=ac@entry=18, av=av@entry=0x7fffffffe338, 
    func=func@entry=<unknown type in /home/shlomi/urchin/build/release/seastar, CU 0x8f8b74e, DIE 0x90225f1>) at core/app-template.cc:104
#9  0x000000000041773d in main (ac=18, av=0x7fffffffe338) at main.cc:161
(gdb) l
39      { }
40      bool has(const sstring& column_name) const {
41          return _cells.count(column_name) > 0;
42      }
43  
44      void throw_no_such_column(const sstring& column_name) const throw (no_such_column)  {
45          throw no_such_column(column_name);
46      }
47  
48      // Look up a deserialized row cell value by column name.
(gdb) up
#1  0x00000000005564fd in get_data_value (column_name=..., this=0x6040004bcac0) at ./query-result-set.hh:53
53              throw_no_such_column(column_name);
(gdb) l
48      // Look up a deserialized row cell value by column name.
49      const data_value&
50      get_data_value(const sstring& column_name) const throw (no_such_column) {
51          auto it = _cells.find(column_name);
52          if (it == _cells.end()) {
53              throw_no_such_column(column_name);
54          }
55          return it->second;
56      }
57      // Look up a deserialized row cell value by column name.

Adding some more info

It seems the table_row passed to db::legacy_schema_tables::create_table_from_table_row_and_column_rows is bad:

(gdb) p (*table_row._schema._p)
$15 = {
  _count = 105690560160320, 
  _value = {
    _raw = {
      _id = {
        most_sig_bits = 105690560135448, 
        least_sig_bits = -72057104411656175
      }, 
      _ks_name = {
        u = {
          external = {
            str = 0x6020004b024c "\300\001K", 
            size = 27968, 
            pad = 32 ' '
          }, 
          internal = {
            str = "L\002K\000 `\000\000@m\000\000 `", 
            size = 0 '\000'
          }
        }, 
        static npos = 4294967295
      }, 
      _cf_name = {
        u = {
          external = {
            str = 0x602000006d40 "\220\302", <incomplete sequence \312>, 
            size = 2870193999, 
            pad = 85 'U'
          }, 
          internal = {
            str = "@m\000\000 `\000\000O\257\023\253U\256", <incomplete sequence \352>, 
            size = -53 '\313'
          }
        }, 
        static npos = 4294967295
      }, 
      _columns = std::vector of length 600514536562, capacity 41537358947795482 = {<error reading variable>, 
    _thrift = {<No data fields>}, 
    _offsets = {
      _M_elems = {1080581128159031495, 105690558293056, 6388478077433678178}
    }, 
    _columns_by_name = std::unordered_map with 7854399174923743845 elements<error reading variable: Cannot access memory at address 0x765f616d65686373>, 
    _regular_columns_by_name = std::map with 1080863910568919040 elements<error reading variable: Cannot access memory at address 0x6e6f697372658e>, 
    _partition_key_type = {
      _p = 0x6020002ee2e0
    }, 
    _clustering_key_type = {
      _p = 0x854153ca0d5f8ebe
    }, 
    _clustering_key_prefix_type = {
      _p = 0x0
---Type <return> to continue, or q <return> to quit---
    }, 
    static NAME_LENGTH = 48, 
    static DEFAULT_COMPRESSOR = {
      <std::experimental::_Optional_base<basic_sstring<char, unsigned int, 15u>, true>> = {
        {
          _M_empty = {<No data fields>}, 
          _M_payload = {
            u = {
              external = {
                str = 0x72706d6f43345a4c <error: Cannot access memory at address 0x72706d6f43345a4c>, 
                size = 1869837157, 
                pad = 114 'r'
              }, 
              internal = {
                str = "LZ4Compressor\000", 
                size = 13 '\r'
              }
            }, 
            static npos = 4294967295
          }
        }, 
        _M_engaged = true
      }, 
      <std::_Enable_copy_move<true, true, true, true, std::experimental::optional<basic_sstring<char, unsigned int, 15u> > >> = {<No data fields>}, <No data fields>}
  }
}

On a "normal" run I get

$17 = {
  _count = 12, 
  _value = {
    _raw = {
      _id = {
        most_sig_bits = 5041132583425687427, 
        least_sig_bits = -6673472403377510761
      }, 
      _ks_name = {
        u = {
          external = {
            str = 0x6d6574737973 "", 
            size = 13871818, 
            pad = 0 '\000'
          }, 
          internal = {
            str = "system\000\000ʪ\323\000\000\000", 
            size = 6 '\006'
          }
        }, 
        static npos = 4294967295
      }, 
      _cf_name = {
        u = {
          external = {
            str = 0x600000250bd0 "schema_columnfamilies", 
            size = 21, 
            pad = 0 '\000'
          }, 
          internal = {
            str = "\320\v%\000\000`\000\000\025\000\000\000\000\000", 
            size = -1 '\377'
          }
        }, 
        static npos = 4294967295
      }, 
      _columns = std::vector of length 26, capacity 26 = {{
          _name = {
            u = {
              external = {
                str = 0x656361707379656b <error: Cannot access memory at address 0x656361707379656b>, 
                size = 1835101791, 
                pad = 101 'e'
              }, 
              internal = {
                str = "keyspace_name", '\000' <repeats 17 times>, 
                size = 13 '\r'
              }
            }, 
            static npos = 4294967295
          }, 
.
.
.

Compilation is S L O W

Today, a fresh "configure.py --mode=release; ninja" takes a whopping 9 minutes on my machine.
All too often, changing just one (header) file results in a large part of this being redone.
Life is too short for this :-)

We could try to improve these figures by reducing the inclusion of header files, moving more things to .cc files, not building all the tests by default, and perhaps other tricks.

P.S. as a reference to just how slow 9 minutes is, consider that a fresh compile of OSv on my machine takes just 4 minutes. Moreover in OSv, most changes require recompiling only a few files, and take just a few seconds.
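
One cheap way to attack this is to cut header-to-header inclusion so that touching one header rebuilds fewer translation units. A small sketch of the forward-declaration pattern (class names are made up):

// widget.hh -- sketch of the forward-declaration pattern (hypothetical classes):
// keep heavy #includes out of headers so that touching engine.hh no longer
// rebuilds every file that includes widget.hh.
#pragma once
#include <memory>

class engine;                              // forward declaration instead of #include "engine.hh"

class widget {
public:
    explicit widget(std::shared_ptr<engine> e);
    void run();                            // defined in widget.cc, which does include engine.hh
private:
    std::shared_ptr<engine> _engine;
};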

Unable to create tables without compression

executing via cqlsh

CREATE KEYSPACE keyspace1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

CREATE TABLE keyspace1.standard2 (key text PRIMARY KEY, c0 text, c1 text, c2 text, c3 text, c4 text ) with compression = { 'sstable_compression' : ''};

<ErrorMessage code=0000 [Server error] message="Unsupported compression class ''''.">

Head 86d9139

Handle unbounded number of tasks entering the system

On a higher level, we need to ensure we don't allow an unbounded number of tasks to enter the system (admission control; we have this problem everywhere). In the case of gossip, the proper approach is to cancel old tasks (say, by killing their connections). In other cases (main transaction processing) we need to slow or stop incoming requests until old requests have had time to complete. It's going to be interesting.
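A minimal sketch of what such admission control could look like, assuming seastar's counting semaphore and with_semaphore helper (process_request() and the limit of 100 are made-up placeholders, not Scylla code):

#include "core/future.hh"
#include "core/semaphore.hh"

future<> process_request();              // hypothetical request handler

static semaphore inflight_limit(100);    // admit at most 100 requests at a time

future<> handle_request() {
    // Requests beyond the limit wait here instead of piling up as an
    // unbounded number of tasks behind the reactor; a waiter only runs
    // once an older request releases its unit.
    return with_semaphore(inflight_limit, 1, [] {
        return process_request();
    });
}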

Crash when reading cassandra stress

The data had been inserted successfully beforehand (my own assumption).
This happened on the AWS setup.

One crash:
'''
(gdb) bt
#0 allocate (this=) at core/memory.cc:705
#1 memory::cpu_pages::allocate_small (this=, size=) at core/memory.cc:448
#2 0x000000000086e762 in _M_init_functor (__f=, __functor=...) at /usr/include/c++/5.1.1/tr1/functional:1713
#3 _M_init_functor (__f=, __functor=...) at /usr/include/c++/5.1.1/tr1/functional:1684
#4 function<thrift_server::connection::process_one_request()::<lambda()>::<lambda(bool)> > (__f=..., this=0x7ffbc2187a60) at /usr/include/c++/5.1.1/tr1/functional:2133
#5 thrift_server::connection::process_one_request()::{lambda()#1}::operator()() const (__closure=__closure@entry=0x61401dbd27f0) at thrift/server.cc:102
#6 0x000000000086fed0 in apply (args=, func=) at /mnt/urchin/seastar/core/apply.hh:34
#7 apply<thrift_server::connection::process_one_request()::<lambda()> > (args=, func=) at /mnt/urchin/seastar/core/apply.hh:42
#8 apply<thrift_server::connection::process_one_request()::<lambda()> > (args=, func=) at /mnt/urchin/seastar/core/future.hh:971
#9 operator()<future_state<> > (state=<unknown type in /mnt/urchin/build/release/scylla, CU 0x46b6ac7, DIE 0x47e06ca>, __closure=0x61401dbd27d0) at /mnt/urchin/seastar/core/future.hh:636
#10 _ZN12continuationIZN6futureIJEE4thenIS1_ZN13thrift_server10connection19process_one_requestEvEUlvE_ZNS1_4thenIS5_S1_EET0_OT_EUlO12future_stateIJEEE_EEN8futurizeIS8_E4typeEOS7_OT1_EUlS9_E_JEE3runEv (this=0x61401dbd27c0) at /mnt/urchin/seastar/core/future.hh:333
#11 0x000000000043d62a in reactor::run_tasks (this=this@entry=0x614000119000, tasks=..., quota=) at core/reactor.cc:1037
#12 0x0000000000460a5c in reactor::run (this=0x614000119000) at core/reactor.cc:1134
#13 0x000000000046a021 in smp::<lambda()>::operator()(void) const (__closure=0x60000012e800) at core/reactor.cc:1784
#14 0x0000000000439b6e in operator() (this=) at /usr/include/c++/5.1.1/functional:2271
#15 dpdk_thread_adaptor (f=) at core/reactor.cc:1649
#16 0x00000000005e3c8b in eal_thread_loop ()
#17 0x00007ffff4422555 in start_thread () from /lib64/libpthread.so.0
#18 0x00007ffff415db9d in clone () from /lib64/libc.so.6

'''

The result of token() is shown by cqlsh as an integer on Origin, whereas on urchin it is shown as bytes

cqlsh> create table cf (k blob, v int, primary key (k));
cqlsh> insert into cf (k, v) values (0x01, 0);

On Origin:

cqlsh> select k, token(k) from cf;

 k    | system.token(k)
------+---------------------
 0x01 | 8849112093580131862

On urchin:

cqlsh> select k, token(k) from cf;

 k    | system.token(k)
------+--------------------
 0x02 | 0x635e0e9e37361ef2
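For comparison, here is a small self-contained sketch (not Scylla's actual code) of how the two presentations relate: a bigint-typed token is just the 8 token bytes read as a signed big-endian 64-bit integer, whereas a blob-typed result is printed by cqlsh as hex.

#include <cstdint>
#include <cstdio>

// Interpret 8 big-endian bytes as a signed 64-bit integer -- the way a
// bigint-typed token column would be decoded and displayed.
int64_t token_as_bigint(const uint8_t bytes[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = (v << 8) | bytes[i];
    }
    return static_cast<int64_t>(v);  // two's-complement reinterpretation
}

int main() {
    // The token bytes urchin printed above.
    const uint8_t t[8] = {0x63, 0x5e, 0x0e, 0x9e, 0x37, 0x36, 0x1e, 0xf2};
    printf("%lld\n", static_cast<long long>(token_as_bigint(t)));
    return 0;
}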

Not implemented: RANGE_QUERIES

This worked for me in the past, but it does not work anymore.

asias@hjpc urchin$ for i in {1..2}; do cqlsh 127.0.0.$i -e "select * from ks.tb";done
:1:<ErrorMessage code=0000 [Server error] message="Not implemented: RANGE_QUERIES">
:1:<ErrorMessage code=0000 [Server error] message="Not implemented: RANGE_QUERIES">

I tested with this commit:
commit 5339783
Author: Avi Kivity [email protected]
Date: Sun Jul 19 21:41:42 2015 +0300

build: handle case where ninja command is named 'ninja-build'

SIGSEGV while running select * in a 3-node cluster on AWS

Head at:

commit 5f88707
Merge: 5d68f33 83a64d7
Author: Avi Kivity [email protected]
Date: Wed Jul 15 14:52:15 2015 +0300

Merge "API: Replacing empty string to void" from Amnon

All nodes are running with --smp 1

Running a cassandra-stress insert workload works:

cassandra/tools/bin/cassandra-stress user n=100 no-warmup profile=./simple_test_no_compression.yaml ops(insert=1) -mode cql3 native -rate threads=10 -node 172.31.19.92

Connecting with cqlsh and executing select * from keyspace2.standard1 fails on the node I connect to:

Program received signal SIGSEGV, Segmentation fault.
__destroy<partition*> (__last=0x0, __first=0x1) at /usr/include/c++/4.9/bits/stl_construct.h:103
103 std::_Destroy(std::__addressof(*__first));
(gdb) where
#0 __destroy<partition*> (__last=0x0, __first=0x1) at /usr/include/c++/4.9/bits/stl_construct.h:103
#1 _Destroy<partition*> (__last=0x0, __first=) at /usr/include/c++/4.9/bits/stl_construct.h:126
#2 _Destroy<partition*, partition> (__last=0x0, __first=) at /usr/include/c++/4.9/bits/stl_construct.h:151
#3 std::vector<partition, std::allocator<partition> >::~vector (this=0x7fffffffda20, __in_chrg=) at /usr/include/c++/4.9/bits/stl_vector.h:424
#4 0x000000000058a1e8 in _M_move_assign (__x=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x2021b53, DIE 0x20cef30>, this=0x6000006d02c8) at /usr/include/c++/4.9/bits/stl_vector.h:1459
#5 operator= (__x=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x2021b53, DIE 0x20cef6c>, this=0x6000006d02c8) at /usr/include/c++/4.9/bits/stl_vector.h:453
#6 operator= (this=0x6000006d02c0) at mutation_query.hh:47
#7 db::serializer<reconcilable_result>::read (v=..., in=...) at mutation_query.cc:90
#8 0x00000000007a4ecf in operator() (__closure=, __closure=, buf=) at ./message/messaging_service.hh:144
#9 apply (args=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x4a8be79, DIE 0x4f35684>, func=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x4a8be79, DIE 0x4f35672>) at ./core/apply.hh:34
#10 applynet::serializer::read_serializable(input_stream<char&, Serializable&)::<lambda(auto:27)> mutable [with auto:27 = unsigned int; Serializable = reconcilable_result]::<lambda(temporary_buffer)>, temporary_buffer > (args=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x4a8be79, DIE 0x4ddddfe>, func=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x4a8be79, DIE 0x4ddddea>) at ./core/apply.hh:42
#11 applynet::serializer::read_serializable(input_stream<char&, Serializable&)::<lambda(auto:27)> mutable [with auto:27 = unsigned int; Serializable = reconcilable_result]::<lambda(temporary_buffer)>, temporary_buffer > (args=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x4a8be79, DIE 0x4e341fa>, func=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x4a8be79, DIE 0x4e341e7>) at ./core/future.hh:948
#12 ZZN6futureII16temporary_bufferIcEEE4thenIvZZN3net10serializer17read_serializableI19reconcilable_resultEES_IIEER12input_streamIcERT_ENUlSC_E_clIjEEDaT_EUlS1_E_ZNS2_4thenISH_S8_EET0_OSG_EUlO12future_stateIIS1_EEE_EEN8futurizeISG_E4typeEOSJ_OT1_ENUlOSC_E_clISM_EEDaSK (__closure=0x600000215cd8, state=) at ./core/future.hh:626
#13 0x00000000009a70f2 in reactor::run_tasks (this=0x7fffffffda20, this@entry=0x60000020b000, tasks=..., quota=105553117362128) at core/reactor.cc:1011
#14 0x00000000009c894b in reactor::run (this=0x60000020b000) at core/reactor.cc:1108
#15 0x00000000009ecc76 in app_template::run(int, char**, std::function<void ()>&&) (this=this@entry=0x7fffffffe3a0, ac=ac@entry=9, av=av@entry=0x7fffffffe5f8, func=func@entry=<unknown type in /home/ubuntu/urchin/build/release/seastar, CU 0x9815342, DIE 0x98ab69a>) at core/app-template.cc:104
#16 0x000000000041622e in main (ac=9, av=0x7fffffffe5f8) at main.cc:157

Verified that if I do this against a single node running on the same HW, everything works OK.

We don't handle expiring rows properly

Test case:

create table table3 (id int, ck int, v1 int, v2 int, primary key (id, ck));
insert into table3 (id, ck, v1) values (0, 0, 0) using ttl 1;
// wait 1 second
select * from table3;

Expected result:

 id | ck | v1 | v2
----+----+----+----

(0 rows)

Current result:


 id | ck | v1   | v2
----+----+------+------
  0 |  0 | null | null

(1 rows)
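As a rough statement of the expected semantics, here is a sketch with hypothetical types (not Scylla's actual cell representation): a cell written with a TTL stops being live once its expiry passes, and a row left with no live cells should disappear from SELECT results instead of showing null values.

#include <chrono>

struct cell {
    bool has_ttl = false;
    std::chrono::system_clock::time_point expiry;  // write time + TTL
};

// Liveness rule for an expiring cell: once `now` passes the expiry the cell
// is dead; when every cell of a row is dead (and the row marker expired),
// the row should not be returned at all.
bool is_live(const cell& c, std::chrono::system_clock::time_point now) {
    return !c.has_ttl || now < c.expiry;
}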

TRUNCATE_ERROR using cpp driver

When running Scylla against the cpp driver (instructions: https://github.com/cloudius-systems/urchin/wiki/Using-example-perf.c-from-the-cpp-driver) I get a message on the client saying 'Truncate error'. It obviously comes from the server, since 'git grep' finds no such string in the client (I also verified the fprintf it comes from).

When using Origin in the same scenario there is an exception like this:
Error: java.lang.IndexOutOfBoundsException: index: 157, length: 1397051988 (expected: range(0, 202))

Apparently we go out of bounds too, but we don't notice it.

To reproduce, run Scylla with the POSIX stack and run the perf example in parallel with cassandra-stress (the latter may not be strictly necessary).

These are some of the configs of perf.c:
#define NUM_THREADS 2
#define NUM_IO_WORKER_THREADS 1
#define NUM_CONCURRENT_REQUESTS 200
#define NUM_ITERATIONS 10000

#define DO_SELECTS 0
#define USE_PREPARED 0

const char* big_string = "012670123456701234567";
