alibaba / hessian2-codec

hessian2-codec is a complete C++ implementation of the hessian2 spec

License: Apache License 2.0


hessian2-codec

hessian2-codec is a C++ library from Alibaba that implements the hessian2 codec. It is a complete C++ implementation of the hessian2 spec. Because it was originally written for the Dubbo filter of Envoy, it does not support serialization of user-defined types well: the only mechanism is based on ADL, and it is incomplete and handles nested types poorly. At the moment the library simply deserializes content into a set of C++ intermediate types.

Getting Started

Install

  1. To download and install Bazel (and any of its dependencies), consult the Bazel Installation Guide.
  2. Refer to Supported Platforms and install a suitable compiler.
  3. To use hessian2-codec, see the demo directory for details.
$ cd demo
$ bazel build //...
$ ./bazel-bin/demo

Basic usage

#include <cstdint>
#include <iostream>
#include <string>

#include "hessian2/codec.hpp"
#include "hessian2/basic_codec/object_codec.hpp"

int main() {
  {
    std::string out;
    ::Hessian2::Encoder encode(out);
    encode.encode<std::string>("test string");
    ::Hessian2::Decoder decode(out);
    auto ret = decode.decode<std::string>();
    if (ret) {
      std::cout << *ret << std::endl;
    } else {
      std::cerr << "decode failed: " << decode.getErrorMessage() << std::endl;
    }
  }
  {
    std::string out;
    ::Hessian2::Encoder encode(out);
    encode.encode<int64_t>(100);
    ::Hessian2::Decoder decode(out);
    auto ret = decode.decode<int64_t>();
    if (ret) {
      std::cout << *ret << std::endl;
    } else {
      std::cerr << "decode failed: " << decode.getErrorMessage() << std::endl;
    }
  }

  return 0;
}

Advanced usage

  1. Implement the serialization and deserialization of custom types with ADL
#include <cstdint>
#include <iostream>
#include <string>

#include "hessian2/codec.hpp"
#include "hessian2/basic_codec/object_codec.hpp"

struct Person {
  int32_t age_{0};
  std::string name_;
};

// The custom struct needs fromHessian and toHessian free functions, found via
// ADL, so that it can be decoded and encoded.

void fromHessian(Person&, ::Hessian2::Decoder&);
bool toHessian(const Person&, ::Hessian2::Encoder&);

void fromHessian(Person& p, ::Hessian2::Decoder& d) {
  auto age = d.decode<int32_t>();
  if (age) {
    p.age_ = *age;
  }

  auto name = d.decode<std::string>();
  if (name) {
    p.name_ = *name;
  }
}

bool toHessian(const Person& p, ::Hessian2::Encoder& e) {
  e.encode<int32_t>(p.age_);
  e.encode<std::string>(p.name_);
  return true;
}

int main() {
  std::string out;
  Hessian2::Encoder encode(out);
  Person s;
  s.age_ = 12;
  s.name_ = "test";

  encode.encode<Person>(s);
  Hessian2::Decoder decode(out);
  auto decode_person = decode.decode<Person>();
  if (!decode_person) {
    std::cerr << "hessian decode failed " << decode.getErrorMessage()
              << std::endl;
    return -1;
  }
  std::cout << "Age: " << decode_person->age_
            << " Name: " << decode_person->name_ << std::endl;
}

There is currently no way to serialize custom types nested in containers, such as std::list<Person>.

  2. Customize Reader and Writer

hessian2-codec uses std::string-based implementations of Reader and Writer by default, but both can be replaced with custom implementations.

#include <cstdint>
#include <cstring>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "hessian2/codec.hpp"
#include "hessian2/basic_codec/object_codec.hpp"
#include "hessian2/reader.hpp"
#include "hessian2/writer.hpp"

#include "absl/base/macros.h"
#include "absl/strings/string_view.h"

struct Slice {
  const uint8_t* data_;
  size_t size_;
};

class SliceReader : public ::Hessian2::Reader {
 public:
  SliceReader(Slice buffer) : buffer_(buffer) {}
  virtual ~SliceReader() = default;

  virtual void rawReadNBytes(void* out, size_t len,
                             size_t peek_offset) override {
    ABSL_ASSERT(byteAvailable() + peek_offset >= len);
    uint8_t* dest = static_cast<uint8_t*>(out);
    // offset() returns the current read position.
    memcpy(dest, buffer_.data_ + offset() + peek_offset, len);
  }
  virtual uint64_t length() const override { return buffer_.size_; }

 private:
  Slice buffer_;
};

class VectorWriter : public ::Hessian2::Writer {
 public:
  VectorWriter(std::vector<uint8_t>& data) : data_(data) {}
  ~VectorWriter() = default;
  virtual void rawWrite(const void* data, uint64_t size) {
    const char* src = static_cast<const char*>(data);
    for (size_t i = 0; i < size; i++) {
      data_.push_back(src[i]);
    }
  }
  virtual void rawWrite(absl::string_view data) {
    for (auto& ch : data) {
      data_.push_back(ch);
    }
  }

 private:
  std::vector<uint8_t>& data_;
};

int main() {
  std::vector<uint8_t> data;
  auto writer = std::make_unique<VectorWriter>(data);

  ::Hessian2::Encoder encode(std::move(writer));
  encode.encode<std::string>("test string");
  Slice s{static_cast<const uint8_t*>(data.data()), data.size()};
  auto reader = std::make_unique<SliceReader>(s);
  ::Hessian2::Decoder decode(std::move(reader));
  auto ret = decode.decode<std::string>();
  if (ret) {
    std::cout << *ret << std::endl;
  } else {
    std::cerr << "decode failed: " << decode.getErrorMessage() << std::endl;
  }
}

Type mapping

C++ has no global parent type like Java's Object, so no single built-in type can represent every hessian type. The library therefore defines an Object base class from which all hessian types inherit.

hessian type    java type               C++ type
null            null                    NullObject
binary          byte[]                  BinaryObject
boolean         boolean                 BooleanObject
date            java.util.Date          DateObject
double          double                  DoubleObject
int             int                     IntegerObject
long            long                    LongObject
string          java.lang.String        StringObject
untyped list    java.util.List          UntypedListObject
typed list      java.util.ArrayList     TypedListObject
untyped map     java.util.Map           UntypedMapObject
typed map       for some OO language    TypedMapObject
object          custom define object    ClassInstance
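
The idea of a common base class can be illustrated with a small self-contained sketch. Everything below lives in a hypothetical `sketch` namespace and only borrows the class names from the table; the members, the `Type` enum and the `describe` helper are assumptions made for illustration and are not the API shipped by hessian2-codec.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // illustrative only, not the library's real classes

// Common base: every value carries a tag that identifies its hessian type.
class Object {
 public:
  enum class Type { Null, Integer, String, UntypedList };
  virtual ~Object() = default;
  virtual Type type() const = 0;
};

class NullObject : public Object {
 public:
  Type type() const override { return Type::Null; }
};

class IntegerObject : public Object {
 public:
  explicit IntegerObject(int32_t v) : value(v) {}
  Type type() const override { return Type::Integer; }
  int32_t value;
};

class StringObject : public Object {
 public:
  explicit StringObject(std::string v) : value(std::move(v)) {}
  Type type() const override { return Type::String; }
  std::string value;
};

// A list owns heterogeneous children through the base class.
class UntypedListObject : public Object {
 public:
  Type type() const override { return Type::UntypedList; }
  std::vector<std::unique_ptr<Object>> items;
};

// Dispatch on the tag, then downcast to the concrete type.
std::string describe(const Object& o) {
  switch (o.type()) {
    case Object::Type::Null:
      return "null";
    case Object::Type::Integer:
      return std::to_string(static_cast<const IntegerObject&>(o).value);
    case Object::Type::String:
      return "\"" + static_cast<const StringObject&>(o).value + "\"";
    case Object::Type::UntypedList: {
      const auto& items = static_cast<const UntypedListObject&>(o).items;
      std::string out = "[";
      for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += ", ";
        out += describe(*items[i]);
      }
      return out + "]";
    }
  }
  return "?";
}

}  // namespace sketch
```

A caller would fill `items` with `std::make_unique<sketch::IntegerObject>(7)` and similar values, then pass the list to `sketch::describe`. The tag-plus-downcast pattern is one common way to work with such a hierarchy; the real library may expose different accessors.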

Supported Platforms

hessian2-codec requires a codebase and compiler compliant with the C++14 standard or newer.

The hessian2-codec code is officially supported on the following platforms. Operating systems or tools not listed below are community-supported. For community-supported platforms, patches that do not complicate the code may be considered.

If you notice any problems on your platform, please file an issue on the hessian2-codec GitHub Issue Tracker. Pull requests containing fixes are welcome!

Operating Systems

  • Linux
  • macOS
  • Windows (theoretically supported, but untested)

Compilers

  • gcc 7.0+
  • clang 7.0+

Build Systems

  • Bazel

Who Is Using hessian2-codec?

In addition to many internal projects at Alibaba, hessian2-codec is also used by the following notable projects:

  • Envoy (the Dubbo filter in Envoy will use hessian2-codec as its serializer).

Related Open Source Projects

Contributing

Please read CONTRIBUTING.md for details on how to contribute to this project.

Happy testing!

Develop

Generate compile_commands.json for this repo with bazel run :refresh_compile_commands. Thanks to https://github.com/hedronvision/bazel-compile-commands-extractor for providing the great script/tool that makes this so easy!

License

hessian2-codec is distributed under Apache License 2.0.

Acknowledgements

hessian2-codec's Issues

The hessian2 specification has some errors and also some details not explained

Not sure whether I should submit the issue here, but the specification is really confusing.

For example:

  1. binary: in section 3 (Hessian Grammar), 0x41 ('A') represents a non-final chunk, while section 4.1 (binary) says x62 ('b') represents any non-final chunk.

  2. string: in section 3 (Hessian Grammar), [x30-x34] <utf8-data> represents a string of length 0~1023, while in section 4.12 (string) it is [x30-x33] b0 <utf8-data>, with no further explanation.

I may add more cases later.
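
To make the string case concrete, here is a minimal sketch of the length prefix for a final string chunk. It assumes the section 4.12 form ([x30-x33] b0, with the lead byte computed as 0x30 + (len >> 8)), which matches what the reference Java implementation writes as far as I know; it does not settle which part of the specification is authoritative. The function name `stringLengthPrefix` is hypothetical and not part of hessian2-codec.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: length prefix for a final string chunk of `len`
// characters, following the forms listed in section 4.12 of the spec.
std::vector<uint8_t> stringLengthPrefix(std::size_t len) {
  assert(len <= 0xFFFF);  // longer strings must be split into chunks
  if (len <= 0x1F) {
    // Compact form: a single byte in [x00-x1f] that is the length itself.
    return {static_cast<uint8_t>(len)};
  }
  if (len <= 0x3FF) {
    // Short form: [x30-x33] b0, i.e. the top two bits of a 10-bit length are
    // folded into the lead byte. 0x3FF >> 8 == 3, so 0x34 is never produced.
    return {static_cast<uint8_t>(0x30 + (len >> 8)),
            static_cast<uint8_t>(len & 0xFF)};
  }
  // General form: 'S' (0x53) followed by a big-endian 16-bit length.
  return {0x53, static_cast<uint8_t>(len >> 8),
          static_cast<uint8_t>(len & 0xFF)};
}
```

Under this assumption a length of 1023 yields the bytes 0x33 0xFF, so the lead byte 0x34 from the grammar section would never be emitted by an encoder, although a decoder may still have to decide how to treat it.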

Performance of String decoding

It seems that parsing UTF-8 strings takes up too much CPU time in the decode process.
I've done some research and it might be possible to optimize it.
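
One possible direction, sketched under assumptions: hessian string lengths count characters rather than bytes, so a decoder has to walk the UTF-8 data to find where the string ends. The hypothetical helper below looks only at lead bytes to compute the byte length, without decoding code points or validating continuation bytes. It counts every UTF-8 sequence as one character, which is an assumption and may not match how the library or Java counts characters above 0xFFFF. It is not the optimization the issue author had in mind, which the issue does not describe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returned when the data is truncated or a lead byte is not valid UTF-8.
constexpr std::size_t kInvalid = static_cast<std::size_t>(-1);

// Hypothetical helper: number of bytes occupied by the first `chars` UTF-8
// sequences in [data, data + size). Only lead bytes are inspected.
std::size_t utf8ByteLength(const uint8_t* data, std::size_t size,
                           std::size_t chars) {
  assert(data != nullptr || size == 0);
  std::size_t pos = 0;
  while (chars > 0) {
    if (pos >= size) return kInvalid;
    const uint8_t lead = data[pos];
    std::size_t width = 0;
    if (lead < 0x80) {
      width = 1;  // ASCII, the common case, is decided by one comparison
    } else if ((lead & 0xE0) == 0xC0) {
      width = 2;
    } else if ((lead & 0xF0) == 0xE0) {
      width = 3;
    } else if ((lead & 0xF8) == 0xF0) {
      width = 4;
    } else {
      return kInvalid;  // continuation byte or invalid lead byte
    }
    if (size - pos < width) return kInvalid;
    pos += width;
    --chars;
  }
  return pos;
}
```

Once the byte length is known, the bytes can be copied into the result in one step instead of being appended character by character. Whether that helps in practice would have to be measured.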

string encoding/decoding is incompatible with dubbo-hessian-lite for four-byte chars (> 0xFFFF)

According to the Hessian2 standard, strings should be UTF-8 encoded. Java, however, uses UTF-16 strings, and when dubbo-hessian-lite encodes a string it treats every UTF-16 char as one UTF-8 character, which is wrong.

Since most Dubbo users use dubbo-hessian-lite, we should try to provide similar behaviour for compatibility, even though it is wrong.
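
The difference can be shown with a small self-contained sketch. It assumes the behaviour described in this issue: a code point above 0xFFFF is first split into a UTF-16 surrogate pair, and each surrogate is then written as its own three-byte sequence. The function names are hypothetical and are not part of hessian2-codec or dubbo-hessian-lite.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// UTF-8 bit layout for one value. Surrogates are deliberately not rejected,
// so that the per-unit variant below can reuse this function.
std::string encodeUtf8(uint32_t cp) {
  assert(cp <= 0x10FFFF);
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Assumed dubbo-hessian-lite style: encode each UTF-16 code unit separately.
std::string encodePerUtf16Unit(uint32_t cp) {
  if (cp <= 0xFFFF) return encodeUtf8(cp);
  const uint32_t v = cp - 0x10000;
  const uint32_t high = 0xD800 + (v >> 10);   // high surrogate
  const uint32_t low = 0xDC00 + (v & 0x3FF);  // low surrogate
  return encodeUtf8(high) + encodeUtf8(low);
}
```

For U+1F600, `encodeUtf8` produces the four bytes F0 9F 98 80, while `encodePerUtf16Unit` produces the six bytes ED A0 BD ED B8 80. A decoder that expects one form will misread the other, which is the incompatibility reported here.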

duplicated symbol error

If we need to decode/encode string, int32_t, etc. in different .cc files, each of those files has to include the basic_codec headers, either directly or indirectly. Duplicated symbol errors then occur at link time, because some methods are defined in more than one compilation unit.

Maybe we need to declare the template specialization methods in basic_codec as inline or extern to solve this problem.
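
The `inline` option can be sketched as follows. A full (explicit) specialization of a function template is an ordinary function, so when its definition sits in a header that several .cc files include, it needs `inline` to avoid violating the one-definition rule. `MiniEncoder` is a hypothetical stand-in and not the library's Encoder class.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a codec class with a member function template.
class MiniEncoder {
 public:
  template <typename T>
  bool encode(const T& value);

  const std::string& output() const { return out_; }

 private:
  std::string out_;
};

// Without `inline`, every translation unit that includes this header would
// emit its own definition of this specialization and the linker would report
// a duplicate symbol. With `inline`, the definitions are merged.
template <>
inline bool MiniEncoder::encode<int32_t>(const int32_t& value) {
  out_ += std::to_string(value);
  return true;
}

template <>
inline bool MiniEncoder::encode<std::string>(const std::string& value) {
  out_ += value;
  return true;
}
```

The alternative mentioned above is to keep only the declarations of the specializations in the header and move the definitions into a single .cc file, which avoids `inline` at the cost of one more compiled source file.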

Unified naming convention

In the project, there are currently three different naming conventions for member functions:

  • camelCase: the methods of most classes.
  • PascalCase: methods of hessian2::Reader/hessian2::Writer such as ReadNbytes, ByteAvailable, etc.
  • snake_case: some methods of hessian2::Object, such as to_string, to_long, etc.

It would be nice if they could be unified.

Use a Protobuf message to replace Object and implement all hessian types

C++ has no global parent type like Java's Object, so no single type can represent every hessian type, which is why we created an Object base class from which all hessian types inherit. However, this Object is still not easy to use, is complex to construct, and does not offer a complete API comparable to that of a standard container. If we define all hessian types with Protobuf and use the Protobuf Message object as our base Object class, we can reuse the entire API that Protobuf provides and improve overall usability.
