GithubHelp home page GithubHelp logo

Encoding problems with bytea about libpqxx HOT 22 OPEN

Broollie avatar Broollie commented on August 27, 2024
Encoding problems with bytea

from libpqxx.

Comments (22)

jtv avatar jtv commented on August 27, 2024 1

The good news is that I got the documentation build working again. (With lots of scary warning messages, so I'll be chipping away at that pile still.) I've got one other small feature to build, and then hopefully I can swap this ticket back into my personal memory and see what I can do about it.

from libpqxx.

tt4g avatar tt4g commented on August 27, 2024

When you send prepared statement parameters with libpq, all parameters are converted once into strings.
If you only write $1 in the SQL, PostgreSQL will treat it as a string and will try to interpret it as client encoding UTF-8, which will fail.

Try replacing $1 with $1::bytea.

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

When you send prepared statement parameters with libpq, all parameters are converted once into strings. If you only write $1 in the SQL, PostgreSQL will treat it as a string and will try to interpret it as client encoding UTF-8, which will fail.

Try replacing $1 with $1::bytea.

Thank you for your suggestion, but I tried adding ::bytea earlier and it didn't work. But thanks to your explanation I ve got a soluction that does work. By passing input into pqxx::internal::esc_bin it will produce escaped string and will work correctly afterwards.

I would guess, that it also should not be too expensive by calling it this way:
pqxx::internal::esc_bin({badData.begin(), badData.end()})
as it will create a view for an input argument. But I wonder, if there is a better solution.

Anyways, i will also include my solution there in case somebody will find it useful:

#include <pqxx/pqxx>
#include <iostream>
#include <format>

inline constexpr std::size_t DataSize = 8;

using Data = std::array<std::byte, DataSize>;

namespace pqxx
{
    template <>
    struct string_traits<Data>
    {
        // static constexpr bool converts_to_string{true};
        static constexpr bool converts_from_string{true};

        static constexpr auto DataSize = internal::size_esc_bin(std::tuple_size_v<Data>);

        static Data from_string(std::string_view text)
        {
            if (text.size() + 1 != DataSize) //size_esc_bin adds 1 byte for \0 but std::string_view::size doesn't include it
            {
                throw pqxx::conversion_error(std::format("invalid data size: {}. Expected: {}", text.size(), DataSize));
            }

            Data buf;
            pqxx::internal::unesc_bin(text, reinterpret_cast<std::byte *>(buf.data()));
            return buf;
        }

        static zview to_buf(char *begin, char *end, Data const &value)
        {
            if (end - begin < DataSize)
            {
                throw pqxx::conversion_overrun(std::format("to_buf: unable to fit data into buffer with size {}", end - begin));
            }

            return generic_to_buf(begin, end, value);
        }

        static char *into_buf(char *begin, char *end, Data const &value)
        {
            if (end - begin < DataSize)
            {
                throw pqxx::conversion_overrun(std::format("into_buf: unable to fit data into buffer with size {}", end - begin));
            }

            for (auto i = 0; i < value.size(); i++)
            {
                begin[i] = static_cast<char>(value.at(i));
            }
            begin[value.size()] = '\0';

            return begin + DataSize;
        }

        static size_t size_buffer(Data const &value) noexcept
        {
            return DataSize;
        }
    };
}

namespace std
{
    template <>
    struct formatter<Data> : formatter<string>
    {
        auto format(const Data &data, format_context &ctx) const
        {
            int tmpData[DataSize];
            std::transform(data.begin(), data.end(), tmpData, [](auto elem)
                           { return static_cast<int>(elem); });
            return formatter<string>::format(
                std::format("{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}{:02x}",
                            tmpData[0], tmpData[1], tmpData[2], tmpData[3],
                            tmpData[4], tmpData[5], tmpData[6], tmpData[7]),
                ctx);
        }
    };
}

inline constexpr std::byte operator""_bt(unsigned long long c)
{
    return static_cast<std::byte>(c);
}

auto main(int argc, const char *argv[]) -> int
{
    static constexpr std::string_view ConnectionQuerySchema = "host={} port={} dbname={} user={} password={}";
    static constexpr std::string_view InsertionSchema = "INSERT INTO test_table VALUES ('\\x{}')";

    static constexpr Data okData{0x01_bt, 0x01_bt, 0x01_bt, 0x01_bt, 0x01_bt, 0x01_bt, 0x01_bt, 0x01_bt};
    static constexpr Data badData{0xd6_bt, 0x5f_bt, 0xd6_bt, 0x5f_bt, 0xd6_bt, 0x5f_bt, 0xd6_bt, 0x5f_bt};

    try
    {
        pqxx::connection con(std::format(ConnectionQuerySchema, argv[1], argv[2], argv[3], argv[4], argv[5]));
        con.prepare("insert", "INSERT INTO test_table VALUES ($1)");
        con.prepare("view", "SELECT * FROM test_table");

        pqxx::nontransaction nt(con);
        nt.exec0("CREATE TABLE IF NOT EXISTS test_table(data bytea NOT NULL)");

        std::cout << "Ok: " << std::boolalpha << nt.exec_prepared0("insert", okData).empty() << std::endl;
        const auto query = std::format(InsertionSchema, badData);
        std::cout << "Query: " << query << " Also ok: " << nt.exec0(query).empty() << std::endl;
        std::cout << "Bad: " << nt.exec_prepared0("insert", pqxx::internal::esc_bin({badData.begin(), badData.end()})).empty() << std::endl;

        const auto result = nt.exec_prepared("view");

        for(auto row : result)
        {
            auto [data] = std::move(row.as<Data>());
            std::cout << std::format("{}", data) << std::endl; 
        }
    }
    catch (const std::exception &e)
    {
        std::cerr << e.what() << '\n';
    }
}

from libpqxx.

tt4g avatar tt4g commented on August 27, 2024

Glad It's solved.
I should have noticed earlier that unesc_bin() was being called and not esc_bin().

from libpqxx.

jtv avatar jtv commented on August 27, 2024

Oops, and I'm only just trying to catch up! Thanks once again @tt4g for jumping in.

@Broollie I did notice this one little thing:

        static Data from_string(std::string_view text)
        {
            if (text.size() != DataSize)
            {
                throw pqxx::conversion_error(std::format("invalid data size: {}. Expected: {}", text.size(), DataSize));
            }
            // ...
        }

It looks to me here as if you're using DataSize for two very different quantities: on the one hand, it's the number of binary bytes in a Data value, but on the other, it's also the number of text bytes you expect in the SQL representation of a Data value. Those are two very different things!

from libpqxx.

jtv avatar jtv commented on August 27, 2024

(I took the liberty to edit the code snippets you posted above: when you add "cxx" after the three backticks at the beginning, you get C++ syntax highlighting. :-)

from libpqxx.

jtv avatar jtv commented on August 27, 2024

And I must admit that the mechanism by which libpq figures out the types of the parameters is mysterious to me as well.

By the way if you're working in C++20 @Broollie, do you actually need your own string_traits implementation? I think Data satisfies the binary concept defined in include/pqxx/strconv.hxx so you should have a conversion out of the box. ISTM you should be able to just use that, or if you want the size check in every conversion, then you could at least delegate to it.

Generally it's not a good idea to rely on items from the pqxx::internal:: namespace, because I reserve the right to change those on a whim. In particular, libpqxx 8 will require C++20 or better, and I'm hoping to replace a lot of these functions with ones that use std::span and what have you.

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

Oops, and I'm only just trying to catch up! Thanks once again @tt4g for jumping in.

It looks to me here as if you're using DataSize for two very different quantities: on the one hand, it's the number of binary bytes in a Data value, but on the other, it's also the number of text bytes you expect in the SQL representation of a Data value. Those are two very different things!

Hey, it s ok, i am myself always late :D Regarding DataSize, ye, that was my mistake on naming constants, didn't notice that I already have DataSize. It should be more like this

// ...
inline constexpr std::size_t DataSize = 8;
struct string_traits<Data>
{
    static constexpr auto EscDataSize = internal::size_esc_bin(std::tuple_size_v<Data>);
    // ...

By the way if you're working in C++20 @Broollie, do you actually need your own string_traits implementation? I think Data satisfies the binary concept defined in include/pqxx/strconv.hxx so you should have a conversion out of the box. ISTM you should be able to just use that, or if you want the size check in every conversion, then you could at least delegate to it.

That s actually what made me attempt to do it in the first place, because std::array satisfies std::ranges::contiguous_range
and so std::array<std::byte, N> should be usable for this. But it does require to define string_traits, specifically because it cannot resolve string_traits::from_string(text):

In file included from /home/linuxbrew/.linuxbrew/include/pqxx/internal/concat.hxx:7,
                 from /home/linuxbrew/.linuxbrew/include/pqxx/connection.hxx:36,
                 from /home/linuxbrew/.linuxbrew/include/pqxx/array.hxx:26,
                 from /home/linuxbrew/.linuxbrew/include/pqxx/pqxx:4,
                 from test.cpp:1:
/home/linuxbrew/.linuxbrew/include/pqxx/strconv.hxx: In instantiation of 'TYPE pqxx::from_string(std::string_view) [with TYPE = std::array<std::byte, 8>; std::string_view = std::basic_string_view<char>]':
/home/linuxbrew/.linuxbrew/include/pqxx/field.hxx:230:28:   required from 'T pqxx::field::as() const [with T = std::array<std::byte, 8>]'
/home/linuxbrew/.linuxbrew/include/pqxx/row.hxx:284:65:   required from 'auto pqxx::row::get_field() const [with TUPLE = std::tuple<std::array<std::byte, 8> >; long unsigned int index = 0]'
/home/linuxbrew/.linuxbrew/include/pqxx/row.hxx:278:53:   required from 'auto pqxx::row::get_tuple(std::index_sequence<__indices ...>) const [with TUPLE = std::tuple<std::array<std::byte, 8> >; long unsigned int ...indexes = {0}; std::index_sequence<__indices ...> = std::integer_sequence<long unsigned int, 0>]'
/home/linuxbrew/.linuxbrew/include/pqxx/row.hxx:192:42:   required from 'std::tuple<_UTypes ...> pqxx::row::as() const [with TYPE = {std::array<std::byte, 8>}]'
test.cpp:114:43:   required from here
/home/linuxbrew/.linuxbrew/include/pqxx/strconv.hxx:441:42: error: 'from_string' is not a member of 'pqxx::string_traits<std::array<std::byte, 8> >'
  441 |   return string_traits<TYPE>::from_string(text);

And it s not just pqxx::row::as(), but pqxx::row::reference::get<>() as well.

Generally it's not a good idea to rely on items from the pqxx::internal:: namespace, because I reserve the right to change those on a whim. In particular, libpqxx 8 will require C++20 or better, and I'm hoping to replace a lot of these functions with ones that use std::span and what have you.

Ye, looking through docs once again, i noticed convenient pqxx::binary_cast that not only works in my case, but also produces a view, so no copies (unlike pqxx::internal::esc_bin, which produces std::string)

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

Upon further investigation, i found out that PQXX_HAVE_CONCEPTS was not defined, so now I wonder if there are other defines, that may be missing and, if defined, should enable direct use of std::array<std::byte, N>

from libpqxx.

jtv avatar jtv commented on August 27, 2024

The configure or cmake step to the libpqxx build should detect whether your compiler supports concepts.

I'm writing this on my phone where it's hard to look up past conversation... Did you select C++20 (or newer) when configuring the build? Because if so, I would definitely have expected it to detect concepts support.

By the way beware that esc_bin and binary_cast do entirely different things! The former converts binary data to SQL text format. The latter merely creates a view from a pointer, without converting.

from libpqxx.

jtv avatar jtv commented on August 27, 2024

@Broollie when building libpqxx, did you use configure or cmake? If you used configure then the file config.log should show whether the configure script found __cpp_concepts defined or not. (And just in case: what matters is the C++ version you selected when configuring the build, i.e. when running either configure or cmake.)

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

@jtv Ok, i ve figured out what s going on. Mostly. So, basically in my main project i build libpqxx with cmake and all defines are present there. But for this test I use libpqxx installed with brew package manager. And defines are missing there because by default libpqxx v7.9 uses CMAKE_CXX_STANDARD 17. But it is not as important. So the reason why I can't use std::array<std::byte, N> directly is that string_traits already has implementation for std::array and it doesn't have from_string() defined. I checked if you can derive from internal::array_string_traits<> and simply implement from_string() and it does work in my case

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

But it also seems like I m missing something, because direct call to pqxx::string_traits<Data>::size_buffer(OkData) results in linkage error for pqxx::oops_forbidden_conversion<std::byte>(), which is not a great sign

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

By the way beware that esc_bin and binary_cast do entirely different things! The former converts binary data to SQL text format. The latter merely creates a view from a pointer, without converting.

Oh, and to clarify, I saw, that binary_cast simply creates a std::basic_string_view<std::byte> without escaping data and since its behavior is defined, it should work just fine. Seems to be correct

from libpqxx.

jtv avatar jtv commented on August 27, 2024

Ah yes, the forbidden conversion - I guess that's two concepts clashing.

A span of contiguous std::byte should act as a bytea. And I was expecting a std::array of std::byte to act as one of those.

But more generically, a std::array of any other type should convert as a bunch of individual elements, wrapped in SQL array syntax. But that won't work for std::byte because by design there is no conversion for std::byte.

So we'll need a way to tell the compiler that we want the former in this case, not the latter.

from libpqxx.

jtv avatar jtv commented on August 27, 2024

Or, given that the most specific specialisation should normally win out, perhaps it's a matter of removing a string_traits specialisation that has now become too specific.

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

That is indeed a clash of 2 while not completely different, but still contradicting concepts. There are enough ways of resolving this conflict while keeping backward compatibility (or you can ignore it in favor of implementing changes for 8.x version and some changes in traits API, if there are any). I would guess it may be solved either by creating more specialisations (like having different implementations for std::array<T, N> and std::array<std::byte, N>), that would make string_traits even more specific, or by going other way and creating more generic, yet more flexible traits API. But those are extremes of both sides, so a solution may lie between them

from libpqxx.

jtv avatar jtv commented on August 27, 2024

@Broollie for now I'm just utterly stuck on getting readthedocs working again. I'm rebuilding that from scratch, because I see no other way at this point. Hopefully after that I can focus on actual progress again.

from libpqxx.

Broollie avatar Broollie commented on August 27, 2024

@jtv sure, we can just keep the issue open so that you can return to it once you r done

from libpqxx.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.