GithubHelp home page GithubHelp logo

Comments (16)

dehesa avatar dehesa commented on September 13, 2024 1

Would it be fine to get a cdv with random order when not setting the headers?

Yes and no. In CSV files, the column order implies meaning, therefore the user must know exactly the order of the generated outcome. It cannot be "execution magic". Also, the order must be consistent between several Student encoding.

That is however not the major problem, I believe we can find an "easy solution" for that.

from codablecsv.

lightsprint09 avatar lightsprint09 commented on September 13, 2024 1

Hey @dehesa,

sounds awesome. I will have a look

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

Hi @lightsprint09 sorry for the delay answering (somehow I am not receiving notifications on this project).

Any help is welcome, so if you want to toil at it, I would be happy to review or help. Most of the architecture for Encodable is there and it should be fairly similar to Decodable (which is currently working). To be honest I don't remember what it is missing (since it has been some time ago). I will try to look at it this weekend and I can give you a more thorough answer.

The problem is that adopting and developing for Codable is not straight forward since there is a lot of intermediate structures. It should be a fairly entertaining educational experience, though :D

from codablecsv.

lightsprint09 avatar lightsprint09 commented on September 13, 2024

Sounds good. Would you mind if I set up Travis CI?

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

That sounds interesting. I have never done it myself in an OpenSource project from Github. As far as I understand it is a CI system to run your test on every commit. But what will it actually entail? Do we have to keep a specific Git structure, who has access to its configuration?, etc.

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

@lightsprint09 I managed to take a look at the state of the code this weekend and the major problem with the Encodable adoption is the "random access" encoding that the user might decide to do while using encode(to:).

In simple words:

  • The Codable support is based in the underlying CSVReader/CSVWriter.
  • The CSVReader/CSVWriter are sequential and are based on streams (currently only supporting in-memory), but with only additive modifications they can access files directly or actually any other streams (including on-demand encoding/decoding for network streams).
  • The current problem with encoding is that the user might have written a CSV row that is further ahead (lets say index 10) and then decide to write in row at index 6 (for example).
    Based on the sequential stream nature of CSVWriter that is not allowed.

Thus, there are several options we can pursue:

  • The easiest/ugliest: To throw an encoding error as soon as the user tries to do that and implement the most straight forward CSVWriter operation.
  • Implement an intermediate in-memory cache where all records are being stored and only write them on the stream once the encoding has completed.
  • Don't use CSVWriter and implement something else instead not using Foundation's OutputStream (which will hinder future usage for file or network writing).

from codablecsv.

lightsprint09 avatar lightsprint09 commented on September 13, 2024

How is it possible to Writer further ahead? Using the same stream

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

Well, it is not yet implemented, but I suppose that if the user decide to write in row 10 and they are currently in row 5; the CSVWriter would write 4 row delimiters and then start writing whatever the user inputs.

Writing empty rows is actually quite common in CSV files.

from codablecsv.

lightsprint09 avatar lightsprint09 commented on September 13, 2024
let stream = OutputStream(url: ...)
let encoder = CSVEncoder(stream: stream)
let codable = ["A", "B", "C"]

try encoder.encode(codable)

How would I define where top write. I would expect to write at the end of there stream?

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

OutputStream is just a dumb pipeline that only "transmits" the bytes you are sending. Therefore, you can only write sequentially through its write(_ buffer: UnsafePointer<UInt8>, maxLength len: Int) function.

Encodable, however, offers a way to randomly-access where to write through their KeyedEncodingContainer (it is similar to a dictionary where you pass a CodingKey specifying where to write and the value you are encoding).

As you can see, there is an incompatibility from the sequential expectations of OutputStream and the freedom given to the user with KeyedEncodingContainer. The solution must decide how to operate:

  • Option 1 is to restrict the freedom to randomly-write by throwing an error when you do so.
    Users will probably find that a bit too "crash".
  • Option 2 is to implement an intermediate buffer and only when completed write sequentially from the beginning to OutputStream.
  • Option 3 won't use OutputStream.

from codablecsv.

lightsprint09 avatar lightsprint09 commented on September 13, 2024

Ok. I see.

Seems like I don't understand the Codable implementation correctly.

Can you show me an example how one would implement Encodable so it would be problematic?

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

Don't despair, the Codable internal behavior takes some time to get used to :D Let me give you several examples:

Compiler Generation Example

Let's suppose you have a CSV file listing students in a school. Each row is a student and the columns are: name, age, hasPet. The simple Encodable definition would be:

struct Student: Encodable {
    let name: String
    let age: Int
    let hasPet: Bool
}

Then you can simple use the CSVEncoder as so:

let students = [
    Student(name: "Heidrun", age: 12, hasPet: true),
    Student(name: "Gudrun", age: 11, hasPet: false)
]

let encoder = CSVEncoder()
encoder.encode(students)

Generated Code

You are letting the compiler generate the encode() function for you. The compiler will write the following:

extension Student {
    private enum CodingKeys: String, CodingKey {
        case name, age, hasPet
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(self.name, forKey: .name)
        try container.encode(self.age, forKey: .age)
        try container.encode(self.hasPet, forKey: .hasPet)
    }
}

But you are also using the internal encode() function from the Array implementation, which probably looks as follows:

extension Array {
    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for element in self {
            try container.encode(element)   // where an element is a student
        }
    }
}

All Together

So the actions when encoder.encode(students) is executed will look as follows:

  1. Call the internal Array implementation of encode().
  2. The function creates an UnkeyedContainer (from the shadow encoder) which will sequentially go through each element/student.
  3. Call the generated encode() on the Student structure.
  4. Create a KeyedEncodingContainer (from the unkeyedContainer) each time a student is reached.
  5. Encode the students properties.

The rows/students will be sequentially encoded, so we wouldn't have the "randomly-accessible writes" problems there. However, nowhere is defined the order of the row's fields (i.e. the students properties). The CodingKeys generated by the compiler use String values (not Ints). At this moment the program will likely crash. You can overcome this first problem in several ways:

  • The user shall specify the headers in the CSVEncoder.headers settings.
    There you are passing an array of Strings and thus you can match String values with Int values (i.e. the position in the array).
    let encoder = CSVEncoder()
    encoder.headers = ["name", "age", "hasPet"]     // The values must exactly matched the `CodingKeys` values generated by the compiler
  • Make our UnkeyedContainer "clever" by making it figure out the definition order of CodingKeys.
    Some solutions might be explored, such as CaseIterable requirements. I haven't seen whether this is possible.

If the previous problem is solved correctly. The program will most likely matched the CodingKeys with sequential Int values and it will generate the expected encoded CSV.

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

The previous example was the simple/naive one. Let's suppose the framework user is savvy on the Codable ways. In which case, she can define things more tightly.

Unkeyed Fields Example

Lets have the same Student listing as the previous example, but now lets write our own Encodable conformance.

struct Student: Encodable {
    let name: String
    let age: Int
    let hasPet: Bool

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(self.name)
        try container.encode(self.age)
        try container.encode(self.hasPet)
    }
}

In this example, we achieve the best performance since we are creating a sequential container (UnkeyedContainer) and then we are encoding one field at a time. For this case, we could write directly to CSVWriter.

Keyed Fields Examples (Int Values)

We could have decided to use a KeyedEncodingContainer instead of an UnkeyedContainer.

struct Student: Encodable {
    let name: String
    let age: Int
    let hasPet: Bool

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(self.name)
        try container.encode(self.age)
        try container.encode(self.hasPet)
    }

    private enum CodingKeys: Int, CodingKey {
        case name = 2
        case age = 1
        case hasPet = 0
    }
}

Interestingly, here we are defining the order of the fields within a row; and weirdly the definition order is not matching the encoding order. Here we encounter the first "randomly-access write" problem. Albeit, in a low scale, since it only pertain to a single row.

We could solve this problem by caching a whole row and only writing it when a new row is started or the end of the file is reached.

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

Unordered Rows Example

Lets suppose a similar case (same students listing), but now the row positions have meanings. Lets say:

struct Class: Encodable {
    var heidrun: Student
    var gutrun: Student
    var alexander: Student
    var sigrid: Student

    private enum CodingKeys: Int, CodingKey {
        case alexander = 1
        case gutrun = 2
        case heidrun = 3
        case sigrid = 4
    }
}

The compiler generated code will start encoding on heidrun, although the first row should be alexander. Here lays one of the major problems. For correct implementation, we would need to cache the whole CSV file and only write it on the OutputStream when we are sure of the order.

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

Hi @lightsprint09,

Just as a heads-up, I have reimplemented the Decodable part of the library making it much simpler, using small value types across the board, and building the buffering infrastructure.

I will probably write the Encodable system in the following two months, but if you ever wanted to challenge yourself and try a Codable implementation; now would be a good time 😄

Cheers

from codablecsv.

dehesa avatar dehesa commented on September 13, 2024

Alright, I am happy to announce that after some hard laboring, I have the first encodable implementation. It still needs polishing and support for other buffering strategy, but it currently works 😄

You can find it in the develop branch if you are still interested. It will take me a bit more time to land it in master, though.

from codablecsv.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.