Comments (16)
Would it be fine to get a CSV with random column order when not setting the headers?
Yes and no. In CSV files the column order implies meaning, so the user must know the exact order of the generated output. It cannot be "execution magic". Also, the order must be consistent across several `Student` encodings.
That is, however, not the major problem; I believe we can find an "easy solution" for it.
from codablecsv.
Hey @dehesa,
sounds awesome. I will have a look
Hi @lightsprint09, sorry for the delay answering (somehow I am not receiving notifications on this project).
Any help is welcome, so if you want to work on it, I would be happy to review or help. Most of the architecture for `Encodable` is there, and it should be fairly similar to `Decodable` (which is currently working). To be honest, I don't remember what is missing (it has been some time). I will try to look at it this weekend and give you a more thorough answer.
The problem is that adopting and developing for `Codable` is not straightforward, since there are a lot of intermediate structures. It should be a fairly entertaining educational experience, though :D
Sounds good. Would you mind if I set up Travis CI?
That sounds interesting. I have never done it myself in an open-source project on GitHub. As far as I understand, it is a CI system that runs your tests on every commit. But what will it actually entail? Do we have to keep a specific Git structure? Who has access to its configuration? Etc.
@lightsprint09 I managed to take a look at the state of the code this weekend, and the major problem with the `Encodable` adoption is the "random access" encoding that the user might decide to do while using `encode(to:)`.
In simple words:
- The `Codable` support is based on the underlying `CSVReader`/`CSVWriter`.
- `CSVReader`/`CSVWriter` are sequential and based on streams (currently only supporting in-memory), but with only additive modifications they could access files directly or indeed any other stream (including on-demand encoding/decoding for network streams).
- The current problem with encoding is that the user might have written a CSV row further ahead (let's say at index `10`) and then decide to write the row at index `6` (for example). Given the sequential stream nature of `CSVWriter`, that is not allowed.
Thus, there are several options we can pursue:
- The easiest/ugliest: throw an encoding error as soon as the user tries to do that, and implement the most straightforward `CSVWriter` operation.
- Implement an intermediate in-memory cache where all records are stored and only written to the stream once encoding has completed.
- Don't use `CSVWriter` and implement something else that does not use Foundation's `OutputStream` (which would hinder future usage for file or network writing).
How is it possible to write further ahead using the same stream?
Well, it is not yet implemented, but I suppose that if the user decides to write in row 10 while currently in row 5, the `CSVWriter` would write 4 row delimiters and then start writing whatever the user inputs.
Writing empty rows is actually quite common in CSV files.
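To make the padding concrete, here is a minimal sketch of that behavior (a hypothetical `SequentialRowWriter` type, not part of the library): a sequential writer that fills the gap with empty rows and rejects writes behind the current position.

```swift
// Hypothetical sketch: a sequential writer that pads with empty rows
// when asked to write further ahead, and refuses backward writes.
struct SequentialRowWriter {
    private(set) var rows: [String] = []

    struct BackwardWriteError: Error {}

    /// Writes `row` at `index`, padding any gap with empty rows.
    /// Throws if `index` is behind the rows already written.
    mutating func write(_ row: String, at index: Int) throws {
        guard index >= rows.count else { throw BackwardWriteError() }
        while rows.count < index { rows.append("") } // empty-row padding
        rows.append(row)
    }
}

var writer = SequentialRowWriter()
try writer.write("a", at: 0)
try writer.write("b", at: 3) // pads rows 1 and 2 with empty rows
```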
```swift
let stream = OutputStream(url: ...)
let encoder = CSVEncoder(stream: stream)
let codable = ["A", "B", "C"]
try encoder.encode(codable)
```

How would I define where to write? I would expect it to write at the end of the stream?
`OutputStream` is just a dumb pipeline that only "transmits" the bytes you send to it. Therefore, you can only write sequentially, through its `write(_ buffer: UnsafePointer<UInt8>, maxLength len: Int)` function.
`Encodable`, however, offers a way to randomly access where to write through its `KeyedEncodingContainer` (it is similar to a dictionary where you pass a `CodingKey` specifying where to write, along with the value you are encoding).
As you can see, there is an incompatibility between the sequential expectations of `OutputStream` and the freedom given to the user with `KeyedEncodingContainer`. The solution must decide how to operate:
- Option 1 restricts the freedom to write randomly by throwing an error whenever the user does so. Users will probably find that a bit too harsh.
- Option 2 implements an intermediate buffer and, only once encoding completes, writes sequentially from the beginning to `OutputStream`.
- Option 3 doesn't use `OutputStream` at all.
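Option 2 could be sketched roughly like this (a hypothetical `RowBuffer` type, purely illustrative and not part of the library's API): rows are buffered by index in any order, then flushed sequentially.

```swift
// Hypothetical sketch of the intermediate in-memory cache (option 2):
// rows can be set in any order; `flush()` emits them sequentially,
// with empty rows filling any gaps.
final class RowBuffer {
    private var rows: [Int: [String]] = [:]

    func set(row fields: [String], at index: Int) {
        rows[index] = fields
    }

    /// Joins every buffered row into CSV lines, in index order.
    func flush() -> [String] {
        guard let last = rows.keys.max() else { return [] }
        return (0...last).map { index in
            (rows[index] ?? []).joined(separator: ",")
        }
    }
}

let buffer = RowBuffer()
buffer.set(row: ["C"], at: 2)      // written "further ahead" first
buffer.set(row: ["A", "B"], at: 0) // then a backward write, no problem
let lines = buffer.flush()
```

The trade-off is memory: the whole document lives in the buffer until encoding completes.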
Ok, I see.
It seems I don't understand the `Codable` implementation correctly. Can you show me an example of how one would implement `Encodable` so that it would be problematic?
Don't despair, the `Codable` internal behavior takes some time to get used to :D Let me give you several examples:
Compiler Generation Example
Let's suppose you have a CSV file listing students in a school. Each row is a student and the columns are: `name`, `age`, `hasPet`. The simple `Encodable` definition would be:
```swift
struct Student: Encodable {
    let name: String
    let age: Int
    let hasPet: Bool
}
```
Then you can simply use the `CSVEncoder` like so:

```swift
let students = [
    Student(name: "Heidrun", age: 12, hasPet: true),
    Student(name: "Gudrun", age: 11, hasPet: false)
]
let encoder = CSVEncoder()
try encoder.encode(students)
```
Generated Code
You are letting the compiler generate the `encode(to:)` function for you. The compiler will write the following:
```swift
extension Student {
    private enum CodingKeys: String, CodingKey {
        case name, age, hasPet
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(self.name, forKey: .name)
        try container.encode(self.age, forKey: .age)
        try container.encode(self.hasPet, forKey: .hasPet)
    }
}
```
But you are also using the internal `encode(to:)` function from the `Array` implementation, which probably looks something like this:

```swift
extension Array: Encodable where Element: Encodable {
    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        for element in self {
            try container.encode(element) // where an element is a student
        }
    }
}
```
All Together
So the actions when `encoder.encode(students)` is executed will look as follows:
- Call the internal `Array` implementation of `encode(to:)`.
- That function creates an `UnkeyedContainer` (from the shadow encoder), which will sequentially go through each element/student.
- Call the generated `encode(to:)` on the `Student` structure.
- Create a `KeyedEncodingContainer` (from the `UnkeyedContainer`) each time a student is reached.
- Encode the student's properties.
The rows/students will be sequentially encoded, so we wouldn't have the "random-access write" problem there. However, the order of the row's fields (i.e. the student's properties) is defined nowhere. The `CodingKeys` generated by the compiler use `String` values (not `Int`s). At this point the program would likely crash. You can overcome this first problem in several ways:
- The user specifies the headers in the `CSVEncoder.headers` setting. There you are passing an array of `String`s, and thus you can match `String` values with `Int` values (i.e. positions in the array).

  ```swift
  let encoder = CSVEncoder()
  encoder.headers = ["name", "age", "hasPet"] // These values must exactly match the `CodingKeys` values generated by the compiler
  ```

- Make our `UnkeyedContainer` "clever" by making it figure out the definition order of `CodingKeys`. Some solutions might be explored, such as a `CaseIterable` requirement. I haven't checked whether this is possible.
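For the second idea, here is a rough sketch of what the `CaseIterable` approach could look like. This is an assumption, not an existing feature: the user would have to declare the conformance themselves, and the container would derive each key's column index from its position in `allCases`.

```swift
// Hypothetical sketch: if `CodingKeys` also conformed to `CaseIterable`,
// a container could map each String key to a column index by its
// position in the declaration order.
enum CodingKeys: String, CodingKey, CaseIterable {
    case name, age, hasPet
}

/// Returns the column index implied by the declaration order, if any.
func columnIndex(for key: CodingKeys) -> Int? {
    CodingKeys.allCases.firstIndex(of: key)
}
```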
If the previous problem is solved correctly, the program will most likely match the `CodingKeys` with sequential `Int` values, and it will generate the expected encoded CSV.
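Assuming the headers are written as a first row and `Bool` values are rendered as `true`/`false` (both assumptions about the encoder's behavior), the output for the two students above would presumably look like:

```
name,age,hasPet
Heidrun,12,true
Gudrun,11,false
```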
The previous example was the simple/naive one. Let's suppose the framework user is savvy in the `Codable` ways. In that case, she can define things more tightly.
Unkeyed Fields Example
Let's keep the same `Student` listing as in the previous example, but now let's write our own `Encodable` conformance.
```swift
struct Student: Encodable {
    let name: String
    let age: Int
    let hasPet: Bool

    func encode(to encoder: Encoder) throws {
        var container = encoder.unkeyedContainer()
        try container.encode(self.name)
        try container.encode(self.age)
        try container.encode(self.hasPet)
    }
}
```
In this example we achieve the best performance, since we are creating a sequential container (`UnkeyedContainer`) and then encoding one field at a time. For this case, we could write directly to `CSVWriter`.
Keyed Fields Example (Int Values)
We could have decided to use a `KeyedEncodingContainer` instead of an `UnkeyedContainer`.
```swift
struct Student: Encodable {
    let name: String
    let age: Int
    let hasPet: Bool

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(self.name, forKey: .name)
        try container.encode(self.age, forKey: .age)
        try container.encode(self.hasPet, forKey: .hasPet)
    }

    private enum CodingKeys: Int, CodingKey {
        case name = 2
        case age = 1
        case hasPet = 0
    }
}
```
Interestingly, here we are defining the order of the fields within a row; and, oddly, the definition order does not match the encoding order. Here we encounter the first "random-access write" problem, albeit on a small scale, since it only pertains to a single row.
We could solve this problem by caching a whole row and only writing it once a new row is started or the end of the file is reached.
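The single-row cache idea could be sketched as follows (a hypothetical `RowCache` type, purely illustrative): fields are stored by column index in whatever order the keyed container encodes them, and the row is only materialized when it is complete.

```swift
// Hypothetical sketch of a single-row cache: fields arrive keyed by
// column index (possibly out of order) and are joined in column order
// once the row is finished.
struct RowCache {
    private var fields: [Int: String] = [:]

    mutating func set(_ value: String, at column: Int) {
        fields[column] = value
    }

    /// Joins the cached fields in column order; missing columns are empty.
    mutating func finishRow() -> String {
        guard let last = fields.keys.max() else { return "" }
        let row = (0...last).map { fields[$0] ?? "" }.joined(separator: ",")
        fields.removeAll()
        return row
    }
}

var cache = RowCache()
cache.set("true", at: 2)      // `hasPet` encoded first (raw value 0 -> 2)
cache.set("12", at: 1)
cache.set("Heidrun", at: 0)
let row = cache.finishRow()   // fields emerge in column order
```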
Unordered Rows Example
Let's suppose a similar case (the same students listing), but now the row positions have meaning. Let's say:

```swift
struct Class: Encodable {
    var heidrun: Student
    var gutrun: Student
    var alexander: Student
    var sigrid: Student

    private enum CodingKeys: Int, CodingKey {
        case alexander = 1
        case gutrun = 2
        case heidrun = 3
        case sigrid = 4
    }
}
```
The compiler-generated code will start encoding with `heidrun`, although the first row should be `alexander`. Here lies one of the major problems: for a correct implementation, we would need to cache the whole CSV file and only write it to the `OutputStream` once we are sure of the order.
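To illustrate the reordering such a whole-file cache would have to perform, consider the rows tagged with their `Int` key raw values (the ages and pet values for Alexander and Sigrid are invented here for illustration):

```swift
// Hypothetical illustration: rows arrive in declaration order but carry
// their Int-key positions; they must all be cached before sorting.
let encoded: [(key: Int, row: String)] = [
    (key: 3, row: "Heidrun,12,true"),    // encoded first by the generated code
    (key: 2, row: "Gudrun,11,false"),
    (key: 1, row: "Alexander,13,false"), // should be the first CSV row
    (key: 4, row: "Sigrid,10,true"),
]
// Only after every row is known can we write them in key order.
let ordered = encoded.sorted { $0.key < $1.key }.map { $0.row }
```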
Hi @lightsprint09,
Just as a heads-up: I have reimplemented the `Decodable` part of the library, making it much simpler, using small value types across the board, and building the buffering infrastructure.
I will probably write the `Encodable` system in the following two months, but if you ever wanted to challenge yourself and try a `Codable` implementation, now would be a good time 😄
Cheers
Alright, I am happy to announce that after some hard laboring, I have the first `Encodable` implementation. It still needs polishing and support for other buffering strategies, but it currently works 😄
You can find it in the `develop` branch if you are still interested. It will take me a bit more time to land it in `master`, though.