CodableCSV - Read and write CSV files row-by-row or through Swift's Codable interface.

Overview

Swift 5.x · macOS 10.10+ · iOS 8+ · tvOS 9+ · watchOS 2+ · Ubuntu 18.04 · MIT License

CodableCSV provides:

  • Imperative CSV reader/writer.
  • Declarative CSV encoder/decoder.
  • Supports multiple inputs/outputs: Strings, Data blobs, URLs, and Streams (commonly used for stdin).
  • Supports numerous string encodings and Byte Order Markers (BOM).
  • Extensive configuration: delimiters, escaping scalar, trim strategy, codable strategies, presampling, etc.
  • RFC4180 compliant with default configuration and CRLF (\r\n) row delimiter.
  • Multiplatform support with no dependencies (the Swift Standard Library and Foundation are implicit dependencies).

Usage

To use this library, you need to:

    Add CodableCSV to your project.

    You can choose to add the library through SPM or Cocoapods:

    • SPM (Swift Package Manager).

      // swift-tools-version:5.1
      import PackageDescription
      
      let package = Package(
          /* Your package name, supported platforms, and generated products go here */
          dependencies: [
              .package(url: "https://github.com/dehesa/CodableCSV.git", from: "0.6.6")
          ],
          targets: [
              .target(name: /* Your target name here */, dependencies: ["CodableCSV"])
          ]
      )
    • Cocoapods.

      pod 'CodableCSV', '~> 0.6.6'
      

    Import CodableCSV in the file that needs it.

    import CodableCSV

There are two ways to use this library:

  1. imperatively, as a row-by-row and field-by-field reader/writer.
  2. declaratively, through Swift's Codable interface.

Imperative Reader/Writer

The following types provide imperative control on how to read/write CSV data.

    CSVReader

    A CSVReader parses CSV data from a given input (String, Data, URL, or InputStream) and returns each CSV row as an array of Strings. CSVReader can be used at a high-level, in which case it parses the input completely, or at a low-level, where each row is decoded on request.

    • Complete input parsing.

      let data: Data = ...
      let result = try CSVReader.decode(input: data)

      Once the input is completely parsed, you can choose how to access the decoded data:

      let headers: [String] = result.headers
      // Access the CSV rows (i.e. raw [String] values)
      let rows = result.rows
      let row = result[0]
      // Access the CSV record (i.e. convenience structure over a single row)
      let records = result.records
      let record = result[record: 0]
      // Access the CSV columns through indices or header values.
      let columns = result.columns
      let column = result[column: 0]
      let column = result[column: "Name"]
      // Access fields through indices or header values.
      let fieldB: String = result[row: 3, column: 2]
      let fieldA: String? = result[row: 2, column: "Age"]
    • Row-by-row parsing.

      let reader = try CSVReader(input: string) { $0.headerStrategy = .firstLine }
      let rowA = try reader.readRow()

      Parse one row at a time until nil is returned, or exit the scope and the reader will clean up all used memory.

      // Let's assume the input is:
      let string = "numA,numB,numC\n1,2,3\n4,5,6\n7,8,9"
      // The headers property can be accessed at any point after initialization.
      let headers: [String] = reader.headers  // ["numA", "numB", "numC"]
      // Keep querying rows until `nil` is received.
      guard let rowB = try reader.readRow(),  // ["4", "5", "6"]
            let rowC = try reader.readRow()   /* ["7", "8", "9"] */ else { ... }

      Alternatively you can use the readRecord() function, which also returns the next CSV row but wraps the result in a convenience structure. This structure lets you access each field by header name (as long as the headerStrategy is set to .firstLine).

      let reader = try CSVReader(input: string) { $0.headerStrategy = .firstLine }
      let headers = reader.headers      // ["numA", "numB", "numC"]
      
      let recordA = try reader.readRecord()
      let rowA = recordA.row         // ["1", "2", "3"]
      let fieldA = recordA[0]        // "1"
      let fieldB = recordA["numB"]   // "2"
      
      let recordB = try reader.readRecord()
    • Sequence syntax parsing.

      let reader = try CSVReader(input: URL(...), configuration: ...)
      for row in reader {
          // Do something with the row: [String]
      }

      Please note the Sequence syntax (i.e. IteratorProtocol) doesn't throw errors; therefore if the CSV data is invalid, the previous code will crash. If you don't control the CSV data origin, use readRow() instead.
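      If you prefer the safety of the throwing API, the same loop can be sketched with readRow() (the input string is borrowed from the earlier example):

```swift
import CodableCSV

let string = "numA,numB,numC\n1,2,3\n4,5,6\n7,8,9"
let reader = try CSVReader(input: string) { $0.headerStrategy = .firstLine }
// readRow() throws on malformed CSV instead of crashing,
// and returns nil once the input is exhausted.
while let row = try reader.readRow() {
    print(row)  // ["1", "2", "3"], then ["4", "5", "6"], etc.
}
```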

    Reader Configuration

    CSVReader accepts the following configuration properties:

    • encoding (default nil) specifies the CSV file encoding.

      This String.Encoding value specifies how each underlying byte is represented (e.g. .utf8, .utf32LittleEndian, etc.). If it is nil, the library will try to figure out the file encoding through the file's Byte Order Marker. If the file doesn't contain a BOM, .utf8 is presumed.

    • delimiters (default (field: ",", row: "\n")) specifies the field and row delimiters.

      CSV fields are separated within a row by field delimiters (commonly a comma). CSV rows are separated by row delimiters (commonly a line feed). You can specify any Unicode scalar, String value, or nil for unknown delimiters.

    • escapingStrategy (default ") specifies the Unicode scalar used to escape fields.

      CSV fields can be escaped when they contain privileged characters, such as field/row delimiters. Commonly the escaping character is a double quote (i.e. "); by setting this configuration value you can change it (e.g. to a single quote) or disable the escaping functionality.

    • headerStrategy (default .none) indicates whether the CSV data has a header row or not.

      CSV files may contain an optional header row at the very beginning. This configuration value lets you specify whether the file has a header row or not, or whether you want the library to figure it out.

    • trimStrategy (default empty set) trims the given characters at the beginning and end of each parsed field.

      The trim characters are applied to both escaped and unescaped fields. The set cannot include any of the delimiter characters or the escaping scalar; if it does, an error is thrown during initialization.

    • presample (default false) indicates whether the CSV data should be completely loaded into memory before parsing begins.

      Loading all data into memory may provide faster iteration for small to medium size files, since you get rid of the overhead of managing an InputStream.

    The configuration values are set during initialization and can be passed to the CSVReader instance through a structure or with a convenience closure syntax:

    let reader = try CSVReader(input: ...) {
        $0.encoding = .utf8
        $0.delimiters.row = "\r\n"
        $0.headerStrategy = .firstLine
        $0.trimStrategy = .whitespaces
    }

    CSVWriter

    A CSVWriter encodes CSV information into a specified target (i.e. a String, a Data blob, or a file). It can be used at a high-level, by completely encoding a prepared set of information, or at a low-level, where rows or fields are written individually.

    • Complete CSV rows encoding.

      let input = [
          ["numA", "numB", "name"        ],
          ["1"   , "2"   , "Marcos"      ],
          ["4"   , "5"   , "Marine-Anaïs"]
      ]
      let data   = try CSVWriter.encode(rows: input)
      let string = try CSVWriter.encode(rows: input, into: String.self)
      try CSVWriter.encode(rows: input, into: URL("~/Desktop/Test.csv")!, append: false)
    • Row-by-row encoding.

      let writer = try CSVWriter(fileURL: URL("~/Desktop/Test.csv")!, append: false)
      for row in input {
          try writer.write(row: row)
      }
      try writer.endEncoding()

      Alternatively, you may write directly to a buffer in memory and access its Data representation.

      let writer = try CSVWriter { $0.headers = input[0] }
      for row in input.dropFirst() {
          try writer.write(row: row)
      }
      try writer.endEncoding()
      let result = try writer.data()
    • Field-by-field encoding.

      let writer = try CSVWriter(fileURL: URL("~/Desktop/Test.csv")!, append: false)
      try writer.write(row: input[0])
      
      for field in input[1] {
          try writer.write(field: field)
      }
      try writer.endRow()
      
      try writer.write(fields: input[2])
      try writer.endRow()
      
      try writer.endEncoding()

      CSVWriter has a wealth of low-level imperative APIs that let you write one field, several fields at a time, end a row, write an empty row, etc.

      Please notice that a CSV requires all rows to have the same number of fields.

      CSVWriter enforces this by throwing an error when you try to write more than the expected number of fields, or by filling the row with empty fields when you call endRow() before all fields have been written.
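      Both behaviors can be sketched as follows (an illustrative two-column writer; the exact error payload depends on the library):

```swift
import CodableCSV

let writer = try CSVWriter { $0.headers = ["numA", "numB"] }
// Writing fewer fields than the header declares: endRow()
// fills the remainder of the row with empty fields.
try writer.write(field: "1")
try writer.endRow()  // the row is completed as ["1", ""]
// Writing more fields than expected throws an error.
do {
    try writer.write(fields: ["2", "3", "4"])
} catch {
    print(error)
}
```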

    Writer Configuration

    CSVWriter accepts the following configuration properties:

    • delimiters (default (field: ",", row: "\n")) specifies the field and row delimiters.

      CSV fields are separated within a row by field delimiters (commonly a comma). CSV rows are separated by row delimiters (commonly a line feed). You can specify any Unicode scalar, String value, or nil for unknown delimiters.

    • escapingStrategy (default .doubleQuote) specifies the Unicode scalar used to escape fields.

      CSV fields can be escaped when they contain privileged characters, such as field/row delimiters. Commonly the escaping character is a double quote (i.e. "); by setting this configuration value you can change it (e.g. to a single quote) or disable the escaping functionality.

    • headers (default []) specifies the header row written at the very beginning of the CSV data.

      CSV files may contain an optional header row at the very beginning. If this configuration value is empty, no header row is written.

    • encoding (default nil) specifies the CSV file encoding.

      This String.Encoding value specifies how each underlying byte is represented (e.g. .utf8, .utf32LittleEndian, etc.). If it is nil, the library will try to figure out the file encoding through the file's Byte Order Marker. If the file doesn't contain a BOM, .utf8 is presumed.

    • bomStrategy (default .convention) indicates whether a Byte Order Marker will be included at the beginning of the CSV representation.

      The OS convention is that BOMs are never written, except when .utf16, .utf32, or .unicode string encodings are specified. You can, however, indicate that you always want the BOM written (.always) or never written (.never).

    The configuration values are set during initialization and can be passed to the CSVWriter instance through a structure or with a convenience closure syntax:

    let writer = try CSVWriter(fileURL: ...) {
        $0.delimiters.row = "\r\n"
        $0.headers = ["Name", "Age", "Pet"]
        $0.encoding = .utf8
        $0.bomStrategy = .never
    }

    CSVError

    Many of CodableCSV's imperative functions may throw errors due to invalid configuration values, invalid CSV input, file stream failures, etc. All these throwing operations exclusively throw CSVErrors that can be easily caught with a do-catch clause.

    do {
        let writer = try CSVWriter()
        for row in customData {
            try writer.write(row: row)
        }
    } catch let error {
        print(error)
    }

    CSVError adopts Swift Evolution's SE-112 protocols and CustomDebugStringConvertible. The error's properties provide rich commentary explaining what went wrong and indicate how to fix the problem.

    • type: The error group category.
    • failureReason: Explanation of what went wrong.
    • helpAnchor: Advice on how to solve the problem.
    • errorUserInfo: Arguments associated with the operation that threw the error.
    • underlyingError: Optional underlying error that provoked the operation to fail (most of the time it is nil).
    • localizedDescription: Returns a human readable string with all the information contained in the error.


    You can get all the information by simply printing the error or calling the localizedDescription property on a properly cast CSVError<CSVReader> or CSVError<CSVWriter>.
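    For example (a sketch; it assumes the generic form CSVError<CSVReader> for errors thrown by the reader, and deliberately feeds an unclosed escaped field):

```swift
import CodableCSV

do {
    // The last field opens a quote that is never closed.
    let _ = try CSVReader.decode(input: "a,b\nc,\"d")
} catch let error as CSVError<CSVReader> {
    // Printing the error (or its localizedDescription) includes the
    // failure reason, help anchor, and user info described above.
    print(error.localizedDescription)
}
```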

Declarative Decoder/Encoder

The encoders/decoders provided by this library let you use Swift's Codable declarative approach to encode/decode CSV data.

    CSVDecoder

    CSVDecoder transforms CSV data into a Swift type conforming to Decodable. The decoding process is very simple: create a decoding instance and call its decode function, passing the Decodable type and the input data.

    let decoder = CSVDecoder()
    let result = try decoder.decode(CustomType.self, from: data)

    CSVDecoder can decode CSVs represented as a Data blob, a String, an actual file in the file system, or an InputStream (e.g. stdin).

    let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
    let content = try decoder.decode([Student].self, from: URL("~/Desktop/Student.csv"))

    If you are dealing with a big CSV file, it is preferable to use direct file decoding, a .sequential or .unrequested buffering strategy, and presampling disabled, since memory usage is then drastically reduced.
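    A sketch of that low-memory setup (the Student model and file path are illustrative assumptions):

```swift
import Foundation
import CodableCSV

struct Student: Decodable {
    var name: String
    var age: Int
}

let decoder = CSVDecoder {
    $0.headerStrategy = .firstLine
    $0.bufferingStrategy = .sequential  // don't keep already-decoded rows around
    $0.presample = false                // stream the file instead of loading it whole
}
let students = try decoder.decode([Student].self, from: URL(fileURLWithPath: "/tmp/students.csv"))
```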

    Decoder Configuration

    The decoding process can be tweaked by specifying configuration values at initialization time. CSVDecoder accepts the same configuration values as CSVReader plus the following ones:

    • nilStrategy (default: .empty) indicates how the nil concept (absence of value) is represented on the CSV.

    • boolStrategy (default: .insensitive) defines how strings are decoded to Bool values.

    • nonConformingFloatStrategy (default .throw) specifies how to handle non-numbers (e.g. NaN and infinity).

    • decimalStrategy (default .locale) indicates how strings are decoded to Decimal values.

    • dateStrategy (default .deferredToDate) specifies how strings are decoded to Date values.

    • dataStrategy (default .base64) indicates how strings are decoded to Data values.

    • bufferingStrategy (default .keepAll) controls the behavior of KeyedDecodingContainers.

      Selecting a buffering strategy affects the decoding performance and the amount of memory used during the decoding process. For more information check the README's Tips using Codable section and the Strategy.DecodingBuffer definition.

    The configuration values can be set during CSVDecoder initialization or at any point before the decode function is called.

    let decoder = CSVDecoder {
        $0.encoding = .utf8
        $0.delimiters.field = "\t"
        $0.headerStrategy = .firstLine
        $0.bufferingStrategy = .keepAll
        $0.decimalStrategy = .custom({ (decoder) in
            let value = try Float(from: decoder)
            return Decimal(value)
        })
    }

    CSVDecoder.Lazy

    A CSV input can be decoded on demand (i.e. row-by-row) with the decoder's lazy(from:) function.

    let decoder = CSVDecoder(configuration: config).lazy(from: fileURL)
    let student1 = try decoder.decodeRow(Student.self)
    let student2 = try decoder.decodeRow(Student.self)

    CSVDecoder.Lazy conforms to Swift's Sequence protocol, letting you use functionality such as map(), allSatisfy(), etc. Please note that CSVDecoder.Lazy cannot be used for repeated access; it consumes the input CSV.

    let decoder = CSVDecoder().lazy(from: fileData)
    let students = try decoder.map { try $0.decode(Student.self) }

    A nice benefit of the lazy operation is that it lets you switch how a row is decoded at any point. For example:

    let decoder = CSVDecoder().lazy(from: fileString)
    // The first 100 rows are students.
    let students = try (  0..<100).map { _ in try decoder.decodeRow(Student.self) }
    // The next 10 rows are teachers.
    let teachers = try (100..<110).map { _ in try decoder.decodeRow(Teacher.self) }

    Since CSVDecoder.Lazy exclusively provides sequential access, setting the buffering strategy to .sequential will reduce the decoder's memory usage.

    let decoder = CSVDecoder {
        $0.headerStrategy = .firstLine
        $0.bufferingStrategy = .sequential
    }.lazy(from: fileURL)

    CSVEncoder

    CSVEncoder transforms Swift types conforming to Encodable into CSV data. The encoding process is very simple: create an encoding instance and call its encode function, passing the Encodable value.

    let encoder = CSVEncoder()
    let data = try encoder.encode(value, into: Data.self)

    The Encoder's encode() function creates a CSV file as a Data blob, a String, or an actual file in the file system.

    let encoder = CSVEncoder { $0.headers = ["name", "age", "hasPet"] }
    try encoder.encode(value, into: URL("~/Desktop/Students.csv"))

    If you are dealing with large amounts of CSV content, it is preferable to use direct file encoding and a .sequential or .assembled buffering strategy, since memory usage is then drastically reduced.
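    A sketch of that setup (model and destination path are illustrative assumptions):

```swift
import Foundation
import CodableCSV

struct Student: Encodable {
    var name: String
    var age: Int
}

let students = [Student(name: "John", age: 22), Student(name: "Marine", age: 23)]
let encoder = CSVEncoder {
    $0.headers = ["name", "age"]
    $0.bufferingStrategy = .sequential  // write rows out as they are completed
}
try encoder.encode(students, into: URL(fileURLWithPath: "/tmp/students.csv"))
```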

    Encoder Configuration

    The encoding process can be tweaked by specifying configuration values. CSVEncoder accepts the same configuration values as CSVWriter plus the following ones:

    • nilStrategy (default: .empty) indicates how the nil concept (absence of value) is represented on the CSV.

    • boolStrategy (default: .deferredToString) defines how Boolean values are encoded to String values.

    • nonConformingFloatStrategy (default .throw) specifies how to handle non-numbers (i.e. NaN and infinity).

    • decimalStrategy (default .locale) indicates how decimal numbers are encoded to String values.

    • dateStrategy (default .deferredToDate) specifies how dates are encoded to String values.

    • dataStrategy (default .base64) indicates how data blobs are encoded to String values.

    • bufferingStrategy (default .keepAll) controls the behavior of KeyedEncodingContainers.

      Selecting a buffering strategy directly affects the encoding performance and the amount of memory used during the process. For more information check this README's Tips using Codable section and the Strategy.EncodingBuffer definition.

    The configuration values can be set during CSVEncoder initialization or at any point before the encode function is called.

    let encoder = CSVEncoder {
        $0.headers = ["name", "age", "hasPet"]
        $0.delimiters = (field: ";", row: "\r\n")
        $0.dateStrategy = .iso8601
        $0.bufferingStrategy = .sequential
        $0.floatStrategy = .convert(positiveInfinity: "", negativeInfinity: "-∞", nan: "")
        $0.dataStrategy = .custom({ (data, encoder) in
            let string = customTransformation(data)
            var container = try encoder.singleValueContainer()
            try container.encode(string)
        })
    }

    The .headers configuration is required if you are using a keyed encoding container.

    CSVEncoder.Lazy

    A series of codable types (representing CSV rows) can be encoded on demand with the encoder's lazy(into:) function.

    let encoder = CSVEncoder().lazy(into: Data.self)
    for student in students {
        try encoder.encodeRow(student)
    }
    let data = try encoder.endEncoding()

    Call endEncoding() once there are no more values to encode. The function will return the encoded CSV.

    let encoder = CSVEncoder().lazy(into: String.self)
    for student in students {
        try encoder.encodeRow(student)
    }
    let string = try encoder.endEncoding()

    A nice benefit of the lazy operation is that it lets you switch how a row is encoded at any point. For example:

    let encoder = CSVEncoder(configuration: config).lazy(into: fileURL)
    for student in students { try encoder.encodeRow(student) }
    for teacher in teachers { try encoder.encodeRow(teacher) }
    try encoder.endEncoding()

    Since CSVEncoder.Lazy exclusively provides sequential encoding, setting the buffering strategy to .sequential will reduce the encoder's memory usage.

    let encoder = CSVEncoder {
        $0.bufferingStrategy = .sequential
    }.lazy(into: String.self)

Tips using Codable

Codable is fairly easy to use and most Swift standard library types already conform to it. However, sometimes it is tricky to get custom types to conform to Codable for specific functionality.

    Basic adoption.

    When a custom type conforms to Codable, the type is stating that it has the ability to decode itself from and encode itself to an external representation; which representation depends on the chosen decoder or encoder. Foundation provides support for JSON and property lists, and the community provides many other formats, such as YAML, XML, BSON, and CSV (through this library).

    Usually a CSV represents a long list of entities. The following is a simple example representing a list of students.

    let string = """
        name,age,hasPet
        John,22,true
        Marine,23,false
        Alta,24,true
        """

    A student can be represented as a structure:

    struct Student: Codable {
        var name: String
        var age: Int
        var hasPet: Bool
    }

    To decode the list of students, create a decoder and call decode on it passing the CSV sample.

    let decoder = CSVDecoder { $0.headerStrategy = .firstLine }
    let students = try decoder.decode([Student].self, from: string)

    The inverse process (from Swift to CSV) is very similar (and simple).

    let encoder = CSVEncoder { $0.headers = ["name", "age", "hasPet"] }
    let newData = try encoder.encode(students)

    Specific behavior for CSV data.

    When encoding/decoding CSV data, it is important to keep several points in mind:

      Codable's automatic synthesis requires CSV files with a headers row.

      Codable is able to synthesize init(from:) and encode(to:) for your custom types when all their members/properties conform to Codable. This automatic synthesis creates a hidden CodingKeys enumeration containing all your property names.

      During decoding, CSVDecoder tries to match the enumeration string values with a field position within a row. For this to work the CSV data must contain a headers row with the property names. If your CSV doesn't contain a headers row, you can specify coding keys with integer values representing the field index.

      struct Student: Codable {
          var name: String
          var age: Int
          var hasPet: Bool
      
          private enum CodingKeys: Int, CodingKey {
              case name = 0
              case age = 1
              case hasPet = 2
          }
      }

      Using integer coding keys has the added benefit of better encoder/decoder performance. By explicitly indicating the field index, you let the decoder skip matching the coding keys' string values to headers.

      A CSV is a long list of rows/records.

      CSV formatted data is commonly used with flat hierarchies (e.g. a list of students, a list of car models, etc.). Nested structures, such as the ones found in JSON files, are not supported by default in CSV implementations (e.g. a list of users, where each user has a list of services she uses, and each service has a list of the user's configuration values).

      You can support complex structures in CSV, but you would have to flatten the hierarchy into a single model or build a custom encoding/decoding process. This process would make sure there is always a maximum of two keyed/unkeyed containers.

      As an example, we can create a nested structure for a school with students who own pets.

      struct School: Codable {
          let students: [Student]
      }
      
      struct Student: Codable {
          var name: String
          var age: Int
          var pet: Pet
      }
      
      struct Pet: Codable {
          var nickname: String
          var gender: Gender
      
          enum Gender: Codable {
              case male, female
          }
      }

      By default the previous example wouldn't work. If you want to keep the nested structure, you need to provide custom init(from:) implementations (to support Decodable).

      extension School {
          init(from decoder: Decoder) throws {
              var container = try decoder.unkeyedContainer()
              var students: [Student] = []
              while !container.isAtEnd {
                  students.append(try container.decode(Student.self))
              }
              self.students = students
          }
      }
      
      extension Student {
          init(from decoder: Decoder) throws {
              var container = try decoder.container(keyedBy: CustomKeys.self)
              self.name = try container.decode(String.self, forKey: .name)
              self.age = try container.decode(Int.self, forKey: .age)
              self.pet = try decoder.singleValueContainer().decode(Pet.self)
          }
      }
      
      extension Pet {
          init(from decoder: Decoder) throws {
              var container = try decoder.container(keyedBy: CustomKeys.self)
              self.nickname = try container.decode(String.self, forKey: .nickname)
              self.gender = try container.decode(Gender.self, forKey: .gender)
          }
      }
      
      extension Pet.Gender {
          init(from decoder: Decoder) throws {
              var container = try decoder.singleValueContainer()
              self = try container.decode(Int.self) == 1 ? .male : .female
          }
      }
      
      private enum CustomKeys: Int, CodingKey {
          case name = 0
          case age = 1
          case nickname = 2
          case gender = 3
      }

      You could have avoided the initializer overhead by defining a flat structure such as:

      struct Student: Codable {
          var name: String
          var age: Int
          var nickname: String
          var gender: Gender
      
          enum Gender: Int, Codable {
              case male = 1
              case female = 2
          }
      }

    Encoding/decoding strategies.

    The SE-167 proposal introduced JSON and property-list encoders/decoders to Foundation. It also featured encoding/decoding strategies as a new way to configure the encoding/decoding process. CodableCSV continues this tradition and mirrors those strategies, adding some new ones specific to the CSV file format.

    To configure the encoding/decoding process, you need to set the configuration values of the CSVEncoder/CSVDecoder before calling the encode()/decode() functions. There are two ways to set configuration values:

    • At initialization time, passing the Configuration structure to the initializer.

      var config = CSVDecoder.Configuration()
      config.nilStrategy = .empty
      config.decimalStrategy = .locale(.current)
      config.dataStrategy = .base64
      config.bufferingStrategy = .sequential
      config.trimStrategy = .whitespaces
      config.encoding = .utf16
      config.delimiters.row = "\r\n"
      
      let decoder = CSVDecoder(configuration: config)

      Alternatively, there are convenience initializers accepting a closure with an inout Configuration value.

      let decoder = CSVDecoder {
          $0.nilStrategy = .empty
          $0.decimalStrategy = .locale(.current)
          // and so on and so forth
      }
    • CSVEncoder and CSVDecoder implement @dynamicMemberLookup exclusively for their configuration values. Therefore you can set configuration values after initialization or after an encoding/decoding process has been performed.

      let decoder = CSVDecoder()
      decoder.bufferingStrategy = .sequential
      let students = try decoder.decode([Student].self, from: url1)
      
      decoder.bufferingStrategy = .keepAll
      let pets = try decoder.decode([Pets].self, from: url2)

    The strategies labeled with .custom let you insert behavior into the encoding/decoding process without forcing you to manually conform to init(from:) and encode(to:). When set, they apply to the targeted type for the whole process. For example, suppose you want to encode a CSV file where empty fields are marked with the word null (for some reason). You could do the following:

    let encoder = CSVEncoder()
    encoder.nilStrategy = .custom({ (encoder) in
        var container = encoder.singleValueContainer()
        try container.encode("null")
    })

    Type-safe headers row.

    You can generate type-safe header names using Swift's introspection tools (i.e. Mirror), or by explicitly defining a CodingKeys enum with String raw values conforming to CaseIterable.

    struct Student {
        var name: String
        var age: Int
        var hasPet: Bool
    
        enum CodingKeys: String, CodingKey, CaseIterable {
            case name, age, hasPet
        }
    }

    Then configure your encoder with explicit headers.

    let encoder = CSVEncoder {
        $0.headers = Student.CodingKeys.allCases.map { $0.rawValue }
    }
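    The Mirror route mentioned above can be sketched with a small helper that reflects over a template value and collects its stored-property labels (this assumes the declaration order of the properties matches the field order you want to encode):

```swift
// Collects the stored-property names of any value via reflection.
func csvHeaders<T>(for value: T) -> [String] {
    Mirror(reflecting: value).children.compactMap { $0.label }
}

struct Student: Codable {
    var name: String
    var age: Int
    var hasPet: Bool
}

let headers = csvHeaders(for: Student(name: "John", age: 22, hasPet: true))
// headers == ["name", "age", "hasPet"]
```

    Such headers could then be assigned to the encoder's headers configuration, as in the previous snippet.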

    Performance advice.

    #warning("TODO:")

Roadmap


The library has been heavily documented and any contribution is welcome. Check the small How to contribute document or take a look at the GitHub projects for a more in-depth roadmap.

Community

If CodableCSV is not of your liking, the Swift community offers other CSV solutions:

  • CSV.swift contains an imperative CSV reader/writer and a lazy row decoder and adheres to the RFC4180 standard.
  • SwiftCSV is a well-tested parse-only library which loads the whole CSV in memory (not intended for large files).
  • CSwiftV is a parse-only library which loads the CSV in memory and parses it in a single go (no imperative reading).
  • CSVImporter is an asynchronous parse-only library with support for big CSV files (incremental loading).
  • SwiftCSVExport reads/writes CSV imperatively with Objective-C support.
  • swift-csv offers an imperative CSV reader/writer based on Foundation's streams.

There are many good tools outside the Swift community. Since listing them all would be a hard task, I will just point you to the great AwesomeCSV GitHub repo. There are a lot of treasures to be found there.

Comments
  • Is encoding Double or Float supported?


    When I change the floatStrategy to .convert I get a fatal error in this code:

                case .throw: throw CSVEncoder.Error._invalidFloatingPoint(value, codingPath: self.codingPath)
                case .convert(let positiveInfinity, let negativeInfinity, let nan):
                    if value.isNaN {
                        return nan
                    } else if value.isInfinite {
                        return (value < 0) ? negativeInfinity : positiveInfinity
                    } else { fatalError() }
                }
    

    So with either strategy I either get the thrown error or a fatal error if a valid Double is attempted to be encoded. Is this expected or is there another configuration I'm missing?

    Similarly, when I try to decode a double, I get an error thrown, but when I decode it as a string and convert the string to a double in my structs init(from: Decoder) I process the field correctly.

    bug 
    opened by leisurehound 16
  • Create CSVReader from InputStream


    Hey @dehesa, had another idea for a feature.

    Specifically, I was looking for some way to create a CSVReader reader from stdin.

    Since reading from a URL was already implemented on top of InputStream, it seemed like exposing a public API to accept streams directly could work.

    CSVReader(input: InputStream(fileAtPath: "/dev/stdin"))
    

    I also saw there's a FileHandle API that could maybe accomplish something similar. But I'm not sure how we'd implement converting a FileHandle -> InputStream

    CSVReader(input: FileHandle.standardInput)
    

    I'm just noting this as a possible alternative direction. I'm not too familiar with the differences between the APIs. Please share any historical knowledge if you have any 😄

    This PR is a draft attempt at the InputStream approach. Will add tests and clean it up if you think it's a good idea.

    Thanks! @josh

    enhancement 
    opened by josh 15
  • Support for iOS 10


    I needed to support iOS 10 in my app but the library only supports iOS >= 12. So, I had to replace the library.

    But I was curious to know why the library requires iOS 12. So I added the source code directly to a test project that targets iOS 10, and it builds successfully!

    It looks like the library already supports iOS 10 as-is, with no extra work needed. I suggest lowering the declared requirements to the minimum version of each OS so the library can be used in a wider range of projects.

    enhancement 
    opened by mhdhejazi 13
  • Decoding CSV file with CRLF line endings fails with error if the last column is quoted


    Describe the bug


    Decoding a CSV file with CRLF line endings fails with an error, if the last field in a row is quoted.

    The error:

    Invalid input
    	Reason: The last field is escaped (through quotes) and an EOF (End of File) was encountered before the field was properly closed (with a final quote character).
    	Help: End the targeted field with a quote.
    

    To Reproduce

    Steps to reproduce the behavior:

    Using a CSV file with CRLF line endings (url), decode with this:

        let decoder = try CSVDecoder(configuration: {
            $0.encoding = .utf8
            $0.bufferingStrategy = .sequential
            $0.headerStrategy = .firstLine
            $0.trimStrategy = .whitespaces
            $0.delimiters.row = "\r\n" // or "\n", also fails
        }).lazy(from: url)
    

    Expected behavior

    No error

    System

    • CodableCSV: 0.6.6

    Additional context

    This was introduced in v0.6.6

    bug 
    opened by xsleonard 9
  • Decodable sequence


    Hey @dehesa!

    I think I was too slow, it looks like you already implemented the sequential buffering strategy for 0.5.2. I was taking some time to learn about the Decoder protocol internals.

    What I learned is that it's possible to decode an UnkeyedDecodingContainer into any sequence without buffering. ShadowDecoder.UnkeyedContainer seems to do a good job of iteratively decoding each item.

    The README demos decoding into a preallocated array.

    let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
    let content: [Student] = try decoder.decode([Student].self, from: URL(fileURLWithPath: "~/Desktop/Student.csv"))
    

    Instead of an Array, I created a custom sequence wrapper, with the added benefit of customizing how each result is wrapped. I had my 🤞 that AnySequence was Decodable, but it's not.

    class DecodableSequence<T: Decodable>: Sequence, IteratorProtocol, Decodable {
        private var container: UnkeyedDecodingContainer
    
        required init(from decoder: Decoder) throws {
            container = try decoder.unkeyedContainer()
        }
    
        func next() -> Result<T, Error>? {
            if container.isAtEnd {
                return nil
            }
            // or could use a try! here
            return Result { try container.decode(T.self) }
        }
    }
    

    Then:

    let decoder = CSVDecoder { $0.bufferingStrategy = .sequential }
    let url = URL(fileURLWithPath: "Student.csv")
    let results = try decoder.decode(DecodableSequence<Student>.self, from: url)
    
    for result in results {
        print(try result.get())
    }
    

    Any thoughts on this technique or alternatives? Would a sequence wrapper like this be useful to include as part of the library?

    Thanks! @josh

    enhancement question 
    opened by josh 9
  • Extra comma in header or data line causes failure to parse subsequent lines


    Describe the bug

    Having an extra comma in a data line (which is usually caused by the CSV creator failing to quote a field) causes that line and all subsequent lines to fail to parse. Having an extra comma at the end of the header line causes all subsequent data lines to fail to parse.

    To Reproduce

    Please see the attached test file (it is really a .swift file, but I changed the extension to .txt in order to attach it). DecodingBadInputTests.txt

    Expected behavior

    Both of these situations (additional commas in either header or data line) are forbidden by rfc4180, so I would expect an exception to be raised.

    System

    • OS: macOS 11.2.3
    • CodableCSV: 0.6.2

    Additional context

    I encountered both of these instances of ill-formed CSV in files I downloaded from my banks. I'm using CodableCSV in a Swift app I've written to take the differently formatted CSV from each bank and create a standard format which I then import into a spreadsheet for further analysis.

    bug 
    opened by DressTheMonkey 8
  • Make CSVReader IteratorProtocol safe


    Hey @dehesa,

    This one is more of a direction for an idea rather than a concrete change.

    I do really like the Sequence reading API. I'm working with a very large CSV file (in the gigabytes) and can't parse it all at once. But the current design has the documented downside of crashing on bad data.

    Maybe the iterator can just return a Result capturing the row or error instead? I think this could be a good start, but it would be a backwards-incompatible change. I do like that it removes a try! from a public-facing API.

    This change only allows for iterating over parsed rows [String] but iterating over parsed records Record could also be nice. Though, I suppose what I really want is a Sequence for the decoder.

    Another direction would be to deprecate CSVReader being an IteratorProtocol itself and offer subtypes for multiple ways of iterating. There we can choose to return a Result instead.

    reader.makeRowIterator() // RowIterator
    reader.makeRecordIterator() // RecordIterator
    reader.makeDecoderIterator(CustomType.self) // DecodingIterator
    
    // or
    
    reader.rows // Sequence (RowIterator)
    reader.records // Sequence (RecordIterator)
    reader.decoding(CustomType.self) // Sequence (DecodingIterator)
    

    Thanks for reading! Josh
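    The Result-returning direction sketched in this issue could be prototyped independently of CSVReader's real API, for example as a generic wrapper around any throwing next-style closure. A minimal sketch; all names here are hypothetical and not part of CodableCSV:

```swift
// Wraps a throwing producer and surfaces each element as a Result, so
// iteration never needs try!. Purely illustrative.
struct ThrowingRowSequence<Row>: Sequence, IteratorProtocol {
    let nextRow: () throws -> Row?

    func next() -> Result<Row, Error>? {
        do {
            guard let row = try nextRow() else { return nil }  // normal end of input
            return .success(row)
        } catch {
            return .failure(error)  // surface the parsing error to the caller
        }
    }
}
```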

    opened by josh 8
  • CSVEncoder.lazy<URL> should support an appending strategy


    Is your feature request related to a problem?

    Sorry if this is available already, but I couldn't find it in the sources (except in CSVWriter, not CSVEncoder) or in the README.

    Describe the solution you'd like

    Simply put: for live time-series serialization, I would want to append new data to the URL instead of overwriting it, with a bufferingStrategy of .sequential (but I can also see where users wouldn't want it to append, so it should be a separate strategy).
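    Until such a strategy exists, appending can be approximated with plain FileHandle, writing already-encoded rows to the end of the file. A sketch using only Foundation; the header and row format below are made up:

```swift
import Foundation

// Append one already-encoded CSV row to a file instead of overwriting it.
// In a real setup the row string would come from CSVWriter/CSVEncoder.
func appendRow(_ row: String, to url: URL) throws {
    if !FileManager.default.fileExists(atPath: url.path) {
        // Create the file with a header the first time around.
        try "timestamp,value\n".write(to: url, atomically: true, encoding: .utf8)
    }
    let handle = try FileHandle(forWritingTo: url)
    defer { handle.closeFile() }
    handle.seekToEndOfFile()              // jump to EOF so we append
    handle.write(Data((row + "\n").utf8))
}
```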

    enhancement 
    opened by leisurehound 7
  • Configure escaping scalar


    Hi @dehesa, thanks for publishing this library and keeping it super well documented. The code looks great.

    I'm currently trying to parse a foreign TSV file that includes unescaped " characters in its fields. I did see escapingScalar was exposed as an internal setting but wasn't publicly configurable. For my use case, disabling field escaping worked perfectly, and it would be great to expose this ability on the reader configuration.

    I added a new enum for "escaping strategy". My use case fits into the new .none case, and the default case is .doubleQuote. Though, I was wondering if we should define the options narrowly and just provide the two, or allow any character to be used as an escaping scalar. Maybe there's a single-quote-escaped CSV out there? Would love to hear your thoughts.

    Thanks! @josh
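    The strategy described in this issue could take roughly the following shape. This is an illustrative sketch, not CodableCSV's actual type; the names are made up:

```swift
// Illustrative escaping-strategy enum, mirroring the cases discussed above.
enum EscapingStrategy {
    /// No escaping: quotes are treated as ordinary field characters.
    case none
    /// RFC 4180 behaviour and the default: fields may be wrapped in double quotes.
    case doubleQuote
    /// Any custom escaping scalar, e.g. a single quote.
    case scalar(Unicode.Scalar)
}
```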

    enhancement 
    opened by josh 6
  • Add line offset headers


    Useful for skipping rows in CSVs that have table titles

    Description

    This PR adds a header strategy to parse headers from the specified row number, ignoring any previous row (and fixes a few typos). The change is additive, leaving the existing .firstLine option in place. I've been using the header strategy in one of my own apps without any issues.

    The tests have been updated to include the new strategy by way of a table title, and all pass.

    Checklist

    The following list must only be fulfilled by code-changing PRs. If you are making changes on the documentation, ignore these.

    • [x] Include in-code documentation at the top of the property/function/structure/class (if necessary).
    • [x] Merge to develop.
    • [x] Add to existing tests or create new tests (if necessary).
    opened by emorydunn 5
  • Cocoapods installation is not possible


    Describe the bug


    The pod name "CodableCSV" is occupied by a different project with the same name as this one: https://github.com/pauljohanneskraft/CodableCSV.

    Following the pod install instructions for this repo will fail.

    The error message from pod is:

    [!] CocoaPods could not find compatible versions for pod "CodableCSV":
      In Podfile:
        CodableCSV (~> 0.6.1)
    
    None of your spec sources contain a spec satisfying the dependency: `CodableCSV (~> 0.6.1)`.
    

    Performing a pod search CodableCSV:

    -> CodableCSV (0.4.0)
       CodableCSV allows you to encode and decode CSV files using Codable model types.
       pod 'CodableCSV', '~> 0.4.0'
       - Homepage: https://github.com/pauljohanneskraft/CodableCSV
       - Source:   https://github.com/pauljohanneskraft/CodableCSV.git
       - Versions: 0.4.0, 0.2.0, 0.1.1 [trunk repo]
    

    To Reproduce

    Steps to reproduce the behavior:

    Add pod "CodableCSV", "~> 0.6.1" to Podfile, as per the readme. Perform: pod install

    Expected behavior

    This package will be installed by pod.

    bug 
    opened by xsleonard 5
  • Define headers, but suppress header output?


    Question

    In a large scale streaming situation, the csv is being used to 'chunk' rows. I'd like to be able to pass in headers, but not send them to the CSV (since the header is already out there).

    Is this possible? I can't use CodingKeys - because they are already being used as 'string' for a JSON decoder.

    I've been trying to find a form of 'Lazy' where I could call 'flushEncoding()', which would reset the rows and leave a usable lazy encoder.

    Keeping a root encoder and making a new lazy() as needed also works great, except it lacks the ability to suppress the header after the first lazy instance. (Any way I've tried removing it after the fact breaks the CodingKeys lookup, as expected.)

    question 
    opened by joerohde 0
  • Fix Swift 5.7 warnings.


    The following warning is emitted for a number of function declarations when compiling with Swift 5.7 (Xcode 14 Beta): "Non-'@objc' instance method in extensions cannot be overridden; use 'public' instead"

    Changing the access levels of the functions from open to public resolves the warnings.

    opened by robo-fish 1
  • Jonlidgard


    Description

    I have been trying to parse some bank statements using your code. Unfortunately, the statement fields are not quoted and the last field contains payee descriptions that sometimes include the comma delimiter, which was causing the parse to fail. I have added a 'lastFieldDelimiterStrategy' option to ignore delimiters in the final field of each line so that these files can be parsed successfully.

    Checklist

    The following list must only be fulfilled by code-changing PRs. If you are making changes on the documentation, ignore these.

    • [ ] Include in-code documentation at the top of the property/function/structure/class (if necessary).
    • [ ] Merge to develop.
    • [ ] Add to existing tests or create new tests (if necessary).
    opened by jonlidgard 0
  • question about escaping


    Question

    With the default config, how can I escape commas and line returns within a field in order to ensure the resulting CSV is readable?

    Additional Context

    I'm hacking together an app with Swift and Xcode and I'm a complete novice. To provide the app with some basic data I have provided it with csv files, which are parsed with CodableCSV. Many thanks for the package!

    Using basic data it's working fine. I have tried not to fiddle with the configuration. Delimiters are commas and end of line is "\r".

    However, for one of my tables I now need to expand one of the fields to include sentences or even paragraphs of text, which contain commas and newlines. Initially I understood from the documentation that the way to do this is to enclose the whole field in double quotes ("..."). That crashed the app, and so did escaping the individual offending characters with a double quote or a backslash.

    Many thanks for any pointers!

    Example extract of table:

    id,title,introduction
    reg,Regular Models,"The good news for learners of Spanish..."
    irr_i,Essentials I,
    irr_ii,Essentials II,

    Error message:

    CodableCSV/Reader.swift:75: Fatal error: 'try!' expression unexpectedly raised an error: [CSVReader] Invalid input
    	Reason: The targeted field parsed successfully. However, the character right after it was not a field nor row delimiter.
    	Help: If your CSV is CRLF, change the row delimiter to "\r\n" or add a trim strategy for "\r".
    	User info: Row index: 1, Field: The good news for learners of Spanish...
    

    System

    • OS: macOS 12.3
    • CodableCSV: 0.6.7
    • Xcode 13
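    For reference, the escaping rule the default configuration expects is RFC 4180's: wrap the whole field in double quotes and double any embedded quote (backslashes have no special meaning). A standalone sketch of that rule, written in plain Foundation and independent of CodableCSV:

```swift
import Foundation

// Quote a field only when it contains a delimiter, a quote, or a line break,
// doubling embedded quotes as RFC 4180 requires.
func escapeCSVField(_ field: String) -> String {
    let needsQuoting = field.contains(",") || field.contains("\"")
        || field.contains("\n") || field.contains("\r")
    guard needsQuoting else { return field }
    return "\"" + field.replacingOccurrences(of: "\"", with: "\"\"") + "\""
}
```

    For example, `escapeCSVField("Sentences, with commas")` yields `"Sentences, with commas"` wrapped in double quotes, which a default-configured reader parses as a single field.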
    question 
    opened by ghost 0
  • Last column decodes as blank


    Question

    Hi all. Not sure if this is pilot error or a bug, but it appears that the last column in our CSV consistently decodes to blank. We've got a correct header line and I'm using the .firstLine strategy. I've also confirmed that my data model has the same number of columns as vars. The only fix appears to be adding a dummy column at the end.

    System

    • OS: macOS 12.3.1, Xcode 12.3
    • CodableCSV: 0.6.7

    question 
    opened by joshdistler 0
  • Added new writer configuration option 'quoteAll'


    Introduces a new encoder option.

    Description

    Example:

    let encoder = CSVEncoder {
      // ...
      $0.quoteAll = true
    }
    

    The effect is that each field is quoted in the output regardless of whether it needs to be escaped. This may be useful for round-trip testing where the input file contains redundant quotation marks.

    Checklist

    The following list must only be fulfilled by code-changing PRs. If you are making changes on the documentation, ignore these.

    • [x] Include in-code documentation at the top of the property/function/structure/class (if necessary).
    • [x] Merge to develop.
    • [x] Add to existing tests or create new tests (if necessary).
    opened by JamesWidman 0
Releases (0.6.7)
  • 0.6.7 (Aug 27, 2021)

    • Xcode 13 is now supported by raising the SPM iOS requirement to iOS 11. The library itself still supports iOS 7+, but the SPM manifest now declares iOS 11+; modify the manifest if you want to support older OSes.
    • Support for more than one row delimiter at the same time (e.g. both the standard \r\n and \n). CSVReader ends a row when any of the configured row delimiters is encountered. Use just one row delimiter for better performance.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.6 (Mar 14, 2021)

    • CSVReader and CSVDecoder are now stricter while parsing CSV data: errors are thrown when ill-formed rows are encountered, instead of silently finishing the parsing process (#34).
    • CSVReader and CSVDecoder now ignore empty lines, usually encountered at the end of a file.
    • Minor enhancements on documentation.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.4 (Nov 22, 2020)

    • CSVWriter's encoder is faster for UTF8 encodings.
    • Shift-JIS encoding is now supported (#29).
    • CSVEncoder and CSVDecoder have seen major speed-ups due to dropping unnecessary retains/releases.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.1 (May 31, 2020)

    • The first batch of optimizations have landed.
    • Fix for floating-point encoding/decoding (#20)
    • Change decodeNil() behavior to more closely follow the documentation.
    • decodeIfPresent can now be safely used.
    Source code(tar.gz)
    Source code(zip)
  • 0.6.0 (May 11, 2020)

    • Both CSVEncoder and CSVDecoder now support lazy functionality.
    • The lazy API has been renamed to be as similar as possible in the encoder/decoder.
    • README has been expanded with more examples.
    Source code(tar.gz)
    Source code(zip)
  • 0.5.5 (Apr 28, 2020)

    • CSVEncoder/CSVDecoder adopts TopLevelEncoder/TopLevelDecoder when Combine is present.
    • CSVReader and CSVDecoder now also accept InputStreams. The introduction of this feature allows easier usage of CodableCSV by Command-line applications reading information from the .standardInput (i.e. stdin).
    • Most errors thrown by CodableCSV functions are now CSVErrors. All CSVErrors include the failure reason and provide help cues to avoid said problem.
    • Documentation has been expanded to cover all public and internal functionality.
    Source code(tar.gz)
    Source code(zip)
  • 0.5.4 (Apr 10, 2020)

  • 0.5.2 (Mar 31, 2020)

    • A full-fledged CSVEncoder has finally been implemented, achieving full support for Codable. You can now use keyed, unkeyed, and single-value containers when needed.
    • CSVEncoder and CSVDecoder support different buffering strategies. This translates into less memory usage for sequential or assembled runs.
    • CSVReader/CSVWriter API have been renamed to match CSVEncoder/CSVDecoder API.
    Source code(tar.gz)
    Source code(zip)
  • 0.5.1 (Mar 26, 2020)

    • Custom escaping scalars are supported thanks to @josh (#13). This includes the ability to disable escaping functionality on parsing or serializing CSVs.
    • Linux is officially supported. All tests now also run on Linux (Ubuntu 18.04) through Github actions.
    • Trim strategy now throws an error at initialization when it contains delimiter characters or escaping scalars.
    • The repo now provides not only a high-level roadmap, but also a detailed plan of which features will be worked on next.
    Source code(tar.gz)
    Source code(zip)
  • 0.5.0 (Mar 23, 2020)

    • Expand input/output support to Data, String, and files (through URLs).
    • Reimplemented CSVReader and CSVWriter for greater performance.
    • Introduction of CSVError adopting SE-112 protocols for easier debugging.
    • Make Decoder fully immutable.
    • Expand tests on CSVReader, CSVWriter, and CSVDecoder.
    • OS requirements reduced to macOS 10.10, iOS 8, tvOS 9, watchOS 2.
    • First trials on Linux.
    • Fixed bug on trim character strategy.
    Source code(tar.gz)
    Source code(zip)
  • 0.4.0 (Mar 10, 2020)

    • The Decoder implementation has been completely reworked to use small value types and fewer protocols. All in all, the implementation complexity has been greatly reduced.
    • The Decoder initializer exposes the most commonly used configuration parameters.
    • The Decoder now allows random access through keyed containers. A new configuration parameter (bufferingStrategy) lets the user control the amount of memory used by the decoder. Currently only .keepAll is implemented, but later on it will enable performant decoding of CSVs read through file handles.
    Source code(tar.gz)
    Source code(zip)
  • 0.3.0 (Oct 12, 2019)

Owner
Marcos Sánchez-Dehesa