Learn about how SoulverCore can give Swift "better than regex" data parsing features (for many common tasks)

Overview

String Parsing with SoulverCore

A declarative & type-safe approach to parsing data from strings

SoulverCore gives you human-friendly, type-safe & performant data parsing from Swift strings.

Specify types you want to parse from a string. If they are present, you get back ready-to-use data primitives (not strings!).

This approach to data parsing allows you to ignore:

  1. The specifics of how the data you need is formatted in text
  2. Random words (or other data points) surrounding the data you need

Examples

Let's look at a few examples:

let (testCount, failureCount, timeTaken) = "Executed 4 tests, with 1 failure in 0.009 seconds".find(.number, .number, .time)!

testCount // 4
failureCount // 1
timeTaken // 0.009 seconds

let (date, temperature, humidity) = "On August 23, 2022 the temperature in Chicago was 68.3 ºF (with a humidity of 74%)".find(.date, .temperature, .percentage)!

date // August 23, 2022
temperature // 68.3 ºF
humidity // 74%

let (earnings, fileSize, url) = "Total Earnings From PDF: $12.2k (3.25 MB, at https://lifeadvice.co.uk/pdfs/download?id=guide)".find(.currency, .fileSize, .url)!

earnings // 12,200 USD
fileSize // 3.25 MB
url // https://lifeadvice.co.uk/pdfs/download?id=guide

Note: the returned data points are not strings. They are native Swift data types (available as elements on a tuple), on which you can immediately perform operations:

let numbers = "100 + 20".find(.number, .number)!
let sum = numbers.0 + numbers.1 // 120

Up to 6 data points can be requested in a single call. Variadic generics are planned for Swift 6, so we'll support more in the future.

The beauty of higher order data extraction

Observe the beauty of the higher order concepts used here: numbers come in many formats (1,000, 30k, .456), yet a simple ".number" query "matches" them all. And .date "matches" dates in commonly used date formats.

For cases where the locale plays a role in the format of data, you may specify a locale in the find method (otherwise the current system Locale is used):

let europeanNumber = "€1.333,24".find(.currency, locale: Locale(identifier: "en_DE"))
let americanDate = "05/30/21".find(.date, locale: Locale(identifier: "en_US")) // month/day/year
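To see why the locale matters, here is a minimal Foundation-only sketch. It does not use SoulverCore, and parseNumber is a hypothetical helper written for this illustration. It shows the same quantity written with German and with US separators, each read under a matching locale:

```swift
import Foundation

// Hypothetical helper (not part of SoulverCore): parse a number string
// using the separators of an explicitly chosen locale.
func parseNumber(_ text: String, localeIdentifier: String) -> Double? {
    let formatter = NumberFormatter()
    formatter.numberStyle = .decimal
    formatter.locale = Locale(identifier: localeIdentifier)
    return formatter.number(from: text)?.doubleValue
}

// The same quantity, written with German and with US separators.
let german = parseNumber("1.333,24", localeIdentifier: "de_DE")
let american = parseNumber("1,333.24", localeIdentifier: "en_US")
```

Both calls should yield the same value, because each string is interpreted with the separators its locale expects. Passing a locale to find plays the same role.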

Where possible, standard Swift primitives are returned (URL, Date, Decimal, etc). In cases where no Swift primitive wholly captures the data present in the string, a SoulverCore value type is returned with properties containing the relevant data.

Supported data types

| Symbol | Match Examples | Return Type |
|---|---|---|
| .number | 123.45, 10k, -.3, 3,000, 50_000 | Decimal |
| .binaryNumber | 0b1011010 | UInt |
| .hexNumber | 0x31FE28 | UInt |
| .boolean | 'true' or 'false' | Bool |
| .percentage | 10%, 230.99% | Decimal |
| .date | March 12, 2004, 21/04/77, July the 4th, etc. | Date |
| .unixTimestamp | 1661259854 | TimeInterval |
| .place | Paris, Tokyo, Bali, Israel | SoulverCore.Place |
| .airport | SFO, LAX, SYD | SoulverCore.Place |
| .timezone | AEST, GMT, EST | SoulverCore.Place |
| .currencyCode | USD, EUR, DOGE | String |
| .currency | $10.00, AU$30k, 350 JPY | SoulverCore.UnitExpression |
| .time | 10 s, 3 min, 4 weeks | SoulverCore.UnitExpression |
| .distance | 10 km, 3 miles, 4 cm | SoulverCore.UnitExpression |
| .temperature | 25 °C, 77 °F, 10C, 5 F | SoulverCore.UnitExpression |
| .weight | 10kg, 45 lb | SoulverCore.UnitExpression |
| .area | 30 m2, 40 in2 | SoulverCore.UnitExpression |
| .speed | 30 mph | SoulverCore.UnitExpression |
| .volume | 3 litres, 4 cups, 10 fl oz | SoulverCore.UnitExpression |
| .timespan | 3 hours 12 minutes | SoulverCore.Timespan |
| .laptime | 01:30:22.490 (hh:mm:ss.ms) | SoulverCore.Laptime |
| .timecode | 03:10:21:16 (hh:mm:ss:frames) | SoulverCore.Frametime |
| .pitch | A4, Bb7, C#9 | SoulverCore.Pitch |
| .url | https://soulver.app | URL |
| .emailAddress | [email protected] | String |
| .hashTag | #this_is_a_tag | String |
| .whitespace | All whitespace characters (including tabs) are collapsed into a single whitespace token | String |

Getting started

  • The SoulverCore framework includes a highly optimized string parser, which can produce an array of tokens representing data types in a given string. This is exactly what we need.
  • Add the SoulverCore binary framework to your project. The package is located at https://github.com/soulverteam/SoulverCore (in Xcode, go to File > Add Packages…)
  • Be sure to "import SoulverCore" at the top of any Swift files in which you wish to process strings

Finding data in strings

As we saw above, finding a data point in a string is as simple as asking for it:

let percent = "Results of likeness test: 83% match".find(.percentage)
// percent is the decimal 0.83

Extracting multiple data points is no harder. A tuple is returned with the correct number of arguments and data types:

let payrollEntry = "CREDIT			03/02/2022			Payroll from employer				$200.23" // this string has inconsistent whitespace between entities, but this isn't a problem for us
let (date, currency) = payrollEntry.find(.date, .currency)!
date // Either February 3, or March 2, depending on your system locale
currency // UnitExpression object (use .value to get the decimalValue, and .unit.identifier to get the currency code - USD)
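The date ambiguity mentioned above can be reproduced with Foundation alone. The sketch below does not use SoulverCore, and month(of:format:) is a hypothetical helper. It reads the same text once as month/day/year and once as day/month/year:

```swift
import Foundation

// Hypothetical helper: return the month of a numeric date string
// when it is read with a fixed field order.
func month(of text: String, format: String) -> Int? {
    let utc = TimeZone(secondsFromGMT: 0)!
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.timeZone = utc
    formatter.dateFormat = format
    guard let date = formatter.date(from: text) else { return nil }

    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = utc
    return calendar.component(.month, from: date)
}

let monthFirst = month(of: "03/02/2022", format: "MM/dd/yyyy") // 3 (March 2)
let dayFirst = month(of: "03/02/2022", format: "dd/MM/yyyy")   // 2 (February 3)
```

If the input comes from a known source, passing an explicit locale to find avoids depending on the system setting.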

Extracting a data point from an array of strings

We can also call find with a single data type on an array of strings, and get back an array of matches of the corresponding data type:

let amounts = ["Zac spent $50", "Molly spent US$81.9 (with her 10% discount)", "Jude spent $43.90 USD"].find(.currency)

let totalAmount = amounts.reduce(0.0) {
    $0 + $1.value
}

// totalAmount is $175.80
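The total above is a Decimal, not a formatted string. As a rough sketch of the last step, here is one way to sum Decimal values and format the result for display using only Foundation. The values are typed in by hand rather than parsed, and the en_US locale is an assumption made for this example:

```swift
import Foundation

// Amounts as Decimal values, standing in for parsed results.
let values: [Decimal] = [
    Decimal(string: "50")!,
    Decimal(string: "81.9")!,
    Decimal(string: "43.90")!,
]

// Decimal addition is exact for these inputs (no binary floating point rounding).
let total = values.reduce(Decimal(0)) { $0 + $1 }

// Format the total as currency; the locale is an assumption for this example.
let formatter = NumberFormatter()
formatter.numberStyle = .currency
formatter.locale = Locale(identifier: "en_US")
let display = formatter.string(from: NSDecimalNumber(decimal: total))
```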

Transforming data in strings

Imagine we wanted to standardize the whitespace in the string from the previous example:

let standardized = "CREDIT			03/02/2022			Payroll from employer				$200.23".replacingAll(.whitespace) { whitespace in
    return " "
}

// standardized is "CREDIT 03/02/2022 Payroll from employer $200.23"

Or perhaps you want to convert European formatted numbers into Swift "standard" ones:

let standardized = "10.330,99 8.330,22 330,99".replacingAll(.number, locale: Locale(identifier: "en_DE")) { number in
    return NumberFormatter.localizedString(from: number as NSNumber, number: .decimal)
}

// standardized is "10,330.99 8,330.22 330.99"

Or perhaps you want to convert Celsius temperatures into Fahrenheit:

let convertedTemperatures = ["25 °C", "12.5 degrees celsius", "-22.6 C"].replacingAll(.temperature) { celsius in
    
    let measurementC: Measurement<UnitTemperature> = Measurement(value: celsius.value.doubleValue, unit: .celsius)
    let measurementF = measurementC.converted(to: .fahrenheit)
    
    let formatter = MeasurementFormatter()
    formatter.unitOptions = .providedUnit
    return formatter.string(from: measurementF)
    
}

// convertedTemperatures is ["77°F", "54.5°F", "-8.68°F"]

Extending SoulverCore with your own custom types

Let's imagine we had strings with the following format, describing some containers:

  • "Color: blue, size: medium, volume: 12.5 cm3"
  • "Color: red, size: small, volume: 6.2 cm3"
  • "Color: yellow, size: large, volume: 17.82 cm3"

We want to extract this data into a custom Swift type that represents a Container.

  1. Define our model types (if they don't exist already)
enum Color: String, RawRepresentable {
	case blue
	case red
	case yellow
}

enum Size: String, RawRepresentable {
	case small
	case medium
	case large
}

struct Container {
   let color: Color
   let size: Size
   let volume: Decimal

   init(_ data: (Color, Size, UnitExpression)) {
        self.color = data.0
        self.size = data.1
        self.volume = data.2.value
    }
}
  2. Then create parsers for Color and Size, and add them as static variables on DataPoint
struct ColorParser: DataFromTokenParser {
    typealias DataType = Color
    
    func parseDataFrom(token: SoulverCore.Token) -> Color? {
        return Color(rawValue: token.stringValue.lowercased())
    }
}

struct SizeParser: DataFromTokenParser {
    typealias DataType = Size

    func parseDataFrom(token: SoulverCore.Token) -> Size? {
        return Size(rawValue: token.stringValue.lowercased())
    }
}

extension DataPoint {
    static var color: DataPoint<ColorParser> {
        return DataPoint<ColorParser>(parser: ColorParser())
    }

    static var size: DataPoint<SizeParser> {
        return DataPoint<SizeParser>(parser: SizeParser())
    }
}
  3. That's all the setup. You can now parse the data from the string, and populate your model objects:
  let container1 = Container("Color: blue, size: medium, volume: 12.5 cm3".find(.color, .size, .volume)!)
  let container2 = Container("Color: red, size: small, volume: 6.2 cm3".find(.color, .size, .volume)!)
  let container3 = Container("Color: yellow, size: large, volume: 17.82 cm3".find(.color, .size, .volume)!)
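The two parsers above rely on nothing more than the failable rawValue initializer that Swift synthesizes for String-backed enums. A standalone sketch of that lookup (no SoulverCore required; color(fromTokenText:) is a hypothetical stand-in for the body of parseDataFrom) looks like this:

```swift
// A String-backed enum gets a failable init(rawValue:) for free.
enum Color: String {
    case blue
    case red
    case yellow
}

// Hypothetical stand-in for the parser body: lowercase the token text,
// then try to map it onto an enum case.
func color(fromTokenText text: String) -> Color? {
    return Color(rawValue: text.lowercased())
}

let known = color(fromTokenText: "Blue")    // .blue
let unknown = color(fromTokenText: "green") // nil, no such case
```

Returning nil for text that is not a known case is what lets a parser decline a token, so words like "Color:" or "size:" are presumably passed over rather than matched.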

Using SoulverCore as a parser inside Swift Regex Builder (coming in 5.7)

SoulverCore can be used to parse data inside the Swift regex builder DSL coming in 5.7. This is often easier than figuring out how to match the format of your data with a regular expression.

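import RegexBuilder // the Regex { ... } builder DSL lives in this module

if #available(macOS 13.0, iOS 16.0, *) {
    let input = "Cost: 365.45, Date: March 12, 2022"
    
    let regex = Regex {
        "Cost: "
        Capture {
            DataPoint<NumberFromTokenParser>.number
        }
        ", Date: "
        Capture {
            DataPoint<DateFromTokenParser>.date
        }
    }
    
    // wholeMatch returns an optional, so use optional chaining to read the first capture
    let match = input.wholeMatch(of: regex)?.1 // 365.45
}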

Note: it's confusing and unfortunate that the Swift compiler can't seem to infer the DataPoint generic parameter from a static variable on DataPoint (anyone know why?).

Until this is fixed, you must explicitly specify the DataFromTokenParser corresponding to the type of the data you want to match.
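For comparison, here is a sketch of capturing the same cost with RegexBuilder alone, where the number format has to be spelled out by hand. It does not use SoulverCore. parseCost is a hypothetical helper, and the digit pattern is an assumption that only covers plain numbers like 365.45 (no grouping separators, signs or suffixes such as "k"):

```swift
import Foundation
import RegexBuilder

// Hypothetical helper: extract the cost as a Decimal using a hand-written pattern.
@available(macOS 13.0, iOS 16.0, *)
func parseCost(_ input: String) -> Decimal? {
    let costRegex = Regex {
        "Cost: "
        TryCapture {
            OneOrMore(.digit)
            Optionally {
                "."
                OneOrMore(.digit)
            }
        } transform: { Decimal(string: String($0)) }
    }
    // Element 1 of the match output is the transformed capture.
    return input.firstMatch(of: costRegex)?.1
}

if #available(macOS 13.0, iOS 16.0, *) {
    let cost = parseCost("Cost: 365.45, Date: March 12, 2022")
    print(cost as Any)
}
```

Every extra number format you want to accept means extending the pattern, which is the work the .number data point is meant to take off your hands.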

Performance

SoulverCore is unlikely to be your app's bottleneck.

In our testing, SoulverCore does ~6k operations/second on Intel and 10k+ operations/second on Apple Silicon.

While this is admittedly not as fast as regex, in fairness, SoulverCore is doing a lot more work. Before your query is checked for matches, SoulverCore parses the complete string into tokens representing various data types, of which it can identify more than 20 (including dates, numbers & units in various formats, places, timezones and more…).

A regex that did this would be impossible to construct, and even if such a regex were possible, it would run much more slowly than SoulverCore does.
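If you want to check throughput on your own hardware, a rough timing harness like the one below can help. This is only a sketch: operationsPerSecond is a hypothetical helper, and the closure uses a plain string split as a stand-in workload. Swap in a call to find to measure SoulverCore itself.

```swift
import Foundation

// Hypothetical helper: run an operation repeatedly and report operations per second.
func operationsPerSecond(iterations: Int, _ operation: () -> Void) -> Double {
    let start = Date()
    for _ in 0..<iterations {
        operation()
    }
    let elapsed = Date().timeIntervalSince(start)
    // Guard against a zero interval on very fast runs.
    return Double(iterations) / max(elapsed, .leastNonzeroMagnitude)
}

// Stand-in workload; replace the body with the parsing call you want to measure.
let rate = operationsPerSecond(iterations: 1_000) {
    _ = "user logged (3 attempts)".split(separator: " ")
}
print("approx. \(Int(rate)) operations/second")
```

Wall-clock timing like this is noisy, so treat the result as an order-of-magnitude figure and measure in a release build.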

Comparison with other data parsing approaches

Apple's toolkit for string parsing includes Regex, NSScanner & NSDataDetector. Let's compare and contrast each of these with SoulverCore.

Regular Expressions

Regular expressions will always be with us, but ask yourself, do you really want to use them for data processing?

They're non-trivial to understand at a glance, and constructing a correct regex to match data is, at a minimum, tedious (and sometimes mentally quite challenging).

Regex only "sees" sets of characters/numbers/whitespace so it forces you to think about the string format of the data you want to parse, and also often about how to skip past other strings leading up to it.

So even with the significant enhancements to regex in Swift 5.7 (type-safe tuple matches & the regex builder syntax), regex makes you think about data parsing at the wrong level of abstraction (i.e. characters, rather than data types).

If Swift is to achieve its goal of becoming the world's greatest string & data processing language, it needs something more human friendly at the level of abstraction of data, not character sets.

NSScanner

A scanner is an imperative (rather than declarative) approach to parsing data out of strings. You move a scanner through a string step-by-step, scanning out the components that you want.

One benefit of NSScanner is that it's able to ignore parts of strings you don't care about. However, a scanner still only knows about numbers and strings, not higher-level data types.

Here is a StackOverflow post that illustrates the use of NSScanner to scan the integer from the string "user logged (3 attempts)".

NSString *logString = @"user logged (3 attempts)";
NSString *numberString;
NSScanner *scanner = [NSScanner scannerWithString:logString];
[scanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:nil];
[scanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:&numberString];
NSLog(@"Attempts: %i", [numberString intValue]); // 3
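For readers working in Swift, roughly the same steps can be written with Foundation's Scanner. This is a sketch using the Swift-style scanning methods and is not taken from the post above:

```swift
import Foundation

let scanner = Scanner(string: "user logged (3 attempts)")

// Skip everything up to the first decimal digit...
_ = scanner.scanUpToCharacters(from: .decimalDigits)

// ...then read the integer that starts there.
let attempts = scanner.scanInt() // 3
```

It is shorter than the Objective-C version, but still imperative: you describe how to walk the string rather than what you want from it.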

Regex (in Swift 5.7+) is somewhat more concise:

if #available(macOS 13.0, iOS 16.0, *) {
    let match = "user logged (3 attempts)".firstMatch(of: /([+-]?[0-9]+)/)
    let numberSubstring = match!.0
    let number = Int(numberSubstring)
}

And now SoulverCore:

let number = "user logged (3 attempts)".find(.number)

NSDataDetector

NSDataDetector is an NSRegularExpression subclass that is able to scan a string for dates, URLs, phone numbers, addresses, and flight details. It's a great class, and supports many different formats. Additionally, it returns proper data types from strings, like URL and Date (much like SoulverCore).

Compare:

NSDataDetector
let input = "Learn more at https://fascinatingcaptian.com today."
let detector = try! NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)
let url = detector.firstMatch(in: input, options: [], range: NSRange(location: 0, length: input.utf16.count))!.url!
SoulverCore
let url = "Learn more at https://fascinatingcaptian.com today".find(.url)

NSDataDetector's downsides are that the API is not particularly "Swifty", supported data types are limited, and it's not part of the platform-independent implementation of Foundation (so you can't use it on Linux, Windows, etc.).

Licence

SoulverCore is a commercially licensable, closed-source Swift framework. The standard licensing terms of SoulverCore do apply for its use in string processing (see SoulverCore Licence).

For personal (non-commercial) projects, you do not need a licence. So go ahead and use this great library in your personal projects!

There are also attribution-only licences available for a few commercial use cases.
