A result builder that builds an HTML parser and transforms HTML elements into a strongly-typed result, inspired by RegexBuilder.

Overview

HTMLParserBuilder

Note: CaptureTransform.swift and TypeConstruction.swift are copied from apple/swift-experimental-string-processing.

Installation

Requirements

  • Swift 5.7 (buildPartialBlock)
  • macOS 10.15
  • iOS 13.0
  • tvOS 13.0
  • watchOS 6.0

Note: HTMLParserBuilder currently supports only platforms where the Objective-C runtime is available, because it depends on HTMLKit, an Objective-C HTML parser library.

dependencies: [
    // ...
    .package(name: "HTMLParserBuilder", url: "https://github.com/danny1113/html-parser-builder.git", from: "1.0.0")
]
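
For context, here is a sketch of a complete Package.swift that uses the dependency above. The package and target name MyApp are hypothetical, and the product name HTMLParserBuilder is an assumption based on the library name; check the library's own manifest if resolution fails.

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    // Minimum platform versions taken from the requirements list above.
    platforms: [.macOS(.v10_15), .iOS(.v13), .tvOS(.v13), .watchOS(.v6)],
    dependencies: [
        .package(url: "https://github.com/danny1113/html-parser-builder.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp", // hypothetical target name
            dependencies: [
                // Assumption: the library product is named "HTMLParserBuilder".
                .product(name: "HTMLParserBuilder", package: "html-parser-builder")
            ]
        )
    ]
)
```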

Introduction

Parsing HTML can be complicated. For example, suppose you want to parse the simple HTML below:

<h1 id="hello">hello, world</h1>

<div id="group">
    <h1>INSIDE GROUP h1</h1>
    <h2>INSIDE GROUP h2</h2>
</div>

Existing HTML parsing libraries have these downsides:

  • You have to name every captured element
  • The code becomes more complex as the number of elements you want to capture grows
  • Error handling can be hard

let htmlString = "<html>...</html>"
let doc = HTMLDocument(string: htmlString)
let first = doc.querySelector("#hello")?.textContent

let group = doc.querySelector("#group")
let second = group?.querySelector("h1")?.textContent
let third = group?.querySelector("h2")?.textContent

if  let first = first,
    let second = second,
    let third = third {
    
    // ...
} else {
    // ...
}

HTMLParserBuilder offers these advantages:

  • Strongly-typed capture result
  • Structured syntax
  • Composable API
  • Support for async/await
  • Built-in error handling

You can construct a parser that reflects your original HTML structure:

let capture = HTML {
    TryCapture("#hello") { (element: HTMLElement?) -> String? in
        return element?.textContent
    } // => HTML<String?>
    
    Local("#group") {
        Capture("h1", transform: \.textContent) // => HTML<String>
        Capture("h2", transform: \.textContent) // => HTML<String>
    } // => HTML<(String, String)>
    
} // => HTML<(String?, String, String)>


let htmlString = "<html>...</html>"
let doc = HTMLDocument(string: htmlString)

let output = try doc.parse(capture)
// => (String?, String, String)
// output: (Optional("hello, world"), "INSIDE GROUP h1", "INSIDE GROUP h2")
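
To show where the strongly-typed tuple comes from, here is a minimal sketch of a result builder that uses buildPartialBlock (the Swift 5.7 feature listed in the requirements). Step, StepBuilder, and steps are hypothetical names for illustration, not the library's types. This toy version also produces nested tuples, whereas the README shows the library producing a flat tuple.

```swift
// Hypothetical miniature of the builder pattern; not HTMLParserBuilder's implementation.
struct Step<Output> {
    let run: () -> Output
}

@resultBuilder
enum StepBuilder {
    // Called for the first component in the block.
    static func buildPartialBlock<T>(first: Step<T>) -> Step<T> {
        first
    }

    // Called once for each following component; the output type grows by one element.
    static func buildPartialBlock<A, B>(accumulated: Step<A>, next: Step<B>) -> Step<(A, B)> {
        Step { (accumulated.run(), next.run()) }
    }
}

func steps<Output>(@StepBuilder _ content: () -> Step<Output>) -> Step<Output> {
    content()
}

let pipeline = steps {
    Step { "hello, world" }
    Step { 42 }
    Step { true }
}

// The compiler infers the full output type from the components in the block.
let output: ((String, Int), Bool) = pipeline.run()
```

Because the output type is assembled at compile time, reading a component with the wrong type is a compile error rather than a runtime failure.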

Note: You can currently compose up to 10 components inside the builder. To go beyond that, group your captures inside Local as a workaround.

Detailed API Usage

Parsing

HTMLParserBuilder provides two functions for parsing:

public func parse<Output>(_ html: HTML<Output>) throws -> Output
public func parse<Output>(_ html: HTML<Output>) async throws -> Output

Note: The async version may perform better, since it uses structured concurrency to parallelize child tasks.
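
As a rough illustration of what parallelizing child tasks with structured concurrency means, the sketch below runs two independent pieces of work concurrently with async let. It shows the general technique only and is not the library's implementation; slowCapture and parseBoth are hypothetical names.

```swift
// Hypothetical stand-in for an expensive capture; not part of HTMLParserBuilder.
func slowCapture(_ value: String) async -> String {
    try? await Task.sleep(nanoseconds: 10_000_000) // simulate work
    return value.uppercased()
}

// Both child tasks start right away and run concurrently.
// The function suspends until both results are available.
func parseBoth() async -> (String, String) {
    async let first = slowCapture("h1")
    async let second = slowCapture("h2")
    return await (first, second)
}
```

The child tasks cannot outlive parseBoth, which is what makes the concurrency structured.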

HTML

You construct your parser inside HTML, which can also transform the captured output into another data type.

struct Group {
    let h1: String
    let h2: String
}

let capture = HTML {
    Capture("#group h1", transform: \.textContent) // => HTML<String>
    Capture("#group h2", transform: \.textContent) // => HTML<String>
    
} transform: { (output: (String, String)) -> Group in
    return Group(
        h1: output.0,
        h2: output.1
    )
} // => HTML<Group>

Capture

Capture works like querySelector: you pass in a CSS selector to find the HTML element, and you can transform the element into any type you want, for example using its:

  • innerHTML
  • textContent
  • attributes
  • ...

Note: If Capture can't find an HTML element that matches the selector, it throws an error that causes the whole parse to fail. For a failable capture, see TryCapture.

You can write this API in whichever form suits you best:

Capture("#hello", transform: \.textContent)
Capture("#hello") { $0.textContent }
Capture("#hello") { (e: HTMLElement) -> String in
    return e.textContent
}

TryCapture

TryCapture is a little different from Capture: it also calls querySelector to find the HTML element, but it yields an optional HTML element.

In this example, the result type is String?, and the result is nil when the HTML element can't be found.

TryCapture("#hello") { (e: HTMLElement?) -> String? in
    return e?.innerHTML
}
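
The difference between the two failure modes can be modeled with a toy example. The sketch below does not use HTMLParserBuilder or HTMLKit. ToyDocument, ElementNotFound, capture, and tryCapture are hypothetical names that only imitate the behavior described above: one lookup throws when nothing matches, the other returns nil.

```swift
// Toy stand-in for a parsed document: maps a selector to its text content.
struct ToyDocument {
    var elements: [String: String]
}

// Hypothetical error type; the library's real error type is not shown in this README.
struct ElementNotFound: Error, Equatable {
    let selector: String
}

extension ToyDocument {
    // Capture-like: a missing element is an error that aborts the whole operation.
    func capture(_ selector: String) throws -> String {
        guard let text = elements[selector] else {
            throw ElementNotFound(selector: selector)
        }
        return text
    }

    // TryCapture-like: a missing element simply yields nil.
    func tryCapture(_ selector: String) -> String? {
        elements[selector]
    }
}

let toy = ToyDocument(elements: ["#hello": "hello, world"])
let found = toy.tryCapture("#hello")
let missing = toy.tryCapture("#nope")

var failure: ElementNotFound?
do {
    _ = try toy.capture("#nope")
} catch let error as ElementNotFound {
    failure = error
} catch {
    // No other error type is thrown in this sketch.
}
```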

CaptureAll

CaptureAll works like querySelectorAll: you pass in a CSS selector to find all HTML elements that match the selector, and you can transform them into any other type you want.

You can write this API in whichever form suits you best:

CaptureAll("h1") { $0.map(\.textContent) }
CaptureAll("h1") { (e: [HTMLElement]) -> [String] in
    return e.map(\.textContent)
}

You can also capture other elements inside the matched elements and transform them into another type:

<div class="group">
    <h1>Group 1</h1>
</div>
<div class="group">
    <h1>Group 2</h1>
</div>

CaptureAll("div.group") { (elements: [HTMLElement]) -> [String] in
    return elements.compactMap { e in
        return e.querySelector("h1")?.textContent
    }
}
// => [String]
// output: ["Group 1", "Group 2"]

Local

Local finds an HTML element that matches the selector, and all captures inside it search relative to that element. This is useful when you only want to capture elements inside the local group.

Just like HTML, Local can also transform the captured result into another data type by adding transform:

struct Group {
    let h1: String
    let h2: String
}

Local("#group") {
    Capture("h1", transform: \.textContent) // => HTML<String>
    Capture("h2", transform: \.textContent) // => HTML<String>
} transform: { (output: (String, String)) -> Group in
    return Group(
        h1: output.0,
        h2: output.1
    )
} // => Group

Note: If Local can't find an HTML element that matches the selector, it throws an error that causes the whole parse to fail. You can use TryCapture as an alternative.

LateInit

This library also comes with a handy property wrapper, LateInit, which delays initialization until the first time you access it.

struct Container {
    @LateInit var capture = HTML {
        Capture("h1", transform: \.textContent)
    }
}

// it needs to be `var` to perform late initialization
var container = Container()
let output = try doc.parse(container.capture)
// ...
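
To make the idea of delayed initialization concrete, here is a minimal sketch of how a lazily initializing property wrapper could be written. Deferred is a hypothetical name, and this is not LateInit's actual implementation. It also shows why the owning value must be declared with var: the getter mutates the wrapper's storage on first access.

```swift
// Hypothetical lazy property wrapper; a sketch, not the library's LateInit.
@propertyWrapper
struct Deferred<Value> {
    private let make: () -> Value
    private var storage: Value?

    // @autoclosure postpones evaluating the initial value expression.
    init(wrappedValue: @autoclosure @escaping () -> Value) {
        self.make = wrappedValue
    }

    var isInitialized: Bool { storage != nil }

    var wrappedValue: Value {
        mutating get {
            if let value = storage { return value }
            let value = make() // runs only on first access
            storage = value
            return value
        }
    }
}

struct LazyContainer {
    @Deferred var greeting = "hello, " + "world"
    var isGreetingInitialized: Bool { _greeting.isInitialized }
}

// Must be `var`, because reading `greeting` mutates the wrapper.
var lazyContainer = LazyContainer()
let before = lazyContainer.isGreetingInitialized
let value = lazyContainer.greeting
let after = lazyContainer.isGreetingInitialized
```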

Wrap Up

API         Use case
Capture     Throws an error when the element can't be captured
TryCapture  Returns nil when the element can't be captured
CaptureAll  Captures all elements that match the selector
Local       Captures elements in the local scope
LateInit    Delays initialization until the first time you access it

Advanced use case

  • Pass one HTMLComponent into another
  • Transform to a custom data structure before parsing

struct Group {
    let h1: String
    let h2: String
}

//       |--------------------------------------------------------------|
let groupCapture = HTML {                                            // |
    Local("#group") {                                                // |
        Capture("h1", transform: \.textContent) // => HTML<String>   // |
        Capture("h2", transform: \.textContent) // => HTML<String>   // |
    } // => HTML<(String, String)>                                   // |
                                                                     // |
} transform: { output -> Group in                                    // |
    return Group(                                                    // |
        h1: output.0,                                                // |
        h2: output.1                                                 // |
    )                                                                // |
} // => HTML<Group>                                                  // |
                                                                     // |
let capture = HTML {                                                 // |
    TryCapture("#hello") { (element: HTMLElement?) -> String? in     // |
        return element?.textContent                                  // |
    } // => HTML<String?>                                            // |
                                                                     // |
    groupCapture // => HTML<Group> -------------------------------------|
    
} // => HTML<(String?, Group)>


let htmlString = "<html>...</html>"
let doc = HTMLDocument(string: htmlString)

let output = try doc.parse(capture)
// => (String?, Group)
Comments
  • Wrong construct type.

    The whole app will crash when the type constructed by the builder doesn't match the output result type.

    Avoid returning an unlabeled tuple type from a capture, or label your tuple as described in the workaround below.

    let capture = HTML {
        Capture("h1") { e -> (String, Int) in
            return (e.textContent, 1)
        }
        Capture("#hello") { e -> (String, test: Int) in
            return (e.textContent, 2)
        }
    }
    
    // expected output type: ((String, Int), (String, test: Int))
    // actual output type:   (String, Int, (String, test: Int))
    

    Workaround:

    let capture = HTML {
        Capture("h1") { e -> (String, test: Int) in
            return (e.textContent, 1)
        }
        Capture("#hello") { e -> (String, test: Int) in
            return (e.textContent, 2)
        }
    }
    
    // output type: ((String, test: Int), (String, test: Int))
    