Kanna(鉋) is an XML/HTML parser for Swift.

Overview

Kanna(鉋)

Kanna(鉋) is a cross-platform XML/HTML parser (macOS, iOS, tvOS, watchOS, and Linux).

It was inspired by Nokogiri(鋸).

ℹ️ Documentation

Features

  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • Support for namespaces
  • Comprehensive test suite

Installation for Swift 5

CocoaPods

Add the following to your Podfile:

use_frameworks!
pod 'Kanna', '~> 5.2.2'

Carthage

Add the following to your Cartfile:

github "tid-kijyun/Kanna" ~> 5.2.2

For Xcode 11.3 and earlier, the following setting is required:

  1. In the project settings add $(SDKROOT)/usr/include/libxml2 to the "header search paths" field

Swift Package Manager

  1. Install libxml2 on your computer:
// macOS: For Xcode 11.3 and earlier, the following is required.
$ brew install libxml2
$ brew link --force libxml2

// Linux(Ubuntu):
$ sudo apt-get install libxml2-dev
  2. Add the following to your Package.swift:
// swift-tools-version:5.0
import PackageDescription

let package = Package(
    name: "YourProject",
    dependencies: [
        .package(url: "https://github.com/tid-kijyun/Kanna.git", from: "5.2.2"),
    ],
    targets: [
        .target(
            name: "YourTarget",
            dependencies: ["Kanna"]),
    ]
)
$ swift build

Note: If a build error occurs, try running the following command:

// Linux(Ubuntu)
$ sudo apt-get install pkg-config

Manual Installation

  1. Add these files to your project:
    Kanna.swift
    CSS.swift
    libxmlHTMLDocument.swift
    libxmlHTMLNode.swift
    libxmlParserOption.swift
    Modules
  2. In the target settings add $(SDKROOT)/usr/include/libxml2 to the Search Paths > Header Search Paths field
  3. In the target settings add $(SRCROOT)/Modules to the Swift Compiler - Search Paths > Import Paths field

Installation for Swift 4

Installation for Swift 3

Synopsis

import Kanna

let html = "<html>...</html>"

if let doc = try? HTML(html: html, encoding: .utf8) {
    print(doc.title)
    
    // Search for nodes by CSS
    for link in doc.css("a, link") {
        print(link.text)
        print(link["href"])
    }
    
    // Search for nodes by XPath
    for link in doc.xpath("//a | //link") {
        print(link.text)
        print(link["href"])
    }
}
let xml = "..."
if let doc = try? Kanna.XML(xml: xml, encoding: .utf8) {
    let namespaces = [
        "o":  "urn:schemas-microsoft-com:office:office",
        "ss": "urn:schemas-microsoft-com:office:spreadsheet"
    ]
    if let author = doc.at_xpath("//o:Author", namespaces: namespaces) {
        print(author.text)
    }
}

Donation

If you like Kanna, please donate via GitHub Sponsors or PayPal.
Donations are used to improve and maintain the library.

License

The MIT License. See the LICENSE file for more information.

Comments
  • Incorrect decoding data for nodes swift 3

    Description:

    When my program was on Swift 2 and used the old version of the library, everything was fine. Now, with version 2.0 and Swift 3, my XML data is decoded incorrectly.

    I have the following code:

    pageEncoding:String.Encoding = .utf8
    
    guard let xmlConfing = self.defaults.object(forKey: "xmlConfing") as? Data else {
                return []
            }
    
    guard let doc = Kanna.XML(xml: xmlConfing, encoding: pageEncoding) else {
                return []
            }
    
    let current = doc.xpath("//ROOT/*").map({ $0 })
    let siteMatches = current.xpath("./MATCHES/*").map({ $0 })
    
    for siteMatch in siteMatches {
                print(siteMatch.toXML)
            }
    

    It returns Optional("<STORE><><tem></tem></></STORE>")

    Here is my xml:

    
    <ROOT>     
        <NODENAME>         
              <MATCHES>             
                    <STORE>
                          <STORE_NAME>My name</STORE_NAME>
                    </STORE>
               </MATCHES>
         </NODENAME>  
    </ROOT> 
    

    Nothing changed on the server. This code worked before :(

    If I do: let current = doc.xpath("//ROOT/*").map({ $0.toXML }) it shows the correct string, but with \n at the end of every line.

    And sometimes this happens:

    [Screenshot: 2016-10-04 at 16:33:58] But I'm not sure it is related to my decoding problem.

    Installation method:

    • [x] CocoaPods(1.1.0 or later)

    Library version:

    • [X] v2.0.0

    Xcode version:

    • [X] 8.0 (Swift 3.0)
    Bug 
    opened by Arti3DPlayer 23
  • Swift Build not Working on Ubuntu 16.04

    Description:

    Swift Version:

    Swift version 3.0.2 (swift-3.0.2-RELEASE)
    Target: x86_64-unknown-linux-gnu
    

    I get a bunch of errors, including:

    /usr/include/libxml2/libxml/tree.h:17:10: error: 'libxml/xmlversion.h' file not found
    #include <libxml/xmlversion.h>
             ^
    

    Which doesn't seem right, as xmlversion.h definitely exists in /usr/include/libxml2.

    Package.swift:

    import PackageDescription
    
    let package = Package(
        name: "cl_scraper",
    
        dependencies: [
            .Package(url: "https://github.com/tid-kijyun/Kanna.git", majorVersion: 2)
        ]
    )
    

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods(1.1.0 or later)
    • [x] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Library version:

    • [x] v2.1.1
    • [ ] other: ()

    Xcode version:

    • [ ] 8.1 (Swift 3)
    • [ ] 8.1 (Swift 2.3)
    • [ ] 7.3.1
    • [x] other: (On Linux)
    opened by moosichu 21
  • extension XMLNodeSet / Argument passed to call that takes no arguments

    I just added Kanna to my project manually and got the following parser error in Kanna.swift. Edit: The error occurs even when it's installed via CocoaPods.

    [Screenshot: 2016-03-09 at 17:21:11]

    Tested with OS X El Capitan 10.11.3, Xcode 7.2.1, Swift-Project

    Duplicate 
    opened by ixeau 21
  • Could not build Objective-C module 'libxml2'

    Description:

    I ran both the brew install libxml2 and brew link --force libxml2 commands successfully, I added the header search path $(SDKROOT)/usr/include/libxml2, and I still get these build errors:

    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator10.2.sdk/usr/include/libxml2/libxml/HTMLtree.h:15:10: 'libxml/xmlversion.h' file not found

    Could not build Objective-C module 'libxml2'

    Installation method:

    • [x] Carthage
    • [ ] CocoaPods(1.1.0 or later)
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Library version:

    • [x] v2.1.1
    • [ ] other: ()

    Xcode version:

    • [x] 8.2.1 (Swift 3.0.2)
    • [ ] 8.1 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: ()
    opened by mhillebrand 20
  • error: could not build Objective-C module 'libxml2' when use pod Kanna 1.0.6 with CocoaPods 1.0.0

    It works when using pod Kanna 1.0.2 with CocoaPods 1.0.0.

    <module-includes>:1:9: note: in file included from <module-includes>:1:
    #import "libxml2-kanna.h"
            ^
    /Users/nix/dev/github/clones/Yep/Pods/Kanna/Modules/libxml2-kanna.h:1:9: note: in file included from /Users/nix/dev/github/clones/Yep/Pods/Kanna/Modules/libxml2-kanna.h:1:
    #import <libxml2/libxml/HTMLtree.h>
            ^
    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator9.3.sdk/usr/include/libxml2/libxml/HTMLtree.h:15:10: error: 'libxml/xmlversion.h' file not found
    #include <libxml/xmlversion.h>
             ^
    <unknown>:0: error: could not build Objective-C module 'libxml2'
    
    opened by nixzhu 18
  • No such module - libxml2 : Build fails

    Hi, I have followed the steps mentioned at https://github.com/tid-kijyun/Kanna, but I keep getting the error: No such module - libxml2.

    After adding your files, my project looks like this: [Screenshot: 2016-02-17 at 6:32:04 pm]. I have also set Swift Compiler - Search Paths > Import Paths to "$(SRCROOT)/Bhagavad Geetha Book/Modules".

    Also, I am a little unclear about "Step 1: Add files to your project" and "Step 2: Copy folder to your project". Hopefully I have set up your files correctly.

    Kindly let me know if I have missed anything. Thanks in advance.

    opened by Kavisha-Dev 18
  • Getting build error after adding Kanna to a Swift 3.0 project

    Description:

    I get these build errors after adding Kanna using CocoaPods (1.1.1):

    /..../Kanna.swift:64:12: Argument labels '(xml:, encoding:, option:)' do not match any available overloads

    This repeats for all of the XML and HTML functions in Kanna.swift.

    Installation method:

    • [ ] Carthage
    • [X] CocoaPods(1.1.0 or later) 1.2.0 beta 1
    • [ ] CocoaPods(older)
    • [ ] Manually

    Library version:

    • [ ] v2.0.0
    • [ ] v1.1.1
    • [ ] v1.0.6
    • [ ] v1.0.2
    • [X] other: (Please fill in the version you are using.) 2.1.1

    Xcode version:

    • [ ] 8.0 (Swift 3.0)
    • [ ] 8.0 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: (Please fill in the version you are using.)
    opened by mkostersitz 10
  • Build Error

    Description:

    I created an empty Swift 3.0 project with the swift package init --type executable command, then added the Kanna dependency as described in the README.md file. When I run swift build, I get many compile errors. This is the first one:

    /Packages/Kanna-2.0.1/Sources/libxmlHTMLNode.swift:111:30: error: use of undeclared type 'htmlDocPtr'
    fileprivate var docPtr: htmlDocPtr? = nil

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods(1.1.0 or later)
    • [ ] CocoaPods(older)
    • [X] Manually

    Library version:

    • [ ] v2.0.0
    • [ ] v1.1.1
    • [ ] v1.0.6
    • [ ] v1.0.2
    • [X] other: 2.0.1

    Xcode version:

    • [X] 8.0 (Swift 3.0)
    • [ ] 8.0 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: (Please fill in the version you are using.)

    Am I doing something wrong?

    opened by CedricEugeni 10
  • Does not parse latin characters and cedilla

    Whenever I parse content that contains Spanish letters like ç, ó, etc., they are not decoded correctly. Instead, I get output like: ó for ó, ñ for ñ, ç for ç. Help would be much appreciated!

    opened by vihanggodbole 9
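
    The garbled output quoted in this report has the shape of classic mojibake: UTF-8 bytes read back as a single-byte encoding such as ISO Latin-1. The sketch below uses only Foundation (not Kanna) to reproduce the symptom and to undo it. The helper name repairMojibake is hypothetical, and whether this is the actual cause of the report is an assumption.

    ```swift
    import Foundation

    // Reproduce the symptom: encode as UTF-8, then decode the same bytes as ISO Latin-1.
    let original = "ó ñ ç"
    let utf8Bytes = Data(original.utf8)
    let garbled = String(data: utf8Bytes, encoding: .isoLatin1)!

    // Hypothetical helper: reverse the wrong decoding step.
    // Returns nil if the text is not Latin-1 representable or the bytes are not valid UTF-8.
    func repairMojibake(_ text: String) -> String? {
        guard let bytes = text.data(using: .isoLatin1) else { return nil }
        return String(data: bytes, encoding: .utf8)
    }

    print(garbled)
    print(repairMojibake(garbled) ?? "not repairable")
    ```

    If the cause is a mismatched encoding, the cleaner fix is usually to pass the page's real encoding in the parser's encoding: parameter rather than to repair strings afterwards.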
  • Kanna was compiled with optimization - stepping may behave oddly; variables may not be available.

    I was using Kanna to parse some HTML in my iOS app.

    Everything works well in Debug mode.

    When I tested the app on TestFlight it just crashed. Took me a while to figure out the problem. It happens only in Release mode so I went ahead and changed the scheme to Release mode.

    Firing up the app it crashes as the TestFlight version, as "expected".

    The problem resides in the class libxmlHTMLNode in the following method:

    private func libxmlGetNodeContent(nodePtr: xmlNodePtr) -> String? {
        let content = xmlNodeGetContent(nodePtr)
        if let result  = String.fromCString(UnsafePointer(content)) {
            content.dealloc(1)
            return result
        }
        content.dealloc(1)
        return nil
    }
    

    The line let content = xmlNodeGetContent(nodePtr) raises an EXC_BAD_ACCESS, and when I try to print out the nodePtr I get the following message:

    Kanna was compiled with optimization - stepping may behave oddly; variables may not be available.

    So I went ahead and set the Swift Compiler Optimization Level to None [-Onone], and the app runs properly.

    My app is a Swift app using Kanna installed via CocoaPods.

    Not sure what I can do. Anybody had this problem before?

    opened by nunogoncalves 9
  • 'libxml/xmlversion.h' file not found

    Description:

    I've solved my issue already, but I'm posting my solution in case someone is experiencing the same problem.

    After updating from version 2.0.0 to 2.2.1, the following errors started appearing:

    'libxml/xmlversion.h' file not found
    Could not build Objective-C module 'libxml2'

    Right now I'm using Kanna in my private lib, which loads it from a .podspec file.

    I was able to replicate my solution by adding:

    s.xcconfig = { 'HEADER_SEARCH_PATHS' => '$(SDKROOT)/usr/include/libxml2', 'SWIFT_INCLUDE_PATHS' => '$(SRCROOT)/Kanna/Modules' }

    to my .podspec, right above:

    s.dependency 'Kanna', '~> 2.2.1'

    That narrowed the error down to the "Project name"_Tests target. I had noticed that some people mentioned adding "$(SDKROOT)/usr/include/libxml2" to the header search paths; my main target already had that path, but the "Project name"_Tests target didn't. So I added the search path to the "Project name"_Tests target, cleaned the project, crossed my fingers, and finally ran pod update.

    It cleared the error for me; hopefully it will help someone else.

    PS. I don't know all the ins and outs of CocoaPods or Kanna, and I have no idea what I did, but I was able to replicate the solution a few times. Feel free to explain or tell me why/how I'm wrong.

    Installation method:

    • [ ] Carthage
    • [x] CocoaPods(1.1.0 or later)
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Library version:

    • [x] v2.1.1
    • [ ] other: ()

    Xcode version:

    • [x] 8.1 (Swift 3)
    • [ ] 8.1 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: ()
    opened by gschafer 8
  • What is the best way to handle a Base64 encoded image?

    Description:

    When I parse img urls from a website I am getting Base64 image urls. Is there any way to get the actual url to the image or is there a way to convert this to an image url or even an actual image?

    Example: I was parsing polygon.com and some of the image urls were formatted in Base64 like this:

    data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs

    Installation method:

    • [x] CocoaPods

    Kanna version (or commit hash):

    5.2.7

    swift --version

    5

    opened by PatrickAdams 0
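
    A data: URI carries the image bytes inline, so there is no separate image URL hidden inside it; the bytes can be decoded directly. The following is a minimal Foundation-only sketch (independent of Kanna) that decodes the example value from this report. The helper name decodeDataURI is hypothetical, and the padding step is included because the quoted string lacks its trailing "=".

    ```swift
    import Foundation

    // Hypothetical helper: extract the raw bytes from a base64 "data:" URI.
    func decodeDataURI(_ uri: String) -> Data? {
        guard uri.hasPrefix("data:"),
              let comma = uri.firstIndex(of: ",") else { return nil }
        // The header sits between "data:" and the first comma, e.g. "image/gif;base64".
        let header = uri[uri.index(uri.startIndex, offsetBy: 5)..<comma]
        guard header.hasSuffix(";base64") else { return nil }
        var payload = String(uri[uri.index(after: comma)...])
        // Restore any missing "=" padding before handing the text to Foundation.
        let remainder = payload.count % 4
        if remainder != 0 {
            payload += String(repeating: "=", count: 4 - remainder)
        }
        return Data(base64Encoded: payload)
    }

    let src = "data:image/gif;base64,R0lGODlhAQABAIAAAAUEBAAAACwAAAAAAQABAAACAkQBADs"
    let imageData = decodeDataURI(src)
    // The first six bytes of a GIF file spell its signature.
    let signature = imageData.map { String(decoding: $0.prefix(6), as: UTF8.self) }
    print(signature ?? "not a base64 data URI")
    ```

    On iOS the resulting Data can be passed to UIImage(data:). If the decoded image turns out to be a tiny placeholder, the page may keep the real URL in a different attribute of the same element; that is a guess about the site, not something Kanna can determine.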
  • Kanna is very good, but I don't know how to use namespaces. Can you help me?

    Description:

    for link in doc.css("", namespaces: [:])
    for link in doc.at_css("", namespaces:"")

    How do I use these? Could you give a more detailed example, not just the calls I listed above? Examples of all of them would be appreciated. Thanks.

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Kanna version (or commit hash):

    swift --version

    Xcode version (optional):

    opened by tianbinbin 0
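
    As a possible answer, here is a minimal sketch modeled on the namespace example in the Synopsis. The sample XML and the author name are made up for illustration; only the xpath(_:namespaces:) call shape is taken from the README. The namespaces argument maps each prefix used in the query to a namespace URI.

    ```swift
    import Foundation
    import Kanna

    // Sample document (made up for this sketch) with two prefixed namespaces.
    let xml = """
    <?xml version="1.0"?>
    <ss:Workbook xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
                 xmlns:o="urn:schemas-microsoft-com:office:office">
      <o:DocumentProperties>
        <o:Author>Jane</o:Author>
      </o:DocumentProperties>
    </ss:Workbook>
    """

    // Map each prefix used in the query to its namespace URI.
    let namespaces = [
        "o": "urn:schemas-microsoft-com:office:office",
        "ss": "urn:schemas-microsoft-com:office:spreadsheet"
    ]

    var authors: [String] = []
    if let doc = try? Kanna.XML(xml: xml, encoding: .utf8) {
        // Every prefix in the expression is resolved through the dictionary above.
        for node in doc.xpath("//ss:Workbook//o:Author", namespaces: namespaces) {
            authors.append(node.text ?? "")
        }
    }
    print(authors)
    ```

    In XPath, the prefixes in the query are matched by namespace URI, so they do not have to be the same prefixes the document uses.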
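  • Are XPath functions supported?

    Hello, when parsing a web page I get no result if I use the text() function, and I don't know why. Could you please help me solve this problem?

    Written this way, the data is returned: htmlDocument.xpath("//div[@id='content']").first?.text

    Written this way, with text() added, no data is returned: htmlDocument.xpath("//div[@id='content']/text()").first?.text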

    opened by kuangtao22 0
  • HTML of Kanna instance stripped down on Xcode 12

    Description:

    I am seeing some strange behavior that I have not seen before. If I create a Kanna instance and then access the .toHTML property of that instance, it does not return the full HTML of the instance. The weird thing is that it is different depending on the deployment target. When building for iOS 14 the .toHTML property returns almost all the raw data, but if I am building for iOS 13.5 it returns only a small portion. I noticed the issue when starting to use Xcode 12 and its toolchain.

    Installation method:

    • [x] CocoaPods

    Kanna version:

    5.2.2

    swift --version:

    5.3

    Xcode version:

    Version 12.0 (12A7209)

    How to reproduce:

    Create a Kanna instance with some HTML:

    let document = try! Kanna.HTML(html: htmlText, encoding: String.Encoding.utf8)
                
    print(document.toHTML)
    

    When looking at the console log you can see that the HTML is not complete.

    Anyone else seeing this?

    Regards, Erik

    wontfix 
    opened by fishfisher 6
  • Chained xpaths are searching from root level

    Description:

    XPath queries made on a node are evaluated from the root of the document, not from the node itself. For example, this test will fail.

    func testInnerXpath() {
        let input = """
                    <html>
                    <head>
                        <title>test title</title>
                    </head>
                    <body>
                        <div id="1"><div><h1>test header 1</h1></div></div>
                        <div id="2"><div><h1>test header 2</h1></div></div>
                    </body>
                    </html>
                    """
        do {
            let doc = try HTML(html: input, encoding: .utf8)
            //all this asserts will fail:
            XCTAssertNil(doc.at_xpath("//head")?.at_xpath("//h1")?.toHTML)
            XCTAssertNil(doc.at_xpath("//head")?.at_xpath("//body")?.toHTML)
            XCTAssertNil(doc.at_xpath("//body")?.at_xpath("//title")?.toHTML)
            XCTAssertEqual(doc.at_xpath("//body/div[@id='2']")?.at_xpath("//h1")?.text, "test header 2")
            //only this assert is ok, passes:
            XCTAssertEqual(doc.at_xpath("//body/div[@id='2']//h1")?.text, "test header 2")
        } catch {
            XCTFail("Abnormal test data")
        }
    }
    

    Is it a bug or a feature?

    I've started implementing a fix for this problem: I cast xmlNodePtr to xmlDocPtr and initialize xmlXPathNewContext with the cast pointer, and then all XPath queries start working properly.

    Feature 
    opened by anivaros 1
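
    For context, in XPath 1.0 an expression that begins with // is an absolute path and selects from the document root regardless of the context node, while .// selects descendants of the context node. The sketch below contrasts the two, assuming Kanna evaluates a query with the receiving node as the XPath context node; that assumption is not confirmed by this page.

    ```swift
    import Foundation
    import Kanna

    let html = """
    <html><body>
    <div id="1"><div><h1>test header 1</h1></div></div>
    <div id="2"><div><h1>test header 2</h1></div></div>
    </body></html>
    """

    var absoluteResult: String?
    var relativeResult: String?
    if let doc = try? HTML(html: html, encoding: .utf8),
       let second = doc.at_xpath("//body/div[@id='2']") {
        // "//h1" is an absolute path: it is resolved from the document root.
        absoluteResult = second.at_xpath("//h1")?.text
        // ".//h1" is resolved from the context node, here the second div.
        relativeResult = second.at_xpath(".//h1")?.text
    }
    print(absoluteResult ?? "nil", relativeResult ?? "nil")
    ```

    Under that reading, the asserts in the test above that chain "//h1" would be expected to see the first h1 in the document, and rewriting the inner queries with a leading dot is the usual way to scope them to a node.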
  • The title cannot be parsed

    Description:

    https://mp.weixin.qq.com/s/kQTxb7CO0njHX-yYOjNjww: the title of this WeChat official account article is not parsed correctly, using the latest version of master.

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Kanna version (or commit hash):

    swift --version

    5.0.0

    Xcode version (optional): 11.4

    opened by caolin358688599 2
Releases(5.2.7)
Owner
Atsushi Kiwaki
SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)

SwiftSoup is a pure Swift library, cross-platform (macOS, iOS, tvOS, watchOS and Linux!), for working with real-world HTML. It provides a very conveni

Nabil Chatbi 3.7k Dec 28, 2022
An awesome Swift HTML DSL library using result builders.

SwiftHtml An awesome Swift HTML DSL library using result builders. let doc = Document(.html5) { Html { Head { Meta()

Binary Birds 204 Dec 25, 2022
A light weight network library with automated model parser for rapid development

Gem A light weight network library with automated model parser for rapid development. Managing all http request with automated model parser calls in a

Albin CR 10 Nov 19, 2022
Server-side Swift. The Perfect core toolset and framework for Swift Developers. (For mobile back-end development, website and API development, and more…)

Perfect: Server-Side Swift 简体中文 Perfect: Server-Side Swift Perfect is a complete and powerful toolbox, framework, and application server for Linux, iO

PerfectlySoft Inc. 13.9k Jan 6, 2023
Socket framework for Swift using the Swift Package Manager. Works on iOS, macOS, and Linux.

BlueSocket Socket framework for Swift using the Swift Package Manager. Works on iOS, macOS, and Linux. Prerequisites Swift Swift Open Source swift-5.1

Kitura 1.3k Dec 26, 2022
Swift Express is a simple, yet unopinionated web application server written in Swift

Crossroad Labs 850 Dec 2, 2022
Swift backend / server framework (Pure Swift, Supports Linux)

NetworkObjects NetworkObjects is a #PureSwift backend. This framework compiles for OS X, iOS and Linux and serves as the foundation for building power

Alsey Coleman Miller 258 Oct 6, 2022
Swift-multipart-formdata - MultipartFormData: Build multipart/form-data type-safe in Swift

MultipartFormData Build multipart/form-data type-safe in Swift. A result builder

Felix Herrmann 21 Dec 29, 2022
Approov Integration Examples 0 Jan 26, 2022
Swift-flows - Simplistic hot and cold flow-based reactive observer pattern for Swift… ideal for MVVM architectures

SwiftFlows Simplistic hot and cold flow-based reactive observer pattern for Swif

Tyler Suehr 0 Feb 2, 2022
Elegant HTTP Networking in Swift

Alamofire is an HTTP networking library written in Swift. Features Component Libraries Requirements Migration Guides Communication Installation Usage

Alamofire 38.7k Jan 8, 2023
Robust Swift networking for web APIs

Conduit Conduit is a session-based Swift HTTP networking and auth library. Within each session, requests are sent through a serial pipeline before bei

Mindbody 52 Oct 26, 2022
Easy to use OAuth 2 library for iOS, written in Swift.

Heimdallr Heimdallr is an OAuth 2.0 client specifically designed for easy usage. It currently supports the resource owner password credentials grant f

trivago N.V. 628 Oct 17, 2022
Swift HTTP for Humans

Just is a client-side HTTP library inspired by python-requests - HTTP for Humans. Features Just lets you to the following effortlessly: URL queries cu

Daniel Duan 1.4k Dec 30, 2022
Network abstraction layer written in Swift.

Moya 14.0.0 A Chinese version of this document can be found here. You're a smart developer. You probably use Alamofire to abstract away access to URLS

Moya 14.4k Jan 1, 2023
Versatile HTTP Networking in Swift

Net is a versatile HTTP networking library written in Swift. Features URL / JSON / Property List Parameter Encoding Upload File / Data / Stream / M

Intelygenz 124 Dec 6, 2022
A type-safe, high-level networking solution for Swift apps

What Type-safe network calls made easy Netswift offers an easy way to perform network calls in a structured and type-safe way. Why Networking in Swift

Dorian Grolaux 23 Apr 27, 2022
OAuth2 framework for macOS and iOS, written in Swift.

OAuth2 OAuth2 frameworks for macOS, iOS and tvOS written in Swift 5.0. Installation Usage Sample macOS app (with data loader examples) Tec

Pascal Pfiffner 1.1k Jan 8, 2023
Swift based OAuth library for iOS

OAuthSwift Swift based OAuth library for iOS and macOS. Support OAuth1.0, OAuth2.0 Twitter, Flickr, Github, Instagram, Foursquare, Fitbit, Withings, L

OAuthSwift 3.1k Jan 6, 2023