Kanna(鉋) is an XML/HTML parser for Swift.

Overview

Kanna(鉋)

Kanna(鉋) is a cross-platform XML/HTML parser (macOS, iOS, tvOS, watchOS and Linux!).

It was inspired by Nokogiri(鋸).

ℹ️ Documentation

Features

  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • Support for namespaces
  • Comprehensive test suite

Installation for Swift 5

CocoaPods

Add the following to your Podfile:

use_frameworks!
pod 'Kanna', '~> 5.2.2'

Carthage

Add the following to your Cartfile:

github "tid-kijyun/Kanna" ~> 5.2.2

For Xcode 11.3 and earlier, the following setting is required:

  1. In the project settings, add $(SDKROOT)/usr/include/libxml2 to the "Header Search Paths" field

Swift Package Manager

  1. Install libxml2 on your computer:
// macOS: required for Xcode 11.3 and earlier.
$ brew install libxml2
$ brew link --force libxml2

// Linux(Ubuntu):
$ sudo apt-get install libxml2-dev
  2. Add the following to your Package.swift:
// swift-tools-version:5.0
import PackageDescription

let package = Package(
    name: "YourProject",
    dependencies: [
        .package(url: "https://github.com/tid-kijyun/Kanna.git", from: "5.2.2"),
    ],
    targets: [
        .target(
            name: "YourTarget",
            dependencies: ["Kanna"]),
    ]
)
$ swift build

Note: If a build error occurs, try running the following command:

// Linux(Ubuntu)
$ sudo apt-get install pkg-config

Manual Installation

  1. Add these files to your project:
    Kanna.swift
    CSS.swift
    libxmlHTMLDocument.swift
    libxmlHTMLNode.swift
    libxmlParserOption.swift
    Modules
  2. In the target settings add $(SDKROOT)/usr/include/libxml2 to the Search Paths > Header Search Paths field
  3. In the target settings add $(SRCROOT)/Modules to the Swift Compiler - Search Paths > Import Paths field

Installation for Swift 4

Installation for Swift 3

Synopsis

import Kanna

let html = "<html>...</html>"

if let doc = try? HTML(html: html, encoding: .utf8) {
    print(doc.title)
    
    // Search for nodes by CSS
    for link in doc.css("a, link") {
        print(link.text)
        print(link["href"])
    }
    
    // Search for nodes by XPath
    for link in doc.xpath("//a | //link") {
        print(link.text)
        print(link["href"])
    }
}

let xml = "..."
if let doc = try? Kanna.XML(xml: xml, encoding: .utf8) {
    let namespaces = [
                    "o":  "urn:schemas-microsoft-com:office:office",
                    "ss": "urn:schemas-microsoft-com:office:spreadsheet"
                ]
    if let author = doc.at_xpath("//o:Author", namespaces: namespaces) {
        print(author.text)
    }
}

Donation

If you like Kanna, please donate via GitHub Sponsors or PayPal.
Donations are used to improve and maintain the library.

License

The MIT License. See the LICENSE file for more information.

Comments
  • Incorrect decoding data for nodes swift 3

    Description:

    When my program was on Swift 2 and used an old version of the library, all was fine. Now, with version 2.0 and Swift 3, my XML data is decoded incorrectly.

    I have the following code:

    pageEncoding:String.Encoding = .utf8
    
    guard let xmlConfing = self.defaults.object(forKey: "xmlConfing") as? Data else {
                return []
            }
    
    guard let doc = Kanna.XML(xml: xmlConfing, encoding: pageEncoding) else {
                return []
            }
    
    let current = doc.xpath("//ROOT/*").map({ $0 })
    let siteMatches = current.xpath("./MATCHES/*").map({ $0 })
    
    for siteMatch in siteMatches {
                print(siteMatch.toXML)
            }
    

    It returns Optional("<STORE><><tem></tem></></STORE>")

    Here is my xml:

    
    <ROOT>     
        <NODENAME>         
              <MATCHES>             
                    <STORE>
                          <STORE_NAME>My name</STORE_NAME>
                    </STORE>
               </MATCHES>
         </NODENAME>  
    </ROOT> 
    

    Nothing changed on server. This code worked before:(

    If I do: let current = doc.xpath("//ROOT/*").map({ $0.toXML }) it shows the correct string, but with \n at the end of every line.

    And sometimes this happens:

    (Screenshot, 2016-10-04.) But I'm not sure whether this is my problem with decoding.

    Installation method:

    • [x] CocoaPods(1.1.0 or later)

    Library version:

    • [X] v2.0.0

    Xcode version:

    • [X] 8.0 (Swift 3.0)
    Bug 
    opened by Arti3DPlayer 23
  • Swift Build not Working on Ubuntu 16.04

    Description:

    Swift Version:

    Swift version 3.0.2 (swift-3.0.2-RELEASE)
    Target: x86_64-unknown-linux-gnu
    

    Get a bunch of errors, including:

    /usr/include/libxml2/libxml/tree.h:17:10: error: 'libxml/xmlversion.h' file not found
    #include <libxml/xmlversion.h>
             ^
    

    Which doesn't seem right, as xmlversion.h definitely exists in /usr/include/libxml2.

    Package.swift:

    import PackageDescription
    
    let package = Package(
        name: "cl_scraper",
    
        dependencies: [
            .Package(url: "https://github.com/tid-kijyun/Kanna.git", majorVersion: 2)
        ]
    )
    

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods(1.1.0 or later)
    • [x] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Library version:

    • [x] v2.1.1
    • [ ] other: ()

    Xcode version:

    • [ ] 8.1 (Swift 3)
    • [ ] 8.1 (Swift 2.3)
    • [ ] 7.3.1
    • [x] other: (On Linux)
    opened by moosichu 21
  • extension XMLNodeSet / Argument passed to call that takes no arguments

    I just manually implemented Kanna and got the following parser error in Kanna.swift: Edit: The error occurs even when it's installed via CocoaPods.

    (Screenshot of the error, 2016-03-09.)

    Tested with OS X El Capitan 10.11.3, Xcode 7.2.1, Swift-Project

    Duplicate 
    opened by ixeau 21
  • Could not build Objective-C module 'libxml2'

    Description:

    I ran both the brew install libxml2 and brew link --force libxml2 commands successfully, I added the header search path $(SDKROOT)/usr/include/libxml2, and I still get these build errors:

    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator10.2.sdk/usr/include/libxml2/libxml/HTMLtree.h:15:10: 'libxml/xmlversion.h' file not found

    Could not build Objective-C module 'libxml2'

    Installation method:

    • [x] Carthage
    • [ ] CocoaPods(1.1.0 or later)
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Library version:

    • [x] v2.1.1
    • [ ] other: ()

    Xcode version:

    • [x] 8.2.1 (Swift 3.0.2)
    • [ ] 8.1 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: ()
    opened by mhillebrand 20
  • error: could not build Objective-C module 'libxml2' when use pod Kanna 1.0.6 with CocoaPods 1.0.0

    It works when I use pod Kanna 1.0.2 with CocoaPods 1.0.0.

    <module-includes>:1:9: note: in file included from <module-includes>:1:
    #import "libxml2-kanna.h"
            ^
    /Users/nix/dev/github/clones/Yep/Pods/Kanna/Modules/libxml2-kanna.h:1:9: note: in file included from /Users/nix/dev/github/clones/Yep/Pods/Kanna/Modules/libxml2-kanna.h:1:
    #import <libxml2/libxml/HTMLtree.h>
            ^
    /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator9.3.sdk/usr/include/libxml2/libxml/HTMLtree.h:15:10: error: 'libxml/xmlversion.h' file not found
    #include <libxml/xmlversion.h>
             ^
    <unknown>:0: error: could not build Objective-C module 'libxml2'
    
    opened by nixzhu 18
  • No such module - libxml2 : Build fails

    Hi, I have followed the steps mentioned at https://github.com/tid-kijyun/Kanna, but I keep getting the error: No such module - libxml2.

    After adding your files, my project looks like this: (screenshot, 2016-02-17). I have also set Swift Compiler - Search Paths > Import Paths to "$(SRCROOT)/Bhagavad Geetha Book/Modules".

    I am also a little unclear about "Step 1: Add files to your project" and "Step 2: Copy folder to your project". Hopefully I have added your files correctly.

    Kindly let me know if i have missed anything. Thanks in advance.

    opened by Kavisha-Dev 18
  • Getting build error after adding Kanna to a Swift 3.0 project

    Description:

    I get these build errors after adding Kanna using CocoaPods (1.1.1):

    /..../Kanna.swift:64:12: Argument labels '(xml:, encoding:, option:)' do not match any available overloads

    This repeats for all of the XML and HTML functions in Kanna.swift.

    Installation method:

    • [ ] Carthage
    • [X] CocoaPods(1.1.0 or later) 1.2.0 beta 1
    • [ ] CocoaPods(older)
    • [ ] Manually

    Library version:

    • [ ] v2.0.0
    • [ ] v1.1.1
    • [ ] v1.0.6
    • [ ] v1.0.2
    • [X] other: (Please fill in the version you are using.) 2.1.1

    Xcode version:

    • [ ] 8.0 (Swift 3.0)
    • [ ] 8.0 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: (Please fill in the version you are using.)
    opened by mkostersitz 10
  • Build Error

    Description:

    I created an empty Swift 3.0 project with the swift package init --type executable command, then added the Kanna dependency as written in the README.md file. When I run swift build, I get many compile errors. This is the first one:

    /Packages/Kanna-2.0.1/Sources/libxmlHTMLNode.swift:111:30: error: use of undeclared type 'htmlDocPtr'
    fileprivate var docPtr: htmlDocPtr? = nil

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods(1.1.0 or later)
    • [ ] CocoaPods(older)
    • [X] Manually

    Library version:

    • [ ] v2.0.0
    • [ ] v1.1.1
    • [ ] v1.0.6
    • [ ] v1.0.2
    • [X] other: 2.0.1

    Xcode version:

    • [X] 8.0 (Swift 3.0)
    • [ ] 8.0 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: (Please fill in the version you are using.)

    Am I doing something wrong ?

    opened by CedricEugeni 10
  • Does not parse latin characters and cedilla

    Whenever I parse content that contains Spanish letters like ç, ó etc., it fails to do so. Instead, I get outputs like: Ã³ for ó, Ã± for ñ, Ã§ for ç. Help would be much appreciated!

    opened by vihanggodbole 9
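
Accented letters that come out as two-character sequences (for example Ã³ in place of ó) are the usual sign of UTF-8 bytes being decoded as ISO Latin-1. Whether that is the cause in this report is an assumption; the sketch below only reproduces the symptom with Foundation and does not call Kanna at all.

```swift
import Foundation

// UTF-8 bytes for a string containing the characters from the report.
let original = "ó ñ ç"
let utf8Bytes = Data(original.utf8)

// Decoding those bytes with the wrong encoding (ISO Latin-1) yields mojibake.
let misdecoded = String(data: utf8Bytes, encoding: .isoLatin1) ?? ""

// Decoding with the matching encoding restores the text.
let decoded = String(data: utf8Bytes, encoding: .utf8) ?? ""

// A misdecoded string can be repaired by reversing the wrong step,
// as long as no bytes were lost along the way.
let repaired = misdecoded.data(using: .isoLatin1)
    .flatMap { String(data: $0, encoding: .utf8) } ?? ""

print(misdecoded)  // Ã³ Ã± Ã§
print(decoded)     // ó ñ ç
print(repaired)    // ó ñ ç
```

If this is what is happening, the likely remedy is to make sure the encoding handed to the parser (the encoding: argument in the Synopsis) matches the encoding the page was actually served in.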
  • Kanna was compiled with optimization - stepping may behave oddly; variables may not be available.

    So I was using Kanna to parse some HTML on my iOS app.

    Everything works well on Debug mode no problem.

    When I tested the app on TestFlight it just crashed. Took me a while to figure out the problem. It happens only in Release mode so I went ahead and changed the scheme to Release mode.

    Firing up the app it crashes as the TestFlight version, as "expected".

    The problem resides in the class libxmlHTMLNode in the following method:

    private func libxmlGetNodeContent(nodePtr: xmlNodePtr) -> String? {
        let content = xmlNodeGetContent(nodePtr)
        if let result  = String.fromCString(UnsafePointer(content)) {
            content.dealloc(1)
            return result
        }
        content.dealloc(1)
        return nil
    }
    

    The line let content = xmlNodeGetContent(nodePtr) has a EXC_BAD_ACCESS and when I try to print out the nodePtr I get the following message:

    Kanna was compiled with optimization - stepping may behave oddly; variables may not be available.

    So I went ahead and set the Swift Compiler Optimization Level to None [-Onone], and the app runs properly.

    My app is a Swift app using Kanna installed via CocoaPods.

    Not sure what I can do. Anybody had this problem before?

    opened by nunogoncalves 9
  • 'libxml/xmlversion.h' file not found

    Description:

    I've solved my issue already, but I'm posting my solution in case someone is experiencing the same problem.

    After updating from version 2.0.0 to 2.2.1, the following errors started happening:

    'libxml/xmlversion.h' file not found
    Could not build Objective-C module 'libxml2'

    Right now I'm using Kanna on my private lib that loads it from .podspec file

    I was able to replicate my solution by adding: s.xcconfig = { 'HEADER_SEARCH_PATHS' => '$(SDKROOT)/usr/include/libxml2', 'SWIFT_INCLUDE_PATHS' => '$(SRCROOT)/Kanna/Modules' }

    to my .podspec right above: s.dependency 'Kanna', '~> 2.2.1'

    That narrowed the error down to the "Project name"_Tests target. I noticed that some people mentioned adding "$(SDKROOT)/usr/include/libxml2" to the header search paths. My main target already had that path, but the "Project name"_Tests target didn't. I added the search path to the "Project name"_Tests target, cleaned the project, crossed my fingers and finally ran pod update.

    It cleared the error for me, hopefully it will help someone else

    PS. I don't know all the in and outs of cocoapods, neither Kanna, and I have no idea what I did, but I was able to replicate the solution a few times, feel free to explain or tell me why/how I'm wrong

    Installation method:

    • [ ] Carthage
    • [x] CocoaPods(1.1.0 or later)
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Library version:

    • [x] v2.1.1
    • [ ] other: ()

    Xcode version:

    • [x] 8.1 (Swift 3)
    • [ ] 8.1 (Swift 2.3)
    • [ ] 7.3.1
    • [ ] other: ()
    opened by gschafer 8
  • What is the best way to handle a Base64 encoded image?

    Description:

    When I parse img urls from a website I am getting Base64 image urls. Is there any way to get the actual url to the image or is there a way to convert this to an image url or even an actual image?

    Example: I was parsing polygon.com and some of the image urls were formatted in Base64 like this:

    

    Installation method:

    • [x] CocoaPods

    Kanna version (or commit hash):

    5.2.7

    swift --version

    5

    opened by PatrickAdams 0
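
A Base64 value in an img src is normally a data: URI, meaning the attribute holds the image bytes themselves and there is no separate URL to recover from it. The following is a minimal sketch, using only Foundation, of turning such a value into raw bytes. The function name decodeDataURI and the sample string are hypothetical and not part of Kanna.

```swift
import Foundation

/// Splits a `data:` URI into its media type and decoded bytes.
/// Returns nil when the string is not a Base64 data URI.
func decodeDataURI(_ uri: String) -> (mediaType: String, bytes: Data)? {
    guard uri.hasPrefix("data:"),
          let comma = uri.firstIndex(of: ",") else { return nil }

    // The header between "data:" and the comma looks like "image/png;base64".
    let header = uri[uri.index(uri.startIndex, offsetBy: 5)..<comma]
    guard header.hasSuffix(";base64") else { return nil }
    let mediaType = String(header.dropLast(";base64".count))

    // Everything after the comma is the Base64 payload.
    let payload = String(uri[uri.index(after: comma)...])
    guard let bytes = Data(base64Encoded: payload) else { return nil }
    return (mediaType, bytes)
}

// Hypothetical src attribute value; "aGVsbG8=" is just "hello" in Base64,
// standing in for real image bytes.
let src = "data:image/png;base64,aGVsbG8="
if let decoded = decodeDataURI(src) {
    print(decoded.mediaType, decoded.bytes.count)  // image/png 5
}
```

On Apple platforms the resulting Data could then be handed to UIImage(data:) or NSImage(data:) to obtain an image object.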
  • Kanna is very good, but I don't know how to use namespaces. Can you help me?

    Description:

    for link in doc.css("", namespaces: [:])
    for link in doc.at_css("", namespaces:"")

    How can I use it? Can you give me a more specific example, not just the ones I wrote above? Any examples would be appreciated, thanks.

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Kanna version (or commit hash):

    swift --version

    Xcode version (optional):

    opened by tianbinbin 0
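  • Are XPath functions supported?

    Hello author: when parsing a web page, I cannot get any result when I use the text() function, and I don't know why. Could you please help me solve this problem?

    Written this way, the data is displayed: htmlDocument.xpath("//div[@id='content']").first?.text

    But written this way, with text() added, no data is parsed: htmlDocument.xpath("//div[@id='content']/text()").first?.text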

    opened by kuangtao22 0
  • HTML of Kanna instance stripped down on Xcode 12

    Description:

    I am seeing some strange behavior that I have not seen before. If I create a Kanna instance and then access the .toHTML property of that instance, it does not return the full HTML of the instance. The weird thing is that it is different depending on the deployment target. When building for iOS 14 the .toHTML property returns almost all the raw data, but if I am building for iOS 13.5 it returns only a small portion. I noticed the issue when starting to use Xcode 12 and its toolchain.

    Installation method:

    • [x] CocoaPods

    Kanna version:

    5.2.2

    swift --version:

    5.3

    Xcode version:

    Version 12.0 (12A7209)

    How to reproduce:

    Create a Kanna instance with some HTML:

    let document = try! Kanna.HTML(html: htmlText, encoding: String.Encoding.utf8)
                
    print(document.toHTML)
    

    When looking at the console log you can see that the HTML is not complete.

    Anyone else seeing this?

    Regards, Erik

    wontfix 
    opened by fishfisher 6
  • Chained xpaths are searching from root level

    Description:

    All node xpaths are calling for root document level, not for node. For example, this test will fail.

    func testInnerXpath() {
        let input = """
                    <html>
                    <head>
                        <title>test title</title>
                    </head>
                    <body>
                        <div id="1"><div><h1>test header 1</h1></div></div>
                        <div id="2"><div><h1>test header 2</h1></div></div>
                    </body>
                    </html>
                    """
        do {
            let doc = try HTML(html: input, encoding: .utf8)
            //all this asserts will fail:
            XCTAssertNil(doc.at_xpath("//head")?.at_xpath("//h1")?.toHTML)
            XCTAssertNil(doc.at_xpath("//head")?.at_xpath("//body")?.toHTML)
            XCTAssertNil(doc.at_xpath("//body")?.at_xpath("//title")?.toHTML)
            XCTAssertEqual(doc.at_xpath("//body/div[@id='2']")?.at_xpath("//h1")?.text, "test header 2")
            //only this assert is ok, passes:
            XCTAssertEqual(doc.at_xpath("//body/div[@id='2']//h1")?.text, "test header 2")
        } catch {
            XCTFail("Abnormal test data")
        }
    }
    

    Is it bug or feature?

    I've started implementing a fix for this problem: I cast xmlNodePtr to xmlDocPtr and initialize xmlXPathNewContext with the cast object, and then all XPaths start to work properly.

    Feature 
    opened by anivaros 1
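
In XPath 1.0, a path that begins with // is absolute and is evaluated from the document root regardless of which node issues the query, while .// is evaluated from the context node. The sketch below contrasts the two using the calls shown in the Synopsis. It assumes Kanna hands the receiving node to the XPath engine as the context node, which this page does not confirm, so treat the comments as expectations rather than guarantees.

```swift
import Kanna

let html = """
<html><body>
  <div id="1"><div><h1>test header 1</h1></div></div>
  <div id="2"><div><h1>test header 2</h1></div></div>
</body></html>
"""

let doc = try? HTML(html: html, encoding: .utf8)
let second = doc?.at_xpath("//body/div[@id='2']")

// Absolute path: starts at the document root, so the first <h1>
// in the whole document is expected here.
let absoluteMatch = second?.at_xpath("//h1")?.text

// Relative path: starts at the second <div>, so only its own
// descendant <h1> is expected here (assumption: see the note above).
let relativeMatch = second?.at_xpath(".//h1")?.text

print(absoluteMatch ?? "nil")
print(relativeMatch ?? "nil")
```

Under that assumption, chained queries that should stay inside a node are written with a leading dot.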
  • Title cannot be parsed

    Description:

    https://mp.weixin.qq.com/s/kQTxb7CO0njHX-yYOjNjww: the title of this WeChat official account article is not parsed correctly. Tested with the latest version of master.

    Installation method:

    • [ ] Carthage
    • [ ] CocoaPods
    • [ ] Swift Package Manager
    • [ ] Manually
    • [ ] other: ()

    Kanna version (or commit hash):

    swift --version

    5.0.0

    Xcode version (optional): 11.4

    opened by caolin358688599 2
Releases(5.2.7)
Owner
Atsushi Kiwaki
Ji (戟) is an XML/HTML parser for Swift

Ji 戟 Ji (戟) is a Swift wrapper on libxml2 for parsing XML/HTML. Features Build XML/HTML Tree and Navigate. XPath Query Supported. Comprehensive Unit T

HongHao Zhang 824 Dec 15, 2022
A sensible way to deal with XML & HTML for iOS & macOS

Ono (斧) Foundation lacks a convenient, cross-platform way to work with HTML and XML. NSXMLParser is an event-driven, SAX-style API that can be cumbers

Mattt 2.6k Dec 14, 2022
Simple XML Parser implemented in Swift

Simple XML Parser implemented in Swift What's this? This is a XML parser inspired by SwiftyJSON and SWXMLHash. NSXMLParser in Foundation framework is

Yahoo! JAPAN 531 Jan 1, 2023
SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)

SwiftSoup is a pure Swift library, cross-platform (macOS, iOS, tvOS, watchOS and Linux!), for working with real-world HTML. It provides a very conveni

Nabil Chatbi 3.7k Jan 6, 2023
Swift minion for simple and lightweight XML parsing

AEXML Swift minion for simple and lightweight XML parsing I made this for personal use, but feel free to use it or contribute. For more examples check

Marko Tadić 975 Dec 26, 2022
CheatyXML is a Swift framework designed to manage XML easily

CheatyXML CheatyXML is a Swift framework designed to manage XML easily. Requirements iOS 8.0 or later tvOS 9.0 or later Installation Cocoapods If you'

Louis Bodart 24 Mar 31, 2022
The most swifty way to deal with XML data in swift 5.

SwiftyXML SwiftyXML use most swifty way to deal with XML data. Features Infinity subscript dynamicMemberLookup Support (use $ started string to subscr

Kevin 99 Sep 6, 2022
Simple XML parsing in Swift

SWXMLHash SWXMLHash is a relatively simple way to parse XML in Swift. If you're familiar with NSXMLParser, this library is a simple wrapper around it.

David Mohundro 1.3k Jan 3, 2023
Easy XML parsing using Codable protocols in Swift

XMLCoder Encoder & Decoder for XML using Swift's Codable protocols. This package is a fork of the original ShawnMoore/XMLParsing with more features an

Max Desiatov 657 Dec 30, 2022
A simple way to map XML to Objects written in Swift

XMLMapper XMLMapper is a framework written in Swift that makes it easy for you to convert your model objects (classes and structs) to and from XML. Ex

Giorgos Charitakis 109 Jan 6, 2023
Generate styled SwiftUI Text from strings with XML tags.

XMLText is a mini library that can generate SwiftUI Text from a given XML string with tags. It uses AttributedString to compose the final text output.

null 15 Dec 7, 2022
Fetch a XML feed and parse it into objects

AlamofireXmlToObjects This is now a subspec of EVReflection and the code is maintained there. You can install it as a subspec like this: use_fra

Edwin Vermeer 65 Dec 29, 2020
📄 A Swift DSL for writing type-safe HTML/CSS in SwiftUI way

swift-web-page (swep) Swep is a Swift DSL for writing type-safe HTML/CSS in SwiftUI way. Table of Contents Motivation Examples Safety Design FAQ In

Abdullah Aljahdali 14 Dec 31, 2022
Swift package to convert a HTML table into an array of dictionaries.

null 1 Jun 18, 2022
Mongrel is a Swift and HTML hybrid with a bit of support for CSS and Javascript.

Mongrel is a Swift and HTML hybrid with a bit of support for CSS and Javascript. Using a declarative style of programming, Mongrel makes writing HTML feel natural and easy. Mongrel also uses a SwiftUI-like body structure, allowing structs to be completely dedicated as an HTML page or element.

Nicholas Bellucci 12 Sep 22, 2022
Convert text with HTML tags, links, hashtags, mentions into NSAttributedString. Make them clickable with UILabel drop-in replacement.

Pavel Sharanda 1.1k Dec 26, 2022
An Objective-C framework for your everyday HTML needs.

HTMLKit An Objective-C framework for your everyday HTML needs. Quick Overview Installation Parsing The DOM CSS3 Selectors Quick Overview HTMLKit is a

Iskandar Abudiab 229 Dec 12, 2022
Mathias Köhnke 1.1k Dec 16, 2022
A fast & lightweight XML & HTML parser in Swift with XPath & CSS support

Fuzi (斧子) A fast & lightweight XML/HTML parser in Swift that makes your life easier. [Documentation] Fuzi is based on a Swift port of Mattt Thompson's

Ce Zheng 994 Jan 2, 2023