Swift Parser Combinators

Overview

Parsey

Swift Parser Combinator Framework

In addition to simple combinators, Parsey supports source location/range tracking, backtracking prevention, and custom error messages.

Parsey Playground

Features

  • Combinator interface

    • |, ~~, ~~>, <~~, ^^ combinator operators
  • Lexer primitives:

    • Lexer.whitespace, Lexer.signedInteger, ...
  • Regex-like combinators:

    • Postfix .+ for .many().
      • Example: let arrayLiteral = "[" ~~> expression.+ <~~ "]"
    • Postfix .* for .manyOrNone().
      • Example: let classDef = (attribute | method).*
    • Postfix .? for .optional().
      • Example: let declaration = "let" ~~> id ~~ (":" ~~> type).? ~~ ("=" ~~> expression)
    • Postfix + for .manyConcatenated().
      • Example: let skippedSpaces = (Lexer.space | Lexer.tab)+
    • Infix + for .concatenatingResult(with:).
      • Example: let type = Lexer.upperLetter + Lexer.letter*
    • Lexer.regex(_:) for directly applying regular expressions.
      • Example: let id = Lexer.regex("[a-zA-Z][a-zA-Z0-9]*")
  • Backtracking prevention

    • .! postfix operator or .nonbacktracking()
  • Parser tagging for error messages

    • <!-- operator or .tagged(_:)
  • Rich error messages with source location

    • For example:
    Parse failure at 2:4 ----
    (+ %% 1 -20) 2 3)
       ^~~~~~~~~~~~~~
    Expecting an expression, but found "%"
    
  • Source range tracking

    • ^^^ operator or .mapParse(_:)
    • For example, S-expression \n(+ \n\n(+ +1 -20) 2 3) gets parsed to the following range-tracked AST:
    Expr:(2:1..<4:16):[
        ID:(2:2..<2:3):+,
        Expr:(4:1..<4:11):[
            ID:(4:2..<4:3):+,
            Int:(4:4..<4:6):1,
            Int:(4:7..<4:10):-20],
        Int:(4:12..<4:13):2,
        Int:(4:14..<4:15):3]
    

Requirements

  • Swift 3

  • Any operating system

Package

To use it in your Swift project, add the following dependency to your Swift package description file.

    .Package(url: "https://github.com/rxwei/Parsey", majorVersion: 1)

Examples

0️⃣ An LLVM Compiler Frontend written in Swift using Parsey

The COOL Programming Language

1️⃣ Parse Left-associative Infix Expressions with Operator Precedence

indirect enum Expression {
    case integer(Int)
    case symbol(String)
    case infix(String, Expression, Expression)
}

enum Grammar {
    static let integer = Lexer.signedInteger
        ^^ {Int($0)!} ^^ Expression.integer

    static let symbol = Lexer.regex("[a-zA-Z][0-9a-zA-Z]*")
        ^^ Expression.symbol

    static let addOp = Lexer.anyCharacter(in: "+-")
        ^^ { op in { Expression.infix(op, $0, $1) } }
    
    static let multOp = Lexer.anyCharacter(in: "*/")
        ^^ { op in { Expression.infix(op, $0, $1) } }

    /// Left-associative multiplication
    static let multiplication = (integer | symbol).infixedLeft(by: multOp)

    /// Left-associative addition
    static let addition = multiplication.infixedLeft(by: addOp)

    static let expression: Parser<Expression> = addition
}

try print(Grammar.expression.parse("2"))
/// Output:
/// Expression.integer(2)

try print(Grammar.expression.parse("2+1+2*a"))
/// Output:
/// Expression.infix("+",
///                  .infix("+", .integer(2), .integer(1)),
///                  .infix("*", .integer(2), .symbol("a")))

2️⃣ Parse S-Expressions

indirect enum Expr {
    case sExp([Expr])
    case int(Int)
    case id(String)
}

enum Grammar {
    static let whitespaces = (Lexer.space | Lexer.tab | Lexer.newLine)+
    static let anInt = Lexer.signedInteger ^^ { Int($0)! } ^^ Expr.int
    static let anID = Lexer.regex("[a-zA-Z_+\\-*/][0-9a-zA-Z_+\\-*/]*") ^^ Expr.id
    static let aSExp: Parser<Expr> =
        "(" ~~> (anExp.!).many(separatedBy: whitespaces).amid(whitespaces.?) <~~ ")"
        ^^ Expr.sExp
    static let anExp = anInt | anID | aSExp <!-- "an expression"
}

/// Success
try Grammar.anExp.parse("(+ (+ 1 -20) 2 3)")
/// Output: Expr.sExp(...)

/// Failure
try Grammar.anExp.parse("(+ \n(+ %% 1 -20) 2 3)")
/// Output: Parse failure at 2:4 ----
///         (+ %% 1 -20) 2 3)
///            ^~~~~~~~~~~~~~
///         Expecting an expression, but found "%"

3️⃣ Parse S-Expressions with Source Range Tracking

indirect enum Expr {
    case sExp([Expr], SourceRange)
    case int(Int, SourceRange)
    case id(String, SourceRange)
}

enum Grammar {
    static let whitespaces = (Lexer.space | Lexer.tab | Lexer.newLine)+

    static let anInt = Lexer.signedInteger 
        ^^^ { Expr.int(Int($0.target)!, $0.range) }

    static let anID = Lexer.regex("[a-zA-Z_+\\-*/][0-9a-zA-Z_+\\-*/]*")
        ^^^ { Expr.id($0.target, $0.range) }

    static let aSExp: Parser<Expr> =
        "(" ~~> (anExp.!).many(separatedBy: whitespaces).amid(whitespaces.?) <~~ ")"
        ^^^ { Expr.sExp($0.target, $0.range) }

    static let anExp = anInt | anID | aSExp <!-- "an expression"
}

/// Success
try Grammar.anExp.parse("(+ (+ 1 -20) 2 3)")
/// Output: Expr.sExp(...)

/// Failure
try Grammar.anExp.parse("(+ \n(+ %% 1 -20) 2 3)")
/// Output: Parse failure at 2:4 ----
///         (+ %% 1 -20) 2 3)
///            ^~~~~~~~~~~~~~
///         Expecting an expression, but found "%"

Dependency

License

MIT License

Comments
  • Issue

    Issue "closure argument was escaped..."

    Hello,

    I use the library "LogicKit" (By @kyouko-taiga ) which has dependencies on your library. There are a lot of tests in "ProofKit" that I launched and all worked before (I suppose before swift 4.2).

    An issue is detected in the file parser.swift at the line 317.

    Here a screen of what I have.

    If you want to check the test, you just need to launch tests of "LogicKit", in the file LogicKitParserTests.swift

    I tried to find with the debugger which line can throw this error, and it seems there is one (but not the only one): XCTAssertEqual(try Grammar.variable.parse(".var(\"x\")") , .var("x")) (Line 8 in the link above)

    If you need more informations tell me. Maybe it's just a problem with the new version of swift.

    opened by damdamo 4
  • error: 'Parse<Target>' initializer is inaccessible due to 'internal' protection level

    error: 'Parse' initializer is inaccessible due to 'internal' protection level

    I am trying to write a parser for IP numbers, e.g. "192.168.140.1". The parser should fail, if one of the numbers is invalid. I thought I could write a combinator to define the constraints. Then it would be possible to write: Lexer.unsignedInteger.validRange(0,255)

    What I am getting is: error: 'Parse' initializer is inaccessible due to 'internal' protection level

    enhancement 
    opened by jerbs 4
  • swift 3.1 build errors in release configuration

    swift 3.1 build errors in release configuration

    Running swift build -c release
    Compile Swift Module 'Funky' (10 sources)
    Compile Swift Module 'Parsey' (5 sources)
    sourceRoot/Packages/Parsey/Sources/Combinators.swift:71:62: error: var 'tagged' is internal and cannot be referenced from an '@inline(__always)' function
                catch var failure as ParseFailure where !failure.tagged {
                                                                 ^
    sourceRoot/Packages/Parsey/Sources/Error.swift:19:18: note: var 'tagged' is not '@_versioned' or public
        internal var tagged: Bool = false
                     ^
    sourceRoot/Packages/Parsey/Sources/Combinators.swift:72:25: error: instance method 'tag' is internal and cannot be referenced from an '@inline(__always)' function
                    failure.tag(tag)
                            ^
    sourceRoot/Packages/Parsey/Sources/Error.swift:32:28: note: instance method 'tag' is not '@_versioned' or public
        internal mutating func tag(_ tag: String) {
                               ^
    sourceRoot/Packages/Parsey/Sources/Combinators.swift:71:62: error: var 'tagged' is internal and cannot be referenced from an '@inline(__always)' function
                catch var failure as ParseFailure where !failure.tagged {
                                                                 ^
    sourceRoot/Packages/Parsey/Sources/Error.swift:19:18: note: var 'tagged' is not '@_versioned' or public
        internal var tagged: Bool = false
                     ^
    sourceRoot/Packages/Parsey/Sources/Combinators.swift:72:25: error: instance method 'tag' is internal and cannot be referenced from an '@inline(__always)' function
                    failure.tag(tag)
                            ^
    sourceRoot/Packages/Parsey/Sources/Error.swift:32:28: note: instance method 'tag' is not '@_versioned' or public
        internal mutating func tag(_ tag: String) {
                               ^
    <unknown>:0: error: build had 1 command failures
    

    Using swift 3.1 snapshot.

    LMK if there's any more information that would be useful.

    opened by dtzWill 4
  • Fix a bug that would trigger infinite recursion in `anyCharacter(in:)`

    Fix a bug that would trigger infinite recursion in `anyCharacter(in:)`

    I found a bug that I introduced by removing references to character views. Because String.subscript now longer returns a character view, the function anyCharacter(in: String) would call itself. I fixed it by creating an array that wraps the the substring.

    If this raises concerns with respect to performances, I guess I could explicitly create a substring with suffix(from:) instead. That said I don't think this would change anything, as I believe the created array doesn't involve any copy of the original string.

    opened by kyouko-taiga 1
  • Struggling to understand `suffixed(by:)`

    Struggling to understand `suffixed(by:)`

    I'm trying to write a grammar for a language that supports function calls with the usual syntax, i.e.

    f()
    f(a, b)
    f(a, b)(c)
    ...
    

    Since I'm not very familiar with parser combinators, the approach I took is quite similar to what I've done with LALR grammars, but couldn't managed to get where I wanted. In the search for a solution, I came across the suffix operation parser (Parser.suffixed(by:)) in the code, which seems quite promising. But unfortunately, it didn't bring me any more success so far.

    Trying to strip as much as I can, here's a minimal example:

    static let callExpr = atom.suffixed(by: "(" ~~> callArgs <~~ ")")
    static let callArgs =
            expr.many(separatedBy: comma) <~~ comma.?
            ^^^ { (args, loc) in { CallExpr(callee: $0, arguments: args, location: loc) as Node } }
    static let ident = Lexer.regex("[a-zA-Z_]\\w*")
            ^^^ { (val, loc) in Ident(name: val, location: loc) }
    static let atom: Parser<Node> = callExpr | ident ^^ { $0 as Node }
    

    This compiles just fines, but fails miserably with signal 4.

    Any help would be amazingly appreciated ;)

    opened by kyouko-taiga 1
  • Tag 2.1.0 is missing files

    Tag 2.1.0 is missing files

    If I include

    .package(url: "https://github.com/rxwei/Parsey.git", from: "2.1.0"),
    

    ... everything breaks :-(

    It appears that git tag 2.1.0 is missing all source files except 'Functional' and 'Operators' -- see: https://github.com/rxwei/Parsey/tree/2.1.0/Sources/Parsey

    The only way to make swift compile the code is to go to .build/checkouts/... and manually check out master branch.

    I'd kindly ask to include missing files and make a new release.

    opened by pkese 1
  • Strange issue related to the use of substring in ParserInput

    Strange issue related to the use of substring in ParserInput

    I discovered a strange issue that seems to be linked to the use of substring in ParserInput.

    My apologies, I wasn't able to create a minimal example. But the issue can be observed by running the tests of my LogicKit library, with the latest changes merged in master. At line 44 of Parser.swift, I parse conjunction of terms, using infixedLeft(by:). Everything works fine with two terms, but if I attempt to parse three, the parser produces completely invalid results (In that particular example, an invalid value of the enum) that may even trigger runtime errors.

    Using a String rather than a Substring for the property ParserInput.stream seems to solve the problem. Therefore, my guess is that something wrong is happening with the pointers of the substring when the parser backtracks. Unfortunately I don't have any evidence to support that claim, and I don't know how the current implementation differs from the previous one (which was based on character views).

    Any idea on the source of the problem? Also, I understand some concerns were raised about the use of strings in ParserInput, as they may involve a lot of unnecessary copies. But maybe there is an intermediate solution that could rely on Swift's default copy-on-write behavior since strings are value-types? Maybe some benchmarks could shed some light on the issue.

    opened by kyouko-taiga 2
  • Is it possible to completely backtrack after `suffixed(by:)`?

    Is it possible to completely backtrack after `suffixed(by:)`?

    Let's say I have two parsers, one for call expressions, and one for subscript expressions:

    static let callExpr = atom.suffixed(by: "(" ~~> callArgs <~~ ")")
    static let callArgs =
        (callArg.many(separatedBy: comma) <~~ comma.?).?
        ^^^ { (args, loc) in
            return { CallExpr(callee: $0, arguments: args ?? [], location: loc) as Node }
        }
    
    static let subscriptExpr = atom.suffixed(by: "[" ~~> subscriptArgs <~~ "]")
    static let subscriptArgs =
        callArg.many(separatedBy: comma) <~~ comma.?
        ^^^ { (args, loc) in
            return { SubscriptExpr(callee: $0, arguments: args, location: loc) as Node }
        }
    
    static let atom = ident | literal | "(" ~~> expr <~~ ")"
    static let expr = callExpr | subscriptExpr | atom
    

    Both those parsers can successfully parse things like f(1, 2) and m[0, 8] respectively. My problem is to find a way to combine them in the expr parser.

    The way I presented it above, the expr parser can no longer parse subscript expressions. It first tries to parse a call expression, successfully parsing the callee but then failing to parse the arguments. And because it doesn't backtrack until before the callee, it is left with an input of the form [0, 8], which can't be parsed by the remaining parsers.

    Maybe I could create a function Parser<(Node) -> Node>, that could choose between call and subscript expressions and I'd be left with a single atom.suffixed(by ...) parser. But this solution seems rather complex and difficult to maintain if I add things like dot and suffix expressions later on.

    I guess the best would be if I could somehow tell the expr parser to completely backtrack after any of its options failed.

    opened by kyouko-taiga 2
Releases(1.7.0)
Owner
Richard Wei
Richard Wei
Marky Mark is a parser written in Swift that converts markdown into native views.

Marky Mark is a parser written in Swift that converts markdown into native views. The way it looks it highly customizable and the supported markdown syntax is easy to extend.

M2mobi 287 Nov 29, 2022
Leverages Apple's Swift-based Markdown parser to output NSAttributedString.

Markdownosaur ?? Markdownosaur uses Apple's excellent and relatively new Swift Markdown library to analyze a Markdown source, and then takes that anal

Christian Selig 232 Dec 20, 2022
A simple Cron Parser with Swift

A simplified cron parser Requirements Swift 5.5 Command line tools How to Run Cr

null 0 Dec 21, 2021
Pure Swift 2.0 S-expression Parser

SwiftExP This is an S-expression parser written in pure Swift 2.0, where pure means without making any necessary usage of Foundation APIs at its core.

Marius Rackwitz 16 Jul 25, 2021
Markdown parser for iOS

Marky Mark Marky Mark is a parser written in Swift that converts markdown into native views. The way it looks it highly customizable and the supported

M2mobi 254 Jun 11, 2021
More powerful label, attributed string builder and text parser.

DDText More powerful label, attributed string builder and text parser. DDLabel More powerful label than UILabel, using TextKit. It supports features b

Daniel 16 Nov 8, 2022
Markdown parser for iOS

Marky Mark Marky Mark is a parser written in Swift that converts markdown into native views. The way it looks it highly customizable and the supported

M2mobi 262 Nov 23, 2021
BonMot is a Swift attributed string library

BonMot (pronounced Bon Mo, French for good word) is a Swift attributed string library. It abstracts away the complexities of the iOS, macOS, tvOS, and

Rightpoint 3.4k Dec 30, 2022
Croc is a swift emoji string parsing library

Croc is a library for parsing emojis on iOS. It provides a simple and lightweight interface for detecting, generating, categorizing and managing emoji characters, making emoji-powered features an easy task for developers.

Joe Kalash 127 Nov 20, 2022
Fully open source text editor for iOS written in Swift.

Edhita Fully open source text editor for iOS written in Swift. http://edhita.bornneet.com/ What Edhita means? Edhita (Romaji) == エディタ (Katakana) == Ed

Tatsuya Tobioka 1.2k Jan 1, 2023
Great Swift String Pluralize Extension

Pluralize.swift Great Swift String Pluralize Extension case-insensitive tons of rules for irregular nouns (plural form) supports uncountable nouns all

Joshua Arvin Lat 193 Nov 8, 2022
An NSPredicate DSL for iOS, OSX, tvOS, & watchOS. Inspired by SnapKit and lovingly written in Swift.

PrediKit A Swift NSPredicate DSL for iOS & OS X inspired by SnapKit, lovingly written in Swift, and created by that weird dude at KrakenDev. If you're

Hector Matos 542 Sep 24, 2022
PySwiftyRegex - Easily deal with Regex in Swift in a Pythonic way

PySwiftyRegex Easily deal with Regex in Swift in a Pythonic way.

Ce Zheng 232 Oct 12, 2022
Regular expressions for swift

Regex Advanced regular expressions for Swift Goals Regex library was mainly introduced to fulfill the needs of Swift Express - web application server

Crossroad Labs 328 Nov 20, 2022
👩‍🎨 Elegant Attributed String composition in Swift sauce

Elegant Attributed String composition in Swift sauce SwiftRichString is a lightweight library which allows to create and manipulate attributed strings

Daniele Margutti 2.9k Jan 5, 2023
SwiftVerbalExpressions is a Swift library that helps to construct difficult regular expressions

SwiftVerbalExpressions Swift Regular Expressions made easy SwiftVerbalExpressions is a Swift library that helps to construct difficult regular express

null 582 Jun 29, 2022
A Swift framework for parsing, formatting and validating international phone numbers. Inspired by Google's libphonenumber.

PhoneNumberKit Swift 5.3 framework for parsing, formatting and validating international phone numbers. Inspired by Google's libphonenumber. Features F

Roy Marmelstein 4.7k Jan 8, 2023
A comprehensive, lightweight string extension for Swift

SwiftString SwiftString is a lightweight string extension for Swift. This library was motivated by having to search StackOverflow for common string op

Andrew Mayne 1.6k Dec 30, 2022
A Swift Formatter Kit

Format A Swift formatter kit. Simple formatting syntax for decimal numbers, currency, mass, addresses, ordinal numbers and hexadecimal colors. Usage I

Roy Marmelstein 1.2k Nov 8, 2022