๐Ÿ•ธ๏ธ Swift Concurrency-powered crawler engine on top of Actomaton.

Related tags

Utility swift crawler
Overview

๐Ÿ•ธ๏ธ ActoCrawler

ActoCrawler is a Swift Concurrency-powered crawler engine on top of Actomaton, with flexible customizability to create various HTML scrapers, image scrapers, etc.

Example

struct Output: Sendable
{
    let nextLinksCount: Int
}

let htmlCrawler = await Crawler<Output, Void>.htmlScraper(
    config: CrawlerConfig(
        maxDepths: 10,
        maxTotalRequests: 100,
        timeoutPerRequest: 5,
        userAgent: "ActoCrawler",
        domainFilteringPolicy: .disallowedDomains([".*google.com*" /* ... */]),
        domainQueueTable: [
            ".*example1.com*": .init(maxConcurrency: 1, delay: 0),
            ".*example2.com*": .init(maxConcurrency: 5, delay: 0.1 ... 0.5)
        ]
    ),
    scrapeHTML: { response in
        let html = response.data
        let links = try html.select("a").map { try $0.attr("href") }

        let nextRequests = links
            .compactMap(URL.init(string:))
            .map { UserRequest(url: $0) }

        return (nextRequests, Output(nextLinksCount: nextRequests.count))
    }
)

// Visit initial page.
htmlCrawler.visit(url: URL(string: "https://www.wikipedia.org")!)

// Observe crawl events.
for await event in htmlCrawler.events {
    switch event {
    case let .willCrawl(req):
        print("Crawl : ๐Ÿ•ธ๏ธ [\(req.order)] [d=\(req.depth)] \(req.url)")
    case let .didCrawl(req, .success(output)):
        print("Output: โœ… [\(req.order)] [d=\(req.depth)] \(req.url), nextLinksCount = \(output.nextLinksCount)")
    case let .didCrawl(req, .failure(error)):
        print("Output: โŒ [\(req.order)] [d=\(req.depth)] \(req.url), error = \(error)")
    }
}

print("Output Done")

Acknowledgements

License

MIT

You might also like...
Cross-Platform, Protocol-Oriented Programming base library to complement the Swift Standard Library. (Pure Swift, Supports Linux)

SwiftFoundation Cross-Platform, Protocol-Oriented Programming base library to complement the Swift Standard Library. Goals Provide a cross-platform in

Swift - โœ๏ธSwift ๊ณต๋ถ€ ์ €์žฅ์†Œโœ๏ธ

Swift ์Šค์œ„ํ”„ํŠธ์˜ ๊ธฐ์ดˆ 1. Swift์˜ ๊ธฐ๋ณธ 2. ๋ณ€์ˆ˜์™€ ์ƒ์ˆ˜ [3. ๋ฐ์ดํ„ฐ ํƒ€์ž… ๊ธฐ๋ณธ] [4. ๋ฐ์ดํ„ฐ ํƒ€์ž… ๊ณ ๊ธ‰] 5. ์—ฐ์‚ฐ์ž 6. ํ๋ฆ„ ์ œ์–ด 7. ํ•จ์ˆ˜ 8. ์˜ต์…”๋„ ๊ฐ์ฒด์ง€ํ–ฅ ํ”„๋กœ๊ทธ๋ž˜๋ฐ๊ณผ ์Šค์œ„ํ”„ํŠธ 9. ๊ตฌ์กฐ์ฒด์™€ ํด๋ž˜์Šค 10. ํ”„๋กœํผํ‹ฐ์™€ ๋ฉ”์„œ๋“œ 11. ์ธ์Šคํ„ด์Šค ์ƒ

Swift-ndi - Swift wrapper around NewTek's NDI SDK

swift-ndi Swift wrapper around NewTek's NDI SDK. Make sure you extracted latest

__.swift is a port of Underscore.js to Swift.

__.swift Now, __.swift is version 0.2.0! With the chain of methods, __.swift became more flexible and extensible. Documentation: http://lotz84.github.

SNTabBarDemo-Swift - Cool TabBar With Swift
SNTabBarDemo-Swift - Cool TabBar With Swift

SNTabBarDemo-Swift Cool TabBar How To Use // MARK: - setup private func setu

Swift-when - Expression switch support in Swift

Swift When - supporting switch expressions in Swift! What is it? Basically, it a

Swift-compute-runtime - Swift runtime for Fastly Compute@Edge

swift-compute-runtime Swift runtime for Fastly Compute@Edge Getting Started Crea

Swift-HorizontalPickerView - Customizable horizontal picker view component written in Swift for UIKit/iOS

Horizontal Picker View Customizable horizontal picker view component written in

swift-highlight a pure-Swift data structure library designed for server applications that need to store a lot of styled text

swift-highlight is a pure-Swift data structure library designed for server applications that need to store a lot of styled text. The Highlight module is memory-efficient and uses slab allocations and small-string optimizations to pack large amounts of styled text into a small amount of memory, while still supporting efficient traversal through the Sequence protocol.

Comments
Owner
Actomaton
Swift async/await & Actor-powered effectful state-management framework.
Actomaton
A simple swift package that provides a Swift Concurrency equivalent to `@Published`.

AsyncValue This is a simple package that provides a convenience property wrapper around AsyncStream that behaves almost identically to @Published. Ins

Brent Mifsud 33 Oct 3, 2022
A command-line tool and Swift Package for generating class diagrams powered by PlantUML

SwiftPlantUML Generate UML class diagrams from swift code with this Command Line Interface (CLI) and Swift Package. Use one or more Swift files as inp

null 374 Jan 3, 2023
Easy way to detect iOS device properties, OS versions and work with screen sizes. Powered by Swift.

Easy way to detect device environment: Device model and version Screen resolution Interface orientation iOS version Battery state Environment Helps to

Anatoliy Voropay 582 Dec 25, 2022
BudouX: the machine learning powered line break organizer tool

BudouX.swift BudouX Swift implementation. BudouX is the machine learning powered

griffin-stewie 39 Dec 31, 2022
A graphical Mach-O viewer for macOS. Powered by Mach-O Kit.

Mach-O Explorer is a graphical Mach-O viewer for macOS. It aims to provide an interface and feature set that are similar to the venerable MachOView ap

Devin 581 Dec 31, 2022
Zip - A Swift framework for zipping and unzipping files. Simple and quick to use. Built on top of minizip.

Zip A Swift framework for zipping and unzipping files. Simple and quick to use. Built on top of minizip. Usage Import Zip at the top of the Swift file

Roy Marmelstein 2.3k Jan 3, 2023
An enhancement built on top of Foundation Framework and XCTest.

Beton is a Swift library built on top of the Foundation framework, that provides an additional layer of functionality, including easy localization, performance test measurement support, and convenience functionality. For us, Beton is primarily, but not exclusively, useful for server-side Swift engineering.

21Gram Consulting 26 Dec 10, 2022
๐ŸŸฃ Verge is a very tunable state-management engine on iOS App (UIKit / SwiftUI) and built-in ORM.

Verge is giving the power of state-management in muukii/Brightroom v2 development! Verge.swift ?? An effective state management architecture for iOS -

VergeGroup 478 Dec 29, 2022
Swift Markdown is a Swift package for parsing, building, editing, and analyzing Markdown documents.

Swift Markdown is a Swift package for parsing, building, editing, and analyzing Markdown documents.

Apple 2k Dec 28, 2022
Swift-DocC is a documentation compiler for Swift frameworks and packages aimed at making it easy to write and publish great developer documentation.

Swift-DocC is a documentation compiler for Swift frameworks and packages aimed at making it easy to write and publish great developer docum

Apple 833 Jan 3, 2023