🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

Actomaton

Last update: Oct 17, 2022

Related tags

Utility swift crawler

Overview

🕸️ ActoCrawler

ActoCrawler is a Swift Concurrency-powered crawler engine on top of Actomaton, with flexible customizability to create various HTML scrapers, image scrapers, etc.

Example

Examples

struct Output: Sendable
{
    let nextLinksCount: Int
}

let htmlCrawler = await Crawler<Output, Void>.htmlScraper(
    config: CrawlerConfig(
        maxDepths: 10,
        maxTotalRequests: 100,
        timeoutPerRequest: 5,
        userAgent: "ActoCrawler",
        domainFilteringPolicy: .disallowedDomains([".*google.com*" /* ... */]),
        domainQueueTable: [
            ".*example1.com*": .init(maxConcurrency: 1, delay: 0),
            ".*example2.com*": .init(maxConcurrency: 5, delay: 0.1 ... 0.5)
        ]
    ),
    scrapeHTML: { response in
        let html = response.data
        let links = try html.select("a").map { try $0.attr("href") }

        let nextRequests = links
            .compactMap(URL.init(string:))
            .map { UserRequest(url: $0) }

        return (nextRequests, Output(nextLinksCount: nextRequests.count))
    }
)

// Visit initial page.
htmlCrawler.visit(url: URL(string: "https://www.wikipedia.org")!)

// Observe crawl events.
for await event in htmlCrawler.events {
    switch event {
    case let .willCrawl(req):
        print("Crawl : 🕸️ [\(req.order)] [d=\(req.depth)] \(req.url)")
    case let .didCrawl(req, .success(output)):
        print("Output: ✅ [\(req.order)] [d=\(req.depth)] \(req.url), nextLinksCount = \(output.nextLinksCount)")
    case let .didCrawl(req, .failure(error)):
        print("Output: ❌ [\(req.order)] [d=\(req.depth)] \(req.url), error = \(error)")
    }
}

print("Output Done")

Acknowledgements

mattsse/voyager

License

MIT

Cross-Platform, Protocol-Oriented Programming base library to complement the Swift Standard Library. (Pure Swift, Supports Linux)

SwiftFoundation Cross-Platform, Protocol-Oriented Programming base library to complement the Swift Standard Library. Goals Provide a cross-platform in

620 Oct 11, 2022

Swift - ✏️Swift 공부 저장소✏️

Swift 스위프트의 기초 1. Swift의 기본 2. 변수와 상수 [3. 데이터 타입 기본] [4. 데이터 타입 고급] 5. 연산자 6. 흐름 제어 7. 함수 8. 옵셔널 객체지향 프로그래밍과 스위프트 9. 구조체와 클래스 10. 프로퍼티와 메서드 11. 인스턴스 생

0 Mar 9, 2022

Swift-ndi - Swift wrapper around NewTek's NDI SDK

swift-ndi Swift wrapper around NewTek's NDI SDK. Make sure you extracted latest

12 Dec 29, 2022

__.swift is a port of Underscore.js to Swift.

__.swift Now, __.swift is version 0.2.0! With the chain of methods, __.swift became more flexible and extensible. Documentation: http://lotz84.github.

86 Jun 29, 2022

SNTabBarDemo-Swift - Cool TabBar With Swift

SNTabBarDemo-Swift Cool TabBar How To Use // MARK: - setup private func setu

3 Sep 29, 2022

Swift-when - Expression switch support in Swift

Swift When - supporting switch expressions in Swift! What is it? Basically, it a

7 Nov 24, 2022

Swift-compute-runtime - Swift runtime for Fastly Compute@Edge

swift-compute-runtime Swift runtime for Fastly Compute@Edge Getting Started Crea

57 Dec 24, 2022

Swift-HorizontalPickerView - Customizable horizontal picker view component written in Swift for UIKit/iOS

Horizontal Picker View Customizable horizontal picker view component written in

8 Aug 1, 2022

swift-highlight a pure-Swift data structure library designed for server applications that need to store a lot of styled text

swift-highlight is a pure-Swift data structure library designed for server applications that need to store a lot of styled text. The Highlight module is memory-efficient and uses slab allocations and small-string optimizations to pack large amounts of styled text into a small amount of memory, while still supporting efficient traversal through the Sequence protocol.

4 Aug 14, 2022

Comments

Add ActoCrawlerPlaywright & HeadlessBrowserExample
This PR adds ActoCrawlerPlaywright and HeadlessBrowserExample using Playwright (python version of web automation) and PythonKit (Swift-Python interop) libraries:

microsoft/playwright-python: Python version of the Playwright testing and automation library.

pvieito/PythonKit: Swift framework to interact with Python.

The basic usage of Playwright can be easily followed by Getting started | Playwright Python.

To bring Python async-await into Swift Concurrency world, this PR also adds PythonKitAsync module as a quick wrapper. (CAVEAT: This approach may not be ideal due to slower browsing, possibly by multiple Python coroutines not being able to switch to the next one in their concurrency world)

Screencast

$ swift run HeadlessBrowserExample

https://user-images.githubusercontent.com/138476/170487944-b2bc5e2e-8021-4c36-ba2f-5f2c77421ebe.mp4
enhancement
opened by inamiy 0
Add `CrawlEvent` with `case willCrawl`

This PR adds CrawlEvent with case willCrawl so that beginning of the crawling events can also be observed.

There are some renamings from "output" to "event", and usage of Output? is now simplified into Output (non-optional).
enhancement

opened by inamiy 0

Owner

Actomaton

Swift async/await & Actor-powered effectful state-management framework.

GitHub

A simple swift package that provides a Swift Concurrency equivalent to `@Published`.

AsyncValue This is a simple package that provides a convenience property wrapper around AsyncStream that behaves almost identically to @Published. Ins

33 Oct 3, 2022

A command-line tool and Swift Package for generating class diagrams powered by PlantUML

SwiftPlantUML Generate UML class diagrams from swift code with this Command Line Interface (CLI) and Swift Package. Use one or more Swift files as inp

374 Jan 3, 2023

Easy way to detect iOS device properties, OS versions and work with screen sizes. Powered by Swift.

Easy way to detect device environment: Device model and version Screen resolution Interface orientation iOS version Battery state Environment Helps to

582 Dec 25, 2022

BudouX: the machine learning powered line break organizer tool

BudouX.swift BudouX Swift implementation. BudouX is the machine learning powered

39 Dec 31, 2022

A graphical Mach-O viewer for macOS. Powered by Mach-O Kit.

Mach-O Explorer is a graphical Mach-O viewer for macOS. It aims to provide an interface and feature set that are similar to the venerable MachOView ap

581 Dec 31, 2022

Zip - A Swift framework for zipping and unzipping files. Simple and quick to use. Built on top of minizip.

Zip A Swift framework for zipping and unzipping files. Simple and quick to use. Built on top of minizip. Usage Import Zip at the top of the Swift file

2.3k Jan 3, 2023

An enhancement built on top of Foundation Framework and XCTest.

Beton is a Swift library built on top of the Foundation framework, that provides an additional layer of functionality, including easy localization, performance test measurement support, and convenience functionality. For us, Beton is primarily, but not exclusively, useful for server-side Swift engineering.

26 Dec 10, 2022

🟣 Verge is a very tunable state-management engine on iOS App (UIKit / SwiftUI) and built-in ORM.

Verge is giving the power of state-management in muukii/Brightroom v2 development! Verge.swift ?? An effective state management architecture for iOS -

478 Dec 29, 2022

Swift Markdown is a Swift package for parsing, building, editing, and analyzing Markdown documents.

2k Dec 28, 2022

Swift-DocC is a documentation compiler for Swift frameworks and packages aimed at making it easy to write and publish great developer documentation.

Swift-DocC is a documentation compiler for Swift frameworks and packages aimed at making it easy to write and publish great developer docum

833 Jan 3, 2023

🕸️ Swift Concurrency-powered crawler engine on top of Actomaton.

Related tags

Overview

🕸️ ActoCrawler

Example

Acknowledgements

License

You might also like...

Cross-Platform, Protocol-Oriented Programming base library to complement the Swift Standard Library. (Pure Swift, Supports Linux)

Swift - ✏️Swift 공부 저장소✏️

Swift-ndi - Swift wrapper around NewTek's NDI SDK

__.swift is a port of Underscore.js to Swift.

SNTabBarDemo-Swift - Cool TabBar With Swift

Swift-when - Expression switch support in Swift

Swift-compute-runtime - Swift runtime for Fastly Compute@Edge

Swift-HorizontalPickerView - Customizable horizontal picker view component written in Swift for UIKit/iOS

swift-highlight a pure-Swift data structure library designed for server applications that need to store a lot of styled text

Comments

Add ActoCrawlerPlaywright & HeadlessBrowserExample

Screencast

Add `CrawlEvent` with `case willCrawl`

Owner

Actomaton

A simple swift package that provides a Swift Concurrency equivalent to `@Published`.

A command-line tool and Swift Package for generating class diagrams powered by PlantUML

Easy way to detect iOS device properties, OS versions and work with screen sizes. Powered by Swift.

BudouX: the machine learning powered line break organizer tool

A graphical Mach-O viewer for macOS. Powered by Mach-O Kit.

Zip - A Swift framework for zipping and unzipping files. Simple and quick to use. Built on top of minizip.

An enhancement built on top of Foundation Framework and XCTest.

🟣 Verge is a very tunable state-management engine on iOS App (UIKit / SwiftUI) and built-in ORM.

Swift Markdown is a Swift package for parsing, building, editing, and analyzing Markdown documents.

Swift-DocC is a documentation compiler for Swift frameworks and packages aimed at making it easy to write and publish great developer documentation.