Swift-cuckoo-collections - Cross-platform Swift dictionaries & sets that use a cuckoo hashing algorithm

Overview

CuckooCollections

A Swift package for open-addressed sets and dictionaries that use the cuckoo hasing algorithm.

Overview

Import the CuckooCollections module to use two new data structures that feature constant-time lookups, insertions and removals:

  • CuckooSet (prototype)
  • CuckooDictionary (not implemented)

This cuckoo hashing algorithm uses FNV-1 and FNV-1a with a 64-bit digest. The hash function implementation is also open-source, the code is available here.

Platforms

This package is tested in continuous integration on the following platforms:

  • Ubuntu 20.04
  • Windows Server 2019
  • macOS 11.5
  • iOS 15.0
  • tvOS 15.0
  • watchOS 8.0

Versioning

As a Swift Package Manager project, this package uses semantic versioning rules. All releases before 1.0.0 are considered pre-release. Under pre-release rules, code-breaking changes may be introduced with a minor version bump. To avoid this, specify the version requirement in your package manifest as follows:

.package(url: "https://github.com/crichez/swift-cuckoo-collections", .upToNextMinor(from: "0.0.1"))
Comments
  • `CuckooDictionary`

    `CuckooDictionary`

    Objectives

    CuckooDictionary aims to expose a very similar API to the Swift standard library's Dictionary but by using an open-addressed hash table and the same cuckoo hashing algorithm as CuckooSet.

    Tasks

    • [x] Write unit tests
    • [x] Implement methods
    • [x] Write documentation
    enhancement 
    opened by crichez 3
  • Use Pointers to store CuckooSet

    Use Pointers to store CuckooSet

    Objectives

    This pull request improves the per-insert performance of CuckooSet by changing its backing storage from Array to UnsafeMutablePointer.

    Implementation

    When initializing a set, every pointer in the buffer is initialized to nil to indicate an empty bucket. This operation still takes linear time as the initializer must iterate through every pointer in the buffer. The buffer's header is empty. Instead of storing capacity and count in the header, they are stored inline at the class definition. This may change to simplify copy-on-write semantics, but for now should provide meager performance benefits.

    Performance

    Benchmark output on the CuckooSet<Int> Insert task outputs the following comparison statistics:

    Tasks with difference scores larger than 1.05:
      Score   Sum     Improvements Regressions  Name
      1.341   1.341   1.341(#76)   1.000(#0)    CuckooSet<Int> Insert (*)
    

    The CuckooSet<Int> Contains task showed no significant improvement.

    Most of this improvement is associated with the removed bounds checks associated with unsafe pointers. This significantly reduces memory safety, but our tests guarantee this implementation is safe.

    enhancement needs minor version bump 
    opened by crichez 2
  • Profile Memory Utilization

    Profile Memory Utilization

    Issue Description

    We don't have a system in place to profile the memory allocated to a hash table. We should have a workflow to reliably detect memory efficiency improvements and regressions measured by load factor.

    Proposed Solution

    Create a new test case for memory efficiency tests. Instead of relying on XCTest performance tests which do not reliably work on all platforms, we can use standard assertions over the count and capacity of a given collection through a series of insertions.

    Parameters

    The memory efficiency of the current insertion algorithm depends on two variables:

    1. The load factor cap, or the maximum ratio of count / capacity.
    2. The bump count cap, or the maximum number of consecutive bucket displacements.

    Load Factor Cap

    Bumps are caused by collisions, so the likelihood of a bump increases as load factor increases. Keeping the load factor low decreases the complexity of each insert operation, but increases the amount of empty buckets in the table. Not considering the bump count cap, the ideal load factor cap is one where any farther increase in load would trigger a bump chain that takes more time to traverse than would be taken to expand the table.

    Bump Count Cap

    The primary function of the bump count cap is to avoid bump loops. A bump loop occurs when the same sequence of members are bumped more than once. This is the only case where an insertion might fail in a cuckoo hashing algorithm, so these loops must be detected. The number of consecutive bumps per insertion is also a good secondary indicator of load factor, so keeping the bump count cap low can avoid unnecessary work caused by a high or absent load factor cap.

    Current Profile

    The current master branch uses two methods to determine when to expand the hash table:

    1. A load factor cap of 0.5, meaning the hash table is expanded anytime count becomes greater than capacity / 2. The likelihood of collisions increases exponentially as count approaches capacity, so keeping the table's load factor low avoids doing unnecessary bumps.
    2. A bump count cap of 20, meaning if 20 consecutive members are displaced from their buckets to insert a new member, the hash table is expanded. As the number of consecutive bumps increases, the probability the insert operation is in a "bump loop" increases.

    According to these conditions, the best case load factor with the current design is 0.5, and the worst case could be as low as 2 / capacity. In the event of a hash flooding attack, the capacity of the table would grow exponentially until the machine runs out of memory.

    Optimizing for Memory Efficiency

    Since memory efficiency and insert performance are inversely related, the optimal cap is where any farther increase in the cap would cause an increase in the rate of change of time per operation.

    Detailed Design

    The benchmark should be run as an executable that takes the following arguments:

    1. A bump count cap range or a ratio of bump count cap to the number of occupied buckets.
    2. A number of consecutive benchmarks to run.
    3. A maximum number of insertions to perform (optional, defaults to 1M).
    Usage: swift run MemoryBenchmarks <file>
      --cycles <Int>
      --size <Int> (optional)
      --max-load-factor <Double> (optional)
      [--min-bump-count <Int> | --min-scaled-bump-count <Double>] 
      [--max-bump-count <Int> | --max-scaled-bump-count <Double>]
    

    And should write a JSON document to <file> with the following structure:

    {   
        "runs": [
            {   
                "bumpCountCap": 0.05,
                "loadFactorCap": 0.7,
                "results": [
                    {
                        "count": 2,
                        "capacity": 32,
                    },
                ]
            },
        ],
    }
    

    This data should allows us to plot a two-dimensional graph with:

    1. The bump count cap on the X axis.
    2. The load factor on the Y axis.

    Interpreting Results

    When interpreting this graph:

    • A slope of 0 would imply memory efficiency is unaffected by the bump count cap (but may be limited by a load factor cap).
    • A positive slope would imply memory efficiency increases as the bump count cap increases.
    • A negative slope would imply memory efficiency decreases as the bump count cap increases.

    This information should help identify when these limits should be changed as the implementation of the insertion algorithm changes. It can also help detect regressions and improvements as the mean load factor accross runs increases or decreases. This should be used in conjunction with a time performance graph to determine how a change affects performance, and optimize algorithm variables.

    documentation 
    opened by crichez 1
  • Performance Metrics & Comparisons

    Performance Metrics & Comparisons

    Overview

    To test the efficiency of the algorithm and implementation, we need to measure the performance of common operations on large collections to determine whether they can claim O(1) performance or just O(log n).

    It's also important to compare the performance of each data structure with a counterpart. CuckooSet can be compared to Swift's Set and CuckooDictionary to Swift's Dictionary. It's expected that the Element type and number of elements will influence relative performance, so many should be tested. Eventually we can compare CuckooOrderedSet and CuckooOrderedDictionary to counterparts in the Collections package from the team at Apple.

    Implementation

    We can use XCTest to measure performance, and change the gitignore file to track the baseline files. To make sure these baselines can accurately reflect regressions and improvements to performance, we will need to pay more attention to hardware and software updates fro GitHub Actions, since they handle the allocation and resources for our virtual machines.

    Alternatives

    Another option is to move the benchmarking to another repository to avoid adding bloat to the main repository. This can always be done down the line if the bloat gets excessive.

    documentation enhancement 
    opened by crichez 1
  • `CuckooSet` Comparison Operations

    `CuckooSet` Comparison Operations

    Objectives

    This pull request adds the following comparison operations to CuckooSet:

    • == (lhs:rhs:)
    • isSubset(of:)
    • isStrictSubset(of:)
    • isSuperset(of:)
    • isStrictSuperset(of:)
    • isDisjoint(with:)

    CuckooSet also unconditionally conforms to FNVHashable.

    Tasks

    • [x] declare methods
    • [x] write unit tests
    • [ ] implement methods
    • [ ] pass & refine tests
    • [ ] write documentation
    enhancement 
    opened by crichez 1
  • Rename Benchmark Tasks

    Rename Benchmark Tasks

    Objectives

    This pull request renames the benchmark tasks to a consistent convention. The history explores the requests in #17 and decided to close #17 without adding new benchmarks.

    Note on #17

    The benchmark tasks for set algebra operations ended up being highly variable, and no more than a poor reflection of the performance of insert and remove operations. The new tasks were introduced on this branch, but then removed in favor of their component operations.

    Naming Convention

    Each task is now named collection/method-options. For example:

    • set/insert
    • dict/lookup
    • set/insert-reservingCapacity
    documentation 
    opened by crichez 0
  • Re-Implement CuckooSet.removeAll()

    Re-Implement CuckooSet.removeAll()

    Objective

    This pull request re-implements the CuckooSet.removeAll() method that was previously removed by accident. This closes #18.

    Implementation

    This PR takes advantage of the changes in #20 by simply re-allocating the set's storage to a default-sized empty instance.

    public func removeAll() {
        buckets = CuckooStorage(capacity: 32)
    }
    

    In the event this storage is shared with another set, it should remain unaffected. If this storage is uniquely refenced, ARC will deinitialize it.

    Impact on Source Stability

    This PR restores a part of the original public API. Compared to the last commit, this is an additive change.

    enhancement 
    opened by crichez 0
  • Optimize CuckooDictionary Storage

    Optimize CuckooDictionary Storage

    Objectives

    • Improve the performance of dictionary insertions and updates by optimizing the storage of the underlying hash table. This closes #19.
    • Expose sequences of keys and values exclusively to mirror the Swift Dictionary API.

    Implementation

    Storage Change

    The implementation for the storage change mirrors that of CuckooSet. The buckets internal property is now an instance of the CuckooStorage class where each element is a tuple of a key and its associated value. See #19 for more details.

    Keys and Values

    The new sequences are initialized from the full dictionary, but only return the key or value respectively for each iteration. This means getting a keys or values sequence is a fairly trivial constant-time operation, and iterating over it has very similar performance to iterating over the entire dictionary.

    Benchmarks

    The benchmark for the Insert operation outputs the following:

    Tasks with difference scores larger than 1.05:
      Score   Sum     Improvements Regressions  Name
      1.192   1.192   1.192(#76)   1.000(#0)    CuckooDictionary<Int, Bool> Insert (*)
    

    The lookup operation had no significant change in performance.

    Each insert/update operation is approximately twice as slow as with CuckooSet, which is somewhat unexpected. The bulk of the insertion work should be hashing and assignments, which should take the same amount of time. A new optimization pass on the insert/update algorithm may be required. For now though, this is still an improvement.

    Impact on Source Stability

    There are two potentially breaking changes in this PR:

    • The setter for CuckooDictionary.count is now internal. This change fixes a previous oversight. This is technically a breakage, but the property was never meant to be manually mutated.
    • The return type of CuckooDictionary.makeIterator() has changed from IndexingIterator to the new CuckooDictionary.Iterator struct. However the only API exposed by each of those types is the IteratorProtocol API. Only code that features the explicitly declared type is affected.
    enhancement needs minor version bump 
    opened by crichez 0
  • Optimize Dictionary Storage

    Optimize Dictionary Storage

    Issue Description

    CuckooDictionary is still stored in an array like the old CuckooSet. Using an array comes with additional bounds checks that slow each insert operation down by about 20%. It also makes accessing keys and values exclusively linear time operations since the entire storage array must be iterated over to separate them.

    Suggested Solution

    Use CuckooStorage to manage buckets, and use a tuple of keys and values as the Element type.

    var buckets: CuckooStorage<(key: Key, value: Value)>
    

    Keys and Values

    The keys and values sequences should be custom sequences where the iterator separates the tuple at iteration-time. Each property can now be accessed in constant time without copying anything.

    enhancement needs minor version bump 
    opened by crichez 0
  • Re-Implement `CuckooSet.removeAll()`

    Re-Implement `CuckooSet.removeAll()`

    Issue Description

    The removeAll() method was removed from the CuckooSet API during the storage change and was never put back. We should add it in a new PR.

    Proposed Solution

    The new storage system allows us to replace the storage of a set entirely. We can implement this method very easily be replacing it with a smaller empty buffer. This would also change the complexity of the method to O(1) since the post-remove capacity is a fixed value.

    bug 
    opened by crichez 0
  • Add Benchmarks for All Set Operations

    Add Benchmarks for All Set Operations

    Issue Description

    We currently have benchmarks for insert and lookup operations on both the set and dictionary. The complexity of SetAlgebra methods is not obvious and needs to be benchmarked before declaring it in documentation. Mutating operations are always expected to be O(1), however improvements to the performance of individual operations can still be made and will require a benchmark to justify.

    Proposed Solution

    Add benchmark tasks for the following set algebra operations:

    • union
    • formUnion
    • intersection
    • formIntersection
    • symmetricDifference
    • formSymmetricDifference
    • subtract
    • subtracting
    • isSubset
    • isStrictSubset
    • isSuperset
    • isStrictSuperset
    • isDisjoint

    Add benchmark tasks for the following mutating operations:

    • remove
    • removeAll
    • expand
    • bump
    documentation 
    opened by crichez 0
Releases(v0.0.2)
  • v0.0.2(Jan 27, 2022)

    This pre-release includes internal changes made to the set bump and insertion process. These are mostly for clarity and should offer very slight performance improvements, including reducing the memory size of CuckooSet.

    No code-breaking changes were introduced in this version.

    Source code(tar.gz)
    Source code(zip)
  • v0.0.1(Jan 27, 2022)

    The first pre-release!

    This release features the CuckooSet and CuckooDictionary types which feature constant time insert, remove and contains operations. These types expose a very similar API to the Swift standard library's Set and Dictionary respectively, with slightly worse performance as they stand now.

    Source code(tar.gz)
    Source code(zip)
Owner
Christopher Richez
Christopher Richez
Swift package wrapping the OpenWall BCrypt hashing algorithm

SwiftBCrypt A simple Swift Package wrapping the OpenWall BCrypt hashing algorithm. Generate Salt let bcryptSalt = try BCRypt.makeSalt() Hash Phrases

Tanner Silva 1 Jul 8, 2022
Swift cross-platform crypto library using CommonCrypto/libcrypto

BlueCryptor Swift cross-platform crypto library derived from IDZSwiftCommonCrypto. IMPORTANT NOTE: This release is NOT entirely source code compatible

Kitura 183 Oct 15, 2022
Swift cross-platform crypto library using CommonCrypto/libcrypto

BlueCryptor Swift cross-platform crypto library derived from IDZSwiftCommonCrypto. IMPORTANT NOTE: This release is NOT entirely source code compatible

Kitura 183 Oct 15, 2022
Send email to any SMTP server like a boss, in Swift and cross-platform

Hedwig is a Swift package which supplies a set of high level APIs to allow you sending email to an SMTP server easily. If you are planning to send ema

Wei Wang 1.1k Jan 3, 2023
A lightweight stochastic optimizer based on slime mold (Slime Mold Algorithm)

Slime This is a Swift implementation of a Slime Mold Algorithm - a stochastic optimizer - generally based on this paper The only dependency required b

Ethan J 4 Aug 6, 2022
Wi-attack: Cross-technology Impersonation Attack against iBeacon Services

Wi-attack: Cross-technology Impersonation Attack against iBeacon Services

Naxin 3 Nov 30, 2021
Oversecured Vulnerable iOS App is an iOS app that aggregates all the platform's known and popular security vulnerabilities.

Description Oversecured Vulnerable iOS App is an iOS app that aggregates all the platform's known and popular security vulnerabilities. List of vulner

Oversecured Inc 135 Dec 15, 2022
PassDrop is a fully-featured secure password management system, compatible with the free KeePass 1.x (Classic) and multi-platform KeePassX desktop applications.

passdrop This is a modern, updated build of Rudis Muiznieks's PassDrop application. PassDrop is a fully-featured secure password management system, co

Chad Austin 33 Sep 23, 2022
A simple wrapper for the iOS Keychain to allow you to use it in a similar fashion to User Defaults. Written in Swift.

SwiftKeychainWrapper A simple wrapper for the iOS / tvOS Keychain to allow you to use it in a similar fashion to User Defaults. Written in Swift. Prov

Jason 1.5k Dec 30, 2022
A tiny and easy to use Swift class to encrypt strings using HMAC algorithms.

#Sweet HMAC SweetHMAC is a tiny and easy to use Swift class to encrypt strings using HMAC algorithms. A special thanks to jernejstrasner for shared HM

Jan Cássio 37 Jul 27, 2022
Safe and easy to use crypto for iOS and macOS

Swift-Sodium Swift-Sodium provides a safe and easy to use interface to perform common cryptographic operations on macOS, iOS, tvOS and watchOS. It lev

Frank Denis 483 Jan 5, 2023
Easy to use cryptographic framework for data protection: secure messaging with forward secrecy and secure data storage. Has unified APIs across 14 platforms.

Themis provides strong, usable cryptography for busy people General purpose cryptographic library for storage and messaging for iOS (Swift, Obj-C), An

Cossack Labs 1.6k Dec 30, 2022
Use Apple FaceID or TouchID authentication in your app using BiometricAuthentication.

BiometricAuthentication Use Apple FaceID or TouchID authentication in your app using BiometricAuthentication. It's very simple and easy to use that ha

Rushi Sangani 804 Dec 30, 2022
CoreML-Face-Parsing - how to use face-parsing CoreML model in iOS

CoreML-Face-Parsing The simple sample how to use face-parsing CoreML model in iO

MLBoy 6 Oct 25, 2022
An easy-to-use, open-source two-factor authentication app designed specifically for iOS.

Tofu An easy-to-use, open-source two-factor authentication app designed specifically for iOS. Tofu generates one-time passwords to help you protect yo

Calle Luks 380 Jan 8, 2023
RSA public/private key encryption, private key signing and public key verification in Swift using the Swift Package Manager. Works on iOS, macOS, and Linux (work in progress).

BlueRSA Swift cross-platform RSA wrapper library for RSA encryption and signing. Works on supported Apple platforms (using Security framework). Linux

Kitura 122 Dec 16, 2022
RSA public/private key encryption, private key signing and public key verification in Swift using the Swift Package Manager. Works on iOS, macOS, and Linux (work in progress).

BlueRSA Swift cross-platform RSA wrapper library for RSA encryption and signing. Works on supported Apple platforms (using Security framework). Linux

Kitura 122 Dec 16, 2022
Swift-problem-solving - Swift 알고리즘 맛보기 😋

swift-problem-solving Swift 로 알고리즘 익히기 ?? Programmers 난이도 풀이 문제 바로가기 Lv.2 오픈채팅방 링크 Lv.3 다단계 칫솔 판매 링크 Lv.3 합승 택시 요금 링크 Leetcode 난이도 풀이 문제 바로가기 Medium 1

jegyun 3 Dec 27, 2022
Helps you define secure storages for your properties using Swift property wrappers.

?? Secure Property Storage Helps you define secure storages for your properties using Swift property wrappers. ?? Features All keys are hashed using S

Alex Rupérez 443 Jan 4, 2023