## Issue Description
We don't have a system in place to profile the memory allocated to a hash table. We should have a workflow to reliably detect memory efficiency improvements and regressions measured by load factor.
## Proposed Solution
Create a new test case for memory efficiency tests. Instead of relying on XCTest performance tests, which do not work reliably on all platforms, we can use standard assertions over the `count` and `capacity` of a given collection through a series of insertions.
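As a rough illustration, such a test might look like the sketch below. The `HashTable` type and its `count`/`capacity` properties are placeholders for the project's actual collection, and the asserted bounds assume the current 0.5 load factor cap described later in this issue:

```swift
import XCTest

final class MemoryEfficiencyTests: XCTestCase {
    func testLoadFactorAfterManyInsertions() {
        // Hypothetical collection under test; substitute the project's
        // actual hash table type.
        var table = HashTable<Int>()
        for member in 0..<10_000 {
            table.insert(member)
        }
        // Load factor is the ratio of occupied buckets to total buckets.
        let loadFactor = Double(table.count) / Double(table.capacity)
        // With a load factor cap of 0.5, the table should settle between
        // 0.25 (right after an expansion) and 0.5.
        XCTAssertGreaterThanOrEqual(loadFactor, 0.25)
        XCTAssertLessThanOrEqual(loadFactor, 0.5)
    }
}
```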
## Parameters
The memory efficiency of the current insertion algorithm depends on two variables:
- The load factor cap, or the maximum ratio of `count / capacity`.
- The bump count cap, or the maximum number of consecutive bucket displacements.
### Load Factor Cap
Bumps are caused by collisions, so the likelihood of a bump increases as the load factor increases. Keeping the load factor low decreases the complexity of each insert operation, but increases the number of empty buckets in the table. Setting aside the bump count cap, the ideal load factor cap is one where any further increase in load would trigger a bump chain that takes more time to traverse than it would take to expand the table.
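As a sketch, the pre-insertion check for this cap could be as simple as the following (the function name is hypothetical; the default of `0.5` mirrors the current cap described under Current Profile):

```swift
// Hypothetical check: expand when accepting one more member would
// push the load factor over the cap.
func shouldExpandForLoad(count: Int, capacity: Int, loadFactorCap: Double = 0.5) -> Bool {
    Double(count + 1) / Double(capacity) > loadFactorCap
}
```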
### Bump Count Cap
The primary function of the bump count cap is to avoid bump loops. A bump loop occurs when the same sequence of members is bumped more than once. This is the only case where an insertion might fail in a cuckoo hashing algorithm, so these loops must be detected. The number of consecutive bumps per insertion is also a good secondary indicator of load factor, so keeping the bump count cap low can avoid unnecessary work caused by a high or absent load factor cap.
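To make the mechanism concrete, here is a deliberately tiny cuckoo-style table showing where the bump count cap enters the insertion loop. This is an illustrative sketch, not the project's implementation; every name in it is hypothetical:

```swift
// A minimal cuckoo-style table over Int members, written only to show
// how a bump count cap breaks potential bump loops.
struct TinyCuckooTable {
    private var buckets: [Int?]
    private let bumpCountCap = 20

    init(capacity: Int = 32) {
        buckets = Array(repeating: nil, count: capacity)
    }

    // Alternate between two hash functions, as in standard cuckoo hashing.
    private func bucket(for member: Int, round: Int) -> Int {
        let h = round % 2 == 0 ? member.hashValue : member.hashValue &* 31
        return ((h % buckets.count) + buckets.count) % buckets.count
    }

    mutating func insert(_ member: Int) {
        var displaced = member
        for round in 0..<bumpCountCap {
            let i = bucket(for: displaced, round: round)
            guard let occupant = buckets[i] else {
                buckets[i] = displaced // empty bucket found; done
                return
            }
            buckets[i] = displaced // bump the occupant and re-place it
            displaced = occupant
        }
        // Cap reached: assume a bump loop (or an overfull table),
        // expand, and retry the member that is still displaced.
        expand()
        insert(displaced)
    }

    private mutating func expand() {
        let members = buckets.compactMap { $0 }
        buckets = Array(repeating: nil, count: buckets.count * 2)
        for member in members {
            insert(member)
        }
    }
}
```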
## Current Profile
The current `master` branch uses two methods to determine when to expand the hash table:
- A load factor cap of 0.5, meaning the hash table is expanded any time `count` becomes greater than `capacity / 2`. The likelihood of collisions increases exponentially as `count` approaches `capacity`, so keeping the table's load factor low avoids doing unnecessary bumps.
- A bump count cap of 20, meaning if 20 consecutive members are displaced from their buckets to insert a new member, the hash table is expanded. As the number of consecutive bumps increases, the probability the insert operation is in a "bump loop" increases.
According to these conditions, the best-case load factor with the current design is 0.5, and the worst case could be as low as `2 / capacity`: for example, a run of colliding insertions that keeps hitting the bump count cap can repeatedly double `capacity` while `count` stays at 2. In the event of a hash flooding attack, the capacity of the table would grow exponentially until the machine runs out of memory.
## Optimizing for Memory Efficiency
Since memory efficiency and insert performance are inversely related, the optimal cap is where any further increase in the cap would cause an increase in the rate of change of time per operation.
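One way to state this criterion formally (a suggested reading of the sentence above, not something from the original design): let `T(c)` be the mean time per insert operation as a function of a cap `c`; the optimal cap is then the largest value before the marginal time cost starts to grow.

```latex
% T(c): mean time per insert as a function of a cap c
% (either the load factor cap or the bump count cap).
% The optimal cap c* is the largest c at which the rate of change
% of time per operation, T'(c), is not yet increasing:
c^* = \max \{\, c : T''(c) \le 0 \,\}
```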
## Detailed Design
The benchmark should be run as an executable that takes the following arguments:
- A bump count cap range or a ratio of bump count cap to the number of occupied buckets.
- A number of consecutive benchmarks to run.
- A maximum number of insertions to perform (optional, defaults to 1M).
Usage:

```
swift run MemoryBenchmarks <file>
    --cycles <Int>
    --size <Int> (optional)
    --max-load-factor <Double> (optional)
    [--min-bump-count <Int> | --min-scaled-bump-count <Double>]
    [--max-bump-count <Int> | --max-scaled-bump-count <Double>]
```
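If the executable is built on swift-argument-parser, the interface could be declared along these lines. This is a sketch assuming that dependency, with the scaled bump-count variants omitted for brevity; the actual implementation may parse arguments differently:

```swift
import ArgumentParser

// Hypothetical declaration of the benchmark's command-line interface.
// ArgumentParser derives the flag names (e.g. --max-load-factor)
// from the property names.
@main
struct MemoryBenchmarks: ParsableCommand {
    @Argument(help: "Path of the JSON document to write results to.")
    var file: String

    @Option(help: "Number of consecutive benchmarks to run.")
    var cycles: Int

    @Option(help: "Maximum number of insertions to perform.")
    var size: Int = 1_000_000

    @Option(help: "Load factor cap to apply during insertion.")
    var maxLoadFactor: Double?

    @Option(help: "Lower bound of the bump count cap range.")
    var minBumpCount: Int?

    @Option(help: "Upper bound of the bump count cap range.")
    var maxBumpCount: Int?

    func run() throws {
        // Run `cycles` benchmarks over the configured cap range,
        // then encode the results and write them to `file`.
    }
}
```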
The executable should write a JSON document to `<file>` with the following structure:
```json
{
  "runs": [
    {
      "bumpCountCap": 0.05,
      "loadFactorCap": 0.7,
      "results": [
        {
          "count": 2,
          "capacity": 32
        }
      ]
    }
  ]
}
```
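On the Swift side, the document could be modeled with Codable types along these lines (hypothetical names, assuming the structure above):

```swift
import Foundation

// Hypothetical Codable model mirroring the JSON document above.
struct BenchmarkDocument: Codable {
    var runs: [Run]

    struct Run: Codable {
        var bumpCountCap: Double
        var loadFactorCap: Double
        var results: [Measurement]
    }

    struct Measurement: Codable {
        var count: Int
        var capacity: Int
    }
}

// Writing the document once a benchmark finishes:
func write(_ document: BenchmarkDocument, to url: URL) throws {
    let data = try JSONEncoder().encode(document)
    try data.write(to: url)
}
```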
This data should allow us to plot a two-dimensional graph with:
- The bump count cap on the X axis.
- The load factor on the Y axis.
## Interpreting Results
When interpreting this graph:
- A slope of 0 would imply memory efficiency is unaffected by the bump count cap (but may be limited by a load factor cap).
- A positive slope would imply memory efficiency increases as the bump count cap increases.
- A negative slope would imply memory efficiency decreases as the bump count cap increases.
This information should help identify when these limits should be changed as the implementation of the insertion algorithm changes. It can also help detect regressions and improvements as the mean load factor across runs increases or decreases. It should be used in conjunction with a time performance graph to determine how a change affects performance, and to optimize algorithm variables.
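As an example of the kind of post-processing this enables, the mean final load factor across runs could be computed from the document (reusing the hypothetical `BenchmarkDocument` model sketched earlier):

```swift
// Hypothetical post-processing: average the final load factor of each run.
func meanFinalLoadFactor(of document: BenchmarkDocument) -> Double {
    let finals = document.runs.compactMap { run -> Double? in
        guard let last = run.results.last else { return nil }
        return Double(last.count) / Double(last.capacity)
    }
    guard !finals.isEmpty else { return 0 }
    return finals.reduce(0, +) / Double(finals.count)
}
```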