Commit 9fa5a31

Add section on entropy bits calculation
1 parent 7835b90 commit 9fa5a31

File tree

5 files changed

+45
-32
lines changed


README.md

Lines changed: 44 additions & 31 deletions
@@ -14,6 +14,7 @@ Efficiently generate cryptographically strong random strings of specified entrop
1414
- [Custom Characters](#CustomCharacters)
1515
- [Efficiency](#Efficiency)
1616
- [Custom Bytes](#CustomBytes)
17+
- [Entropy Bits](#EntropyBits)
1718
- [TL;DR 2](#TLDR2)
1819

1920
### <a name="Installation"></a>Installation
@@ -86,9 +87,7 @@ Custom characters may be specified. Using uppercase hexadecimal characters:
8687

8788
> 16E26779479356B516
8889
89-
Convenience functions `smallID`, `mediumID`, `largeID`, `sessionID` and `token` provide random strings for various predefined bits of entropy.
90-
91-
Small ID represents a potential of 30 strings with a 1 in a million chance of repeat:
90+
Convenience functions `smallID`, `mediumID`, `largeID`, `sessionID` and `token` provide random strings for various predefined bits of entropy. For example, a small ID represents a potential of 30 strings with a 1 in a million chance of repeat:
9291

9392
```js
9493
import {Random} from 'entropy-string'
@@ -97,7 +96,9 @@ Small ID represents a potential of 30 strings with a 1 in a million chance of re
9796
const string = random.smallID()
9897
```
9998

100-
OWASP session ID using base 32 characters:
99+
> DpTQqg
100+
101+
Or, to generate an OWASP session ID:
101102

102103
```js
103104
import {Random} from 'entropy-string'
@@ -108,24 +109,13 @@ OWASP session ID using base 32 characters:
108109

109110
> nqqBt2P669nmjPQRqh4NtmTPn9
110111
111-
OWASP session ID using [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5) file system and URL safe characters:
112-
```js
113-
import {Random, charSet64} from 'entropy-string'
114-
115-
const random = new Random(charSet64)
116-
const string = random.sessionID()
117-
```
118-
119-
> HRU1M7VR5u-N6B0Xo4ZSjx
120-
121-
Base 64 character, 256-bit token
122-
112+
Or perhaps you need a 256-bit token using [RFC 4648](https://tools.ietf.org/html/rfc4648#section-5) file system and URL safe characters:
123113
```js
124114
import {Random, Entropy, charSet64} from 'entropy-string'
125115

126116
const random = new Random(charSet64)
127117

128-
const string = random.string(bits)
118+
const string = random.token()
129119
```
130120

131121
> t-Z8b9FLvpc-roln2BZnGYLZAX_pn5U7uO_cbfldsIt
@@ -134,15 +124,15 @@ Base 64 character, 256-bit token
134124

135125
### <a name="Overview"></a>Overview
136126

137-
`entropy-string` provides easy creation of randomly generated strings of specific entropy using various character sets. Such strings are needed when generating, for example, random IDs and you don't want the overkill of a GUID, or for ensuring that some number of items have unique identifiers.
127+
`entropy-string` provides easy creation of randomly generated strings of specific entropy using various character sets. Such strings are needed as unique identifiers when, for example, you want random IDs without the overkill of a GUID.
138128

139-
A key concern when generating such strings is that they be unique. To truly guarantee uniqueness requires either deterministic generation (e.g., a counter) that is not random, or that each newly created random string be compared against all existing strings. When ramdoness is required, the overhead of storing and comparing strings is often too onerous and a different tack is needed.
129+
A key concern when generating such strings is that they be unique. Guaranteed uniqueness, however, requires either deterministic generation (e.g., a counter) that is not random, or that each newly created random string be compared against all existing strings. When randomness is required, the overhead of storing and comparing strings is often too onerous and a different tack is chosen.
140130

141-
A common strategy is to replace the *guarantee of uniqueness* with a weaker but often sufficient *probabilistic uniqueness*. Specifically, rather than being absolutely sure of uniqueness, we settle for a statement such as *"there is less than a 1 in a billion chance that two of my strings are the same"*. This strategy requires much less overhead, but does require we have some manner of qualifying what we mean by, for example, *"there is less than a 1 in a billion chance that 1 million strings of this form will have a repeat"*.
131+
A common strategy is to replace the *guarantee of uniqueness* with a weaker but often sufficient *probabilistic uniqueness*. Specifically, rather than being absolutely sure of uniqueness, we settle for a statement such as *"there is less than a 1 in a billion chance that two of my strings are the same"*. This strategy requires much less overhead, but does require we have some manner of qualifying what we mean by *"there is less than a 1 in a billion chance that 1 million strings of this form will have a repeat"*.
142132

143-
Understanding probabilistic uniqueness requires some understanding of [*entropy*](https://en.wikipedia.org/wiki/Entropy_(information_theory)) and of estimating the probability of a [*collision*](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem) (i.e., the probability that two strings in a set of randomly generated strings might be the same). Happily, you can use `entropy-string` without a deep understanding of these topics.
133+
Understanding probabilistic uniqueness of random strings requires an understanding of [*entropy*](https://en.wikipedia.org/wiki/Entropy_(information_theory)) and of estimating the probability of a [*collision*](https://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem) (i.e., the probability that two strings in a set of randomly generated strings might be the same). The blog posting [Hash Collision Probabilities](http://preshing.com/20110504/hash-collision-probabilities/) provides an excellent overview of deriving an expression for calculating the probability of a collision in some number of hashes using a perfect hash with an N-bit output. The [Entropy Bits](#EntropyBits) section below describes how `entropy-string` takes this idea a step further to address a common need in generating unique identifiers.
144134

145-
We'll begin investigating `entropy-string` by considering our [Real Need](#RealNeed) when generating random strings.
135+
We'll begin investigating `entropy-string` and this common need by considering our [Real Need](#RealNeed) when generating random strings.
146136

147137
[TOC](#TOC)
148138

@@ -188,13 +178,14 @@ Not only is this statement more specific, there is no mention of string length.
188178

189179
How do you address this need using a library designed to generate strings of specified length? Well, you don't directly, because that library was designed to answer the originally stated need, not the real need we've uncovered. We need a library that deals with probabilistic uniqueness of a total number of some strings. And that's exactly what `entropy-string` does.
190180

191-
Let's use `entropy-string` to help this developer generate 5 IDs:
181+
Let's use `entropy-string` to help this developer generate 5 hexadecimal IDs from a potential pool of 10,000 IDs with a 1 in a million chance of a repeat:
192182

193183
```js
194184
import {Random, Entropy, charSet16} from 'entropy-string'
195185

196-
const random = new Random(charSet16)
197186
const bits = Entropy.bits(10000, 1000000)
187+
const random = new Random(charSet16)
188+
198189
const strings = Array()
199190
for (let i = 0; i < 5; i++) {
200191
string = random.string(bits)
@@ -204,29 +195,29 @@ Let's use `entropy-string` to help this developer generate 5 IDs:
204195

205196
> ["85e442fa0e83", "a74dc126af1e", "368cd13b1f6e", "81bf94e1278d", "fe7dec099ac9"]
206197
207-
To generate the IDs, we first use
198+
Examining the above code,
208199

209200
```js
210201
const bits = Entropy.bits(10000, 1000000)
211202
```
212203

213-
to determine how much entropy is needed to satisfy the probabilistic uniqueness of a **1 in a million** risk of repeat in a total of **10,000** strings. We didn't print the result, but if you did you'd see it's about **45.51** bits.
214-
215-
The following line creates a `Random` instance configured to generated strings using the predefined hexadecimal characters provided by `charSet16`:
204+
is used to determine how much entropy is needed to satisfy the probabilistic uniqueness of a **1 in a million** risk of repeat in a total of **10,000** potential strings. We didn't print the result, but if you did, you'd see it's about **45.51** bits. Then
216205

217206
```js
218207
const random = new Random(charSet16)
219208
```
220209

221-
Then inside a loop we used
210+
creates a `Random` instance configured to generate strings using the predefined hexadecimal characters provided by `charSet16`. Finally
222211

223212
```js
224213
const string = random.string(bits)
225214
```
226215

227-
to actually generate a random string of the specified entropy. Looking at the IDs, we can see each is 12 characters long. Again, the string length is a by-product of the characters used to represent the entropy we needed. And it seems the developer didn't really need 16 characters after all.
216+
is used to actually generate a random string of the specified entropy.
217+
218+
Looking at the IDs, we can see each is 12 characters long. Again, the string length is a by-product of the characters used to represent the entropy we needed. And it seems the developer didn't really need 16 characters after all.
228219

229-
Finally, given that the strings are 12 hexadecimals long, each string actually has an information carrying capacity of `12 * 4 = 48` bits of entropy (a hexadecimal character carries 4 bits). That's fine. Assuming all characters are equally probable, a string can only carry entropy equal to a multiple of the amount of entropy represented per character. `entropy-string` produces the smallest strings that *exceed* the specified entropy.
220+
Given that the strings are 12 hexadecimals long, each string actually has an information carrying capacity of `12 * 4 = 48` bits of entropy (a hexadecimal character carries 4 bits). That's fine. Assuming all characters are equally probable, a string can only carry entropy equal to a multiple of the amount of entropy represented per character. `entropy-string` produces the smallest strings that *exceed* the specified entropy.
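This length arithmetic is easy to check directly (a standalone illustration, not library code; `bitsNeeded` is hard-coded from the earlier `Entropy.bits(10000, 1000000)` result):

```javascript
// Standalone illustration (not library code): why 45.51 bits of entropy
// yield 12-character hexadecimal strings with 48 bits of capacity.
const bitsNeeded = 45.51            // from Entropy.bits(10000, 1000000)
const bitsPerChar = Math.log2(16)   // 4 bits per hexadecimal character

const length = Math.ceil(bitsNeeded / bitsPerChar)  // smallest length that exceeds the need
const capacity = length * bitsPerChar               // actual bits carried by the string

console.log(length, capacity) // → 12 48
```

Any character set works the same way: the string length is the specified entropy divided by `log2` of the character count, rounded up.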
230221

231222
[TOC](#TOC)
232223

@@ -496,6 +487,28 @@ Note the number of bytes needed is dependent on the number of characters in our
496487

497488
[TOC](#TOC)
498489

490+
### <a name="EntropyBits"></a>Entropy Bits
491+
492+
Thus far we've avoided the mathematics behind calculating the entropy bits required to specify a risk that some number of random strings will not have a repeat. As noted in the [Overview](#Overview), the posting [Hash Collision Probabilities](http://preshing.com/20110504/hash-collision-probabilities/) derives an expression, based on the well-known [Birthday Problem](https://en.wikipedia.org/wiki/Birthday_problem#Approximations), for calculating the probability of a collision in some number of hashes (denoted by `k`) using a perfect hash with an output of `M` bits:
493+
494+
![Hash Collision Probability](images/HashCollision.png)
495+
496+
There are two slight tweaks to this equation compared to the one in the referenced posting: `M` is used for the total number of possible hashes, and an equation is formed by explicitly setting the posting's expression approximately equal to `1/n`.
497+
498+
More importantly, the above equation isn't in a form conducive to our entropy string needs. The equation was derived for a set number of possible hashes and yields a probability, which is fine for hash collisions but isn't quite right for calculating the bits of entropy needed for our random strings.
499+
500+
The first thing we'll change is to use `M = 2^N`, where `N` is the number of entropy bits. This simply states that the number of possible strings is equal to the number of possible values using `N` bits:
501+
502+
![N-Bit Collision Probability](images/NBitCollision.png)
503+
504+
Now we massage the equation to represent `N` as a function of `k` and `n`:
505+
506+
![Entropy Bits Equation](images/EntropyBits.png)
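If the image doesn't render, the derivation it depicts reads approximately as follows (transcribed here from the surrounding text; the image is authoritative):

```latex
\frac{k(k-1)}{2^{N+1}} \approx \frac{1}{n}
\quad\Longrightarrow\quad
2^{N+1} \approx n \, k \, (k-1)
\quad\Longrightarrow\quad
N \approx \log_2 n + \log_2 k + \log_2 (k-1) - 1
```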
507+
508+
The final line represents the number of entropy bits `N` as a function of the number of potential strings `k` and the risk of repeat of 1 in `n`, exactly what we want. Furthermore, the equation is in a form that avoids really large numbers in calculating `N` since we immediately take a logarithm of each large value `k` and `n`.
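Numerically, that final expression can be sketched as a small JavaScript helper (a hypothetical stand-in for the library's `Entropy.bits`, not its actual source):

```javascript
// Hypothetical sketch of the entropy-bits calculation (not the library's
// actual source). total = k strings to generate, risk = 1 in n chance of repeat.
const entropyBits = (total, risk) => {
  if (total <= 1) return 0
  // N ≈ log2(k) + log2(k - 1) + log2(n) - 1
  return Math.log2(total) + Math.log2(total - 1) + Math.log2(risk) - 1
}

console.log(entropyBits(10000, 1000000).toFixed(2)) // → "45.51"
```

This agrees with the **45.51** bits computed earlier for 10,000 strings with a 1 in a million risk of repeat.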
509+
510+
[TOC](#TOC)
511+
499512
### <a name="TLDR2"></a>TL;DR 2
500513

501514
#### Take Away

images/EntropyBits.png

30.2 KB

images/HashCollision.png

4.31 KB

images/NBitCollision.png

4.34 KB

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
{
22
"name": "entropy-string",
3-
"version": "2.2.1",
3+
"version": "2.2.2",
44
"description": "Efficiently generate cryptographically strong random strings of specified entropy from various character sets.",
55
"main": "entropy-string.js",
66
"directories": {
