After a block of unrelated content, I have gotten back to Code Kata 5 and implemented my sample Bloom Filter. Lets dig into it and see how it rolls or filters. You know what I mean.
Implemented Bloom Filter
So the breakdown of this implementation, it starts with a class called Filter. I have not done much in the way of optimization or engineering so this is kind of as is but I think it will achieve its goals of showing you an implemented Bloom Filter. Any improvement just leave them down below. If enough people care about this we might do another post on its optimization.
Filter
using System.Collections.Generic; using System.Text; namespace CodeKata5 { public class Filter { byte[] bloom; int maxMultipler; List<string> words; public Filter(List<string> Words, int MaxMultipler) { words = Words; maxMultipler = MaxMultipler; } public static int hash(string input, int multipler) { int value = 0; byte[] temp = Encoding.ASCII.GetBytes(input); foreach (var i in temp) { value += i; } return value * multipler; } public int getArraySize() { return bloom.Length; } private int getMaxArraySize() { int maxSize = 0; int temp; foreach(string i in words) { temp = hash(i, maxMultipler); if(temp > maxSize) { maxSize = temp; } } return maxSize + 1; } private bool isBiggerThanBloomFilter(string word) { int maxSize = hash(word, maxMultipler); if (maxSize > bloom.Length) { return true; } return false; } public void buildBloomFilter() { bloom = new byte[getMaxArraySize()]; foreach (string i in words) { for(int r = 1; r <= maxMultipler; r++) { bloom[hash(i, r)] = 1; } } } public bool QueryBloomFilter(string word) { bool result = false; if (isBiggerThanBloomFilter(word)) { return false; } for (int r = 1; r <= maxMultipler; r++) { if (bloom[hash(word, r)] == 1) { result = true; } else { return false; } } return result; } } }
So lets break it down function by function. First we have the declarations. We have an array of bytes, which should be bits, to house our zero’s and one’s. Next we have the maxMultipler which is a variable indicating the maximum multiplication we will do as part of our hash. Then the words list which are a list of words that will be used to setup the filter.
Our constructor sets the words list and the maxMultipler we will deal with in a single instance of the filter.
The has function will allow a value to be hashed and multiplied by a multiplier. This will give us the number we need to deal with in the bloom filter to identify the index we will check or set.
While the getArraySize will tattle on the size of the array to the outside world, the getMaxArraySize will tell our class what the max array size will be based on the most expensive word in the list. This is why we cycle through the list of words.
We have a function in isBiggerThanBloomFilter, for checking if a word comes back false because its hash would be greater than the maximum size of the bloom filter. This just gives us a work around if a word could not be present due to its size. We can just reject it immediately.
Finally we get to the buildBloomFilter. This builds the bloom filter. Amazing how that works. We set the byte array to the maximum array size from our words list and then start looking through the words. We use the maxMultipler to determine how many hashes will will do since our hash is a multiplication factor. Then set that index to 1. In all honesty, this maxMultipler will be cleaned out and replace with a number of hashes or something akin to that in the Code Kata 5 implementation.
With all that setup done, we can do the money shot. We have the QueryBloomFilter that takes a word in its parameter list. It is check to make sure it can be in the array first because of size, this is not part of the Bloom filter, just a work around for me. If it is possible based on size to be in our array we cycle through the hashes and keep a result to determine if it passes.
First the result is false and that means a failed test. We will push through each hash and if that index in our Bloom array is equal to 1 then we set to true. If any of the test failed, the rest of the tests stop and false is returned. Everything must be true for it to be true.
Wrapping up
That’s it. A simple program to demonstrate an implemented Bloom Filter. With that out of the way. Maybe I will start playing with some hashes and implement that into our happy little program. Here is how to setup a button, in my case called btnRun in case you want to play with it.
private void btnRun_Click(object sender, EventArgs e) { List<string> words = new List<string>() { "Hi", "Bye", "Leeks", "Figs", "President", "Goats", "goats" }; Filter filter = new Filter(words, 4); filter.buildBloomFilter(); lbResults.Items.Add("Hi - " + filter.QueryBloomFilter("Hi")); lbResults.Items.Add("Bye - " + filter.QueryBloomFilter("Bye")); lbResults.Items.Add("Leeks - " + filter.QueryBloomFilter("Leeks")); lbResults.Items.Add("Figs - " + filter.QueryBloomFilter("Figs")); lbResults.Items.Add("Dog - " + filter.QueryBloomFilter("Dog")); lbResults.Items.Add("Gopher - " + filter.QueryBloomFilter("Gopher")); lbResults.Items.Add("lye - " + filter.QueryBloomFilter("lye")); lbResults.Items.Add("Reeks - " + filter.QueryBloomFilter("Reeks")); lbResults.Items.Add("Digs - " + filter.QueryBloomFilter("Digs")); lbResults.Items.Add("Goats - " + filter.QueryBloomFilter("Goats")); lbResults.Items.Add("a - " + filter.QueryBloomFilter("a")); lbResults.Items.Add("President - " + filter.QueryBloomFilter("President")); lbResults.Items.Add("!@#$%^&*() - " + filter.QueryBloomFilter("!@#$%^&*()")); lbResults.Items.Add("!@#$ - " + filter.QueryBloomFilter("!@#$")); lbResults.Items.Add("figs - " + filter.QueryBloomFilter("figs")); lbResults.Items.Add(filter.getArraySize()); }