Code Kata 3 The Search for … WTF

You get a gold star if you figure out what the title is referencing. That means you have done the required geek viewing, something most people underestimate in importance to the profession. When I decided to do the Code Katas, I did not read ahead, so I have no idea what is coming down the pipeline from these challenges. Code Kata 3 is, well, different.

What do you mean Code Kata 3 is different?

I mean just that. It is different. It is an estimating exercise about how much data you would expect to use. The idea is that if you improve your estimation skills, you won’t have to stop coding to work those numbers out. I understand that, but I have some issues.

First, my workflow does not include off-the-cuff estimates of data types while I am programming. Every one of those questions should have an answer before you fire up the old IDE. You need at least an idea of how normalized the data will be. There are times to violate the atomic principle in database design, and you will miss them if you don’t talk enough with the user and walk them through the pros and cons of such an approach. That is the consultant in me coming out, so back to the main rant.

Next, before any of you wiseguys start up, this is not waterfall. Agile is not a license to play cowboy. There is still order and procedure, just on a small enough scale to let all that structure respond to reality. There should be a solid understanding of each backlog item before you start it.

Finally, I will still give it a go even though I think it is stupid. I have to admit that actually doing the exercise is closer to how I work than the reason the kata gives for doing it, because you are focusing on the data structure while you are not programming. It is still more slapdash, since you are not giving proper time to analyzing sample data and questioning people smarter than you about the specific domain. Nor are we building any sort of validation of the theory into it. It’s just off-the-cuff insanity.

Code Kata 3 Attempt

All right, now that I have my religious zealotry out of my system, let’s look at the problems. I warn you, this is going to be ugly because this is so not how I operate.

How Big

How many binary digits does it take to represent each of the following?

  • 1,000
  • 1,000,000
  • 1,000,000,000
  • 1,000,000,000,000
  • 8,000,000,000,000

Let’s see how I would solve this. I rarely have a need for binary. Most modern languages and databases offer several sizes of numeric type, so I just compare the number against the ranges of those types.

Well, 1,000 is bigger than 255, so it needs more than a byte, but it falls well within the roughly ±32,000 signed (or 65,535 unsigned) range of 2 bytes. There are 8 bits in a byte, so somewhere between 8 and 16 bits? 1111101000 is binary for 1,000, and that is 10 digits, which makes sense since 1,000 is on the small side of 32k. Let’s see, 32,000 in binary is 111110100000000, which is 15 digits; in a signed 2-byte integer the 16th bit, the leftmost one, holds the sign. It’s been a while since school.

So 1,000 needs between 8 and 16 bits; 10, to be exact.
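The hand conversions can be checked in Python. This is a minimal sketch using only the built-in `bin()` and string length; the `binary_digits` helper is a hypothetical name, and the 16-bit bounds assume two’s-complement integers, where the leftmost bit carries the sign.

```python
# Check the hand-converted binary strings and the 2-byte ranges discussed above.

def binary_digits(n: int) -> str:
    """Return the binary form of a non-negative integer without the '0b' prefix."""
    return bin(n)[2:]

for n in (255, 1_000, 32_000):
    digits = binary_digits(n)
    print(f"{n:>6} -> {digits} ({len(digits)} digits)")

# Two's-complement 16-bit integers reserve the leftmost bit for the sign.
SIGNED_16_MIN = -(2 ** 15)      # -32768
SIGNED_16_MAX = 2 ** 15 - 1     # 32767
UNSIGNED_16_MAX = 2 ** 16 - 1   # 65535
print(SIGNED_16_MIN, SIGNED_16_MAX, UNSIGNED_16_MAX)
```

It reports 8 digits for 255, 10 for 1,000 and 15 for 32,000, so 32,000 fits in the 15 value bits of a signed 2-byte integer with the 16th left over for the sign.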

If we apply the same logic to the rest, we are looking at:

1,000,000 falls within the 2 billion-ish range of 4 bytes, so at most 4 × 8 = 32 bits. More likely it is a 3-byte answer, so it could be as small as 20 bits (2^20 is 1,048,576).

1,000,000,000 is also within the 2 billion-ish range of 4 bytes, so 32 bits again, although 30 is the strict minimum (2^30 is 1,073,741,824).

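1,000,000,000,000 is probably 6 bytes’ worth. Dropping the sign will not squeeze it into 32 bits, though, because even an unsigned 32-bit integer tops out at 4,294,967,295; it needs 40 bits, which is 5 bytes, since 2^40 is about 1.1 trillion.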

8,000,000,000,000: I will go out on a limb here and say probably 6-ish bytes, but based on how data stores go I would say it will end up in 8-byte storage. So I say 64 bits will be used, but 48 is closer to what it really needs; the strict minimum is 43 bits, since 2^43 is about 8.8 trillion.
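To check the estimates, here is a sketch that counts the bits with Python’s `int.bit_length()` and picks the smallest of the common 8/16/32/64-bit widths. The `STANDARD_WIDTHS` tuple and both helper names are illustrative assumptions, not any particular database’s type list.

```python
# Count the bits each kata number needs and the smallest common integer width that holds it.
STANDARD_WIDTHS = (8, 16, 32, 64)  # typical fixed-size integer widths, in bits

def bits_needed(n: int) -> int:
    """Minimum binary digits for an unsigned representation of n (at least 1)."""
    return max(1, n.bit_length())

def smallest_standard_width(n: int, signed: bool = False) -> int:
    """Smallest width in STANDARD_WIDTHS that holds n; a signed type spends one bit on the sign."""
    needed = bits_needed(n) + (1 if signed else 0)
    for width in STANDARD_WIDTHS:
        if needed <= width:
            return width
    raise ValueError(f"{n} does not fit in 64 bits")

for n in (1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000, 8_000_000_000_000):
    bits = bits_needed(n)
    min_bytes = -(-bits // 8)  # ceiling division: whole bytes needed
    print(f"{n:>17,}: {bits} bits, {min_bytes} bytes minimum, "
          f"{smallest_standard_width(n, signed=True)}-bit signed type")
```

It reports 10, 20, 30, 40 and 43 bits respectively. The gap between the 5 or 6 bytes the two largest numbers strictly need and the 64-bit type they land in is the “how data stores go” effect estimated above.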

Town of 20,000 Residents

20,000 residents, each with a first name, last name, address, city, state, and ZIP. I really hate this one. If you wanted to be a smart ass, you could normalize this thing out to an insane degree.

You could create a ZIP table so that instead of 5- to 9-digit storage you could get by with a single byte, since a town of 20,000 likely has only one or two ZIP codes. I know my 20,000-person town only has 2, one for the town and one for the PO Boxes. So you could normalize that down to 5 digits for each of the two ZIPs and a single bit for each record.
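The ZIP trick is dictionary encoding: keep each distinct value once in a lookup table and store only a small integer code per record. A minimal sketch with a hypothetical `dictionary_encode` helper and made-up ZIP codes; the storage figures assume 5 ASCII bytes per raw ZIP and codes packed tightly into bits.

```python
# Hypothetical sketch of dictionary encoding a ZIP column.
from typing import Dict, List, Tuple

def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int], int]:
    """Return (lookup table, per-record codes, bits needed per code)."""
    table: List[str] = []
    index: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in index:          # first time we see this value: add it to the table
            index[value] = len(table)
            table.append(value)
        codes.append(index[value])
    # n distinct values need ceil(log2(n)) bits per code, with a floor of 1 bit.
    bits_per_code = max(1, (len(table) - 1).bit_length())
    return table, codes, bits_per_code

# Made-up ZIP codes standing in for the town ZIP and the PO Box ZIP.
sample = ["12345", "12346", "12345", "12345"]
table, codes, bits = dictionary_encode(sample)
print(table, codes, bits)  # ['12345', '12346'] [0, 1, 0, 0] 1

RESIDENTS = 20_000
raw_bytes = RESIDENTS * 5                                  # 5 ASCII digits per record
encoded_bytes = -(-(RESIDENTS * bits) // 8) + len(table) * 5  # packed codes + lookup table
print(raw_bytes, encoded_bytes)  # 100000 2510
```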

Few towns straddle a state line, and to my knowledge those that do span only two states. So you could apply the same sort of insanity as with the ZIP table above and use a single bit as the key, since this is a database of a single town: two bytes for each state code in the lookup table and a single bit per record.
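Once the ZIP and the state are each down to one bit, both keys fit in a single flags byte per record with room to spare. A sketch of the bit packing; the bit layout, the `pack_flags`/`unpack_flags` helpers, and the table contents are all hypothetical.

```python
# Hypothetical layout: bit 0 selects one of two ZIP codes, bit 1 one of two states.
from typing import Tuple

ZIP_BIT = 0b01
STATE_BIT = 0b10

ZIP_TABLE = ["12345", "12346"]   # made-up ZIP codes, stored once
STATE_TABLE = ["AR", "TX"]       # example state abbreviations, 2 bytes each

def pack_flags(zip_key: int, state_key: int) -> int:
    """Pack two 1-bit lookup keys (each 0 or 1) into one flags value."""
    if zip_key not in (0, 1) or state_key not in (0, 1):
        raise ValueError("each key must be 0 or 1")
    return (ZIP_BIT if zip_key else 0) | (STATE_BIT if state_key else 0)

def unpack_flags(flags: int) -> Tuple[int, int]:
    """Recover (zip_key, state_key) from a packed flags value."""
    return (1 if flags & ZIP_BIT else 0, 1 if flags & STATE_BIT else 0)

flags = pack_flags(zip_key=1, state_key=0)
zip_key, state_key = unpack_flags(flags)
print(flags, ZIP_TABLE[zip_key], STATE_TABLE[state_key])  # 1 12346 AR
```

At one flags byte per record that is 20,000 bytes; packing the 2 bits tightly across records would cut it to 5,000.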

Then, if you want to get really crazy, you could mine the data for the longest street name (with only 20,000 people, there are not that many streets), size a street table’s name column to fit it, and key that table with a single byte, maybe 2.
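Mining the data for the longest name and the key size takes only a few lines of Python. A minimal sketch with a hypothetical `size_lookup_table` helper and made-up street names, assuming one byte per character (plain ASCII names); the same sizing would apply to first- and last-name tables.

```python
# Hypothetical sketch: size a street lookup table from the data itself.
from typing import Iterable, Tuple

def size_lookup_table(names: Iterable[str]) -> Tuple[int, int, int]:
    """Return (distinct names, column width in bytes, key width in bytes).

    Assumes one byte per character (plain ASCII names).
    """
    distinct = sorted(set(names))
    column_width = max((len(name) for name in distinct), default=0)
    key_bits = max(1, (len(distinct) - 1).bit_length())  # ceil(log2(n)), at least 1
    key_bytes = -(-key_bits // 8)                          # round up to whole bytes
    return len(distinct), column_width, key_bytes

streets = ["Main St", "Oak Ave", "Main St", "Martin Luther King Jr Blvd"]  # made-up sample
print(size_lookup_table(streets))  # (3, 26, 1)
```

A 1-byte key covers up to 256 distinct streets; past that, the helper reports 2 bytes.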

Then store all the house numbers in a 2-byte field, since roughly 65,000 (65,535 unsigned) is likely as big as you need for that value.
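Python’s standard `struct` module can confirm the 2-byte width: format `H` is an unsigned short, and the `<` prefix fixes little-endian byte order with standard sizes. The house numbers below are made up.

```python
import struct

# Made-up house numbers; 65535 is the largest value an unsigned 2-byte field holds.
house_numbers = [7, 1204, 65535]

# '<' = little-endian with standard sizes, 'H' = unsigned short (2 bytes each).
fmt = f"<{len(house_numbers)}H"
packed = struct.pack(fmt, *house_numbers)
print(len(packed))                 # 6
print(struct.unpack(fmt, packed))  # (7, 1204, 65535)

try:
    struct.pack("<H", 65536)       # one past the 2-byte ceiling
except struct.error as exc:
    print("does not fit in 2 bytes:", exc)
```

At 2 bytes each, 20,000 house numbers come to 40,000 bytes, and anything above 65,535 is rejected with `struct.error` rather than silently truncated.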

Finally, assuming that a town of 20,000 is fairly ethnically homogeneous, you could create first- and last-name tables, size each name column by the longest name in it, and reference them with a key of 1 or 2 bytes.
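Putting the pieces together gives a back-of-the-envelope comparison of a flat table against the normalized one. Every column width and distinct-value count below is a made-up assumption for illustration, not data from any real town, and the sketch assumes the city is the same for every record, so it lives in a one-row table with no per-record key.

```python
# Back-of-the-envelope sizing for 20,000 residents. Every width and distinct-value
# count here is a made-up assumption for illustration, not measured data.
RESIDENTS = 20_000

# Flat layout: fixed-width ASCII columns, assumed maximum lengths in bytes.
# "address" holds house number and street name together.
FLAT_WIDTHS = {"first": 15, "last": 20, "address": 30, "city": 20, "state": 2, "zip": 5}

# Normalized layout: bytes stored per record.
# "flags" is one byte holding the 1-bit ZIP key and the 1-bit state key.
NORMALIZED_PER_ROW = {"first_key": 2, "last_key": 2, "house_number": 2,
                      "street_key": 1, "flags": 1}

# Lookup tables: (assumed distinct values, assumed column width in bytes).
# The city is assumed identical for every record, so it needs no per-record key.
LOOKUP_TABLES = {"first": (2_000, 15), "last": (5_000, 20), "street": (250, 25),
                 "zip": (2, 5), "state": (2, 2), "city": (1, 20)}

def flat_size(rows: int) -> int:
    """Bytes for the un-normalized table."""
    return rows * sum(FLAT_WIDTHS.values())

def normalized_size(rows: int) -> int:
    """Bytes for the per-record keys plus every lookup table."""
    per_row = rows * sum(NORMALIZED_PER_ROW.values())
    tables = sum(count * width for count, width in LOOKUP_TABLES.values())
    return per_row + tables

print(f"flat:       {flat_size(RESIDENTS):>9,} bytes")
print(f"normalized: {normalized_size(RESIDENTS):>9,} bytes")
```

With these made-up inputs the flat table comes to 1,840,000 bytes and the normalized version to 296,284 bytes, roughly a sixth of the size, before counting any index overhead. The 1-byte street key also caps the town at 256 streets.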

All that insanity would be slow and unwieldy to program, but the old-timers really loved that kind of optimization. It would be slower to access and manage, but boy, would it be small. Then again, you could just buy an extra terabyte of storage for less than it costs you to spend the time optimizing it.

I realize I said I would do this kata, but this is getting really long and is becoming a weird thought experiment, so I think I am going to call it. You can see my methodology for solving the problem. In fact, I have just been typing this in a stream-of-consciousness style, stopping only for spelling corrections, and I am leaving the reasoning as it came out to reflect how off the cuff my optimization brain works on the fly.

As a side note, since I am optimizing for SEO, I need my keywords in this article more. So, in Code Kata 3, we look at on-the-fly problem solving.
