Databases for Developers

Blunt Instruments

When first confronted by any database system, the junior developer tends to see it at just a dumb tool to store data. Understanding of the finer nuances of data storage is something one acquires with experience and practice. Well, some do.

My journey with databases started sometime around the turn of the millenium, when I built small websites based on ASP (yes, the vbscript-based precursor to Asp.Net) and Access-Databases. At the time, Sybase had just sold out to Microsoft and SQL Server 6.5 had just been released, but MS Access was plenty. Later I graduated to SQL Server 7.0 and subsequently 2000 for bigger websites. That was around the burst of the dot-com-bubble. I spent a few years building ERP-type stuff on SS2k5/2k8, and then 11 years building a high-perf system for high schools on Oracle 10/11/12. Then I came back to SQL Server, which in the meantime had become 2017/2019. That's roughly 25 years of working with databases.

Knowing your Tool's Strengths and Limitations

A database is a tool. Much like your IDE or your keyboard. Yes, really. Unlike your keyboard, you're free to use other databases than the one you're used to - switching keyboards is just weird. But when you grow accustomed to a database system, you tend to stick with it. The reason is for this is that as you get to know your database system's capabilities, you also learn what you can't do with it, and you learn how to work around those limitations. 

Disclaimer: I worked on Oracle 10/11, and very little on 12. The things I write will pertain to 10/11, and may very well have changed since. The take-aways are still valid.

Memory usage

One of the features I like about Oracle is the Query Result Cache (which probably has some fancy tm-name). In short, if you repeat a query - and Oracle somehow detects that the result will not have changed between runs - it'll serve the result out of a cache. This is immensely useful for application that repeat certain elements over and over. The caveat is of course that you need RAM - and plenty of it. Oh, and money. Oracle licenses are insanely expensive. But If you have the money you get a lot of speed. If your query result is not in cache, the database will have to read from disk - which isn't necessarily slow, but you will want some snappy SSDs for this.

SQL Server on the other hand uses RAM to store copies of your data for faster access. There is no Query Result Cache. When you run your query on SQL Server, data is read from disk into RAM, then queried. If you repeat the query, data is read directly from RAM, if SQL Server sees that data hasn't changed. Not the query result. But the query can be repeated on data which is in RAM, which is very fast - if your query is fast, that is. No amount of RAM or CPU will ever compensate for a poorly written query - unlike with Oracle, where you only pay once for poor SQL penmanship. The caveat to this, is that once you've read enough data from disk to fill the RAM, things will start to get ejected from the cache on a usage-based schedule. So, again you need RAM. And money, because SQL Server licenses are insanely expensive - though not as insane as Oracle ones.

How do I use this knowledge

Knowing how the database works is crucial to your application design. On Oracle you'll want to write queries with very little variance. The more parameters, the more RAM you need to cache the results. On SQL Server you can afford many parameters, but you need to make sure the database can store data in RAM - I tend to use the phrase "The Working Set" - which is the 95% of data in the database that is queried on a regular basis. If The Working Set is too large to fit in RAM, you're going to see cache misses and subsequent IOs.

If you're limited in how many parameters you can use in your query - and do consider the cardinality of said parameters - you will come to realize, that you should in fact not query the database, and in stead use an application-level cache. You could just "Get Table" and store it in memory somewhere, and do your lookups there. In fact GetEntireTableIntoRAM().Where(r => r.SomeNonIndexedProperty == myValue) could be faster on Oracle than GetRowFromTableBySomeNonIndexedProperty(myValue) for 10.000s of different values, simply because GetEntireTableIntoRAM() will be a single query with one result to be stored in cache, where GetRowFromTableBySomeNonIndexedProperty() will force disk-access for each of the 10000 different values. Provocative, huh?

On SQL Server the above example wouldn't work as well. GetEntireTableIntoRam() would in fact also store everything in RAM on the database, but GetRowFromTableBySomeNoneIndexedProperty() will also be queried from RAM in stead of disk, so you don't pay that much of a penalty.

"But, but ... what's the use of a database then?", you might ask. "Structure" is the answer, and I'll get back to that. Eventually

Recap: Give that database some RAM!

When using a database, you need to realize what it is really good at, and where it really sucks. SQL Server is quite good at general purpose queries, but you need to give it some memory. Oracle is very good at repeated workloads, but you need to give it some memory. In both cases, if your queries tend to be very varied, you need to construct your application in such a way, that it makes the database look good. Write queries that utilize the database memory in stead of saturating it and causing cache misses. A database is only as fast as you let it be, and not giving it enough RAM is one of the ways you make things go slow.

Next time: Queries that don't perform

 

Comments are closed