Show HN: It's been nearly 50 years. Is SQL outdated?

5 points

3 years ago

Actually, the Q in SQL represents query. The original purpose of inventing SQL is to query (i.e., calculation), which is the main goal of SQL. However, it is hard to say that SQL is very competent when describing computing tasks. Here is an example that is actually not very complicated: calculate the maximum consecutive days that a stock keeps rising. Write it in SQL is like this:

select max (consecutive_day) from (select count(*) (consecutive_day from (select sum(rise_mark) over(order by trade_date) days_no_gain from (select trade_date, case when closing_price>lag(closing_price) over(order by trade_date) then 0 else 1 END rise_mark from stock_price ) ) group by days_no_gain) The working principle of this statement won’t be explained here, it's a little confusing anyway. You can try it yourself. Then, why these two aspects cannot be well achieved in SQL? The reason why coding is difficult is largely due to algebra. Relational algebra has been invented for fifty years. The difference between the application requirements and hardware environments of fifty years ago and today is very huge. Continuing to apply the theory of fifty years ago to solve today's problems, does it sound too outdated? However, this is the reality. Due to the large number of existing users and the lack of mature new technologies, SQL, based on relational algebra, is still the most important database language today. Although some improvements have been made in recent decades, the foundation has not changed. In the face of contemporary complex requirements and hardware environments, it is reasonable that SQL is incompetent.

And, unfortunately, this problem is at the theoretical level, and it won't help no matter how optimized it is in practice, it can only be improved in a limited way, not eradicated. Regrettably, most database developers do not think of this level, or, in order to take care of the compatibility of existing users, they do not intend to think about this level. As a result, the mainstream database industry has been going around in circles in this limited space. Now then, how to make the calculation Easier in Writing and Faster in Running?

Invent new algebra! An algebra with “multiplication”, and then design a new language based on the new algebra.

This is where SPL comes from. Its theoretical basis is no longer the relational algebra, but something called discrete dataset. The formal language designed based on this new algebra is named SPL (structured process language). Innovations against the shortcomings of SQL have been made to SPL (more precisely, innovations against various deficiencies of relational algebra have been made to the discrete dataset). SPL redefines and extends many operations of structured data, specifically, it adds the discreteness, enhances ordered computation, implements a thorough set orientation, supports object references, and advocates stepwise operation. Recoding the previous problems in SPL will give you a direct feeling.

Calculate the maximum consecutive days that a stock keeps rising:

stock_price.sort(trade_date).group@i(closing_price<closing_price[-1]).max(~.len()) Although the calculation idea is the same as the previous SQL, it is much easier to express and no longer confusing, because of the introduction of ordering characteristic.

Take the top 10 out of 100 million pieces of data:

T.groups(;top(-10,x)) SPL has richer set data types, it is easy to describe the efficient algorithm that implements simple aggregation on a single traversal, without involving big sorting action.

Due to space limitations, we will not introduce SPL (discrete dataset) in an all-round way here, If you want to learn more about SPL, please head over to github and leave a comment if you have any questions.

12 comments