Yet Another 10 Common Mistakes Java Developers Make When Writing SQL

(Sorry for that click-bait heading. Couldn’t resist ;-) )

We’re on a mission. To teach you SQL. But mostly, we want to teach you how to appreciate SQL. You’ll love it!

Getting SQL right or wrong shouldn’t be about that You’re-Doing-It-Wrong™ attitude that can often be encountered when evangelists promote their object of evangelism. Getting SQL right should be about the fun you’ll have once you do get it right. The things you start appreciating when you notice that you can easily replace 2000 lines of slow, hard-to-maintain, and ugly imperative (or object-oriented) code with 300 lines of lean functional code, or even better, with 50 lines of SQL.

We’re glad to see that our blogging friends have started appreciating SQL, and most specifically, window functions, after reading our posts.

So, after our previous, very popular posts:

… we’ll bring you:

Yet Another 10 Common Mistakes Java Developers Make When Writing SQL

And of course, this doesn’t apply to Java developers alone, but it’s written from the perspective of a Java (and SQL) developer. So here we go (again):

1. Not Using Window Functions

After all that we’ve been preaching, this must be our number one mistake in this series. Window functions are probably the coolest SQL feature of them all. They’re so incredibly useful, they should be the number one reason for anyone to switch to a better database, e.g. PostgreSQL:

If free and/or Open Source is important to you, you have absolutely no better choice than using PostgreSQL (and you’ll even get to use the free jOOQ Open Source Edition, if you’re a Java developer).

And if you’re lucky enough to work in an environment with Oracle or SQL Server (or DB2, Sybase) licenses, you get even more out of your new favourite tool.

We won’t repeat all the window function goodness in this section, we’ve blogged about them often enough:

The Cure:

Remove MySQL. Take a decent database. And start playing with window functions. You’ll never go back, guaranteed.
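If you’ve never tried them, here’s a minimal sketch (against a hypothetical PAYMENT table) of what a window function buys you: a running total per account, without any self-join or correlated subquery.

SELECT
  account_id,
  payment_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY account_id
    ORDER BY payment_date
  ) AS running_total
FROM payment
ORDER BY account_id, payment_date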

2. Not declaring NOT NULL constraints

This one was already part of a previous list where we claimed that you should add as much metadata as possible to your schema, because your database will be able to leverage that metadata for optimisations. For instance, if your database knows that a foreign key value in BOOK.AUTHOR_ID must also be contained exactly once in AUTHOR.ID, then a whole set of optimisations can be achieved in complex queries.
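In DDL, that metadata is nothing more than NOT NULL and FOREIGN KEY declarations. A sketch (Oracle-flavoured; the column types are assumptions, and an AUTHOR table is assumed to exist):

CREATE TABLE book (
  id        NUMBER(18)    NOT NULL PRIMARY KEY,
  title     VARCHAR2(400) NOT NULL,
  author_id NUMBER(18)    NOT NULL,
  CONSTRAINT fk_book_author
    FOREIGN KEY (author_id) REFERENCES author (id)
)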

Now let’s have another look at NOT NULL constraints. If you’re using Oracle, NULL values will not be part of your index. This doesn’t matter if you’re expressing an IN constraint, for instance:
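A sketch along these lines (table and column names are placeholders), where the index on nullable_column can still be used, because NULL values could never match the outer value anyway:

SELECT 1
FROM some_table
WHERE some_column IN (
  SELECT nullable_column
  FROM nullable_table
)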

But what happens with a NOT IN constraint?
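In sketch form, with the same placeholder names:

SELECT 1
FROM some_table
WHERE some_column NOT IN (
  SELECT nullable_column
  FROM nullable_table
)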

Due to NULL, there is a slight risk of the second query unexpectedly not returning any results at all, namely if there is at least one NULL value as a result from the subquery. This is true for all databases that get SQL right.

But because the index on nullable_column doesn’t contain any NULL values, Oracle has to look up the complete content in the table, resulting in a FULL TABLE SCAN. Now that is unexpected!

The Cure:

Carefully review all your nullable, yet indexed columns, and check if you really cannot add a NOT NULL constraint to those columns.

The Tool:

If you’re using Oracle, use this query to detect all nullable, yet indexed columns:
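A sketch of such a query against Oracle’s standard dictionary views (adapt the filtering to your own schema as needed):

SELECT
  i.table_name,
  i.index_name,
  i.column_name
FROM user_ind_columns i
JOIN user_tab_cols c
  ON i.table_name = c.table_name
  AND i.column_name = c.column_name
WHERE c.nullable = 'Y'
ORDER BY i.table_name, i.index_name, i.column_name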


And then, fix it!


If you’re curious about more details, see also these posts:

3. Using PL/SQL Package State

Now, this is a boring one if you’re not using Oracle, but if you are (and you’re a Java developer), be very wary of PL/SQL package state. Are you really doing what you think you’re doing?

Here’s an example of package state, e.g.

CREATE OR REPLACE PACKAGE pkg IS
  n NUMBER := 0;
  FUNCTION next_n RETURN NUMBER;
END pkg;

CREATE OR REPLACE PACKAGE BODY pkg IS
  FUNCTION next_n RETURN NUMBER
  IS
  BEGIN
    n := n + 1;
    RETURN n;
  END next_n;
END pkg;

Wonderful, so you’ve created yourself an in-memory counter that generates a new number every time you call pkg.next_n. But who owns that counter? Yes, the session. Each session has its own initialised “package instance”.

But no, it’s probably not the session you might have thought of.

We Java developers connect to databases through connection pools. When we obtain a JDBC Connection from such a pool, we recycle that connection from a previous “session”, e.g. a previous HTTP Request (not HTTP Session!). But that’s not the same. The database session (probably) outlives the HTTP Request and will be inherited by the next request, possibly from an entirely different user. Now, imagine you had a credit card number in that package…?

Not The Cure:

Nope. Don’t just jump to using SERIALLY_REUSABLE packages:

CREATE OR REPLACE PACKAGE pkg IS
  PRAGMA SERIALLY_REUSABLE;
  n NUMBER := 0;
  FUNCTION next_n RETURN NUMBER;
END pkg;

Because serially reusable packages come with their own set of restrictions and surprises: their state only lives for the duration of a single call to the server, and they cannot be accessed from PL/SQL called from SQL statements.

So, don’t.

Not The Cure:

I know. PL/SQL can be a beast. It often seems like such a quirky language. But face it. Many things run much, much faster when written in PL/SQL, so don’t give up just yet. Dropping PL/SQL is not the solution either.

The Cure:

At all costs, try to avoid package state in PL/SQL. Think of package state like static variables in Java. While they might be useful for caches (and constants, of course) every now and then, you might not actually be accessing the state that you wanted. Think about load-balancers, suddenly transferring you to another JVM. Think about class loaders, that might have loaded the same class twice, for some reason.

Instead, pass state as arguments through procedures and functions. This will avoid side-effects and make your code much cleaner and more predictable.
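A minimal sketch of that stateless style (names are hypothetical): the caller owns the counter and passes it in, rather than the package remembering it.

CREATE OR REPLACE FUNCTION next_n (n IN NUMBER) RETURN NUMBER IS
BEGIN
  -- No package state involved: the result depends only on the argument
  RETURN n + 1;
END next_n;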

Or, obviously, persist state to some table.

4. Running the same query all the time

Master data is boring. You probably wrote some utility to get the latest version of your master data (e.g. language, locale, translations, tenant, system settings), and you can query it every time, once it is available.

At all costs, don’t do that. You don’t have to cache many things in your application, as modern databases have grown to be extremely fast when it comes to caching:

  • Table / column content
  • Index content
  • Query / materialized view results
  • Procedure results (if they’re deterministic)
  • Cursors
  • Execution plans

So, for your average query, there’s virtually no need for an ORM second-level cache, at least from a performance perspective (ORM caches mainly fulfil other purposes, of course).

But when you query master data, i.e. data that never changes, then network latency, traffic and many other factors will impair your database experience.

The Cure:

Please do take 10 minutes and set up an application-side cache – Guava’s cache, for instance, ships with various built-in invalidation strategies. Choose time-based invalidation (i.e. polling), choose event-based invalidation (e.g. via PostgreSQL’s NOTIFY), or just make your cache permanent, if it doesn’t matter. But don’t issue an identical master data query all the time.

… This obviously brings us to

5. Not knowing about the N+1 problem

You had a choice. At the beginning of your software product, you had to choose between an ORM and hand-written SQL.

So, obviously, you chose an ORM, because otherwise you wouldn’t be suffering from “N+1”. What does “N+1” mean?

Essentially, you’re running:

SELECT * FROM book

-- And then, for each book:
SELECT * FROM author WHERE id = ?
SELECT * FROM author WHERE id = ?
SELECT * FROM author WHERE id = ?

Of course, you could go and tweak your hundreds of annotations to correctly prefetch or eager fetch each book’s associated author information to produce something along the lines of:
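A sketch of the single query we’d really want, assuming the BOOK / AUTHOR schema from above with BOOK.AUTHOR_ID referencing AUTHOR.ID:

SELECT *
FROM book b
JOIN author a ON b.author_id = a.id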

But that would be an awful lot of work, and you’ll risk eager-fetching too many things that you didn’t want, resulting in another performance issue.

Maybe, you could upgrade to JPA 2.1 and use the new @NamedEntityGraph to express beautiful annotation trees like this one:

Hantsy then goes on explaining that you can use the above beauty through the following statement:

Let us all appreciate the above application of JEE standards with all due respect, and then consider…

The Cure:

You just listen to the wise words at the beginning of this article and replace thousands of lines of tedious Java code with a couple of lines of SQL. Because that will also likely help you prevent another issue that we haven’t even touched yet, namely selecting too many columns, as you can see in these posts:

Since you’re already using an ORM, this might just mean resorting to native SQL – or maybe you manage to express your query with JPQL. Of course, we agree with Alessio Harri in believing that you should use jOOQ together with JPA:

The Takeaway:

While the above will certainly help you work around some real world issues that you may have with your favourite ORM, you could also take it one step further and think about it this way. After all these years of pain and suffering from the object-relational impedance mismatch, the JPA 2.1 expert group is now trying to tweak their way out of this annotation madness by adding more declarative, annotation-based fetch graph hints to JPQL queries, that no one can debug, let alone maintain.

The alternative is simple and straight-forward SQL. And with Java 8, we’ll add functional transformation through the Streams API.

But obviously, your views and experiences on that subject may differ from ours, so let’s head on to a more objective discussion about…

6. Not using Common Table Expressions

While common table expressions obviously offer readability improvements, they may also offer performance improvements. Consider the following query that I have recently encountered in a customer’s PL/SQL package (not the actual query):
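A sketch of the shape of such a query (all table and column names are assumptions): the same currencies / exchange_rates join is repeated in two scalar subqueries.

SELECT
  p.id,
  p.amount * (
    SELECT e.rate
    FROM currencies c, exchange_rates e
    WHERE c.id = p.cur_id
    AND e.cur_id = p.cur_id
    AND e.org_id = p.org_id
  ) / (
    SELECT c.factor
    FROM currencies c, exchange_rates e
    WHERE c.id = p.cur_id
    AND e.cur_id = p.cur_id
    AND e.org_id = p.org_id
  ) AS converted_amount
FROM payments p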

So what does this do? This essentially converts a payment’s amount from one currency into another. Let’s not delve into the business logic too much, let’s head straight to the technical problem. The above query results in the following execution plan (on Oracle):

The actual execution time is negligible in this case, but as you can see, the same objects are accessed again and again within the query. This is a violation of Common Mistake #4: Running the same query all the time.

The whole thing would be so much easier to read, maintain, and for Oracle to execute, if we had used a common table expression. From the original source code, observe the following thing:

-- Joining currencies and exchange_rates twice:
FROM currencies c, exchange_rates e

So, let’s factor out the payment first:

WITH payment AS (
  SELECT id, cur_id, org_id, amount FROM payments
)
-- Then, we simply don't need to repeat the
-- currencies / exchange_rates joins twice
SELECT p.amount * e.rate / c.factor AS converted_amount
FROM payment p
JOIN currencies c ON p.cur_id = c.id
JOIN exchange_rates e ON e.cur_id = p.cur_id
AND e.org_id = p.org_id

Note that we’ve also replaced table lists with ANSI JOINs as suggested earlier.

You wouldn’t believe it’s the same query, would you? And what about the execution plan? Here it is!

No doubt that this is much, much better.

The Cure:

If you’re lucky enough and you’re using one of those databases that supports window functions, chances are incredibly high (100%) that you also have common table expression support. This is another reason for you to migrate from MySQL to PostgreSQL, or appreciate the fact that you can work on an awesome commercial database.

Common table expressions are like local variables in SQL. In every large statement, you should consider using them, as soon as you feel that you’ve written something before.

The Takeaway:

Some databases (PostgreSQL, for instance) also support common table expressions for DML statements. In other words, you can write:
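For instance, here’s a PostgreSQL-style sketch (table names are made up) that moves old rows into an archive table within a single statement:

WITH moved AS (
  DELETE FROM payments
  WHERE payment_date < DATE '2010-01-01'
  RETURNING *
)
INSERT INTO payments_archive
SELECT * FROM moved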

This makes DML incredibly more powerful.

7. Not using row value expressions for UPDATEs

Row value expressions are very readable and intuitive, and often also promote using certain indexes, e.g. in PostgreSQL.

But few people know that they can also be used in an UPDATE statement, in most databases. Check out the following query, which I again found in a customer’s PL/SQL package (simplified again, of course):
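The shape of that UPDATE was roughly the following sketch (the table T and the columns n, s, x come from the discussion below; the updated table u and everything else are assumptions):

UPDATE u
SET n = (SELECT t.n + 1 FROM t WHERE t.n = u.n),
    s = (SELECT 'x' || t.s FROM t WHERE t.n = u.n),
    x = 3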

So this query takes a subquery as a data source for updating two columns, and the third column is updated “regularly”. How does it perform? Moderately:

Let’s ignore the full table scans, as this query is constructed. The actual query could leverage indexes. But T is accessed twice, i.e. in both subqueries. Oracle didn’t seem to be able to apply scalar subquery caching in this case.

To the rescue: row value expressions. Let’s simply rephrase our UPDATE to this:
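A sketch of the rephrased statement, assigning the tuple (n, s) in one row value expression (the double parentheses are the Oracle quirk mentioned below):

UPDATE u
SET (n, s) = ((
  SELECT t.n + 1, 'x' || t.s
  FROM t
  WHERE t.n = u.n
)),
x = 3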

Let’s ignore the funny, Oracle-specific double-parentheses syntax for the right hand side of such a row value expression assignment, but let’s appreciate the fact that we can easily assign a new value to the tuple (n, s) in one go! Note, we could have also written this, instead, and assign x as well:
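That variant might look like this sketch:

UPDATE u
SET (n, s, x) = ((
  SELECT t.n + 1, 'x' || t.s, 3
  FROM t
  WHERE t.n = u.n
))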

As you will have expected, the execution plan has also improved, and T is accessed only once:

The Cure:

Use row value expressions. Wherever you can. They make your SQL code incredibly more expressive, and chances are, they make it faster, as well.

Note that the above is supported by jOOQ. This is the moment we would like to make you aware of this cheap, in-article advertisement:

jOOQ - The best way to write SQL in Java

;-)

8. Using MySQL when you could use PostgreSQL

To some, this may appear to be a bit of a hipster discussion. But let’s consider the facts:

  • MySQL claims to be the “most popular Open Source database”.
  • PostgreSQL claims to be the “most advanced Open Source database”.

Let’s consider a bit of history. MySQL has always been very easy to install, and it has had a great and active community. This has led to MySQL still being the RDBMS of choice with virtually every web hoster on this planet. Those hosters also host PHP, which was equally easy to install and maintain.

BUT!

We Java developers tend to have an opinion about PHP, right? It’s summarised by this image here:

The PHP Hammer

Well, it works, but how does it work?

The same can be said about MySQL. MySQL has always worked somehow, but while commercial databases like Oracle have made tremendous progress both in terms of query optimisation and feature scope, MySQL has hardly moved in the last decade.

Many people choose MySQL primarily because of its price (USD $0.00). But often, the same people have found MySQL to be slow and quickly concluded that SQL is slow per se – without evaluating the options. This is also why all NoSQL stores compare themselves with MySQL, not with Oracle, the database that has been winning industry benchmarks almost forever. Some examples:

While the last article bluntly adds “(and other RDBMS)”, it doesn’t go into any sort of detail whatsoever about what those “other RDBMS” do wrong. It really only compares MongoDB with MySQL.

The Cure:

We say: Stop complaining about SQL, when in fact, you’re really complaining about MySQL. There are at least four very popular databases out there that are incredibly good, and millions of times better than MySQL. These are:

(just kidding about the last one, of course)

The Takeaway:

Don’t fall for aggressive NoSQL marketing.

  • 10gen is an extremely well-funded company, even if MongoDB continues to disappoint, technically.
  • The same is true for DataStax.

Both companies are solving a problem that few people have. They’re selling us niche products as commodity, making us think that our real commodity databases (the RDBMS) no longer fulfil our needs. They are well-funded and have big marketing teams to throw around with blunt claims.

In the meantime, SQL itself just got even better, and you, as a reader of this blog / post, are about to bet on the winning team :-)


The Disclaimer:

This article has been quite strongly against MySQL. We don’t mean to talk badly about a database that perfectly fulfils its purpose, as this isn’t a black and white world. Heck, you can get happy with SQLite in some situations. MySQL has its place, being the cheap, easy-to-use, easy-to-install commodity database. We just wanted to make you aware of the fact that you’re making a deliberate trade-off when you choose it.

Reference: http://java.dzone.com/articles/yet-another-10-common-mistakes

