Wednesday, January 25, 2006

Emulating check constraints

This blog post has moved. Please find it at:

http://www.mpopp.net/2006/01/emulating-check-constraints/.

7 comments:

rpbouman said...

Elegant depends on your requirements. So far, all attempts to emulate this fall into either one of two categories:

PREVENTIVE APPROACH
This approach avoids bad data from being entered into the database in the first place. Arjen Lentz showed how to do this in a declarative way using views:

http://arjen-lentz.livejournal.com/49881.html

(by Arjen Lentz)

The CHECK OPTION of the view prevents bad data from being inserted. Users should have access only to the view, and not to the underlying table.

Functionally, the same result could be achieved by performing all inserts, updates etc using procedures. You'd lose the ability to work with entire data sets of course, but if the requirements allow it, sp's are probably easier to extend to accomodate future change requests.

The downside of the preventive approach is that data is just silently discarded. In most cases, this will be unacceptable. Having the application check if the insert or update did in fact succeed could be acceptable, but you're still not in a position to know *WHY* the data was rejected. This will usually make you want to have the application check the integrity of the data anyway, making some people wonder why to even bother.

(If somebody is in fact wondering, I think I can offer a good reason why to do this apparent duplication.
Checks on the database server side have to do with keeping the data in a valid state. This MUST always be ensured. The server forms the single leading point of definition. Later on, a new or another application might be built. Although you might be duplicating the code to check the data in that app to, you will never have to worry that the data in the database gets fouled up.

Checks on the application's side should NEVER be relied on to ensure the integrity of the database. However, checks at the application's side can be useful for the comfort of the end-user, and for maintaining state in the application. For example, a form in a web application should always validate data as early as possible, but at least before submitting. This prevents a frustrating user experience. Users will not keep trying again and agiain, waiting for the server to tell them what's wrong with their subitted data. Instead, they will just quit using the application. So, even if you'd be able to use check constraints, you should still also check the data before it is sent to the database. Of course, not everything can always be checked in the application, but an attempt should be made to be as complete as possible)


REJECTIVE APPROACH
When you detect an invalid situation, you can always execute statement that's sure to fail. So far, most emulations that I've seen favour this rejective approach:

http://gilfster.blogspot.com/2005/11/check-constraints-in-mysql-50.html and http://mysql.gilfster.com/page.php?parent_id=2&page_id=2.0.7

(By Scott Maxwell, presented by Andrew Gilfrin)

(I hinted on such a solution too:
http://rpbouman.blogspot.com/2005/11/using-udf-to-raise-errors-from-inside.html
)

Detection can be done inside a trigger, but the stored procedure approach can be employed here too, if the requirements allow it.

The elegancy problem always boils down to the choice of statement that is ensured to fail.

In Scott Maxwell's article, a table is especially created to force unique key violations. Key violations are provoked with a special procedure that accepts a string. The string is inserted twice, and the second insert fails with a key violation. The nice thing is that the duplicate value is reported as part of the error message, making it potentially possible to devise some special handling. At the very least, the client that performs the action can extract the reason for failure from the error message.

In your example, you wanted to write a preventive solution, but you ended up with a rejective one. In your case, the limitations on modifying the data in the trigger table is exploited. The disadvantage is that the limitation might be lifted someday, which would turn your now rejective solution into a then preventive solution. This could pose a problem for some applications.

So, forget elegancy for now - think about wheter this possible alteration of behaviour is acceptable.

(A problem with all the rejective approaches is that the generated error has nothing to do with the actual integrity check. So, you will always have to think of a way of your own to track what error exactly occurred. For example, you could use special user variables, such as @my_error_number and @my_error_message. These would be set each time just after the integrity is being violated, just before forcing the raising of an exception using you favourit hack - mutating trigger table, unique key violation or UDF call. Of course, these variables should be cleared too right after handling the error)

rpbouman said...

For completeness: just want to add that you could also have a corrective approach - this will almost never be possible.

Example: a field must alwayse be in uppercase. Trigger would simple upper the result and insert that value

Markus Popp said...

Hey - that's great ... could be a good article for its own ;-). Thank you!

rpbouman said...

Thanks, but it seems I jumped the gun on the CHECK OPTION views:

"The downside of the preventive approach is that data is just silently discarded."

is simply not true. bad, bad of me. Check out Partha Dutta's feedback on Markus' entry to see my assertion proved wrong:

http://pdutta.homelinux.com/index.php?title=emulating_check_constraints_another_meth&more=1&c=1&tb=1&pb=1

Markus Popp said...

However, it's great to get that amount of feedback (including Partha's of course) and new things to try out. Using views does really seem to be a better solution than triggers - although I still think that *real* check constraints would be the best solution. Probably in 5.2, I guess.

Unknown said...

If I remember well, if you use 'InnoDB' table type, you should be able to use such constraint directly in MySQL (I'm sur for foreign key constraint but I think it's also the case for check constraint)

Anonymous said...

@Thibs:
"The CHECK clause is parsed but ignored by all storage engines. See Section 1.7.5.4, “Foreign Keys”." doesnt matter which storage engine you use...
(this is stated in manual for 5.0, 5.1 and 6.0...)

Nice work with workarounds just a shame they dont support in a pretty nice standard way :)