Wirtting query in reserved word safe: any possible solution?

Reserved word conflict is an always pain for multiple DB backend supported system: Drupal also face this for many years (e.g. http://drupal.org/node/371, since June 30, 2002). This is the most complicated bottleneck for daily maintenance and introduce any new DB backend to Drupal: whenever existing DB vendor change their standard, we need to follow it; whenever a new DB backend introduce, another set of reserved words are coming in, too.

There is many possible solution, e.g. prevent use of ANY reserved words based on requirement of ALL supported database engines (as like as Moodle's implementation); BTW, as this solution will left issue as ALWAYS OPEN, I am not going to discuss within this review.

Another possibility is the use of escape characters, e.g. `` for MySQL, "" for PostgreSQL/Oracle/DB2 and [] for MSSQL. This approach is widely used for many successful projects, e.g. phpmyadmin, phppgadmin, Oracle/DB2/MSSQL EM. The primary consideration is: How to implement this syntax for Drupal, which is simple and elegant for both end user and DB abstraction designer?

For sure that we will need some update for existing Drupal queries syntax, where the ultimate goal is query will finally present as following example before process by PHP database connection (e.g. SELECT batch FROM {batch} WHERE bid = :bid AND token = :token from line 15 of includes/batch.inc, CVS HEAD, rewrite with TNG DB standard). In case of MySQL, after translation it should looks like:

SELECT `batch` FROM `batch` WHERE `bid` = :bid AND `token` = :token

I will discuss different syntax change approach, based on: a) user experience, b) similar to ANSI coding style, c) backward compatibility, d) backend complexity, e) overall performance and f) workload for syntax conversion. A simple marking will be given for each item (e.g. good/functioning: 2; fine/not bad: 1; poor/not functioning: 0) so we can have some simple idea between different proposals.

Here is a quick rundown of my review:

Solution 1: Only use one set of bracket ({} or [] only) => 6 pt.

For this format we will only use one set of bracket, e.g. SELECT {batch} FROM {batch} WHERE {bid} = :bid AND {token} = :token or SELECT [batch] FROM [batch] WHERE [bid] = :bid AND [token] = :token

a) user experience => 2
User only need to remember one set of syntax, easy to understand.
b) similar to ANSI coding style => 2
Similar as native RAW SQL syntax under MySQL (with pair up ``) or PostgreSQL (with pair up "").
c) backward compatibility => 1
In this format we will tread pair up bracket as escape character placeholder, which is a bit different from original escape table prefix style.
d) backend complexity => 0
The main problem of this syntax is: we have no way to classify table name from other constrain name, unless preform some complex regex which phase SQL command meanings. The replace of table prefix become a very complicated work when compare with current {} syntax.
e) overall performance => 0
Complex regex always result as greatly performance degrade, especially when it must preform for every single query. This seems like we are working for some SQL syntax phaser...
f) workload for syntax conversion => 1
Not too bad for this solution, if we choice {} only.

Overall comment: Not a bad idea for user since very easy to follow. Just need to think pair up brackets as escape character placeholder directly. BTW, backend complexity and performance degrade may be a great pain for all of us.

Solution 2: Mix up both {} and [] syntax => 8 pt.

This format will keep {} syntax as table prefix placeholder PLUS escape character placeholder (double definition), which [] syntax ONLY use for column or constrain name as escape character placeholder (single definition), e.g. SELECT [batch] FROM {batch} WHERE [bid] = :bid AND [token] = :token

a) user experience => 1
The mix up syntax result as a complex meaning. New comer from RAW SQL may confuse about when, where and how to use each bracket. The 1 mark is given as reference to 5a, but shouldn't belongs to 2 marks when compare with 4a.
b) similar to ANSI coding style => 0
Every DB engine always prefer to use single escape character within single query, even some of them support multiple, e.g. MSSQL support both "" and [], but never mix them up within MSSQL EM.
c) backward compatibility => 2
We only need to take care column and constrain name escape. Table name is already safe.
d) backend complexity => 1
As we have 2 set of syntax, we need to have 2 phasing procedure. We need to revamp table prefix replacement function as duplicate meaning, and other new function for [] replacement only. Not complex, but for sure not elegant.
e) overall performance => 2
Simple regex is enough for this solution. Performance degrade shouldn't be a main concern.
f) workload for syntax conversion => 2
Relatively simple as we only consider column and constraint name only.

Overall comment: We trade user experience in this case for simple implementation and performance. For sure it is friendly to DB abstraction layer designer, but duplicated meaning of brackets result as a bad outlook of SQL. DB abstraction is try to simplify variation for normal developer, but not making them confuse. I don't really like this approach.

Solution 3: Single set of bracket, but making column escape special => 3 pt.

Table escape will keep as [tbl_name] or {tbl_name}, where all column escape revamp as [tbl_name.col_name] or {tbl_name.col_name}, e.g. SELECT {b.batch} FROM {batch} b WHERE {b.bid} = :bid AND {b.token} = :token. Perform magical handling for column name escape so we will not inject table prefix for it, where syntax for table name escape become duplicate meaning with both table prefix and escape character placeholder.

a) user experience => 0
For classify column name, we always need to use short_hand.column_name syntax, even for very simple SQL. Is that really meaningful for normal developers? I don't think so...
b) similar to ANSI coding style => 0
Think about bracket syntax as escape character and you may feel how strange when compare with ANSI: we NEVER write SQL with "b.batch" or `b.batch`! It is not even a valid syntax! (P.S. correct syntax should be `database_name`.`table_name`.`column_name` in case of MySQL with reserved word safe; check phpmyadmin operation and get more useful references)
c) backward compatibility => 1
We only need to take care column and constrain name escape. Table name is already safe.
d) backend complexity => 1
The magical handling for column is not complex, where simple regex is already enough. We need to revamp table prefix replacement function as duplicate meaning, too. Not complex, but for sure not elegant.
e) overall performance => 1
Magical handling for column escape and duplicated table escape may result as a little performance degrade. Anyway, regex can still keep as relatively simple when compare with SQL syntax phaser.
f) workload for syntax conversion => 0
All core queries should review its column name syntax, plus the new bracket syntax for escape. It is a complicated work.

Overall comment: The worst solution. It is far away from ANSI standard, complicate and duplicate backend handling, where don't reduce workload for SQL syntax upgrade for TNG DB abstraction but even introduce more concern. It may seems to be friendly at first sight; BTW, making Drupal's SQL syntax as such special won't give us expected benefit. When migrate other OSS project progress to Drupal, we will understand its weakness.

Solution 4: Single set of bracket, but making table name special => 10 pt.

Table escape will revamp as [@tbl_name], where all column escape revamp as [col_name], e.g. SELECT [batch] FROM [@batch] WHERE [bid] = :bid AND [token] = :token. First search for [@tbl_name] and replace table prefix, e.g. [prefix_tbl_name]. Then replace all pair up brackets as escape characters.

a) user experience => 2
Simple and clear. The meaning of brackets is very clear as escape characters placeholder, where drp_ (or something other) will always as table prefix placeholder.
b) similar to ANSI coding style => 2
I can't even find the weakness from this point of view. Pair up brackets really do its duty well.
c) backward compatibility => 1
As table syntax is now revamp from {tbl_name} as @tbl_name (from table prefix replacement point of view), the backward compatibility is quite weak. We need some temporary solution before all core queries are review and revamped.
d) backend complexity => 2
Table prefix replacement need revamp but still very simple. Escape character replace can well preform with simple regex.
e) overall performance => 2
Performance for table prefix should keep as existing implementation, where regex for escape character replacement is already proved as very minor. This solution shouldn't have much performance impact.
f) workload for syntax conversion => 1
We need some effort for table name revamp; BTW, it is still simple enough when compare with above solutions.

Overall comment: Simple and clean syntax, close to ANSI standard, simple backend implementation and high performance. Revamp table name is for sure better than that of column name, from many point of view. If we are design for something completely new, this should be a good approach.

Solution 5: Keep existing table name syntax, where add [] as escape character placeholder only => 11 pt.

Simply add [] syntax as escape character placeholder, e.g. SELECT [batch] FROM [{batch}] WHERE [bid] = :bid AND [token] = :token. Beyond existing table prefix replacement, replace all pair up brackets as escape characters.

a) user experience => 1
When compare with drp_table syntax, this solution is for sure less friendly. BTW, for existing Drupal developer it is not such complicated for understand.
b) similar to ANSI coding style => 2
[{tbl_name}] => "prefix_tbl_name" or [col_name] => "col_name". Close to ANSI standard, isn't it?
c) backward compatibility => 2
As we keep table name syntax, queries without new [] will ALWAYS work fine as like as existing: without [] just means as not always safe from reserved word conflict, but none of drawback for MySQL and PostgreSQL.
d) backend complexity => 2
Table prefix replacement need revamp but still very simple. Escape character replace can well preform with simple regex.
e) overall performance => 2
Table prefix replacement keep unchanged so no extra workload for it. Escape character replacement can handle with simple regex which come with nearly none of performance degrade.
f) workload for syntax conversion => 2
Only need to go though each queries and simply add [] for almost everything. It is even simpler then drp_table syntax conversion.

Overall comment: The most benefit for this solution is backward compatibility and user experience for existing Drupal developer, when compare with drp_table syntax: They come with similar meaning and performance. If we try to keep most implementation unchanged but simply make everything as reserved word conflict safe, this should be the best approach.

Conclusion?

I try to compare most possible solution horizontally with marking, and let this works more fair. BTW, please remember that personal comment always come with bias... You don't need to agree all of my point of view: just take it as reference, and give more indeed thinking for what you are trusted with. Comment is welcome since it always result as a more solid and complete solution.

From my point of view, solution 4 (i.e. [drp_table] and [column] syntax) and 5 (i.e. only add [] as escape character placeholder) should be the most suitable handling. They keep the SQL syntax clean for end user, where also simple for implementation, too. Keep everything as simple as possible is always a good approach for problem solving: simple coding always less buggy than that of complicated.

The correctness of using $db_prefix with db_name is not my duty (http://drupal.org/node/302327). If everyone agree for that invalid syntax, just let it be. All patches above come with a functional hack of this exceptional case. Again the look and feel of RAW SQL syntax is not my main concern: the variation just affect feeling of normal developer; since both syntax can finalize as reserved word safe query with [] syntax, Oracle/DB2/MSSQL drivers can do their internal work well. Just choose your preferred syntax and take it easy.


table prefixing is usually database prefixing

dalin's picture

Keep in mind that the prefixing done by {} is more commonly prefixing with a database name instead of a table prefix. ex. {users} becomes master_db.users . I think this may impact some of your assessments on the similarity to ANSI syntax and also on the complexity of regex replacements. For example if you rewrite master_db.users as `master_db.users` is that even going to work or do you need to have `master_db`.`users` ?

Also I disagree with your assessment of 2a. I would give it a 1 or maybe even a 2 which bumps that solution to be comparable to solutions 4 or 5.

I guess this may be a misunderstanding for prefixTables()

hswong3i's picture

prefixTables() is target for table prefix replacement, but not for database name replacement. In case of Drupal, whenever we need to switch database connect, call of db_set_active() is required; therefore we ALWAYS connect to single database at once and never do cross database SQL.

On the other hand, based on ANSI standard as I remember, ANSI-compatible database engine should support database_name.table_name.column_name syntax. In case of MySQL, this format should written as `database_name`.`table_name`.`column_name` for reserved word safe. Try to use phpmyadmin and keep focus on what its example SQL is generated after each operation, which may give you a good example (i.e if syntax written as `database_name.table_name.column_name`, MySQL should never work for it and complain with syntax error).

For 2a I must agree with you: it should come with 1 mark. When compare with 5a, as they both come with mix up syntax so should all belongs to 1 mark; when compare with 4a, 4a only use single syntax which is much elegant and so should belong to 2 marks. Personally I will not suggest for solution 2 since the backend implementation is less elegant when compare with that of 4/5; but as a relatively fair competition mark review for solution 2 is a must.

----------------------------------------
Edison Wong

Wrong

Larry Garfield's picture

I can't say I've ever used Drupal's database prefixing to add a database name. I've only ever used it to share tables between two sites running on the same database, which is a scary thing to be doing in the first place. I don't actually know of anyone doing what you describe in practice, although it does work at present AFAIK.

Hm, I will learn that.

Pcmaster's picture

Hm, I will learn that.

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <h1> <h2> <h3> <h4> <h5> <h6> <em> <strong> <code> <del> <blockquote> <q> <sub> <p> <br> <ul> <ol> <li> <dl> <dt> <dd> <a> <b> <u> <i> <sup> <acronym> <pre> <img>
  • Lines and paragraphs break automatically.
  • You may post code using <code>...</code> (generic) or <?php ... ?> (highlighted PHP) tags.
  • Images can be added to this post.

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.