Summary

At the first glance, SQL’s SELECT statement seems pretty straightforward to be used to retrieve a set of rows from a database. However, the full syntax of SELECT statement is really complex, and using all (or even most) of them properly is a big plus for applications in terms of performance issues. These series of articles give you necessary information to help you play this orchestra, magnificently.

Introduction

SQL Server (pronounced, see-cool) queries are a buried treasure of SQL Server. Most of the people out there prefer to deal with easy-to-follow-manageable queries, and this way, they sacrifice SQL Server's speed by introducing unnecessary round-trips between the database and the web server. Today, we are going to talk about most of the aspects of SQL Server queries, and show you how to create complicated SELECT statements. Yeah, today’s paper is completely dedicated to SELECT statement of MS-SQL Server 2000. Thanks to Microsoft that ships the pubs and Northwind databases with its SQL Server so that we do not have to start from scratch to create a database for today's paper. Just open it up, and follow my guidelines.

SELECT Syntax

SQL introduced the SELECT statement to retrieve data from tables. By retrieving data, we mean fetching one or many rows as well as columns from one or many tables of one or more databases! Actually, the SELECT statement is divided into two main parts, namely, first clause and second clause:

SELECT select_list /*this is the first clause*/ 
  FROM table_source(s) /*this is the second clause*/

Where select_list is the name of one (or more) column(s) and table_source is the name of one (or more) table(s) to be retrieved by the query. That’s enough for now. Launch your SQL Server snap-in, go to the pubs database, and select the ‘tables’ item from the left panel, so that the right-pane will be refreshed with the table names already included by pubs database. Right-click on Authors table, and from the context menu already opened, select ‘return all rows’ under the ‘open table’ item. This causes the SQL Server to return all the rows already stored in the Authors table. Now, click the Show/Hide SQL pane button in the toolbar.

We are now ready to start our exploration. As far as you see, Authors table contains 9 columns, au_id, au_lname, au_fname, phone, address, city, state, zip, and contract respectively. In the SQL pane, write the following SELECT statement:

SELECT state FROM authors

To execute the query, left-click on the “brown exclamation mark” in the toolbar. What you get then is the state column of the Authors table:

state
-----
CA
CA
CA
CA
CA
KS
CA
CA
CA
CA
TN
OR
CA
MI
IN
CA
CA
CA
MD
CA
CA
UT
UT
(23 row(s) affected)

If you look precisely, you will notice that there are plenty of authors already settled in CA. What if we would like to see how many states our authors come from? In other words, we don’t want to get CA 15 times, UT 2 times and the like. To do so, we use the DISTINCT keyword:

SELECT DISTINCT state FROM authors

state
----- 
CA
IN
KS
MD
MI
OR
TN
UT
(8 row(s) affected)

As you see, DISTINCT keyword eliminates duplicate rows from the result set. Now, it is time to count the number of rows returned to the client by the above-mentioned query. This is done by an aggregate function, COUNT:

SELECT COUNT(DISTINCT state) AS NUMBER_OF_ITEMS FROM authors

NUMBER_OF_ITEMS
----------------------
8
(1 row(s) affected)

What the heck AS keyword is supposed to do in above query? Well, AS keyword is just provided to name a derived column. A derived column is a column that has no name. To understand this better, let’s rewrite the above-mentioned query this way:

SELECT COUNT(DISTINCT state) FROM authors

This results in the following result set:

----------- 
8
(1 row(s) affected)

You see, the column has got no name! Such a column is called a derived column in SQL Server. There are two ways to name a derived column:

Use the AS keyword (as described above).
Use the assignment operator (as follows).

SELECT NUMBER_OF_ITEMS = COUNT(DISTINCT state) FROM authors

NUMBER_OF_ITEMS
----------------------
8
(1 row(s) affected)

However, both are supposed to do the same thing. Anyhow, the above-mentioned SQL statement shows us that our authors come from 8 different states. Neat, huh?

Now, it is time to gear from derived columns to the aggregate functions. An aggregate function is a routine that processes a result set and produces a single value; i.e., you cannot expect the COUNT procedure to return a result set! This is not applicable by aggregate functions! They are only supposed to return a given statistic to the user. There are plenty of aggregate functions including but not limited to AVG, MAX, MIN, SUM, and STDEV that computes the average, maximum, minimum, summary, and the standard deviation of given rows, respectively. The important thing to remember is that NULL values are eliminated by aggregate functions and therefore have got no effect on them.

To understand this better, let us switch to the ‘titles’ table. As far as you could see, Titles table has got a column named price. Suppose that we are going to calculate the total price of the titles we have got already in stock. To do so, we simply execute the following query:

SELECT SUM(price) AS TOTAL FROM titles

TOTAL
--------------------- 
236.2600
(1 row(s) affected)

However, please pay attention that there were 2 NULL valued prices that have been left intact by the SUM aggregate function. It is enough about aggregations, for now. Let us switch back to our Authors table and continue the exploration of the first clause of the SELECT statement.

Let’s say that the authors’ specifications have been entered in the Authors table as soon as they registered with us, one by one so that the first person whose name is already in the table has been registered with us sooner than the second author already settled in the second row. Now, let’s say we need to list the first name and the last name of the 5 first lucky authors who have registered with us. To do so, we shall use the TOP keyword with our SELECT statement:

SELECT TOP 5 au_lname, au_fname FROM authors

au_lname au_fname
----------------------------------- -------------------- 
Bennet Abraham
Blotchet-Halls Reginald
Carson Cheryl
DeFrance Michel
del Castillo Innes
(5 row(s) affected)

Instead of asking for the first 5 rows directly, we could also ask the SQL Server to retrieve the first x percent of the rows. For instance, to obtain the top 7% of the rows, we shall query this way:

SELECT TOP 7 PERCENT au_lname, au_fname FROM authors

au_lname au_fname
----------------------------------- -------------------- 
Bennet Abraham
Blotchet-Halls Reginald
(2 row(s) affected)

Since there were totally 23 records in Authors table, we will get 2 rows by specifying 7% of rows, since:

23 * 7 / 100 = 1.61
round(1.61) = 2

So, we have got 2 rows affected!

Now, consider that we wish to echo the first name and the last name of our authors in an ASP page. We have got two choices. First, we could fetch the first name and the last name of authors using a RecordSet object in ASP and echo them as follows:

Response.Write(oRS(“au_fname”) & “ “ & oRS(“au_lname”))

Second, we could ask the SQL Server to retrieve the first name and the last name separated by a space as a single column:

SELECT au_fname + SPACE(1) + au_lname AS au_fullname FROM authors

au_fullname
-------------------------------------
Abraham Bennet
Reginald Blotchet-Halls
Cheryl Carson
Michel DeFrance
Innes del Castillo
Ann Dull
Marjorie Green
Morningstar Greene
Burt Gringlesby
Sheryl Hunter
Livia Karsen
Charlene Locksley
Stearns MacFeather
Heather McBadden
Michael O'Leary
Sylvia Panteley
Albert Ringer
Anne Ringer
Meander Smith
Dean Straight
Dirk Stringer
Johnson White
Akiko Yokomoto
(23 row(s) affected)

Where SPACE function returns a string of repeated spaces. For example, if you would like to distinct the first name of the author from the second name by inserting 3 spaces between them (instead of 1), you can write the following SQL statement:

SELECT au_fname + SPACE(3) + au_lname AS au_fullname FROM authors

Anyway, after doing so, we could write the following code to echo the authors’ full name on screen:

Response.Write(oRS(“au_fullname”))

Now the question is that which way shall we prefer to the other and why. Before answering this question, I kindly ask you to remember two simple guidelines, now and forever:

Do not fetch unnecessary columns if you really do not want to use them.
Do whatever processing (as far as applicable) on database server side, not the poor client and/or web server side.

If you violate one of these rules, you simply introduce unnecessary round-trips between the database and the web server. I will explain all of these completely in this article later. For now, just remember those two simple guidelines.

In the example above, when you retrieve two columns separately, you are trying to move unnecessary result sets between the web server and the calling process. However, when you fetch the records using the second method, you simply ask the SQL Server to return back those two columns as a single column, say, au_fullname. Since the second method follows those two guidelines, we simply go that way. There’s no ifs, ands or buts.

You can also use other SQL functions to populate the result set. For example, to get the above-mentioned result set in uppercase, we can use the UPPER function as follows:

SELECT UPPER(au_fname + SPACE(1) + au_lname) AS au_fullname FROM authors

au_fullname
------------------------------
ABRAHAM BENNET
REGINALD BLOTCHET-HALLS
CHERYL CARSON
MICHEL DEFRANCE
INNES DEL CASTILLO
ANN DULL
MARJORIE GREEN
MORNINGSTAR GREENE
BURT GRINGLESBY
SHERYL HUNTER
LIVIA KARSEN
CHARLENE LOCKSLEY
STEARNS MACFEATHER
HEATHER MCBADDEN
MICHAEL O'LEARY
SYLVIA PANTELEY
ALBERT RINGER
ANNE RINGER
MEANDER SMITH
DEAN STRAIGHT
DIRK STRINGER
JOHNSON WHITE
AKIKO YOKOMOTO
(23 row(s) affected)

Identity Columns

To continue our exploration, let us see how to retrieve an identity column. First, take a look at ‘jobs’ table. It contains an identity column, namely, job_id, that’s defined as ‘smallint’ data type. To retrieve this column, we have got two choices. First, going the traditional way:

SELECT job_id FROM jobs

which retrieves the specified column (job_id). Second, using the IDENTITYCOL keyword:

SELECT IDENTITYCOL FROM jobs

The second choice, however, could be used if you know that the Jobs table has got an identity column! If you try to execute the above query against a table that doesn’t include an identity column (e.g., authors), you will get an error.

Retrieving column names of a table

In some cases, you need to get the columns of a specific table. There are basically 3 approaches to this problem:

Use the sp_help procedure to get extended information about a database object. By database object, we mean a table, view, and the like (you can find extended information on this procedure at BOL).
You can use the following select statement to retrieve a table’s columns:
```
SELECT TOP 0 * FROM table_name
```
The third approach, which I personally prefer, is using the schema object of SQL Server. For example, to retrieve the column names of Authors table, you can use the following SELECT statement:
```
SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS 
    WHERE TABLE_NAME = 'authors'
```
Which results in the following result set:
```
column_name
----------------------------
au_id
au_lname
au_fname
phone
address
city
state
zip
contract
(9 row(s) affected)
```

Naming conventions

We have already said that the SELECT statement has got the following syntax:

SELECT select_list FROM table_source(s)

What we haven’t mentioned is that how we can retrieve some columns from two (or more) different databases. To understand this, we need to understand the naming convention involved in SQL Server. Any instance of SQL Server may contain different databases. Each database in turn may contain many other tables, and each table in turn may contain one or more columns. In such circumstances, when you need to name a specific column, you have to go this way:

When we are dealing with a single database, however, we do not need to mention the full name. Instead, we use the relative names:

SELECT au_fname FROM authors

Since we are dealing with Pubs database, SQL Server manages everything else by itself. The right way to do the above-mentioned query, however, is using the USE keyword to specify what database we are going to interact with:

USE pubs
SELECT au_fname FROM authors

However, there’s no harm in using the full name too:

SELECT pubs.dbo.authors.au_fname FROM pubs.dbo.authors

This means that we are going to fetch the first name of the authors, from the Authors table in Pubs database. Now, you can combine all of these altogether to fetch the authors’ first name from the Pubs database, and the CatgoryID column from the Categories table of Northwind database:

SELECT pubs..authors.au_fname, Northwind..Categories.CategoryID
  FROM pubs..authors, Northwind..Categories

The above query emphasizes that you can omit the owner name of the table. This way, SQL Server (implicitly) considers the table owner as the one who has made the connection to DB. For example, if you have connected to a database using a specific member ID and you execute the above query, the table’s owner will be implicitly considered as the same member ID’s role. That is how SQL Server manages to retrieve the necessary columns from a database.

Table Aliases (Correlation name)

What’s an alias? An alias is a nickname used to give a “SELECT” statement readability. It is sometimes called a “correlation name” too. There are two ways to assign an alias to a table:

table_name AS table_alias
table_name table_alias

Consider the following select statement from the previous sample:

SELECT pubs..authors.au_fname, Northwind..Categories.CategoryID
   FROM pubs..authors, Northwind..Categories

To rewrite this query using aliases, you’ll need to give those two tables (pubs..authors and Northwind..Categories) two different aliases, say, a & c respectively:

SELECT a.au_fname, c.CategoryID FROM pubs..authors a, Northwind..Categories c

Please pay attention that if you give a table an alias, you will need to reference that specific table using the alias name, and therefore the following query is illegal:

SELECT pubs..authors.au_fname, c.CategoryID
  FROM pubs..authors a, Northwind..Categories c

Security Issues

Security, IMSHO, is one of those interesting issues that everyone should be aware of. You cannot store a record or two in a database server without considering the security issues. What if a role is broken and a CC# is stolen? Huh? What if an intruder creates a new account/password pair without having the appropriate permission to do so?

Considering the security issues, however, doesn't mean that you can block every single intruder. I honestly believe that whatever you do to block intruders, there will always be another man on this earth who can break the rule. So, all you can do is just try your best to block most of those intruders - Remember, you cannot block all of them!

Well! The question now is what security has got to do with the SELECT statement. To understand this, you should know that there are 3 types of permissions in SQL-SERVER:

Object permission - which involves granting or revoking permission rights to the objects. By objects, we mean data and/or stored procedures. Does a certain user have the appropriate access to a certain database to query for a specific table and/or view? Does she have the proper right to insert, update, or delete a row within a table of a certain database? This is where "Object Permission" comes into play. This kind of data manipulation requires the user to have a class of permission called “Object Permission”.
Statement permission - which involves granting or revoking permission rights to create a database and/or table within a database. CREATE TABLE, CREATE VIEW, and CREATE PROCEDURE are 3 kinds of Statement permissions introduced in SQL-Server.
Implied permission - which states that the owner of an object has got the permission to control the activities of the object. Once a “foo” user owns a “bar” table, she can add and remove data on the “bar” table. The implied permission also states that the “foo” user can control the way other users work with the “bar” table.

With this information in hand, it is now time to say that granting or revoking those three types of permissions to users is done by database administrators, and therefore a programmer doesn’t care about such permissions. So, what is left to the programmer? Well. This is what we have considered as “Security Issues” and is discussed below.

To start, consider a typical ASP application that is supposed to let an authorized user to login to the system. To develop such an ASP application, we simply create a form containing two edit boxes, one provided for user name entry and the other to get a password from the visitor:

<!—-login.asp file!-->
<form method="GET" action="login.asp">
    <table border="0" width="100%" cellspacing="0" cellpadding="5">
        <tr>
          <td width="13%"><font size="2" face="Verdana">User ID:</font></td>
          <td width="87%"><input type="text" name="UserID" size="20"></td>
        </tr>
        <tr>
          <td width="13%"><font size="2" 
              face="Verdana">Password:</font></td>
          <td width="87%"><input type="password" 
                         name="Password" size="20"></td>
        </tr>
        <tr>
          <td width="13%"></td>
          <td width="87%"><input type="submit" 
                   value="Submit" name="btnSubmit"></td>
        </tr>
    </table>
</form>

Considering “Mehdi” as the username and “mypass” as the password, the following URL is generated when the submit button is pressed:

login.asp?UserID=Mehdi&Password=mypass&btnSubmit=Submit

Therefore, to authorize a given user, we can simply write a function, namely, IsAuthorized as shown below:

Function IsAuthorized(szUser, szPassword)
    IsAuthorized = FALSE
    Set oConnection = Server.CreateObject("ADODB.Connection")
    szConnection = "provider=sqloledb; server=myserver; uid=myid; "
    szConnection = szConnection & "pwd=mypwd; database=mydb;" 
    oConnection.Open szConnection

    szSqlStatement = "SELECT * FROM Login WHERE UserID = '" & szUser
    szSqlStatement = szSqlStatement & "' AND Password = '" & szPassword & "'"

    Set oRS = Server.CreateObject("ADODB.Recordset")
    oRS.Open szSqlStatement, _
                oConnection, _
                adOpenStatic, _
                adLockOptimistic, _
                adCmdText

    If Not oRS.EOF Then
        IsAuthorized = TRUE
    End If

    oRS.Close()
    Set oRS = Nothing

    oConnection.Close()
    Set oConnection = Nothing
End Function

Now, let’s take a precise look to what we already wrote. Again, consider an intruder accessing our login page using the following user id: ' OR 1 = 1 --

What does actually happen in such circumstances? To understand this, let’s take a look at our SQL statement constructed in IsAuthorized function based on the above entry:

SELECT * FROM Login WHERE UserID = '' OR 1 = 1 --' AND Password = ''

Since two hyphen mentioned by the intruder indicates a user-provided text (i.e., comments) in SQL-92 standards, whatever comes after those hyphens is not considered as the SQL statement and therefore the above statement is simplified to:

SELECT * FROM Login WHERE UserID = '' OR 1 = 1

Well! You could guess the rest. Since 1 is equal to 1, the above-mentioned query results in all the rows already settled in the Login table. You see the intruder had neither user ID nor password but he has been identified as an authorized member.

In some circumstances, though, it will get even worst when an intruder inserts a pair of ID and password to a table. So what the heck can we do to prevent abusing the system? There are basically 3 ways to prevent such things:

Create a procedure, say, sp_is_authorized() to make the actual authorization on database side. This way, the intruder won’t be able to bypass the authorization process as described above. The specified procedure follows:
```
CREATE PROCEDURE sp_is_authorized
    @user_id char(16), @password char(16) AS
    declare @nRet int
    select @nRet = count(*) from Login where 
            UserID = @user_id AND Password = @password
    return @nRet
GO
```
We will talk about this usage of SELECT statement later in this article.

Rewrite the IsAuthorized function using the ADO parameterized queries:

Function IsAuthorized(szUser, szPassword)
    IsAuthorized = FALSE
    On Error Resume Next
    Set oConnection = Server.CreateObject("ADODB.Connection")

    szConnection = "provider=sqloledb; server=myserver; "
    szConnection = szConnection & "uid=myid; pwd=mypass; database=pubs;"

    oConnection.Open szConnection

    Set oCmd = Server.CreateObject("ADODB.Command")

    oCmd.ActiveConnection = oConnection
    oCmd.CommandText = _
      "select * from Login where UserID= ? AND Password = ?"
    oCmd.CommandType = adCmdText

    oCmd.Parameters.Append(oCmd.CreateParameter("UserID",
                                                adChar,
                                                adParamInput,
                                                16,
                                                szUser))
 
   oCmd.Parameters.Append(oCmd.CreateParameter("Password",
                                                adChar,
                                                adParamInput,
                                                16,
                                                szPassword))

    Set oRS = Server.CreateObject("ADODB.Recordset")
    Set oRS = oCmd.Execute()
    If Not oRS.EOF Then
        IsAuthorized = TRUE
    End If

    oRS.Close()
    Set oRS = Nothing

    oCmd.Close()
    Set oCmd = Nothing

    oConnection.Close()
    Set oConnection = Nothing
End Function

Construct your dynamic SQL statement after the given user ID and password is checked against the invalid characters. The most common way to do so is using the Regular Expression component, RegExp, which is introduced with IE 5.0. Since describing it goes beyond the scope of this paper, we could simply leave it to the reader.

The final thing we are going to say in this regard is that the above-mentioned situation is still in charge even when you are using the POST method (instead of GET) to submit your form. I strictly emphasize that using the POST method to submit a form is not still secure if you are not going to apply the above-mentioned conditions. There are several ways that an intruder can POST illegal information to your form, and therefore you cannot count on this, solely. So, what we already mentioned is applied to any form (either submitted by POST or GET method) that processes user input.

Conclusion

In the first episode of this series, we discussed SELECT syntax and its fundamental features. We will hopefully examine more materials in the next part of this paper, including but not limited to XML-related features and Joins.

Any comments, questions, and/or suggestions as ever before are welcomed. They really improve how things are done!