Click here to Skip to main content
14,981,861 members
Articles / Database Development / SQL Server / SQL Server 2016
Article
Posted 11 Aug 2016

Stats

23.3K views
20 bookmarked

Masking Sensitive Data with SQL Server 2016

Rate me:
Please Sign up or sign in to vote.
4.92/5 (17 votes)
15 Aug 2016CPOL5 min read
How sensitive data in SQL Server 2016 can be masked from the outer world using Dynamic Data Masking
In this article, we will understand what Dynamic Data masking is and how it helps to eliminate all the chances of unmasked data getting revealed.

Data Masking

Introduction

At the time of writing this article, it’s been more than 2 months since the most secure and powerful SQL Server yet released for the public. It has many features which you can use to make your data and application more secure, more maintained and easily retrieved as per your requirements. It has introduced both client side security such as Always Encrypted as well as server side security such as Data masking, etc. In this post, we’ll examine how Dynamic Data Masking can help secure our data.

Earlier, we were masking the sensitive data by applying logic at application end or by replacing characters at SQL Server end and then returning the masked data to the application. In the former case, the risk was at:

  1. When application fails to mask data
  2. Sensitive data transferred over network
  3. All the users in SQL Server with SELECT permission still can access the unmasked data

In the later case, the first two risks are eliminated but still there is a chance that the sensitive data can be revealed by the SQL Server user through which the application retrieves data.

  1. When application fails to mask data
  2. Sensitive data transferred over network
  3. All the users in SQL Server with SELECT permission still can access the unmasked data

So, let’s quickly jump into the next section to understand what Dynamic Data masking is and how it helps to eliminate all these chances of unmasked data getting revealed.

Data Masking

If you search for the definition of Data Masking over the web, you’ll end up with varied descriptions. The simplest and straight forward definition can be:

“Data masking is the process of hiding original data with random characters or data.”
--wikipedia

If we look further, there are various data masking techniques available such as:

  • Shuffling: Shuffling order of the characters in the value.
    Example: 12345 -----> 35312
  • Nulling: Substituting characters in the value with null (hash) symbol.
    Example: 12345 -----> ###45
  • Substitution: Substituting value with another value from a substitution table.
    Example: Suvendu Giri -----> John Ptak
  • Masking Out: Masks the complete data or a selective part of the data.
    Example: suvendu@mydomain.com -----> suvendu@xxxx.xxx
    etc.

Considering many techniques of data masking, they can be categorized into three types:

  1. Static Data Masking: Original data in production changed to masked data
  2. On-the-fly Data masking: Copies data from one source to another source and masking done on the later
  3. Dynamic data masking: Masking done dynamically at runtime

Dynamic Data Masking

“Dynamic data masking limits sensitive data exposure by masking it to non-privileged users. Dynamic data masking helps prevent unauthorized access to sensitive data by enabling customers to designate how much of the sensitive data to reveal with minimal impact on the application layer. It’s a data protection feature that hides the sensitive data in the result set of a query over designated database fields, while the data in the database is not changed.”
--- MSDN

So, dynamic data masking alters the result for non-privileged users while streaming and not with data in the production database.

Data Masking in MySQL

As far I know, till the recent version of MySQL, there is no inbuilt mechanism to provide dynamic data masking capabilities. The possible options for MySQL users seems to be:

  1. Use 3rd party tools
  2. Use custom scripts/functions

Examples:

SQL
SELECT '****' AS PhoneNumber
FROM `Employees`
--Input: 1234567890        Output: ****
SQL
SELECT REPEAT('*', CHAR_LENGTH(PhoneNumber) - 6) AS PhoneNumber
FROM `Employees`
--Input: 1234567890                        Output: **********
SQL
SELECT CONCAT(SUBSTR(PhoneNumber, 1, 4), _
              REPEAT('*', CHAR_LENGTH(PhoneNumber) - 4)) AS PhoneNumber
FROM `Employees`
--Input: 1234567890                        Output: 1234******

Data Masking in PostgreSQL

Same as MySQL. It seems they are still lacking this feature. However, we can make use of some custom scripts or user defined functions.

Azure SQL Database too supports the dynamic data masking similar to SQL Server and I have heard from my friends that Oracle provides some kind of mechanism for data masking, although I haven’t ever used it.

Dynamic Data Masking in SQL Server 2016

There are four masking functions available in SQL Server 2016 to provide separate ways to mask your data when they are returned by a query. These are:

Default

When you want all the characters to be masked, you can use Default masking function. The characters for masking and number of masked characters to be returned by a query is determined by the following logic:

  • XXXX – four Xs if data length is more than or equal to four
  • 0 in case of numeric datatype
  • 0 in case of binary data type
  • 01/01/1900 for datetime datatypes

Syntax:

SQL
MASKED WITH (FUNCTION = 'default()')

Creating table:

SQL
CREATE TABLE [dbo].[Employee](
[SecretCode] VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
-- other fields

Modifying column:

SQL
ALTER TABLE [dbo].[Employees]
ALTER COLUMN [SecretCode] ADD MASKED WITH (FUNCTION = 'default()');

Example:

Let’s check these functionalities using Table variable.

SQL
DECLARE @Employees AS TABLE
(
     SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
     NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
     DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()')               
)

INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21'

SELECT * FROM @Employees

Result:

SecretCode        NetSalary            DateOfBirth
xxxx              0.00                 1900-01-01 00:00:00.000

NOTE: You need to check this with a non-admin user as admin users are granted to see all the data. So, if you will execute this using an admin user, then you’ll see no difference. Let’s say your non-admin user is ‘demouser’, then add the following line at the top of the query.

SQL
EXECUTE AS USER='DemoUser'

Email

It partially masks the characters of an email id. The interesting thing to notice is, it masks characters as well as the length of the email id making it impossible to predict the email id from the few displayed characters.

Syntax:

SQL
MASKED WITH (FUNCTION = 'email()')

Example:

Let’s modify the same example to add an emailed column as follows:

SQL
DECLARE @Employees AS TABLE
(
     SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
     NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
     DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),
     EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()')
)
SQL
INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21', 'suvendugiri@mydomain.in'

SELECT * FROM @Employees

Result:

SecretCode    NetSalary     DateOfBirth                EmailID
xxxx          0.00          1900-01-01 00:00:00.000    sXXX@XXXX.com

Partial

It masks the data based on custom inputs to the function. It masks the data partially. Three parameters need to be passed to this masked function such as:

  1. Prefix: Number of characters that has to be shown from the start
  2. Padding: Characters that you need to show in between prefix and suffix
  3. Suffix: Number of characters that has to be shown from the end

Syntax:

SQL
MASKED WITH (FUNCTION = 'partial(prefix,padding,suffix)')

Example:

Further adding a new column to the same example as:

SQL
DECLARE @Employees AS TABLE
(
      SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
      NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
      DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),
      EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()'),
      FullName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"***",2)')
)

INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21', 'suvendugiri@mydomain.in','Suvendu Shekhar Giri'

SELECT * FROM @Employees

Result:

SecretCode   NetSalary   DateOfBirth              EmailID         FullName
xxxx         0.00        1900-01-01 00:00:00.000  sXXX@XXXX.com   Su***ri

Random

Masks any numeric value with a random value based on the provided upper and lower boundary values as parameters to the masked function.

Syntax:

SQL
MASKED WITH (FUNCTION = ‘random(lower_bound,upper_bound)

Example:

SQL
DECLARE @Employees AS TABLE
(
     SecretCode VARCHAR(10) MASKED WITH (FUNCTION = 'default()'),
     NetSalary MONEY MASKED WITH (FUNCTION = 'default()'),
     DateOfBirth DATETIME MASKED WITH (FUNCTION = 'default()'),
     EmailID VARCHAR(50) MASKED WITH (FUNCTION = 'email()'),
     FullName VARCHAR(50) MASKED WITH (FUNCTION = 'partial(2,"***",2)'),
     Age INT MASKED WITH (FUNCTION = 'random(90,100)')
)

INSERT INTO @Employees
SELECT 'ABCDEFGHIJ', 80000,'1990-05-21', 'suvendugiri@mydomain.in','Suvendu Shekhar Giri',32

SELECT * FROM @Employees

Result:

SecretCode  NetSalary  DateOfBirth              EmailID        FullName  Age
xxxx        0.00       1900-01-01 00:00:00.000  sXXX@XXXX.com  Su***ri   93

If you want a user to see the values stored in database same as admin, you can grant access to the user using the following command:

SQL
GRANT UNMASK TO DemoUser

You can again apply the restriction to see the data masked by following command:

SQL
REVOKE UNMASK TO DemoUser

To remove the masked function from a column, you can drop it similar to other constraints like:

SQL
ALTER TABLE Employees
ALTER COLUMN FullName DROP MASKED

Conclusion

Dynamic Data Masking in SQL Server 2016 is really a promising feature and will help in avoiding writing so many custom scripts to mask the sensitive data. The role/access based viewing data makes perfect sense and is very useful. Currently, the access for a user can be given on a database. It would be great if there can be a mechanism to give this access based on table or schema.

Please leave your feedback, comments or suggestions in the comments section below.

Thanks for reading. :)

History

  • 12th August, 2016: First published

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Suvendu Shekhar Giri
Software Developer
India India
No Biography provided

Comments and Discussions

 
GeneralMy vote of 5 Pin
Renju Vinod11-Sep-16 20:17
professionalRenju Vinod11-Sep-16 20:17 
GeneralRe: My vote of 5 Pin
Suvendu Shekhar Giri11-Sep-16 21:20
professionalSuvendu Shekhar Giri11-Sep-16 21:20 
GeneralMy vote of 5 Pin
Surya(10980329)8-Sep-16 22:55
MemberSurya(10980329)8-Sep-16 22:55 
GeneralRe: My vote of 5 Pin
Suvendu Shekhar Giri9-Sep-16 0:58
professionalSuvendu Shekhar Giri9-Sep-16 0:58 
SuggestionUpdate a paragraph related to Oracle database technology Pin
Jesus Carroll22-Aug-16 7:04
professionalJesus Carroll22-Aug-16 7:04 
PraiseRe: Update a paragraph related to Oracle database technology Pin
Suvendu Shekhar Giri22-Aug-16 7:12
professionalSuvendu Shekhar Giri22-Aug-16 7:12 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.