The case
When the cloud computing is concerned we used to think about rapid elasticity, resource pooling or broad network access. All the points are valid but what about security? Is it really safe that data is loaded somewhere off the premises? Are all the legal rules met as well?
Querona supports many nice features: one of such features is
data encryption.
Data encryption
The untrusted connection is defined in Querona as a target which the data cannot be pushed to “as-is”, but encrypted beforehand. Querona automatically encrypts the data for you while keeping the data types untouched. String data after encryption is unreadable, but numeric data remains similar to the original data.
A few key points for data encryption in Querona:
- Types are intact – no need to rework a target table schema.
- Data looks pretty real – an ignorant reader might not notice that data is fake because dates keep the age, and numbers keep precision and scale.
- The consistency of encrypted values based on equal originals – you can still query the table because predicates will work (linkability).
- Encrypted content can be decrypted manually using built-in functions.
In the example below, we will create a cache of data in the cloud database. The goal is to store and process view data in the cloud while keeping the cached data encrypted in the target database.
The solution
Let’s create a virtual database called
ADW_source and import its metadata (you can read about it in the previous post:
“How to integrate disparate data sources in 10 minutes”).
First, we create the virtual database itself:
Our new target connection is on the local network, thus let’s treat it as
trusted. Data processed using trusted connection is not going to be encrypted.
We need only one table metadata to be imported. We use
FactCurrencyRate in our case:
Our virtual database was created:
Next step is to add a virtual database that we will use for data caching and processing. We will use Azure SQL Data Warehouse database hosted in Microsoft Azure cloud. The new database name will be
AzureDWCache_source.
This time we set the connection as
untrusted, to inform Querona that it needs to take special care of data marked as sensitive. Untrusted connection requires a seed value which forms a base encryption parameter used for this connection (and all databases base on the connection). The seed value requires an integer value within the range defined as follows: -2147483648 — 2147483647. Our connection requires also
BULK INSERT batch size parameter which we will discuss later, so now let’s leave it as it is by default.
We have no need to import metadata from the AzureDW DB, because we will instruct Querona to use that database to materialize view data. Finally, we have two databases: source with one table, and the target with no tables:
Now, before table
FactCurrencyRate is cached in the Azure cloud database, we need to mark a few columns as sensitive. Let’s go to the source table column list:
We want to mark some column as containing sensitive data. We set column
AverageRate property “Contains sensitive data” to true and save our changes:
We do the same for
EndOfDateRate and
Date columns. All three columns which hold data marked as sensitive are as follows:
To encrypt cached data on
AzureDW we need to create a view in
AzureDWCache_source DB. To do so we go to the database properties, open
Tables & views and click
Generate views menu item. We select
ADW_source as the “Source database”. We’re going to import one table:
There is one table to import.
Now,
AzureDWCache_source contains the desired view. What about the column sensitivity information? We have marked columns in source virtual database as sensitive, this information should propagate further and affect sensitivity setting of the columns in the view, right? Let’s check this out:
To make data cacheable we need to set a proper caching of the view. We select caching strategy to use one physical table:
When the source, the view and the caching mode of the view are set, we can build the cache for the first time. To do it we enter
Management tasks sub-view and run
Rebuild persistent cache task:
When the task is finished we check
Caching & Partitioning for details:
We see that cache was created and loaded. Now we know that the data is cached in the SQL Data Warehouse database in Azure. What about the cache contents? Let’s check our cloud database and connect to it directly using SQL Server Management Studio:
As you can see there is a table created named FactCurrencyRate that holds our cached data. But the data inside is different than original one coming from
ADW_source database – columns
AverageRate,
EndOfDayRate and
Date contain data which looks pretty similar to the original, but actually, only the format is the same.
How Querona handles querying such a cached view? Let’s go back to Querona and check by executing the query below:
select [CurrencyKey], [DateKey], [AverageRate], [EndOfDayRate], [Date]
from [AzureDWCache_source].[dbo].[FactCurrencyRate]
using
Query editor and see the data.
Data displayed is the same as the original, as no data encryption was performed…
NOTE:
There is also another way, more advanced, to check if the query was actually performed against the new table. This feature is called Execution plan cache and for sure will make you in love with Querona even more, so… stay tuned to Querona.io!
OK, you might think that Querona does not reverse the data encryption and gets the data directly from the source database rather than the cache in the cloud. Let’s kick the source database out of the system and delete
ADW_source database. Now the only one database left is the one based on Azure SQL Data Warehouse:
Let’s do the final check and query the cached view again:
Wow, it still works! Data looks familiar…
In this article we have shown:
- how to connect Querona to on-premise database and mark selected columns as containing sensitive information
- how to connect Querona to SQL Data Warehouse database in the Azure cloud and mark it as untrusted in Querona
- how to cache (materialize) a view containing sensitive information in an “untrusted” database
- how to verify that encryption of sensitive data actually took place and data cached (materialized) in the untrusted database was encrypted automatically by Queorna
- cached, encrypted data taken from the untrusted database, is presented by Querona to the user in it’s original, decrypted form.
Thank you and keep virtualizing!