The case

Modern data warehousing solutions are often highly distributed and cloud-based. Users, usually authenticated via security mechanisms available in solutions like Active Directory, should have secure access to the data warehouse or a Data Lake, hosted anywhere and on any platform (Windows, Linux). Technologies like Apache Spark or Microsoft Azure HDInsight do not provide an easy way to configure the use of Windows Integrated Authentication (WIA). Some other data sources do not support WIA at all.

The solution

Querona greatly simplifies the process of using an Apache Spark cluster as a secure data warehouse. It comes with a bundled stand-alone Apache Spark 2 instance that may be easily migrated to an external cluster. We will show how to use Querona as a secure authentication proxy for an Apache Spark data warehouse. We assume that we have an existing Northwind database on a standalone or cloud-based Spark cluster (for example Azure HDInsight) and the Querona Spark driver is already installed there. Installation of Querona driver is very straight-forward operation – you need to run a shell script that will automatically execute on head nodes and install Querona Spark application. Please note that Querona fully supports loading data into Apache Spark and we will write about it in a separate blog post. Let’s create a virtual database and import the metadata from the Apache Spark cluster. In Querona, go to Databases and click Add Database:
Add database in Querona
Fill in the details and select “Add new connection” to at the same time create a virtual database and an underlying connection:
Creating oracle connection
On the next screen, fill in all necessary details and click Test Connection to ensure everything works as expected:
Database connection established
Now Querona will download all metadata from the source database – select all tables and finish the wizard by clicking Create:
Import metadata form database to Querona
The newly created virtual database appears on the list:
Virtual databases list
Now, let’s grant an existing Windows user access to the virtual database and configure role membership. In Querona, go to Administer -> User Management and click Add User. In the Login field enter the domain\login-name of the selected user, check Integrated authentication, and select a Regular User type:
Add user to Querona
On the next screen, select the (default) public role and finish by clicking Create:
Assign role to user
The new user should appear on the list. Now we need to assign the user to our virtual database. Go back to Databases and drill through Databases -> Spark-Northwind -> Access Rights -> Select User. Pick the user from the list:
Assign user to database
Let’s give the user CRUD permissions  together with View definition and click Save:
Access rights to database
Now, assuming the user is logged into Windows, let’s use SQL Server Management Studio to connect to Querona via TDS using Integrated Authentication:
Connection to Querona in SQL Server Management Studio
After successful connection, expand Databases and you should see your database; drilling down will allow you to view the structure:
Querona databases in SQL Management Studio
Let’s run a sample query:
Sample query from Querona
Now, let’s try to drop a table:
Access Management works in Querona
As expected, we get an Access Denied message. And voila! We have just secured an internal or external Apache Spark cluster using Windows Integrated Authentication provided by Querona.