Here we discuss and show ways to use data ingested by Azure Synapse to provide new insight into business processes and answer business questions. Then we discuss Apache Spark support along with PowerBI and Azure ML integrations.
As we discussed in the first article of this series, Azure Synapse Analytics enables us to build a complete data analytics solution from raw data to data visualizations — all while staying within the same tool. That includes all capabilities related to data warehousing, Big Data, Spark, data movement orchestration, security, and monitoring all in one place. It efficiently works with Power BI to visualize and analyze that data.
In the second article of this series, we explored how Azure Synapse Analytics supports ingesting and preparing data, saving time by avoiding custom ETL code.
Now, in this third and final installment, we examine how Azure Synapse Analytics helps data science and business intelligence (BI) professionals gain new insight using seamless tool integrations and guided experiences.
Power BI Integration
Azure Synapse Analytics integrates so tightly that we can launch Power BI directly from Synapse Studio. It is as simple as clicking New linked service then Connect to Power BI. Next, we connect to a Power BI workspace.
We do all this with the help of Pass-through Authentication (PTA), an Azure AD Connect feature enabling you to access different products without being prompted to log in and reenter credentials.
By selecting the Power BI workspace and clicking Create, we can easily link both workspaces without copying or moving data. Now, when we go to the Develop tab, we see that Power BI is one of the artifacts in the list. By expanding that entry, we see all datasets present in the Power BI workspace, and by further expanding it, we see all the reports in that workspace.
Now, let’s say we want to create a new virtual database called NYC Yellow and link it to a Power BI dataset. First, start a Power BI Desktop, and we will see all the databases within that workspace. Then, select one, click Continue to open that Power BI data source file, and connect to that dataset.
Connecting to Azure Synapse Analytics is similar to connecting to an SQL Server, so it feels like just another SQL connection.
For Power BI users and data analysts, Azure Data Explorer provides a query over data lake function. Customers can query historical data from Azure Data Lake Storage (ADLS) Gen2 and join it with the near-real-time data cached in Azure Data Explorer without ingesting the data into Azure Data Explorer.
Now we can use Power BI’s Power Query Navigator and decide whether we want to perform an import or a direct query over the data.
We can use Power BI to create a dataset and publish it to our Azure Synapse workspace. Then, we can start building a new report from that dataset in Synapse Studio. Then, when we open the report and make a change in Synapse Studio, we can go back to the Power BI workspace and see those changes reflected immediately.
Azure Synapse Analytics can query billions of data rows with fast results. However, we do not need to, and we should not work with detailed data all the time. We can work with materialized views to optimize our queries and automatically tune performance. As detailed data updates, the materialized views reflect the changes.
Azure ML Integrations
Azure Synapse Analytics gives us everything we need to ingest, manage, prepare, and enrich our data for analysis. It integrates many different Azure services, such as Power BI and Azure Machine Learning (ML).
Azure Data Lake Storage natively integrates with Azure Synapse Analytics to provide data lake storage for your data. Also, Azure Synapse Analytics can store data and manage data in tables inside SQL pools. Another option is to virtualize data, reading data directly from the source using Azure Synapse Link.
There are many ways to access data in Azure Synapse Analytics. No matter where our data is, Azure Synapse Analytics brings them all into one platform where we can analyze them together. For example, we can use Spark compute to process semi-unstructured big data stored in Parquet files in our data lake. What if we are also working with relational data stored in tables inside SQL pools (formerly called SQL Data Warehouse)? Azure Synapse Analytics brings these two types of data together. We can work with disparate data sources with a unified experience.
With Azure Synapse Studio, we can build our data integration pipelines, manage our data, monitor everything, and set up security. Additionally, we can work with this data using the SQL-based analytics runtimes in Azure Synapse Analytics.
Beyond dedicated SQL Pools, Azure Synapse Analytics enables us to work with serverless on-demand SQL pools and serverless Apache Spark pools to prepare our data for analytics.
Aggregating Different Data Source Styles
Azure Synapse Analytics enables us to integrate different services to create our solution in Azure. Accessing data is much simpler and getting value from our information is much faster — so much so that those services grant your pipelines seamless integration.
For example, our Azure Data Factory pipelines can automatically communicate with Spark pools or with Azure Data Lake Storage Gen2. All we have to do is provision an Azure Synapse workspace.
Machine Learning Capabilities in Azure Synapse Analytics
As we provision an Azure Synapse workspace, it covers machine learning capabilities out of the box, so we don’t need to link it to any other services like Azure ML or Cognitive Services. We already have data training and score capabilities natively built-in while using serverless Apache Spark pools in Azure Synapse Analytics. Also, we can use the predict keyword function to run predictions in T-SQL queries in dedicated SQL pools.
Azure Synapse Analytics provides options for audiences comfortable with code-free experiences. However, we can also perform many of the machine-learning-related capabilities through code. For example, we can use T-SQL, PySpark, Scala, or C# in Synapse notebooks. Additionally, notebooks provide guided user interface (UI) experiences within Synapse Studio by generating a code artifact whenever needed, giving users accelerated time-to-insight. Or, we can just go with a code-free experience.
Let’s explore some machine learning experiences available in Azure Synapse Analytics.
Model training with AutoML is available as a guided Ul experience. In Synapse Studio, we can use the same AutoML you use in your Azure ML workspaces. The difference is that this time we can execute on serverless Apache Spark pools inside of Azure Synapse Analytics.
Model scoring in SQL pools is another guided Ul experience, as we can deploy a model from Azure ML in Azure Synapse Analytics. For example, our data science teams may have trained some models, and we use the Azure machine learning model registry to track those models. Now we can access models from the Azure ML model registry and use T-SQL to score them in the SQL pool. Instead of moving the data out, running scoring, and moving the data back, we can score the model exactly where the data lives.
Then there is Cognitive Services, a guided Ul experience for data enrichment with Anomaly Detector and text analytics sentiment analysis. Let us say we open one of your Spark tables containing emails. Its unstructured, textual data is not suitable for SQL operations. However, it enables some interesting data analysis. We can use Synapse Studio to apply machine learning and run a sentiment analysis on those emails.
Azure Cognitive Services comes with a pre-trained model for sentiment analysis. We can use this model to enrich our data. After choosing English as the language and selecting the Comments column of the Spark table containing our emails, Synapse Studio generates code for us. It creates a new Synapse notebook containing code snippets in the PySpark (Python) language. All we have to do now is run the code cells, and Cognitive Services performs sentiment analysis on our emails, creating a new sentiment column in the results. We now have enriched data.
In this final installment of the Azure Synapse Analytics series, we have discussed how Azure Synapse Analytics enables us to provide new insight into business processes and answer business questions in a myriad of ways.
We have also discovered that Azure Synapse Analytics tightly integrates with Power BI through a linked service. This way, we can perform data analysis using Power BI without even leaving Synapse Studio, producing business intelligence to drive improvements and growth.
Lastly, we have explored Azure Synapse Analytics’ out-of-the-box machine learning capabilities with help from guided experiences for model training, model scoring, and Cognitive Services.
In previous articles, we explored how Azure Synapse Analytics enables data preparation and management and other activities by combining data integration, warehousing, and analytics in one service with plenty of integrations with other Azure tools. Azure Synapse Analytics saves time loading and preparing data for quick insight into the analytics driving our business so that you can take action ahead of your competitors.
Check out Microsoft’s Hands-on Training Series for Azure Synapse Analytics to learn more about using these integrated data tools to drive business intelligence and machine learning — right from a member of the Azure Synapse engineering team.
In this deep dive into Azure Synapse, you can learn to:
- Start your first Synapse workspace
- Build code-free ETL pipelines
- Natively connect to Power BI
- Connect and process streaming data
- Use serverless and dedicated query options