This article explains the significance and features of PolyBase, a newly added feature of SQL Server, and its role in managing Big Data.
PolyBase is a newly added technology for accessing data stored outside the database, which makes it possible to run T-SQL queries against Hadoop and return the results to SQL Server. It has brought Hadoop and SQL Server closer together and has become a powerful tool. Hadoop, an open-source framework for the distributed processing of big data, was first released publicly in 2006. This article sheds more light on what PolyBase does and how it helps users work with Big Data.
Handling Big Data is a complicated subject and usually requires sound knowledge of specialized languages and tools. PolyBase makes it easier because it is driven through the familiar T-SQL language and needs no additional software installed in the Hadoop environment. The interface is simple and transparent, and the user doesn’t need Hadoop expertise to operate PolyBase.
PolyBase can query data stored in Hadoop or Azure Blob Storage, and it can also import data from these sources. It is a good fit when the user needs to export and import data or run ad-hoc queries against external tables. It can also be used with Microsoft’s Business Intelligence (BI) tools for better data analysis.
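To make the idea of an external table concrete, here is a minimal sketch in Python that assembles the T-SQL statements typically used to expose a Hadoop folder to SQL Server: an external data source, an external file format, and the external table itself. The function name, the object names (HadoopCluster, CsvFormat, dbo.SensorReadings), the address and the column list are all hypothetical, and a comma-delimited text layout is assumed.

```python
def polybase_setup_statements(source, hdfs_uri, table, columns, hdfs_path,
                              file_format="CsvFormat"):
    """Return the T-SQL statements that expose an HDFS folder as an external table.

    Values are interpolated as-is, so pass trusted names only.
    """
    column_list = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return [
        # 1. Where the external data lives (the Hadoop name node).
        f"CREATE EXTERNAL DATA SOURCE {source}\n"
        f"WITH (TYPE = HADOOP, LOCATION = '{hdfs_uri}');",
        # 2. How the files are laid out (comma-delimited text is assumed here).
        f"CREATE EXTERNAL FILE FORMAT {file_format}\n"
        "WITH (FORMAT_TYPE = DELIMITEDTEXT,\n"
        "      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));",
        # 3. The external table: SQL Server keeps only the metadata.
        f"CREATE EXTERNAL TABLE {table} (\n    {column_list}\n)\n"
        f"WITH (LOCATION = '{hdfs_path}',\n"
        f"      DATA_SOURCE = {source},\n"
        f"      FILE_FORMAT = {file_format});",
    ]


statements = polybase_setup_statements(
    source="HadoopCluster",              # hypothetical data source name
    hdfs_uri="hdfs://10.0.0.5:8020",     # hypothetical name node address
    table="dbo.SensorReadings",          # hypothetical external table
    columns=[("SensorId", "INT"), ("Reading", "FLOAT")],
    hdfs_path="/data/sensors/",
)
# Once the table exists, an ad-hoc query looks like any other T-SQL query.
statements.append("SELECT TOP 10 * FROM dbo.SensorReadings;")
print("\n\n".join(statements))
```

The printed statements would be run in order against a SQL Server instance with PolyBase installed and configured for Hadoop connectivity; the script itself only builds the text and does not connect to anything.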
The best feature of PolyBase is that the data doesn’t need SQL Server storage, because it remains on external storage. PolyBase can directly reach HDFS (Hadoop Distributed File System), which can store Big Data in exabytes, and return data to SQL Server. This cost-effective storage is the winning feature of PolyBase.
PolyBase handles all sorts of queries efficiently, including those that are run only once. Options such as scale-out groups enable parallel data transfer between Hadoop and SQL Server and provide additional compute resources for operating on the external data.
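As a rough illustration of how a scale-out group is formed, the Python sketch below builds the call to the sp_polybase_join_group system procedure that is run on a compute node so it joins the group of a head node. The helper name and the machine name SQLHEAD01 are hypothetical; the default instance name and port 16450 are assumed defaults that should be checked against the actual installation.

```python
def join_scale_out_group(head_node, instance_name="MSSQLSERVER", dms_port=16450):
    """Build the T-SQL that a compute node runs to join a head node's group.

    Arguments are passed positionally: head node address, data movement
    service control channel port, head node SQL Server instance name.
    """
    return (f"EXEC sp_polybase_join_group "
            f"N'{head_node}', {dms_port}, N'{instance_name}';")


# SQLHEAD01 is a hypothetical head node machine name.
print(join_scale_out_group("SQLHEAD01"))
```

After the procedure has run, the PolyBase services on the compute node typically need to be restarted before the node starts taking part in queries.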
If the user wishes to exercise more control over PolyBase performance, this can be done by enabling full predicate pushdown. In this mode PolyBase executes a MapReduce application using YARN on Hadoop, so the filtering work is done on the Hadoop cluster rather than in SQL Server. Long-running jobs benefit the most, and data movement is kept to a minimum.
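Pushdown can also be requested or suppressed for a single query through a T-SQL query hint. The following Python sketch appends the FORCE EXTERNALPUSHDOWN or DISABLE EXTERNALPUSHDOWN hint to a query string; the helper name and the table dbo.SensorReadings are hypothetical. It assumes the external data source was created with a RESOURCE_MANAGER_LOCATION, which pushdown to Hadoop requires.

```python
PUSHDOWN_HINTS = {
    True: "OPTION (FORCE EXTERNALPUSHDOWN)",     # compute on the Hadoop cluster
    False: "OPTION (DISABLE EXTERNALPUSHDOWN)",  # compute inside SQL Server
}


def with_pushdown(query, force):
    """Append a pushdown query hint to a T-SQL query against an external table."""
    body = query.strip().rstrip(";")  # drop any trailing semicolon first
    return f"{body}\n{PUSHDOWN_HINTS[force]};"


# dbo.SensorReadings is a hypothetical external table.
query = "SELECT SensorId, AVG(Reading) FROM dbo.SensorReadings GROUP BY SensorId;"
print(with_pushdown(query, force=True))
```

Forcing pushdown tends to pay off when the filter or aggregate discards most of the rows; for small result sets, the overhead of starting a MapReduce job can outweigh the saving in data movement.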
Overall, PolyBase performs well and is cost effective. It also offers scaling options such as 2x, 4x and 8x for faster processing of Azure SQL data without any downtime.
What about Security?
Apart from performance and compatibility, data security is one of the major concerns of any organization.
PolyBase is secure because its queries read only the data that the logged-in user has permission to see. Data held in SQL Server can additionally be protected with TDE (Transparent Data Encryption), which encrypts the database files at rest so that they cannot be read by unauthorized users.
PolyBase supports on-premises as well as cloud-based unstructured data-storage platforms. It currently supports two main Hadoop distributions: HDP (Hortonworks Data Platform) and CDH (Cloudera Distributed Hadoop).
The supported cloud-based solutions include Azure HDInsight, Azure Data Lake and Azure Blob Storage. However, PolyBase does not yet support Cloudera Encrypted Zones.
Despite all its sophistication, even the latest versions of SQL Server remain vulnerable to data corruption. It is therefore wise to keep a tool at hand that can recover MDF database files.
Victor Simon is a data recovery expert at DataNumen, Inc., which is the world leader in data recovery technologies, including mdb repair and sql recovery software products. For more information, visit https://www.datanumen.com/