Apache Spark 2.3.0 is now available for production use on the managed big data service Azure HDInsight. Ranging from bug fixes (more than 1400 tickets were fixed in this release) to new experimental features, Apache Spark 2.3.0 brings advancements and polish to all areas of its unified data platform.
Today we're announcing updates to Apache Spark, Apache Kafka, Machine Learning Services, and Azure Data Lake Storage Gen2, as well as enhancements to the Enterprise Security Package. These new capabilities will continue to drive savings for many customers. In addition, Microsoft is deepening its commitment to the Apache Hadoop ecosystem and has extended its partnership with Hortonworks to bring the best of Apache Hadoop and open-source big data analytics to the cloud.
We are happy to announce that HDInsight Tools for VSCode now supports argparse and accepts parameter-based PySpark job submission. We have also enabled the tools to support Spark 2.2 for PySpark authoring and job submission.
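As a rough illustration of what a parameter-based PySpark script might look like, here is a minimal sketch that uses only the standard-library argparse module. The function name `parse_job_args` and the `--input`, `--output`, and `--min-count` parameters are hypothetical and chosen for illustration; they are not taken from the tools' documentation, and the Spark logic itself is omitted.

```python
import argparse


def parse_job_args(argv=None):
    """Parse parameters for a hypothetical PySpark job.

    All argument names here are illustrative assumptions.
    When argv is None, argparse reads the process command line.
    """
    parser = argparse.ArgumentParser(description="Hypothetical word-count job")
    parser.add_argument("--input", required=True, help="Path to the input data")
    parser.add_argument("--output", required=True, help="Path for the results")
    parser.add_argument(
        "--min-count",
        type=int,
        default=1,
        help="Keep only words that occur at least this many times",
    )
    return parser.parse_args(argv)


# Example: parse an explicit argument list instead of the real command line.
example_args = parse_job_args(
    ["--input", "in.txt", "--output", "out", "--min-count", "3"]
)
```

In a real job, the script would call `parse_job_args()` with no arguments at startup and pass the resulting values to its Spark code.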
To provide more authentication options, Azure Toolkit for Eclipse and IntelliJ now supports integration with HDInsight clusters through Ambari for job submission, cluster resource browsing, and storage file navigation.
To provide more authentication options, HDInsight Tools for VSCode can now connect to an HDInsight cluster through Ambari for job submission. You can easily link (HDInsight: Link a cluster) or unlink (HDInsight: Unlink a cluster) a normal cluster by using an Ambari-managed username and password, which is independent of your Azure sign-in process. The Ambari connection applies to Spark and Hive clusters in all the Azure environments that host HDInsight services.
We are excited to announce the general availability of the StorSimple Data Manager. This feature allows you to transform data from StorSimple format into the native format in Azure blobs or Azure Files. Once your data is transformed, you can use services like Azure Media Services, Azure Machine Learning, HDInsight, Azure Search, and more.
We are excited to announce the public preview of integration with Power BI DirectQuery, which allows you to create dynamic reports based on data and metrics you already have on your Interactive Query clusters in Azure Blob storage or Azure Data Lake Store. You can now also build visualizations on your entire data set much faster, using the most recent data.