Wednesday, October 28, 2015

CTP 3.0 for SQL Server 2016 Availability

Today (Oct 28, 2015), Microsoft corporate VP Joseph Sirosh announced the latest update for SQL Server 2016: Community Technology Preview (CTP) 3.0 (see SQL Server 2016: Everything built-in). The download was released this week during the PASS Summit and is now available. You can find it here: SQL Server 2016 CTP Download.

Update, March 7, 2016: the first Release Candidate (RC0) is now available. You can find it here: SQL Server 2016 Release Candidate 0.

Tuesday, October 27, 2015

Setting up Hyper-V Server in a Workgroup

I wanted to set up and do some testing with Hyper-V Server, which has a published unlimited trial period; so effectively, it is free. Not a bad deal.

Unfortunately, Hyper-V Server is a command-line server, so you don't get the nice, easy-to-use Hyper-V tools that come with a full server installation. Fortunately, you can connect to the server remotely using the Hyper-V tools available in Windows 10 (and perhaps Windows 8, though I have not tested that). So my goal was to use the Hyper-V tools in Windows 10 as the client to connect to and manage my free/unlimited Hyper-V Server installation, set up in a workgroup (not a domain). Unfortunately, I found that Hyper-V Server really wants to be joined to a domain, so connecting required a few more steps.

I did find an extremely useful post on how to set up Hyper-V Server in a workgroup. You can find it here: Setting up Windows Hyper-V Server 2012 R2 in a workgroup [http://nearlydeaf.com/?p=1149]. I only had to add one more step (#8 below) to get the Hyper-V tools on Windows 10 to connect.

I followed it pretty closely; here is a summary of the steps:


  1. Name the Hyper-V Server with a useful name, such as: hypervserver
  2. Assign the Hyper-V Server a static IP, such as: 192.168.1.17
  3. Run  WinRM quickconfig  on the Hyper-V Server.
  4. Run the extended WinRM configuration commands below on the Hyper-V Server:

        winrm set winrm/config/client/auth '@{CredSSP="true"}'
        winrm set winrm/config/service/auth '@{CredSSP="true"}'
        winrm set winrm/config/service/auth '@{Basic="true"}'
  5. Run  netsh advfirewall set allprofiles state off  on the Hyper-V Server. Then update the hosts file on Windows 10: start Notepad as an administrator, open the hosts file (found in windows\system32\drivers\etc), and add an entry that points to the Hyper-V Server, such as:

        192.168.1.17  hypervserver
  6. On Windows 10, in PowerShell (run as administrator), run:

        Set-Item wsman:\localhost\Client\TrustedHosts hypervserver -Concatenate -Force

     This did not work the first time, so I ran winrm quickconfig and then re-ran the full command. ('hypervserver' is the name I gave the server, the same name used in the hosts file.)
  7. On Windows 10, in PowerShell (run as administrator), run:

        Enable-WSManCredSSP -Role Client -DelegateComputer hypervserver
  8. This final step was not listed in the post I found, but was required before I could connect to my Hyper-V Server from Windows 10. On Windows 10, run:

        winrm set winrm/config/client '@{TrustedHosts="hypervserver"}'




PolyBase Configuration for Cloudera

Cloudera is perhaps the biggest player in Hadoop, so it makes sense to understand what's needed to connect SQL Server 2016 to Cloudera. To get started, we downloaded the latest virtual server image from Cloudera, which for our purposes was 5.4.2.0. You can get a Cloudera QuickStart VM here.

Once we had Cloudera up and running, we could move on to configuring our server for Cloudera. The PolyBase documentation does not specifically list our version, Cloudera 5.4.2.0, so we chose the closest listing, which was Option 6: Cloudera 5.1 on Linux. With this, we need to update our server configuration accordingly:

sp_configure 'hadoop connectivity', 6;
reconfigure
 
One thing to keep in mind is that PolyBase can only connect to one Hadoop installation at a time. For a more detailed walk-through of how to set up PolyBase in SQL Server 2016, see our post: Setting Up PolyBase in SQL Server 2016.
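To confirm the setting took effect - a quick check, not from the original post - you can query sys.configurations. Note that a change to 'hadoop connectivity' also requires restarting SQL Server (and the PolyBase services) before it takes effect:

       -- verify the PolyBase 'hadoop connectivity' setting
       SELECT name, value, value_in_use
       FROM sys.configurations
       WHERE name = 'hadoop connectivity';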


Hadoop YARN - Locating the yarn.application.classpath


Cloudera 5.4.2.0 is a YARN-based Hadoop server, so we'll also need to locate the yarn.application.classpath value in Hadoop and then update SQL Server 2016 with that value. Fortunately, the Cloudera VM starts up with a typical CentOS UI, so locating the yarn-site.xml file is a lot easier than in Hortonworks. And you only need to do it once.

In Cloudera, the easiest way to find the file using the UI tools is with the File Browser. At the top left of the UI, locate the Applications tab, navigate to Applications | System Tools | File Browser, and open the File Browser.




Next, in the File Browser, use the search "binoculars" to locate yarn-site.xml. Then open the yarn-site.xml file in the /etc/hadoop/conf.empty folder.




Open the file and locate the yarn.application.classpath.




For my Cloudera installation, the yarn.application.classpath is:

       <property>
         <description>Classpath for typical applications.</description>
         <name>yarn.application.classpath</name>
         <value>
            $HADOOP_CONF_DIR,
            $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
            $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
            $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
            $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
         </value>
       </property>

Use this value to update the complementary yarn-site.xml file on your SQL Server 2016 installation. Typically, you can find it here:

         C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\Binn\Polybase\Hadoop\conf\

Our post on Setting up PolyBase for YARN in SQL Server 2016 has additional details on configuring SQL Server for a YARN server.  Once you have the server configured for Cloudera and YARN, you should be all set to connect and use Hadoop via the PolyBase engine. 


Monday, October 19, 2015

PolyBase - Inserts, Updates now supported (DML)

With SQL Server 2016 CTP 2.x, PolyBase connected to Hadoop is a great way to access all the data stored in your Hadoop installation using familiar T-SQL commands. And if you have been reading some of the latest commentary on NoSQL, your company may have the urge to start using Hadoop as its sole data warehouse and repository. All you have to do is migrate your data warehouse data, and with the PolyBase connection it should be fairly simple - right?

[ Take me directly to PolyBase - Insert data into a Hadoop Hue Directory, where we outline just how to do it.]

In the early CTP releases, INSERT was not supported. I have now tested it (see PolyBase - Insert data into a Hadoop Hue Directory), and you can now INSERT INTO an external table via PolyBase. Once you have an external table set up, the next step is to run an INSERT command.
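For context, here is a minimal sketch of what such an external table definition might look like - the LOCATION path, data source, and file format names are hypothetical placeholders, not from the original setup:

       CREATE EXTERNAL TABLE [dbo].InsertTest_1 (
           firstcol  nvarchar(100),
           secondcol nvarchar(100)
       )
       WITH (
           LOCATION    = '/user/hue/InsertTest_1',   -- hypothetical HDFS directory
           DATA_SOURCE = MyHadoopDataSource,         -- hypothetical external data source
           FILE_FORMAT = MyDelimitedTextFormat       -- hypothetical delimited text format
       );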

Note: The following was written before INSERT was fully supported.

Perhaps something like this:

       insert into [dbo].InsertTest_1 ( firstcol, secondcol )
       values ('firstcolvalue1','secondcolvalue1');

In the earlier CTP versions, after running the insert command, you were greeted with an error message similar to the following:

       Msg 46519, Level 16, State 16, Line 68
       DML Operations are not supported with external tables.

With SQL Server 2016 released, we decided to circle back and test the INSERT function. To keep it simple, we used the following command against a three-column, all-nvarchar table in Hadoop:


   insert into [dbo].AWDW_CSV_3String ( FullName, AddressLine1, DataSrc )
   values ( 'Ziggy Stardust','1 Moonshot Drive','Manual' );

The message returned was:

   Msg 46914, Level 16, State 1, Line 61
   INSERT into external table is disabled. Turn on the configuration
   option 'allow polybase export' to enable.


OK, update the configuration option.

   sp_configure 'allow polybase export', 1;
   reconfigure

Close, but we still have issues. PolyBase denies the existence of my table:

   Msg 7320, Level 16, State 102, Line 64
   Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11"
   for linked server "SQLNCLI11". EXTERNAL TABLE access failed because the
   specified path name 'hdfs://192.168.1.120:8020/user/hue/AWDW_CSV_3String.csv'
   does not exist. Enter a valid path and try again.



Switching to a Hive table, we were able to insert data successfully - but not initially. The full file location for our test Hive table was

     LOCATION='/apps/hive/warehouse/sample_08/sample_08'

This did not work. Using only the table's directory location (below) did allow us to insert new data:


     LOCATION='/apps/hive/warehouse/sample_08'



Ok, now take me directly to PolyBase - Insert data into a Hadoop Hue Directory, where we outline just how to do it.




Tuesday, October 13, 2015

PolyBase, Performance and Statistics

Performance is always a consideration with databases, and using PolyBase is no different. DBAs will create indexes, set up specialized indexed views, and create and update statistics. Unfortunately, external tables do not support most of these options.

                Indexes?          - No
                Indexed views?    - No
                Statistics?       - Yes


Statistics are currently the only option of the three. Using a test dataset on a Hortonworks 2.0 Hadoop system, I was able to increase performance by about 10%. This was a small dataset, so a large increase was not expected. In the future, I'll compare performance on a larger dataset.
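Creating statistics on an external table uses the ordinary T-SQL syntax. A minimal sketch, assuming a hypothetical external table dbo.MyExternalTable with a commonly filtered column:

       -- external table statistics are built with a full scan of the data
       CREATE STATISTICS stat_col1
       ON dbo.MyExternalTable (col1)
       WITH FULLSCAN;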

PolyBase and Views

Views can be created against PolyBase external tables using T-SQL. Unfortunately, an indexed view is not supported: once you include the option WITH SCHEMABINDING in your CREATE VIEW, you get an error:

         Msg 46518, Level 16, State 9, Procedure PolyBaseTableView, Line 115
         The option 'SCHEMABINDING' is not supported with external tables.
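For reference, a plain (non-schema-bound) view creates without complaint - a minimal sketch, reusing the three-column external table from the INSERT tests above:

         CREATE VIEW dbo.PolyBaseTableView
         AS
         SELECT FullName, AddressLine1, DataSrc
         FROM [dbo].AWDW_CSV_3String;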


I expect there will be situations where being able to create an indexed view against an external PolyBase table would be ideal. Conversely, the datasets loaded into Hadoop or Azure could be so large that creating a local index on them could exceed the capacity of the database server. Still, I wish we had the option - and let the DBA make the determination.

Monday, October 5, 2015

Microsoft Overhauls Reporting Services – finally!

Reporting Services has been one of the long-running disappointments with SQL Server. When it was first introduced back in 2003, it provided a much-needed reporting platform to go along with SQL Server. But after the updated release as part of SQL Server 2005, the Reporting Services platform languished, stuck in a world that still used an old version of Internet Explorer. I remember all the hacks I had to do just to get Reporting Services to work with a browser other than IE. It was ugly.

With the latest community technology preview of SQL Server 2016, Microsoft has radically overhauled Reporting Services. Support for HTML5 - check. Support for mobile reports - check, using the nicely done Power BI mobile app, so there is no need to create a second app. Support for Power BI Desktop .pbix reports - promised. According to the SSRS blog roadmap and the Power BI ideas forum, SSRS will gain the ability to render Power BI Desktop reports. That would be great.

The Reporting Services stand-alone development tool has also undergone a facelift. It now has the look and feel of Power BI 2.0, allowing you to publish directly to a report server or just create a dataset to be used by the new Mobile Report Builder - Visual Studio not required. Still, the revised SSRS Report Builder is much like the version released years ago. (Robert Sheldon's write-up of how to create a basic report in the previous Report Builder 3.0 is still useful.) But the most significant change is the output: SSRS 2016 reports support HTML5, allowing support across all modern browsers.

Printing with Reporting Services was another drawback: the report viewer toolbar required ActiveX. It has finally been ditched, and printing now supports PDF.

So, the updated stand-alone Report Builder, clearly targeted at analysts and power users, allows staff to create reports and then publish them to a report server - all without the need for an IT developer. And HTML5 support allows those reports to be viewed in just about any modern browser. You can find the current listing of enhancements to Reporting Services at the Books Online site: What's New in Reporting Services.

Links:

New, stand-alone Reporting Services Report Builder, CTP 2.4




Friday, October 2, 2015

PolyBase – error connecting to Hadoop file

Update - June 14, 2016: Microsoft published a PolyBase-focused white paper for Azure SQL Data Warehouse, which supports PolyBase. It outlines - at least for Azure DW - the file types supported by PolyBase: Azure DW PolyBase only supports UTF-8 files. This could explain some of the failed load issues experienced.

At least with the current SQL Server 2016 CTP 2.3, PolyBase errors out when trying to connect to certain text files in Hortonworks Hadoop. You typically get an error message similar to the following:

Msg 107090, Level 16, State 1, Line 79
Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
OLE DB provider "SQLNCLI11" for linked server "(null)" returned message "Unspecified error".
Msg 7421, Level 16, State 2, Line 79
Cannot fetch the rowset from OLE DB provider "SQLNCLI11" for linked server "(null)".

I’ve investigated what might be causing these errors, trying to discover whether certain traits of the files trigger them; my findings so far are below.
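One aside on the first message: the reject threshold it mentions is set when the external table is created, via REJECT_TYPE and REJECT_VALUE. A minimal sketch (the object names are hypothetical) that tolerates a few malformed rows instead of aborting the query:

   CREATE EXTERNAL TABLE dbo.MyTolerantTable (
       col1 nvarchar(255),
       col2 nvarchar(255)
   )
   WITH (
       LOCATION     = '/user/hue/mydata',      -- hypothetical HDFS path
       DATA_SOURCE  = MyHadoopDataSource,      -- hypothetical external data source
       FILE_FORMAT  = MyDelimitedTextFormat,   -- hypothetical file format
       REJECT_TYPE  = VALUE,
       REJECT_VALUE = 10                       -- allow up to 10 rejected rows
   );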



Could not find or load main class ...

Dennes found a possible solution on Hortonworks for the following error:

      Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

This may also be beneficial for other issues, such as "Could not obtain block".
       
Answer by Montrial Harrell · Jun 11 at 04:19 AM
Got it!! I added the below property to the mapred-site.xml file and the query ran successfully.

      <property>
        <name>mapreduce.app-submission.cross-platform</name>
        <value>True</value>
      </property>
 

(This was provided by Montrial Harrell at the Yarn ClassPath Value for Polybase Pushdown thread on the Hortonworks Community Connection - review the entire post, as there may be other suggestions that might be of assistance.)

...cannot be cast...

If your error message includes the phrase "...cannot be cast...", see the post PolyBase wants a strongly typed ORC Hive Table.

Could not obtain block:

Dennes found that the error message "Could not obtain block" usually means that port 50010 (the data node port) is not responding to SQL Server. See also the adjustments noted above for "Could not find or load main class".

EOF does not match expected format:

From my investigation, if the end of the file has a different format from that declared in the CREATE EXTERNAL TABLE command - if, for example, the file has a trailing line with a row count or other metadata - you will get an error.

Data Type Mismatch

If the declared data type does not match the incoming data type, you'll get an error. In setting up a new table, it might be useful to first declare all columns as varchar or nvarchar to minimize any potential errors. Once you establish a working connection, create a new table with correctly typed columns. Loading dates into PolyBase follows a very specific protocol, and not all date formats are supported.
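Once the all-string staging table connects cleanly, a correctly typed local copy can be built with explicit casts. A minimal sketch - the table and column names here are hypothetical:

   SELECT CAST(OrderID   AS int)            AS OrderID,
          CAST(OrderDate AS date)           AS OrderDate,
          CAST(Amount    AS decimal(10, 2)) AS Amount
   INTO dbo.Orders_Typed                     -- new, correctly typed local table
   FROM dbo.Stage_AllStrings;                -- all-nvarchar external staging table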

Data Size insufficient

If the size of the declared data type is not sufficiently large for the incoming data, you'll get an error. An example might be where you declare a column as varchar(10), but the maximum size found in the data exceeds 10 characters - there is no auto-truncate. Again, it might be useful to first declare all columns as generously sized varchar or nvarchar to minimize potential errors.

Line Delimiter

At least so far, I’ve been able to connect to files using any of the three ‘standard’ approaches to establishing the EOL (PolyBase does not expose a row-terminator option - see the sketch after this list):

1)  CR/LF   0D 0A   (DOS/Windows)
2)  LF      0A      (UNIX)
3)  CR      0D      (Mac)
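The parsing options PolyBase does expose are declared on the external file format object rather than per EOL style. A minimal sketch (the format name is hypothetical):

   CREATE EXTERNAL FILE FORMAT MyDelimitedTextFormat   -- hypothetical name
   WITH (
       FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS (
           FIELD_TERMINATOR = ',',      -- column delimiter
           USE_TYPE_DEFAULT = TRUE      -- substitute type defaults for missing values
       )
   );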

File Format

UTF-8 seems to be the most common format, and PolyBase works with UTF-8. I have had trouble with files in the older ANSI/ASCII format. My single attempt with UTF-16 did not work, though that may not have been due to the UTF-16 format itself. As noted above, Azure PolyBase only supports UTF-8.

Other – still to be discovered problems


Several Hadoop files I have are still not connecting through PolyBase, and I have not yet resolved what the problem is. As I resolve them, I’ll add those discoveries here.