Tuesday, August 30, 2016

Free Power BI visuals

OKVis is out to enhance and extend custom visuals and tools for Power BI. And they are offering it free of charge - under the listed MIT License. When we last checked, they had six (6) custom visuals listed:

  • synoptic panel
  • smart filter
  • sparkline
  • bullet chart
  • card with states
  • candlestick

If you are using Power BI, be sure to take a look. And give a thumbs up the the sponsor of OKVis - sqlbi (http://www.sqlbi.com/). You can find both the visuals and the tools here:  http://okviz.com/.

ROLAP - now a viable option

The time for ROLAP has arrived:


Most operational data stores (ODS) and data warehouses (DW) reside on a rowstore relational database. It's a great tool for data organization, but rowstores and their related indexes don't mesh with the demanding needs of analytics, business intelligence, reporting and self-service oriented cubes. So it was not uncommon to push selected data down from the data warehouse into customized MOLAP cubes. MOLAP cubes offered business analysts great self-service data mobility, usually using a connected EXCEL pivot table, or more expensive tools such as Tableau.

Ideal for the analyst, but pushing data down from a data warehouse or operational data store to a MOLAP cube was lots of added work, risk and most importantly, loss of data timeliness. Once the data warehouse or ODS was updated, yet another process had to be started to update and process the MOLAP cube. Once a day updates were not uncommon for MOLAP cubes. Fast data it was not! And if you have reviewed the literature on large MOLAP cubes, you've found that they can get unwieldy.

MDX was another road block. Using a BI tool such as Excel, analysts were shielded from the sometimes complex MDX commands. But problems arose when analysts wanted to do custom queries against the MOLAP cube. Queries that would be far easier against a traditional data warehouse using SQL.

Columnar databases to the rescue.


With its release of SQL Server 2016, Microsoft has a full featured relational database that can be run as a columnar database. By simply upgrading to SQL Server 2016, and converting your existing rowstore indexes into a columnstore, your data warehouse or ODS can now support a ROLAP cube. Efficient and fast. And no more MOLAP processing.  Here at Realized Design, we have done some limited testing using ROLAP, and have been pleased with the results. For the relational backend, we have used both SQL Server 2016 and the specialized analytic/columnar database EXASOL. In both cases, ROLAP proved comparable to a MOLAP design. But without all the extra processing. And thus far more elegant. You can find a detailed analysis of our work so far with EXASOL here: EXASOL review at RealizedDesign.

But don't just take our word for it. Here are two additional links where individuals have successfully used ROLAP cubes against very large databases. And they have been please with the results.

In 2014, Karen Gulati did exploratory ROLAP work using SQL Servers new column store indexes. See: Harnessing the Power of both worlds.

Also in 2014, Hilmar Buchta did some work with SSAS ROLAP against an MSFT Parallel Data Warehouse  See:  Parallel Data Warehouse (PDW) and ROLAP - Hilmar Buchta. 


So, just when you thought SSAS multi-dimensional was dead, its back!

Next Steps:  Build an SSAS ROLAP Cube using EXASOL

Monday, August 29, 2016

PolyBase Error ...app.MRAppMaster not found

On the HortonWorks community site, we noticed an unusual error relating to the PolyBase pushdown capability and the resulting error:


Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
 

We have not seen this error, but wanted to include it in our listing of identified PolyBase errors. The user later noted that their solution required a change to the mapred-site.xml file, as follows:

       
 <property> <name>mapreduce.app-submission.cross-platform</name> <value>True</value> </property>
       

You can find the full user thread here:  Yarn ClassPath Value for Polybase Pushdown

Wednesday, August 17, 2016

EnableRolapDistinctCountOnDataSource

We are digging into SSAS ROLAP, and based on our research, the setting that enables the database engine to manage distinct counts should be updated to 1. To adjust this feature, you have to modify the ini file and then restart the SSAS engine.

Why we made the change


Our research found two key postings, both respected SQL Server professionals:

In 2014, Karen Gulati did exploratory ROLAP work using SQL Servers new column store indexes. One key finding was to update the EnableRolapDistinctCountOnDataSource setting to 1. See: Harnessing the Power of both worlds.

Also in 2014, Hilmar Buchta did some work with SSAS ROLAP against an MSFT Parallel Data Warehouse, and determined to update the setting.
See:  Parallel Data Warehouse (PDW) and ROLAP - Hilmar Buchta.

Summary

  1. Locate msmdsrv.ini
  2. Make a backup of the original, unaltered ini file. (backups are our friend)
  3. Locate <EnableRolapDistinctCountOnDataSource>
  4. Change the setting to 1
  5. Save the file and restart SSAS

Details

msmdsrv.ini

Locate msmdsrv.ini and make a backup. Typically it is located in the folder: 
C:\Program Files\Microsoft SQL Server\MSASxx.MSSQLSERVER\OLAP\Config
(where xx is the version)

Make the Change

Locate the entry for EnableRolapDistinctCountOnDataSource, make the change and save the file.
With the change made, restart the SSAS service.

Before
       
    <EnableRolapDistinctCountOnDataSource>0</EnableRolapDistinctCountOnDataSource>

After
       
   <EnableRolapDistinctCountOnDataSource>1</EnableRolapDistinctCountOnDataSource>

FIX: SSAS 2012 to back-end SQL Server 2014

We are not using this configuration, but since we did stumble upon it, we just wanted to make a note of it in case any readers are using this configuration.



Tuesday, August 16, 2016

Free Power BI eBook Manual

Microsoft Press is offering a free ebook on Power BI. And at 176 pages, it is not a lightweight book. Starting off with an introduction that covers the basics, it goes deeper into how to share the dashboard, refreshing the data, using Power BI Desktop and building a data model.  It called Introducing Microsoft Power BI and its worth downloading and using as a daily reference. You can find it here:  Free ebook: Introducing Microsoft Power BI.

Monday, August 15, 2016

ROLAP Polling Query - use a change tracking table

Setting up the polling query for our real-time ROLAP cube against an EXASOL database, we used an easy, but very bad practice approach.  We set our polling query directly against a fact table. Nice and easy for testing and proof of concept. Production? don't consider using it.

Polling a count against a small or medium size fact table might won't create much of a problem. When the dataset is huge it might take more resources.  So, when the data set is really large, and our goal is very near real time, why not poll a very small table.

Data Change Table

Let's call it TrackDataChange.  A minimalist version will have just a few columns:  the source system that loaded the data, the load date and a primary key ID. The only difficult part is you have to configure your ETL process(es) to insert a single row to your new TrackDataChange table. The new record simply marks that the ETL process has completed.


With the TrackDataChange table in place, and your ETL processes configured to insert a row after inserting new fact table table, your polling query will look like this:

        SELECT COUNT(*) FROM DWTEST.TRACKDATACHANGE;

You'll probably want more than just the basic columns. The key point is to run SSAS's polling query against an independent table dedicated to tracking fact table changes.

Pros

the polling query avoids data tables used for data analysis
the TrackDataChange table is narrow and holds few records
the TrackDataChange table is quick to query
the TrackDataChange table is quick to insert into

Cons

the ETL process is marginally more complex

Thursday, August 11, 2016

Database Lifecycle Management

On its website, Redgate has posted an overview on Database Lifecycle Management.  It's a real problem, and one that has note been completely solved.  Application code can be ripped out, and replaced wholesale.  Not so with the database. While there are bits of what we might call code, the database is really the data, with a small bit of code. But once setup, it's live.  There are no cut and paste approaches.  Changes have to be well understood, and implemented with caution.  Redgate's whitepaper can be found here:  Solving the database deployment problem with DLM.

They also have a free DLM tool, called DLM Dashboard. while it is part of the full SQL Toolbelt, you can download just the DLM Dashboard for free.  Redgate is a great operation, putting a lot of effort into supporting SQL Server, including its sponsorship of http://www.sqlservercentral.com/.
You can find the DLM Dashboard here:  http://www.red-gate.com/products/dlm/dlm-dashboard/.

Book: R Programming for Data Science

Author and teacher Roger D. Peng has published R Programming for Data Science. You can find it here at Leanpub/rprogramming.  Mr. Peng has been using and teaching R since 1998 (almost 20 years) and his book provides not just a good book on R, but also thoughtful insight into just why R works the way it does, and how to take advantage of R.

Videos

OK, need another reason to consider getting this book?  All of the sections and chapters have embedded links to YouTube videos. 

Monday, August 8, 2016

Visio Shapes (vssx) on TechNet

You can find a long list of Visio shapes, symbols and icons on TechNet for Visio 2016/2013. It's called the Integration Stencils Pack and you can find it here:  Microsoft Integration Stencils Pack for Visio 2016/2013.

Using Power BI to query EXASOL via ROLAP

You're using Power BI against an EXASOL analytic database. Unfortunately, the only connection currently supported uses ODBC (as of August, 2106). ODBC is a stable connector, but is generally considered to be slow. A native connector would be ideal.

Update July 2017:  GitHub has a Power BI Connector to Exasol project. it is still in beta, and it is NOT an official Exasol supported project. But, it was created by Thomas Bestfleisch, an Exasol Solution Engineer. You can find it here: GitHub - Power BI to EXASOL project.

Power BI does have native connectors to other systems, so might an EXASOL connector be in the pipeline?  Going to the Power BI Ideas section, where new features are requested and voted on, you do find a posted suggestion to create an EXASOL Direct Query Connector. (Power BI ideas - EXASOL Direct Query Connector). Unfortunately, it only has a nominal number of votes. Fortunately, you do have an alternative.

Virtual Data Mart - using ROLAP




EXASOL has an OLAP Connector for SSAS that allows Microsoft shops to create virtual data marts. Using SSAS's real-time ROLAP protocol, you can setup a virtual data mart against an EXASOL analytic database. SSAS's ROLAP provides an intermediate bridge between Power BI and your EXASOL database. This has multiple benefits

  • Curate analytic data into targeted virtual analytic cubes (or single if need be)
  • Add a layer of security, when needed
  • Push processing down to the analytics database engine

We've already taken a look at SSAS ROLAP cubes using EXASOL.  You can find the details here:  Build an SSAS ROLAP Cube using EXASOL.




Friday, August 5, 2016

Partition Settings for our SSAS ROLAP cube using EXASOL

Here are the partition settings we used to set up our SSAS Real-time ROLAP cube using EXASOL as the relational database backend.  Read the full report here:  Build an SSAS ROLAP Cube using EXASOL.

1:  Using Standard settings, adjust the slider to Real-time ROLAP.


2: Under Options, on the General tab, we used the default settings, as listed below:


3: Under Options, on the Notification tab, we made the following adjustments:

  • Scheduled polling - select this option
  • Polling interval     - we set this to one (1) minute.
    You are after a balance between real time, and not overloading the database with excessive queries - there is no correct answer. And, this can be changed easily. See below.
  • Polling query - here we simply used a count(*).
    For our example it was:    SELECT COUNT(*) FROM DWTEST.FACTTESTSALES;
    against the fact table on the relational database. Note:  for production, run the polling query against a track data change table - see ROLAP Polling Query - use a change tracking table.




Making Live Changes to our ROLAP Partition

Once in production, you may decided to change the polling interval, or some other adjustment to your ROLAP partition. You can do that within SQL Server Management Studio.  ( make sure you are running the most current version of SSMS, which is a standalone tool - SSMS - Now a stand-alone, separate tool from SQL Server)

To make changes to the partition, follow these steps:

In SQL Server Management Studio
  1. connect to the Analysis Server
  2. locate the target partition
  3. right click - properties
  4. Proactive Caching - select on the left menu bar
  5. Options Button - select
You are ready to make changes to your partition.

 1:  Use SSMS to open the partition properties:


2: Select the Proactive Caching section, to get access to your ROLAP settings.



See our full report on building an SSAS ROLAP cube with EXASOL here:
Build an SSAS ROLAP Cube using EXASOL.





Thursday, August 4, 2016

Build an SSAS ROLAP Cube using EXASOL

For this post, we are building a real-time ROLAP cube using EXASOL.  For us, the concept of ROLAP, and the goal of a real time multi-dimensional data warehouse provided an ideal design. Use an analytics &  columnar based relational database for both the structured data warehouse and as the back end to our ROLAP cube.

As technology advances, real-time, or near real-time, analysis is becoming increasingly important. And while Hadoop, Big Data, and Data Science have captured most of the attention - cubes, or more specifically, multi-dimensional cubes are still a great tool. Upgrade the cube to real-time using ROLAP and it just gets better. On their website, EXASOL indicates support for use as the database backend to an Analysis Services (SSAS) Real-time ROLAP cube. We decided to take them up on it.

Overall, the process was straight forward.  First, implement a traditional star schema data warehouse in EXASOL. Load the data warehouse with SSIS, plus setup some test incremental data adds. Next, use SQL Server Data Tools to design and deploy a ROLAP cube. And finally, use EXCEL to connect with and test out our cube.

What we are looking for in this process:


  • Ease of data population, updates and data maintenance- using SSIS
  • Design and configuration of SSAS ROLAP cube using EXASOL
  • Usability - the Analyst experience


Data population, updates and data maintenance

Our primary focus is the ability to use an ETL tool to easily add supplemental records to the fact table. To simulate a near real-time data warehouse environment. And once added to the database, are the new records visible in our ROLAP cube? 

EXASOL supports both the legacy ODBC protocol as well as an ADO.NET connection. With both of these protocols, we were able to easily create and run SSIS ETL packages from our source data into the EXASOL data warehouse. For this exercise, we used the SQL Server Data Tools 2016 for Visual Studio 2015, without any problems. The version of the EXASOL ADO.NET Data Provider was for version 5.0.15.

As expected, using SSIS data loading preformed well, giving us the confidence that as a data warehouse, EXASOL could be easily maintained from a data management perspective. See - EXASOL - Using SSIS to load our Data Warehouse.

 Design and configuration of ROLAP cubes


Here we had our first speed bump.  The current version of SSDT 2016 for Visual Studio 2015 had multiple issues and would not properly connect to EXASOL  to create a new SSAS cube - so we pulled up an older version (1).  Using SQL Server Data Tools 2014 for Visual Studio 2013 - for an analysis services multi-dimensional project, we were able to easily setup a new Data Source connection to our EXASOL data warehouse using the ADO.NET data provider. Once connected, we could easily create a new Data Source View.

With our data source view in place and adjusted to our satisfaction, we could start cube design. We were pleased that the overall process flowed seamlessly. With no special adjustments required to support EXASOL. Since ours was a ROLAP cube, rather than a more common MOLAP, partitions required more attention.

(1) I have been informed by Mathias Golombek, CTO, that they are working on an updated ADO.NET driver to resolve these issues.

Real-time ROLAP, with Options

In Partition Storage Settings, we chose the standard Real-time ROLAP setting. Our only adjustment beyond this was to set up the Scheduled Polling settings in the Notifications section. Here, the polling interval was set to one (1) minute, with the polling query set to do a count against our fact table. One minute is good for near real time.  If needed, you can shorten the polling interval granularity to a matter of seconds. With our interval of one minute, our scheduled polling worked as expected. After running an incremental fact table update, the latest records were visible in the ROLAP cube after the interval wait time. For more detail on the partition settings used see: Partition Settings for our SSAS ROLAP cube using EXASOL.

Usability

Every analyst I know uses Excel. And as a free form tool, it's great. It also connects to SSAS cubes, so it makes a good tool to test a user's experience. Sure Tableau might be better, but at a very high cost.

First off, the overall analyst experience was good and really no different than if the tool was connected to an SSAS MOLAP cube. For this test we are working with EXASOL's community edition VM, which limits our configuration to a single node.  Our single node VM configuration has four (4) virtual 3.07GHz Xeon processors and 15 GB of memory running on a dedicated spindle. Our fact table has slightly more than 11 million records, plus five dimension tables.  While 11 million records is on the smaller side for a data warehouse, it's not an unreasonable size for medium sized companies and divisions.  I know quite a few companies that would love to have over a million sales transactions a year.

Warming the Indexes

Looking over the user manual, there are no discussions on setting up indexes. And that is by design. EXASOL manages all indexes and optimizations internally. As users query the database, the optimizer determines what indexes and statistics it needs. And we found this out with our single node VM. As we simulated an analyst querying the database via Excel, the response improved rapidly. Where a first query might take several seconds, the follow on queries dropped to milliseconds. Essentially, user demand drove the optimizer.

But I need my hands on the Optimizer control

I understand. DBA's need to have the ability to make adjustments. But since most of you work with SQL Server, you already benefit from some internal optimization features. That's why there are fewer controls compared to Oracle. It's not that they don't exist. It's just that behind the scenes, a few very bright database guru's worked out algorithms that respond in real time. That's progress. A 1960's Jaguar E-type needs a full time mechanic. It is a great museum piece, but your daily car is probably more modern.  With both cars and databases, less is often more. 

Summary

With advancements in columnar relational databases, ROLAP should be back on the list of tools that support day to day operational business intelligence, analytics and reporting.



Tuesday, August 2, 2016

EXASOL - Using SSIS to load our Data Warehouse

We're using SSIS to populate an EXASOL data warehouse. In our case, the SSIS design is being done using the 2016 version of SQL Server Data Tools - for Visual Studio 2015 ( download the latest SSDT's here: Download SQL Server Data Tools (SSDT)).

Install the EXASOL ADO.NET Driver

If you have not yet installed the EXASOL ADO.NET driver, download and install it. Find the Download Section here:  EXASOL Download Section. These are ordered by version, so make a note of the version you just downloaded. We are using version 5.0.15.

ODBC:
EXASOL also has an ODBC driver, both 32 bit and 64 bit. You can opt to use this, rather than ADO.NET. Overall, the process is similar.

Data Types

The data types in EXASOL have distinct differences from those in SQL Server - especially dates. So before you get started, you'll want to map out both the source and target data types, along with any transformations required. For a comparison of EXASOL and SQL Server data types, along with methods on transforming data see: Load EXASOL from SQL Server: Data types & Conversion.

In general, we found that it's best to use a view in SQL Server, or a SQL Command as part of the OLE DB source task. You can also use a Data Conversion Task to make your conversions. However, we believe using the database engine is more efficient for these types of transformations.

Data Flow

Once you have your data sourced, converted, transformed and setup, the final step is loading data into EXASOL. First, create a new connection to EXASOL using the Connection Manager and the ADO.NET driver you just installed. It will look something like this:



ADO.NET Destination

Setting up an ADO.NET destination task to EXASOL is fairly straight forward, with just a few minor considerations. On the Connection Manager page of the ADO.NET Destination Editor, chose the desired connection. In our example, its 192.168.1.100:8563.sys.

No tables or views could be loaded

At least for now, the drop down combo box does not display a list of tables. No worries, just type in the name of the table, ideally using both the schema and the table name:  <schema>.<tablename>. The example below has TESTDATA.TESTDATA.




Preview

Use the Preview button to test the connection.  You should see a preview of your target table, along with a few sample records.


Mappings

Use the mappings section to verify or update the mappings, Identical names are mapping automatically. All others have to be done manually.


That's it. If you have any loading errors, focus on data types, and especially dates. We also found that tinyint required an explicit data type conversion.  See: Load EXASOL from SQL Server: Data types & Conversion.

Next Steps:  ROLAP - now a viable option
                     Build an SSAS ROLAP Cube using EXASOL