Friday, December 23, 2016

A simple overview - random forests, decision trees, regression, oh my!

Looking for an easy summary overview for decision trees, regressions, support vectors, and bayes? Well, we found on at iocentral.io. You can find it at:  Want to know how to choose Machine Learning algorithm?  We've replicated the core parts below:


Decision Trees: Decision tree output is very easy to understand even for people from non-analytical background. It does not require any statistical knowledge to read and interpret them. Fastest way to identify most significant variables and relation between two or more variables. Decision Trees are excellent tools for helping you to choose between several courses of action. Most popular decision trees are CART, CHAID, and C4.5 etc.
In general, decision trees can be used in real-world applications such as:
·        Investment decisions
·        Customer churn
·        Banks loan defaulters
·        Build vs Buy decisions
·        Company mergers decisions
·        Sales lead qualifications
 
Logistic Regression: Logistic regression is a powerful statistical way of modeling a binomial outcome with one or more explanatory variables. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution.
In general, regressions can be used in real-world applications such as:
·        Predicting the Customer Churn
·        Credit Scoring & Fraud Detection
·        Measuring the effectiveness of marketing campaigns
 
Support Vector Machines: Support Vector Machine (SVM) is a supervised machine learning technique that is widely used in pattern recognition and classification problems - when your data has exactly two classes.
In general, SVM can be used in real-world applications such as:
·        detecting persons with common diseases such as diabetes
·        hand-written character recognition
·        text categorization – news articles by topics
·        stock market price prediction
 
Naive Bayes: It is a classification technique based on Bayes’ theorem and very easy to build and particularly useful for very large data sets. Along with simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods. Naive Bayes is also a good choice when CPU and memory resources are a limiting factor
In general, Naive Bayes can be used in real-world applications such as:
·        Sentiment analysis and text classification
·        Recommendation systems like Netflix, Amazon
·        To mark an email as spam or not spam
·        Facebook like face recognition
 
Apriori: This algorithm generates association rules from a given data set. Association rule implies that if an item A occurs, then item B also occurs with a certain probability.
In general, Apriori can be used in real-world applications such as:
·        Market basket analysis like amazon - products purchased together
·        Auto complete functionality like Google to provide words which come together
·        Identify Drugs and their effects on patients
 
Random Forest: is an ensemble of decision trees. It can solve both regression and classification problems with large data sets. It also helps identify most significant variables from thousands of input variables.
In general, Random Forest can be used in real-world applications such as:
·        Predict patients for high risks
·        Predict parts failures in manufacturing
·        Predict loan defaulters
The most powerful form of machine learning being used today, is called “Deep Learning”.
In today’s Digital Transformation age, most businesses will tap into machine learning algorithms for their operational and customer-facing functions.
Source: http://www.iotcentral.io/blog/want-to-know-how-to-choose-machine-learning-algorithm

Tuesday, December 20, 2016

What's new in SQL Server 2016

James Serra posted a SlideShare covering what's new in SQL Server 2016, including the recently released SP1.  SP1, BTW, opened up just about all the features to all versions of SQL Server.

If you like slide shows that outline the latest with SQL Server, take a look at the SlideShare on LinkedIn.  You can find it here:

What's new in SQL Server 2016 - James Serra

Monday, December 19, 2016

Index usage stats for SQL Server

Need information on if your indexes are being used?  Greg Larsen has posted a query utilizing the DMV sys.dm_db_index_usage_stats.  You can find it here:  Which Indexes are not Used? - Greg Larsen

SQL Server v.Next CTP 1.1 released

Microsoft released the first update to the Linux v.Next version of SQL Server, officially lised as SQL Server v.Next Community Technology Preview (CTP) 1.1. From the announcement, the key database engine enhancements include:


Key SQL Server v.Next CTP 1.1 database engine enhancements include:

  • Language and performance enhancements to natively compiled T-SQL modules, including support for OPENJSON, FOR JSON, JSON built ins as well as memory-optimized tables support for computed columns.
  • Improved the performance of updates to non-clustered columnstore indexes in the case when the row is in the delta store. 
  • Batch mode queries now support "memory grant feedback loops,” which learn from memory used during query execution and adjusts on subsequent query executions; this can allow more queries to run on systems that are otherwise blocking on memory.
  • New T-SQL language features:
  • Introducing three new string functions: TRIM, CONCAT_WS, and TRANSLATE
  • BULK IMPORT supports CSV format and Azure Blob Storage as file source
  • STRING_AGG supports WITHIN GROUP (ORDER BY)s
  • Most of the improvements in SQL Server 2016 Service Pack 1 are now available in SQL Server on Linux, with the exception of a few features such as Polybase 
Source:  Microsoft community update email: Dec 19, 2016

How to Update -  See:    Update SQL Server on RHEL - CentOS

Update SQL Server on RHEL - CentOS

Sourced from the Microsoft support documents:

Notes:  


1) No restart/reboot after running the update command
2) Microsoft's posted command uses yum update, not yum upgrade.


    "In order to upgrade the mssql-server package, execute the following command:

   sudo yum update mssql-server

These commands will download the newest package and replace the binaries located under /opt/mssql/. The user generated databases and system databases will not be affected by this operation."


Source:

Monday, December 12, 2016

SSMS vNext and "Cannot connect to WMI provider"

Problem:  


After installing SQL Server Management Studio for vNext, the Configuration Manager no longer opens, with a message similar to the following:


             Cannot connect to WMI provider. You do not have permission or the server is                        unreachable. Note that you can only manage SQL Server 2005 and later servers with SQL Server Configuration Manager.
Invalid namespace [0x8004100e]



Steps to correct:



  1. Search for the file:  sqlmgmproviderxpsp2up.mof
    You will probably find it in the folder:

       C:\Program Files (x86)\Microsoft SQL Server\140\Shared

  1. Run the following command, adjusted using the file location from the search above.

    C:\WINDOWS\system32>mofcomp "%programfiles(x86)%\Microsoft SQL Server\140\Shared\sqlmgmproviderxpsp2up.mof"



Based on Microsoft support - kb966013:




Linux - SQL Server Service: Status - Stop - Start - Restart Commands

A listing of commands to check the status of the SQL Server service, along with stopping, starting and restarting the service (RHEL):


Status Check

                 systemctl status mssql-server -l

Start Service


                 sudo systemctl start mssql-server

Re-start Service


                 sudo systemctl restart mssql-server

Stop Service

                 sudo systemctl stop mssql-server



Thursday, December 8, 2016

Database Uptime vs. Aircraft Engine Maintenance

Brent Ozar has a great way to compare maintaining databases to maintaining aircraft engines. It goes like this:


"...if you have an airplane, it’s absolutely imperative that its engines not fail mid-flight. In order to accomplish that, you have to have regular downtime for mechanics to examine and replace parts – and that doesn’t happen up in the air. 

With Availability Groups, we’re lucky enough to be able to transfer our passengers databases from one airplane to another quickly – but we still have to have those other airplanes getting constant examinations and patches from mechanics." (Brent Ozar, December 7, 2016)


You can read the full posting at:  Availability Groups: More Planned Downtime for Less Unplanned Downtime