Part B is a report on the various models built using the data set, including any imputations and transformations you may want to perform.  We will build two sets of models, each with a different data partition.

To the Data node from above, add a ‘Manage Variables’ node.  This node is required by Viya if we want to impute and/or transform variables; we do not need to change any of its settings.

To the ‘Manage Variables’ node, add nodes for imputation (if desired) and transformation.  Make the desired changes to the data as you see fit, then execute the pipeline.  The transformation node should come last, below the imputation node.
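Viya performs these steps through its GUI nodes, but conceptually they are simple: fill in missing values, then rescale skewed inputs.  A minimal sketch in plain Python, assuming mean imputation and a log(1 + x) transform (the column name `income` is hypothetical):

```python
import math
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def log_transform(values):
    """Apply log(1 + x) to reduce right skew; assumes non-negative inputs."""
    return [math.log1p(v) for v in values]

# Hypothetical input column with two missing values.
income = [42000, None, 58000, 61000, None, 39000]
income = log_transform(impute_mean(income))  # impute first, then transform
```

The ordering mirrors the pipeline above: imputation happens before transformation so that the transform never sees a missing value.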

To the bottom-most node in the flow that is not ‘Data Exploration’, add one node for each of the models that we want to examine:

- Decision Tree (use default settings)
- Forest (use default settings).  In many texts, this method is also known as Random Forest.
- Neural Network (4 total, using some different parameters; for other parameters, use defaults):
  - 1 hidden layer, 50 neurons per layer, TANH hidden-layer activation function
  - 5 hidden layers, 50 neurons per layer, TANH hidden-layer activation function
  - 1 hidden layer, 100 neurons per layer, TANH hidden-layer activation function
  - 1 hidden layer, 50 neurons per layer, ReLU hidden-layer activation function
- Logistic Regression (4 total, with different variable selection methods; for other parameters, use defaults):
  - Forward
  - Backward
  - Stepwise
  - (none) – this method forces in all the variables
- SVM (Support Vector Machine).  Use default settings.

When you add the first node for one of your models, you will see that Viya also adds a node called ‘Model Comparison’ at the bottom.  As you continue to add nodes for the different models, Viya connects each subsequent model to the ‘Model Comparison’ node as well.
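As a sanity check on the count, the model configurations above can be enumerated in a few lines of Python (the labels are hypothetical shorthand, not Viya node names):

```python
# Four neural-network variants, differing in one parameter each.
neural_nets = [
    {"hidden_layers": 1, "neurons": 50,  "activation": "TANH"},
    {"hidden_layers": 5, "neurons": 50,  "activation": "TANH"},
    {"hidden_layers": 1, "neurons": 100, "activation": "TANH"},
    {"hidden_layers": 1, "neurons": 50,  "activation": "ReLU"},
]
# Four logistic regressions, differing in variable selection method.
selection_methods = ["forward", "backward", "stepwise", "none"]

models = (["decision_tree", "forest"]
          + [f"neural_net_{i + 1}" for i in range(len(neural_nets))]
          + [f"logistic_{m}" for m in selection_methods]
          + ["svm"])
print(len(models))  # all of these feed into the 'Model Comparison' node
```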

Before running your flow, left-click on the ‘Model Comparison’ node.  On the right panel that appears, go to the drop-down for ‘Class selection statistic’ and choose ‘Misclassification (Event)’.  We are telling Viya that we want to choose the best model(s) based on their ability to correctly classify outcomes.  Keep everything else set to the default.
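The ‘Misclassification (Event)’ statistic is simply the fraction of observations whose predicted class differs from the actual class.  A minimal sketch, with invented example labels:

```python
def misclassification_rate(actual, predicted):
    """Fraction of observations where the predicted class != actual class."""
    assert len(actual) == len(predicted)
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)

# Hypothetical binary outcomes: 2 of 5 predictions are wrong -> rate 0.4.
rate = misclassification_rate([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```

Viya computes this on the validation partition for every connected model, which is what makes the comparison node a fair referee.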

This gives a total of 11 models on the data, with a training/validation partition ratio of 50/50.

Prepare a summary table with each method used and the misclassification rate on the validation partition.  Which model is the champion, having the lowest misclassification on the validation partition?
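Picking the champion from your summary table is just an arg-min over the validation misclassification rates.  A sketch with invented rates (substitute the numbers from your own run):

```python
# Hypothetical validation misclassification rates -- replace with your results.
results = {
    "decision_tree":        0.21,
    "forest":               0.18,
    "neural_net_1x50_tanh": 0.19,
    "logistic_stepwise":    0.20,
    "svm":                  0.22,
}
champion = min(results, key=results.get)  # model with the lowest rate wins
print(champion, results[champion])
```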

Discuss any observations you have on the results.  Were there any changes in the results for the neural network with the different parameter settings?  Did all the models identify the same variables as predictive?  Were there any differences among the various regression methods?  If so, what were they?

–Create a second set of models with a different training-to-validation ratio

Next, create a new project as done before, with the initial data set.  This time, set the training partition = 60, the validation partition = 40, and keep the test partition again equal to zero.  By increasing the relative size of the training partition, we increase the amount of data available for training but still have enough data so that (hopefully) overfitting will not be a problem.
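Viya handles the partitioning internally, but the 60/40 split is conceptually a seeded random shuffle followed by a cut.  A minimal sketch in plain Python (the seed value is arbitrary):

```python
import random

def partition(rows, train_frac, seed=42):
    """Randomly split rows into training and validation; no test partition."""
    rng = random.Random(seed)          # fixed seed -> reproducible split
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))
train, valid = partition(rows, 0.60)   # 60/40 training/validation split
```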

Except for the data exploration node, rebuild your pipeline in the new project just as you did before and execute it.

For this new set of models built using the 60/40 data partition, prepare a summary table with each method used and the misclassification rate on the validation partition.  Which model is the champion here?  Compare the champions of the two different data partitions – are they the same method?  How do the misclassification rates for the two different partitions compare?  Are any trends noticeable?  Are the variables determined to be predictive the same across both partitions?

–Try some models using a feature that has been engineered

Create a new project with the same partitions as the first: training partition = 50 and validation partition = 50.  Replace one variable with an engineered variable using Viya (for a generic introduction to variable engineering, see the link at the bottom).  As an alternative, you may manually create a new data set in Excel and then import that to use here (if you do this, make sure that the spreadsheet contains values, not formulas, and that you delete the column(s) used to generate your engineered variable).

Make sure that you transform all the input variables as you did in the initial project with the same partitions.  You may also need to transform the engineered variable – explore the data in Viya and make your determination.  Create a pipeline using this data set and add nodes for each of the 11 different models noted above.  Set the parameters as noted above (when appropriate) and then execute the pipeline.  Compare the results of these models with the results of corresponding models from the project above.  Are there any improvements to misclassification rates?
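One common form of engineered variable is a ratio of two raw columns.  The sketch below, with hypothetical column names (`debt`, `income`), derives the new variable and drops its source columns, mirroring the instruction above to delete the generating columns from the spreadsheet:

```python
def engineer_ratio(records, numer, denom, new_name):
    """Replace two raw columns with their ratio; drop the source columns
    so the models cannot simply recover them."""
    out = []
    for rec in records:
        rec = dict(rec)                      # copy; leave the input untouched
        rec[new_name] = rec[numer] / rec[denom]
        del rec[numer], rec[denom]
        out.append(rec)
    return out

# Hypothetical single-row data set for illustration.
data = [{"debt": 5000, "income": 40000, "age": 30}]
data = engineer_ratio(data, "debt", "income", "debt_to_income")
```

Whether the engineered variable itself needs a further transformation is exactly the judgment call the paragraph above asks you to make after exploring its distribution in Viya.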

End Part B – model building and evaluation

Feature Engineering:

An example by SAS on creating new variables for better predictive models within Enterprise Miner:

https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-Derive-New-Variables-for-Better-Predictive-Models/ta-p/221404