IBM Support

Why does the weighted N on the parent node of a CHAID tree differ from the weighted N in the data file?

Question & Answer


Question

I am working with IBM SPSS Statistics version 26 or below. I have a data file with 203 cases and several variables, including one that indicates whether a customer churned.
I would like to create a decision tree with CHAID to predict churn. Because my sample does not contain enough observations, I also have a weight variable with non-integer weights to balance the decision tree model.
When I do not weight my file, the frequency table of my churn variable shows these case counts (No: 157, Yes: 46).
When I weight my file with my non-integer weight variable, the frequency table for churn shows these weighted counts (No: 172, Yes: 56).
When I create a CHAID tree without the weight, the N on the parent node matches the frequency table, which is expected.
However, for the weighted file, the parent node of the tree shows different counts from the weighted frequency table, which is not what I expect. The tree now reports 156 No and 58 Yes for the churn variable. Why do the counts differ?

Answer

This is functioning as designed. When you use a non-integer (fractional) weight variable, the decision tree rounds the weight values to the closest integer, so the weighted N in the tree differs from the weighted N in the frequency table.
Please see the following sentence from the documentation, Creating Decision Trees:
"Frequency weights. If weighting is in effect, fractional weights are rounded to the closest integer; so, cases with a weight value of less than 0.5 are assigned a weight of 0 and are therefore excluded from the analysis."
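
The effect of this rounding can be illustrated outside SPSS with a minimal Python sketch. The cases and weights below are hypothetical and are not taken from the data file in the question. The helper names `round_half_up` and `weighted_counts` are also made up for illustration, and rounding a weight of exactly .5 upward is an assumption; the documentation quoted above only states that weights below 0.5 become 0.

```python
import math

def round_half_up(weight):
    # Round to the closest integer; ties at exactly .5 go up (assumption).
    return math.floor(weight + 0.5)

# Hypothetical (churn, fractional weight) pairs, for illustration only.
cases = [
    ("No", 1.4), ("No", 1.4), ("No", 0.4),
    ("Yes", 1.6), ("Yes", 1.6), ("Yes", 0.7),
]

def weighted_counts(cases, round_each_weight):
    # Sum the weights per category, optionally rounding each case weight first.
    totals = {}
    for label, weight in cases:
        if round_each_weight:
            weight = round_half_up(weight)
        totals[label] = totals.get(label, 0) + weight
    return totals

# Frequency-table style: the fractional weights are summed as they are.
freq_table = weighted_counts(cases, round_each_weight=False)

# Tree style: each case weight is rounded before summing, so the case
# with weight 0.4 gets weight 0 and drops out of the analysis.
tree_node = weighted_counts(cases, round_each_weight=True)

print({k: round(v, 1) for k, v in freq_table.items()})  # {'No': 3.2, 'Yes': 3.9}
print(tree_node)                                        # {'No': 2, 'Yes': 5}
```

In this sketch the "No" count shrinks and the "Yes" count grows once each weight is rounded, which is the same kind of shift as the one described in the question.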

Document Information

Modified date:
13 April 2020

UID

ibm11522875