I'm defining a mining structure against an OLAP dimension. The continuous value that I'm using both as input and for forecasting represents the time to complete a certain process.
There's something that strikes me as if it could be a problem, but I'm not sure. Our fact table has multiple columns (with multiple correponding measures in the cube). The "time-to-complete" measure is only populated on some of the fact rows - the rows that represent completion information. Other rows represent other information, and the "time-to-complete" value is set to 0. This works fine for cumulative time-to-complete and average time-to-complete, but it seems like it could mess up data mining. Will those 0-value facts skew the mining results I'm not seeing a way to filter out those entries and only include the non-zero facts in the mining processing.
Or perhaps I'm totally misunderstanding something, which is quite possible. :)

Data mining cube with 0-value facts
prashant mulay
Duncan Woods
I'm using the Neural Network algorithm, but I can't bring up the viewer - I just get an error (which I reported in another thread in these forums).
sam1996
Evan Mulawski
How would I do that There isn't a dimension slice that represents this condition, and that's the only filtering that I've been able to find so far.
A computed measure wouldn't be ideal, but it might work. I'll think about that.
Runespy
Kshama
Sounds like it will screw up DM stuff, though. If the values were nulls instead of 0, would DM handle it correctly
pjwhams
Sine you've modeled the variable as continous, a 0 value will be treated as a possible value whereas a null would be treated as missing by DM. They would hence generate different DM model.
If the only zero value in your data correspond to null (missing) value, you can either filter those out from the input data or have another computed measure be null whenever the measure is 0 and use that measure as input to the data mining algorithm.
Hope this helps
kgs1951
Aha!
That's the reason for this problem and may be the reason for the viewer problem as well. It seems you found a bug in the system, the bug being that the software didn't return an error when you tried to create this model
.
When you mark a column as "input" what you are stating is "use this column as an input to predict other columns." When you mark it as output you are stating "determine values of this column based on all other inputs." We have the concept of "input AND output" because we allow you to predict multiple attributes in a single model.
Therefore, what is happening is that for that column, you have no inputs, and it is an input for no other columns, since there are no more columns for which it could be an input. Simply put, it's an invalid model definition which we should have rejected from the start.
Thanks
-Jamie
JasonFx
I fixed the model, and the results seem to make more sense now. However, it didn't fix the problem with the mining model viewer.
Andre721
The zeroes will mess you up in DM and OLAP. An average of a value ignores nulls, but not zeroes.
You can set your measure to retain null values in the fact table - check out the data binding settings.
Hope this helps,
Richard
Bartosz
As a test, I created a new calculation that takes the original measure and converts 0-valued facts to Nulls. I added that new calculation to the mining structure and models and processed them. Then I ran a singleton prediction query for both the original and new measures. They came up with the exact same results. So, as far as I can tell, either:
a) The 0-value facts don't affect the mining results
b) The null facts are treated the same as the 0 facts (and not ignored)
c) Something about how I'm doing my test is flawed.
Anyone have any suggestions or thoughts here