Other Specifications

Notes on creating datasets for binary and multiclass classification

Easy Predictive Analytics reads the first 1000 rows of the file to determine how many distinct values the variable you want to predict takes (in the above example, there are two values, “Continuation” and “Withdrawal”). If the variable to be predicted is a character string, the software selects binary or multiclass classification according to the number of unique values it finds. If you want to perform binary classification, make sure that the first 1000 rows contain both values of the variable you want to predict. If you want to perform multiclass classification, sort the data so that the first 1000 rows contain at least three values of that variable. Also, note that you cannot use more than 200 distinct values.
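Before loading a file, it can help to check which classification type the first 1000 rows would lead to. The following is a minimal sketch and not the software's own logic: the function name detect_task, the column name "status", and the handling of empty entries are assumptions made for illustration, and the limit of 200 is read here as a limit on distinct values.

```python
import csv
import io
from itertools import islice

SCAN_ROWS = 1000    # number of rows inspected, as stated above
MAX_VALUES = 200    # assumption: the limit of 200 applies to distinct target values


def detect_task(csv_file, target_column):
    """Guess the classification type from the first SCAN_ROWS data rows."""
    reader = csv.DictReader(csv_file)
    values = {row[target_column] for row in islice(reader, SCAN_ROWS)}
    values -= {"", None}  # assumption: empty (missing) entries are not classes
    if len(values) < 2:
        return "too few values"
    if len(values) == 2:
        return "binary"
    if len(values) > MAX_VALUES:
        return "too many values"
    return "multiclass"


# Hypothetical three-row file with a "status" column to predict.
sample = io.StringIO("id,status\n1,Continuation\n2,Withdrawal\n3,Continuation\n")
print(detect_task(sample, "status"))  # binary
```

If the result is not the type you intended, reorder the rows so that the required values appear within the first 1000 rows.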

Handling of missing values

A missing value is data that was not recorded. Represent each missing value with an empty string.
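Source data often marks missing values with placeholders rather than empty strings. The sketch below shows one way to blank them out before writing a CSV file. The placeholder set, the function name blank_missing, and the sample columns are hypothetical and should be adjusted to your own data.

```python
import csv
import io

# Hypothetical markers that a source system might use for missing data.
PLACEHOLDERS = {"NA", "N/A", "null", "-"}


def blank_missing(rows):
    """Yield rows with None and placeholder markers replaced by empty strings."""
    for row in rows:
        yield ["" if v is None or str(v).strip() in PLACEHOLDERS else v for v in row]


rows = [
    ["1", "34", "Continuation"],
    ["2", None, "Withdrawal"],
    ["3", "NA", "Continuation"],
]

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "age", "status"])
writer.writerows(blank_missing(rows))
print(out.getvalue())
```

In the resulting file, the missing ages appear as empty fields, for example the line 2,,Withdrawal.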

Size of prediction model creation (training) data

Prepare prediction model creation (training) data with 100 to 1 million rows and 2 to 200 columns. For Time Series Prediction Mode, prepare prediction model creation (training) data with 20 to 10,000 rows and 2 to 200 columns. When using data join, make sure that the combined number of columns in the prediction model creation (training) data and the related data does not exceed 200.
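The size limits above can be checked before loading the data. This is a minimal sketch under stated assumptions: the function name check_size and the mode labels "standard" and "time_series" are made up for illustration, and the row count is assumed to exclude the header row.

```python
# Row limits per mode, taken from the text above; the mode names are hypothetical.
ROW_LIMITS = {"standard": (100, 1_000_000), "time_series": (20, 10_000)}
MIN_COLS, MAX_COLS = 2, 200


def check_size(n_rows, n_cols, mode="standard", related_cols=0):
    """Return a list of problems; an empty list means the size is within limits."""
    problems = []
    low, high = ROW_LIMITS[mode]
    if not low <= n_rows <= high:
        problems.append(f"rows: {n_rows} is outside {low} to {high}")
    if not MIN_COLS <= n_cols <= MAX_COLS:
        problems.append(f"columns: {n_cols} is outside {MIN_COLS} to {MAX_COLS}")
    # With data join, training columns plus related-data columns must stay within 200.
    if related_cols > 0 and n_cols + related_cols > MAX_COLS:
        problems.append(f"joined columns: {n_cols + related_cols} exceeds {MAX_COLS}")
    return problems


print(check_size(5000, 40))                      # []
print(check_size(500, 150, related_cols=60))     # one problem: joined columns
print(check_size(15, 5, mode="time_series"))     # one problem: rows
```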

As the number of rows and columns increases, the learning time and memory usage increase. If the memory usage exceeds the capacity of your PC, the software may terminate.