Detailed Dataset Specifications

Notes on creating datasets for binary and multiclass classification

Easy Predictive Analytics reads the first 1000 rows of the file to determine how many distinct values the variable you want to predict takes (in the above example, there are two values, “Continuation” and “Withdrawal”). If the variable to be predicted is a character string, the software chooses binary or multiclass classification according to that number of distinct values. To perform binary classification, make sure that the first 1000 rows contain two distinct values for the variable you want to predict. To perform multiclass classification, sort the file so that the first 1000 rows contain at least three distinct values for that variable. Also, note that you cannot use more than 200 distinct values.
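Before loading a file, it can help to check what the first 1000 rows contain. The following is a minimal sketch in Python, not part of Easy Predictive Analytics itself. The function name classify_target, its parameters, and the returned labels are hypothetical, and skipping empty strings when counting values is an assumption about how missing values interact with this check.

```python
import csv
import io


def classify_target(csv_text, target, head_rows=1000, max_values=200):
    """Guess the task type from the distinct target values in the first rows.

    Hypothetical helper: it mirrors the rule described above, but it is
    not the software's own logic.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    for i, row in enumerate(reader):
        if i >= head_rows:
            break  # only the first head_rows data rows are inspected
        value = row[target]
        if value != "":  # assumption: an empty string is missing, not a class
            seen.add(value)

    count = len(seen)
    if count > max_values:
        return "too many values"
    if count == 2:
        return "binary"
    if count >= 3:
        return "multiclass"
    return "not enough values"


sample = "id,status\n1,Continuation\n2,Withdrawal\n3,Continuation\n"
print(classify_target(sample, "status"))
```

If a third value appears only after row 1000, this check still reports binary, which is the situation that sorting the file is meant to avoid.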

Handling of missing values

Missing values are treated as missing information, rather than being replaced with other values such as zero. To indicate a missing value, leave the field as an empty string.
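As an illustration, the sketch below writes a small CSV in which an unknown value is left as an empty field instead of being filled with 0. The column names, the sample rows, and the helper to_csv are hypothetical and only show the idea.

```python
import csv
import io

# Hypothetical sample data: the age of the second customer is unknown.
rows = [
    {"age": 34, "plan": "A", "status": "Continuation"},
    {"age": None, "plan": "B", "status": "Withdrawal"},
]


def to_csv(rows, columns):
    """Serialize rows to CSV text, writing missing values as empty strings."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Do not substitute 0 or a placeholder such as "N/A" for a missing value.
        writer.writerow(["" if row.get(c) is None else row[c] for c in columns])
    return buf.getvalue()


print(to_csv(rows, ["age", "plan", "status"]))
```

The second data row comes out as ",B,Withdrawal", with nothing before the first comma.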

Size of the training data

Prepare training data with 100 to 1 million rows and 2 to 999 columns. For Time Series Prediction Mode, prepare training data with 20 to 10,000 rows and 2 to 200 columns. When using data join, prepare the training data so that the combined number of columns in the training data and the related data does not exceed 200.
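The limits above can be checked before training with a few comparisons. This is a rough sketch under stated assumptions: the names LIMITS, MAX_JOINED_COLS, and check_size and the mode labels "standard" and "time_series" are hypothetical, and the sketch assumes that the 200-column cap for data join applies in addition to the per-mode limits.

```python
# Row and column ranges taken from the text above (inclusive bounds).
LIMITS = {
    "standard": {"rows": (100, 1_000_000), "cols": (2, 999)},
    "time_series": {"rows": (20, 10_000), "cols": (2, 200)},
}
MAX_JOINED_COLS = 200  # training columns + related-data columns


def check_size(n_rows, n_cols, mode="standard", related_cols=0):
    """Return a list of size problems; an empty list means the data fits."""
    problems = []
    row_min, row_max = LIMITS[mode]["rows"]
    col_min, col_max = LIMITS[mode]["cols"]
    if not row_min <= n_rows <= row_max:
        problems.append(f"rows: {n_rows} is outside {row_min} to {row_max}")
    if not col_min <= n_cols <= col_max:
        problems.append(f"columns: {n_cols} is outside {col_min} to {col_max}")
    # Assumption: related_cols > 0 means data join is in use.
    if related_cols > 0 and n_cols + related_cols > MAX_JOINED_COLS:
        problems.append(
            f"joined columns: {n_cols + related_cols} exceeds {MAX_JOINED_COLS}"
        )
    return problems


print(check_size(500, 150, related_cols=60))
```

Here 150 training columns plus 60 related columns give 210, so the call reports one problem for the joined total even though 150 columns alone would be acceptable.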

As the number of rows and columns increases, the learning time and memory usage also increase. If memory usage exceeds the memory available on your PC, the software may terminate.