Data must be tabular, and the supported file formats are CSV (comma-separated) and TSV (tab-separated).
Each row corresponds to one sample, that is, a single piece of data; in customer data, for example, a sample is one customer. Each column (variable) corresponds to an attribute of the sample (e.g., age or gender). The first row of the data file contains the column names (variable names), and the second and subsequent rows contain the sample information. Every row must have the same number of variables.
The variable to be predicted (for example, continue or withdraw) is one of the variables in the file. For an example of a dataset, see the figure below, in which the variable to be predicted is “Continuation and withdrawal”. The enclosed sample data can also be used as a reference.
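
The structural rules above (a header row followed by sample rows, each with the same number of variables) can be checked before a file is loaded. The following is a minimal sketch using Python's standard `csv` module; the function name `check_table` and the column names in the usage example are hypothetical and are not part of Easy Predictive Analytics.

```python
import csv
import io


def check_table(text, delimiter=","):
    """Return a list of problems found in CSV/TSV text (empty list if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 2:
        return ["file needs a header row and at least one sample row"]

    header = rows[0]  # first row: column (variable) names
    problems = []
    # Sample rows start on line 2 of the file.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no}: expected {len(header)} values, found {len(row)}"
            )
    return problems


# Hypothetical customer data; "status" plays the role of the variable to be predicted.
sample = "age,gender,status\n34,F,continue\n51,M,withdraw\n"
print(check_table(sample))  # prints []
```

For a TSV file, the same check can be run with `delimiter="\t"`.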
The following data types are available. For the variable to be predicted, you can specify only text or numeric values for binary or multiclass classification, and only numeric values for regression.
Data type | Description |
---|---|
String | Categorical value (e.g., the “gender” variable above) |
Text | Text (Text written in Japanese or English) |
Numeric value | Integer, decimal, or other number (e.g., the “past purchases” variable above) |
Date/time | Date and time (e.g., the “registration date” variable above) |
Dates and times from 0:00 on January 1, 1970 through 23:59 on December 31, 3999 can be read and used as datetime data. To be read as datetime data, the values must use a consistent format. For example, the following data can be read as datetime data.
Date/time data description | Example |
---|---|
Year data | “2019” |
Month data | “2019-6” “201906” |
Day data | “2019/6/12” “2019-6-12” “20190612” “2019/06/12 00:00:00” |
Hour/minute data | “2019/06/12 03:00:00” “2019-06-12 21:30:00” |
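
The consistency requirement can be checked in advance by testing whether every value in a column parses under one and the same format. The sketch below uses Python's `datetime.strptime`; the list `CANDIDATE_FORMATS` and the function `consistent_format` are hypothetical, cover only a few separator-based formats, and only approximate how the product itself reads dates.

```python
from datetime import datetime

# A small, illustrative subset of formats (an assumption, not the product's list).
CANDIDATE_FORMATS = [
    "%Y/%m/%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d",
    "%Y-%m-%d",
    "%Y-%m",
    "%Y",
]


def consistent_format(values):
    """Return the first format that parses every value, or None if there is none."""
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            parsed = [datetime.strptime(v, fmt) for v in values]
        except ValueError:
            continue  # at least one value does not match this format
        # Accept only dates inside the documented 1970 to 3999 range.
        if all(1970 <= p.year <= 3999 for p in parsed):
            return fmt
    return None


print(consistent_format(["2019/06/12 03:00:00", "2019/06/12 00:00:00"]))
print(consistent_format(["2019/6/12", "2019-6-12"]))  # mixed separators
```

The first call reports a single shared format, while the second returns `None` because the two values use different separators.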
Variables whose data type is date/time must be prepared in one of the following formats. The supported range is 0:00 on January 1, 1970 through 23:59 on December 31, 3999. Seconds may be present, but they are not used by Easy Predictive Analytics. (y = year, M = month, d = day, H = hour, m = minute, s = second)
yyyy-MM-dd HH:mm:ss
yyyy-MM-dd HH:mm
yyyy-MM-dd H:mm:ss
yyyy-MM-dd H:mm
yyyy-MM-dd
yyyy-MM-d HH:mm:ss
yyyy-MM-d HH:mm
yyyy-MM-d H:mm:ss
yyyy-MM-d H:mm
yyyy-MM-d
yyyy-M-dd HH:mm:ss
yyyy-M-dd HH:mm
yyyy-M-dd H:mm:ss
yyyy-M-dd H:mm
yyyy-M-dd
yyyy-M-d HH:mm:ss
yyyy-M-d HH:mm
yyyy-M-d H:mm:ss
yyyy-M-d H:mm
yyyy-M-d
yyyy/MM/dd HH:mm:ss
yyyy/MM/dd HH:mm
yyyy/MM/dd H:mm:ss
yyyy/MM/dd H:mm
yyyy/MM/dd
yyyy/MM/d HH:mm:ss
yyyy/MM/d HH:mm
yyyy/MM/d H:mm:ss
yyyy/MM/d H:mm
yyyy/MM/d
yyyy/M/dd HH:mm:ss
yyyy/M/dd HH:mm
yyyy/M/dd H:mm:ss
yyyy/M/dd H:mm
yyyy/M/dd
yyyy/M/d HH:mm:ss
yyyy/M/d HH:mm
yyyy/M/d H:mm:ss
yyyy/M/d H:mm
yyyy/M/d
yyyy-MM
yyyy-M
yyyyMMdd
yyyyMM
dd-MM-yyyy
dd-M-yyyy
d-MM-yyyy
d-M-yyyy
yyyy
mmm-yy
(mmm is the English abbreviation of the month name. For example, Jan-21 represents January 2021. In this format, when the current year and month are yyyy and mm, only data that falls both within the period from month mm+1 of year yyyy-80 through month mm of year yyyy+20 and within January 1970 through December 3999 can be used.)
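
The window rule for the two-digit-year format can be made concrete with a short sketch. Because the window spans exactly 100 years, each month and two-digit year maps to at most one full year. The function `resolve_mmm_yy` below is hypothetical; it assumes case-sensitive three-letter month abbreviations and assumes that "mm+1" rolls over from December to January of the following year. It illustrates the rule as described and is not the product's implementation.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def resolve_mmm_yy(value, today):
    """Return (year, month) for a value such as 'Jan-21', or None if unusable."""
    name, _, yy_text = value.partition("-")
    if name not in MONTHS or len(yy_text) != 2:
        return None
    if not (yy_text.isascii() and yy_text.isdigit()):
        return None
    month = MONTHS.index(name) + 1
    yy = int(yy_text)

    # Work in "months since year 0" so that mm+1 rolls over December correctly.
    current = today.year * 12 + (today.month - 1)
    window_start = current - 80 * 12 + 1  # month mm+1 of year yyyy-80
    window_end = current + 20 * 12        # month mm of year yyyy+20
    overall_start = 1970 * 12             # January 1970
    overall_end = 3999 * 12 + 11          # December 3999

    # Try each century that the window can touch.
    for century in range((today.year - 80) // 100, (today.year + 20) // 100 + 1):
        year = century * 100 + yy
        index = year * 12 + (month - 1)
        if (window_start <= index <= window_end
                and overall_start <= index <= overall_end):
            return (year, month)
    return None


# With May 2024 as the current month, the window runs from June 1944 to May 2044.
print(resolve_mmm_yy("Jan-21", date(2024, 5, 15)))  # prints (2021, 1)
```

Under the same current date, `Jun-44` returns `None`: June 1944 lies inside the 100-year window but before January 1970, and June 2044 lies one month past the end of the window.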