SVM_REGRESSOR
Trains the SVM model on an input relation.
This is a meta-function. You must call meta-functions in a top-level SELECT statement.
Behavior type
Volatile
Syntax
SVM_REGRESSOR ( 'model-name', input-relation, 'response-column', 'predictor-columns'
[ USING PARAMETERS
[exclude_columns = 'excluded-columns']
[, error_tolerance = error-tolerance]
[, C = cost]
[, epsilon = epsilon-value]
[, max_iterations = max-iterations]
[, intercept_mode = 'mode']
[, intercept_scaling = 'scale'] ] )
Arguments
model-name- Identifies the model to create, where model-name conforms to conventions described in Identifiers. It must also be unique among all names of sequences, tables, projections, views, and models within the same schema.
input-relation- The table or view that contains the training data. If the input relation is defined in Hive, use SYNC_WITH_HCATALOG_SCHEMA to sync the hcatalog schema, and then run the machine learning function.
response-column- An input column that represents the dependent variable or outcome. The column must be a numeric data type.
predictor-columns- Comma-separated list of columns in the input relation that represent independent variables for the model, or asterisk (*) to select all columns. If you select all columns, the argument list for parameter exclude_columns must include response-column, and any columns that are invalid as predictor columns.
All predictor columns must be of type numeric or BOOLEAN; otherwise the model is invalid.
Note
All BOOLEAN predictor values are converted to FLOAT values before training: 0 for false, 1 for true. No type checking occurs during prediction, so you can use a BOOLEAN predictor column in training, and during prediction provide a FLOAT column of the same name. In this case, all FLOAT values must be either 0 or 1.
Parameters
exclude_columns- Comma-separated list of columns from predictor-columns to exclude from processing.
error_tolerance- Defines the acceptable error margin. Any data points outside this region add a penalty to the cost function.
Default: 0.1
C- The weight for misclassification cost. The algorithm minimizes the regularization cost and the misclassification cost.
Default: 1.0
epsilon- Used to control accuracy.
Default: 1e-3
max_iterations- The maximum number of iterations that the algorithm performs.
Default: 100
intercept_mode- A string that specifies how to treat the intercept, one of the following:
- regularized (default): Fits the intercept and applies a regularization on it.
- unregularized: Fits the intercept but does not include it in regularization.
intercept_scaling- A FLOAT value that serves as the value of a dummy feature whose coefficient Vertica uses to calculate the model intercept. Because the dummy feature is not in the training data, its values are set to a constant, 1 by default.
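As a sketch of how these arguments and parameters combine, the following call selects all columns with an asterisk and lists the response column in exclude_columns, as the predictor-columns argument requires. It assumes that the faithful table used in the Examples section also has an id column that is not a valid predictor. The model name mySvmRegModelAll and the parameter values are hypothetical, chosen only for illustration.

```sql
-- Hypothetical model name; assumes faithful has columns id, eruptions, waiting.
SELECT SVM_REGRESSOR('mySvmRegModelAll', 'faithful', 'eruptions', '*'
       USING PARAMETERS exclude_columns='id, eruptions',  -- response column must be excluded
                        error_tolerance=0.1,
                        C=1.0,
                        epsilon=1e-3,
                        max_iterations=100,
                        intercept_mode='unregularized');
```

All values other than exclude_columns and intercept_mode match the documented defaults, so omitting them should give the same model.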
Model attributes
coeff- Coefficients in the model:
- colNames: Intercept, or predictor column name
- coefficients: Coefficient value
nAccepted- Number of samples accepted for training from the data set
nRejected- Number of samples rejected when training
nIteration- Number of iterations used in training
callStr- SQL statement used to replicate the training
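One way to inspect these attributes is the GET_MODEL_ATTRIBUTE function. The sketch below assumes that a model named mySvmRegModel, as in the Examples section, has already been trained.

```sql
-- Return the coeff attribute (colNames and coefficients) of the trained model.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='mySvmRegModel', attr_name='coeff');

-- Omit attr_name to list the attributes available for the model.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='mySvmRegModel');
```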
Examples
=> SELECT SVM_REGRESSOR('mySvmRegModel', 'faithful', 'eruptions', 'waiting'
USING PARAMETERS error_tolerance=0.1, max_iterations=100);
SVM_REGRESSOR
----------------------------------------------------------------
Finished in 5 iterations.
Accepted Rows: 272 Rejected Rows: 0
(1 row)
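After training, the model can be applied with PREDICT_SVM_REGRESSOR. The following is a minimal sketch that assumes the same faithful table is reused as prediction input. The predicted_eruptions alias is illustrative.

```sql
-- Compare observed eruptions with the model's predictions for a few rows.
SELECT waiting,
       eruptions,
       PREDICT_SVM_REGRESSOR(waiting USING PARAMETERS model_name='mySvmRegModel') AS predicted_eruptions
FROM faithful
LIMIT 5;
```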