BlitzML API reference
This is the complete API reference for the BlitzML Python package.
Training L1-regularized models
L1 regularization is a popular approach to training sparse models. BlitzML efficiently solves L1-regularized problems with a variety of loss functions.
Problem classes
- class blitzml.LassoProblem(A, b)
Class for training sparse linear models with squared loss. The optimization objective is
Parameters: - A (numpy.ndarray or scipy.sparse matrix) – n x d design matrix.
- b (numpy.ndarray) – Labels array of length n.
Blitz tries its best to avoid copying data, but depending on the formats of A and b, Blitz may make a copy of these arrays. To avoid copying, follow these guidelines:
- If A is dense (numpy.ndarray), define A as an F-contiguous array. The dtype for A should match ctypes.c_double, ctypes.c_float, ctypes.c_int, or ctypes.c_bool; for example, A.dtype == ctypes.c_double should evaluate to True.
- If A is sparse, define A as a scipy.sparse.csc_matrix. The dtype for A.indices should match ctypes.c_int, and the dtype for A.indptr should match ctypes.c_size_t. BlitzML can work with double, float, int, and bool dtypes for A.data without copying.
- The dtype for b should match ctypes.c_double.
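The guidelines above can be checked before a problem is constructed. The following is a minimal sketch for the dense case that uses NumPy only. The helper name prepare_dense_inputs is hypothetical and not part of BlitzML, and converting everything to double is a choice made for this sketch (the guidelines also accept float, int, and bool for A).

```python
import ctypes

import numpy as np


def prepare_dense_inputs(A, b):
    """Return (A, b) in the layout the guidelines above recommend.

    Hypothetical helper, not part of BlitzML: A becomes an F-contiguous
    array of doubles and b a 1-D array of doubles.
    """
    A = np.asfortranarray(A, dtype=np.float64)
    b = np.ascontiguousarray(b, dtype=np.float64).ravel()
    return A, b


A, b = prepare_dense_inputs([[1, 2], [3, 4], [5, 6]], [1, 0, 1])

# These checks mirror the conditions listed in the guidelines.
assert A.flags["F_CONTIGUOUS"]
assert A.dtype == np.dtype(ctypes.c_double)
assert b.dtype == np.dtype(ctypes.c_double)
```

Note that the conversion itself copies the data when the input is not already in this layout; the point is to do that once, up front, rather than on every solve.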
BlitzML prints a warning when it copies objects with more than 1e7 elements. To suppress these warnings, call blitzml.suppress_warnings().
- compute_max_l1_penalty(include_bias_term=True)
Compute the smallest L1 penalty parameter for which all weights in the solution equal zero.
Parameters: include_bias_term (bool, optional) – Whether to include an unregularized bias term in the model. Default is True.
Returns: max_l1_penalty
Return type: float
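One common use of this value is to choose penalties as fractions of the maximum and solve a sequence of problems, warm-starting each solve from the previous weights. The sketch below assumes only the documented compute_max_l1_penalty and solve methods and the weights attribute of the returned solution. The function name solve_path and the ratio grid are hypothetical choices for illustration.

```python
def solve_path(problem, ratios=(0.5, 0.1, 0.01)):
    """Solve one problem at several penalties, warm-starting each solve.

    `problem` is expected to offer compute_max_l1_penalty() and solve()
    as documented for blitzml.LassoProblem. The ratio grid is an
    arbitrary choice made for this sketch.
    """
    max_penalty = problem.compute_max_l1_penalty()
    solutions = []
    weights = None  # no warm start for the first solve
    # Go from the largest penalty (sparsest model) to the smallest.
    for ratio in sorted(ratios, reverse=True):
        l1_penalty = ratio * max_penalty
        solution = problem.solve(l1_penalty, initial_weights=weights)
        solutions.append((l1_penalty, solution))
        weights = solution.weights
    return solutions
```

With BlitzML installed, problem would be an instance such as blitzml.LassoProblem(A, b).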
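- solve(l1_penalty, include_bias_term=True, initial_weights=None, stopping_tolerance=0.001, max_time=31540000.0, min_time=0.0, max_iterations=100000, verbose=False, log_directory=None)
Minimizes the objective
\[ \sum_{i=1}^{n} L(a_i^T w + \beta,\, b_i) + \lambda \|w\|_1 \]
where L is the problem’s loss function, a_i is the ith row of A, w is the weight vector, \beta is the bias term (absent when include_bias_term is False), and \lambda is l1_penalty.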
Parameters: - l1_penalty (float > 0) – Regularization parameter for L1 norm penalty on weights. When this value is larger, the solution generally contains more zero entries and Blitz generally completes optimization faster.
- include_bias_term (bool, optional) – Whether to include an unregularized bias parameter in the model. Default is True.
- initial_weights (numpy.ndarray of length d, optional) – Initial weights to warm-start optimization. The solver requires less time if initialized to a good approximate solution. Default is None.
- stopping_tolerance (float, optional) – Stopping tolerance for solver. Optimization terminates if (duality_gap / objective_value) is less than stopping_tolerance. Default is 1e-3.
- max_time (float, optional) – Time limit in seconds for solving. If stopping tolerance is not reached, optimization terminates after this number of seconds. Default is 31540000.0 (about one year).
- min_time (float, optional) – Minimum time in seconds for solving. Optimization continues until this amount of time passes, even after reaching stopping tolerance. Default is zero.
- max_iterations (int, optional) – Iterations limit for algorithm. If stopping tolerance is not reached after this number of iterations, optimization terminates. Default is 100000.
- verbose (bool, optional) – Whether to print information, such as objective value, to sys.stdout during optimization. Default is False.
- log_directory (string, optional) – Path to existing directory for Blitz to log time and objective value information. This directory should be empty prior to solving. Default is None.
Returns: solution
Return type: BlitzMLSolution
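The way stopping_tolerance, max_time, min_time, and max_iterations interact can be summarized as a single predicate. The sketch below paraphrases the parameter descriptions above; it is not taken from BlitzML's implementation, the function name should_stop is hypothetical, and the solver's exact checks may differ.

```python
def should_stop(duality_gap, objective_value, elapsed_seconds, iterations,
                stopping_tolerance=1e-3, max_time=3.154e7, min_time=0.0,
                max_iterations=100000):
    """Return True when the documented termination conditions are met.

    Assumes objective_value is positive so the relative gap is defined.
    """
    # Hard limits apply even if the tolerance has not been reached.
    if elapsed_seconds >= max_time or iterations >= max_iterations:
        return True
    # Otherwise stop once the relative duality gap is small enough,
    # but not before min_time has passed.
    converged = duality_gap / objective_value < stopping_tolerance
    return converged and elapsed_seconds >= min_time
```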
- class blitzml.SparseLogisticRegressionProblem(A, b)
Class for training sparse linear models with logistic loss. The optimization objective is
where i indexes the ith row in A and ith entry in b. Each label b_i should have value 1 or -1. BlitzML treats other label values as 1 if the value exceeds zero and -1 otherwise.
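The label rule described above can be applied explicitly before training, so that the labels seen by the solver are unambiguous. This is a minimal NumPy sketch; the helper name to_binary_labels is hypothetical and not part of BlitzML.

```python
import numpy as np


def to_binary_labels(b):
    """Map labels to +1 where the value exceeds zero and -1 otherwise."""
    b = np.asarray(b, dtype=np.float64)
    return np.where(b > 0, 1.0, -1.0)


labels = to_binary_labels([1, 0, -3, 2.5])
# labels.tolist() is [1.0, -1.0, -1.0, 1.0]; note that 0 maps to -1.
```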
Calls to solve and compute_max_l1_penalty use interfaces identical to the same methods in blitzml.LassoProblem.
- class blitzml.SparseHuberProblem(A, b)
Class for training sparse linear models with Huber loss. The optimization objective is
where
Here i indexes the ith row in A and ith entry in b.
Calls to solve and compute_max_l1_penalty use interfaces identical to those in blitzml.LassoProblem.
- class blitzml.SparseSquaredHingeProblem(A, b)
Class for training sparse linear models with squared hinge loss. The optimization objective is
where the “+” subscript denotes the rectifier function. Each label should have value 1 or -1.
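For readers unfamiliar with the notation, the rectifier maps negative values to zero and leaves positive values unchanged. The sketch below shows it together with the usual form of a squared hinge term. Both function names are hypothetical, and any constant factor BlitzML applies to the loss is an unknown here and is left out.

```python
def rectifier(x):
    """Return x if x is positive and 0 otherwise, i.e. (x)_+."""
    return max(x, 0.0)


def squared_hinge_term(margin):
    """Squared hinge of a margin b_i * (a_i^T w + bias), without scaling.

    Assumes the standard squared hinge form; BlitzML's exact scaling
    of this term is not shown.
    """
    return rectifier(1.0 - margin) ** 2
```

Examples with a margin of at least 1 contribute nothing to the loss, which is why correctly classified, well-separated points do not affect the solution.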
Calls to solve and compute_max_l1_penalty use interfaces identical to those in blitzml.LassoProblem.
Solution classes
- class blitzml._sparse_linear.LassoSolution
Solution class for LassoProblem.
- bias
Value of model’s bias term.
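- compute_loss(A, b)
Compute the sum
\[ \sum_{i=1}^{n} L(a_i^T w + \beta,\, b_i) \]
where L is the problem’s loss function, a_i is the ith row of A, w is the solution’s weight vector, and \beta is its bias term.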
Parameters: - A (numpy.ndarray or scipy.sparse matrix) – n x d design matrix for evaluating loss.
- b (numpy.ndarray) – Corresponding labels array of length n.
Returns: loss
Return type: float
- dual_solution
Dual solution to optimization problem.
- duality_gap
Duality gap between primal and dual solutions (which upper bounds the suboptimality of these solutions).
- objective_value
Objective value of solution.
- predict(A)
Predict label values from feature vectors.
Parameters: A (numpy.ndarray or scipy.sparse matrix) – n x d design matrix to make predictions for.
Returns: predictions
Return type: numpy.ndarray of length n
- save(filepath)
Save model to disk.
Parameters: filepath (string) – Location to save solution.
- solution_status
Status of Blitz algorithm upon returning model.
- weights
Array of model’s weight values.
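The attributes above are often inspected together after a solve, for example to count nonzero weights and to compare the relative duality gap against the stopping tolerance. The sketch below relies only on the documented attribute names; the function summarize_solution and its zero_threshold parameter are hypothetical.

```python
def summarize_solution(solution, zero_threshold=0.0):
    """Collect a few diagnostics from a solution object.

    `solution` is expected to expose weights, bias, duality_gap,
    objective_value, and solution_status as documented above.
    """
    weights = list(solution.weights)
    num_nonzero = sum(1 for w in weights if abs(w) > zero_threshold)
    return {
        "num_weights": len(weights),
        "num_nonzero": num_nonzero,
        "bias": solution.bias,
        # Same ratio that stopping_tolerance is compared against.
        "relative_gap": solution.duality_gap / solution.objective_value,
        "status": solution.solution_status,
    }
```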
- class blitzml._sparse_linear.SparseLogisticRegressionSolution
Solution class for SparseLogisticRegressionProblem. Except for the following additional method, the interface is identical to LassoSolution’s interface.
- class blitzml._sparse_linear.SparseHuberSolution
Solution class for SparseHuberProblem. The interface is identical to LassoSolution’s interface.
Utility functions
- blitzml.load_solution(filepath)
Load BlitzMLSolution from disk.
Parameters: filepath (string) – Path to saved solution.
Returns: solution
Return type: BlitzMLSolution
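Saving and loading are typically paired. The sketch below shows a round trip that uses the documented save method and accepts the loader as an argument so that the function itself does not import BlitzML. The function name save_and_reload and the default file name are hypothetical; the on-disk format is not described in this reference.

```python
import os


def save_and_reload(solution, load_solution, directory,
                    filename="solution.blitzml"):
    """Save a solution into `directory` and load it back.

    `load_solution` is the loader to use, for example
    blitzml.load_solution. The default file name is an arbitrary
    choice for this sketch.
    """
    filepath = os.path.join(directory, filename)
    solution.save(filepath)
    return load_solution(filepath)
```

With BlitzML installed, a call would look like save_and_reload(solution, blitzml.load_solution, some_directory).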
- blitzml.parse_log_directory(log_directory)
Parse files logged by BlitzML during a solve call.
Parameters: log_directory (string) – Path to directory containing log files to parse.
Returns: logs – Iterable over information logged by BlitzML. Each item is a dictionary of logged values.
Return type: generator
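Because the return value is a generator, it can be iterated only once, so it is usually convenient to materialize it before analysis. The sketch below does that and also reports which keys appear in the records, since the key names are not listed in this reference. The function name summarize_logs is hypothetical.

```python
def summarize_logs(logs):
    """Collect logged records and the sorted set of keys they contain.

    `logs` is any iterable of dictionaries, such as the generator
    returned by blitzml.parse_log_directory.
    """
    records = list(logs)  # a generator can be consumed only once
    keys = sorted({key for record in records for key in record})
    return records, keys
```

With BlitzML installed, logs would come from blitzml.parse_log_directory(log_directory), where log_directory is the directory passed to solve.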