Getting Started


The optimization algorithms are based on Numpy. If using conda, run the following command to install all necessary packages and then activate the environment:

$ conda env create --file environment.yml
$ source activate casimir

If not using conda, the file environment.yml contains the names of the required packages.


Clone the repository available here:

$ git clone
$ cd casimir/

The following command complies the Cython code, which is needed only for the experiments on named entity recognition:

$ ./scripts/

Quick start: Binary Classification

Here is an example on how to use this code base for binary classification on the Iris dataset (class 1 or not class 1). Run the following in Python REPL

>>> import sklearn.datasets as ds
>>> from import LogisticRegressionIfo
>>> X, y = ds.load_iris(return_X_y=True)
>>> ifo = LogisticRegressionIfo(X, y==1)

This objective function can then be optimized using casimir.optim.optimize_ifo, using different optimization algorithms such as Casimir-SVRG, SVRG or SGD. In this example, we run 20 passes of SGD with a constant learning rate of 0.05 and weighted averaging (default):

>>> import numpy as np
>>> import casimir.optim as optim
>>> w, logs = optim.optimize_ifo(np.zeros(4), ifo, algorithm='SGD', num_passes=20,
                                 optim_options={'initial_learning_rate': 0.05})

This package prints out the function value after each pass through the dataset.

Playing with a larger dataset

Let us now experiment with a larger dataset. Download the covtype dataset (size 53.8 MB) from this link and place it in the folder data/. Then, run the file examples/ as follows:

>>> python examples/

The data is normalized so that a learning rate of 1 will work for SVRG and SGD. Feel free to play around with the optimization options in CasimirSVRG, SVRG and SGD. Note that the optimization algorithm is controlled by the parameter algorithm of optim.optimize_ifo. For reference, with the given parameter settings, at the end of ten iterations, Casimir-SVRG should achieve a function value of 0.6624 with warm_start = 'prox-center' and 0.6606 with warm_start = 'extrapolation' while simple SVRG reaches a function value of 0.0664.

Starting with Named Entity Recognition

If you already have the CoNLL-2003 dataset for named entity recognition and have installed the Cython code above, then proceed to examples/ If not, consult this page on how to obtain the data. This example is structured in much the same way as the previous example. It assumes that the data is available in the folder data/conll03_ner/. A learning rate of about \(10^{-2}\) works for SGD and SVRG (more aggressive learning rates work for SVRG as well). For Casimir-SVRG, try setting grad_lipschitz_parameter to 100 or thereabouts.

Note: These learning rates mentioned above are not tuned for best performance, but are simply ballpark numbers to get started. The parameters obtained from tuning may be found in scripts/