The optimization algorithms are based on NumPy.
If using `conda`, run the following command to install all necessary packages and then activate the environment:

```
$ conda env create --file environment.yml
$ source activate casimir
```

If not using `conda`, the file `environment.yml` contains the names of the required packages.

Clone the repository:

```
$ git clone https://github.com/krishnap25/casimir.git
$ cd casimir/
```

The following command compiles the Cython code, which is needed only for the experiments on named entity recognition:

```
$ ./scripts/compile_cython.sh
```

Here is an example of how to use this code base for binary classification on the Iris dataset (class 1 vs. the rest). Run the following in a Python REPL:

```
>>> import sklearn.datasets as ds
>>> from casimir.data.classification import LogisticRegressionIfo
>>> X, y = ds.load_iris(return_X_y=True)
>>> ifo = LogisticRegressionIfo(X, y==1)
```
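The IFO (incremental first order oracle) constructed above represents the average logistic loss over the dataset (the library may also add regularization; the sketch below omits it). As a rough, self-contained illustration of the objective being optimized, not the library's internal code:

```python
import numpy as np
import sklearn.datasets as ds

def logistic_loss(w, X, y):
    """Average logistic loss for labels y in {0, 1} (illustration only)."""
    margins = (X @ w) * (2 * y - 1)          # signed margins with labels in {-1, +1}
    return np.mean(np.log1p(np.exp(-margins)))

X, y = ds.load_iris(return_X_y=True)
print(logistic_loss(np.zeros(X.shape[1]), X, y == 1))  # log(2) ≈ 0.6931 at w = 0
```

At `w = 0` every margin is zero, so the loss is exactly log(2); any useful optimizer should drive the value below that.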

This objective function can then be optimized using `casimir.optim.optimize_ifo` with different optimization algorithms such as Casimir-SVRG, SVRG, or SGD.
In this example, we run 20 passes of SGD with a constant learning rate of 0.05 and weighted averaging (default):

```
>>> import numpy as np
>>> import casimir.optim as optim
>>> w, logs = optim.optimize_ifo(np.zeros(4), ifo, algorithm='SGD', num_passes=20,
...                              optim_options={'initial_learning_rate': 0.05})
```

This package prints out the function value after each pass through the dataset.
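To make the configuration above concrete, here is a self-contained NumPy sketch of SGD with a constant learning rate and weighted iterate averaging (iterate *t* weighted by *t*) on the same logistic objective. It illustrates the generic algorithm only; the actual `casimir.optim` implementation differs, and all names below are ours:

```python
import numpy as np
import sklearn.datasets as ds

def sgd_weighted_avg(X, y, num_passes=20, lr=0.05, seed=0):
    """Plain SGD with a constant step size and weighted iterate averaging.

    Illustration of the algorithm only; not the casimir.optim implementation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    s = 2.0 * y - 1.0                      # labels in {-1, +1}
    w = np.zeros(d)
    w_avg, weight_sum, t = np.zeros(d), 0.0, 0
    for _ in range(num_passes):
        for i in rng.permutation(n):
            t += 1
            # stochastic gradient of log(1 + exp(-s_i * <x_i, w>))
            g = -s[i] * X[i] / (1.0 + np.exp(s[i] * (X[i] @ w)))
            w = w - lr * g
            # running weighted average: iterate t gets weight t
            weight_sum += t
            w_avg += (t / weight_sum) * (w - w_avg)
    return w_avg

X, y = ds.load_iris(return_X_y=True)
w = sgd_weighted_avg(X, (y == 1).astype(float))
loss = np.mean(np.log1p(np.exp(-(2.0 * (y == 1) - 1.0) * (X @ w))))
print(loss)
```

The weighted average damps the oscillation of the raw SGD iterates, so the final loss should land well below log(2) ≈ 0.693, the value at `w = 0`.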

**Playing with a larger dataset**

Let us now experiment with a larger dataset.
Download the covtype dataset (size 53.8 MB) from this link
and place it in the folder `data/`. Then, run the file `examples/logistic_regression.py` as follows:

```
$ python examples/logistic_regression.py
```

The data is normalized so that a learning rate of 1 will work for SVRG and SGD. Feel free to play around with the
optimization options in CasimirSVRG, SVRG, and SGD. Note that the optimization algorithm is controlled by
the parameter `algorithm` of `optim.optimize_ifo`.
For reference, with the given parameter settings, at the end of ten iterations, Casimir-SVRG
should achieve a function value of 0.6624 with `warm_start = 'prox-center'` and 0.6606 with `warm_start = 'extrapolation'`,
while simple SVRG reaches a function value of 0.0664.
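As background, an SVRG step combines a fresh stochastic gradient with a correction evaluated at a periodically refreshed snapshot, which removes most of the gradient variance. Here is a self-contained NumPy sketch of the generic algorithm on the Iris data (an illustration only, not the package's implementation; all names below are ours):

```python
import numpy as np
import sklearn.datasets as ds

def svrg(X, y, num_epochs=10, lr=1.0, seed=0):
    """Plain SVRG for the average logistic loss (illustrates the update rule only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    s = 2.0 * y - 1.0                      # labels in {-1, +1}

    def grad(w, idx):
        # averaged gradient of log(1 + exp(-s_i * <x_i, w>)) over the rows in idx
        coef = -s[idx] / (1.0 + np.exp(s[idx] * (X[idx] @ w)))
        return (coef[:, None] * X[idx]).mean(axis=0)

    w = np.zeros(d)
    for _ in range(num_epochs):
        snapshot = w.copy()
        full_grad = grad(snapshot, np.arange(n))   # full gradient at the snapshot
        for i in rng.integers(0, n, size=n):
            idx = np.array([i])
            # variance-reduced stochastic gradient
            g = grad(w, idx) - grad(snapshot, idx) + full_grad
            w = w - lr * g
    return w

X, y = ds.load_iris(return_X_y=True)
Xn = X / np.linalg.norm(X, axis=1).max()   # normalize rows so a learning rate of 1 is stable
w = svrg(Xn, (y == 1).astype(float))
loss = np.mean(np.log1p(np.exp(-(2.0 * (y == 1) - 1.0) * (Xn @ w))))
print(loss)
```

Note the normalization step mirrors the remark above that the covtype data is normalized so that a learning rate of 1 works.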

If you already have the CoNLL-2003 dataset for named entity recognition and have compiled the Cython code above, then
proceed to `examples/named_entity_recognition.py`. If not, consult this page on how to obtain the data.
This example is structured in much the same way as the previous example. It assumes that the data is available
in the folder `data/conll03_ner/`.
A learning rate of about \(10^{-2}\) works for SGD and SVRG (more aggressive learning rates work for SVRG as well). For Casimir-SVRG, try
setting `grad_lipschitz_parameter` to 100 or thereabouts.

Note: the learning rates mentioned above are not tuned for best performance; they are simply ballpark numbers to get started. The parameters obtained from tuning may be found in `scripts/named_entity_recognition.sh`.