This page summarizes the API for data-related operations for tasks including binary classification, named entity recognition and object localization. Lastly, it describes how to apply the optimization algorithms to new tasks or datasets.
casimir.data.LogisticRegressionIfo(data, labels)
Create an incremental first order oracle (IFO) for logistic regression.
The component function \(f_i\) is the logistic loss defined on the \(i\)th example for binary classification.
Parameters: 


batch_function_value(model)
Return the function value \(f(w)\), where \(w\) is model.
batch_gradient(model)
Return the gradient \(\nabla f(w)\), where \(w\) is model.
evaluation_function(model)
Compute the classification error (0-1 loss).
function_value(model, idx)
Return the function value \(f_i(w)\), where \(w\) is model and \(i\) is idx.
gradient(model, idx)
Return the gradient \(\nabla f_i(w)\), where \(w\) is model and \(i\) is idx.
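The methods above can be illustrated with a minimal standalone sketch. It does not use casimir itself: the class name LogisticRegressionSketch is hypothetical, labels are assumed to lie in {-1, +1}, and \(f\) is assumed to be the average of the \(f_i\), none of which is stated on this page.

```python
import numpy as np


class LogisticRegressionSketch:
    """Hypothetical stand-in for a logistic regression IFO (not casimir's code)."""

    def __init__(self, data, labels):
        self.data = np.asarray(data, dtype=float)      # shape (n, d)
        self.labels = np.asarray(labels, dtype=float)  # assumed to be in {-1, +1}

    def __len__(self):
        return self.data.shape[0]

    def function_value(self, model, idx):
        # f_i(w) = log(1 + exp(-y_i <x_i, w>)), computed stably with logaddexp.
        margin = self.labels[idx] * self.data[idx].dot(model)
        return np.logaddexp(0.0, -margin)

    def gradient(self, model, idx):
        # grad f_i(w) = -y_i * x_i * sigmoid(-margin), where
        # sigmoid(-m) = 0.5 * (1 - tanh(m / 2)) avoids overflow in exp.
        margin = self.labels[idx] * self.data[idx].dot(model)
        weight = 0.5 * (1.0 - np.tanh(0.5 * margin))
        return -self.labels[idx] * weight * self.data[idx]

    def batch_function_value(self, model):
        # Assumption: f is the average of the component functions f_i.
        return np.mean([self.function_value(model, i) for i in range(len(self))])

    def batch_gradient(self, model):
        return np.mean([self.gradient(model, i) for i in range(len(self))], axis=0)

    def evaluation_function(self, model):
        # 0-1 loss: fraction of examples whose predicted sign differs from the label.
        predictions = np.where(self.data.dot(model) >= 0, 1.0, -1.0)
        return np.mean(predictions != self.labels)
```

An optimizer only needs function_value and gradient for a single index, which is what makes the oracle incremental.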
casimir.data.named_entity_recognition.create_ner_ifo_from_data(train_file, dev_file=None, test_file=None, smoothing_coefficient=None, num_highest_scores=5)
Create smoothed IFOs for the train, development and test sets of CoNLL 2003 for the structural hinge loss.
Each component function \(f_i\) is the structural hinge loss for the \(i\)th datapoint \((x_i, y_i)\).
Parameters: 


Returns:  Train, development and test Smoothed IFOs. Dev (test) IFO is 
casimir.data.named_entity_recognition.NamedEntityRecognitionIfo(ner_dataset, num_ner_tags, ner_to_idx, smoothing_coefficient=None, num_highest_scores=5)
Create a smoothed incremental first order oracle for the structural hinge loss for the task of named entity recognition (NER).
Supports both the nonsmooth and \(\ell_2\)-smoothed versions of the structural hinge loss, but not entropy smoothing.
Parameters: 


evaluation_function(model)
Return the \(F_1\) score on all tags excluding 'O'.
function_value(model, idx)
Return the pair \(\big( f_i(w), f_{i, \mu}(w) \big)\), where \(i\) represents the index idx. If smoothing_coefficient is None, i.e., no smoothing, simply return \(f_i(w)\).
gradient(model, idx)
Return the gradient \(\nabla f_{i, \mu}(w)\) if smoothing_coefficient is not None, or \(\nabla f_i(w)\) if smoothing_coefficient is None.
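To make the pair \(\big( f_i(w), f_{i, \mu}(w) \big)\) concrete, the sketch below shows one standard way to \(\ell_2\)-smooth a maximum over a handful of candidate scores: replace \(\max_k s_k\) by \(\max_{u \in \Delta} \langle u, s \rangle - \frac{\mu}{2} \|u\|_2^2\), whose maximizer is the Euclidean projection of \(s / \mu\) onto the simplex \(\Delta\). This page does not say how casimir computes the smoothed value, so treat this as an assumption; the function names are hypothetical, and the scores stand in for the num_highest_scores best augmented scores.

```python
import numpy as np


def project_onto_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    v = np.asarray(v, dtype=float)
    sorted_v = np.sort(v)[::-1]
    cumulative = np.cumsum(sorted_v) - 1.0
    ranks = np.arange(1, v.size + 1)
    # Largest rank whose entry stays positive after subtracting the threshold.
    support = sorted_v - cumulative / ranks > 0
    rho = ranks[support][-1]
    threshold = cumulative[rho - 1] / rho
    return np.maximum(v - threshold, 0.0)


def l2_smoothed_max(scores, mu):
    """Return (smoothed max value, maximizing weights) for smoothing coefficient mu > 0."""
    scores = np.asarray(scores, dtype=float)
    weights = project_onto_simplex(scores / mu)
    value = weights.dot(scores) - 0.5 * mu * weights.dot(weights)
    return value, weights
```

For small mu the weights concentrate on the best candidate and the value approaches the plain maximum; for larger mu the weights spread over several candidates.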
casimir.data.named_entity_recognition.NerDataset(lines, word_to_idx, pos_to_idx, chunk_to_idx, ner_to_idx)
Class to parse CoNLL 2003 data and allow random access into the dataset.
Requires the output of generate_counts_and_feature_map.
casimir.data.named_entity_recognition.viterbi_decode()
Viterbi decoding to find the maximum scoring label sequence.
Parameters: 


Returns:  highest score, best scoring label sequence 
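As an illustration of what Viterbi decoding computes, here is a minimal standalone sketch. The argument list of viterbi_decode is not shown on this page, so the inputs below (a matrix of per-position label scores and a matrix of transition scores) and the name viterbi_sketch are assumptions, not casimir's signature.

```python
import numpy as np


def viterbi_sketch(unary, pairwise):
    """Hypothetical helper: return (highest score, best label sequence).

    unary:    (T, S) array, score of each of S labels at each of T positions.
    pairwise: (S, S) array, score of moving from label p (row) to label c (column).
    """
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    seq_len, num_labels = unary.shape
    best = unary[0].copy()
    backptr = np.zeros((seq_len, num_labels), dtype=int)
    for t in range(1, seq_len):
        # candidates[p, c]: best score ending in label p at t - 1, then moving to c.
        candidates = best[:, None] + pairwise
        backptr[t] = candidates.argmax(axis=0)
        best = candidates.max(axis=0) + unary[t]
    # Follow the back-pointers from the best final label.
    last = int(best.argmax())
    labels = [last]
    for t in range(seq_len - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    labels.reverse()
    return float(best[last]), labels
```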
casimir.data.named_entity_recognition.viterbi_decode_top_k()
Top-K Viterbi decoding to find the \(K\) maximum scoring label sequences.
Parameters: 


Returns:  \(K\) highest scores and the corresponding label sequences 
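A minimal sketch of top-K decoding follows, under the same assumptions as for plain Viterbi: the inputs (per-position label scores, transition scores and k) and the name viterbi_top_k_sketch are hypothetical, since the argument list is not shown on this page. The idea is to keep, for every position and label, the k best partial sequences instead of only the best one.

```python
import heapq

import numpy as np


def viterbi_top_k_sketch(unary, pairwise, k):
    """Hypothetical helper: return up to k (score, label sequence) pairs, best first."""
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    seq_len, num_labels = unary.shape
    # beams[t][c]: up to k tuples (score, prev_label, prev_rank), best first.
    beams = [[[(unary[0, c], -1, -1)] for c in range(num_labels)]]
    for t in range(1, seq_len):
        current = []
        for c in range(num_labels):
            candidates = [
                (score + pairwise[p, c] + unary[t, c], p, rank)
                for p in range(num_labels)
                for rank, (score, _, _) in enumerate(beams[t - 1][p])
            ]
            current.append(heapq.nlargest(k, candidates, key=lambda item: item[0]))
        beams.append(current)
    # Merge the per-label lists at the final position and keep the k best.
    finals = [
        (entry[0], c, rank)
        for c in range(num_labels)
        for rank, entry in enumerate(beams[-1][c])
    ]
    results = []
    for score, label, rank in heapq.nlargest(k, finals, key=lambda item: item[0]):
        labels = []
        for t in range(seq_len - 1, -1, -1):
            labels.append(label)
            # Step back to the predecessor entry recorded for this partial sequence.
            _, label, rank = beams[t][label][rank]
        labels.reverse()
        results.append((float(score), labels))
    return results
```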
casimir.data.localization.create_loc_ifo_from_data(obj_class, label_file, features_dir, dimension=2304, smoothing_coefficient=None, num_highest_scores=5)
Create a smoothed incremental first order oracle for each of the train and validation sets of Pascal VOC 2007, given candidate bounding boxes and features.
Parameters: 


Returns:  Train, dev and test Smoothed IFOs. Dev (test) IFO is 
casimir.data.localization.LocalizationIfo(voc_dataset, smoothing_coefficient=None, num_highest_scores=10)
Create the smoothed IFO object for the structural hinge loss for object localization with Pascal VOC.
Supports both the nonsmooth and \(\ell_2\)-smoothed versions of the structural hinge loss, but not entropy smoothing.
Parameters: 


evaluation_function(model)
Return the average IoU, localization accuracy and average precision.
function_value(model, idx)
Return the pair \(\big( f_i(w), f_{i, \mu}(w) \big)\), where \(i\) represents the index idx. If smoothing_coefficient is None, i.e., no smoothing, simply return \(f_i(w)\).
gradient(model, idx)
Return the gradient \(\nabla f_{i, \mu}(w)\) if smoothing_coefficient is not None, or \(\nabla f_i(w)\) if smoothing_coefficient is None.
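The IoU (intersection over union) reported by evaluation_function measures the overlap between a predicted and a true bounding box. A minimal sketch of the quantity is given below; the box format (x_min, y_min, x_max, y_max) and the function name are assumptions, since this page does not state how boxes are represented.

```python
def intersection_over_union(box_a, box_b):
    """Hypothetical helper: IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clipped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0
```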
casimir.data.localization.VocDataset(obj_class, label_file, feature_dir, data_type, num_bboxes_per_image=1000, dim=2304)
Store a set of images of Pascal VOC 2007 for image localization. Supports __len__ and __getitem__.
Each image is assumed to contain only one object of the class of interest (it may contain other classes).
Apart from the images and labels of Pascal VOC 2007, this class also requires a set of candidate bounding boxes that may contain this class (e.g., from the output of selective search), the IoU (intersection over union) of each candidate bounding box with the true bounding box, and features of each bounding box.
The candidate boxes and their feature representation are directly loaded from disk.
Parameters: 


The framework of IFOs decouples the optimization from the data and loss function used, as captured by the figure below.
To define a new incremental first order oracle, one must subclass casimir.optim.IncrementalFirstOrderOracle or casimir.optim.SmoothedIncrementalFirstOrderOracle and override its methods.
See the documentation of these classes for more details.
See the IFOs for classification, or those for structural support vector machines for named entity recognition and visual object localization, for examples of how this is done.
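The decoupling of optimization from data and loss can be sketched without casimir as follows. IfoInterface is a hypothetical stand-in that only mirrors method names listed on this page; the actual base classes in casimir.optim may require different or additional methods, so check their documentation before subclassing. LeastSquaresIfo and the sgd loop are likewise illustrative and not part of the library.

```python
import numpy as np


class IfoInterface:
    """Hypothetical stand-in for an IFO base class (not casimir.optim's class)."""

    def __len__(self):
        raise NotImplementedError

    def function_value(self, model, idx):
        raise NotImplementedError

    def gradient(self, model, idx):
        raise NotImplementedError


class LeastSquaresIfo(IfoInterface):
    """New task: f_i(w) = 0.5 * (<x_i, w> - y_i)^2."""

    def __init__(self, data, targets):
        self.data = np.asarray(data, dtype=float)
        self.targets = np.asarray(targets, dtype=float)

    def __len__(self):
        return self.data.shape[0]

    def function_value(self, model, idx):
        residual = self.data[idx].dot(model) - self.targets[idx]
        return 0.5 * residual ** 2

    def gradient(self, model, idx):
        residual = self.data[idx].dot(model) - self.targets[idx]
        return residual * self.data[idx]


def sgd(ifo, model, step_size, num_passes, seed=0):
    """Plain SGD that touches the data only through the oracle's methods."""
    rng = np.random.default_rng(seed)
    for _ in range(num_passes):
        for idx in rng.permutation(len(ifo)):
            model = model - step_size * ifo.gradient(model, idx)
    return model
```

Because sgd never inspects the data or the loss directly, the same loop would run unchanged on any other object that implements these methods.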