Data Science Specialist


This web-based course introduces Cognition Studio, with a focus on data science tasks. Attendees will learn to work in the Cognition Studio environment and to develop and implement models that flag potential violations, helping them meet their surveillance requirements.

Course Topics

  • Introduction to Conduct Surveillance and the compliance use case
  • Introduction to the "scenario-based" approach
  • Introduction to machine learning modules
    • Machine learning modules for text classification
    • Supervised machine learning
    • Predictions and confidence score
    • Confidence threshold
  • Quality metrics for text classifiers
    • Precision/recall/F1 score
    • Confidence threshold
    • PR Curve
  • K-fold cross-validation
  • Creating a model in Cognition Studio UI
    • Creating a new model project
    • Creating a new label set
      • How to choose training data
      • Binary versus non-binary classifiers
      • Classifier spans: sentence-level, document-level, and "other"
      • Bootstrapping with examples and keywords
      • Labeling data
        • Samplers: keyword, random, search, top predictions, random predictions, highest entropy
        • Tagging guidelines for ambiguous and confounding samples
        • "Cross-set" tagging option
        • Iterating on the tag/train/predict loop
    • Annotation consistency
      • Importance of annotation consistency
      • Recommendations for creating annotation guideline documentation
    • Model evaluation in the Cognition Studio UI
      • Evaluation on unlabeled datasets
      • Evaluation on labeled datasets
    • (Optional) Model behavior "under the hood"
      • Data normalization
      • Text features
      • Pre-trained word vectors
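
As a rough illustration of the quality metrics listed above (precision, recall, F1 score, and the confidence threshold), here is a minimal Python sketch. It is not Cognition Studio code: the function name `metrics_at_threshold` and the sample scores and labels are hypothetical, chosen only to show how the metrics depend on the threshold for a binary classifier.

```python
def metrics_at_threshold(scores, labels, threshold):
    """Return (precision, recall, f1) for a binary classifier.

    scores: model confidence scores in [0, 1] (hypothetical values)
    labels: true labels (1 = violation, 0 = not a violation)
    threshold: a sample is flagged when its score >= threshold
    """
    flagged = [s >= threshold for s in scores]
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up confidence scores and true labels for six messages.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.5, 0.7):
    p, r, f1 = metrics_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Computing these metrics over a sweep of thresholds, rather than just two, gives the points of a PR curve.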