Public Safety Canada Flashcards
Talk about the programming work you did with PSC
- My main project was conducting quality assurance on an ML and NLP system that automated reconnaissance of incel communication channels. I worked alongside policy analysts; our role was to identify failures in the system's performance and either correct them or document them in our reporting.
- Led an initiative to centralize intel sources from RCMP and PSC to create a database of terrorist activity records.
- Developed end-to-end ETL pipelines to retrieve data from external APIs and web sources.
- Developed scripts to automate workflows, e.g., text cleaning and data exports.
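A minimal sketch of what an ETL step with text cleaning could look like, for talking through the idea in an interview. It is not the actual PSC code: `clean_text`, `run_etl`, `fake_fetch`, and the `posts` table are hypothetical names, and the API call is replaced by a stand-in function so the example runs offline.

```python
import html
import re
import sqlite3
from typing import Callable, Iterable


def clean_text(raw: str) -> str:
    """Unescape HTML entities, strip tags and URLs, collapse whitespace, lowercase."""
    text = html.unescape(raw)
    text = re.sub(r"<[^>]+>", " ", text)       # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"\s+", " ", text)           # collapse runs of whitespace
    return text.strip().lower()


def run_etl(fetch: Callable[[], Iterable[dict]], conn: sqlite3.Connection) -> int:
    """Extract records via `fetch`, clean them, and load them into SQLite.

    Returns the number of rows written. Records with an empty body are skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts (id TEXT PRIMARY KEY, body TEXT)"
    )
    rows = [(r["id"], clean_text(r["body"])) for r in fetch() if r.get("body")]
    # INSERT OR REPLACE keeps the load idempotent if the same id arrives twice
    conn.executemany("INSERT OR REPLACE INTO posts VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def fake_fetch():
    """Stand-in for an external API call (assumption: records have id and body)."""
    return [
        {"id": "1", "body": "<p>Hello &amp; welcome</p> https://example.com/x"},
        {"id": "2", "body": ""},
    ]
```

Passing the fetch function in as a parameter keeps the extract step swappable, so the same load logic can be exercised against a fake source in tests.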
Talk about the research you did with PSC
At Public Safety Canada, my research focused on how individuals become radicalized within incel communities, the various factors leading to extremism, and what measures can be implemented to prevent that.
The bulk of my work in this area was conducting research, writing, and providing high-level explanations of the more technical aspects of our research.
*The two most impactful initiatives were content moderation (consider over-moderation, do not create echo chambers) and **life experiences** (avoiding NEET lifestyles, moving out, going to college, etc.).*
Talk about your research methods
Literature Review and Evidence Gathering
Data Analysis (2/5 will be recommended incel content through their algorithms)
Policy and Initiative Recommendations
After training a machine learning (ML) or natural language processing (NLP) model to detect deviations in communication patterns, two follow-up steps apply:
* Error Analysis: Investigate patterns in false positives and false negatives to understand where the model is underperforming, then refine or retrain the model with targeted adjustments.
* Feature Engineering: If errors indicate that the model is failing to detect certain communication deviations, refine the feature set used for training. For example, adding features such as **temporal patterns** (how often certain keywords appear over time) or **network features** (who is communicating with whom and how frequently) could improve performance.
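The two ideas above can be sketched roughly as follows. This is an illustration only, under assumptions: the slice names (such as "sarcasm"), the record layout, and both function names are hypothetical, and labels use 1 for "flagged" and 0 for "not flagged".

```python
from collections import Counter, defaultdict
from datetime import date


def error_breakdown(records):
    """Count false positives and false negatives per slice.

    Each record is (slice_name, true_label, predicted_label), with 1 = flagged.
    """
    counts = defaultdict(Counter)
    for slice_name, y_true, y_pred in records:
        if y_pred == 1 and y_true == 0:
            counts[slice_name]["fp"] += 1       # flagged, but should not be
        elif y_pred == 0 and y_true == 1:
            counts[slice_name]["fn"] += 1       # missed a true case
        else:
            counts[slice_name]["correct"] += 1
    return {name: dict(c) for name, c in counts.items()}


def weekly_keyword_counts(posts, keyword):
    """Temporal feature: posts containing `keyword`, per ISO (year, week).

    Each post is (posted_on: date, text: str). Matching is on whole
    whitespace-separated tokens, case-insensitive.
    """
    weekly = Counter()
    for posted_on, text in posts:
        if keyword in text.lower().split():
            iso = posted_on.isocalendar()
            weekly[(iso[0], iso[1])] += 1
    return dict(weekly)
```

A slice with many false positives and few false negatives (or the reverse) points at where to adjust features or add training examples.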
Speak technically about your ML experience
Main Point 1 (ML/NLP focus):
I worked on ML and NLP systems that detect communication patterns indicating radicalization, self-harm, or violence, and ensured their accuracy through testing.
Main Point 2 (Model building and preprocessing):
I used TensorFlow to build text classification models and spaCy to preprocess text, helping the model detect subtle language patterns.
Main Point 3 (Dataset management and testing):
I tested the entire pipeline thoroughly with pytest and unittest, containerizing it with Docker for consistency.
Main Point 4 (Monitoring and improvement):
I set up real-time monitoring and feedback loops for continuous retraining, ensuring the system improved over time.
I worked extensively on machine learning (ML) and natural language processing (NLP) systems that detect communication patterns that could indicate extremism, self-harm, or violence, specifically within incel communication channels. I was responsible for ensuring their accuracy through comprehensive testing and quality assurance.
I primarily used TensorFlow in Python to build sequence classification models, which classified text into categories like ‘normal communication,’ ‘radicalization,’ and ‘self-harm.’ For the NLP tasks, I used spaCy to preprocess text, creating custom pipelines for tokenization.
I implemented thorough testing using pytest and unittest, ensuring each part of the pipeline functioned properly.
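As a talking point, a unit test for a single pipeline step might look like the sketch below. The `tokenize` function is a hypothetical stand-in written with the standard library, not the spaCy pipeline from the project. The tests use `unittest`, and pytest can also collect `unittest.TestCase` classes.

```python
import re
import unittest
from typing import List


def tokenize(text: str) -> List[str]:
    """Hypothetical preprocessing step: lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TokenizeTests(unittest.TestCase):
    def test_lowercases_and_splits(self):
        self.assertEqual(tokenize("Hello, World!"), ["hello", "world"])

    def test_empty_input_gives_no_tokens(self):
        self.assertEqual(tokenize(""), [])

    def test_keeps_contractions_together(self):
        self.assertEqual(tokenize("don't stop"), ["don't", "stop"])
```

Saved to a file, this can be run with `python -m unittest` or with `pytest`.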
After deployment, I set up scripts to track model performance with new data, and established feedback loops for continuous retraining based on inaccuracies.
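One way to picture the monitoring and feedback loop is a rolling accuracy check over recently reviewed predictions. This is a simplified sketch, not the deployed scripts: the `AccuracyMonitor` class, the window size, and the threshold are all assumptions chosen for illustration.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` reviewed predictions."""

    def __init__(self, window: int = 100, threshold: float = 0.9):
        self.results = deque(maxlen=window)  # True/False per reviewed prediction
        self.threshold = threshold
        self.misclassified = []              # (text, correct_label) for retraining

    def record(self, text, predicted, actual):
        """Store the outcome of one prediction after a human has reviewed it."""
        correct = predicted == actual
        self.results.append(correct)
        if not correct:
            self.misclassified.append((text, actual))

    @property
    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded yet."""
        return sum(self.results) / len(self.results) if self.results else None

    def needs_retraining(self) -> bool:
        """True once the window is full and accuracy has dropped below threshold."""
        full = len(self.results) == self.results.maxlen
        return full and self.accuracy < self.threshold
```

Waiting for a full window before triggering avoids reacting to a handful of early errors, and the stored misclassified examples become the targeted additions to the next training set.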