Working in NLP Flashcards
NLP and Data Science
-Creating structured data from unstructured data (an estimated 80-85% of all business information is unstructured, and it is growing faster than structured data). NLP is designed to derive a structured representation from unstructured text.
-ML: Supervised ML is used for training parsers (n-gram models, Naive Bayes) and training classifiers (Naive Bayes, SVM)
-ML: Unsupervised ML is used for document clustering (K-means, Affinity propagation, Hierarchical clustering)
-Using NLP for feature engineering, including feature extraction
-NLG as a presentation layer: summaries, short paragraphs.
-NLU for validating discrete data
-Search (web search and enterprise search - intranet)
-Amazon Comprehend (e.g., applying entity extraction to medical notes). Knowledge graph example: extract entities and find the relationships between them.
-Legal services (LexisNexis)
-Law enforcement: https://www.computer.org/publications/tech-news/research/human-sex-trafficking-dark-web-ai-investigation-tool
Crawl documents (e.g., 5 million)
Extract useful information
Find entities that you care about
Map to an ontology or knowledge graph
Transfer documents to a graph-based representation (the shift to graph-based approaches happened 8-9 years ago). Traverse the graph: nodes, edges, and techniques to traverse them.
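A minimal sketch of the steps above (find entities, map them to an ontology, build a graph, traverse it). The ontology, the sample documents, and the co-occurrence rule for edges are all made-up assumptions for illustration; a real system would use a trained entity extractor and an SME-defined ontology.

```python
from collections import defaultdict, deque

# Hypothetical mini-ontology: surface form -> entity type (normally defined by SMEs).
ONTOLOGY = {
    "aspirin": "Drug",
    "ibuprofen": "Drug",
    "headache": "Symptom",
    "fever": "Symptom",
}


def extract_entities(text):
    """Return the ontology entities found in the text (plain dictionary lookup)."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return sorted({t for t in tokens if t in ONTOLOGY})


def build_graph(documents):
    """Add an undirected edge between entities that co-occur in one document."""
    graph = defaultdict(set)
    for doc in documents:
        entities = extract_entities(doc)
        for a in entities:
            for b in entities:
                if a != b:
                    graph[a].add(b)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first traversal from start to goal; returns None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Made-up documents standing in for the crawled corpus.
docs = [
    "Patient took aspirin for a headache.",
    "Headache and fever reported.",
    "Ibuprofen reduced the fever.",
]
graph = build_graph(docs)
path = shortest_path(graph, "aspirin", "ibuprofen")
print(path)
```

Breadth-first search is only one traversal technique; it is used here because it returns the shortest chain of relationships between two entities.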
-In most professions, entities are defined by SMEs (physicians, etc.)
-Chatbots, conversational AI. Virtual assistant, etc.
Most common use cases. Each interaction falls into a category: what is the intent of the user? Intent classification: service request, information request, action request, decision request, notification request. Intents are mapped to tasks; the mapping is determined by the domain experts.
-Beyond Meena is LaMDA. Article today 5/18.
https://www.blog.google/technology/ai/lamda
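A rough sketch of the intent classification idea from the chatbot notes above, using a small multinomial Naive Bayes classifier written with the standard library. The training utterances, the intent labels, and the intent-to-task table are invented for illustration; in practice the domain experts supply them.

```python
import math
from collections import Counter, defaultdict

# Toy training data; utterances and labels are made up for illustration.
TRAINING = [
    ("reset my password please", "service_request"),
    ("my account is locked fix it", "service_request"),
    ("what are your opening hours", "information_request"),
    ("where is your office located", "information_request"),
    ("cancel my order now", "action_request"),
    ("book a meeting for tomorrow", "action_request"),
]

# Hypothetical intent-to-task mapping, normally decided by domain experts.
INTENT_TO_TASK = {
    "service_request": "open_support_ticket",
    "information_request": "search_knowledge_base",
    "action_request": "run_workflow",
}


def train(examples):
    """Count words per intent, examples per intent, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for text, intent in examples:
        tokens = text.lower().split()
        word_counts[intent].update(tokens)
        intent_counts[intent] += 1
        vocab.update(tokens)
    return word_counts, intent_counts, vocab


def classify(text, model):
    """Return the intent with the highest log-probability (add-one smoothing)."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent in sorted(intent_counts):
        score = math.log(intent_counts[intent] / total)
        denom = sum(word_counts[intent].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[intent][token] + 1) / denom)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING)
intent = classify("please reset the password", model)
print(intent, "->", INTENT_TO_TASK[intent])
```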
Job roles that utilize NLP
Software engineer
-Become a master of one thing like document clustering, semantic parsing, ontologies, sentiment, etc.
Knowledge engineer
- Interfaces with AI engineers and SMEs
- Hybrid role, people skills + technical skills, bridge, translator
Data Scientist (apply algorithms)
- Researcher
- Statistician
- Software Engineer
- AI practitioner
- Visualizer
- Communicator
DBA
- Choosing between db paradigms to fit projects/organizations
- Designing data models
- Implementing data governance policies
- Performing ETL
- Help generate reports
Applied linguistics researcher
- Applies linguistics to help/manage real people
- Chief application is education (ESL, learning impaired, bilingual)
- Automated writing evaluation
- Online reading help
- Automated grade-level estimation
- Nonnative speaker support
Cognitive scientist
- Neuroscientists, biologists, economists, psychologists
- Studies mechanisms of human thinking, deciding, speech acts, language acquisition, and everything we do with words
Marketing technologies (Martech)
- Tries to get the right product or service in front of the right person in the right place at the right time
- Tries to show contextual ads (ads that relate to the page they are on)
- Newsletter
- Customer service records
Sectors that utilize NLP
Public sector
Private
Non-profit
Information Services
- Search and browsing
- LexisNexis
- AMA medical journals
eCommerce, selling things online
-Parse product descriptions to determine whether products are the same and to do price comparisons
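One simple way to approximate the product-matching idea above is token-set (Jaccard) similarity between descriptions. This is a sketch only: the listings, the punctuation handling, and the 0.8 threshold are assumptions, and real catalogs usually need attribute parsing on top of this.

```python
def tokens(description):
    """Lowercased word set with surrounding punctuation stripped."""
    stripped = (t.strip(".,()").lower() for t in description.split())
    return {t for t in stripped if t}


def jaccard(a, b):
    """Size of the shared token set divided by the size of the combined set."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def same_product(x, y, threshold=0.8):
    """Treat two listings as the same product above an assumed threshold."""
    return jaccard(x["description"], y["description"]) >= threshold


# Hypothetical listings from two shops.
listing_a = {"description": "Acme Wireless Mouse 2.4GHz Black", "price": 19.99}
listing_b = {"description": "Acme 2.4GHz Wireless Mouse (Black)", "price": 17.49}
listing_c = {"description": "Acme Wired Keyboard Black", "price": 24.00}

if same_product(listing_a, listing_b):
    cheaper = min(listing_a, listing_b, key=lambda item: item["price"])
    print("Same product; lower price:", cheaper["price"])
```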
Customer Service Desk
- Call rep’s unstructured notes
- Text submissions from the customer
- Chat sessions between customer and rep
- Case-based reasoning: the rep types in a description, and NLP and ML find the most similar past case.
- Symptom-syndrome analysis
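The case-based reasoning item above can be sketched as nearest-neighbor retrieval over bag-of-words vectors with cosine similarity. The case base and resolutions below are hypothetical, and a production system would more likely use TF-IDF weights or embeddings.

```python
import math
from collections import Counter

# Hypothetical past cases: description -> resolution.
CASES = {
    "printer will not print and shows paper jam": "Clear the paper tray and restart the printer.",
    "laptop battery drains quickly after update": "Roll back the power management driver.",
    "cannot connect to office wifi network": "Reissue the wifi certificate.",
}


def vectorize(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_case(query):
    """Return the stored case description closest to the rep's description."""
    query_vec = vectorize(query)
    return max(CASES, key=lambda case: cosine(query_vec, vectorize(case)))


best = most_similar_case("wifi network will not connect")
print(best, "->", CASES[best])
```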
Law Enforcement/Military
- Time sensitive fuzzy search for named entities
- Disaster alerts
- Terrorist activity alerts
- Other perpetrator detection
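For the fuzzy named-entity search mentioned in this list, a small sketch with Python's standard `difflib` module shows the basic idea of matching misspelled names. The watchlist names and the 0.75 cutoff are assumptions; time-sensitive systems typically use indexed phonetic or edit-distance search instead of a linear scan.

```python
import difflib

# Hypothetical watchlist of known names.
WATCHLIST = ["Jonathan Smythe", "Maria Gonzalez", "Aleksandr Petrov"]


def fuzzy_lookup(name, candidates, cutoff=0.75):
    """Return candidates whose spelling is close to the queried name."""
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(
        name.lower(), list(lowered), n=3, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


print(fuzzy_lookup("Jonathon Smith", WATCHLIST))
```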
Legal
-Electronically Stored Information (ESI)
Business Intelligence
- Data sources: CRM, ERP, Supply Chain
- Users: OLAP Analysis, Data Mining, Reporting
Consumer Devices
- Amazon Echo, Siri, etc.
- NLI (Natural Language Interface)
- Smart phones, watches
Publishing and Media
-Hot button detection (trending topics). What is the audience most interested in lately? Watch how the conversation changes day to day, week to week.
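A minimal sketch of trending-topic detection: count terms in two time windows and rank terms by how much they grew. The posts, the stopword list, and the growth formula with add-one smoothing are illustrative assumptions, not a standard method.

```python
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "is", "on", "in", "of", "and", "to", "for", "after", "at"}


def term_counts(posts):
    """Count non-stopword terms across a list of posts."""
    counts = Counter()
    for post in posts:
        for token in post.lower().split():
            token = token.strip(".,!?")
            if token and token not in STOPWORDS:
                counts[token] += 1
    return counts


def trending(previous_posts, current_posts, top_n=3):
    """Rank current terms by growth relative to the previous window."""
    prev, curr = term_counts(previous_posts), term_counts(current_posts)
    # Add-one smoothing in the denominator so brand-new terms do not divide by zero.
    growth = {term: curr[term] / (prev[term] + 1) for term in curr}
    return sorted(growth, key=lambda term: (-growth[term], term))[:top_n]


# Made-up posts for two consecutive weeks.
last_week = ["The election debate is on tonight", "Election polls tighten"]
this_week = ["Storm warning for the coast", "Storm damage after landfall", "Election recount"]
print(trending(last_week, this_week, top_n=1))
```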
Inferred data vs. Declared Data
-Surveys would be an example of declared data. Gathering and analyzing Twitter data would be an example of inferred data.
Organizations Supporting NLP
Associations supporting NLP:
- ACM: Association for Computing Machinery.
- IEEE: Institute of Electrical and Electronics Engineers. Has a lot of money and members (IBM, Intel). Technical committees on semantic computing, pattern analysis, machine intelligence, intelligent informatics, and data engineering; local workshops.
- AAAI: Association for the Advancement of Artificial Intelligence. If you can only join one, join this one.
- IJCAI: International Joint Conference on Artificial Intelligence. (idgchi)
- AAAL: American Association for Applied Linguistics. Addresses the role of language in the lives of individuals. An interdisciplinary field that draws on a wide range of approaches.
- ICLA: International Cognitive Linguistics Association.
- SIAM International Conference on Data Mining
- ACL: Annual Conference of the Association for Computational Linguistics