ETHICAL AND SOCIAL ISSUES IN NLP Flashcards
Negative uses of NLP
Profiling of users?
Generating harmful tweets/comments?
Propaganda?
Manipulation and framing?
Using the right training data
Incomprehensible training data – not fully disclosed, explored or understood
- The entire Web?
- A large Twitter dataset collected by a single keyword?
e.g. ChatGPT's training data is skewed toward English-language, US-centric sources
Size doesn’t guarantee diversity
Static data vs changing social views
Data bias
AI is trained on biased data
Racial, gender and ethnic bias
All data is biased
also depending on how the data is annotated
Ethical issue with data
Data have significant implications
- Is the data representative? What about bias?
- Is the data appropriate for the task?
- How was the data labeled?
- Copyright? Private data?
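A first pass at the representativeness and labeling questions above could look like the sketch below. It is only an illustration under stated assumptions: the records, the `dialect` field and the `toxic` label are hypothetical toy data, and both helper functions are made up for this example.

```python
from collections import Counter, defaultdict

def group_shares(records, group_key):
    """Return each group's share of the dataset as a fraction of all records."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def positive_rate_by_group(records, group_key, label_key, positive_label):
    """Return, per group, the fraction of records carrying the given label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[group_key]] += 1
        if r[label_key] == positive_label:
            positives[r[group_key]] += 1
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical toy dataset: field names and values are assumptions.
records = [
    {"dialect": "A", "label": "toxic"},
    {"dialect": "A", "label": "ok"},
    {"dialect": "A", "label": "ok"},
    {"dialect": "B", "label": "toxic"},
]

shares = group_shares(records, "dialect")
rates = positive_rate_by_group(records, "dialect", "label", "toxic")
print(shares)  # group B is under-represented
print(rates)   # group B is labeled "toxic" far more often than group A
```

A skewed share or a very different label rate does not prove bias on its own, but it flags where the data collection and annotation deserve a closer look.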
AI Data Pollution
We could run out of data to train AI
It’s going to get trickier to find good-quality, guaranteed AI-free training data
Issues with testing and validation
Instead of just accuracy:
Systematic evaluation needed
- What is the value? Can the model do harm?
- How feasible is it to deploy?
- Validation should lead to trust and confidence
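One way to go beyond a single accuracy number is to break the score down by group. The sketch below is a minimal illustration, assuming toy labels, predictions and group tags that are all hypothetical; it is not a complete evaluation protocol.

```python
from collections import defaultdict

def accuracy(y_true, y_pred):
    """Overall fraction of correct predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def accuracy_by_group(y_true, y_pred, groups):
    """Fraction of correct predictions computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical toy evaluation set: values are assumptions for illustration.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "A", "A", "B", "B"]

overall = accuracy(y_true, y_pred)
per_group = accuracy_by_group(y_true, y_pred, groups)
gap = max(per_group.values()) - min(per_group.values())

print(overall)    # looks acceptable in aggregate
print(per_group)  # but every error falls on group B
print(gap)
```

Here the aggregate score hides the fact that the model fails entirely on the smaller group, which is the kind of harm a systematic evaluation is meant to surface.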
LLM Risks
- Discrimination, hate speech and exclusion
- Information hazards (privacy)
- Misinformation
- Malicious use (fraud)
- Human-computer interaction harms (stereotypes)
- Environmental and socioeconomic harms (hurt creative economies?)
Environmental impact
Training a single BERT base model (without hyperparameter tuning) on GPUs was estimated to require as much energy as a trans-American flight
Geopolitical impact
Countries competing to create the best AI
Importance of having the leading AI
Resource challenges
Computational infrastructure
- Who can afford this? Only a select few?
- Training large models is challenging and very costly
- Running costs
Right staff in right places?
Right partnerships?
Regulations
We need an agreed regulatory framework
We need agreed ethical and validation framework(s)
Transparency
Pre-mortem instead of post-mortem
Consider the known risks, and try to understand the unknown risks and limitations, of a new product or project before it has even been designed