Social Computing Flashcards
Confusion Matrices
Record, per true label, the probability of each answer a user gives; a more accurate substitute for a single basic probability that a user gives the correct answer
Consequences of information asymmetry
Leads to adverse selection: buyers cannot judge quality, so the market comes to consist of poor-quality items, as explained by the Lemon Problem
Moral Hazard
Hazards that arise when an individual is incentivised to take greater risks as they are shielded from the negative consequences of their actions.
For example, failing to uphold an agreement after payment because there are no consequences for doing so
Purpose of Reputation Systems
Mitigate moral hazards and address the information asymmetry that leads to adverse selections in the market by quantifying the trust of individuals/entities using the wisdom of the crowd
Calculate Reputation Value
Calculated through averages of user ratings, improved with:
- correction for user bias
- weighted ratings (based on user trustworthiness, i.e. past ratings rated helpful by others)
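A minimal sketch of such a calculation, assuming simple additive bias correction; the names (`ratings`, `bias`, `weight`) and the numbers are illustrative, not from the source:

```python
# Minimal sketch: bias-corrected, trust-weighted average rating.
# `ratings` maps user -> rating for one item; `bias` is each user's
# average deviation from the global mean; `weight` reflects user
# trustworthiness (e.g. how often their past ratings were rated helpful).

def reputation_value(ratings, bias, weight):
    corrected = {u: r - bias.get(u, 0.0) for u, r in ratings.items()}
    total_weight = sum(weight.get(u, 1.0) for u in corrected)
    return sum(weight.get(u, 1.0) * r for u, r in corrected.items()) / total_weight

print(reputation_value({"alice": 5, "bob": 2},
                       bias={"alice": 0.5},           # alice rates 0.5 above average
                       weight={"alice": 2.0, "bob": 1.0}))  # ~3.67
```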
Confidence Values
Convey the certainty in the accuracy of a reputation value
Challenges with Reputation Systems
Ballot Stuffing
Slander & Self Promotion
Whitewashing
Fear of Retaliation
Individual Bias
Quality Variations
Lack of Incentives
Ballot Stuffing
Multiple ratings from a single user
To tackle, ensure there is some effort-based (or monetary) cost to providing a rating
Slander & Self-promotion
Competitors place reviews to damage sales of others
To tackle, require identity authentication and proof of transaction. Allow rating the helpfulness of ratings. Use trust inference based on a user’s history
Whitewashing
Users change their identity to reset their reputation
To tackle, ensure there is an effort-based cost to changing identity e.g. linking phone number
Fear of Retaliation
Users fear retaliation and give overly positive ratings
To tackle, simultaneously disclose buyer and seller ratings e.g. Airbnb. Or prevent seller from rating buyers
Individual Bias
Users rate overly positive or negative
To tackle, adjust score based on history
Quality Variations
Sellers sell low-value items to achieve a good reputation value, only to exploit this and sell low quality goods at high value
To tackle, weight ratings based on price and recency
Lack of Incentives
Users may not want to provide ratings
To tackle, provide non-monetary, or monetary, rewards
Motivations to Participate in Crowdsourcing
Financial Rewards
Self-development
Enjoyment
Altruism (Selflessness)
Social Aspects (Recognition, Competition, Validation)
Social Mobilisation
Address complex search problems through referral schemes e.g. recursive incentive scheme
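As a concrete illustration, a sketch of the halving payout rule popularised by the DARPA Red Balloon Challenge; the specific halving scheme is an assumption here, since the card only names recursive incentive schemes:

```python
# Recursive incentive sketch: the finder's reward halves at each step up
# the referral chain, so the total payout is bounded by twice the
# finder's reward no matter how long the chain grows.

def referral_payouts(chain, finder_reward=2000.0):
    """chain: list of people from the finder up to the first recruiter."""
    return {person: finder_reward / (2 ** depth)
            for depth, person in enumerate(chain)}

print(referral_payouts(["finder", "recruiter", "root"]))
# {'finder': 2000.0, 'recruiter': 1000.0, 'root': 500.0}
```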
Output-agreement Mechanism
Gamification technique
ESP Game for Image Labelling
Input-agreement Mechanism
Gamification technique
TagATune Game
Provide descriptions and guess if listening to same song as peer
Benefit of Gamification
Increased engagement
Anchoring Effect
Individuals rely heavily on the first piece of information they receive (initial pay) when making subsequent judgments or estimations
Expert Crowdsourcing Attributes
Quality is highly important
Fewer workers are relied on
Crowdsourcing Contests
Individuals submit solutions and best submission is rewarded
- Workers behave strategically and pick tasks with less competition
- Early success experience is critical for workers to continue contributing
- Only a small number of workers succeed
Multi-armed Bandit Problem
Describes the challenge of maximising rewards when balancing the exploration of new options and the exploitation of the best option known
ε-greedy
ε-greedy uses a fixed probability ε to decide whether to explore or exploit.
ε-greedy(0.05) is 5% chance of EXPLORATION and 95% chance of EXPLOITATION
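A minimal ε-greedy sketch; `pull` is a hypothetical function returning a reward for an arm:

```python
import random

# ε-greedy: with probability epsilon pull a random arm (exploration),
# otherwise pull the arm with the best observed mean (exploitation).

def epsilon_greedy(pull, n_arms, steps, epsilon=0.05):
    counts = [0] * n_arms
    means = [0.0] * n_arms   # unseen arms start at mean 0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])   # exploit
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]      # running mean
    return means

# e.g. epsilon_greedy(lambda a: random.gauss((0.2, 0.5)[a], 1), n_arms=2, steps=1000)
```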
Average Quality per Cost
AKA Reward to Cost Ratio
ε-first
allocates a fraction of the budget to exploration before repeatedly pulling the arm with the highest average quality per cost.
ε- first(0.1) gives 10% of budget to exploration
Best performance
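A minimal ε-first sketch, assuming unit-cost arms so the budget is simply a number of pulls; `pull` is again a hypothetical reward function:

```python
# ε-first: spend the first epsilon * budget pulls exploring uniformly,
# then exploit the arm with the highest average quality per cost
# (with unit costs, just the highest mean) for the remaining budget.

def epsilon_first(pull, n_arms, budget, epsilon=0.1):
    explore = int(epsilon * budget)
    means, counts = [0.0] * n_arms, [0] * n_arms
    for t in range(explore):
        arm = t % n_arms                   # round-robin exploration
        counts[arm] += 1
        means[arm] += (pull(arm) - means[arm]) / counts[arm]
    best = max(range(n_arms), key=lambda a: means[a])
    return [pull(best) for _ in range(budget - explore)]
```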
Thompson sampling
keeps a prior for each arm; picks an arm by sampling from every prior and choosing the best sample, then updates that arm’s prior using Bayes’ theorem.
Naturally explores promising arms and exploits as uncertainty drops
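A minimal Thompson sampling sketch, assuming Bernoulli (0/1) rewards with Beta priors, a standard pairing the card does not specify:

```python
import random

# Thompson sampling for Bernoulli rewards: keep a Beta prior per arm,
# sample from each posterior, pull the arm with the best sample, and
# update that arm's posterior (Bayes' theorem reduces to incrementing
# the Beta success/failure counts).

def thompson(pull, n_arms, steps):
    alpha = [1] * n_arms   # prior successes + 1
    beta = [1] * n_arms    # prior failures + 1
    for _ in range(steps):
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if pull(arm):      # reward assumed to be 0 or 1
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return alpha, beta
```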
Plurality Voting
The top-voted candidate wins. Fails to consider voter preferences beyond the first choice
Borda count
each candidate scores points according to their position in each voter’s ranking, summed over all voters; the highest total wins
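A minimal Borda count sketch using the common scoring of m−1 points for first place down to 0 for last; the ballots are illustrative:

```python
# Borda count: with m candidates, position i on a ballot (0-indexed from
# the top) scores m - 1 - i points. Assumes every ballot ranks the same
# candidates.

def borda(ballots):
    m = len(ballots[0])
    scores = {c: 0 for c in ballots[0]}
    for ballot in ballots:
        for pos, candidate in enumerate(ballot):
            scores[candidate] += m - 1 - pos
    return max(scores, key=scores.get), scores

print(borda([["A", "B", "C"], ["B", "C", "A"], ["B", "A", "C"]]))
# ('B', {'A': 3, 'B': 5, 'C': 1})
```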
Condorcet Winner
the candidate who beats every other candidate in pairwise plurality comparisons
Condorcet Paradox
collective preferences can be cyclic: no matter the outcome, a majority of voters will prefer another option, so no Condorcet winner exists
Condorcet Criterion
satisfied by a voting system which always chooses the Condorcet winner and therefore reflects the collective preferences of the voters
Borda count & plurality voting do not satisfy
Black’s rule & Copeland method satisfy
Black’s Rule
selects Condorcet winner or uses Borda count as a fallback
Copeland Method
selects winner by subtracting pairwise losses from pairwise victories
Spearman’s Footrule distance
the sum of displacements between candidates in two rankings
Kendall-Tau Distance
the sum of pairwise disagreements of candidates in two rankings
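Minimal sketches of both distance measures; rankings are lists of candidates from most to least preferred:

```python
from itertools import combinations

def spearman_footrule(r1, r2):
    # Sum of each candidate's displacement between the two rankings.
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(abs(i - pos2[c]) for i, c in enumerate(r1))

def kendall_tau(r1, r2):
    # Count pairs ordered one way in r1 but flipped in r2.
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2) if pos2[a] > pos2[b])

print(spearman_footrule(["A", "B", "C"], ["C", "B", "A"]))  # 4
print(kendall_tau(["A", "B", "C"], ["C", "B", "A"]))        # 3
```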
Finding Optimal Rankings
Enumerate all candidate rankings, sum each ranking’s distance to every voter’s preference ranking, and pick the ranking with the minimum total distance
The Spearman’s Footrule optimal ranking can be found in polynomial time
Finding the Kendall-Tau optimal ranking is NP-hard
Collective Intelligence
intelligence that emerges from collaboration, collective efforts, or competition of many individuals
Social Computing
covers methods for building computational systems that harness collective intelligence
Human Computation
outsources computational microtasks, which are difficult for machines to complete, to humans
Requirements for a microtask
- Hard for computers, easy for humans
- Easily and quickly explainable to non-experts
- Fast to complete
- Amenable to automatic quality control and aggregation
- Robust to some noise
MTurk
Amazon Mechanical Turk (MTurk) is a crowdsourcing platform enabling the outsourcing of Human Intelligence Tasks (HITs) to remote workers known as Turkers
Soylent
a word processor powered by human computation with the functions of
- Shortn: Shorten Text
- Crowdproof: Proofreading & Grammatical Corrections
- The Human Macro: Issue tasks i.e. find an image to accompany text
Find-Fix-Verify pattern
- Find: Workers highlighted sections that needed attention e.g. mistakes, text that can be shortened
- Fix: Other workers each proposed an improvement to one of the most commonly highlighted sections
- Verify: Other workers voted on the best improvements
Benefits of Find-Fix-Verify
ideal for complex tasks as it
- Separating find from fix avoids lazy workers who would focus only on the easiest fixes
- Highly parallelisable as a document can be split into chunks and distributed among workers
- Verification ensures quality
- Each stage has independent decisions
- Small tasks decrease the likelihood of errors
TurKit
TurKit is a framework for facilitating development of applications using Mechanical Turk. Its Crash-and-Rerun feature ensures progress is cached when a crash occurs and work resumes from the last successful checkpoint
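A minimal sketch of the crash-and-rerun idea (not TurKit’s actual API): results of expensive, nondeterministic steps are appended to a log, and a rerun replays the log instead of re-executing those steps:

```python
import json, os

LOG = "trace.json"   # hypothetical log file
if os.path.exists(LOG):
    with open(LOG) as f:
        _trace = json.load(f)
else:
    _trace = []
_cursor = 0

def once(step):
    """Run `step` at most once ever; replay its logged result on reruns."""
    global _cursor
    if _cursor < len(_trace):
        result = _trace[_cursor]   # replay from the log (resume after crash)
    else:
        result = step()            # first run: execute and record
        _trace.append(result)
        with open(LOG, "w") as f:
            json.dump(_trace, f)
    _cursor += 1
    return result

# e.g. answer = once(lambda: post_hit_and_wait())  # hypothetical expensive step
```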
Human Computation Quality Control Methods
Majority Voting
Weighted Voting: Assigns a higher weight to users who tend to agree more with others
Weighted Voting with Prior
How to get prior for Weighted Voting with Prior
Use ground truths to estimate each user’s probability of choosing the correct answer; this handles uncertainty about the accuracy of users’ answers. Prior knowledge comes from
- Previous Interactions
- Reputation System
- Gold Standards (True Labels)
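A minimal sketch of weighted voting with a prior, assuming each user’s weight is the log-odds of their estimated accuracy (the Bayes-optimal weight for independent binary votes); the accuracy numbers are illustrative:

```python
import math

def weighted_vote(votes, accuracy):
    """votes: user -> label; accuracy: user -> P(user is correct)."""
    scores = {}
    for user, label in votes.items():
        p = accuracy.get(user, 0.5)          # unknown users get a neutral prior
        w = math.log(p / (1 - p))            # log-odds weight from the prior
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# One highly accurate user outweighs two mediocre ones:
print(weighted_vote({"u1": "cat", "u2": "dog", "u3": "dog"},
                    {"u1": 0.95, "u2": 0.6, "u3": 0.6}))  # 'cat'
```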
Trust
The subjective probability with which an individual expects that another individual will perform a given action on which its welfare depends
Reputation
Considers the collective opinion of multiple individuals about a single individual; an aggregation of personal experience
Experimental Design
Orchestrating experiments with a proper scientific approach so that their analysis can tune a system towards an optimal design
A/B Split Testing
Compares two versions of content to determine which is better
A/B Split Testing Advantages
- Easy to test, implement and analyse
- Flexible factor changes
A/B Split Testing Disadvantages
- Impractical to test all possible combinations, so interdependence between factors is not captured and an optimal solution is unlikely to be found
- Not possible to determine which factors contribute most as too many factors change at once
- Inefficient
- Will not find optimal
Multivariate Testing
Simultaneously tests multiple factors:
- One Factor At a Time
- Full Factorial Design
- Fractional Factorial Design
One Factor At a Time (OFAT)
Changes only one factor at a time from the base treatment and analyses the outcome
One Factor At a Time Advantages
Frequently Used
One Factor At a Time Disadvantages
- Significant number of trials required per treatment to obtain statistical significance
- Unbalanced, values have different numbers of occurrences (bias towards base treatment values)
- Limited number of treatments when changing one factor
Full Factorial Design
Considers all possible combinations of factor values
Num Treatments = Num Options per Factor ^ Num Factors
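A quick check of the formula with two hypothetical factors of three options each (3^2 = 9 treatments):

```python
from itertools import product

# Enumerate a full factorial design: every combination of factor values.
factors = {"layout": ["A", "B", "C"], "colour": ["red", "green", "blue"]}
treatments = list(product(*factors.values()))
print(len(treatments))  # 9
```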
Interaction Effects
The impact of a factor is dependent on a second factor e.g.
- Best page content depends on best page layout
Main Effects
The effect of an individual factor, calculated by averaging the outcome over all values of the other factors
Full Factorial Design Advantages
- Number of trials to achieve statistical significance is less than OFAT
- Balanced, each value occurs the same number of times
- Can capture interaction effects
Full Factorial Design Disadvantages
- Requires significant number of recipes
- Number of treatments increases exponentially with factors
- Number of trials increases as treatments increase
- Impractical to test all combinations
Fractional Factorial Design
Considers a subset of all possible combinations of factor values to reduce the number of treatments required
Latin Squares
Control two sources of variation simultaneously in a practical way.
- They do not eliminate all confounding between variables but can remove some bias from interaction effects by ensuring each value is tested equally often with different combinations of the other factors
Latin Squares Advantages
- Requires fewer recipes (m^2) than Full Factorial Design (m^3) with the same benefits
- Interaction effects can cancel out
Latin Squares Disadvantages
- Can only measure main effects
- Requires the same number of trials as Full Factorial Design
- Does not necessarily produce the best recipe
- Limited to 3 factors
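A sketch of a cyclic 3×3 Latin square, assuming rows and columns index two factors and each cell holds the third factor’s value:

```python
# Each of the 3 options appears exactly once per row and per column, so
# every pair of factor values is tested equally often using 3^2 = 9
# recipes instead of the full factorial 3^3 = 27.

options = ["C1", "C2", "C3"]
square = [[options[(row + col) % 3] for col in range(3)] for row in range(3)]
for row in square:
    print(row)
# ['C1', 'C2', 'C3']
# ['C2', 'C3', 'C1']
# ['C3', 'C1', 'C2']
```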
Online Auction
Dynamic pricing based on competition acting as a price discovery mechanism
English Auction
Auctioneer progressively increases the bid price until no further bids are made
Dutch Auction
Price starts high and gradually lowers until a bidder accepts the price at which point the item is sold
Sealed-bid First Price Auction
Each bidder submits a single bid in a sealed envelope and the item is allocated to the highest bidder at the price they bid
Sealed-bid Second Price Auction (Vickrey Auction)
Each bidder submits a single bid in a sealed envelope and the item is allocated to the highest bidder at the price of the second-highest bid
Private Value
A bidder’s willingness to pay for an item, independent of what others believe the item is worth
Utility
The satisfaction for bidder $i$ winning the auction is
$u_i = v_i - p_i$
where $v_i$ is the private value and $p_i$ is the payment.
If $v_i = p_i$ then the bidder is indifferent to winning or losing, as the utility gained from winning is 0 and there is 0 utility on a loss
Dominant Strategy
Provides the best payoff regardless of the strategies of other players
Best Strategies
Maximise payoff while considering strategies of other players
English Auction Dominant Strategy
Bid the smallest amount (the bid increment) and stop when your private value is reached
First-Price Sealed-Bid Auction Best Strategy
SHADE BID
Bidders speculate about others’ bids and the probability of each possible highest rival bid.
Calculate the expected utility of each candidate bid, noting that the chance of winning accumulates across every outcome your bid beats:
There is a 40% chance the highest rival bid is £150
- If you bid £151 and win, u = 0.40(£300 - £151) = £59.60
There is a 40% chance the highest rival bid is £210
- If you bid £211, you win against both the £150 and £210 cases, so u = 0.80(£300 - £211) = £71.20
There is a 20% chance the highest rival bid is £240
- If you bid £241, you always win, so u = 1.00(£300 - £241) = £59
With a private value of £300, the best strategy to maximise expected utility is to bid £211
There is no dominant strategy
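A small script reproducing the worked example above:

```python
# Expected utility of each candidate bid is P(win at that bid) x
# (private value - bid). Win probabilities are cumulative: bidding just
# above £210 beats both the £150 and £210 cases (0.4 + 0.4 = 0.8).

value = 300
cases = [(151, 0.40), (211, 0.80), (241, 1.00)]   # (bid, P(win))
for bid, p_win in cases:
    print(bid, p_win * (value - bid))
# 151 59.6 / 211 71.2 / 241 59.0  -> bidding £211 maximises expected utility
```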
Vickrey Auction Dominant Strategy
Bid your private value vi
- If the highest bid (excluding yours) is $>v_i$ then
- By bidding less than $v_i$, you always lose and gain $0$ utility
- By bidding more than $v_i$, you might win but always gain negative utility (overpaying)
- If the highest bid (excluding yours) is $<v_i$ then
- By bidding less than $v_i$, you risk losing an auction you would have won with positive utility
- By bidding more than $v_i$, you always win but gain the same utility as bidding $v_i$
- In no scenario does bidding more or less than $v_i$ yield higher utility than bidding $v_i$, so truthful bidding is dominant
Strategic Equivalence of Auctions
English Auctions and Vickrey Auctions are Strategically Equivalent
Dutch Auctions and First-Price Sealed-Bid Auctions are Strategically Equivalent
Vickrey vs English
Vickrey is better
It avoids wasteful effort of counter speculation but still ensures the bidder with the highest private value receives the item (efficiency)
Revenue Equivalence Theorem
All four auction protocols produce the same revenue provided that
- Bidders are rational
- Bidders are risk neutral
- Private value assumption holds
- Bidders are symmetric, with the same beliefs about the probability of bids made by others
Winner’s Curse
the winning bid exceeds the intrinsic value of the item
Bidder Collusion
Two or more bidders form bidding rings to manipulate the auction
Corruption
an auctioneer misuses their position
- They may lie about submitted bids
- Revealing all bids after an auction provides transparency
Sniping
Last-minute bids; occur when the closing time is fixed
- Prevents being outbid by others
- Solved with a flexible deadline or sealed-bid auctions
Risk of Low Profit
Solved with a reserve price
Organic Search
Unbiased search where results are based on a ranking algorithm
Sponsored Search
Paid and biased search where ranking is based on auction mechanism
Pay-per-Impression
Advertiser pays per thousand impressions (an impression is when the ad is displayed)
Pay-per-Click
Advertiser pays when ad is clicked
Pay-per-Transaction
Advertiser pays when an actual purchase is made. Attribution challenges
Slot Allocation
sorting advertisements in descending order according to bid * quality score
GSP
Generalised Second-Price Auction
- Pay the smallest amount you could have bid and still retain the same position
A1: 10 * 0.6 = 6
A2: 7 * 0.9 = 6.3
A3: 5 * 0.4 = 2
Order: A2, A1, A3
A2 Pays: A1 Bid * A1 Quality / A2 Quality = 10 * 0.6 / 0.9 = £6.67
A1 Pays: A3 Bid * A3 Quality / A1 Quality = 5 * 0.4 / 0.6 = £3.33
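A small script reproducing this worked example:

```python
# GSP pricing: rank ads by bid * quality (descending); each winner pays
# the smallest bid keeping its position, i.e. the next ad's score
# divided by the winner's own quality.

ads = {"A1": (10, 0.6), "A2": (7, 0.9), "A3": (5, 0.4)}   # bid, quality
ranked = sorted(ads, key=lambda a: ads[a][0] * ads[a][1], reverse=True)
for ad, nxt in zip(ranked, ranked[1:]):
    next_score = ads[nxt][0] * ads[nxt][1]
    print(ad, "pays", round(next_score / ads[ad][1], 2))
# A2 pays 6.67, A1 pays 3.33 (A3 has no ad below it in this sketch)
```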
GSP Review
Payment is never more than bid.
Bidders might prefer lower slots to increase their utility (a lower price can outweigh fewer clicks).
The auction does not fully capture preferences: a single bid applies to every slot rather than one bid per slot
True Positive
A recommended item is relevant to a user
False Positive
A recommended item is not relevant to a user
True Negative
An irrelevant item is not recommended to a user
False Negative
A relevant item is not recommended to a user
Precision
A measure of exactness
Precision = TP / (TP + FP)
Good Recommendations / All Recommendations
Recall
A measure of completeness
How many of the good items were recalled
Recall = TP / (TP + FN)
Good Recommendations / All Good Items (Even those not recommended)
Accuracy
The proportion of all classifications that are correct, i.e. true positives and true negatives
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Correct Classifications / All Classifications
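All three metrics from the confusion-matrix counts (the counts in the example are illustrative):

```python
def metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)                     # exactness
    recall = tp / (tp + fn)                        # completeness
    accuracy = (tp + tn) / (tp + tn + fp + fn)     # overall correctness
    return precision, recall, accuracy

print(metrics(tp=8, fp=2, tn=85, fn=5))  # (0.8, ~0.615, 0.93)
```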
Crowdsourcing Study Takeaways
- Financial rewards decrease intrinsic motivation
- Positive verbal feedback increases intrinsic motivation
- Performance drops when financial incentive is lowest
- Financial rewards can increase speed of work but not quality, as workers aim for the minimum acceptable quality
- Anchoring effect can be harnessed
- Payment method affects factors
Content-based Recommender Systems Advantages
- No community required
Content-based Recommender Systems Disadvantages
- Content-descriptions required
- Cold start issue for new users
- No surprising suggestions
Knowledge-based Recommender Systems Advantages
- Deterministic recommendations
- Guaranteed quality
- No cold-start issue
- Can resemble sales dialogue
Knowledge-based Recommender Systems Disadvantages
- Knowledge engineering effort required
- Fundamentally static
- Cannot react to short-term trends
Collaborative Filtering Recommender Systems Advantages
- No knowledge engineering effort required
- Can produce surprising recommendations
Collaborative Filtering Recommender Systems Disadvantages
- Requires feedback through ratings
- Cold start issue for new users and items
- Sparsity problems
- No integration of other knowledge sources
Crowdsourcing
obtains services, ideas, or content from a large group of people through an open call rather than from the traditional employee or supplier relationship
Content-based Recommender System Steps
- Identify Features (Title, Genre, Keywords)
- Clean Features (Remove Stop-words, Stemming, Phrase Extraction, TF-IDF)
- Find Similarities
- Produce Recommendations
Dice’s Similarity Coefficient
Compares number of shared elements to total number of elements in both sets
D(X, Y) = 2|X ∩ Y| / (|X| + |Y|)
Fails to consider element frequency
Jaccard’s Similarity Coefficient
J(X, Y) = |X ∩ Y| / |X ∪ Y|
Fails to consider element frequency
Cosine Similarity Coefficient
Measures the angle between two vectors: cos(X, Y) = (X · Y) / (|X| |Y|)
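Minimal sketches of all three coefficients; the sets and vectors are illustrative:

```python
import math

def dice(x, y):
    # Shared elements relative to the total size of both sets.
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x, y):
    # Shared elements relative to the size of the union.
    return len(x & y) / len(x | y)

def cosine(a, b):
    # Cosine of the angle between two term-frequency vectors,
    # so unlike Dice/Jaccard it accounts for element frequency.
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

X, Y = {"sci", "fi", "space"}, {"sci", "fi", "romance"}
print(dice(X, Y), jaccard(X, Y))     # 0.666..., 0.5
print(cosine([1, 2, 0], [2, 1, 1]))  # ~0.73
```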