L2 Where data comes from Flashcards
what does one zettabyte equal?
a trillion gigabytes
in 2024, what was the global volume of data, in zettabytes?
147
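The zettabyte card is a unit conversion, so it can be checked with a couple of lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where each step up (kilo, mega, giga, tera, peta, exa, zetta) multiplies by 1,000. Binary units such as the gibibyte would give slightly different numbers. It also applies the conversion to the 2024 figure of 147 zettabytes from the card above.

```python
# Assumption: decimal (SI) byte units, each prefix step is x1,000.
GIGABYTE = 10**9    # bytes in one gigabyte
ZETTABYTE = 10**21  # bytes in one zettabyte

# How many gigabytes fit in one zettabyte: 10**21 / 10**9 = 10**12
gb_per_zb = ZETTABYTE // GIGABYTE
print(f"{gb_per_zb:,}")  # 1,000,000,000,000 (one trillion)

# The 2024 estimate of 147 zettabytes expressed in gigabytes
volume_2024_gb = 147 * gb_per_zb
print(f"{volume_2024_gb:,}")  # 147,000,000,000,000
```

So "a trillion gigabytes" is 10^12 GB, and 147 ZB is roughly 147 trillion GB.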
where does all of this data now come from?
electronics (phones / wearables / home electronics)
what is estimated for 2025?
that 80% of global data will be unstructured
what is meant by unstructured data?
data that isn't sorted into fixed, known fields or locations - it is spread everywhere
what are surveys good at collecting?
large amounts of data
what kind of questions do surveys typically have?
standardised questions
who are surveys typically asked to?
a sample of the population
why are surveys not usually given to more people?
they are time-consuming + expensive
more data = an increase in…
…data scepticism
more data = more space for…
…misreporting and misrepresenting
what in recent years has increased the amount of data produced?
online tools and surveys
more bad data is being produced now because of what?
poor-quality surveys
what are fewer people doing now, and what does this lead to?
answering surveys - which causes problems for the quality of the data produced
what happens every time we Google something on our phones?
data is produced
what percentage of British people said they trust television newsreaders to tell the truth?
52%
……. of British people trust journalists to tell the truth
28% (just over 1/4)
Less than …… of British people think that politicians and gov. ministers generally tell the truth
20%
who are widely regarded as people who do tell the truth?
scientists and professors
how does the UK rank for trust in its media?
very low
which media is trusted the least / the most?
least = social media
most = radio
what is administrative data originally used for?
keeping records (by governmental departments and agencies)
administrative data covers…
…entire populations of registered people
what is an example of a record that is administrative data?
health / tax / benefits / car reg / work permits
administrative data is not ……….. available
publicly
why is administrative data not publicly available?
as much of it is very personal information
what is administrative data often used for when helping with surveys?
as a sampling frame from which to draw a sample
is administrative data the same as data from other surveys?
no
administrative data is of high quality - true or false + why?
true - because it is collected by the government
what is in place to protect people's data?
strict security protocols
how is big data generated?
digitally (through online and transactional data)
big data is in huge………..and high………..
volume
velocity
what are examples of different sources of big data?
clicks
shares
purchases
is big data open-access data, and why?
no - it isn't collected for research purposes and is often used commercially
as humans we create d… t…..
data trails
Google Trends example =
a spike in "unemployment" Google searches during the pandemic
who was William Petty?
17th century demographer
17th century demographer =
William Petty
what did William Petty do? (3 parts)
- began surveying
- collecting numbers on people living in London
- estimating national income (an early form of GDP)
what does "census" mean in Latin?
to estimate (from the Latin "censere")
a census is essentially a big…
…survey
who is included in a census? + 2 examples
all people from the population under study
- e.g. everyone in a country / everyone in education
how often does a census happen in the UK? + why?
every 10 years
- as they are expensive and time-consuming
when was a UK census last not held?
1941 - during the peak of WW2
when was the most recent UK census?
2021
when were censuses first conducted?
in BC / early AD
how can censuses now be answered?
one household member can answer for the whole household
primary data =
data collected directly by researchers for a very specific purpose and used for that purpose
example of creating primary data =
conducting a survey + using results in report for a dissertation
primary data is directly for what?
one's own use
pros of primary data =
- up to date / current
- specific to the question
- researcher has full control (can pick the q’s asked)
cons of primary data =
- time-consuming
- sometimes impossible
- can be expensive
what happens if collecting primary data is impossible?
have to rely on and use secondary data instead
secondary data =
data that has previously been collected by someone else for a different purpose - but is available for others to use
example of using secondary data =
viewing others' surveys and data + analysing them to answer Q's for a report / coursework
secondary data is the number 1 source of data in the UK - true or false?
true
pros of secondary data =
- affordable
- easily accessible
- longitudinal studies are possible
what does the pro of longitudinal studies (in secondary data) mean?
can compare old data to more recent
- see how it has developed and changed over time
cons of secondary data =
- can be outdated
- not specific to your question
- can be time-consuming to begin with
why might secondary data sometimes be time-consuming?
having to find specific data that relates to what you want
how to tell good from bad sources of data 1 =
sources - who produced the data?
how to tell good from bad sources of data 2 =
purpose - why was it produced?
how to tell good from bad sources of data 3 =
time - when was it produced?
we can trust what data sources?
- public institutions (ONS)
- respected research companies
we should be sceptical of what data sources?
- unknown institutions
- sources with dubious reputations
what data sources can we NEVER trust?
- gov. sources known to use ‘fake news’ to influence opinions
- info published by satirical newspapers
lots of data has………… ………
underlying motives
example of satirical news =
2012: the satirical newspaper The Onion claimed rural white Americans preferred the president of Iran to Obama - which some people believed
Trump running Twitter polls =
2016: polls with underlying motives + leading questions = an example of push polling
are push polls real surveys?
no
what is push polling aiming to do?
sway and influence voters
when is push polling most often used?
during political campaigns
surveys should never promote what ideas nor try to…
propaganda ideas / change the minds of respondents
leading questions =
leads respondents to answer in a certain way (unbalanced)
loaded questions =
forces respondents to answer in a way they might not agree with (loaded with an assumption)
it is important to always consider …….. data was collected
…when…
a good first critical question when looking at data is…
…where did the data come from?