Data Collection Plan Flashcards
Data Collection should follow what model?
ISTAR
I - Intake & Orientation (What is the aim of the investigation? What are you looking for? Where are you going to look? How much time do you have? etc.)
S - Strategy, Search & Store (come up with a strategy for your searching, and decide how you are going to store and manage your actions and results)
T - Technical Capabilities & Tactical Application (yours and the internet's: what apps, software and automation do you have?)
A - Analysis (GUIDs) (a continual process, especially when GUIDs are located that may allow attribution to a device or person)
R - Refine, Recycle & Reporting (refine and redo searches as you go based on the results, then complete a report detailing your findings)
Information Retrieval (Gathering of online info)
Information retrieval comes from information needs.
There are 4 rough types of information need:
- EVENT information need (about a specific event that has happened, e.g. a war, accident, natural disaster, financial incident or crime)
- THEMATIC information need (based on a specific theme or phenomenon, e.g. drugs available on the dark web, online gambling, trafficking of stolen goods; are we taking a ‘snapshot’ in time or monitoring continuously for a period of time?)
- PERSON information need (about a specific person, e.g. a suspect, a misper (missing person) or a witness: profiling, lifestyle, network, background, EVERYTHING)
- ORGANISATION information need (about a company, a charity, a group etc., such as a criminal organisation, protest groups, football hooligans, the Animal Liberation Front, neo-Nazis etc.)
Search Methods - SIS v Thorough
SIS - Short Internet Scan. ‘Quick & dirty’: no real plan, quick and shallow; mostly Google, using one or two keywords that will get a result. Used by most people. (Men v women in result page)
Thorough Search - with a plan, using different methods, based on ISTAR.
ISTAR - Intake & orientation
Be a super searcher.
Ask:
- What am I looking for?
- Why?
- Who for?
- What purpose are you going to use the information for?
- What info do we already have?
- What damage can be done (risk, organisational or personal)?
- How much time do we have?
Remember the 7 golden W’s: who, what, why, where, when, what way and with what (how).
ISTAR - S - Strategies, Search & Store. Describe 5 analytical search strategies.
Search styles & strategies:
Analytical search strategies
- Building blocks. Most widely used. Combine search terms, then analyse the relevance of the results and produce further sets of queries that are more relevant.
- Pearl growing. Begin with a specific document or set of documents that we know is relevant (the pearl), then use characteristics of that document, such as titles, chapters, quotes and references, to grow a set of queries.
- Successive fractions. Begin with a broad search on a common subject, which returns a large set of results. Successively reduce the results by adding specific keywords, ending up with a high-precision set of search results.
- Interactive scanning. Mostly used when the researcher does not yet have a good understanding of the subject. Begin with a comprehensive set of documents generally related to the problem area. By scanning these documents we note key features of the problem area. Then use these to formulate successive queries, quickly scan the results to identify the most important phrases and keywords, and use these to create new search queries.
- Berrypicking. A common search strategy. Start the search with just one keyword on a variety of sources. Each search gives new pieces of information, which give you new ideas and directions to follow and therefore new conceptions of the query (new keywords found from the previous search).
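The successive fractions strategy can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: a small hypothetical in-memory corpus (`CORPUS`) stands in for a search engine, and the hypothetical helper `search_all` simply returns the documents that contain every keyword. Each added keyword shrinks (or at most keeps) the result set, which is the narrowing the strategy relies on.

```python
# Hypothetical mini corpus standing in for a search engine's index.
CORPUS = {
    "doc1": "stolen phones sold on dark web marketplace",
    "doc2": "dark web marketplace selling drugs",
    "doc3": "stolen bikes resold through online auctions",
    "doc4": "police seize stolen phones from dark web vendor",
    "doc5": "online gambling sites and the dark web",
}


def search_all(corpus: dict[str, str], keywords: list[str]) -> set[str]:
    """Return the IDs of documents containing every keyword (an implicit AND)."""
    wanted = [kw.lower() for kw in keywords]
    return {
        doc_id
        for doc_id, text in corpus.items()
        if all(kw in text.lower().split() for kw in wanted)
    }


def successive_fractions(corpus: dict[str, str], broad_term: str,
                         refinements: list[str]) -> list[tuple[list[str], set[str]]]:
    """Start with one broad term, then add one refining keyword per step.

    Returns (keywords used, matching document IDs) for every step, so the
    shrinking result set can be inspected.
    """
    keywords = [broad_term]
    steps = [(list(keywords), search_all(corpus, keywords))]
    for term in refinements:
        keywords.append(term)
        steps.append((list(keywords), search_all(corpus, keywords)))
    return steps


if __name__ == "__main__":
    for used, hits in successive_fractions(CORPUS, "dark", ["stolen", "phones", "vendor"]):
        print(" AND ".join(used), "->", sorted(hits))
```

With this made-up corpus the result set goes from four documents (dark) to two (dark AND stolen) and finally to doc4 alone once vendor is added.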
ISTAR - S - Search Strategy & Store (cont) Yield, Precision & Recall.
Not just about quantity of results - how much of them are relevant?
YIELD = quantity of results you get back
PRECISION & RECALL are statistical measures used to evaluate classification and retrieval results.
PRECISION is a measure of EXACTNESS (how relevant the results are).
It is calculated by the number of RELEVANT documents retrieved DIVIDED by the TOTAL number of documents retrieved.
A perfect precision score of 1 means all the results that were returned were relevant.
But it doesn’t tell us whether all relevant documents were retrieved.
RECALL is a measure of COMPLETENESS.
It is calculated by the number of the relevant documents retrieved from a search DIVIDED by the TOTAL number of all existing relevant documents that should have been retrieved.
A perfect recall score of 1 means that all the relevant documents that exist were retrieved, but it does not tell us how many irrelevant documents were also obtained.
RECALL & PRECISION therefore have a relationship: you can increase one at the cost of the other.
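A minimal Python sketch of yield, precision and recall, assuming you can label which retrieved documents are relevant and can estimate the full set of relevant documents (the function name `evaluate` and the document IDs are hypothetical):

```python
def evaluate(retrieved: set[str], relevant: set[str]) -> dict:
    """Yield, precision and recall for one search.

    retrieved: IDs of the documents the search returned.
    relevant:  IDs of ALL relevant documents that exist (in practice usually
               an estimate, because the full set is rarely known).
    """
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    return {
        "yield": len(retrieved),                                   # quantity of results
        "precision": hits / len(retrieved) if retrieved else 0.0,  # exactness
        "recall": hits / len(relevant) if relevant else 0.0,       # completeness
    }


if __name__ == "__main__":
    # Hypothetical search: 10 results returned, 8 of them relevant,
    # out of 20 relevant documents that exist in total.
    retrieved = {f"d{i}" for i in range(10)}    # d0 .. d9
    relevant = {f"d{i}" for i in range(2, 22)}  # d2 .. d21
    scores = evaluate(retrieved, relevant)
    print(f"yield={scores['yield']} precision={scores['precision']:.2f} "
          f"recall={scores['recall']:.2f}")
    # prints: yield=10 precision=0.80 recall=0.40
```

Here precision is 8 / 10 = 0.8 (most results are relevant) while recall is 8 / 20 = 0.4 (most relevant documents were missed), which is why both measures are needed.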
ISTAR - S - Search Strategy & Store (cont) Yield, Precision & Recall (Cont)
Our goal is to get as many relevant documents as we can for each search (with the fewest irrelevant ones) AND to get all the relevant documents that exist.
So how do we influence PRECISION to get the best results?
Use search operators such as AND, OR and quotation marks, especially where the relationship between two terms matters, e.g. happy AND hour, or "happy hour" as an exact phrase.
Try to avoid words with double meanings; be as specific as possible.
How do we influence RECALL?
Use variations in spelling, synonyms and more general terms.
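One way to pull both levers at once is the building-blocks layout: synonyms and spelling variants joined with OR inside each concept (for recall), exact phrases in quotes, and the concepts joined with AND (for precision). The Python sketch below only builds the query string; the helper names `or_block` and `build_query` are hypothetical, and operator syntax differs between search engines (Google, for example, treats AND as implicit), so check each engine's own help pages.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms / spelling variants with OR (widens recall).

    Multi-word terms are wrapped in quotes so they are searched as exact phrases.
    """
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return quoted[0] if len(quoted) == 1 else "(" + " OR ".join(quoted) + ")"


def build_query(concepts: list[list[str]]) -> str:
    """AND the concept blocks together (keeps precision)."""
    return " AND ".join(or_block(c) for c in concepts)


if __name__ == "__main__":
    query = build_query([
        ["happy hour"],              # exact phrase
        ["pub", "bar"],              # synonyms
        ["organiser", "organizer"],  # spelling variants
    ])
    print(query)
    # prints: "happy hour" AND (pub OR bar) AND (organiser OR organizer)
```

Adding terms inside a block tends to raise recall; adding a new AND block tends to raise precision.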
ISTAR - S - Search Strategy & Store (cont) Considering WHERE to search
E.g.:
- General search engines on the clear web (Google, Bing etc.)
- Meta search engines or combined tools (to search multiple search engines at once)
- Specialised search engines, databases and portals
- Social media
- Deep web
- Multimedia (Google Earth, other maps, images, YouTube etc.)
- Other sources such as Usenet, IRC and P2P torrents
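Meta search engines and combined tools merge result lists from several engines. The sketch below shows one simple way this can work, assuming you already have each engine's result URLs as lists (the engine names and URLs are made up). It merges duplicates after a crude URL normalisation (an assumption: scheme, a leading "www." and a trailing "/" are treated as insignificant) and ranks pages by how many engines returned them. Real meta search engines use their own ranking methods.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Crude normalisation so the same page from two engines is counted once."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower().removeprefix("www.")
    path = parts.path.rstrip("/")
    return host + path + (("?" + parts.query) if parts.query else "")


def merge_results(results_by_engine: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Combine result lists; rank pages by how many engines returned them."""
    seen: dict[str, list[str]] = {}
    for engine, urls in results_by_engine.items():
        for url in urls:
            engines = seen.setdefault(normalise(url), [])
            if engine not in engines:
                engines.append(engine)
    # Most engines first, then alphabetical for a stable order.
    return sorted(seen.items(), key=lambda item: (-len(item[1]), item[0]))


if __name__ == "__main__":
    merged = merge_results({
        "engine_a": ["https://www.example.com/report/", "https://example.org/a"],
        "engine_b": ["http://example.com/report", "https://example.net/b"],
    })
    for page, engines in merged:
        print(page, engines)
```

Pages returned by several engines rise to the top, but a page found by only one engine may still be the most relevant, so the merged list is a starting point rather than a verdict.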
ISTAR - S - Search Strategy & Store (cont) - STORE
Key considerations.
Internet is dynamic not static.
Need to preserve the dynamic content as it was at that time by downloading the web content accurately, including all data, so it can be accessed offline.
Keep notes, URLs, times / dates and screen dumps.
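A minimal Python sketch of the record-keeping side of STORE: it downloads one URL with the standard library, saves the raw bytes, and appends the URL, UTC capture time, size and a SHA-256 hash to a log, so the saved copy can later be shown to be unchanged. The function names and the log file name `capture_log.jsonl` are hypothetical. This sketch captures only the raw HTML of a single page, not images, scripts or content rendered by JavaScript, so a full offline capture still needs a dedicated capture tool plus screen dumps.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone
from pathlib import Path


def capture_record(url: str, content: bytes) -> dict:
    """Log entry: URL, UTC capture time, size and SHA-256 of the exact bytes saved."""
    return {
        "url": url,
        "captured_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def capture(url: str, out_dir: Path) -> dict:
    """Download one page, save the raw bytes and append a record to a JSON-lines log."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        content = resp.read()
    record = capture_record(url, content)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the saved file after its hash so identical captures are easy to spot.
    (out_dir / f"{record['sha256']}.html").write_bytes(content)
    with open(out_dir / "capture_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record
```

Usage would be along the lines of `capture("https://example.com/", Path("case_123"))`, where the folder name is just an example; re-hashing the saved file later and comparing it with the logged `sha256` shows whether the copy has changed.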
ISTAR - T - Technical Capabilities & Tactical Applications
- Automation.
- Tools. Think about technical intelligence such as IP data, reverse DNS, domain info (use RIPE for European websites), other websites hosted on the same web server IP, mail servers etc. Use online tools (check their reliability and use multiple sources) or the dig command in Linux for various DNS-related queries.
- Remember risks.
- Focus (avoid info overload; try to identify dud info ASAP).
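For a first look at technical intelligence, Python's standard `socket` module can do forward and reverse DNS lookups. The sketch below is a stand-in for, not a replacement of, dig and online tools: it only covers IPv4 address lookups and reverse (PTR) lookups through the system resolver, not MX or NS records or registry (RIPE) data. The function name `dns_profile` is hypothetical. Lookups may reach DNS servers the target controls, so weigh the risks before running it against a live target.

```python
import socket


def dns_profile(target: str) -> dict:
    """Forward-resolve a hostname (or IPv4 literal) to IPv4 addresses, then
    reverse-resolve each address (PTR lookup), recording failures as None."""
    profile = {"target": target, "aliases": [], "ips": [], "reverse": {}, "error": None}
    try:
        _, aliases, ips = socket.gethostbyname_ex(target)  # IPv4 only
    except OSError as exc:  # socket.gaierror and socket.herror are OSError subclasses
        profile["error"] = str(exc)
        return profile
    profile["aliases"] = aliases
    profile["ips"] = ips
    for ip in ips:
        try:
            profile["reverse"][ip] = socket.gethostbyaddr(ip)[0]
        except OSError:
            profile["reverse"][ip] = None  # no PTR record, or the lookup failed
    return profile


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(dns_profile(name))
```

A reverse name that does not match the forward name is not proof of anything on its own (shared hosting is common), so treat it as a lead to check against other sources.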
ISTAR - A - Analyse
- Analyse results as you go
- Locate and use GUIDs to refine searches
- How reliable are your findings?
- Constantly check for new intelligence
- Are you on the right track?
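Locating GUIDs in collected text can be partly automated with regular expressions. The patterns below are simplified assumptions covering three common identifier types (email addresses, @handles and URLs); they will miss some formats and produce some false positives, so every match still needs checking by hand. The names `PATTERNS` and `extract_guids` are hypothetical.

```python
import re

# Simplified, assumed patterns; real investigations will need more types
# (phone numbers, crypto wallet addresses, etc.) and tighter rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # @handle not preceded by a word character, '@' or '.', so the
    # domain part of an email address is not picked up as a handle.
    "handle": re.compile(r"(?<![\w@.])@[A-Za-z0-9_]{3,30}\b"),
    "url": re.compile(r"https?://[^\s<>\"']+"),
}


def extract_guids(text: str) -> dict[str, set[str]]:
    """Return the unique candidate GUIDs of each type found in the text."""
    found: dict[str, set[str]] = {}
    for kind, pattern in PATTERNS.items():
        # Strip trailing punctuation picked up at the end of a sentence.
        found[kind] = {m.group(0).rstrip(".,;:)") for m in pattern.finditer(text)}
    return found


if __name__ == "__main__":
    sample = ("Contact jo.bloggs@example.com or DM @darkseller99. "
              "Listing at https://example.org/item/42.")
    print(extract_guids(sample))
```

Each confirmed GUID can then become a new search term in its own right.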
ISTAR - R - REFINE, RECYCLE & REPORT
- Refine your search terms / keyword list constantly.
- Recycle GUIDs if necessary
- Produce your report with all stored data, and clarify your findings with a glossary / guide on how to interpret them
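The recycle step can be sketched as a queue: each GUID is searched once, and any new GUIDs found in the results are added back to the queue. In this Python sketch `search` is a hypothetical placeholder for whatever lookup you actually run (manual or automated), and `max_queries` is an arbitrary cap that keeps the loop focused and avoids information overload.

```python
from collections import deque
from collections.abc import Callable, Iterable


def recycle_search(seeds: Iterable[str],
                   search: Callable[[str], set[str]],
                   max_queries: int = 50) -> list[str]:
    """Breadth-first 'recycling' of GUIDs.

    seeds:       GUIDs known at the start (e.g. from intake).
    search:      hypothetical placeholder; takes one GUID and returns the
                 GUIDs found in its results.
    max_queries: arbitrary cap to stop the loop running away.
    Returns the GUIDs searched, in the order they were searched.
    """
    queue = deque(seeds)
    seen = set(queue)
    searched: list[str] = []
    while queue and len(searched) < max_queries:
        guid = queue.popleft()
        searched.append(guid)
        for found in search(guid):
            if found not in seen:  # only recycle GUIDs not already queued or searched
                seen.add(found)
                queue.append(found)
    return searched


if __name__ == "__main__":
    # Made-up results: which GUIDs each search would turn up.
    fake_results = {
        "jo@example.com": {"@jo_b", "jo@example.com"},
        "@jo_b": {"https://example.org/jo"},
        "https://example.org/jo": {"@jo_b"},
    }
    print(recycle_search(["jo@example.com"], lambda g: fake_results.get(g, set())))
    # prints: ['jo@example.com', '@jo_b', 'https://example.org/jo']
```

The returned list doubles as an audit trail of which GUIDs were searched and in what order, which can go straight into the report.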