WEEK 1 Flashcards
Have you ever wondered why some online ads seem to make really accurate suggestions or how some websites remember your preferences?
Cookies can help inform advertisers about your personal interests and habits based on your online surfing, without personally identifying you.
Ways to collect and generate data
Forms, questionnaires, and surveys
Interviews
Cookies
Data collection considerations
Select the right data
How the data will be collected
The data sources
How much data to collect
Solving your business problem
The time frame for data collection
Population
Refers to all possible data values in a certain data set.
A sample
Is a part of a population that is representative of the population.
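As a rough illustration of the two cards above, the sketch below draws a simple random sample from a population and compares their means. The population values, the sample size, and the seed are all made up for the example; it is one possible way to sample, not a prescribed method.

```python
import random

# Hypothetical population: all possible values in a made-up data set.
population = list(range(1, 1001))

# A fixed seed keeps the draw repeatable; the value 42 is arbitrary.
rng = random.Random(42)

# Simple random sample: every value has the same chance of being picked,
# which is one common way to aim for a representative sample.
sample = rng.sample(population, k=100)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population size: {len(population)}, mean: {population_mean:.1f}")
print(f"sample size: {len(sample)}, mean: {sample_mean:.1f}")
```

The closer the sample mean sits to the population mean, the more plausibly the sample represents the population on that measure.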
Data sources
First-party data is data collected by an individual or group using their own resources.
Second-party data is data collected by a group directly from its audience and then sold.
Third-party data is data collected from outside sources that did not collect it directly. This data might have come from a number of different sources before you investigated it. It might not be as reliable, but that doesn’t mean it can’t be useful.
Data formats
Qualitative data is usually listed as a name, category, or description. For example, in a spreadsheet of movies, the movie titles and cast members are qualitative data.
Quantitative data can be measured or counted and then expressed as a number.
Discrete data is data that’s counted and has a limited number of values.
Continuous data is measured rather than counted (for example, with a timer), and its value can be shown as a decimal with several places.
Nominal data is a type of qualitative data that’s categorized without a set order. In other words, this data doesn’t have a sequence (for example, Yes or No answers).
Ordinal data, on the other hand, is a type of qualitative data with a set order or scale.
Internal data is data that lives within a company’s own systems.
External data is data that lives and is generated outside of an organization.
Structured data is data that’s organized in a certain format, such as rows and columns.
Unstructured data is data that is not organized in any easily identifiable manner. Audio and video files are examples of unstructured data because there’s no clear way to identify or organize their content.
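To make the discrete, continuous, nominal, and ordinal cards above more concrete, here is a minimal sketch of how each type tends to be handled differently. The movie records, field names, and rating scale are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical movie records; every field name and value is made up.
movies = [
    {"title": "Movie A", "tickets_sold": 120, "runtime_minutes": 101.25,
     "liked": "Yes", "rating": "good"},
    {"title": "Movie B", "tickets_sold": 85, "runtime_minutes": 95.5,
     "liked": "No", "rating": "poor"},
    {"title": "Movie C", "tickets_sold": 200, "runtime_minutes": 118.75,
     "liked": "Yes", "rating": "excellent"},
]

# Ordinal data has a set order, so the scale is written out explicitly.
RATING_SCALE = ["poor", "fair", "good", "excellent"]

# Discrete (counted) data: whole-number totals make sense.
total_tickets = sum(m["tickets_sold"] for m in movies)

# Continuous (measured) data: averages with decimal places make sense.
average_runtime = sum(m["runtime_minutes"] for m in movies) / len(movies)

# Nominal data: categories can be tallied but not ranked.
liked_counts = Counter(m["liked"] for m in movies)

# Ordinal data: categories can be sorted by their position on the scale.
by_rating = sorted(movies, key=lambda m: RATING_SCALE.index(m["rating"]))

print(total_tickets)
print(round(average_runtime, 2))
print(dict(liked_counts))
print([m["title"] for m in by_rating])
```

The titles themselves are qualitative data; they are used here only as labels.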
Unstructured data
Audio files, video files, emails, photos, and social media are all examples of unstructured data.
Data Model
Is used for organizing data elements and how they relate to one another.
Structured data fits neatly within a data model, which makes it easy for analysts to enter, query, and analyze the data whenever they need to.
This also makes data visualization easier because structured data can be applied directly to charts, graphs, heat maps, dashboards, and most other visual representations of data.
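The sketch below suggests why rows and columns are easy to query and summarize. The CSV text, column names, and values are invented for illustration, and the per-genre totals are just one example of a shape a bar chart could use directly.

```python
import csv
import io
from collections import defaultdict

# Hypothetical structured data: rows and columns, like a spreadsheet export.
raw = """title,genre,tickets_sold
Movie A,Comedy,120
Movie B,Drama,85
Movie C,Comedy,200
"""

# Each row becomes a dict keyed by the column names in the header line.
rows = list(csv.DictReader(io.StringIO(raw)))

# Query: every row has the same columns, so filtering is straightforward.
comedies = [r["title"] for r in rows if r["genre"] == "Comedy"]

# Summarize: one total per genre, ready to feed into a chart.
tickets_by_genre = defaultdict(int)
for r in rows:
    tickets_by_genre[r["genre"]] += int(r["tickets_sold"])

print(comedies)
print(dict(tickets_by_genre))
```

An audio or video file offers no such columns to filter or total, which is what makes unstructured data harder to work with.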
Data elements
Are pieces of information, such as people’s names, account numbers, and addresses.
An unfair dataset
Does not accurately represent the population, causing skewed outcomes, low accuracy levels, and unreliable analysis.
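One simple way to spot a possibly unfair dataset is to compare how often each group appears in the sample against the population. The sketch below assumes a made-up population with two hypothetical groups, "urban" and "rural"; the helper name group_shares is also invented for this example.

```python
from collections import Counter

def group_shares(records):
    """Return the fraction of records that falls into each group."""
    counts = Counter(records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

# Hypothetical population: 60% urban, 40% rural.
population = ["urban"] * 600 + ["rural"] * 400

# Hypothetical sample that badly under-represents the rural group.
unfair_sample = ["urban"] * 95 + ["rural"] * 5

pop_shares = group_shares(population)
sample_shares = group_shares(unfair_sample)

# A large gap between the two shares hints that results may be skewed.
for group, pop_share in pop_shares.items():
    sample_share = sample_shares.get(group, 0.0)
    gap = sample_share - pop_share
    print(f"{group}: population {pop_share:.0%}, "
          f"sample {sample_share:.0%}, gap {gap:+.0%}")
```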
Data modeling
Is the process of creating diagrams that visually represent how data is organized and structured.
Levels of data modeling
Conceptual: business concepts
Logical: data entities
Physical: physical tables
Conceptual data modeling
Gives a high-level view of the data structure, such as how data interacts across an organization.
For example, a conceptual data model may be used to define the business requirements for a new database. A conceptual data model doesn’t contain technical details.
Logical data modeling
Focuses on the technical details of a database such as relationships, attributes, and entities.
For example, a logical data model defines how individual records are uniquely identified in a database. But it doesn’t spell out actual names of database tables. That’s the job of a physical data model.
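As a hedged illustration of the split between logical and physical modeling, the sketch below uses Python's built-in sqlite3 module. The logical decisions are that a customer and an order are entities, that each record is uniquely identified by an id, and that an order relates to one customer. The physical decisions are the actual table and column names (customers, orders, and so on), all of which are hypothetical.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Physical level: concrete table names, column names, and column types.
# The PRIMARY KEY and REFERENCES clauses carry the logical-level decisions
# about unique identification and relationships.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL
);
""")

conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO orders (order_id, customer_id, total) VALUES (10, 1, 25.5)"
)

# Unique identification in action: a second customer with id 1 is rejected.
try:
    conn.execute(
        "INSERT INTO customers (customer_id, name) VALUES (1, 'Duplicate')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The relationship between the two entities lets the tables be joined.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
).fetchall()

print(duplicate_rejected)
print(rows)
conn.close()
```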