Data Identifying, Gathering, & Importation Process Flashcards
Process of Identifying Data
Step 1: Determine the information you want to collect.
Step 2: Define a plan for collecting Data.
Step 3: Determine your data collection methods.
Step 1: Determine the information you want to collect
Decide what specific information you need and identify the possible sources for that data.
Step 2: Define a plan for collecting Data
- Establish a timeframe for collecting the data you need. Some data is gathered over a fixed period, while other data must be tracked in real time.
- Decide how much data is sufficient for a credible analysis. The threshold can be a data volume or a fixed number of records (a statistical limit).
- Define the dependencies, risks, and mitigation plan.
Step 3: Determine your data collection methods
- How you will collect the data from the sources you identified, whether internal systems or social media sites.
- Type of data.
- Timeframe over which you need the data.
- Volume of data.
Data Quality
Working with data without considering how it measures against quality metrics can lead to failure.
In order to be reliable, data needs to be:
- Free of errors.
- Accurate.
- Complete.
- Relevant.
- Accessible.
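A minimal sketch of how two of these properties (completeness and freedom from duplicate errors) might be checked in Python. The function name `quality_report`, the field names, and the sample records are all made up for illustration; real checks depend on your dataset.

```python
from collections import Counter

def quality_report(rows, required_fields):
    """Summarize simple quality signals for a list of dict records."""
    total = len(rows)
    # Completeness: count records where any required field is missing or empty.
    incomplete = sum(
        1 for row in rows
        if any(row.get(field) in (None, "") for field in required_fields)
    )
    # Duplicates: identical records beyond their first occurrence.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(n - 1 for n in counts.values())
    return {"total": total, "incomplete": incomplete, "duplicates": duplicates}

# Hypothetical sample records, invented for this example.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "a@example.com"},
]
report = quality_report(sample, ["id", "email"])
print(report)
```

Accuracy and relevance usually cannot be checked this mechanically; they need comparison against a trusted reference or a judgment about the analysis goal.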
Data Governance
Data governance policies and procedures relate to the usability, integrity, and availability of data.
Issues pertaining to data governance include:
- Security.
- Regulation.
- Compliance.
Data Privacy
Loss of trust in the data used for analysis can compromise the process, result in suspect findings, and invite penalties.
Data privacy includes issues such as:
- Confidentiality.
- License for use.
- Compliance with mandated regulations.
Data Sources
- Primary Data.
- Secondary Data.
- Third-party Data.
Primary Data
Primary data refers to information obtained directly from the source. This includes:
- Data from the organization’s CRM, HR, or workflow application.
- Data you gather directly through surveys, interviews, discussions, observations, and focus groups.
Secondary Data
Secondary data refers to information retrieved from existing sources, like:
- External databases.
- Research articles, publications, training material, internet searches, or financial records available as public data.
- Data collected through externally conducted surveys, interviews, discussions, observations, and focus groups.
Third-party Data
Third-party data refers to data purchased from aggregators, who collect data from various sources and combine it into comprehensive datasets for the purpose of selling it.
Sources for Gathering Data
- Databases.
- Web.
- Social media sites and interactive platforms.
- Sensor data.
- Data exchange.
- Surveys.
- Census.
- Interviews.
- Observation studies.
What to Use to Gather and Import Data?
- APIs.
- Web Scraping.
- Sensor Data.
- Data Exchange.
APIs
- Used for extracting data from a variety of data sources.
- Used for data validation. A data analyst may use an API to validate postal addresses and zip codes.
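The zip code validation case could look roughly like the sketch below. The endpoint `https://api.example.com/validate`, the query parameters, and the `valid` field in the response are all assumptions for illustration, not a real service. A canned JSON string stands in for the network call so the example runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the validation service you actually use.
BASE_URL = "https://api.example.com/validate"

def build_request_url(zip_code, country="US"):
    """Build the query URL for a zip code lookup."""
    return f"{BASE_URL}?{urlencode({'zip': zip_code, 'country': country})}"

def is_valid_zip(response_body):
    """Interpret a JSON response; the 'valid' field is an assumed response shape."""
    payload = json.loads(response_body)
    return bool(payload.get("valid", False))

url = build_request_url("10001")
# A canned response replaces the real request (for example urllib.request.urlopen(url)).
canned_response = '{"zip": "10001", "valid": true}'
result = is_valid_zip(canned_response)
print(url, result)
```

Keeping URL construction and response parsing in separate functions makes each part easy to test without touching the network.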
Web Scraping
- Used for downloading specific data from web pages based on defined parameters.
- Used to extract data such as text, contact information, images, videos, podcasts, and product items from a web property.
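As a rough illustration of extracting specific data from a page, the sketch below uses Python's standard-library `html.parser` to collect link targets and image sources. The class name `LinkAndImageParser` and the sample HTML are invented for this example. A real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser

class LinkAndImageParser(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])

# Hypothetical page content standing in for a downloaded web page.
page = """
<html><body>
  <a href="/products/1">Widget</a>
  <img src="/img/widget.png" alt="Widget">
  <a href="mailto:sales@example.com">Contact</a>
</body></html>
"""
parser = LinkAndImageParser()
parser.feed(page)
print(parser.links)
print(parser.images)
```

The "defined parameters" here are the tag and attribute names the parser looks for; changing them changes what gets extracted.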