WBP Questions Flashcards

1
Q

Can you outline the key steps of the data analytics lifecycle? How did you apply these principles in your project?

A

(K3 & S2) The data analytics lifecycle consists of six key stages: Plan, Prepare, Analyse, Model, Refine and Communicate. I applied these stages in the following ways:
Plan: I started by defining the project scope and goals with the DCEO and identifying stakeholders. A Power-Interest Matrix (PIM) was created to guide stakeholder management, ensuring effective communication throughout. The relevant data was gathered and domain context sought from the DCEO.
Prepare: To ensure a consistent data structure for the subsequent join, I converted semi-structured data into structured data (the family look-up). I then thoroughly cleaned the data and addressed data quality issues such as outlier detection and escalation.
Analyse: I undertook Exploratory Data Analysis in Python, joining the data with an inner join using the org/person ID as the primary key. I aggregated the data by both month and month number to identify patterns and trends in the data (a sketch of this pipeline follows at the end of this answer).
Model: I split the data into 80% training and 20% test data, then built three time series models (Naïve, Holt linear and HWES) to determine which was most effective at forecasting future staffing time.
Refine: I reviewed the evaluation metrics (RMSE and R²) of the three models and realised that, while HWES had the best metrics, these were still not strong. I therefore revisited the seasonal decomposition with seasonal periods of 6, 12 and 18 cycles to see which yielded the smallest residuals. I then re-ran the model on the full data set using the 18-cycle seasonality, which significantly reduced the average error to 12 and increased the amount of variance accounted for by the model to 31%.
Communicate: I tailored communication throughout the project based on the PIM and ensured the primary stakeholders understood the findings and recommendations.
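
A minimal sketch of how this pipeline might look in Python. The file names and column names (org_person_id, contact_date, duration_hours) are hypothetical stand-ins for the project's actual schema, and the Holt and HWES models are fitted via statsmodels:

```python
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from statsmodels.tsa.holtwinters import ExponentialSmoothing, Holt

# Join the two sources on the shared key (hypothetical file/column names)
contacts = pd.read_csv("contacts.csv", parse_dates=["contact_date"])
families = pd.read_csv("family_lookup.csv")
df = contacts.merge(families, on="org_person_id", how="inner")

# Aggregate total contact hours per calendar month
monthly = df.set_index("contact_date")["duration_hours"].resample("M").sum()

# Chronological 80/20 train/test split
split = int(len(monthly) * 0.8)
train, test = monthly.iloc[:split], monthly.iloc[split:]

# Three candidate forecasters: naive baseline, Holt linear, HWES
forecasts = {
    "Naive": pd.Series(train.iloc[-1], index=test.index),
    "Holt linear": Holt(train).fit().forecast(len(test)),
    "HWES": ExponentialSmoothing(
        train, trend="add", seasonal="add", seasonal_periods=12
    ).fit().forecast(len(test)),
}

# Compare the models on RMSE and R^2 against the held-out test period
for name, fc in forecasts.items():
    rmse = mean_squared_error(test, fc) ** 0.5
    print(f"{name}: RMSE={rmse:.1f}, R2={r2_score(test, fc):.2f}")
```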

2
Q

What was an example of a pivotal stage of the data analytics lifecycle that, if not undertaken, would have significantly impacted the analysis?

A

(K3 & S2) A pivotal stage was the refining stage, where I considered alternative model parameters for the HWES model. This directly improved the forecasting accuracy and the overall usefulness of the model. Had this stage been skipped, predictions would have been far less precise, undermining the conclusions drawn from the analysis.
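
A hedged sketch of how that re-examination might be run, reusing the monthly series from the pipeline sketch in the previous answer and assuming statsmodels' seasonal_decompose; the candidate periods mirror the 6, 12 and 18 cycles considered:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

# Compare candidate seasonal periods by the size of the leftover residuals
for period in (6, 12, 18):
    result = seasonal_decompose(monthly, model="additive", period=period)
    # A smaller mean absolute residual suggests the period captures more structure
    mar = result.resid.abs().mean()
    print(f"period={period}: mean absolute residual = {mar:.2f}")
```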

3
Q

Can you discuss whether you identified a quality risk in your work-based project? If so, how did you go about mitigating, escalating and resolving the issue?

A

(K8 & S6)
1. Identification: I employed rigorous data cleaning guided by the principles of accuracy, consistency, completeness, timeliness and uniqueness.
2. Investigation: A prime example of the accuracy exploration was using box plots and histograms to inspect outliers. I noticed particularly large outliers for single contact durations, the largest being 240 hours for a single client contact.
I considered the context and reviewed some of the examples on the CRM, concluding that some resettlement work (namely arrival house set-up) could feasibly take a full working day. I therefore set the maximum threshold to 8 hours, and anything exceeding this was adjusted to the median value to prevent the analysis being skewed (a sketch of this adjustment follows below).
3. Escalation: These pre-adjusted outliers were documented, saved in an Excel file and shared with the DCEO. She facilitated discussion with the resettlement team to address these inaccuracies directly in the CRM and prevent future impact on analysis. This dual approach ensured realistic values were present in the analysis while tackling the root cause of the issue with the team.
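
A minimal sketch of this outlier workflow, reusing the joined DataFrame and hypothetical duration_hours column from the earlier pipeline sketch; the 8-hour cap, median replacement and Excel escalation log mirror the steps described above:

```python
import matplotlib.pyplot as plt

MAX_HOURS = 8  # agreed ceiling: one full working day per contact

# Visual check that surfaced the implausibly long contact durations
df["duration_hours"].plot(kind="box")
plt.show()

# Document the pre-adjusted outliers for escalation to the DCEO
outliers = df[df["duration_hours"] > MAX_HOURS]
outliers.to_excel("duration_outliers_for_review.xlsx", index=False)

# Replace implausible durations with the median to avoid skewing the analysis
median_duration = df.loc[df["duration_hours"] <= MAX_HOURS, "duration_hours"].median()
df.loc[df["duration_hours"] > MAX_HOURS, "duration_hours"] = median_duration
```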

4
Q

How did you outline and apply the principles for defining customer requirements in your work-based project? How did you incorporate these findings into your data analytics planning and output process?

A

(K9 & S7)
1. Defining customer requirements: I met initially with the DCEO to establish the project scope and outcomes, their associated KPIs, and their prioritisation. This enabled me to identify stakeholders and map them onto a Power-Interest Matrix (PIM).
2. Stakeholder engagement: The PIM allowed me to group stakeholders into categories that guided my engagement with them.
Manage closely: the DCEO. We had regular catch-ups to discuss updates and interim findings, as she preferred not to communicate by email.
Keep informed: the Team Leads. They were periodically provided with detailed interim and final findings after engagement with the DCEO. Their expert insights and feedback helped guide the findings and illuminate the 18-month trend cycle.
Keep satisfied: the CEO, who needed only to know of the project's existence and to receive executive summary overviews.
Monitor: the resettlement clients and LA funders, who could either benefit from or become involved later in the process, and so needed to be kept in mind but not actively contacted.
3. Project outcomes aligned to stakeholder requirements:
A. Staffing decisions: the forecast of staff hours gave the DCEO a view of what the next quarter might hold for staffing levels, which she could use to plan accordingly.
B. Service management: the trend/seasonality analysis and 3-year client average gave the team leads useful insights into client support needs, helping them plan staffing resources more effectively. For example, knowing there may be a 28-week bump in support need as clients move out of the support period is important to account for and runs against initial beliefs.
C. Funding bids: the 3-year client journey model was saved in an Excel file and can be used to inform resettlement funding bids, indicating how much time per year needs to be requested and justifying the proposed support hours.

5
Q

Could you describe the tools and methodologies your organisation typically uses for data analysis and can you specify which ones you implemented within your project? What was your rationale for choosing these particular tools to achieve the best outcome?

A

(K11 & S15)
1. Tools used in Project:
- Python: used for its access to libraries such as pandas for the join, matplotlib for the visualisations and sklearn (among others) for the time series forecasting. This allowed me to easily combine the data, visualise it and produce a model for the forecast.
- Excel: used to store the 3-year client journey due to its ease of reintegration into Python if needed, and its familiarity to my stakeholders should they need to use it in the interim for further analysis (see the sketch at the end of this answer).
2. Advantages/Disadvantages
Off the shelf (Excel): user-friendly, accessible and known by my stakeholders, BUT it lacks advanced analytical features and has scalability issues for my large raw data file.
Coding (Python): offers robust analytical capabilities, is easily customisable to my project's needs and was essential for the complex analysis I undertook, BUT it requires a high degree of technical knowledge, which can be a barrier to entry for some users.
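
As a small illustration of the Excel rationale (hypothetical file name and illustrative values only), the aggregated output can be written out for stakeholders and re-loaded into Python later:

```python
import pandas as pd

# Hypothetical stand-in for the aggregated 3-year client journey table
journey = pd.DataFrame(
    {"month_number": [1, 2, 3], "avg_support_hours": [14.2, 11.8, 9.5]}
)

# Save it where stakeholders can open and explore it in Excel
journey.to_excel("three_year_client_journey.xlsx", index=False)

# ...and re-load it later if further analysis is needed in Python
journey = pd.read_excel("three_year_client_journey.xlsx")
```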

6
Q

Can you explain the principles of data, including open, public, administrative, and research data, and how they relate to the data used within the project? How did you ensure compliance with these principles when sourcing and handling the data for your project?

A

(K4)
1. Define data principles:
Open: a subset of public data, freely accessible to anyone without restrictions on its usage.
Public: produced by government/NGOs, but often with restrictions placed on how it can be accessed and used.
Administrative: generated through daily business operations, typically for internal use only, and can be highly sensitive.
Research: collected for research purposes, often with heavy restrictions due to the sensitivity of the data (e.g. survey responses).

2. Data in my project: the project uses administrative data produced from staff's daily interactions with clients. It is not research data, as it was not produced for research purposes, nor is it open or public data, as it is for internal use only.
3. Impact on data analysis: due to its internal and highly sensitive nature, the data requires strict adherence to data privacy and protection legislation, policy and ethical considerations, for example through secure data storage with access controls, password protection and anonymisation to ensure confidentiality.
7
Q

Can you explain how you applied the principles of data classification in your work-based project? How did you reason through the classification process and make informed decisions about how to categorise and label the data? Were there any instances where you applied flexibilities in the application of data classification, and what was the purpose or rationale behind those flexibilities?

A

(S3)
1. Principles of data classification:
High: unauthorised access would have a catastrophic impact on the organisation/client (e.g. financial records).
Medium: unauthorised access would not have a catastrophic impact on the organisation/client, but the data should remain for internal use (e.g. emails without confidential data).
Low: intended for public consumption (e.g. website content).

2. Classification of my data: my data is high-sensitivity, administrative, qualitative data.
The raw unprocessed file contains identifiable family names and sensitive personal information (e.g. immigration status). If this were accessed by unauthorised personnel it would be a major data breach and would significantly compromise our clients' confidentiality, especially as it could put them at risk of physical harm given the current negative sentiment towards refugees seen during the summer riots.
3. Data security measures:
- Raw files were immediately saved in access-controlled SharePoint folders.
- Raw files were only shared through SharePoint rather than as email attachments.
- The raw file was password protected to add an additional layer of encryption.
- Identifiable data columns (names, addresses, phone numbers) were immediately removed, leaving only the org/person ID, which can only identify an individual if coupled with CRM access.
- Data was later aggregated by total amount per month and per month number respectively, further reducing its sensitivity as it no longer referred to specific individual interactions (a sketch of these steps follows at the end of this answer).
4. Re-classification:
- When saved, the "average client journey" could be re-classified as medium risk, as it no longer contained identifiable data and consisted only of aggregates by month number and family size category.
- This allows the data to be used for its intended purpose: supporting future resettlement bids.
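
A hedged sketch of those anonymisation and aggregation steps, reusing the joined DataFrame from the earlier sketches; the column names are hypothetical stand-ins for the real schema:

```python
# df: the joined contact-level DataFrame from the earlier pipeline sketch
# Remove directly identifying columns, keeping only the org/person ID
pii_columns = ["family_name", "address", "phone_number"]  # hypothetical names
df = df.drop(columns=pii_columns)

# Aggregate so rows describe monthly totals rather than individual contacts
monthly_totals = (
    df.groupby(["month_number", "family_size_category"])["duration_hours"]
      .sum()
      .reset_index()
)

# The aggregated "average client journey" can now be treated as medium risk
monthly_totals.to_excel("average_client_journey.xlsx", index=False)
```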
8
Q

What is your organisation's data architecture? How did it impact your work-based project?

A