Work Based Project Flashcards

1
Q

What method did you use to combine different data sources?

A

Full Outer Join: keeps all the information from both tables, regardless of whether the rows match
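
A minimal sketch of a full outer join in plain Python, assuming each table is a list of dicts with at most one row per key. The function name, the `label` values and code 150 are illustrative and not taken from the project.

```python
def full_outer_join(left, right, key):
    """Join two lists of dicts on `key`, keeping unmatched rows from both sides.

    Assumes each key appears at most once per table (a simplification).
    """
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    # Non-key column names, used to fill gaps with None.
    left_cols = [c for c in (left[0] if left else {}) if c != key]
    right_cols = [c for c in (right[0] if right else {}) if c != key]

    joined = []
    # dict.fromkeys removes duplicate keys while preserving first-seen order.
    for k in dict.fromkeys([*left_by_key, *right_by_key]):
        l_row = left_by_key.get(k, {})
        r_row = right_by_key.get(k, {})
        row = {key: k}
        for c in left_cols:
            row[c] = l_row.get(c)
        for c in right_cols:
            row[c] = r_row.get(c)
        joined.append(row)
    return joined


# Hypothetical sample data: usage counts and a reference list of codes.
usage = [{"code": "125", "uses": 3}, {"code": "134", "uses": 40}]
reference = [{"code": "134", "label": "Label B"}, {"code": "150", "label": "Label C"}]

result = full_outer_join(usage, reference, "code")
for row in result:
    print(row)
```

Rows that exist on only one side survive with `None` in the columns from the other side, which is what makes the gaps between sources visible.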

2
Q

What other methods of joining are there?

A

Left (outer): keeps all the information from the left table and brings in only the matching values from the right table

Inner: keeps only the matching values from both tables
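
The difference between a left join and an inner join can be sketched in plain Python. This assumes tables are lists of dicts; the `join` helper and the sample rows are hypothetical.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; `how` is "inner" or "left"."""
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how}")

    # Index the right table by key; a key may map to several rows.
    right_by_key = {}
    for row in right:
        right_by_key.setdefault(row[key], []).append(row)
    right_cols = [c for c in (right[0] if right else {}) if c != key]

    joined = []
    for l_row in left:
        matches = right_by_key.get(l_row[key], [])
        if matches:
            for r_row in matches:
                joined.append({**l_row, **r_row})
        elif how == "left":
            # Keep the unmatched left row; right-hand columns become None.
            joined.append({**l_row, **dict.fromkeys(right_cols)})
    return joined


usage = [{"code": "125", "uses": 3}, {"code": "134", "uses": 40}]
reference = [{"code": "134", "label": "Label B"}]

inner_rows = join(usage, reference, "code", how="inner")
left_rows = join(usage, reference, "code", how="left")
print(len(inner_rows), len(left_rows))
```

With this sample the inner join drops code 125 because it has no match, while the left join keeps it with an empty label.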

3
Q

What is a join?

A

Combines rows from two or more tables based on a related column, allowing the data to be reviewed and analysed together

4
Q

What does combining data refer to?

A
  • Process of integrating 🔄
  • and merging information 🧩
  • from various sources 📚
  • into a unified dataset 1️⃣
  • for analysis or management purposes📊
5
Q

What are the risks of combining data?

A
  1. Selecting the wrong join type ❌🔗 could lead to data loss (e.g. an inner join drops unmatched rows)
  2. Security risks: table w/PII & final outcome not having right protection🛡️
  3. Large/complex joins = affect performance
  4. Data Quality - consistency/terminology
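
One way to reduce the data-loss risk is to check how the keys overlap before choosing a join type. The sketch below is an illustration only; `key_overlap` and the sample keys are hypothetical.

```python
def key_overlap(left_keys, right_keys):
    """Report which keys match and which exist on only one side."""
    left_set, right_set = set(left_keys), set(right_keys)
    return {
        "matched": sorted(left_set & right_set),
        "left_only": sorted(left_set - right_set),    # lost by an inner join
        "right_only": sorted(right_set - left_set),   # lost by inner and left joins
    }


report = key_overlap(["125", "134", "134"], ["134", "150"])
print(report)
```

If `left_only` or `right_only` is not empty, an inner join will silently drop those rows, so the choice of join should be deliberate.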
6
Q

List in order all the stages of the lifecycle

A
  1. Plan
  2. Data Prep
  3. Analysis
  4. Modelling
  5. Refine & Compare
  6. Communicate & Implement
7
Q

Provide an example of what you have done for refine & compare for a model you created

A
  1. Error Metrics (RMSE/MAE) 📉
  2. Changed models and parameters to achieve lowest RMSE/MAE values🧬⚙️
  3. Confidence Levels📏
  4. Project Sponsor Feedback💬
  5. Domain context: knew flat line was not realistic🧭
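
For reference, the two error metrics can be computed as below. This is a sketch with made-up numbers; the `flat` and `trend` forecasts are hypothetical candidates, not project results.

```python
import math


def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")


def mae(actual, predicted):
    """Mean absolute error: the average size of the errors."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more than MAE does."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [10, 12, 9, 11]
flat = [10.5, 10.5, 10.5, 10.5]   # hypothetical flat-line forecast
trend = [10, 11, 10, 11]          # hypothetical alternative forecast

for name, forecast in (("flat", flat), ("trend", trend)):
    print(name, round(mae(actual, forecast), 3), round(rmse(actual, forecast), 3))
```

The candidate with the lower RMSE/MAE is preferred on these metrics, but sponsor feedback and domain context still decide whether its shape is believable.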
8
Q

What is Privacy by Design?

A
  • Embeds Privacy Protection 🛏️🛡️
  • as part of the design/implementation 🛠️
  • of Systems, Products, Business Practices📦
  • from the start, not an afterthought 🏁
9
Q

How have you applied privacy by design?

A
  1. BOBI access: 🔑
  2. Only used data necessary for project➖
  3. Google Sheets - **JLP Access** Only 👥
10
Q

What DQ risks did you come across? (3 examples)

A
  • Completeness: whether all expected records are present in a dataset, e.g. missing records against dates, missing matches between the 2 sources
  • Consistency: format of the PK differed between sources
  • Accuracy: whether the data represents the truth, e.g. risk of mistypes; I removed zeros that were not genuine
11
Q

How did you resolve each DQ risk example?

A
  • Completeness: domain context meant I knew the data was meant to have gaps - weekly usage. Removed the NULLs produced by the join from the analysis
  • Consistency: during data prep I changed the formatting so they were consistent and I could continue the join
  • Accuracy: I removed the zeros from the analysis
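
A hedged sketch of what these fixes could look like in code. The actual key formats are not stated here, so the normalisation rule (trim whitespace, drop leading zeros) is an assumption, as are the function names and sample rows. It also drops every zero, whereas in practice only zeros known not to be genuine were removed.

```python
def normalise_code(raw):
    """Bring a key to one text format (assumed rule: trimmed, no leading zeros)."""
    text = str(raw).strip()
    return text.lstrip("0") or "0"


def clean_usage(rows):
    """Normalise keys and drop rows whose usage is missing (None) or zero."""
    cleaned = []
    for row in rows:
        uses = row.get("uses")
        if uses is None or uses == 0:
            continue  # assumption: these values are gaps, not genuine readings
        cleaned.append({"code": normalise_code(row["code"]), "uses": uses})
    return cleaned


raw = [
    {"code": " 0125", "uses": 3},
    {"code": 134, "uses": 0},
    {"code": "134 ", "uses": None},
    {"code": "134", "uses": 40},
]
cleaned = clean_usage(raw)
print(cleaned)
```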
12
Q

Provide an example in which you acted logically and analytically

A
  • S - Forecast Code Selection for Streamline
  • T - Enough data📊✅, used infrequently 🌜⏳ to provide best opportunity for success
  • A - Analysed codes for sufficient data/similar for consolidation
  • R - Selecting a code that could be consolidated, streamlining codeset and meeting project requirement
13
Q

What was the conclusion of the analysis? (5 things)

A
  1. 125 - low use - removal or consolidation
  2. 134 - increase - not put forward for removal
  3. Histogram emphasised need for code review
  4. Bar charts- several codes not used at all - put forward for removal
  5. Model was not good for seasonal predictions, need more data, but did help to inform for 125 and 134
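
The counts behind a usage bar chart, including codes that never appear, could be produced roughly as follows. The data and the `usage_by_code` helper are hypothetical; code 150 is invented for the example.

```python
from collections import Counter


def usage_by_code(transactions, all_codes):
    """Count transactions per code, including codes that were never used."""
    counts = Counter(t["code"] for t in transactions)
    return {code: counts.get(code, 0) for code in all_codes}


transactions = [{"code": "134"}, {"code": "134"}, {"code": "125"}]
all_codes = ["125", "134", "150"]   # full codeset from the reference data

usage = usage_by_code(transactions, all_codes)
unused = [code for code, n in usage.items() if n == 0]
print(usage, unused)
```

Counting against the full codeset matters: codes with no transactions never show up in the transaction data alone, so they would be missing from the chart rather than shown as zero.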
14
Q

What alternative methods or tools did you suggest for the project to be successful?

A
  • Identified low usage code,🔭
  • Completed time series forecast to understand future pattern📉
  • NLP📝
  • Dashboard📊
15
Q

What were your customer requirements and how did you define them? (2 answers)

A
  1. Met with project sponsor to go through their challenges⛰️, strategy🎯 and project objectives🎯
  2. Drafted KPIs and ensured clarity via email📏📧
16
Q

How did customer requirements shape the project? (7 answers)

A
  1. Project scope🎯
  2. Sponsor updates🆕
  3. Analysis approach🛤️
  4. Bar chart: Overview📊
  5. Histogram: Skew📊
  6. Time Series for the future trend📈
  7. Inform decisions on streamlining 🏞️
17
Q

What are the differences between open and public data?

A
  • Public: all data in the public domain
  • But it may require a freedom of information request and is not always machine readable
  • Open data is a subset of public data that is freely accessible to everyone without restrictions on use and sharing; it is provided in a common, machine-readable format
  • Open example: government portals often provide open data on the economy, health or the environment in an easy-to-consume format
  • Public example: Office for National Statistics publishes stats about the UK like census data and certain datasets may come with restrictions about how you can use it especially if there is sensitive info
18
Q

What is administrative data?

A
  • Information collected during
  • registration, 📋
  • transactions 💳
  • or record keeping📀
  • To support organisations with regulatory or operational reasons
  • Can be structured or unstructured (docs) but often stored electronically
19
Q

What is research data?

A

information collected or generated to validate findings in research

20
Q

What data structures did you take into account?

A
  1. Structured administrative data on inventory transactions that was in a pre-defined format
  2. Structured reference data that was captured in a **Google Sheet**, such as label, code and definition of codes - contained qualitative free-text data (plus unstructured emails)
21
Q

What different database system designs did you take into account? (What is a relational database?)

A
  • Structured data🏛️
  • organised into tables🪑
  • with pre-defined relationships between them📚🔗
  • to help efficient storage,🗄️
  • retrieval🏈
  • Management👔
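
A small illustration of tables with a pre-defined relationship, using Python's built-in `sqlite3` module. The table names, columns and rows are hypothetical and do not describe the actual mainframe schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (
        code  TEXT PRIMARY KEY,
        label TEXT NOT NULL
    );
    CREATE TABLE transactions (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL REFERENCES codes(code),  -- relationship to codes
        qty  INTEGER NOT NULL
    );
""")
conn.executemany("INSERT INTO codes VALUES (?, ?)",
                 [("125", "Label A"), ("134", "Label B")])
conn.executemany("INSERT INTO transactions (code, qty) VALUES (?, ?)",
                 [("134", 5), ("134", 2), ("125", 1)])

# Retrieval: total quantity per code, joined to its label.
rows = conn.execute("""
    SELECT c.code, c.label, SUM(t.qty) AS total_qty
    FROM codes AS c
    JOIN transactions AS t ON t.code = c.code
    GROUP BY c.code, c.label
    ORDER BY c.code
""").fetchall()
print(rows)
```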
22
Q

How do you adapt your communication depending on the audience & situational requirements? (4 answers)

A
  1. Power interest matrix: prioritise stakeholder communication ⚡️
  2. Knowledge Level: Architect needed explanation🧠
  3. Adjust tone: Formal = less familiar 🎵
  4. Technical: business vs data facing💻
23
Q

Can you provide 2 examples of data classification principles you’ve applied in your project?

A
  1. Usage data:
    * internal only🫁
    * administrative📝
    * structured/quantitative🏗️🔢
    * low risk🔽🛡️
  2. Metadata
    * internal only🫁
    * reference data🔤
    * structured/qualitative🏗️🔤
    * low risk🔽🛡️
24
Q

Can you think of a time when you would need to be flexible with classification?

A

If the usage data were to be combined with customer data or sensitive financial data, the risk would increase as it would include PII or sensitive information

25
Q

What tools/methods did you use for the project? (2 answers)

A
  1. Python: customised options, exponential smoothing/seasonality parameters, needed simple visualisations for forecast and bar charts (less distracting than sharing a dashboard, and less could “go wrong”)
  2. Time series forecast: project needed predictive analysis to understand future trends, my data was temporal and there was a wide range of inventory codes so time is the most common parameter to base predictions on
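
To show the idea behind exponential smoothing, here is the simplest, non-seasonal variant in plain Python. It is a sketch only: the project model also used seasonality parameters, which this version leaves out, and the weekly figures and `alpha` value are made up.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each level is alpha * latest value + (1 - alpha) * previous level,
    so a higher alpha reacts faster to recent data.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    level = series[0]          # initialise with the first observation
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


weekly_uses = [10, 12, 8, 10]          # hypothetical weekly usage of one code
smoothed = simple_exponential_smoothing(weekly_uses, alpha=0.5)
forecast = smoothed[-1]                # the same value for every future week
print(smoothed, forecast)
```

This variant always forecasts a flat line at the last level, which is why trend or seasonal components are needed when the real pattern is not flat.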
26
Q

What is data architecture?

A

Incorporates our data assets, management systems and policies to support business needs

27
Q

Describe how data architecture was applied (3 answers)

A
  1. Data Storage and Infrastructure:🗄️ Transaction data from BSM
    Stored in relational database: mainframe tables
    BOBI used to extract that data
  2. Data Security/Compliance: 🔒approval required to access said data, through ServiceNow which provides an audit trail of who has access to what
  3. Data Governance/management -👮‍ reference 📚data is maintained manually to improve consistency and support system integration
28
Q

What are the risks of a relational database management system? (2 answers)

A
  1. Complex queries: performance
  2. Maintenance is complex: limited expertise
29
Q

What are the benefits of an RDBMS? (3 answers)

A
  1. Enforces referential integrity to maintain consistent relationships between tables 🔗📊
  2. Enables integration between systems like BSM🔗
  3. Handle large amounts of data - growing transactions📊📈💾💥
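
Referential integrity can be demonstrated with the built-in `sqlite3` module. This is an illustration with a hypothetical schema; note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set, whereas other systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.execute("CREATE TABLE codes (code TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE transactions (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL REFERENCES codes(code)
    )
""")
conn.execute("INSERT INTO codes VALUES ('134')")
conn.execute("INSERT INTO transactions (code) VALUES ('134')")  # valid code

try:
    # Code 999 does not exist in `codes`, so the database rejects the row.
    conn.execute("INSERT INTO transactions (code) VALUES ('999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(rejected, row_count)
```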
30
Q

What are some benefits of combining data? (3 answers)

A
  1. Enrich analysis by providing context📖
  2. Support decision making by identifying which codes to remove or consolidate🧠✅
  3. Facilitates stakeholder convos🗣️🤝💬
31
Q

What is BIG DATA?

A
  • Large
  • Diverse
  • Data sets
  • that grow rapidly in volume
  • and variety
32
Q

Compare off the shelf vs own code

A
  • Off the shelf GOOD = user friendly forecast; minimal expertise; less dev time; tested/supported model
  • Off the shelf BAD = limited customisation; costly creator license
  • Own Code GOOD = Customisable; large data sets; reusability
  • Own Code BAD = technical expertise; time; maintenance challenges