Data Architecture Flashcards
Define Data Architecture
A framework ๐๏ธ that
defines how data is
collected,๐ฅ
stored, ๐๏ธ
managed, ๐
and accessed ๐
across an organisation
Designed with business need in mind
Name 3 tools that you work with as part of you data systems architecture
๐ข Collibra - metadata on JLPโs data assets and RoP
๐ค Alfabet - records the applications used in JLP and how they are connected - the IT landscape
โ๏ธ Snowflake - cloud based storage and analytics service that handles both structured and unstructured data
๐ฆ Tableau - JLP tool of choice for data visualisation used for reporting
Why did you use regression analysis to investigate your data?
- Pearsonโs Correlation identified a weak + relationship between Lead Time & Stock Holding
- Regression would show me the linear relationship to understand how stock holding varies with lead time
- Forecast stock holding based on lead time
Why did you use time series forecasting to investigate you data?
- Regression Model:
- not accurate;
- other variables;
- weak relationship
- Time Series Forecast to identify patterns in Stock Holding and forecast using Historical data
How did you access data from your organisationโs data system and how did this impact your tool choice
- Collibra/Alfabet Direct Extract
- Google Sheets
- Tableau: Integrated with GS
Why did you use Tableau?
- Integrates w/ **GS **๐๏ธ
- Interactive๐ฎ
- Secure sharing - Tableau Online, accessible from any device ๐ฑ
- Stakeholders familiar with it - JLP** tool of choice** โ
- User-friendly join feature ๐ค
- Future automation: Snowflake or Collibra Application Programming Interace ๐ค
Why did you use GS?
- User friendly interface - formulas โ
- JLP tool of choice โ
- Collaboration ๐ค
- Tableau Integration/File Import ๐
Why did you use Python? (Hypothesis Testing)
For my hypothesis test (stock holding and lead time relationship):
1. built in libraries
2. visualisation like scatter plot & hypothesis test in one tool
3. handles big data sets
4. reuse the same code for different sets of data
Why did you use Python? (Regression)
- Greater flexibility
- Greater control
- Tableau linear regression operates behind scenes
- E.g. Tableau wouldnโt have split the data into test/train