What is Data Science? Flashcards
IBM Data Science Professional Certificate (Course 1/10)
What is data science?
The translation of data into a story, and then the use of those stories to generate insights. With these insights, you can then develop strategies for companies, for example.
How does digital transformation affect business operations?
It affects them by updating existing processes and operations and creating new ones to harness the benefits of new technologies (e.g. harnessing the benefits of Big Data).
optical tracking
An example of how Big Data can trigger a digital transformation, not just within an organisation, but within an entire industry
Manchester City has embraced the use of Big Data to improve their game.
They have a team of data analysts who use millions of stats about players’ performance and the upcoming opposition to help the club’s chances of winning.
One of the tools they use is optical tracking, which can be used to pinpoint the position of players on the pitch 25 times a second, in relation to the ball, opposition, and teammates. This data, along with other ball-related data such as passes, shots, and turnovers, is analysed to gain insights into the team’s performance.
These insights can then be used to inform the team’s strategy in future games. For example, they might adjust their formation, change their passing strategy, or alter player positions based on the data.
It’s a great example of how Big Data can transform not just a single team, but the entire sport of football.
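To get a feel for the scale behind the 25-samples-per-second figure, here is a rough back-of-the-envelope sketch. The 22 players and the 90-minute match length are assumptions for illustration, not figures from the course, and the function name is hypothetical.

```python
# Rough estimate of how many position samples optical tracking produces per match.
SAMPLES_PER_SECOND = 25   # from the card: positions captured 25 times a second
PLAYERS_ON_PITCH = 22     # assumption: two full teams of 11
MATCH_MINUTES = 90        # assumption: regulation time, no stoppage time


def position_samples(samples_per_second, players, minutes):
    """Return the number of player-position samples recorded over a match."""
    return samples_per_second * players * minutes * 60


total = position_samples(SAMPLES_PER_SECOND, PLAYERS_ON_PITCH, MATCH_MINUTES)
print(f"{total:,} player-position samples per match")
# prints "2,970,000 player-position samples per match"
```

Even under these simplified assumptions, a single match yields millions of data points before any ball-related events are counted.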
What is cloud computing?
The delivery of on-demand computing resources such as:
* Networks
* Servers
* Storage
* Applications
* Services
* Data centres
over the Internet on a pay-for-use basis.
What are some of the benefits of cloud computing?
- Users do not need to purchase and install the software on their local systems; they can just use the online version and pay a monthly subscription.
- This is more cost-effective and ensures you always have access to the most up-to-date version of the software. Think of Microsoft 365, for example.
- Other benefits include saving local storage space and, because the software is hosted online, easier collaboration among colleagues and project teams.
What is cloud computing composed of?
- 5 characteristics
- 3 service models
- 3 deployment models
Only Brave Rabbits Run Marathons
What are the five characteristics of cloud computing?
- On-demand self-service
  - this means getting access to cloud resources such as processing power, storage, and network without requiring human interaction with each service provider
- Broad network access
  - this means that cloud computing resources can be accessed via the network through standard mechanisms and platforms such as mobile phones, tablets, laptops, and workstations
- Resource pooling
  - this is what gives cloud providers economies of scale, which they pass on to their users, making cloud cost-efficient
  - using a multi-tenant model, computing resources are pooled to serve multiple customers, and cloud resources are dynamically assigned and reassigned according to demand without customers needing to know the physical location of these resources
- Rapid elasticity
  - this implies that you can access more resources when you need them and scale things back when you don’t, because resources are elastically provisioned and released
- Measured service
  - this implies that you only pay for what you use as you go; if you’re not using those resources, you’re not paying
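Rapid elasticity and measured service can be illustrated together with a small sketch. All numbers here (the hourly demand, the capacity per instance, and the price) are hypothetical, and the function names are made up for this example; real providers bill in more complicated ways.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance):
    """Rapid elasticity: provision just enough instances for current demand."""
    return math.ceil(requests_per_second / capacity_per_instance)


def pay_as_you_go_cost(hourly_demand, capacity_per_instance, cents_per_instance_hour):
    """Measured service: bill only for the instance-hours actually used."""
    instance_hours = sum(
        instances_needed(demand, capacity_per_instance) for demand in hourly_demand
    )
    return instance_hours * cents_per_instance_hour


# Hypothetical demand (requests per second) over six hours, including one idle hour.
demand = [0, 120, 450, 900, 300, 50]
CAPACITY = 100   # assumption: requests per second one instance can serve
PRICE_CENTS = 10  # assumption: price of one instance-hour, in cents

elastic_cost = pay_as_you_go_cost(demand, CAPACITY, PRICE_CENTS)

# For comparison: sizing for the peak and keeping it running all six hours.
fixed_cost = instances_needed(max(demand), CAPACITY) * len(demand) * PRICE_CENTS

print(elastic_cost, fixed_cost)  # prints "200 540"
```

In the idle hour no instances run, so nothing is billed, which is the "if you’re not using those resources, you’re not paying" idea in miniature.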
What is cloud computing really about?
It is about using technology “as a service”, leveraging remote systems on-demand over the open Internet, scaling up and scaling back, and only paying for what you use.
What do cloud deployment models indicate?
They indicate where the infrastructure resides, who owns and manages it, and how cloud resources and services are made available to users.
What are the three types of cloud deployment models?
- Public
  - this is when you leverage cloud services over the open internet on hardware that is owned by the cloud provider and shared with other companies
- Private
  - this means that the cloud infrastructure is provisioned for exclusive use by a single organisation
  - it could run on-premises or it could be owned, managed, and operated by a service provider
- Hybrid
  - this is when you use a mix of both the public and private deployment models
What are the three cloud service models based on?
The three layers in a computing stack: infrastructure, platform, and application.
What are the three cloud service models?
- Infrastructure as a Service (IaaS)
  - in this model, you can access the infrastructure and physical computing resources such as servers, networking, storage, and data centre space without the need to manage or operate them
- Platform as a Service (PaaS)
  - you can access the platform that comprises the hardware and software tools that are usually needed to develop and deploy applications to users over the Internet
- Software as a Service (SaaS)
  - this is a software licensing and delivery model in which software and applications are centrally hosted and licensed on a subscription basis. It is sometimes referred to as “on-demand software.”
Why is the cloud such a positive for data science?
It allows a data scientist to bypass the physical limitations of their computer and the system they’re using.
What is Big Data?
Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools, and machines.
What does Big Data need in order to be effective?
It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data gathered.
What does Big Data aim to do?
It aims to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management, and enhanced shareholder value.
What are the V’s of Big Data?
- Velocity
  - This is the speed at which data is accumulated
- Volume
  - This is the scale of the data or the increase in the amount of data stored
- Variety
  - This is the diversity of the data
- Veracity
  - This is the quality and origin of data and its conformity to facts and accuracy
- Value
  - This refers to our need and ability to turn data into value
What are the drivers of Big Data Volume?
- The increase in data sources
- Higher resolution sensors
- Scalable infrastructure
What is the difference between structured and unstructured data?
- Structured data fits neatly into rows and columns in relational databases.
  - For example, employee details at a company.
  - These details would include fields such as job, employee number, and age, which every employee at the company has, with each field holding the same data type for every record.
- Unstructured data is data that is not organised in a predefined way.
  - For example, this could be tweets, blog posts, and videos.
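The contrast can be sketched in a few lines of Python. The employee records and the social media post below are invented for illustration, and the `Employee` class is a hypothetical stand-in for a database table.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Structured: every record has the same fields with the same types."""
    employee_number: int
    job: str
    age: int


# Hypothetical rows, as they might appear in a relational table.
employees = [
    Employee(1001, "Analyst", 34),
    Employee(1002, "Engineer", 41),
]

# Because the structure is known in advance, querying a field is trivial.
average_age = sum(e.age for e in employees) / len(employees)

# Unstructured: free text with no predefined fields (hypothetical post).
post = "Great game last night! #football #data"

# Any structure has to be extracted after the fact, here by a crude rule.
hashtags = [word.strip("!,.") for word in post.split() if word.startswith("#")]

print(average_age)  # prints "37.5"
print(hashtags)     # prints "['#football', '#data']"
```

The structured records can be queried directly by field name, whereas the post needs an extraction step before anything can be computed from it.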
What does variety reflect?
That data comes from different sources.
What are the drivers of variety?
- Mobile technologies
- Social media
- Wearable technologies
- Geo technologies
- Video
- Many more
CCIA
What are the attributes of veracity?
- Consistency
- Completeness
- Integrity
- Ambiguity
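Two of these attributes, completeness and consistency, lend themselves to simple checks. The sketch below is one possible way to measure them on invented records; the field names and helper functions are hypothetical, and integrity and ambiguity are not covered because they depend heavily on context.

```python
# Hypothetical employee records with deliberate quality problems.
records = [
    {"employee_number": 1001, "job": "Analyst", "age": 34},
    {"employee_number": 1002, "job": "Engineer", "age": None},  # missing age
    {"employee_number": 1001, "job": "Manager", "age": 34},     # conflicts with first row
]

REQUIRED_FIELDS = ("employee_number", "job", "age")


def completeness(rows, required_fields):
    """Fraction of rows in which every required field has a value."""
    complete = [
        row for row in rows
        if all(row.get(field) is not None for field in required_fields)
    ]
    return len(complete) / len(rows)


def inconsistent_keys(rows, key, field):
    """Keys that appear with more than one distinct value for `field`."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], set()).add(row[field])
    return sorted(k for k, values in seen.items() if len(values) > 1)


share_complete = completeness(records, REQUIRED_FIELDS)
conflicts = inconsistent_keys(records, "employee_number", "job")

print(conflicts)  # prints "[1001]"
```

Here two of the three rows are complete, and employee number 1001 is flagged because it appears with two different job titles.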
CN
What are the drivers of veracity?
- Cost
- Need for traceability
What is the main reason people take time to understand Big Data?
In order to derive value from it.