Module Two - Languages of Data Science Flashcards

1
Q

Name some key features of the Caffe framework

A

High Performance: Caffe can process over 60 million images per day on a single NVIDIA K40 GPU.

CNN Focus: Caffe centers on Convolutional Neural Networks (CNNs), which are commonly used in image-related tasks like classification, detection, and segmentation.

Layer-Based Architecture: Models in Caffe are built as a series of layers, with each layer representing a specific operation (e.g., convolution, pooling, activation).

Cross-Platform Compatibility
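
The layer-based idea above can be illustrated with a minimal sketch in plain Python. This is a conceptual toy, not Caffe's API: the function names (`conv1d`, `relu`, `max_pool`, `forward`) and the sample input are assumptions made for illustration.

```python
# Conceptual sketch only: plain Python, not Caffe's API.
def conv1d(signal, kernel):
    """Valid 1-D sliding-window dot product (the 'convolution' layer)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Activation layer: clamp negative values to zero."""
    return [max(0, v) for v in values]

def max_pool(values, size=2):
    """Pooling layer: keep the maximum of each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

# A "model" is an ordered list of layers applied one after another.
layers = [lambda x: conv1d(x, [1, -1]), relu, max_pool]

def forward(x, layers):
    for layer in layers:
        x = layer(x)
    return x

output = forward([1, 3, 2, 5, 4, 4], layers)
print(output)  # [1, 1]
```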

2
Q

Python libraries: Pandas

A

Pandas is an open-source Python library that provides data structures and data analysis tools, primarily for manipulating and analyzing structured data in the form of DataFrames and Series.
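
A short sketch of the two structures, assuming pandas is installed. The column names and numbers are made-up sample data, not figures from the course.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "language": ["Python", "R", "SQL", "Julia"],
    "users": [120, 45, 80, 10],
})

users = df["users"]              # a single column is a Series
popular = df[df["users"] > 40]   # boolean filtering returns a new DataFrame
total = int(users.sum())         # aggregate over the Series
```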

3
Q

TensorFlow

A

TensorFlow is an open-source machine learning framework developed by Google that facilitates the creation, training, and deployment of deep learning models through a flexible and comprehensive ecosystem of tools and libraries.

4
Q

What does it mean in practice that SQL is an American National Standards Institute (ANSI) standard?

A

Because SQL is an American National Standards Institute (ANSI) standard, if you learn SQL and use it with one database, you can apply your SQL knowledge to many other databases.

5
Q

What is the WEKA software suite intended for?

A

WEKA (Waikato Environment for Knowledge Analysis) is a popular open-source software suite for data mining and machine learning. Developed at the University of Waikato in New Zealand, WEKA provides a collection of algorithms and tools for various data analysis tasks, making it a widely used tool in both academic and industrial settings.

6
Q

Python libraries: SciPy

A

SciPy is an open-source Python library for scientific and technical computing. Built on NumPy, it offers functionality for optimization, integration, interpolation, eigenvalue problems, and other advanced mathematical operations.
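
As a small illustration of two of those areas (integration and optimization), here is a sketch assuming SciPy is installed. The functions being integrated and minimized are arbitrary examples chosen because their exact answers are known.

```python
import math
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(math.sin, 0, math.pi)

# Minimize a simple quadratic; the exact minimizer is x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)
```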

7
Q

Python libraries: NumPy

A

NumPy is a powerful open-source Python library for numerical computing that provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.
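
A minimal sketch of array operations, assuming NumPy is installed. The array values are arbitrary.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # a 2 x 3 array

doubled = matrix * 2                  # elementwise, no explicit loop
column_means = matrix.mean(axis=0)    # mean of each column
product = matrix @ matrix.T           # matrix product, shape (2, 2)
```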

8
Q

Python libraries: PyTorch

A

PyTorch is an open-source machine learning library for Python that provides a flexible framework for building and training deep learning models using dynamic computation graphs and tensor operations.
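
To show what "dynamic computation graph" means in practice, here is a small sketch assuming PyTorch is installed. The function being differentiated is an arbitrary example.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# The graph is recorded as the code runs, so ordinary Python control
# flow decides which operations end up in it.
if x > 0:
    y = x ** 2 + 2 * x
else:
    y = -x

y.backward()          # backpropagate through the recorded operations
gradient = x.grad     # dy/dx = 2x + 2, which is 8 at x = 3
```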

9
Q

Can Python be used for NLP?

A

Yes. Python can be used for Natural Language Processing (NLP), for example with the Natural Language Toolkit (NLTK).

10
Q

What are the main characteristics of the Julia language?

A

Julia is a compiled language designed at MIT for high-performance numerical analysis and computational science.

Julia provides speedy development like Python or R, while producing programs that can run about as fast as C or Fortran programs.

It is compiled (just-in-time, via LLVM), which means Julia code runs on the processor as native machine code.

It can call C, Go, Java, MATLAB, R, Fortran, and Python libraries, and has refined support for parallelism.

11
Q

What is JuliaDB?

A

Developed with Julia, JuliaDB is a Data Science package for working with large persistent data sets.

12
Q

Who are the typical users of the R language?

A

Statisticians, mathematicians, and data miners use R for developing statistical software, graphing, and data analysis.

R's array-oriented syntax makes it easier to translate from math to code for learners with no or minimal programming background.

13
Q

Name some frequently used Python libraries for Data Science

A

For data science, you can use Python’s scientific computing libraries like Pandas, NumPy, SciPy, and Matplotlib.

14
Q

Why is SQL different from other software development languages?

A

SQL is different from other software development languages because it is a non-procedural (declarative) language: a query describes the result you want, not the steps to compute it.
SQL stands for Structured Query Language.
It was designed for managing data in relational databases.
SQL is an ANSI-standardized language.
If you learn SQL and use it with one database, you can easily apply your SQL knowledge to many other databases.
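
The non-procedural style can be seen in a small sketch that uses Python's built-in `sqlite3` module. The `employees` table and its rows are hypothetical sample data. The query states which result is wanted and leaves the looping and grouping to the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ana", "Data", 95000), ("Ben", "Data", 85000), ("Cy", "Ops", 70000)],
)

# Declarative: the query names the result; the engine plans the work.
rows = conn.execute(
    "SELECT dept, AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
conn.close()

print(rows)  # [('Data', 90000.0), ('Ops', 70000.0)]
```

The same `SELECT ... GROUP BY` statement uses only standard SQL, so it should carry over to other relational databases with little or no change.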

15
Q

As of 2024, which other deep learning frameworks have superseded Caffe?

A

TensorFlow and PyTorch have largely superseded it for broader machine learning applications.

While Caffe is great for CNNs and computer vision, it lacks the flexibility and ease of use for other types of deep learning models like RNNs, which are more easily handled by frameworks like TensorFlow and PyTorch.

16
Q

What is the Caffe framework and what is it intended for?

A

Caffe (short for Convolutional Architecture for Fast Feature Embedding) is an open-source deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC). It is primarily designed for speed and efficiency in training and deploying deep learning models, especially in computer vision tasks like image classification and object detection.

17
Q

What are some typical applications for Caffe?

A

Image Classification: Training models to classify images into categories.

Object Detection: Identifying and localizing objects within images.

Segmentation: Dividing an image into segments or regions of interest.

Feature Extraction: Using pre-trained networks to extract meaningful features from images for downstream tasks.

18
Q

Difference between open source (Open Source Initiative) and free software (Free Software Foundation)

A

The Open Source Initiative (OSI) champions open source, while the Free Software Foundation (FSF) defines free software. Open source is more business focused, while free software is more focused on a set of values.

19
Q

Python libraries: Matplotlib

A

Matplotlib is a widely used open-source plotting library for Python that provides a flexible interface for creating static, animated, and interactive visualizations in a variety of formats and styles.
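
A minimal static-plot sketch, assuming Matplotlib is installed. The data points are arbitrary, and the figure is written to an in-memory buffer so the example needs no display or output file.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # render the figure as PNG bytes
plt.close(fig)
```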

20
Q

Scala strengths for Data Science

A

Scala offers strong support for functional programming, seamless integration with big data frameworks like Apache Spark, and concise syntax, making it a powerful choice for building scalable and efficient data science applications.
It is also interoperable with Java, as it runs on the JVM.

21
Q

Advantages of using Java for Data Science

A

Java provides strong performance, portability, and a rich ecosystem of libraries and frameworks, making it suitable for building scalable, robust data science applications, particularly in enterprise environments.

22
Q

Examples of notable data science tools built with Java

A

Weka for data mining, Java-ML for machine learning, Apache Spark MLlib for making machine learning scalable, and Deeplearning4j for deep learning.

Hadoop is another Java-based tool; it manages data processing and storage for big data applications running in clustered systems.

23
Q

R programming language

A

R is a programming language and software environment primarily used for statistical computing and data analysis, featuring a wide array of packages for data visualization, statistical modeling, and data manipulation.

24
Q

Python programming language

A

Python is a high-level, interpreted programming language known for its readability, versatility, and extensive libraries, making it popular for web development, data analysis, artificial intelligence, scientific computing, and automation.