ML: Programming Flashcards

1
Q

Q26- How do you handle missing or corrupted data in a dataset?

A

First find the missing or corrupted values in the dataset, then either drop the affected rows or columns or replace those values with another value.

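In Pandas, isnull() flags missing values (NaN or None) so you can find the affected rows and columns, and dropna() drops the rows or columns that contain them. If you would rather fill the missing values with a placeholder (for example, 0), use the fillna() method. Corrupted values that are not already NaN (such as sentinel strings or out-of-range numbers) need to be converted to NaN first, for example with replace(), before these methods will catch them.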
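As a minimal sketch of the three methods named above, the following uses a small made-up DataFrame (the column names and values are hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN/None mark the missing entries.
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 40.0],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# isnull() returns a boolean mask; summing it counts missing values per column.
missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill missing values with a placeholder chosen per column.
filled = df.fillna({"age": 0, "city": "unknown"})
```

Whether dropping or filling is the better choice depends on how much data is missing and on what the placeholder would mean to the downstream model.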

2
Q

Q27- Do you have experience with Spark or big data tools for machine learning?

A

You’ll want to get familiar with what big data means at different companies and which tools each of them uses. Spark is one of the most in-demand big data tools, able to process very large datasets quickly. Be honest if you don’t have experience with the tools a role demands, but also look through job descriptions to see which tools come up repeatedly: those are the ones worth investing time in.

3
Q

Q28- Pick an algorithm. Write the pseudocode for a parallel implementation.

A

This kind of question tests your ability to think in terms of parallelism and to handle concurrency in implementations that deal with big data. Take a look at pseudocode frameworks such as Peril-L and visualization tools such as Web Sequence Diagrams to help you demonstrate that you can write code that reflects parallelism.
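One way to practice is to pick a simple algorithm and split it into independent pieces. The sketch below is an illustrative choice, not a prescribed answer: it sums a list by splitting it into chunks, summing each chunk in a worker, and adding the partial results. The names chunked and parallel_sum are hypothetical helpers. It uses threads to keep the example short; for CPU-bound work in Python, a process pool would likely be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, n_chunks):
    """Split values into at most n_chunks contiguous slices."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, n_workers=4):
    """Map step: sum each chunk in a worker. Reduce step: add the partial sums."""
    chunks = chunked(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


total = parallel_sum(list(range(1, 101)))
```

The chunks share no state, so the workers need no locks; the only synchronization point is collecting the partial sums before the final reduce.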

4
Q

Q29- What are some differences between a linked list and an array?

A

An array is an ordered collection of objects stored contiguously, so any element can be reached directly by its index. A linked list is a series of objects, each with a pointer to the next, so elements must be processed sequentially. An array assumes that every element has the same size, unlike the linked list. A linked list can more easily grow organically: an array has to be pre-defined or re-defined (resized and copied) for organic growth. Reordering or inserting into a linked list only involves changing which pointers point where; meanwhile, inserting into or removing from the middle of an array means shifting every element after that position.
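To make the trade-off concrete, here is a minimal singly linked list sketch. The class and method names (Node, LinkedList, push_front, get, to_list) are illustrative choices, not a standard API.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        # O(1): only the head pointer changes; no existing element moves.
        self.head = Node(value, self.head)

    def get(self, index):
        # O(n): there is no direct indexing, so walk the chain from the head.
        if index < 0:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            if node is None:
                break
            node = node.next
        if node is None:
            raise IndexError(index)
        return node.value

    def to_list(self):
        # Copy the values into a Python list (an array-backed structure).
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


linked = LinkedList()
for value in (3, 2, 1):
    linked.push_front(value)
```

By contrast, a Python list (array-backed) gives constant-time access with values[index], but values.insert(0, x) has to shift every existing element.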

5
Q

Q30- Describe a hash table.

A

A hash table is a data structure that implements an associative array. A hash function maps each key to a slot (bucket) where its value is stored, so lookups, inserts, and deletes take constant time on average. Hash tables are often used for tasks such as database indexing.
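A minimal sketch of the idea, assuming separate chaining (each bucket is a list of key/value pairs) as the collision strategy. The HashTable class and its put and get methods are hypothetical names for illustration; Python's built-in dict is the production equivalent and does not work this way internally.

```python
class HashTable:
    def __init__(self, n_buckets=8):
        # Each bucket holds the (key, value) pairs whose keys hash to it.
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable(n_buckets=2)  # few buckets, so collisions are exercised
table.put("alice", 1)
table.put("bob", 2)
table.put("carol", 3)
table.put("alice", 10)
```

This sketch never resizes, so lookups degrade toward linear time as the buckets fill; real implementations grow the bucket array to keep chains short.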

6
Q

Q31- Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

A

What’s important here is to explain your views on how to visualize data properly and your personal preferences when it comes to tools. Popular tools include R’s ggplot2, Python’s seaborn and matplotlib, and tools such as Plotly and Tableau.
