Programming Skills Flashcards

1
Q

How do you handle missing or corrupted data in a dataset?

A

First find the missing or corrupted values in the dataset, then either drop the affected rows or columns or replace those values with something else (for example, a constant placeholder or the column mean).

In Pandas, isnull() (an alias of isna()) flags missing values (NaN or None), and dropna() drops the rows, or with axis=1 the columns, that contain them. If you want to fill the invalid values with a placeholder (for example, 0) instead, use the fillna() method. Corrupted values such as malformed strings in a numeric column are not flagged by isnull() on their own, so convert them to NaN first.
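A minimal pandas sketch of both strategies, assuming a small invented DataFrame. The corrupted entry "n/a" in the numeric "age" column is coerced to NaN with pd.to_numeric(..., errors="coerce") so that isnull() can see it.

```python
import pandas as pd

# Invented sample data: "n/a" is a corrupted entry, None is a missing one.
df = pd.DataFrame({
    "age": ["34", "n/a", "29", None],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Turn unparseable values into NaN so they are treated as missing.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

print(df.isnull().sum())                 # count of missing values per column

dropped_rows = df.dropna()               # keep only rows with no missing values
dropped_cols = df.dropna(axis=1)         # keep only columns with no missing values
filled = df.fillna({"age": 0, "city": "unknown"})  # per-column placeholders
```

Dropping is simplest but discards data; filling keeps every row, at the cost of introducing values that were never observed.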

2
Q

Do you have experience with Spark or big data tools for machine learning?

A

Get familiar with what big data means at different companies and which tools each of them uses. Apache Spark is one of the most in-demand big data tools: it processes very large datasets quickly by distributing the work across a cluster, and it ships with a machine learning library, MLlib. Be honest if you don’t have experience with the tools a role requires, but also read job descriptions and note which tools come up repeatedly: those are the ones worth investing time to learn.

3
Q

Pick an algorithm. Write the pseudo-code for a parallel implementation.

A

This kind of question tests your ability to think in parallel and to handle concurrency in implementations that deal with big data. Look at pseudocode notations such as Peril-L and visualization tools such as Web Sequence Diagrams to help you present code that clearly reflects parallelism.
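As one possible answer, here is a parallel sum written as a map-reduce in Python rather than Peril-L: the input is split into contiguous slices, each worker sums one slice independently, and the partial sums are combined at the end. The function names are hypothetical. The sketch uses ThreadPoolExecutor so it runs anywhere; because of CPython’s global interpreter lock, threads will not speed up pure-Python arithmetic, so for real CPU-bound work you would swap in ProcessPoolExecutor (the worker is a module-level function wrapped in functools.partial, so it can be pickled).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import Sequence


def chunk_bounds(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(n) into `parts` contiguous [lo, hi) slices of nearly equal size."""
    size, extra = divmod(n, parts)
    bounds, lo = [], 0
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)  # first `extra` slices get one more item
        bounds.append((lo, hi))
        lo = hi
    return bounds


def _sum_slice(data: Sequence[float], bounds: tuple[int, int]) -> float:
    """Worker task: sum one slice; touches no shared mutable state."""
    lo, hi = bounds
    return sum(data[lo:hi])


def parallel_sum(data: Sequence[float], workers: int = 4) -> float:
    """Map phase: sum slices in parallel. Reduce phase: add the partial sums."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    bounds = chunk_bounds(len(data), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial(_sum_slice, data), bounds))
    return sum(partials)
```

For example, parallel_sum(list(range(101))) returns 5050. The same split, independent work, combine shape carries over to counting, max, and histogram problems.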

4
Q

What are some differences between a linked list and an array?

A

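An array is an ordered collection of elements stored contiguously in memory, so any element can be read by its index in constant time. A linked list is a sequence of nodes, each holding a value and a pointer to the next node, so reaching the k-th element means following k pointers from the head. An array assumes every element has the same size (which is what makes index arithmetic work); a linked list does not. A linked list can grow one node at a time, while a fixed-size array has to be allocated up front or copied into a larger block to grow. Rearranging a linked list (inserting, removing, or reordering nodes) only means changing which pointers point where, whereas doing the same in an array means shifting or copying elements. The trade-off is that every linked-list node spends extra memory on its pointer.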
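A minimal singly linked list in Python makes the trade-off concrete: splicing in a node after a known node is O(1), while looking up by index is O(n). The Node class and the helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    value: Any
    next: Optional["Node"] = None  # pointer to the following node, or None at the tail


def insert_after(node: Node, value: Any) -> Node:
    """O(1): splice in a new node by redirecting a single pointer."""
    new = Node(value, node.next)
    node.next = new
    return new


def get(head: Optional[Node], index: int) -> Any:
    """O(n): no random access, so walk `index` pointers from the head."""
    node = head
    for _ in range(index):
        if node is None:
            raise IndexError(index)
        node = node.next
    if node is None:
        raise IndexError(index)
    return node.value


def to_list(head: Optional[Node]) -> list:
    """Collect node values in order, for inspection."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

For comparison, Python’s built-in list is a dynamic array: arr.insert(1, x) has to shift every later element, but arr[k] is a constant-time lookup.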

5
Q

Describe a hash table.

A

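A hash table is a data structure that implements an associative array, mapping keys to values. A hash function turns each key into an index in an array of buckets, so lookups, insertions, and deletions take constant time on average; when two keys land in the same bucket (a collision), the table needs a strategy such as chaining or open addressing to keep them apart. Hash tables are often used for tasks such as database indexing.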
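A minimal sketch of a hash table that resolves collisions by separate chaining (each bucket holds a list of key-value pairs). The class and method names are hypothetical, Python’s built-in hash() stands in for the hash function, and the resize threshold of two entries per bucket is an arbitrary choice; in real Python code you would simply use dict.

```python
from typing import Any, Hashable


class ChainedHashTable:
    """Toy hash table: a list of buckets, each a list of (key, value) pairs."""

    def __init__(self, capacity: int = 8) -> None:
        self._buckets: list[list[tuple[Hashable, Any]]] = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key: Hashable) -> list:
        # The hash function maps the key to a bucket index.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key already present: overwrite its value
                return
        bucket.append((key, value))  # new key, possibly sharing a bucket (collision)
        self._size += 1
        if self._size > 2 * len(self._buckets):  # keep chains short on average
            self._resize(2 * len(self._buckets))

    def get(self, key: Hashable) -> Any:
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def _resize(self, capacity: int) -> None:
        # Rehash every pair, since bucket indices depend on the bucket count.
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(capacity)]
        for k, v in pairs:
            self._bucket(k).append((k, v))

    def __len__(self) -> int:
        return self._size
```

Growing the bucket array as the table fills is what keeps the average chain length, and therefore the average lookup cost, constant.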

6
Q

Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?

A

Popular tools include R’s ggplot2, Python’s seaborn and matplotlib, and tools such as Plotly and Tableau.

7
Q

Given two strings, A and B, of the same length n, find whether it is possible to cut both strings at a common point such that the first part of A and the second part of B form a palindrome.

A

There are multiple ways to check for palindromes. In Python, for example, one way is to reverse the string and check whether it still equals the original. The thing to look out for here is the category of question: expect questions akin to software engineering interview questions that drill down into your knowledge of algorithms and data structures. Make sure you’re comfortable enough with your language of choice to express that logic.
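One way to answer the question itself, sketched in Python: trying every cut and reversing the result costs O(n²), but a two-pointer scan does it in O(n). Match a[i] against b[j] from the outside in; once they differ, the unmatched middle must be a palindrome taken entirely from A (cut just after j) or entirely from B (cut at i). The function names are hypothetical.

```python
def _is_palindrome(s: str, i: int, j: int) -> bool:
    """True if s[i..j] (inclusive) reads the same in both directions."""
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True


def can_form_palindrome(a: str, b: str) -> bool:
    """Is there a cut k with a[:k] + b[k:] a palindrome? Runs in O(n)."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    i, j = 0, len(a) - 1
    # The prefix of a must mirror the suffix of b; match greedily from the outside.
    while i < j and a[i] == b[j]:
        i += 1
        j -= 1
    # The remaining middle must come entirely from a or entirely from b.
    return _is_palindrome(a, i, j) or _is_palindrome(b, i, j)
```

For example, can_form_palindrome("ulacfd", "jizalu") returns True, because cutting after three characters gives "ula" + "alu" = "ulaalu".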

8
Q

How are primary and foreign keys related in SQL?

A

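A primary key uniquely identifies each row in a table. A foreign key is a column (or set of columns) in one table that references the primary key of another table, so the database can enforce that every referenced row actually exists (referential integrity) and you can join the two tables on that relationship. Just as useful is to talk through how you would think about setting up SQL tables and querying them.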
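A self-contained illustration using Python’s built-in sqlite3 module; the customers/orders schema and the rows are invented for the example. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been run on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE customers (
    id   INTEGER PRIMARY KEY,                         -- primary key: one row per customer
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
    total       REAL NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
""")

# Join each order to its customer through the foreign key.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# Referencing a customer that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (13, 99, 1.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

Here rows is [('Ada', 30.0), ('Grace', 40.0)], and the final insert is rejected because no customer has id 99.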

9
Q

How do XML and CSV compare in terms of size?

A

For the same tabular data, XML is much more verbose than CSV and takes up considerably more space. CSV uses a delimiter (usually a comma) to split each line into columns, with one record per line and, optionally, a header row that names the columns once. XML wraps every value in opening and closing tags to delineate a tree-like structure of named elements, so the field names are repeated for every record. You’ll often get XML back as semi-structured data from APIs or HTTP responses, and in practice you’ll want to ingest it and process it into a usable CSV. This sort of question tests your familiarity with wrangling data in sometimes messy formats.
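To make the size difference concrete, this sketch (Python standard library only, sample records invented) serializes the same rows as CSV and as XML, then converts the XML back into CSV. Exact byte counts depend on the data, but because XML repeats every field name as an opening and a closing tag on every record, it grows much faster than CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["id", "name", "city"]
records = [
    {"id": "1", "name": "Ada", "city": "London"},
    {"id": "2", "name": "Grace", "city": "New York"},
]


def to_csv(rows: list[dict]) -> str:
    """Header row once, then one delimited line per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows: list[dict]) -> str:
    """One <person> element per record, one child tag per field."""
    root = ET.Element("people")
    for row in rows:
        person = ET.SubElement(root, "person")
        for field in FIELDS:
            ET.SubElement(person, field).text = row[field]
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(xml_text: str) -> list[dict]:
    """Flatten the tree back into flat records, ready to write out as CSV."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in person} for person in root]


csv_text = to_csv(records)
xml_text = to_xml(records)
print("CSV bytes:", len(csv_text.encode()), "XML bytes:", len(xml_text.encode()))
```

Real API responses are rarely this regular (nested elements, attributes, optional fields), which is where most of the wrangling effort goes.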

10
Q

What are the data types supported by JSON?

A

This tests your knowledge of JSON (JavaScript Object Notation), another popular data format, whose syntax is derived from JavaScript object literals. JSON supports six basic data types: strings, numbers, objects, arrays, booleans, and null.
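A quick way to see all six types is to parse a document that contains each one. Python’s json module maps them to dict (object), list (array), str (string), int or float (number), bool (true/false), and None (null); the sample document is invented for illustration.

```python
import json

doc = '{"name": "Ada", "count": 3, "score": 9.5, "tags": ["math", "code"], "active": true, "manager": null}'
parsed = json.loads(doc)  # the top-level JSON object becomes a Python dict

for key, value in parsed.items():
    # Show which Python type each JSON value was decoded into.
    print(f"{key}: {type(value).__name__}")
```

Note that JSON itself has a single number type; the split into int and float is Python’s decoding choice.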

11
Q

How would you build a data pipeline?

A

Make sure you’re familiar with the tools for building data pipelines (such as Apache Airflow) and the platforms where you can host models and pipelines (such as Google Cloud, AWS, or Azure). Explain the steps a functioning data pipeline requires (for example, extracting data from its sources, transforming and validating it, loading it into storage, and scheduling and monitoring each run), and talk through your actual experience building and scaling pipelines in production.
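A toy, plain-Python sketch of the extract, transform, and load stages; in production an orchestrator such as Apache Airflow would typically run each stage as a separate scheduled, retryable task. The sample CSV, the payments table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator

# Hypothetical raw input: the second record is malformed.
RAW_CSV = """user_id,amount
1,19.99
2,not_a_number
3,5.00
"""


def extract(text: str) -> Iterator[dict]:
    """Extract: read raw records from a source (an in-memory CSV here)."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[dict]) -> Iterator[tuple[int, float]]:
    """Transform: cast types and drop records that fail validation."""
    for row in rows:
        try:
            yield int(row["user_id"]), float(row["amount"])
        except (KeyError, ValueError):
            continue  # a real pipeline would log or quarantine bad records


def load(records: Iterable[tuple[int, float]], conn: sqlite3.Connection) -> int:
    """Load: write clean records to the destination table; return how many were written."""
    rows = list(records)
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments (user_id, amount) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(f"loaded {loaded} rows")
```

Keeping each stage a separate function mirrors how orchestrators split a pipeline into tasks, so a failure in one stage can be retried without rerunning the others.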
