Revature Interview Flashcards
Introductions.
Hi, my name is Jarrod Saechao and I'm from Oregon. A little about me: I am the first generation in my family born in America, I hold an associate's degree from Mt. Hood Community College, a bachelor's in Computer Science from Oregon State, and a Google Data Analytics certification. I also worked as a software engineering intern at Agbiz Logic, where I helped enhance their climate module by integrating a Google API and creating custom visualizations in their Angular/Django framework.
Recently I joined Revature as a Big Data Engineer, where I've worked on projects such as building the backend of a grocery store and processing 60 million US Census data points. Within these projects I utilized tools such as Hadoop, Spark, MongoDB, Python, and GitHub, working in an Agile manner.
I'd be happy to go into more detail if you'd like.
Python CLI app
This application focused on building the backend of a grocery store. The tools I utilized were Python, MongoDB, and Git.
In the project I modeled the data, created the CRUD operation endpoints for multiple roles, added logging for database interactions, and built a CLI to test the functionality.
In terms of coding practices, I used OOP and followed the PEP 8 style guide. The design was structured around the Data Access Object (DAO) pattern, separating the data access layer from the business logic to keep it modular and scalable.
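For reference, a minimal sketch of what that DAO layer could look like. It assumes a local MongoDB instance and a hypothetical "grocery" database with a "products" collection; the names are illustrative, not the actual project code.

    import logging
    from pymongo import MongoClient

    logging.basicConfig(filename="db.log", level=logging.INFO)
    logger = logging.getLogger(__name__)

    class ProductDAO:
        """Data Access Object: keeps all MongoDB calls out of the business logic/CLI."""

        def __init__(self, uri="mongodb://localhost:27017", db_name="grocery"):
            self.collection = MongoClient(uri)[db_name]["products"]

        def create(self, product: dict) -> str:
            result = self.collection.insert_one(product)
            logger.info("Inserted product %s", result.inserted_id)
            return str(result.inserted_id)

        def read(self, name: str):
            return self.collection.find_one({"name": name})

        def update(self, name: str, changes: dict) -> int:
            result = self.collection.update_one({"name": name}, {"$set": changes})
            logger.info("Matched %s, modified %s", result.matched_count, result.modified_count)
            return result.modified_count

        def delete(self, name: str) -> int:
            result = self.collection.delete_one({"name": name})
            logger.info("Deleted %s document(s)", result.deleted_count)
            return result.deleted_count

The CLI layer only calls these methods, so role-specific commands never touch MongoDB directly, which is what keeps the design modular.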
US Census Data
In this project my team and I wrangled over 60 million data points from the US Census website, then cleaned and analyzed the data to gather insights.
For our tech stack we utilized Hadoop, Spark, Git, Python, Google Drive, and Power BI.
In this project I worked with the 2020 data sets. I transferred the files locally with FileZilla and then loaded them into HDFS. From there I used PySpark to process the data (I can go into more detail if you'd like). Once I had wrangled the data, it was cleaned by another team, which added mappings for data points, for example id90980980 to white population, and dropped unnecessary columns. Once the data was clean, I still needed to process it further to extract certain fields for visualization.
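A rough sketch of what those PySpark steps could look like, with made-up HDFS paths and column positions (the real PL files have many more columns), just to illustrate the flow:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("census-2020-wrangle").getOrCreate()

    # Load the raw pipe-delimited PL files that were pushed into HDFS
    raw = spark.read.csv("hdfs:///census/2020/pl/*.pl", sep="|", header=False)

    # The raw columns come back as _c0, _c1, ...; map the positions we care about
    # to readable headers and drop everything else
    df = raw.select(
        F.col("_c1").alias("state"),
        F.col("_c4").alias("total_pop"),
        F.col("_c5").alias("white_pop"),
    )

    # Hand off a cleaned subset for the visualization step
    df.write.parquet("hdfs:///census/2020/processed")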
The specific trends I looked at were the predicted 2030 regional population and the top racial group growth by state relative to each state's total population.
I created visualizations of my findings in Power BI in the form of column charts and did online research into possible explanations for the trends.
Conflicts or issues in projects
A coworker and I got into a disagreement about where to store the US Census data (full story under "How do you handle team conflicts" below).
Describe a moment where you failed
Yeah, I can. When I was working on the data wrangling, I read the PL files into Spark, dropped unneeded data, then mapped headers onto the dataframes. I proceeded to join the data into a single dataframe and repartition it into one partition (we were going to join the years together by decade). Where I went wrong was writing the output to my parent directory in overwrite mode. This caused the whole directory to be deleted, erasing my state data, Spark code, and environment. Thankfully I had most of my code in GitHub, but I needed to re-download the data and create a new environment.
It was a humbling experience and definitely showed me the importance of version control.
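Roughly, the mistake and the safer version look like this (paths and the dataframe name are illustrative): Spark's overwrite mode deletes the target path before writing, so pointing it at the project's parent folder wiped everything inside it.

    # "combined" is the joined, single-partition dataframe from the story above.
    # What went wrong: the target path was my parent/project directory
    combined.coalesce(1).write.mode("overwrite").csv("/home/jarrod/census")

    # Safer: write to a dedicated output folder and let Spark fail if it already exists
    combined.coalesce(1).write.mode("errorifexists").csv("/home/jarrod/census/output/2020_joined")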
What makes you passionate about big data
I'm passionate about data because it provides insights that drive decisions. I see data as one of the most important tools for business success.
How do you handle team conflicts
I think open communication is the key. I try to understand all perspectives, seek common ground, and find a solution that works for everybody. It's important to remain calm and professional, even when disagreements arise.
Recently, during a project, a coworker and I got into a disagreement about where we should store the US Census data. I highlighted the benefits of using a Lambda function to gather the data and store it in S3, making it available to all members, and processing it with EMR so we wouldn't have to worry about computation power. I also supported another coworker's idea, which was plausible for our project as it was the most cost-effective. Overall we went with an isolated approach where teams wrangled data by decade, and then one team cleaned the data into a single format for us to use for data analysis.
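A minimal sketch of the Lambda-to-S3 idea I was proposing (the bucket name, event shape, and URL are made up for illustration):

    import urllib.request
    import boto3

    s3 = boto3.client("s3")

    def handler(event, context):
        # Download one census file and drop it into a shared bucket so the whole
        # team can read it and EMR can process it without local compute limits.
        url = event["file_url"]                  # e.g. a census.gov download link
        key = "raw/" + url.rsplit("/", 1)[-1]
        with urllib.request.urlopen(url) as resp:
            s3.put_object(Bucket="census-shared-data", Key=key, Body=resp.read())
        return {"stored": key}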
E-commerce
In this project my team and I created a data generator and analyzed the data to …
Questions for interviewer
How is the culture at Infosys?
What are some big data engineering projects that Infosys has at the moment?
What is considered an ideal candidate for this position?
What are some of the hardest projects or tasks you have worked on?
What's the most satisfying project you have worked on?