General / Behavioural + Apple Flashcards
Tell Me More About Yourself
Hi, I’m Ze Xuan, a Year 2 Computer Science student at NUS
I have experience across a variety of roles, including backend engineering and machine learning
I most recently interned as a backend engineer at Bytedance last summer, where I worked on designing and maintaining a highly available, performant system to ensure the quality and stability of TikTok’s core functions.
Before that, I spent about two years as a machine learning engineer at the Singapore Armed Forces, where I was involved in multiple machine learning projects
I also have a passion for competition and learning: over the past few years I have achieved several first-place finishes at international ML competitions, along with some high-ranking finishes in Kaggle competitions.
I would say I’m someone who’s always willing to learn and expose myself to new things, tech-related or not
What are your Weaknesses
- Lack of patience / Get Frustrated When I’m blocked
- During my time in the Singapore Armed Forces, we collaborated with several external vendors.
- On several occasions, delays from their end would hinder our progress.
- Initially, I found myself getting frustrated and occasionally commenting on their efficiency.
- In retrospect, I realize that while these situations were frustrating, I could have managed my time and reactions better.
- Instead of dwelling on the delay, I could have redirected my efforts towards other productive tasks.
- However, I’ve made conscious efforts to address this
- During my internship at Bytedance I faced similar challenges: I was working with teams in China, and sometimes my work would get blocked too
- So I started by mocking responses so I could write out as much code as possible
- When I was still blocked, I would look for productive ways to use my time.
- For example, I improved the onboarding docs for new interns, because the old document was extremely outdated and I had found onboarding challenging myself.
- This experience highlighted the importance of patience and adaptability
- While I’m still working on it I think I’ve come a long way in turning my impatience into a catalyst for proactive and constructive action
What are your Strengths
I would say my most defining strength would be curiosity and the constant pursuit of learning / The hunger to learn.
- When I’m working on Machine Learning competitions or collaborating with colleagues, I constantly find myself driven to understand the deeper “why” behind certain concepts, strategies and decisions.
- In environments like Kaggle (Machine Learning Competitions), this curiosity pushes me to delve deeper into the broader context of problems, be it medicine, law or biodiversity. This in-depth exploration has often led to unexpected insights and strategies that many competitors may overlook.
- This trait was also evident during my internship at Bytedance where I regularly engaged with my colleagues to gain a deeper understanding of our technological choices, such as our preference for Kitex over gRPC.
- Ultimately, I think my curiosity ensures that I am never content with surface-level knowledge and always strive to understand problems in detail.
Where do you see yourself in 5 Years
- In the short term, especially during the remainder of my studies at NUS, I aim to further refine my skills in backend development and also explore a little bit more about parallel computing, distributed systems and the intersections between machine learning and backend systems.
- I hope to build up a good foundation to soon become an effective contributor in larger-scale projects and have an opportunity to work on more interesting problems.
- Five years from now, I see myself as an experienced software/backend engineer, hopefully having been part of and perhaps even spearheaded several complex projects.
- My passion lies at the crossroads of machine learning and software engineering, particularly in enhancing system performances.
- But more than that, I’m deeply interested in parallel computing and distributed systems.
- As data continues to grow exponentially, being able to utilise these data effectively and building systems around it will become paramount, and I aspire to be at the forefront of designing and implementing such solutions.
- In this ever-evolving tech landscape, I’m committed to continuous learning.
- In order to stay abreast of emerging technologies, I think it is important to constantly try them out and attend workshops or sharing sessions to refine my technical skills
======== SKIP ========
Considering [Company’s] commitment to innovation and breaking new grounds in tech, I believe my vision aligns perfectly with where the company is headed. The prospect of evolving alongside [Company] and making impactful contributions excites me immensely.”
What are you passionate about
1) Backend Engineering & Distributed Systems:
- I’m fascinated by the robustness and efficiency that backend systems can offer.
- The field of parallel and distributed systems particularly intrigues me.
- The idea of breaking down complex tasks and processing them simultaneously across a distributed environment presents unique challenges that are interesting to tackle.
- Having an opportunity to address these challenges, such as consistency, fault tolerance, and scalability, really excites me.
2) Intersection of AI/ML:
- I’m also deeply interested in AI and machine learning
- The world of Machine Learning and AI has an element of the unknown.
- Unlike traditional algorithms with clear logic, ML models have this element of unpredictability.
- Every model built is a journey of exploring this ‘unknown,’ which makes it a thrilling experience.
- This also means many AI problems require creative, out-of-the-box solutions, and that constant innovation is what keeps me engaged in the field.
3) Problem-solving at its Core:
- But I think beyond specific domains, what really drives me is the allure of difficult and interesting problems.
- It’s not so much about the particular area I’m working in, but more about the complexity of the challenge in front of me.
- The harder and more interesting the problem, the more fulfilling it would feel to devise a solution for it.
Why did you do computer science?
- I’m interested in analytical problems
- I’ve actually always thought about being a forensic pathologist but unfortunately … (Singapore doesn’t need it :D)
- I really enjoy working on interesting and complex problems and I think programming / CS provides an avenue to achieve that
- I was previously in applied math, but I had the opportunity to interview at a computer vision unit in the SAF, and that’s how I found out about programming.
- I realised I really enjoyed the problem-solving aspect of CS, so I decided to change course and pursue CS instead
- So in some sense it was a little bit of luck, because I likely would not be here if not for that opportunity in NS
Led Team to achieve specific outcome
Situation:
- At the Singapore Armed Forces, I was assigned to lead a team of 2 juniors to work on an Anomaly Detection Project.
Task:
- Our primary objective was to develop an anomaly detection system that would be accurate enough to remove the need for manual monitoring and move to an automated system instead.
Action:
- I spent some time on the problem initially and managed to get a simple K-means model working
- And so I subsequently delegated tasks based on the strengths and interests of each team member.
- One of them worked on optimising the K-means model I had quickly drafted for our use case, and the other looked for other ways to improve our system’s overall detection accuracy
- I then worked on addressing mean shifts / distribution shifts in our data
- Whenever any of us was stuck, we would come together to discuss possible approaches. For example, at one point the person optimising the performance of our entire system wasn’t sure how to proceed, so we held brainstorming sessions and eventually settled on ensembling an Isolation Forest to bring up our accuracy
Result:
- As a result of our combined efforts, we were able to successfully develop an anomaly detection system that met the requirements and reduced operational manpower requirements by over 90%.
- I think this project highlighted the importance of adaptability and teamwork in achieving tangible results.
Overcome Significant Obstacle
Situation:
- During my time with the SAF, we were working on an object detection pipeline that used a ResNet-based model for an essential operational process.
- We encountered a significant obstacle: because of hardware limitations we had a low FPS, so we struggled to detect new targets that moved across the screen quickly.
- This was especially concerning given an upcoming presentation to the commanding officer in about a month, where we had to guarantee these new targets could be detected without sacrificing too much overall accuracy.
Task:
- My main task together with some of my team members was to address the detection problem our model was facing, and we had to do so within a short timeframe.
Action:
- We began by evaluating potential solutions with the team, and one feasible solution was to transition / downgrade from ResNet to MobileNet.
- This was a major decision that took a lot of planning and consideration, because it involved a complete change of model architecture, which could mean a lot of bugs and problems in our pipeline.
- To combat the drop in accuracy that MobileNet introduced, we also implemented a good feedback loop for users to annotate missed targets, enabling continuous model training and refinement.
Result:
- Though this was a huge change to our system, we finished it quickly by splitting the task among all the team members, significantly improved detection of the new targets, and eventually approached, if not fully matched, the accuracy of our previous ResNet model.
- By the time of the presentation, our solution was well-received, and the experience highlighted the importance of adaptability and agile decision-making in our work.
Conflict with Team Member
Situation:
- During my time at SAF, while working on an Object Detection project, my team members and I were discussing the limitations of our ResNet-based object detection model and how we could fix it.
- Essentially our model wasn’t able to detect certain targets that moved across the screen too fast.
Task:
- Our goal was to optimise our model’s performance and ensure it could detect all targets with decent accuracy, in preparation for a presentation to an officer in about a month’s time.
Action:
- One of the team members was adamant about sticking with the current architecture and enhancing its performance in other ways, such as reducing resolution, quantization, and better processing techniques.
- He argued that we wouldn’t have the time to downgrade the model and improve its performance to a satisfactory point.
- We sat down together as a team and talked through the pros and cons of each solution. Ultimately, he was still worried we would not be able to build a good feedback loop system to raise our accuracy in time
- So we each worked with a few other engineers to quickly build a proof of concept for our proposals, and we came together a few days later to make a final decision as a team.
Result:
- My final prototyped model had accuracy that was not far off from the original model, and I suggested a rough idea of how we could implement the feedback loop system
- After our discussions and the subsequent prototype, my colleague was more convinced about my proposed solution.
- We implemented the changes and were able to improve our detection capabilities without compromising much on accuracy. Most importantly, we finished the change in time for the presentation, and the presentation went well.
- This experience not only led to a successful project but also highlighted the importance of open communication and collaboration in a team setting.
Conflict with Manager / Officer
Situation:
- During my time at SAF, I worked on a project that aimed to automate the geo-rectification process to address some of our operational bottlenecks.
- The manual process of geo-rectification using ArcGIS was tedious, degraded image quality, and wasn’t always accurate.
- And so we suggested that we could come up with something to automate this process
- My officer suggested to use an ML solution to automate this process, believing that it would be the most robust and innovative solution for our needs.
Task:
- My colleague and I thought there would be challenges in implementing an ML solution for this project, and we agreed that a non-ML technique using OpenCV would suffice and potentially perform better than an ML solution
- Of course, we had some disagreements with our officer regarding the approach to the problem and we had to find common ground, ensuring the best solution was chosen for our operational needs.
Action:
==> Research & Prototype Development:
- To persuade our officer, we built a prototype on top of QGIS with added OpenCV features for faster auto-geo-rectification, relying on the ORB algorithm for keypoint detection and matching.
- This helped in quantifying the efficiency of our non-ML approach.
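As a rough illustration of the auto-geo-rectification step: once ORB (or any detector) has produced matched keypoints between the satellite image and the reference map, the warp can be recovered with a least-squares fit. The sketch below is pure NumPy and is not the actual project code; it estimates a 2x3 affine transform from matched point pairs:

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares affine transform (2x3) mapping src_pts -> dst_pts.

    src_pts, dst_pts: (N, 2) arrays of matched keypoint coordinates,
    e.g. as produced by ORB detection plus brute-force matching.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = len(src)
    # Design matrix for x' = a*x + b*y + c and y' = d*x + e*y + f:
    # row 2i   -> [x, y, 1, 0, 0, 0]
    # row 2i+1 -> [0, 0, 0, x, y, 1]
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)  # interleaved [x0', y0', x1', y1', ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params.reshape(2, 3)
```

In practice one would also run an outlier-rejection step such as RANSAC over the matches before the fit, since ORB matches on blurry satellite imagery contain many mismatches.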
==> Addressing Image Quality Concerns:
- We also ran into the challenge of satellite image blurriness and the lack of distinct features for certain types of images.
- We implemented pansharpening to address this.
- However, pansharpening amplified high-frequency noise.
- Our solution to that was to denoise after pansharpening and by combining both methods, we managed to achieve the best result.
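A minimal sketch of the pansharpen-then-denoise idea described above, assuming a simple detail-injection scheme and a box filter standing in for a real denoiser (the actual pipeline would have used proper resampling and filtering; everything here is illustrative):

```python
import numpy as np

def box_blur(img, k=3):
    """Simple k x k mean filter (edge-padded); stands in for a denoiser."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def pansharpen(ms_band, pan, denoise=True):
    """Inject the pan band's high-frequency detail into an upsampled
    multispectral band, then optionally denoise the result (since the
    injected detail also amplifies high-frequency noise)."""
    scale = pan.shape[0] // ms_band.shape[0]
    # Nearest-neighbour upsample of the low-resolution band.
    up = np.repeat(np.repeat(ms_band, scale, axis=0), scale, axis=1)
    detail = pan - box_blur(pan)           # high-frequency component of pan
    sharpened = up.astype(float) + detail  # detail injection
    return box_blur(sharpened) if denoise else sharpened
```

The denoise-after-pansharpen ordering mirrors the point made above: sharpening first amplifies noise, so the smoothing pass comes last.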
==> Presenting Results:
- We were able to get a prototype working, even though it was not optimised
- We presented our findings to our officer and looked at the potential of this approach
- We also highlighted the relative simplicity and lower overhead of our method compared to a full-fledged ML implementation, which we thought would face a major bottleneck given how low-quality some of our images were
Result:
- We planned the project milestones with our officer, inviting him to review the progress at multiple stages.
- This approach let him see the progress of our proposed solution; he was eventually convinced and left us to take charge of the project ourselves
- We eventually finished the project and had about a 75% increase in speed in the geo-rectification process and also a significant increase in geo-rectification accuracy
Most Technically Challenging Project (SAF)
SAF ANOMALY DETECTION
Situation:
- During my time as a Machine Learning Engineer in the Singapore Armed Forces, I led a team on a project to improve operational process efficiency
Task:
- It was an Anomaly Detection Project that essentially aimed to automate some manual tasks in our operational process and completing the project would allow our sister/parent unit to remove their 24/7 shift work
Action:
- I initially worked on prototyping a simple K-means clustering algorithm to find the anomalous data points
- One problem I had was finding the balance between false negatives (FN) and false positives (FP).
- We didn’t want a high FN rate, because if you cannot even detect anomalies there is no point implementing an automated system,
- But we also didn’t want a high FP rate, because that too defeats the purpose of an automated detection system: a human would need to check and verify constantly, and at that point they might as well do everything themselves.
- The way I decided to tackle this without sacrificing too much speed for accuracy was to ensemble models that use different means of detecting anomalies,
- I found that ensembling our K-means model with an Isolation Forest worked especially well in our use case: we mark an event as anomalous only if both algorithms deem it anomalous (a conservative consensus rule).
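A hedged sketch of such an ensemble, using scikit-learn’s KMeans and IsolationForest. The distance-quantile threshold and all parameters here are illustrative, not the project’s actual values:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

def ensemble_anomaly_flags(X, n_clusters=3, distance_quantile=0.95):
    """Flag a point as anomalous only when BOTH detectors agree.

    K-means side: points unusually far from their nearest centroid
    (beyond the given distance quantile). Isolation Forest side: the
    model's own outlier predictions (-1). Thresholds are illustrative.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dists = np.min(km.transform(X), axis=1)  # distance to nearest centroid
    km_flags = dists > np.quantile(dists, distance_quantile)

    iso = IsolationForest(random_state=0).fit(X)
    iso_flags = iso.predict(X) == -1         # -1 marks outliers

    return km_flags & iso_flags              # both must agree
```

Requiring agreement from both detectors trades some recall (FN) for precision (FP), which matches the trade-off discussed above.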
- Another big problem we faced was mean / distribution shift.
- So when the distribution of normality shifts, we want to make sure we don’t falsely detect a bunch of events / points as anomalous.
- After some experimenting, we decided to deal with this problem using a Variational Auto Encoder (VAE)
- A little background knowledge:
==> VAEs are a type of autoencoder whose encoder bottleneck produces a probabilistic distribution instead of a deterministic value
==> This probabilistic bottleneck allows VAEs to generate a variety of plausible outputs for a given input, which makes them more robust to changes in input distributions
==> Additionally, there is a regularization term in the VAE’s loss function which encourages the latent space to follow a specific distribution (typically a multivariate Gaussian)
==> This regularization ensures that the latent representations are well-distributed and not overly concentrated, making the model more resilient to shifts in the input data distribution
- After some testing, we saw a significant reduction in FPs when our data went through a mean shift with VAE preprocessing compared to without it
- Our system also stayed rather lightweight, since we only used the VAE to transform our input data before feeding it into our model ensemble
Result:
- In the end, we were able to finish the project and help reduce manpower requirements for operational processes by over 90%
========== SKIP START =============
* Autoencoders are essentially neural networks trained by unsupervised learning to produce reconstructions that are close to their inputs
* In other words, an autoencoder seeks to produce outputs identical to its inputs and uses unlabelled data for the task (which fits an anomaly detection problem very well)
* An AE has 2 parts: an encoder and a decoder. The encoder receives data input x and compresses it into a smaller dimension, feeding it forward through the encoder’s layers. This can happen over h layers, referred to as hidden layers
* The final compression of the input occurs at the bottleneck of the AE. The input representation there is referred to as z, the latent representation of x
* The decoder then takes the latent representation z and attempts to reconstruct the original input x by expanding it through the same number of hidden layers, with neurons mirroring the encoder’s; ideally the output x’ will be identical to the input x
* The AE thus learns a compressed (lower-dimensional) version of the identity function
* We then use the reconstruction error (the difference between x’ and x) to detect anomalies
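The reconstruction-error idea can be sketched in NumPy. Since a linear autoencoder trained with MSE learns the same subspace as PCA, PCA serves here as a stand-in “encoder/decoder” (the real system used a VAE; this deliberately simplifies to the linear case):

```python
import numpy as np

def reconstruction_error_scores(X, n_components=2):
    """Anomaly scores via reconstruction error of a linear 'autoencoder'.

    A linear AE trained with MSE learns the PCA subspace, so we use PCA
    directly: encode = project onto the top components, decode = project
    back. Points the compression cannot reconstruct well are likely
    anomalies.
    """
    mu = X.mean(axis=0)
    Xc = X - mu
    # SVD gives the principal directions; Vt[:k] acts as the encoder weights.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    Z = Xc @ W.T                 # encode: latent representation z
    X_hat = Z @ W + mu           # decode: reconstruction x'
    return np.linalg.norm(X - X_hat, axis=1)  # per-point error ||x - x'||
```

Thresholding these scores (e.g. at a high quantile) turns them into anomaly flags.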
* VAEs differ from a standard AE in that the bottleneck at the encoder is a probabilistic distribution rather than a deterministic value.
========== SKIP END =============
Talk about one of your Failed Projects / Failure
Situation & Task:
- During my time as a Machine Learning Engineer with the Singapore Armed Forces,
- one of our sister units approached us with a request to develop a predictive maintenance system for their fleet of vehicles.
- The goal was to use machine learning to analyze data from sensors on these vehicles and predict when parts might fail, enabling proactive maintenance and reducing downtime, especially for vehicles used in operational processes.
- Another colleague and I took on this project
Action:
- Data Gathering:
==> Collected large amounts of sensor data from a sample of vehicles.
==> This data encompassed everything from engine temperature to tire pressure.
- Model Training:
==> Using this data, we trained a model to recognize patterns and predict potential part failures.
==> Initial tests using our sample data were promising.
- However, the real-world application presented unexpected challenges:
- Data Variation:
==> While our sample data was consistent, when we scaled up we discovered that not all vehicles had sensors calibrated in the same manner.
==> Some of the older vehicles had outdated or malfunctioning sensors, which either sent incorrect data or failed to send data altogether, further skewing our predictive model’s results.
==> This led to variations in the data, which made our model less accurate.
- Over-Promising Results:
==> When the project was proposed, I gave assurances based on our preliminary tests.
==> But I didn’t account for these data variations in a larger fleet of vehicles, and our model couldn’t deliver the accuracy rates I had initially promised (95%+).
Result:
==> Despite refining our model multiple times, we could not reach the promised accuracy (95%+); we achieved only about 80%.
==> The project was eventually shelved, with the unit reverting to their traditional maintenance schedules.
Learnings:
- This experience was humbling; it taught me the significance of thorough testing before scaling up
- It’s especially easy to get excited about early results and over-promise
- It also taught me to always be prepared for unexpected challenges and curveballs, and the importance of adaptability in my approach
- While this project didn’t pan out as we hoped, I think I walked away with some invaluable lessons
Took Initiative to Solve Problem
Situation:
- During my time at the SAF, we took many initiatives to solve problems our sister unit was facing
- One of these was dealing with the challenges of the manual geo-rectification process in ArcGIS
- This method was not only time-consuming but also led to decreased image quality and was often riddled with inaccuracies.
Task:
- We recognised this issue and I believed that we could innovate and develop a more efficient process to handle geo-rectification automatically
Action:
- I took the initiative with a colleague to start a project to try to fix this problem.
- We quickly identified OpenCV as a potential tool to aid in our goal.
- To validate our theory, we developed a prototype integrating OpenCV with QGIS (an open-source alternative to ArcGIS), utilizing the ORB algorithm for faster auto-geo-rectification.
- This step was pivotal, as it allowed us to demonstrate the feasibility and efficiency of our method compared to a more complex ML solution.
- With a functional prototype in hand, we showcased our results to our officer and the potential users, emphasizing not just the efficiency but also the simplicity of our approach.
- To make sure we were heading in the right direction and to keep all stakeholders in the loop, we initiated regular review meetings with them and presented our progress at each stage.
Result:
- Our initiative paid off. The final product increased the speed of the geo-rectification process by around 75%, and we also saw a very notable improvement in accuracy.
- The project’s success also demonstrated the value of taking proactive measures and thinking outside the box to address challenges.
Made a Mistake at Work
Situation: During my internship at Bytedance, while working on optimizing the loading time of our in-house monitoring tool’s dashboard, I was tasked with enhancing the caching system to make it more efficient.
Task: After my initial implementation of the caching system, my mentor suggested going a step further by introducing a locking mechanism to prevent multiple cache fetches. The objective was to improve efficiency by ensuring resources weren’t wasted on redundant cache fetches.
Action: Without fully grasping the distributed nature of our system, I implemented a local lock using mutexes. It was only during a group discussion with other engineers about API endpoint integrations that I realised my solution wouldn’t work in a distributed environment: a mutex only serializes access within a single process, while our service ran across multiple instances. I immediately acknowledged the oversight and sat down with my mentor after the meeting to discuss potential solutions and implementation details, to make sure I could still deliver my project on time.
Result: With some guidance, I re-implemented the lock system, transitioning from a local to a distributed lock mechanism, within the initially stipulated time frame. This experience reinforced the importance of understanding and clarifying the broader context, and of always asking questions, even ones that may seem stupid.
Most Technically Challenging Project (TIKTOK)
BYTEDANCE DISTRIBUTED LOCK SYSTEM
Situation:
- During my internship as a Backend Engineer at Bytedance, I was working on optimizing our in-house monitoring tool’s loading time.
- This tool was important for various SRE teams to monitor and manage TikTok’s performance, but it was a new tool and had a slow loading time.
Task:
- So I was tasked with optimising its loading time by implementing a caching and distributed lock system
Action:
- I initially tried to use Redlock to build a distributed locking system for the cache, but later realised Redlock had some limitations once a single user could start multiple sessions
- So I dropped that idea and achieved the goal of a distributed locking system by spinning up a Redis instance and using its atomic SET NX and DEL operations to keep a consistent record of the locks
- I also set a time to live (TTL) when acquiring the lock (SET with the NX and EX options) to make our system more fault tolerant
- Additionally, I used fencing tokens to make sure the lock is safe and that the right machine is releasing the lock.
- This also ensures that if a delayed request reaches our system, we know it is an old request and can simply reject it.
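To illustrate the semantics described above (SET with NX and a TTL, plus fencing tokens), here is a small in-memory simulation. Real code would issue these commands against Redis, where they are atomic; the class name and API below are made up for illustration only:

```python
import time

class SimulatedLockStore:
    """In-memory stand-in for the Redis operations behind the lock:
    SET key value NX EX ttl, GET, and DEL. Only mimics the semantics."""

    def __init__(self):
        self._data = {}   # key -> (holder, fencing_token, expiry_time)
        self._token = 0   # monotonically increasing fencing token

    def acquire(self, key, holder, ttl=10.0, now=None):
        """Return a fencing token on success, None if the lock is held."""
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is not None and entry[2] > now:
            return None                       # held and not yet expired
        self._token += 1                      # issue a fresh fencing token
        self._data[key] = (holder, self._token, now + ttl)
        return self._token

    def release(self, key, holder, token):
        """Only the holder with the *current* token may release: a delayed
        request carrying a stale token is recognised and rejected."""
        entry = self._data.get(key)
        if entry is not None and entry[0] == holder and entry[1] == token:
            del self._data[key]
            return True
        return False
```

The TTL means a crashed holder’s lock expires on its own, and the fencing token lets the system reject delayed or stale release/write requests, the two fault-tolerance properties mentioned above.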
==> I initially ended my project here, but when I was blocked on another project later on, I came back to this project to try to make it better.
- I realised that for stronger consistency we needed a consensus algorithm, so that any change is agreed upon by a majority of nodes before it’s committed, preventing split-brain scenarios where different parts of the system disagree about who holds a lock.
- Redlock had accounted for this with its built-in quorum logic, but since I now had a simple single-Redis implementation, I could work on the consensus layer myself
- I tried a Raft-style consensus algorithm, initialised with an odd number of nodes, one of them being the leader.
- Every time a lock acquisition is proposed, the leader proposes it to all the other nodes; as long as a majority of the follower nodes accept that the lock would be held by that session, the leader goes ahead with the process and informs all the followers.
- This lets us be sure lock holders are agreed upon, so different nodes can’t believe different sessions hold the same lock, which would defeat the purpose of our implementation.
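The majority-acknowledgement step boils down to a few lines. This sketch shows only the quorum idea at the heart of Raft, not the full protocol (no terms, log replication, or leader election), and the function name is made up for illustration:

```python
def quorum_commit(follower_votes):
    """Commit a proposal only if a strict majority of the cluster agrees.

    Cluster = 1 leader + len(follower_votes) followers; the leader always
    votes for its own proposal. `follower_votes` is a list of booleans
    (whether each follower accepts that the session may hold the lock).
    """
    votes_for = 1 + sum(1 for v in follower_votes if v)
    cluster_size = 1 + len(follower_votes)
    return votes_for > cluster_size // 2  # strict majority commits
```

With an odd cluster size, any two majorities overlap in at least one node, which is what rules out two sessions both being told they hold the lock.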
Result:
- Ultimately, although there were some small issues with the Raft consensus algorithm, my mentor said we didn’t really need to optimise that much: there wouldn’t be many concurrent users, and the downside of two lock holders fetching data into the cache at the same time was low and non-detrimental anyway