General Questions Flashcards
What are the main differences between SQL and NoSQL databases?
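SQL databases are relational: they store data in tables with predefined schemas, are queried with SQL (Structured Query Language), and have traditionally scaled up (vertically). NoSQL databases are non-relational, have flexible or dynamic schemas, and generally scale out (horizontally) more easily.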
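A minimal Python sketch of the schema difference. It uses the standard-library `sqlite3` module as the relational database and a plain list of dicts as a stand-in for a document store; the list is not the API of MongoDB or any other real NoSQL product, and the table and field names are purely illustrative.

```python
import sqlite3

# Relational (SQL): the table schema is declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)")
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ana", "ana@example.com"))

# Writing a field the schema does not define is rejected by the database.
try:
    conn.execute("INSERT INTO users (name, phone) VALUES (?, ?)", ("Rui", "555-0100"))
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Supporting a new field requires an explicit schema change first.
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
conn.execute("INSERT INTO users (name, phone) VALUES (?, ?)", ("Rui", "555-0100"))
rows = conn.execute("SELECT name, email, phone FROM users ORDER BY id").fetchall()

# Document-style (NoSQL-like stand-in): each record is a free-form dict, so
# records in the same collection can carry different fields with no migration.
users_collection = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Rui", "phone": "555-0100", "address": {"city": "Lisbon"}},
]

print(rejected, rows)
print([sorted(doc) for doc in users_collection])
```

The relational side enforces structure at write time, while the document side pushes that responsibility onto the application code that reads the records.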
Name three popular NoSQL databases.
MongoDB (document store), Cassandra (wide-column store), and Redis (in-memory key-value store).
What is PostgreSQL known for?
PostgreSQL is an open-source relational database known for its reliability, robust feature set, and support for both relational (SQL) and non-relational (e.g., JSON/JSONB) queries.
What are distributed SQL systems? Give two examples.
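Distributed SQL systems are relational databases that present a single logical database to applications while partitioning and replicating data across many nodes, so they can scale out horizontally without giving up SQL. Examples include YugabyteDB and CockroachDB.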
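One way these systems spread data across nodes is range partitioning: the key space is cut at split points, and each contiguous range lives on a different node. The toy router below is a simplified model, not the internals of YugabyteDB or CockroachDB; the class name, split points, and node names are hypothetical. Real systems also replicate each range with a consensus protocol and split or rebalance ranges automatically, none of which is modeled here.

```python
import bisect
from typing import Dict, List, Optional


class ToyRangeRouter:
    """Hypothetical router: maps each key to a node by contiguous key ranges."""

    def __init__(self, split_points: List[str], nodes: List[str]):
        # n sorted split points define n + 1 ranges, one range per node.
        if len(nodes) != len(split_points) + 1:
            raise ValueError("need exactly one more node than split points")
        self.split_points = sorted(split_points)
        self.nodes = nodes
        self.storage: Dict[str, Dict[str, str]] = {node: {} for node in nodes}

    def node_for(self, key: str) -> str:
        # bisect_right finds which range the key falls into; a key equal to a
        # split point belongs to the range that starts at that split point.
        return self.nodes[bisect.bisect_right(self.split_points, key)]

    def put(self, key: str, value: str) -> None:
        self.storage[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Callers see one logical key space; routing to a node stays internal.
        return self.storage[self.node_for(key)].get(key)


db = ToyRangeRouter(split_points=["g", "p"], nodes=["node-a", "node-b", "node-c"])
for key in ["apple", "kiwi", "zebra"]:
    db.put(key, key.upper())

print(db.get("kiwi"), db.node_for("apple"), db.node_for("kiwi"), db.node_for("zebra"))
```

Adding capacity in this model means adding a split point and a node, which is the essence of scaling out; the hard parts real systems solve are moving existing data and keeping replicas consistent while doing so.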
What is Apache Kafka used for?
Apache Kafka is a distributed event streaming platform used for high-performance data pipelines, streaming analytics, and data integration.
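The core idea behind Kafka is a topic split into partitions, each an append-only log where records keep their position (offset) and consumers track how far they have read. The class below is a toy in-memory model of that idea, not the Kafka client API; `ToyTopic`, its methods, and the keys are hypothetical. Kafka's default partitioner hashes record keys (murmur2 in the Java client); `zlib.crc32` is used here only as a stand-in hash.

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """Hypothetical in-memory stand-in for a topic: fixed append-only partitions."""

    def __init__(self, num_partitions: int):
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        # Records with the same key always land in the same partition, which is
        # what preserves per-key ordering. crc32 is a stand-in for the real hash.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def consume(self, partition: int, offset: int) -> List[Tuple[str, str]]:
        # Reading never removes records; each consumer tracks its own offset
        # and can re-read from any earlier position.
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
p1, o1 = topic.produce("account-42", "deposit 100")
p2, o2 = topic.produce("account-42", "withdraw 30")
events = topic.consume(p1, 0)
print(p1 == p2, (o1, o2), events)
```

Because consumption only advances an offset, several independent pipelines (analytics, integration, alerting) can read the same stream at their own pace.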
What is Apache Spark?
Apache Spark is an open-source unified analytics engine for large-scale data processing, capable of handling batch and stream processing.
Name two major cloud infrastructure providers.
Amazon Web Services (AWS) and Google Cloud Platform (GCP).
What are key considerations when designing high-performance, low-latency data systems?
Data locality, caching strategies, optimized data structures, efficient algorithms, and minimizing network calls.
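As one concrete caching strategy, in-process memoization avoids repeating a slow network call for the same key. This sketch uses the standard-library `functools.lru_cache`; `fetch_from_backend` is a hypothetical stand-in for a remote lookup, and its latency is simulated with `time.sleep`.

```python
import functools
import time

calls = {"backend": 0}  # counts how often the slow path actually runs


def fetch_from_backend(key: str) -> str:
    """Hypothetical slow lookup standing in for a network or database round trip."""
    calls["backend"] += 1
    time.sleep(0.05)  # simulated latency
    return key.upper()


@functools.lru_cache(maxsize=1024)
def cached_lookup(key: str) -> str:
    # Results are memoized in process memory, keyed by the argument, with
    # least-recently-used eviction once maxsize entries are stored.
    return fetch_from_backend(key)


start = time.perf_counter()
first = cached_lookup("customer-001")   # cache miss: pays the simulated latency
cold_seconds = time.perf_counter() - start

start = time.perf_counter()
second = cached_lookup("customer-001")  # cache hit: no backend call
warm_seconds = time.perf_counter() - start

print(first == second, calls["backend"], cached_lookup.cache_info())
```

`lru_cache` has no expiry, so it suits data that rarely changes; when several service instances must share cached values or entries need a time-to-live, a shared cache such as Redis is a common choice.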
What is a microservices architecture?
A microservices architecture is an architectural style that structures an application as a collection of loosely coupled, independently deployable services.
How would you approach managing distributed teams across London and Lisbon offices?
Establish clear communication channels, use collaboration tools, set regular check-ins, foster a unified team culture, and be mindful of time zone differences.
How can you effectively collaborate with data scientists, software engineers, and risk management teams?
Regular cross-functional meetings, clear documentation, shared goals, and fostering mutual understanding of each team’s needs and constraints.
Name three techniques for monitoring system performance.
Metrics collection and dashboards (e.g., Prometheus for metrics, Grafana for visualization), log analysis, and performance profiling.
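Performance profiling can be tried with nothing more than Python's standard-library `cProfile` and `pstats` modules. In this sketch, `handler` and `slow_sum` are hypothetical functions standing in for real application code; the report lists where time was spent, sorted by cumulative time.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Deliberately wasteful: builds a full list before summing it.
    return sum([i * i for i in range(n)])


def handler() -> int:
    # Hypothetical request handler whose hot spot we want to locate.
    return slow_sum(200_000) + slow_sum(100_000)


profiler = cProfile.Profile()
profiler.enable()
result = handler()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)  # keep only the top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```

Profiling like this is usually done on demand in development or staging, whereas metrics and logs run continuously in production.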
What is GDPR and why is it important in data management?
GDPR (General Data Protection Regulation) is an EU law on data protection and privacy. It's important because it sets binding rules for collecting and processing the personal data of individuals in the EU, including by organizations based outside the EU.
How do you stay updated with new technologies in data engineering?
Following tech blogs, attending conferences, participating in online communities, and experimenting with new tools in personal projects.
What is ComplyAdvantage’s primary mission?
ComplyAdvantage aims to neutralize the risk of money laundering, terrorist financing, corruption, and other financial crimes.