YDB Platform Team

Our mission is to bring to the world a fully open-source data processing solution that is proven at web scale. Today, teams building similar products either develop closed-source systems or have limited exposure to the hardest problems. In contrast, YDB is an open-source project that covers a wide range of complex problems, so proven scalability and openness are two of our key benefits.

Product

Distributed SQL database

Our flagship product

Persistent data streams and queues

Pub/sub and queue messaging that scales

Federated query engine

Plug in other data stores and run queries over them

Continuous stream processing engine

Process data streams in real time with SQL queries

An analytical store

Store data in a column-oriented format to speed up analytical queries

Vectorized Query Processor

Optimize analytical query processing with SIMD instructions and cache locality

Technology

We believe that C/C++ is the best language for developing system software such as operating systems and databases, so YDB Core is written in C++. Modern C++ makes memory management easy with smart pointers, without limiting your ability to get the most out of the hardware you are working with.

YDB has a message-oriented design that reflects the distributed nature of the system, so YDB internals are built on the actor model.

More specifically, we implement each component as a C++ class that inherits the actor interface and can send messages to other actors, receive messages from other actors, and create new actors. We also have a coroutine engine built on top of the actor system, so we can use coroutines when they suit the problem better.
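The component model described above can be illustrated with a toy, single-threaded sketch in C++17. Everything here is hypothetical and chosen for illustration only: `IActor`, `ActorSystem`, `Message`, `EchoActor`, `ClientActor`, and `RunPingPong` are not the YDB actor library API, and a real actor system would add thread pools, typed events, and per-actor mailboxes. The sketch only shows the three capabilities an actor has: receiving a message, sending a message, and creating a new actor.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout: this is NOT the YDB actor library API.
using ActorId = std::size_t;

struct Message {
    ActorId sender;
    std::string text;
};

class ActorSystem;

// The actor interface: a component inherits it and reacts to messages.
class IActor {
public:
    virtual ~IActor() = default;
    virtual void Receive(const Message& msg, ActorSystem& system, ActorId self) = 0;
};

// A toy runtime with one shared FIFO queue and no threads.
class ActorSystem {
public:
    // Create a new actor and return its address.
    ActorId Register(std::unique_ptr<IActor> actor) {
        actors_.push_back(std::move(actor));
        return actors_.size() - 1;
    }

    // Enqueue a message; it is delivered later, inside Run().
    void Send(ActorId to, Message msg) {
        queue_.push_back({to, std::move(msg)});
    }

    // Deliver messages until the queue is empty; returns how many were delivered.
    std::size_t Run() {
        std::size_t delivered = 0;
        while (!queue_.empty()) {
            auto [to, msg] = std::move(queue_.front());
            queue_.pop_front();
            actors_[to]->Receive(msg, *this, to);
            ++delivered;
        }
        return delivered;
    }

private:
    std::vector<std::unique_ptr<IActor>> actors_;
    std::deque<std::pair<ActorId, Message>> queue_;
};

// Replies "pong" to whoever sent it a message.
class EchoActor : public IActor {
public:
    void Receive(const Message& msg, ActorSystem& system, ActorId self) override {
        system.Send(msg.sender, Message{self, "pong"});
    }
};

// On "start", creates an EchoActor and pings it; records any other message.
class ClientActor : public IActor {
public:
    explicit ClientActor(std::vector<std::string>& log) : log_(log) {}

    void Receive(const Message& msg, ActorSystem& system, ActorId self) override {
        if (msg.text == "start") {
            ActorId echo = system.Register(std::make_unique<EchoActor>());
            system.Send(echo, Message{self, "ping"});
        } else {
            log_.push_back(msg.text);
        }
    }

private:
    std::vector<std::string>& log_;
};

// Runs the demo and returns the messages the client received.
std::vector<std::string> RunPingPong() {
    std::vector<std::string> log;
    ActorSystem system;
    ActorId client = system.Register(std::make_unique<ClientActor>(log));
    system.Send(client, Message{client, "start"});
    system.Run();
    return log;
}
```

Calling `RunPingPong()` from a `main` function delivers three messages in order ("start", "ping", "pong") and returns the single reply the client recorded. Actors never call each other directly; all interaction goes through `Send`, which is what makes the same component code usable whether the peer is local or on another node.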

We are not limited to C++. We write e2e tests, stress tests, and chaos monkey tests in Python, and we extend Kubernetes by building YDB operators in Go. We also develop YDB SDKs for different programming languages, including:

  • Go
  • Python
  • Java
  • Rust
  • Node.js
  • C++
  • PHP

In addition to programming languages, another technology we care about is Kubernetes. Distributed systems are not just about development but also, in many ways, about setup, updates, and maintenance. We have a long history of managing in-house solutions based on Salt, Ansible, and even an in-house deployment system, but now we are moving by leaps and bounds towards Kubernetes. Managing a stateful application is always a tough job, and Kubernetes provides us with the flexibility of custom operators for our sophisticated needs.

Cloud Services

We cannot imagine the modern IT world without cloud infrastructure. Cloud services are used everywhere, and they have already changed the way people use computers.

The YDB Platform Team has unique experience with cloud services. First, before the cloud era, we provided services for internal company users, in other words, our colleagues. This taught us the difference between just writing code and making a user happy.

Second, the YDB Platform Team participated in the development of a public cloud from the ground up. It was an awesome experience. YDB is used in critical parts of IaaS and PaaS layers:

  • As a storage layer for network block devices attached to virtual machines
  • As a database for a number of services in the cloud.

We run a number of Cloud Services which are based on our technology.

Public cloud services require a special approach to security, personal data management and compliance. They are easier to understand when the team takes part in all these activities from the very beginning. Making our services compliant with the highest security standards is a technical challenge comparable to making the system scalable, reliable, and performant.

Third, our team provides a number of cloud services for end users (listed above). All of them are built as multi-tenant services, available for multiple users, and provide isolation guarantees. The first, Managed Service for YDB, is available in two options:

  • On more traditional dedicated virtual machines
  • As a Serverless service, where the user does not need to provision resources for the database.

All other services are provided in Serverless form. Message Queue and Data Streams implement different types of persistent queues and streams with exactly-once delivery guarantees. Federated Query makes it possible to query multiple data sources, including Object Storage, PostgreSQL, and ClickHouse.

Team

Expert

We have a friendly team of highly motivated professionals who are passionate about databases and distributed systems and enthusiastic about solving top-level problems. Our core team has a strong background in building databases, decades of experience, and master's degrees and PhDs from leading universities.

Non-hierarchical

Our team is quite non-hierarchical, so everyone can freely reach out to other team members. Everyone can find an area where they can shine. Our environment provides a good opportunity to grow as a professional, because the problems we deal with are complex and our backlog is huge. Leadership skills are highly valued, since our broad area and complex problems require ambitious leaders.

Ambitious

Previous experience developing databases is not as important to us as passion and patience. We welcome experienced software engineers as well as mid-level and junior developers. Key people on our current team came to us as interns and grew to senior level over time.

Subteams

We have two major products: YDB itself and Federated Query, which lets us process queries across data sources, including unbounded queries over data streams. In total, nine subteams develop and maintain these products.

Product

Distributed Storage Team

The Distributed Storage Team is responsible for storing and replicating data. The team works on low-level optimisations and deals directly with block devices and network transport.

Tablets Team

The Tablets Team is responsible for YDB’s core functionality: table infrastructure, user data organization, and distributed transactions.

Queues Team

The Queues Team provides customers with multiple queue and publish/subscribe services.

Query Processor Team

The Query Processor Team is responsible for query execution, both OLTP and OLAP. It reuses the query processor from the YQL Core Team, and at the same time this team is the most significant contributor to the query processing facilities.

Analytics Team

The Analytics Team is a new and growing sub-team responsible for building column storage and efficient subquery execution based on columnar indexes.

Application Team

The Application Team makes it easy to use YDB. The team is responsible for all client SDKs.

DevOps Team

The DevOps Team is responsible for running our services in the cloud and on premises, for monitoring, and for CI/CD processes. We are also developing a Kubernetes operator for automated deployment and management.

Streaming Processing Team

The Streaming Processing Team develops a part of Federated Query which allows us to process continuous queries over streams of data.

YQL Core Team

The YQL Core Team develops core parts of YDB Query Language.

Challenges

Query Processor

  • Add support for PostgreSQL compatibility
  • Performance optimisations, such as caching programs closer to the shards that manage the data
  • Add support for cost based optimisation
  • Add support for Common joins
  • Add support for distributed data sorting without size limitations

Distributed Storage and Network Interconnect

  • Seamless storage management, like migrating some part of a cluster from one availability zone to another
  • Actor system optimisations, like dynamic redistribution of CPU cores between thread pools
  • Cluster self heal improvements and optimisations
  • Developing the Blob Depot component for seamless blob distribution over all storage groups. It is useful, for instance, for node decommissioning and for balancing IOPS and size

Tablets

  • Transactions should see their modifications before commit
  • Distributed transaction protocol optimisations (we know what to do)
  • Gather statistics about user data for query optimisations
  • Point-in-time recovery (PITR) support

Scaling cluster

  • Improvements to internal subscription service to avoid bottlenecks
  • Scaling State Storage

Analytics

  • Optimize columnar store, add indexes
  • Add HTAP support

Cluster Management

  • Automated storage and compute scaling
  • Integration with different cloud platforms (AWS/GCP/Azure)

Open vacancies

YDB Core Software Development Engineer

YDB Query Processor Software Development Engineer

YDB Site Reliability Engineer

Contact us

You can contact our Recruiter Lead Kristina Sarafannikova directly via: