Database types explained simply

This article covers 4 database types: relational (SQL), NoSQL, NewSQL, and Vector databases. Most databases fall within these categories.

How well do you know databases? Quiz yourself now and find out what you don’t know. Includes interview questions asked by companies like Google, Meta, Apple, and Airbnb.

SQL / Relational Databases

A relational database is a type of database that stores and provides access to data points that are related to one another. 

In a relational database, each table (sometimes called a relation) is made up of columns and rows, with each row representing a record and each column representing an attribute of the data.

Image from Codecademy

Key Features

  1. Tables with Rows and Columns: Data is organized in tables, which are made of rows and columns. Each row is a record with a unique identifier known as a primary key, and each column is an attribute or field of the data.
  2. Data Integrity: The database ensures accuracy and consistency of data through constraints, including primary keys, foreign keys, and rules about data relationships.
  3. Structured Query Language (SQL): SQL is the standard language used in relational databases for querying and manipulating data. It is powerful and flexible, enabling complex queries and data operations.
  4. Normalization: This is a process of organizing data in the database to reduce redundancy and improve data integrity.
  5. ACID Properties: Relational databases adhere to ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliable transactions and data integrity.

Downsides

  1. Complexity: Setting up and maintaining a relational database can be complex, especially for large-scale systems.
  2. Performance Issues: As the size and complexity of the database grow, performance can become an issue, especially with complex queries and large datasets.
  3. Cost: Some relational database systems can be costly, particularly enterprise-grade solutions that require licenses.
  4. Rigidity: The structured nature of relational databases can sometimes be a limitation when dealing with unstructured data types.

Examples of Relational Databases

  1. MySQL: Widely used, open-source database, known for its reliability and ease of use. (MySQL)
  2. Oracle Database: A feature-rich, enterprise-grade database solution used by large organizations for its robust performance and security features. (Oracle Database)
  3. Microsoft SQL Server: Popular in the corporate world, known for its integration with other Microsoft products and services. (MSFT SQL Server)
  4. PostgreSQL: An advanced, open-source relational database, known for its standards compliance and extensibility. (PostgreSQL)
  5. Amazon Aurora: A MySQL and PostgreSQL-compatible relational database built for the cloud, offering the performance and availability of high-end commercial databases at a fraction of the cost. (Amazon Aurora)
  6. MariaDB: An enhanced, drop-in replacement for MySQL, known for its open-source innovation, robust performance, and strong security features. (MariaDB)
  7. SQLite: A lightweight, self-contained, serverless, and configuration-free SQL database engine, widely used in embedded systems and mobile applications for its simplicity and reliability. (SQLite)

According to the most recent Stack Overflow developer survey, Postgres is the most popular database used by professional developers.

Relational Databases in the Real World

In the real world, relational databases are either managed or unmanaged.

Managed Databases

Managed databases, such as Amazon Web Services Relational Database Service (AWS RDS), Google Cloud SQL (part of Google Cloud Platform, GCP), and Microsoft Azure SQL Database, offer a simplified and automated approach to database management.

These cloud-based services take the “DevOps” work out of the equation by handling various administrative tasks which include:

  1. Automated Backups and Recovery: They provide automatic backup solutions, ensuring data safety and facilitating easy recovery in case of data loss.
  2. Scaling: These services allow for easy scaling of database resources, adapting to changing demands without the need for manual intervention.
  3. Maintenance and Updates: Managed databases are regularly updated and maintained by the service provider, ensuring that the database runs on the latest software versions with optimal security.
  4. Performance Monitoring: They offer tools for monitoring database performance, helping in identifying and resolving issues swiftly.
  5. Security: Managed databases typically come with built-in security features like encryption, access control, and network isolation to protect sensitive data.

Unmanaged Databases

Unmanaged databases refer to the traditional approach where databases are deployed and managed manually by the organization’s software or DevOps team. This typically involves:

  1. Manual Setup and Configuration: IT teams are responsible for installing, configuring, and setting up the database on their own servers or cloud infrastructure.
  2. Regular Maintenance: Routine maintenance tasks such as backups, updates, and security patches are managed internally, requiring dedicated resources and expertise.
  3. Customization and Control: Unmanaged databases offer more control over the database environment, allowing organizations to customize configurations and setups based on specific needs.
  4. Resource Management: Organizations are responsible for monitoring and scaling the database resources manually, which requires careful planning and management.
  5. Security Management: Implementing and maintaining security measures, including data encryption and access controls, is entirely in the hands of the organization.

Test your relational database knowledge now and find out what you don’t know. Includes interview questions asked by companies like Google, Meta, Apple, and Airbnb.

NoSQL

NoSQL, standing for “Not Only SQL,” refers to a broad category of database management systems that differ from traditional relational database management systems (RDBMS) in some key ways.

These databases are designed to handle large volumes of data that can be unstructured or semi-structured.

Source: Datacamp

Key Features of NoSQL Databases

  1. Flexibility in Data Models: Unlike relational databases, NoSQL databases can store unstructured and semi-structured data, making them ideal for diverse data types like JSON, XML, and more.
  2. Scalability: NoSQL databases are designed for horizontal scalability, meaning they can handle increased loads by adding more servers in the database cluster.
  3. High Performance: These databases often provide faster read/write operations, particularly beneficial for applications requiring high data throughput.
  4. Schema-less: Most NoSQL databases do not require a fixed schema, allowing for on-the-fly addition of new data types without disrupting existing data.
  5. Variety of Data Store Types: NoSQL encompasses several types of databases, including document stores, key-value stores, wide-column stores, and graph databases, each serving different use cases.

Downsides

  1. Complexity in Data Consistency: Ensuring data consistency can be more complex in NoSQL databases compared to ACID-compliant relational databases.
  2. Limited Query Capabilities: Some NoSQL databases do not support rich query languages like SQL, potentially limiting the complexity of queries.
  3. Management Challenges: Managing and optimizing NoSQL databases can require specialized expertise. For this reason, it’s common to only use NoSQL databases when query patterns are known beforehand.
  4. Transaction Support: Not all NoSQL databases support multi-record ACID transactions which can be crucial for certain applications.

When should NoSQL databases be used?

You should consider using NoSQL when you are in the following scenario:

  • Constant changing of data: when you do not know how your system or applications will grow in the future, meaning that you might want to add new data types, new functions, etc.
  • A lot of data: when your business is dealing with huge data that might grow over time.
  • No consistency: when data consistency and 100% integrity are not your priority. For example, when you develop a social media platform for your business, all the employees seeing your posts at once might not be an issue.
  • Scalability and cost: NoSQL databases allow greater flexibility and can control costs as your data needs change.

Examples of NoSQL Databases

There are four types of NoSQL databases.

  1. Document-Oriented Databases:
    • These databases store data in documents similar to JSON or XML formats.
    • Pros:
      • Flexible schema design.
      • Good for working with JSON, XML, and other semi-structured data formats.
      • Easier for developers familiar with document formats.
    • Cons:
      • Not optimal for complex queries involving multiple document joins.
      • Data redundancy can increase storage requirements.
    • When to use:
      • Dealing with semi-structured or unstructured data.
      • Applications requiring rapid development and iterations with constant schema changes. (Prototyping, for example)
    • Examples: MongoDB, Couchbase.
  2. Key-Value Stores:
    • Simplest type of NoSQL databases where each item contains keys and values.
    • Pros:
      • Highly efficient for lookups, making them suitable for applications that require high-speed data access.
      • Extremely high performance for read/write operations.
      • Simple data model, easy to scale horizontally.
    • Cons:
      • Limited functionality for complex querying.
      • Not suitable for hierarchical data management.
    • When to use:
      • When speed and scalability are priorities over complex data manipulation.
      • For caching, session stores, real-time recommendation engines, and scenarios where data is frequently accessed by key.
    • Examples: Redis, Amazon DynamoDB.
  3. Wide-Column Stores:
    • Store data in tables, rows, and dynamic columns.
    • Efficient for querying large datasets and useful for real-time analysis and high-scale projects.
    • Pros:
      • Highly scalable for large datasets.
      • Efficient in read/write operations across large volumes of data.
      • Flexible in handling varied column sets for different rows.
    • Cons:
      • Complexity in data modeling compared to relational databases.
      • Not ideal for small datasets due to overhead.
    • When to use:
      • When managing very large datasets and high scalability is required.
      • Suitable for big data applications, time-series data, and analytical processing.
    • Examples: Cassandra, HBase.
  4. Graph Databases:
    • Designed to handle data whose relations are well represented as a graph and has elements interconnected with many relationships.
    • Pros:
      • Highly scalable for large datasets.
      • Efficient in read/write operations across large volumes of data.
      • Flexible in handling varied column sets for different rows.
    • Cons:
      • Complexity in data modeling compared to relational databases.
      • Not ideal for small datasets due to overhead.
    • When to use:
      • When managing very large datasets and high scalability is required.
      • Ideal for handling complex hierarchical data.
      • Suitable for big data applications, time-series data, and analytical processing.
    • Examples: Neo4j, Amazon Neptune.

Each type of NoSQL database has its own set of features and is chosen based on the specific requirements and data models of the application or project.

Real-World Applications

Managed NoSQL Databases

Many cloud providers offer managed NoSQL database services, such as Amazon DynamoDB, Google Cloud Datastore, and Azure Cosmos DB. These managed services handle database maintenance tasks like scaling, backups, and updates.

Test your NoSQL database knowledge now and find out what you don’t know. Includes interview questions asked by companies like Google, Meta, Apple, and Airbnb.

NewSQL

NewSQL is a term coined to describe a new generation of relational database management systems (RDBMS) that provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while maintaining the ACID guarantees of a traditional database system.

NewSQL databases are designed to address the scalability challenges of traditional SQL databases, making them suitable for handling the vast amounts of data and high transaction rates.

Key Features of NewSQL Databases

  1. ACID Compliance: NewSQL databases adhere to the ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring reliable transactions and data integrity.
  2. Scalability: They offer horizontal scalability, similar to NoSQL systems, enabling the handling of large-scale, high-traffic applications.
  3. SQL Support: NewSQL databases support SQL queries, making them accessible to those familiar with traditional relational database systems.
  4. High Performance: They are optimized for high throughput and low latency, particularly in OLTP environments.
  5. Distributed Architecture: Many NewSQL databases use a distributed architecture, which helps in scaling out and balancing loads.

Downsides

  1. Relative Complexity: The blending of old and new database technologies can result in increased complexity in terms of deployment and management.
  2. Cost: Depending on the implementation, NewSQL systems can be more expensive than traditional databases, especially for small-scale applications.
  3. Emerging Technology: As a relatively new technology, NewSQL databases may not have the same level of community support and maturity as more established systems.

Examples of NewSQL Databases

  1. Google Spanner: A globally-distributed database designed for high availability, horizontal scalability, and strong consistency.
  2. NuoDB: A distributed SQL database focusing on scalability and maintaining ACID guarantees across multiple data centers.
  3. CockroachDB: A cloud-native SQL database that provides automated scaling, consistency, and survivability.
  4. VoltDB: Optimized for high-speed data processing, VoltDB is an in-memory NewSQL database designed for fast data analytics.

Real-World Applications

Managed NewSQL Databases

Managed NewSQL services, like Google Cloud Spanner, provide a fully-managed environment where scaling, maintenance, and updates are handled by the cloud provider.

When to use NewSQL databases

Use NewSQL databases when you need:

  1. High Transactional Throughput: NewSQL databases are designed to handle high transaction rates while maintaining strong consistency, making them suitable for systems that require rapid and reliable transaction processing, like financial systems or online retail.
  2. Need for Scalability and Consistency: If your application needs to scale horizontally (i.e., adding more machines to the system) while maintaining the consistency of a traditional SQL database.
  3. Complex Queries with Real-time Performance: Applications requiring complex SQL queries (like JOINs or subqueries) and real-time performance can benefit from NewSQL databases, as they provide SQL-like query capabilities with high performance.
  4. Hybrid Transactional/Analytical Processing (HTAP): NewSQL databases can handle both OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) workloads effectively, making them suitable for applications that require real-time analytics on transactional data.
  5. Systems Requiring Strong Data Integrity: NewSQL databases adhere to ACID properties, ensuring strong data integrity. This is crucial for systems where data consistency and reliability are non-negotiable, such as banking or healthcare systems.

Test your NewSQL database knowledge now and find out what you don’t know. Includes interview questions asked by companies like Google, Meta, Apple, and Airbnb.

Time-Series Databases

Time-series databases are specialized database systems designed for handling time-stamped or time-series data—data points indexed in time order.

This type of data is common in various fields like finance, meteorology, IoT, and more, where the timing of data points is crucial.

Key Features of Time-Series Databases

  1. Time-Stamped Data Entries: They are optimized for storing data points collected over time, each associated with a timestamp.
  2. High-Performance Writing and Querying: These databases are tailored for high-speed data ingestion and rapid querying, crucial for time-sensitive applications.
  3. Data Compression: Time-series data can be highly redundant; these databases often include mechanisms for efficient data compression.
  4. Time-Based Aggregation: They support time-based aggregation queries, which are essential for analyzing trends over time.
  5. Scalability: Many time-series databases are designed to handle large volumes of data and scale as data grows.

Downsides

  1. Specialized Nature: Being specialized for time-series data, these databases might not be suitable for general-purpose database applications.
  2. Complexity in Handling Non-Time-Related Data: They may not efficiently manage data that isn’t time-oriented.
  3. Resource Intensive for Large Datasets: Handling and analyzing vast amounts of time-series data can be resource-intensive.

When to use time-series databases

Use a Time-Series database when you need:

  1. Time-series data management (duh!)
  2. Real-Time Analytics: They enable real-time processing and analysis of time-series data, essential for decision-making in various industries.
  3. Data Insights and Trend Analysis: Time-series databases facilitate the analysis of trends, patterns, and anomalies over time.

Examples of Time-Series Databases

  1. InfluxDB: Popular for IoT and DevOps monitoring, known for its high performance and scalability.
  2. Prometheus: Widely used for monitoring and alerting in cloud-native environments.
  3. TimescaleDB: An extension of PostgreSQL, offering the reliability of SQL with time-series data optimization.
  4. Kdb+: Known for its speed and efficiency, commonly used in high-frequency trading applications.

Real-World Applications

Managed Time-Series Databases

Cloud providers offer managed time-series database services, like Amazon Timestream and InfluxDB Cloud.

Test your time-series database knowledge now and find out what you don’t know. Includes interview questions asked by companies like Google, Meta, Apple, and Airbnb.

Vector Databases

Vector databases are specialized database systems designed to store, manage, and perform operations on vector data—data represented in high-dimensional space. They are particularly adept at handling complex data types such as images, audio, video, and text, which are commonly used in machine learning models.

Vector databases enable efficient similarity searches, where the query involves finding the closest data points in a high-dimensional space to a given vector.

Key Features of Vector Databases

  1. High-Dimensional Data Storage: They are optimized to store and manage data in vector format, often used in machine learning and AI applications.
  2. Efficient Similarity Search: Vector databases excel at performing similarity searches, enabling rapid retrieval of the most similar data points to a given query.
  3. Scalability: Many vector databases are designed to handle large volumes of high-dimensional data and scale as the dataset grows.
  4. Integration with AI and ML Models: These databases often seamlessly integrate with machine learning models, facilitating AI-driven applications.
  5. Advanced Indexing Techniques: Vector databases use sophisticated indexing techniques to efficiently store and query high-dimensional data.

Downsides

  1. Specialized Use Cases: Vector databases are specialized tools and may not be suitable for general-purpose database needs.
  2. Complexity: Vector DBs are relatively new and can be complex to deal with.
  3. Emerging Technology: Vector DBs may lack the maturity and extensive community support of more established database systems, like Postgres.

When to use vector databases

Use vector DBs when you need:

  1. Enhanced ML-Powered Search Capabilities: They offer unparalleled efficiency in searching and retrieving data based on similarity, crucial for AI applications.
  2. AI and Machine Learning Integration: This is because vector DBs work well with high-dimensional data.
  3. Real-Time Processing: Essential for dynamic AI applications.

Examples of Vector Databases

  1. Milvus: An open-source vector database designed for similarity search and AI applications.
  2. Pinecone: A scalable vector database service that focuses on machine learning and similarity search.
  3. Weaviate: An open-source vector search engine with GraphQL and RESTful APIs for ease of integration.
  4. Faiss by Facebook AI: Although not a full database, Faiss is a library for efficient similarity search and clustering of dense vectors.

Test your vector database knowledge now and find out what you don’t know. Includes interview questions asked by companies like Google, Meta, Apple, and Airbnb.