Database Fundamentals: SQL vs NoSQL and Their Business Impact
Database selection is not a question of “which tool is convenient.” It is a business decision that simultaneously determines the scalability ceiling, the infrastructure cost structure, and the depth of the hiring market.
When a company says “we use PostgreSQL” or “we designed it with MongoDB,” that choice directly affects how hard it will be to add value after an investment and how much integration will cost after an acquisition. The precision of technical due diligence depends on whether M&A professionals and VC investors can evaluate database selection with a structured framework. This article clarifies the essential differences between SQL and NoSQL and provides the questions worth asking in investment and acquisition decisions.
What Is a Database: Beyond “an Organized Data Warehouse”
A database is a system that persistently stores data in a way that maintains consistency even when multiple applications and users access it simultaneously. The decisive difference from simple file storage lies in “concurrent access handling” and “guaranteed recovery after failures.”
Consider an e-commerce site where inventory shows “1 item left” and two users attempt to purchase simultaneously. A naive implementation that simply writes to files would allow both users to read “in stock” at the same moment and complete their purchases, resulting in overselling. Databases solve this problem through “transactions”: the stock check and the stock decrement run as a single unit, so one purchase completes before the other is allowed to proceed. This concurrency control is one of the core functions of a database.
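The overselling scenario can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a production pattern: the `inventory` table and the `purchase` function are hypothetical names, and the two purchases run one after the other rather than truly in parallel. The point is that the stock check and the decrement are a single atomic statement, so the second buyer cannot act on a stale “in stock” read.

```python
import sqlite3


def purchase(conn: sqlite3.Connection, product_id: int) -> bool:
    """Try to buy one unit; return True only if stock was actually decremented."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # The stock check (stock > 0) and the decrement happen in one statement.
            "UPDATE inventory SET stock = stock - 1 WHERE id = ? AND stock > 0",
            (product_id,),
        )
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, stock INTEGER NOT NULL)")
conn.execute("INSERT INTO inventory VALUES (1, 1)")  # exactly one item left
conn.commit()

first = purchase(conn, 1)   # first buyer gets the last item
second = purchase(conn, 1)  # second buyer is refused instead of overselling
remaining = conn.execute("SELECT stock FROM inventory WHERE id = 1").fetchone()[0]
```

A file-based implementation would have to build this check-and-decrement guarantee by hand; here the database provides it.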
SQL vs NoSQL: A Fundamental Difference in Design Philosophy
SQL Databases (Relational DB): Designed to “Preserve Integrity”
The design philosophy of SQL databases (RDB: Relational Database) is to guarantee data integrity as the top priority.
Data is managed in “tables” similar to spreadsheets: separate customer, order, and product tables that reference one another through keys. This allows constraints to be enforced at the database level: “every order must have a customer,” “every invoice must link to an order.”
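As a rough sketch of what “enforced at the database level” means, the following uses Python's sqlite3 module with hypothetical `customers` and `orders` tables. The database itself rejects an order that points at a customer who does not exist, regardless of what the application code does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # every order must have a customer
    " amount INTEGER NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders VALUES (100, 1, 5000)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO orders VALUES (101, 999, 1200)")  # customer 999 does not exist
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```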
The governing design standard is “ACID properties”:
- Atomicity: A transaction is either entirely successful or entirely fails. No “debit succeeded, credit failed” scenario in bank transfers.
- Consistency: Every transaction moves the data from one valid state to another. Defined rules and constraints hold both before and after processing.
- Isolation: Concurrent transactions do not interfere with each other.
- Durability: Committed data is not lost after failures.
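Atomicity in the bank-transfer case can be sketched as follows, again with sqlite3. The `accounts` table, the `transfer` function, and the `balances` helper are illustrative assumptions. If the credit step cannot be applied, the debit that already ran is rolled back, so the “debit succeeded, credit failed” state is never visible.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # one transaction: commit on success, rollback on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            credited = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            if credited.rowcount != 1:
                raise LookupError("destination account not found")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False  # the debit above has been rolled back


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 99, 30)  # account 99 does not exist, so nothing changes
final = balances(conn)
```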
This strictness is a strength, but it comes at a cost. Guaranteeing ACID properties requires many checks at write time. As data volume grows, load increases on a single server. There is a ceiling to what vertical scaling (upgrading server specs) can handle, and horizontal scaling (adding more servers) is difficult with this architecture.
NoSQL Databases: Designed to “Prioritize Scale”
NoSQL databases emerged in the late 2000s from problems faced by Google and Amazon: “It’s impossible to fit hundreds of millions of users’ data on one server. Let’s sacrifice some consistency in exchange for making distribution across multiple servers easier.”
NoSQL is a category name encompassing multiple types:
| Type | Characteristics | Examples |
|---|---|---|
| Document | Stores flexible data structures in JSON-like format | MongoDB, Firestore |
| Key-Value | Simplest structure mapping keys to values | Redis, DynamoDB |
| Wide-Column | Stores rows with flexible column sets; built for high write throughput across many nodes | Cassandra, HBase |
| Graph | Represents relationships through nodes and edges | Neo4j |
What NoSQL accepts in exchange for easier distribution is “eventual consistency.” When data is distributed across multiple servers, a write to one server may take a short time to be reflected on the others, and the system treats that delay as acceptable. A post you publish on a social network appearing in your followers’ feeds a few seconds later is an example of this model.
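The replication delay can be pictured with a toy in-memory model. This is a deliberately simplified sketch: `EventuallyConsistentStore` is a hypothetical class, replication is triggered by hand, and real systems also deal with conflicts, ordering, and failures that are left out here.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on a primary and reach replicas only later."""

    def __init__(self, replica_count: int) -> None:
        self.primary: dict = {}
        self.replicas: list = [{} for _ in range(replica_count)]
        self.pending: list = []  # writes accepted but not yet copied to replicas

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index: int, key: str):
        return self.replicas[index].get(key)

    def replicate(self) -> None:
        """Apply queued writes to every replica, in the order they were accepted."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=2)
store.write("post:1", "Hello, followers")

stale_read = store.read_from_replica(0, "post:1")  # replica has not caught up yet
store.replicate()                                  # the "few seconds later" moment
fresh_read = store.read_from_replica(0, "post:1")
```

For a social feed the stale read is harmless. For an account balance it would not be, which is why the choice of consistency model follows from the business requirement.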
Key Database Characteristics: A Deep Dive for Investment and Business Decisions
PostgreSQL and MySQL: The Twin Pillars of Open-Source RDB
PostgreSQL is widely regarded as the most feature-rich open-source RDB. Its strengths include JSON type handling, full-text search, geospatial data (PostGIS extension), and capacity for complex queries. It is widely used in financial systems, SaaS, and data analytics platforms. It requires no commercial license and is available as a managed service from major cloud providers (Amazon RDS, Google Cloud SQL, Azure Database for PostgreSQL).
MySQL has long been the standard for web startups. Compared to PostgreSQL, it has a narrower feature set, but is correspondingly simpler with easier initial configuration. Many CMSes including WordPress are built on MySQL, and knowledge is broadly distributed across the web engineer talent pool.
Due diligence note: Which of PostgreSQL or MySQL is chosen matters less than “whether migration management exists” and “whether index design is appropriate.” Careless implementations through ORMs (Object-Relational Mapping) create serious performance problems as data volume grows.
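What “appropriate index design” looks like in practice can be checked by asking the database how it plans to run a query. The sketch below uses sqlite3 and a hypothetical `orders` table; `plan` is an illustrative helper, and the exact wording of the plan text varies by SQLite version. PostgreSQL and MySQL offer the same kind of inspection through their own `EXPLAIN` commands.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the human-readable query plan SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # the last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
query = "SELECT amount FROM orders WHERE customer_id = 42"

before = plan(conn, query)  # without an index: a scan over the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)   # with the index: a targeted search
```

A full scan is harmless at a thousand rows and painful at a hundred million, which is why missing indexes tend to surface only after growth.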
MongoDB: The Name Synonymous with Document-Oriented DB
MongoDB stores data in “documents” — units in a format close to JSON. The biggest difference from RDB is that it is schema-less (no need to define table structure in advance). Being able to change data structure with each spec change makes it valuable during the hypothesis validation phase before product-market fit.
However, schema-less is a double-edged sword. The freedom of “anything goes” easily leads to data quality degradation — situations where the same field contains a string in one document and a number in another make data analysis difficult. When evaluating the data foundation of startups that have used MongoDB for a long time, it is not uncommon to find “a chaotic state with no schema design.”
Due diligence note: Adopting MongoDB is not itself a problem, but you should verify “whether validation, schema definitions, and index management exist as operational rules.” Whether schema libraries like Mongoose are in use, and whether document design change history is maintained, serve as judgment criteria.
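The “same field, different types” problem is easy to detect once documents are exported. The following is a minimal sketch in plain Python that treats documents as dictionaries; `field_type_report` and `inconsistent_fields` are hypothetical helper names, and the sample documents are made up for illustration.

```python
from collections import defaultdict


def field_type_report(documents: list) -> dict:
    """Map each field name to the set of value types observed across documents."""
    seen = defaultdict(set)
    for doc in documents:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)
    return dict(seen)


def inconsistent_fields(documents: list) -> list:
    """Return the fields that appear with more than one type."""
    report = field_type_report(documents)
    return sorted(field for field, types in report.items() if len(types) > 1)


# Made-up sample: "price" is a number in some documents and a string in another.
docs = [
    {"user_id": 1, "price": 1200},
    {"user_id": 2, "price": "1200"},
    {"user_id": 3, "price": 980, "coupon": "SPRING"},
]

problems = inconsistent_fields(docs)
```

Running a check like this against a sample of production documents is a quick way to see whether the stated schema rules are actually being followed.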
DynamoDB: AWS’s Fully Managed Key-Value Store
DynamoDB is AWS’s managed key-value and document database. It is fully serverless, with AWS automatically managing operations and scaling. The ability to automatically expand capacity in response to sudden user growth is a strength, as is low operational overhead.
However, design flexibility is limited. DynamoDB requires deciding “which query patterns will be used” at design time. Unlike RDB, it does not support joins or efficient ad hoc queries across arbitrary attributes, making it difficult to accommodate later requests like “we want to aggregate this” or “we want to filter with complex conditions.”
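Why query patterns must be decided up front can be illustrated with an in-memory stand-in for a key-value table. This is not the DynamoDB API: `KeyValueTable` is a hypothetical class that only counts how many items each operation has to examine.

```python
class KeyValueTable:
    """In-memory stand-in for a key-value store that counts items examined."""

    def __init__(self) -> None:
        self._items: dict = {}
        self.items_examined = 0

    def put(self, key: str, item: dict) -> None:
        self._items[key] = item

    def get(self, key: str):
        """Lookup by key: the access pattern the table was designed for."""
        self.items_examined += 1
        return self._items.get(key)

    def scan(self, predicate) -> list:
        """Filter on a non-key attribute: every item has to be read."""
        matches = []
        for item in self._items.values():
            self.items_examined += 1
            if predicate(item):
                matches.append(item)
        return matches


table = KeyValueTable()
for i in range(1000):
    table.put(f"user#{i}", {"user_id": i, "country": "JP" if i % 10 == 0 else "US"})

one_user = table.get("user#42")
cost_of_get = table.items_examined
japan_users = table.scan(lambda item: item["country"] == "JP")
cost_of_scan = table.items_examined - cost_of_get
```

The lookup by key touches one item, while the unplanned “filter by country” question touches all of them. In a managed service billed on reads, that difference shows up directly in cost and latency.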
Due diligence note: Choosing DynamoDB also means committing to AWS, because the service is not available on other clouds. How to evaluate that lock-in risk is a matter of overall cloud strategy. It is effective to read this alongside the vendor dependency risk framework discussed in Cloud Services Introduction.
Redis: An In-Memory DB Specialized for “Speed”
Redis holds data in memory rather than on disk. With no disk I/O on the request path, reads and writes are orders of magnitude faster than in a disk-based database, with response times typically in the microsecond to millisecond range.
Primary uses are “caching” and “session management.” Temporarily storing the results of heavy processing in Redis and returning them from Redis rather than accessing the DB when the same request arrives again significantly reduces load on the RDB. Many web services adopt Redis as a “cache layer” placed in front of the RDB.
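The caching flow described above is often called the cache-aside pattern. The sketch below models it in plain Python, with a dictionary standing in for Redis. `CacheAside`, `load_from_db`, and the TTL value are illustrative assumptions, and a real deployment would call a Redis client instead of a local dictionary.

```python
import time


class CacheAside:
    """Serve repeated reads from memory and fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load = load_from_db  # the expensive database query (hypothetical)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict = {}   # key -> (expires_at, value)
        self.db_calls = 0

    def get(self, key: str):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: the database is not touched
        self.db_calls += 1         # cache miss or expired entry
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_report(key: str) -> str:
    return f"report for {key}"     # stands in for a heavy RDB query


cache = CacheAside(slow_report, ttl_seconds=60.0)
first = cache.get("monthly-sales")
second = cache.get("monthly-sales")  # served from the cache
```

The TTL is the main design choice: a longer value reduces database load further but increases how stale a returned result can be.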
Due diligence note: Redis is usually deployed as cache infrastructure rather than as a primary database. “We use Redis” alone does not tell you whether it is functioning as a scale-out measure or simply being used for session management. It is important to verify what layer it is being used at and what problem it is solving.
How DB Selection Affects Scalability and Cost
The database is one of the first components to become a bottleneck when startups scale, which is why its selection matters.
The ceiling and cost of vertical scaling: Because RDB has difficulty with horizontal scaling, the default response to growing data volume and query count is to upgrade server specs. High-spec DB servers are one of the main drivers of cloud cost spikes. For example, moving an Amazon RDS for PostgreSQL instance from db.r6g.xlarge to db.r6g.4xlarge quadruples the vCPU and memory and, at on-demand rates, roughly quadruples the monthly instance cost.
Read replicas and sharding: Common RDB scaling strategies include read replicas (maintaining multiple read-only copies of the DB). For read-heavy services, this has a significant effect. A more complex strategy is sharding (splitting data and distributing it across multiple servers), but the application-layer modification cost is high and not an easily available option for startups.
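The application-layer cost of sharding comes from the fact that every query must first work out which server holds the data. A minimal sketch of hash-based routing follows; `shard_for` is a hypothetical helper, and the key format and shard counts are made up for illustration.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


keys = [f"customer-{i}" for i in range(1000)]
assignments = [shard_for(key, 4) for key in keys]

# Going from 4 to 5 shards changes the result for most keys,
# which means most of the data has to be physically moved.
moved = sum(1 for key in keys if shard_for(key, 4) != shard_for(key, 5))
```

This simple modulo scheme also shows why adding a shard later is expensive. Schemes such as consistent hashing reduce the amount of data that moves, at the price of more routing logic in the application.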
NoSQL trade-offs: NoSQL, designed with horizontal scaling in mind, has advantages in scalability but is weak at complex data analysis (JOINs, aggregations, data mart construction). As a business grows, costs arise from “building a separate analytics foundation.” Constructing a DWH (Data Warehouse) such as BigQuery or Redshift separately and setting up real-time synchronization of NoSQL data introduces costs in both infrastructure expense and operational work.
Three Questions for Evaluating DB Selection in Portfolio Companies
Question 1: When asked “why did you choose that DB,” do business justifications emerge?
Responses like “it came pre-installed” or “the CTO was used to it” are warning signs. Teams that can explain their choice linked to business requirements, development phase, and risk awareness — “ACID properties were essential because we handle financial data” or “we chose MongoDB to prioritize initial development speed but have established schema management rules” — demonstrate high-quality technical decision-making.
Question 2: Can the current DB accommodate business growth?
When user count and data volume increase tenfold or a hundredfold, can the current DB design handle it? If RDB is in use, check whether there is a roadmap for query optimization, index design, read replica introduction, and sharding plans. If NoSQL is in use, whether there is a plan for an analytics data foundation (DWH) is equally important. High growth without planning creates a technical time bomb in the data infrastructure.
Question 3: Do they understand the cost of DB migration?
DB migration is a higher-risk operation than changing programming languages. Migrating to a new DB without data loss in production requires data migration design, dual-write periods, and rollback plans. Even a migration of one million records can become a months-long project if schema cleanup is insufficient. When a statement like “we plan to migrate to RDB in the future” appears, it is important to verify whether that cost is included in the business plan.
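The dual-write period mentioned above can be sketched in a few lines, with two dictionaries standing in for the old and new databases. `DualWriteRepository`, `backfill`, and `find_mismatches` are hypothetical names, and a real migration also has to handle partial write failures, ordering, and deletes, which this sketch leaves out.

```python
class DualWriteRepository:
    """Write to both stores during migration; read from one, chosen by a flag."""

    def __init__(self, old_store: dict, new_store: dict, read_from_new: bool = False):
        self.old = old_store
        self.new = new_store
        self.read_from_new = read_from_new  # flipping this back is the rollback plan

    def save(self, key: str, record: dict) -> None:
        self.old[key] = record
        self.new[key] = record

    def load(self, key: str):
        source = self.new if self.read_from_new else self.old
        return source.get(key)


def backfill(old_store: dict, new_store: dict) -> None:
    """Copy legacy records without overwriting newer dual-written ones."""
    for key, record in old_store.items():
        new_store.setdefault(key, record)


def find_mismatches(old_store: dict, new_store: dict) -> list:
    """List keys whose record in the new store is missing or different."""
    return sorted(key for key in old_store if new_store.get(key) != old_store[key])


old_store = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}  # existing production data
new_store = {}
repo = DualWriteRepository(old_store, new_store)

repo.save("u3", {"name": "Carol"})                        # new writes go to both stores
missing_before = find_mismatches(old_store, new_store)    # legacy rows not yet copied
backfill(old_store, new_store)
missing_after = find_mismatches(old_store, new_store)
repo.read_from_new = True                                 # cut over once verified
```

Each of these steps (dual writes, backfill, verification, cutover) is real engineering work, which is why migration plans belong in the business plan and not only in the technical roadmap.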
Positioning Database Selection Within the Broader Technical DD
Database selection should not be evaluated in isolation — it must be read in the context of the overall architecture. “What DB is being used” matters less than “whether the selection rationale, operational quality, and scale planning are aligned.”
A summary of checkpoint items for evaluating databases in technical DD:
- Is backup and restore tested on a regular basis?
- Is the migration (schema change) procedure automated?
- Are production, staging, and development DB configurations separated?
- Is DB monitoring (slow queries, connection count, disk usage) in place?
- Are there encryption and access control policies for sensitive data?
These items say less about technical capability than about the maturity of the operational culture. Read alongside the evaluation of engineering organization health, they enable an organizational assessment that goes beyond mere tool evaluation.
Related Articles
Databases are part of a larger architectural design. For the perspective of overall system structure (monolith, microservices, serverless), Architecture Patterns: Monolith, Microservices, and Serverless as Decision Frameworks provides detailed treatment.
For selection of the infrastructure platform (AWS/GCP/Azure) on which databases run and evaluation of vendor dependency risk, see Cloud Services Introduction: Reading AWS/GCP/Azure as Business Decisions.
For the overall framework of evaluation in technical DD, Technical Due Diligence: Overview and 7 Evaluation Axes functions as the hub article.
Tied Inc. provides technical due diligence support for VC and M&A professionals at corporations. For details on technical evaluation including data infrastructure assessment, see our investor services page or contact us through the inquiry form.