Book Overview

Designing Data-Intensive Applications by Martin Kleppmann

Principles and practicalities of data systems
Similarities and differences between data systems
- and their high-level implementations (trade-offs)

3 parts:

Foundations of data systems

Reliability, Scalability, Maintainability
data models and query languages
storage and retrieval
data encoding formats (serialization) and schema evolution

Distributed data: distributing data across multiple machines

replication
partitioning/sharding
transactions
the trouble with distributed systems
consistency and consensus

Derived data

batch processing
stream processing
the future of data systems

Figure 1.1

scalable
highly available (minimizing downtime) and operationally robust

single-node VS distributed systems
online/interactive VS offline/batch processing systems

free and open source software (FOSS)

infrastructure as a service (IaaS) such as Amazon Web Services

Chapter 1: Reliable, Scalable, and Maintainable Applications

Data systems: databases, caches, queues, indexes
Applications are typically built from standard building blocks such as:

databases: Store data so that they, or another application, can find it again later
caches: Remember the result of an expensive operation, to speed up reads
search indexes: Allow users to search data by keyword or filter it in various ways
stream processing: Send a message to another process, to be handled asynchronously
batch processing: Periodically crunch a large amount of accumulated data
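To make the building-block idea concrete, here is a minimal sketch, assuming plain in-memory dicts stand in for a real database, cache, and search index; the class `UserService` and its methods are hypothetical. It illustrates the chapter's point that once several tools are combined, the application code becomes responsible for keeping the cache and the index in sync with the database, while the service's API hides those details from clients.

```python
from typing import Optional


class UserService:
    """Hypothetical composite data system: a database, a cache, and a
    keyword search index (all plain dicts here) behind one small API."""

    def __init__(self) -> None:
        self.db: dict[int, str] = {}            # primary storage: user_id -> bio
        self.cache: dict[int, str] = {}         # remembers results of reads
        self.index: dict[str, set[int]] = {}    # keyword -> user_ids
        self.db_reads = 0                       # reads that missed the cache

    def write(self, user_id: int, bio: str) -> None:
        # Application code keeps the derived stores in sync:
        # update the index, write the database, invalidate the cache.
        old = self.db.get(user_id)
        if old is not None:
            for word in old.lower().split():
                self.index.get(word, set()).discard(user_id)
        self.db[user_id] = bio
        self.cache.pop(user_id, None)
        for word in bio.lower().split():
            self.index.setdefault(word, set()).add(user_id)

    def read(self, user_id: int) -> Optional[str]:
        # Cache-aside: try the cache first, fall back to the database.
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        bio = self.db.get(user_id)
        if bio is not None:
            self.cache[user_id] = bio
        return bio

    def search(self, keyword: str) -> set[int]:
        return set(self.index.get(keyword.lower(), set()))


svc = UserService()
svc.write(1, "likes databases")
svc.read(1)
svc.read(1)            # served from the cache, no second database read
```

Invalidating the cache on write (rather than updating it in place) is one common choice; the sketch ignores concurrency, which is exactly where keeping such stores consistent gets hard.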

reliable, scalable, and maintainable data systems

Reliability: The system should continue to work correctly even in the face of adversity (hardware or software faults, and even human error).
Scalability: As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.
Maintainability: Over time, many different people will work on the system, both maintaining its current behavior and adapting it to new use cases, and they should all be able to work on it productively.

Reliability

continue to work correctly, even when things go wrong

work correctly:

Function as expected
Tolerate User Error
Good Performance
- Under expected load and data volume
Prevent Bad Actors
- Unauthorized access and abuse

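faults: things that can go wrong -> systems that anticipate faults and can cope with them are fault-tolerant or resilient

Tolerate Faults, Prevent Failure
- fault: one component of the system deviating from its spec
- failure: the system as a whole stops providing the required service to the user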

Hardware faults
Software errors
- Bugs causing crash
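- Runaway process that uses up a shared resource (CPU, memory, disk space, network bandwidth)
- Dependency errors
  - the 3rd-party API isn’t working
  - a service that the system depends on slows down, becomes unresponsive, or returns corrupted responses
- Cascading failures: a small fault in one component triggers faults in other components, which in turn trigger further faults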
Human errors
- Configuration error
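A slow dependency is a fault; letting the caller hang on it is how that fault becomes a failure (and, across services, a cascading one). A minimal sketch, assuming a thread pool plus a per-call timeout with a fallback value is an acceptable guard; `call_with_timeout`, `slow_recommendations`, and `fast_recommendations` are hypothetical names.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared pool so the caller never blocks on the dependency itself.
# Python threads cannot be killed: a timed-out call keeps running in the background.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Treat 'too slow' or 'raised an error' as a fault and degrade to a
    fallback value instead of failing the caller."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback          # dependency is too slow
    except Exception:
        return fallback          # dependency raised an error


def slow_recommendations() -> list[str]:
    time.sleep(0.5)              # hypothetical dependency that has slowed down
    return ["fresh", "results"]


def fast_recommendations() -> list[str]:
    return ["fresh", "results"]
```

The fallback (stale or empty results) is a design choice: the caller stays up with reduced quality rather than going down with the dependency.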

Deal with Faults

Hardware faults
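- Redundancy (hardware)
  - redundancy of hardware components (e.g., disks in a RAID configuration, dual power supplies)
  - multi-machine redundancy
    - to keep backup machines and copies of the data
    - when one component dies, the redundant component can take its place while the broken component is replaced
- Rolling upgrade (software): patch one node at a time, so the system as a whole has no downtime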
Software error
- Careful reasoning
- Thorough testing
- Process isolation
- Allow crash and restart
- Measuring, monitoring, and analyzing system behavior in production
  - raising an alert if a discrepancy is found
Human Error
- Minimize opportunities for error through well-designed abstractions, APIs, and admin interfaces, and provide sandbox environments where people can experiment safely
  - Use “design patterns” to structure the system well
  - e.g., facade
- Unit test, integration test, manual test
- Quick recovery
  - make it fast to roll back configuration changes
  - roll out new code gradually
- Detailed and clear monitoring (telemetry)
  - performance metrics and error rates
- Good management practices and training
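Several of the remedies above combine naturally: a rolling upgrade, gradual rollout, monitoring error rates, and fast rollback. A minimal sketch, assuming a hypothetical `error_rate` monitoring hook and an arbitrary 1% threshold; `Node` and `rolling_upgrade` are illustrative names, not any deployment tool's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    name: str
    version: str


def rolling_upgrade(
    nodes: list[Node],
    new_version: str,
    error_rate: Callable[[Node], float],   # hypothetical monitoring hook
    max_error_rate: float = 0.01,          # assumed threshold
) -> bool:
    """Upgrade one node at a time; if monitoring shows the freshly upgraded
    node misbehaving, roll every upgraded node back and stop."""
    upgraded: list[tuple[Node, str]] = []          # (node, previous version)
    for node in nodes:
        upgraded.append((node, node.version))
        node.version = new_version                 # other nodes keep serving meanwhile
        if error_rate(node) > max_error_rate:
            for n, old in reversed(upgraded):      # quick recovery: roll back
                n.version = old
            return False
    return True


fleet = [Node(f"n{i}", "v1") for i in range(3)]
ok = rolling_upgrade(fleet, "v2", error_rate=lambda n: 0.0)
```

Because only one node changes at a time, a bad release affects a small slice of traffic before monitoring catches it, which limits the blast radius of a human error.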

Scalability

a system’s ability to cope with increased load

How to quantify “scalability”?

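Load (described by load parameters)
- requests per second and ratio of reads to writes
- volume & complexity of data
- number of simultaneously active users
- access pattern (e.g., cache hit rate)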
Performance
- throughput
  - the number of records we can process per second
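- response time
  - what the client sees: the time between sending a request and receiving a response
  - service time + network delays + queuing delays
  - varies from request to request, so treat it as a distribution and report percentiles (median, p95, p99) rather than only the mean
- latency
  - the duration that a request is waiting to be handled, during which it is latent, awaiting service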
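Because response time varies on every request, percentiles describe it better than the mean. A minimal sketch using the nearest-rank percentile definition, with made-up response times in milliseconds (the data and the `percentile` helper are illustrative):

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are <= it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times (ms) for ten requests; two are outliers.
response_times = [12, 15, 11, 14, 250, 13, 16, 12, 900, 14]

mean = sum(response_times) / len(response_times)   # 125.7, dragged up by outliers
p50 = percentile(response_times, 50)               # 14: the typical user's experience
p90 = percentile(response_times, 90)               # 250
p99 = percentile(response_times, 99)               # 900: the tail latency
```

The median says a typical request takes 14 ms, while the mean (125.7 ms) matches no request at all; the high percentiles expose the slow outliers that a few users actually hit.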

Twitter’s example: home timelines, where the distribution of followers per user (fan-out) is a key load parameter
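The two home-timeline strategies from the book can be compared in a toy model. A minimal sketch, assuming in-memory dicts stand in for the tweet store and the per-user timeline caches; `Timelines` and its methods are hypothetical. Approach 1 does the work at read time (look up everyone the user follows and merge their tweets); approach 2 fans each new tweet out to every follower's cached timeline at write time, which the book notes suits Twitter because timeline reads far outnumber posts, until a user with millions of followers makes a single post very expensive.

```python
from collections import defaultdict


class Timelines:
    """Toy model of read-time merge vs. write-time fan-out."""

    def __init__(self) -> None:
        self.followers: dict[str, set[str]] = defaultdict(set)   # user -> followers
        self.following: dict[str, set[str]] = defaultdict(set)   # user -> followees
        self.tweets: list[tuple[int, str, str]] = []             # (seq, author, text)
        self.home_cache: dict[str, list[tuple[int, str, str]]] = defaultdict(list)
        self._seq = 0

    def follow(self, follower: str, followee: str) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def post(self, author: str, text: str) -> None:
        self._seq += 1
        tweet = (self._seq, author, text)
        self.tweets.append(tweet)                  # approach 1 only needs this
        for f in self.followers[author]:           # approach 2: fan out on write,
            self.home_cache[f].append(tweet)       # one cache insert per follower

    def home_timeline_on_read(self, user: str) -> list[str]:
        # Approach 1: "join" at read time over the followees' tweets, newest first.
        followees = self.following[user]
        hits = [t for t in self.tweets if t[1] in followees]
        return [text for _, _, text in sorted(hits, reverse=True)]

    def home_timeline_from_cache(self, user: str) -> list[str]:
        # Approach 2: reads are cheap because the work was done at write time.
        return [text for _, _, text in reversed(self.home_cache[user])]


t = Timelines()
t.follow("alice", "bob")
t.follow("alice", "carol")
t.post("bob", "hi")
t.post("carol", "yo")
t.post("dave", "not followed")
```

In this toy the cost of `post` grows with the author's follower count, which is why fan-out per user is the load parameter that matters.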

Maintainability

Operability
Make it easy for operations teams to keep the system running smoothly
- Monitoring, recovering, root-causing
- Patching and updating
- Capacity planning
- Establishing good deployment practices
- Security
- Define process (write clear docs)
- Maintain knowledge base
Simplicity
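Make it easy for new engineers to understand the system, by removing as much complexity as possible
- Reduce accidental complexity (complexity that arises from the implementation, not from the problem itself); symptoms of complexity include:
  - Explosion of the state space
  - Tight coupling of modules
  - Tangled dependencies
  - Special-casing to work around issues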
Evolvability: agility on a data system level
Make it easy for engineers to make changes to the system in the future, adapting it for unanticipated use cases as requirements change
- Easy to make change
- New facts
- Unanticipated use cases
- Business priority shift
- New feature
- New platform
- Legal requirements
- Growth of system

Maintainable Data System

Operability
- Good monitoring
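- Avoid dependency on individual machines
- Good documentation
- Good default behavior, with the freedom to override defaults
- Self-healing where appropriate, with manual control over system state when needed
- Predictable behavior, minimizing surprises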
Simplicity
- Good abstraction (design patterns)
Evolvability
- Agile

Requirements

functional requirements: what the system should do
- e.g., allowing data to be stored, retrieved, searched, and processed
nonfunctional requirements: general properties the system should have
- security, reliability, compliance, scalability, compatibility, maintainability

Reference

Martin Kleppmann, Designing Data-Intensive Applications
精读DDIA 第一章 Reliable, Scalable and Maintainable Applications (Close Reading of DDIA, Chapter 1), https://youtu.be/HBuAklMAjaA
DDIA 逐章精读(1) (DDIA Chapter-by-Chapter Close Reading, Part 1), https://ddia.qtmuniao.com/#/ch01


DDIA Chapter 1 Notes
https://ruijun-ni.github.io/blog/2022/10/27/DDIA/DDIA-Chapter1/
Author: Ruijun Ni
Posted on October 27, 2022
