SQL入门指南
SQL 从入门到精通 —— 大师手笔,深入浅出
作者:taohuaracing
这不是一本罗列语法的文档。这是一个老程序员手把手带你 理解 SQL 的思维方式。
读完它,你不仅能写 SQL,还能 像数据库一样思考。
📖 目录
- 为什么要学 SQL?
- SQL 的本质——先理解,再记忆
- 环境准备——动手实践才是王道
- 基础篇 —— CURD 四板斧
- 进阶篇 —— 筛选与排序
- 核心篇 —— JOIN 与表关系
- 聚合与分组
- 子查询与 CTE
- 窗口函数 —— 高手的分水岭
- 索引与性能——别让你的查询慢成狗
- 实战篇 —— 从零到一搭一个真实场景
- 常见陷阱与最佳实践
- 给新手的进阶路线图
- 参考与出处
1. 为什么要学 SQL?
一句话:数据是新时代的石油,SQL 是抽油机。
无论你是:
- 后端开发 —— 80% 的接口背后是数据库查询
- 数据分析师 —— SQL 是安身立命的根本
- 产品经理 —— 会 SQL 的产品比别人的需求清晰十倍
- 运维 / DBA —— 你懂的
SQL 不会过时。从 1974 年诞生到今天,它依然是数据领域的通用语言。
2. SQL 的本质——先理解,再记忆
很多初学者死记语法,然后抱怨 SQL 难。问题不在你,在方法。
🧠 核心思维:SQL = 集合论
SQL 的底层是关系代数。说人话:
每一条 SQL 语句,都是对「集合」的操作。
SELECT— 从集合中挑出某些列WHERE— 过滤集合中的行JOIN— 把两个集合拼在一起GROUP BY— 把一个集合拆成多个子集合HAVING— 过滤子集合
一旦你理解了 SQL 操作的是「集合」,而不是「一行一行」的数据,你就打通了任督二脉。
⚡ 执行顺序(背下来,刻进 DNA)
这不是你写的顺序,这是数据库执行 SQL 的顺序:
FROM → 1. 先确定数据来源(表 / JOIN 结果)
WHERE → 2. 过滤行
GROUP BY → 3. 分组
HAVING → 4. 过滤分组
SELECT → 5. 挑选列(可以起别名)
ORDER BY → 6. 排序
LIMIT → 7. 取前 N 行
💡 Why this matters: 为什么
WHERE里不能直接用SELECT中定义的别名?因为SELECT在第 5 步才执行,WHERE在第 2 步。为什么
HAVING可以用聚合函数而WHERE不行?因为WHERE执行时还没分组。理解了执行顺序,这些「为什么」就自然懂了。
3. 环境准备——动手实践才是王道
看一万遍不如写一遍。 以下三个选择,三选一:
🅰 最简单:SQLite + 在线工具
- 推荐: https://sqliteonline.com/ (在线,零安装)
- 或者 https://www.db-fiddle.com/
- 本地装 SQLite:
pip install sqlite3/brew install sqlite3
🅱 真实开发:MySQL / PostgreSQL
# Windows (用 Scoop / Chocolatey)
scoop install mysql
# 或者 Docker
docker run --name mysql-demo -e MYSQL_ROOT_PASSWORD=root -p 3306:3306 -d mysql:8.0
# Mac
brew install mysql
brew services start mysql
🅲 推荐新手方案
用 SQLite 起步,因为它零配置、轻量、语法干净。等 SQLite 玩熟了,再切 MySQL / PostgreSQL 几乎无障碍——核心语法 95% 一样。
📦 本文配套练习数据
建议你直接建一个库,把下面的表和数据跑一遍:
-- ============ 员工表 ============
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
department TEXT NOT NULL,
salary REAL NOT NULL,
age INTEGER,
join_date DATE
);
INSERT INTO employees VALUES (1, '张三', '技术部', 15000, 28, '2021-03-01');
INSERT INTO employees VALUES (2, '李四', '技术部', 18000, 32, '2020-06-15');
INSERT INTO employees VALUES (3, '王五', '市场部', 12000, 26, '2022-01-10');
INSERT INTO employees VALUES (4, '赵六', '市场部', 14000, 30, '2021-08-20');
INSERT INTO employees VALUES (5, '翠花', '人事部', 11000, 35, '2019-11-01');
INSERT INTO employees VALUES (6, '狗蛋', '技术部', 22000, 38, '2018-04-10');
INSERT INTO employees VALUES (7, '小明', '市场部', 9000, 23, '2023-02-28');
INSERT INTO employees VALUES (8, '小红', '人事部', 13000, 29, '2020-09-05');
-- ============ 订单表 ============
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_name TEXT NOT NULL,
product TEXT NOT NULL,
amount REAL NOT NULL,
order_date DATE,
employee_id INTEGER,
FOREIGN KEY (employee_id) REFERENCES employees(id)
);
INSERT INTO orders VALUES (1, '阿里巴巴', '服务器', 50000, '2024-01-15', 1);
INSERT INTO orders VALUES (2, '腾讯', '云服务', 80000, '2024-01-20', 2);
INSERT INTO orders VALUES (3, '阿里巴巴', '数据库', 30000, '2024-02-10', 6);
INSERT INTO orders VALUES (4, '字节跳动', 'AI平台', 120000, '2024-02-15', 1);
INSERT INTO orders VALUES (5, '腾讯', '安全服务', 45000, '2024-03-01', 2);
INSERT INTO orders VALUES (6, '小米', '云存储', 25000, '2024-03-10', 6);
INSERT INTO orders VALUES (7, '阿里巴巴', 'CDN', 35000, '2024-03-20', 1);
INSERT INTO orders VALUES (8, '华为', 'AI平台', 95000, '2024-04-05', 2);
4. 基础篇 —— CURD 四板斧
4.1 SELECT —— 查
-- 查所有列
SELECT * FROM employees;
-- 查指定列
SELECT name, salary FROM employees;
-- 列运算
SELECT name, salary * 12 AS annual_salary FROM employees;
-- 字符串拼接 (不同数据库语法略有不同)
-- SQLite: ||
SELECT name || ' - ' || department AS info FROM employees;
-- MySQL: CONCAT(name, ' - ', department)
-- PG: name || ' - ' || department
-- 常量列
SELECT name, '在职' AS status FROM employees;
4.2 INSERT —— 增
-- 指定列插入 (推荐)
INSERT INTO employees (name, department, salary, age, join_date)
VALUES ('大壮', '技术部', 16000, 27, '2024-05-01');
-- 插入全部列 (顺序必须匹配表定义)
INSERT INTO employees VALUES (10, '小芳', '人事部', 10000, 24, '2024-06-01');
-- 批量插入
INSERT INTO employees (name, department, salary, age, join_date) VALUES
('阿强', '市场部', 11000, 25, '2024-04-01'),
('阿珍', '技术部', 19000, 31, '2023-10-01');
-- 从查询结果插入
INSERT INTO high_salary_employees (name, salary)
SELECT name, salary FROM employees WHERE salary > 15000;
4.3 UPDATE —— 改
-- ⚠️ 一定要加 WHERE!不加就全改了!
UPDATE employees SET salary = 16000 WHERE name = '小明';
-- 更新多列
UPDATE employees
SET salary = salary * 1.1, age = age + 1
WHERE department = '技术部';
-- 用子查询更新
UPDATE orders
SET amount = amount * 0.95
WHERE employee_id = (SELECT id FROM employees WHERE name = '张三');
4.4 DELETE —— 删
-- ⚠️ 一定要加 WHERE!不加就清表了!
DELETE FROM employees WHERE name = '大壮';
-- 删除所有行但保留表结构
DELETE FROM employees; -- 👎 慢,逐行删
TRUNCATE TABLE employees; -- 👍 快,一次性释放
-- 用子查询删除
DELETE FROM orders
WHERE employee_id IN (SELECT id FROM employees WHERE department = '市场部');
💡 黄金法则: INSERT 是原子操作,UPDATE 和 DELETE 一定要先 SELECT 确认再执行:
-- 先看看要改哪些 SELECT * FROM employees WHERE name = '小明'; -- 确认无误再改 UPDATE employees SET salary = 16000 WHERE name = '小明';
5. 进阶篇 —— 筛选与排序
5.1 WHERE 条件筛选
-- ========== 比较运算符 ==========
SELECT * FROM employees WHERE salary > 15000;
SELECT * FROM employees WHERE salary BETWEEN 10000 AND 15000;
-- ========== 字符串匹配 ==========
-- % 匹配任意多个字符,_ 匹配一个字符
SELECT * FROM employees WHERE name LIKE '张%'; -- 张开头
SELECT * FROM employees WHERE name LIKE '%小%'; -- 包含"小"
SELECT * FROM employees WHERE name LIKE '张_'; -- 张+一个字
-- ========== IN 运算符 ==========
SELECT * FROM employees WHERE department IN ('技术部', '人事部');
-- ========== NULL 判断 (不是 = NULL!) ==========
SELECT * FROM employees WHERE salary IS NULL; -- ✅
SELECT * FROM employees WHERE salary = NULL; -- ❌ 永远不成立!
-- ========== 复合条件 ==========
SELECT * FROM employees
WHERE department = '技术部'
AND salary > 15000
AND (age < 30 OR age > 35);
5.2 ORDER BY 排序
-- 升序 (默认)
SELECT name, salary FROM employees ORDER BY salary;
-- 降序
SELECT name, salary FROM employees ORDER BY salary DESC;
-- 多字段排序
SELECT department, salary, name
FROM employees
ORDER BY department ASC, salary DESC;
-- 用数字代表第几列 (不推荐,维护噩梦)
SELECT name, department, salary FROM employees ORDER BY 3 DESC;
5.3 LIMIT 分页
-- 前 3 条
SELECT * FROM employees ORDER BY salary DESC LIMIT 3;
-- 跳过 2 条取 3 条 (第 3~5 名)
SELECT * FROM employees ORDER BY salary DESC LIMIT 3 OFFSET 2;
-- 简写: LIMIT 2, 3 (MySQL only,注意顺序是 offset, limit)
-- 分页通用公式:
-- LIMIT page_size OFFSET (page_number - 1) * page_size
-- 第 2 页,每页 3 条
SELECT * FROM employees ORDER BY salary DESC LIMIT 3 OFFSET 3;
5.4 DISTINCT 去重
-- 有哪些部门?
SELECT DISTINCT department FROM employees;
-- 多列去重 (组合唯一)
SELECT DISTINCT department, age FROM employees;
-- DISTINCT 的性能问题: 它需要对结果排序去重,大表上慎用
-- 替代方案:GROUP BY 通常更快
SELECT department FROM employees GROUP BY department;
6. 核心篇 —— JOIN 与表关系
这是 SQL 最重要的概念,没有之一。
数据库设计的第一课:用关联表,不要把所有东西塞一个表里。
6.1 INNER JOIN
只返回两表中匹配的行:
-- 查询每个订单对应的员工
SELECT o.id AS order_id,
o.product,
o.amount,
e.name AS employee_name
FROM orders o
INNER JOIN employees e ON o.employee_id = e.id;
-- 结果: 只有有订单的员工才会出现。没有订单的员工不显示。
6.2 LEFT / RIGHT JOIN
左连接保留左表所有行,右表不匹配时填 NULL:
-- 显示所有员工及其订单(包括没订单的员工)
SELECT e.name,
e.department,
o.product,
o.amount
FROM employees e
LEFT JOIN orders o ON e.id = o.employee_id;
-- 结果: 所有员工都在,没订单的 product/amount 为 NULL
💡 LEFT JOIN 常用套路:找"没有"的东西
-- 哪些员工没有任何订单?
SELECT e.name, e.department
FROM employees e
LEFT JOIN orders o ON e.id = o.employee_id
WHERE o.id IS NULL;
6.3 CROSS JOIN
笛卡尔积——左表每行 × 右表每行:
-- 每个员工 × 每个部门(别在实际场景乱用)
SELECT e.name, d.department
FROM employees e
CROSS JOIN (SELECT DISTINCT department FROM employees) d;
6.4 SELF JOIN
自己连自己——同一张表起两个别名:
-- 找到同一部门里,工资比我高的人
SELECT e1.name AS employee,
e2.name AS higher_earner,
e1.salary AS my_salary,
e2.salary AS their_salary
FROM employees e1
INNER JOIN employees e2
ON e1.department = e2.department
AND e1.salary < e2.salary;
7. 聚合与分组
7.1 聚合函数
-- 常用聚合: COUNT, SUM, AVG, MAX, MIN
SELECT COUNT(*) AS total_employees,
SUM(salary) AS total_salary,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM employees;
-- COUNT(*) vs COUNT(列)
SELECT COUNT(*) FROM employees; -- 8 (所有行)
SELECT COUNT(age) FROM employees; -- 8 (非 NULL 的行数)
SELECT COUNT(DISTINCT department) FROM employees; -- 3
7.2 GROUP BY + HAVING
-- 每个部门的统计
SELECT department,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary,
SUM(salary) AS total_salary
FROM employees
GROUP BY department;
-- HAVING: 过滤分组 (WHERE 是对行过滤,HAVING 是对组过滤)
SELECT department,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 13000;
-- 完整的 SQL 实例(写上最佳风格)
SELECT
department,
COUNT(*) AS emp_count,
ROUND(AVG(salary), 0) AS avg_salary
FROM employees
WHERE join_date >= '2020-01-01' -- 先过滤行
GROUP BY department
HAVING COUNT(*) >= 2 -- 再过滤组
ORDER BY avg_salary DESC -- 排序
LIMIT 3;
-- 💡 执行顺序: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
8. 子查询与 CTE
8.1 标量子查询
返回单个值的子查询,可以用在 SELECT、WHERE 里:
-- 在 SELECT 中: 每人工资与平均的差距
SELECT name, salary,
(SELECT AVG(salary) FROM employees) AS avg_salary,
salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;
-- 在 WHERE 中: 高于平均工资的人
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
8.2 表子查询
返回多行多列,必须起别名:
-- 找出高于部门平均工资的员工
SELECT e.name, e.department, e.salary, dept_avg.avg_salary
FROM employees e
INNER JOIN (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept_avg ON e.department = dept_avg.department
WHERE e.salary > dept_avg.avg_salary;
8.3 EXISTS / IN
-- IN: 有订单的员工
SELECT * FROM employees
WHERE id IN (SELECT DISTINCT employee_id FROM orders);
-- EXISTS: 更高效的写法(尤其大表)
SELECT * FROM employees e
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.employee_id = e.id
);
-- NOT EXISTS: 没订单的员工
SELECT * FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.employee_id = e.id
);
-- 💡 EXISTS vs IN:
-- 1. EXISTS 找到第一个匹配就停止,IN 要生成完整结果集
-- 2. 如果子查询结果集很大,EXISTS 通常更快
-- 3. IN 不能处理 NULL(NULL IN (...) 返回 NULL 不是 FALSE)
-- 4. 现代优化器在很多场景下会等价重写,但 EXISTS 写法更清晰
8.4 CTE (WITH) —— 现代 SQL 的基石
CTE (Common Table Expression) 是给子查询起名字。它让你把复杂的 SQL 拆成清晰的步骤。
-- 基础 CTE
WITH high_salary AS (
SELECT * FROM employees WHERE salary > 15000
)
SELECT name, department, salary
FROM high_salary
ORDER BY salary DESC;
-- 多层 CTE (这才是威力所在)
WITH
dept_stats AS (
SELECT department,
AVG(salary) AS avg_salary,
COUNT(*) AS emp_count
FROM employees
GROUP BY department
),
top_employees AS (
SELECT e.name, e.department, e.salary
FROM employees e
INNER JOIN dept_stats ds ON e.department = ds.department
WHERE e.salary > ds.avg_salary
)
SELECT * FROM top_employees ORDER BY salary DESC;
-- CTE 可以复用同一个名字多次引用
WITH department_counts AS (
SELECT department, COUNT(*) AS cnt
FROM employees
GROUP BY department
)
SELECT
(SELECT cnt FROM department_counts WHERE department = '技术部') AS tech_count,
(SELECT cnt FROM department_counts WHERE department = '市场部') AS market_count,
(SELECT cnt FROM department_counts WHERE department = '人事部') AS hr_count;
💡 CTE 的最佳实践:
- 把复杂查询拆成:取数据 → 算中间结果 → 最终输出 三个步骤
- 每个 CTE 只做一件事
- 名字起得有意义(
dept_stats而不是cte1)- 调试时可以先
SELECT * FROM 某个CTE单独跑
9. 窗口函数 —— 高手的分水岭
窗口函数 = 保留所有行 + 在行之间做计算。
普通
GROUP BY会把多行压成一行(丢失细节)。
窗口函数不压缩行数,而是在每行之外额外计算聚合值。
9.1 ROW_NUMBER / RANK / DENSE_RANK
-- 按工资排名
SELECT name, department, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num, -- 1,2,3,4,5...
RANK() OVER (ORDER BY salary DESC) AS rank, -- 1,2,2,4,5... (并列跳号)
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank -- 1,2,2,3,4... (并列不跳号)
FROM employees;
-- 每个部门内部分组排名
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
-- 每个部门工资最高的前两名
WITH ranked AS (
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT * FROM ranked WHERE rn <= 2;
9.2 LAG / LEAD
前后行引用——用于计算同比、环比:
-- 按工资排序,看前后人的工资
SELECT name, salary,
LAG(salary, 1) OVER (ORDER BY salary) AS prev_salary,
LEAD(salary, 1) OVER (ORDER BY salary) AS next_salary,
salary - LAG(salary, 1) OVER (ORDER BY salary) AS diff
FROM employees;
-- 用金额分析订单增长
SELECT
order_date,
SUM(amount) AS daily_amount,
LAG(SUM(amount), 1) OVER (ORDER BY order_date) AS prev_day,
SUM(amount) - LAG(SUM(amount), 1) OVER (ORDER BY order_date) AS growth
FROM orders
GROUP BY order_date
ORDER BY order_date;
9.3 SUM / AVG 窗口
移动/累计统计:
-- 累计求和(running total)
SELECT
order_date,
product,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
-- 分区累计(每个员工的累计订单额)
SELECT
e.name,
o.order_date,
o.amount,
SUM(o.amount) OVER (
PARTITION BY e.id
ORDER BY o.order_date
) AS running_total
FROM employees e
LEFT JOIN orders o ON e.id = o.employee_id;
-- 移动平均(最近 3 天的平均)
SELECT
order_date,
SUM(amount) AS daily_amount,
AVG(SUM(amount)) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day
FROM orders
GROUP BY order_date
ORDER BY order_date;
💡 窗口函数框架语法:
SUM(…) OVER ( PARTITION BY ... -- 分组(可选) ORDER BY ... -- 排序(可选) ROWS BETWEEN ... -- 窗口范围(可选) )窗口范围选项:
ROWS UNBOUNDED PRECEDING— 从分区第一行到当前行ROWS BETWEEN 2 PRECEDING AND CURRENT ROW— 前两行到当前行ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING— 前后各一行RANGE和ROWS的区别:RANGE按值范围,ROWS按行数
10. 索引与性能——别让你的查询慢成狗
为什么查询慢?
数据库的数据存在磁盘上。没有索引 = 全表扫描 = 从头到尾翻一遍。
有索引 = 像书的目录一样直接翻到那一页。
什么时候建索引?
-- 在经常 WHERE 的列上建索引
CREATE INDEX idx_employees_department ON employees(department);
-- 在经常 JOIN 的列上建索引
CREATE INDEX idx_orders_employee_id ON orders(employee_id);
-- 复合索引(多列联合查询时)
CREATE INDEX idx_employees_dept_salary ON employees(department, salary);
索引的代价
- 写入变慢(建索引的表,INSERT/UPDATE/DELETE 要同时维护索引)
- 占用磁盘空间
- 不是越多越好
怎么知道查询慢不慢?
-- SQLite
EXPLAIN QUERY PLAN SELECT * FROM employees WHERE salary > 15000;
-- MySQL
EXPLAIN SELECT * FROM employees WHERE salary > 15000;
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM employees WHERE salary > 15000;
💡 索引黄金法则
| 应该建索引 | 不应该建索引 |
|---|---|
| 频繁出现在 WHERE 中的列 | 几乎不查询的列 |
| 频繁 JOIN 的列 | 值非常少的列(如性别、布尔值) |
| 需要排序/分组的列 | 频繁更新的列 |
| 高选择性的列(唯一值多) | 大文本/BLOB列 |
11. 实战篇 —— 从零到一搭一个真实场景
11.1 电商订单系统
-- ========== 建表 ==========
CREATE TABLE customers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
city TEXT,
created_at DATE DEFAULT CURRENT_DATE
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT,
price REAL NOT NULL,
stock INTEGER DEFAULT 0
);
CREATE TABLE orders_shop (
id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
order_date DATE DEFAULT CURRENT_DATE,
status TEXT DEFAULT 'pending' -- pending, paid, shipped, cancelled
);
CREATE TABLE order_items (
id INTEGER PRIMARY KEY,
order_id INTEGER REFERENCES orders_shop(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER NOT NULL,
unit_price REAL NOT NULL
);
-- ========== 插入示例数据 ==========
INSERT INTO customers (name, email, city) VALUES
('张三', 'zhangsan@email.com', '北京'),
('李四', 'lisi@email.com', '上海'),
('王五', 'wangwu@email.com', '广州');
INSERT INTO products (name, category, price, stock) VALUES
('iPhone 16', '手机', 8999, 50),
('MacBook Pro', '电脑', 14999, 30),
('AirPods Pro', '耳机', 1999, 100),
('iPad Air', '平板', 4999, 40);
INSERT INTO orders_shop (customer_id, order_date, status) VALUES
(1, '2024-06-01', 'paid'),
(1, '2024-06-05', 'shipped'),
(2, '2024-06-10', 'paid'),
(3, '2024-06-15', 'cancelled');
INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
(1, 1, 1, 8999),
(1, 3, 2, 1999),
(2, 2, 1, 14999),
(3, 4, 1, 4999);
-- ========== 实战查询 ==========
-- 1. 某客户的订单总金额
SELECT c.name, SUM(oi.quantity * oi.unit_price) AS total_spent
FROM customers c
JOIN orders_shop os ON c.id = os.customer_id
JOIN order_items oi ON os.id = oi.order_id
WHERE os.status != 'cancelled'
GROUP BY c.id;
-- 2. 热销产品排行榜
SELECT p.name,
SUM(oi.quantity) AS total_sold,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders_shop os ON oi.order_id = os.id
WHERE os.status != 'cancelled'
GROUP BY p.id
ORDER BY total_sold DESC;
-- 3. 各城市消费排行
SELECT c.city,
COUNT(DISTINCT c.id) AS customer_count,
SUM(oi.quantity * oi.unit_price) AS total_revenue
FROM customers c
JOIN orders_shop os ON c.id = os.customer_id
JOIN order_items oi ON os.id = oi.order_id
WHERE os.status != 'cancelled'
GROUP BY c.city
ORDER BY total_revenue DESC;
-- 4. 复购分析(购超过1单的客户)
SELECT c.name, c.email, COUNT(os.id) AS order_count
FROM customers c
JOIN orders_shop os ON c.id = os.customer_id
WHERE os.status != 'cancelled'
GROUP BY c.id
HAVING COUNT(os.id) > 1;
11.2 用户行为分析
-- ========== 用户行为日志表 ==========
CREATE TABLE event_logs (
id INTEGER PRIMARY KEY,
user_id INTEGER,
event_name TEXT, -- 'page_view', 'click', 'purchase', 'login'
page_url TEXT,
event_time TIMESTAMP,
duration_seconds INTEGER
);
-- 插入示例
INSERT INTO event_logs VALUES
(1, 1, 'login', '/login', '2024-06-01 09:00:00', NULL),
(2, 1, 'page_view', '/products', '2024-06-01 09:05:00', 30),
(3, 1, 'click', '/products/1', '2024-06-01 09:05:30', NULL),
(4, 1, 'purchase', '/checkout', '2024-06-01 09:10:00', 120),
(5, 2, 'login', '/login', '2024-06-01 10:00:00', NULL),
(6, 2, 'page_view', '/search', '2024-06-01 10:02:00', 15),
(7, 2, 'login', '/login', '2024-06-02 14:00:00', NULL),
(8, 2, 'page_view', '/products', '2024-06-02 14:05:00', 45),
(9, 2, 'purchase', '/checkout', '2024-06-02 14:10:00', 90);
-- 1. 用户会话分析(用户每次登录到购买的事件序列)
WITH user_sessions AS (
SELECT *,
SUM(CASE WHEN event_name = 'login' THEN 1 ELSE 0 END)
OVER (PARTITION BY user_id ORDER BY event_time) AS session_id
FROM event_logs
)
SELECT user_id, session_id,
MIN(event_time) AS session_start,
MAX(event_time) AS session_end,
GROUP_CONCAT(event_name, ' → ') AS event_sequence
FROM user_sessions
GROUP BY user_id, session_id;
-- 2. 转化漏斗(从访问到购买的转化率)
WITH funnel AS (
SELECT
COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN user_id END) AS visited,
COUNT(DISTINCT CASE WHEN event_name = 'click' THEN user_id END) AS clicked,
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END) AS purchased
FROM event_logs
)
SELECT visited, clicked,
ROUND(1.0 * clicked / visited * 100, 1) AS click_rate,
purchased,
ROUND(1.0 * purchased / visited * 100, 1) AS purchase_rate
FROM funnel;
-- 3. 每日活跃用户(DAU)
SELECT DATE(event_time) AS day,
COUNT(DISTINCT user_id) AS dau
FROM event_logs
GROUP BY day
ORDER BY day;
12. 常见陷阱与最佳实践
❌ 坑 1:NULL 参与运算
-- NULL + 10 = NULL (不是 10!)
-- NULL = NULL → 未知(不是 TRUE!)
-- NULL IN (1,2,3) → 未知(不是 FALSE!)
-- ✅ 正确处理
SELECT name, salary + IFNULL(bonus, 0) AS total_salary FROM employees;
SELECT * FROM employees WHERE bonus IS NULL; -- ✅
SELECT * FROM employees WHERE bonus = NULL; -- ❌
SELECT * FROM employees WHERE bonus IN (500, 1000); -- NULL 不会匹配
SELECT * FROM employees WHERE bonus NOT IN (500, 1000); -- ⚠️ NULL 行会被排除!
❌ 坑 2:HAVING vs WHERE
-- ❌ 错误
SELECT department, COUNT(*)
FROM employees
WHERE COUNT(*) > 2 -- WHERE 里不能有聚合函数
GROUP BY department;
-- ✅ 正确
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;
❌ 坑 3:GROUP BY 非聚合列
-- ❌ 不严谨(在 ONLY_FULL_GROUP_BY 模式下会报错)
SELECT name, department, AVG(salary)
FROM employees
GROUP BY department;
-- ✅ 正确:name 要么在 GROUP BY 里,要么是聚合的
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
❌ 坑 4:表别名的引用
-- SELECT 中定义的别名不能在 WHERE 里用
SELECT salary * 12 AS annual FROM employees
WHERE annual > 180000; -- ❌ 报错!SELECT 在第 5 步执行
-- ✅ 用子查询/CTE
SELECT * FROM (
SELECT salary * 12 AS annual FROM employees
) WHERE annual > 180000;
✅ 最佳实践清单
| 实践 | 说明 |
|---|---|
| 大写关键字 | SELECT, FROM, WHERE 全大写,表名列名小写 |
| 合理缩进 | JOIN、WHERE、GROUP BY 各占一行 |
| 用别名 | 表名太长时用缩写 e、o |
| 不用 SELECT * | 明确列出需要的列 |
| 优先 CTE | 别写超过 20 行的嵌套子查询 |
| 加注释 | 复杂逻辑一定写注释 |
| 先 EXPLAIN | 慢查询先看执行计划 |
| 事务包裹 | 多个 UPDATE/DELETE 用 BEGIN / COMMIT |
| 外键约束 | 建表时定义外键,保证数据完整性 |
13. 给新手的进阶路线图
基础 (1-2 周)
├── SELECT / INSERT / UPDATE / DELETE
├── WHERE / ORDER BY / LIMIT / DISTINCT
├── LIKE / IN / BETWEEN / IS NULL
└── 理解 SQL 执行顺序
进阶 (2-4 周)
├── JOIN (INNER / LEFT / RIGHT / SELF)
├── GROUP BY + HAVING
├── 子查询 (标量 / 表 / EXISTS)
├── 联合查询 UNION / UNION ALL
└── CASE WHEN 条件表达式
高级 (4-8 周)
├── 窗口函数 (RANK / LAG / SUM OVER)
├── CTE (WITH) + 递归 CTE
├── 索引原理 + EXPLAIN 分析
├── 事务与隔离级别
└── 数据库设计范式
实战 (持续)
├── 每天在 LeetCode / HackerRank 刷 1-2 道 SQL 题
├── 读开源项目的 SQL 迁移文件
├── 尝试优化真实项目中的慢查询
└── 学习 ORM 生成的 SQL,对比手写 SQL
📚 推荐资源
练习平台:
- LeetCode SQL 题库 — 面试刷题首选
- HackerRank SQL — 循序渐进
- SQLZoo — 交互式学习
- SQL Bolt — 适合零基础
- Pgexercises — PostgreSQL 专项
游戏化学习:
- SQLBolt — 闯关式
- Schemaverse — 用 SQL 玩太空策略游戏
- SQL Murder Mystery — 用 SQL 破案
深度阅读:
- 《SQL 必知必会》(Ben Forta) — 入门圣经
- 《高性能 MySQL》(Baron Schwartz) — 进阶必读
- Use The Index, Luke! (https://use-the-index-luke.com/) — 索引深度
14. 参考与出处
本文内容凝聚了以下精华来源,致谢每一位写出这些优秀文章/书籍/工具的人:
📖 书籍
- 《SQL 必知必会》 (Sams Teach Yourself SQL in 10 Minutes) — Ben Forta
- 全球销量最高的 SQL 入门书,没有之一。每节课 10 分钟,4 周拿下 SQL 基础。
- 《高性能 MySQL》 (High Performance MySQL, 3rd Edition) — Baron Schwartz 等
- MySQL 领域的圣经,索引、查询优化、架构设计的终极指南。
- 《SQL 反模式》 (SQL Antipatterns) — Bill Karwin
- 告诉你 SQL 中常见的错误设计模式,每个开发者都应该读。
- 《SQL 经典实例》 (SQL Cookbook) — Anthony Molinaro
- 100+ 个实战 SQL 技巧,随手翻随时学。
🌐 文章与论坛
- Stack Overflow — 任何 SQL 问题,这里几乎都有答案
- Modern SQL — Markus Winand 的 SQL 现代语法解读
- Use The Index, Luke! — Markus Winand 的另一力作,索引的终极指南
- PostgreSQL Tutorial — 最清晰的 PG 教程
- SQLZoo — 交互式 SQL 学习,支持多种数据库
- LeetCode Discussion — 面试 SQL 题讨论区,高手如云
🛠 工具
- db-fiddle.com — 在线写 SQL,支持 MySQL/PG/SQLite
- sqliteonline.com — 零安装的 SQLite 在线工具
- DBeaver — 跨平台数据库管理工具,GUI 良心之作
- SQLite Browser — SQLite 可视化工具
🏆 值得关注的人
- Markus Winand — SQL 性能优化大师,Modern SQL 布道者
- Baron Schwartz — 数据库性能领域泰斗
- Bill Karwin — SQL 反模式作者,Stack Overflow 活跃答主
- Jennifer Widom — Stanford 数据库课程教授,她的课视频在 YouTube 免费看
🎁 最后的赠言
“Talk is cheap. Show me the code.” — Linus Torvalds
对于 SQL 来说,这句话要改成:
“Reading is cheap. Show me the EXPLAIN PLAN.”
别光看教程了。打开一个 SQL 终端,把上面的例子亲手敲一遍。
如果卡住了:
- 先自己查 Stack Overflow
- 查不到再问 ChatGPT
- 还不行就重新问
本文由 taohua 编写,结合了 5+ 年编程经验与 SQL 实战积累。
最后更新:2026-06-09
🚀 如果把 SQL 学好,代码之路就打下了半壁江山。加油!
更多推荐



所有评论(0)