Operations
Copy the relevant section verbatim when the situation comes up.
1. Check status
curl -s http://112.124.27.213/health
# Expected: {"status":"ok","version":"x.y.z"}
ssh root@112.124.27.213 'systemctl status --no-pager xiangqin'
ssh root@112.124.27.213 'journalctl -u xiangqin --no-pager -n 50'
ssh root@112.124.27.213 'journalctl -u xiangqin -f' # follow the log
ssh root@112.124.27.213 'tail -50 /var/log/xiangqin-backup.log'
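The health check above can be made scriptable. The following is a minimal sketch, assuming the /health body is the one-line JSON shown in the expected output; health_ok and health_version are hypothetical helper names, not part of the xiangqin tooling.

```shell
#!/bin/sh
# Sketch: succeed only when a /health response body reports status "ok".
# health_ok is a hypothetical helper; it reads the JSON body on stdin.
health_ok() {
  # Match "status":"ok", tolerating optional whitespace around the colon.
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Print the version field from the same body, or nothing if it is absent.
health_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: curl -s http://112.124.27.213/health | health_ok && echo healthy. A text match keeps the sketch dependency-free; a JSON parser would be more robust if the response format ever changes.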
2. Restart / stop / start
ssh root@112.124.27.213 'systemctl restart xiangqin'
ssh root@112.124.27.213 'systemctl stop xiangqin'
ssh root@112.124.27.213 'systemctl start xiangqin'
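3. Deploy new code (SOP: deploy)
Prerequisites:
- [ ] Local: uv run pytest -q is all green
- [ ] epsilon: curl http://112.124.27.213/health returns the baseline response
- [ ] If pyproject.toml dependencies changed → uv lock has been run
- [ ] If the version number changed → __version__ in src/xiangqin/__init__.py is in sync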
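The version item in the checklist can be checked mechanically. This is a sketch under one assumption that the source does not confirm: that pyproject.toml carries a static version = "..." line to compare against. The function names are hypothetical.

```shell
#!/bin/sh
# Print the value of __version__ from a Python file such as
# src/xiangqin/__init__.py (accepts single or double quotes).
init_version() {
  sed -n 's/^__version__[[:space:]]*=[[:space:]]*["'\'']\([^"'\'']*\)["'\''].*/\1/p' "$1" | head -n 1
}

# Print the first top-level version = "..." value from pyproject.toml.
# Assumption: the project declares a static version there.
pyproject_version() {
  sed -n 's/^version[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only when both files declare the same non-empty version.
versions_in_sync() {
  v_init="$(init_version "$1")"
  v_proj="$(pyproject_version "$2")"
  [ -n "$v_init" ] && [ "$v_init" = "$v_proj" ]
}
```

Usage: versions_in_sync src/xiangqin/__init__.py pyproject.toml || echo "version mismatch". If the project derives its version dynamically from __init__.py, this comparison is unnecessary.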
deploy.sh runs 4 steps:
- rsync the code to root@epsilon:/opt/xiangqin/ (excluding .venv / data / .git / *.db*)
- ssh in as root and run uv sync (UV_PYTHON_INSTALL_DIR=/opt/uv-python)
- chown -R xiangqin:xiangqin so the app user can read the files
- copy the systemd unit + daemon-reload + restart
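The four steps can be sketched as a script with a dry-run switch. This is not the real deploy.sh, whose contents are not shown here: the rsync flags (-az), the run wrapper, the DRY_RUN variable, and the unit file location deploy/xiangqin.service are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the four deploy steps; the real deploy.sh may differ.
REMOTE="${REMOTE:-root@epsilon}"
APP_DIR="${APP_DIR:-/opt/xiangqin}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  # 1. Copy the code, leaving out the virtualenv, data, git metadata and sqlite files.
  run rsync -az --exclude .venv --exclude data --exclude .git --exclude '*.db*' \
    ./ "$REMOTE:$APP_DIR/" || return 1
  # 2. Install dependencies on the server using the shared uv Python directory.
  run ssh "$REMOTE" "cd $APP_DIR && UV_PYTHON_INSTALL_DIR=/opt/uv-python uv sync" || return 1
  # 3. Hand ownership to the app user so the service can read the files.
  run ssh "$REMOTE" "chown -R xiangqin:xiangqin $APP_DIR" || return 1
  # 4. Install the unit (path is an assumption), reload systemd, restart the service.
  run ssh "$REMOTE" "cp $APP_DIR/deploy/xiangqin.service /etc/systemd/system/ && systemctl daemon-reload && systemctl restart xiangqin"
}
```

Running DRY_RUN=1 deploy prints the four commands without touching the server, which is a cheap way to review a change to the script before a real release.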
4. Publish to PyPI (SOP: publish)
canary (run after every round of development):
cd ~/xiangqin
bash scripts/publish-pypi.sh --canary
# Install and test in a clean environment
python3 -m venv /tmp/xq-verify
/tmp/xq-verify/bin/pip install --pre acong-tech-xiangqin
live:
Authentication uses pypi.api_tokens[name=acong-tech-publish] from vault.
5. Backup + restore (SOP: restore)
Backup: /etc/cron.d/xiangqin-backup runs daily at 02:00, packages the sqlite database, and uploads it to OSS at agentaily-backup-xiangqin-prod/daily/.
Restore:
ssh root@112.124.27.213
bash /opt/xiangqin/scripts/restore.sh # fetch the latest backup
bash /opt/xiangqin/scripts/restore.sh --date 20260422 # a specific day
Restore drill (does not touch production):
bash /opt/xiangqin/scripts/restore.sh --dry-run --target /tmp/restore-test.db
sqlite3 /tmp/restore-test.db 'PRAGMA integrity_check'
sqlite3 /tmp/restore-test.db 'SELECT count(*) FROM users'
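The two sqlite3 checks in the drill can be turned into a pass/fail gate. This is a minimal sketch: drill_ok is a hypothetical helper, and treating an empty users table as a failure is an assumption about production data. It relies on the fact that PRAGMA integrity_check prints the single line ok for a healthy database.

```shell
#!/bin/sh
# drill_ok INTEGRITY_OUTPUT USER_COUNT
# Succeeds only when the restored database looks usable.
drill_ok() {
  # PRAGMA integrity_check prints exactly "ok" for a healthy database.
  [ "$1" = "ok" ] || { echo "integrity check failed: $1" >&2; return 1; }
  # An empty or non-numeric count means the query did not run as expected.
  case "$2" in
    ''|*[!0-9]*) echo "unexpected user count: $2" >&2; return 1 ;;
  esac
  # Assumption: production always has at least one user, so 0 rows is suspicious.
  [ "$2" -gt 0 ] || { echo "users table is empty" >&2; return 1; }
}
```

Usage: drill_ok "$(sqlite3 /tmp/restore-test.db 'PRAGMA integrity_check')" "$(sqlite3 /tmp/restore-test.db 'SELECT count(*) FROM users')" && echo "drill passed".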
6. Credential rotation (via vault)
See Connecting to vault.
7. Emergency rollback
# Code rollback (to the previous commit)
ssh root@112.124.27.213
cd /opt/xiangqin
git log --oneline -5
git reset --hard <previous commit>
systemctl restart xiangqin
# Data rollback
bash /opt/xiangqin/scripts/restore.sh --date <date>
systemctl restart xiangqin
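8. Fallback: epsilon is down
xq health times out → the client hangs
- First, ping 112.124.27.213 to confirm the network is reachable
- Then ssh to the machine (VNC is available from the Alibaba Cloud console)
- If it will not come up, restore from the last successful backup onto another machine + change DNS (xq.agentaily.com)
- Notify users: "Temporary outage, recovery expected in 1-6 hours, balances are not lost"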
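The first two triage steps can be sketched as one function that reports which layer looks broken. triage is a hypothetical helper, the labels it prints are made up for illustration, and the ping flags assume Linux iputils. A failed ping is only a hint, since ICMP may be blocked by a firewall or security group.

```shell
#!/bin/sh
# triage HOST: print which layer appears to be failing.
triage() {
  host="$1"
  # -c 3: send three probes; -W 2: wait up to 2 seconds per reply (Linux iputils).
  if ! ping -c 3 -W 2 "$host" >/dev/null 2>&1; then
    echo "network-unreachable"
  # -f: treat HTTP errors as failure; --max-time 5: give up instead of hanging.
  elif ! curl -fsS --max-time 5 "http://$host/health" >/dev/null 2>&1; then
    echo "service-down"
  else
    echo "healthy"
  fi
}
```

Usage: triage 112.124.27.213. A result of service-down points to the systemctl and journalctl commands in sections 1 and 2; network-unreachable points to the cloud console.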