
Operations

Copy the commands from the relevant section as-is when the situation comes up.

1. Check status

curl -s http://112.124.27.213/health
# expected: {"status":"ok","version":"x.y.z"}

ssh root@112.124.27.213 'systemctl status --no-pager xiangqin'
ssh root@112.124.27.213 'journalctl -u xiangqin --no-pager -n 50'
ssh root@112.124.27.213 'journalctl -u xiangqin -f'              # follow the log
ssh root@112.124.27.213 'tail -50 /var/log/xiangqin-backup.log'
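
The health check above can be wrapped so that it yields a one-word verdict instead of raw JSON. This is a minimal sketch: health_is_ok and check_health are hypothetical helper names, and the 5-second timeout is an assumed value, not something the service defines.

```shell
#!/usr/bin/env bash
# Sketch only: health_is_ok and check_health are hypothetical helper names.

# Succeeds when a /health response body reports "status":"ok".
health_is_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetches the endpoint and prints a one-word verdict.
check_health() {
  local url="${1:-http://112.124.27.213/health}"
  local body
  # --max-time 5 is an assumed timeout so a hung server does not block the shell.
  if ! body="$(curl -s --max-time 5 "$url")"; then
    echo "unreachable"
    return 2
  fi
  if health_is_ok "$body"; then
    echo "ok"
  else
    echo "unhealthy"
    return 1
  fi
}
```

After sourcing the file, check_health prints ok, unhealthy, or unreachable, and its exit code can gate the next step of a script.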

2. Restart / stop / start

ssh root@112.124.27.213 'systemctl restart xiangqin'
ssh root@112.124.27.213 'systemctl stop xiangqin'
ssh root@112.124.27.213 'systemctl start xiangqin'

3. Deploy new code (SOP: deploy)

Prerequisites:

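  • [ ] uv run pytest -q is all green locally
  • [ ] curl http://112.124.27.213/health against epsilon returns the baseline response
  • [ ] If dependencies in pyproject.toml changed → uv lock has been run
  • [ ] If the version number changed → __version__ in src/xiangqin/__init__.py is in sync
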
cd ~/xiangqin
bash scripts/deploy.sh
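
The version-sync item in the checklist above can be checked mechanically before deploying. This is a sketch under assumptions: it supposes pyproject.toml carries a literal version = "x.y.z" line and that __init__.py carries a literal __version__ = "x.y.z" line. extract_version and versions_in_sync are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: assumes literal 'version = "x.y.z"' and '__version__ = "x.y.z"' lines.

# Prints the quoted value assigned to key $2 at the start of a line in file $1.
extract_version() {
  sed -nE "s/^$2[[:space:]]*=[[:space:]]*[\"']([^\"']+)[\"'].*/\1/p" "$1" | head -n 1
}

# Succeeds only when both files declare the same, non-empty version.
versions_in_sync() {
  local a b
  a="$(extract_version "$1" version)"
  b="$(extract_version "$2" __version__)"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

A possible use from the repository root: versions_in_sync pyproject.toml src/xiangqin/__init__.py || echo "version mismatch".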

deploy.sh runs 4 steps:

  1. rsync the code to root@epsilon:/opt/xiangqin/ (excluding .venv / data / .git / *.db*)
  2. Over ssh as root, run uv sync (with UV_PYTHON_INSTALL_DIR=/opt/uv-python)
  3. chown -R xiangqin:xiangqin so the app user can read the files
  4. Copy the systemd unit + daemon-reload + restart
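
For orientation, the four steps could look roughly like the sketch below. This is not the actual scripts/deploy.sh: the location of the unit file in the repository (deploy/xiangqin.service) and the run/DRY_RUN wrapper are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Sketch only, not the real deploy.sh. The unit file path and DRY_RUN are assumptions.

# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

deploy() {
  local host="root@epsilon" dest="/opt/xiangqin"
  # 1. Sync the code, leaving the virtualenv, data, git metadata and databases alone.
  run rsync -az --exclude .venv --exclude data --exclude .git --exclude '*.db*' ./ "$host:$dest/"
  # 2. Install dependencies with uv, using the shared Python install directory.
  run ssh "$host" "cd $dest && UV_PYTHON_INSTALL_DIR=/opt/uv-python uv sync"
  # 3. Hand the tree to the app user.
  run ssh "$host" "chown -R xiangqin:xiangqin $dest"
  # 4. Install the unit (hypothetical repo path), reload systemd, restart the service.
  run ssh "$host" "cp $dest/deploy/xiangqin.service /etc/systemd/system/ && systemctl daemon-reload && systemctl restart xiangqin"
}
```

Calling DRY_RUN=1 deploy prints the four commands without touching the server, which is a cheap way to review what a release would do.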

4. Publish to PyPI (SOP: publish)

canary (run after every development cycle):

cd ~/xiangqin
bash scripts/publish-pypi.sh --canary
# install and test in a clean environment
python3 -m venv /tmp/xq-verify
/tmp/xq-verify/bin/pip install --pre acong-tech-xiangqin
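
Installing the package only shows that pip could resolve it. To confirm that the clean environment really got the build that was just published, the installed version can be compared with the expected one. This sketch assumes the import name is xiangqin (inferred from src/xiangqin) and that it exposes __version__. installed_version_matches is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the package imports as "xiangqin" and exposes __version__.

# $1: python interpreter of the verification venv, $2: version that was published.
installed_version_matches() {
  local py="$1" expected="$2" actual
  actual="$("$py" -c 'import xiangqin; print(xiangqin.__version__)')" || return 2
  [ "$actual" = "$expected" ]
}
```

For example: installed_version_matches /tmp/xq-verify/bin/python "<published version>" || echo "canary mismatch".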

live:

bash scripts/publish-pypi.sh --live

Authentication uses the pypi.api_tokens[name=acong-tech-publish] entry in vault.

5. Backup + restore (SOP: restore)

Backup: /etc/cron.d/xiangqin-backup runs daily at 02:00, packs the sqlite database and uploads it to OSS agentaily-backup-xiangqin-prod/daily/
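
A daily job that fails silently is easy to miss, so it can help to check that the backup log is still being written. The sketch below assumes the cron job appends to /var/log/xiangqin-backup.log on every run (the log file tailed in section 1) and treats anything older than 26 hours as stale. Both the threshold and the helper name backup_log_is_fresh are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the backup job touches its log file on every run.

# Succeeds when the log exists and was modified within the last $2 minutes.
backup_log_is_fresh() {
  local log="${1:-/var/log/xiangqin-backup.log}"
  local max_age_min="${2:-1560}"   # 26 hours: one daily run plus some slack
  [ -f "$log" ] && [ -n "$(find "$log" -mmin "-$max_age_min")" ]
}
```

On the server: backup_log_is_fresh || echo "backup log is stale, check cron". A fresh log does not prove the upload succeeded, so the log content still needs a look.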

Restore:

ssh root@112.124.27.213
bash /opt/xiangqin/scripts/restore.sh              # fetch the latest backup
bash /opt/xiangqin/scripts/restore.sh --date 20260422   # a specific day

Restore drill (does not touch production):

bash /opt/xiangqin/scripts/restore.sh --dry-run --target /tmp/restore-test.db
sqlite3 /tmp/restore-test.db 'PRAGMA integrity_check'
sqlite3 /tmp/restore-test.db 'SELECT count(*) FROM users'
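
The two sqlite3 checks of the drill can be combined into one pass/fail helper. This is a sketch: treating an empty users table as a failure is an assumption about what a healthy backup looks like, and verify_restored_db is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: "users must be non-empty" is an assumed success criterion.

verify_restored_db() {
  local db="$1" check rows
  # PRAGMA integrity_check prints the single row "ok" for a sound database.
  check="$(sqlite3 "$db" 'PRAGMA integrity_check')" || return 2
  [ "$check" = "ok" ] || { echo "integrity_check failed: $check" >&2; return 1; }
  rows="$(sqlite3 "$db" 'SELECT count(*) FROM users')" || return 2
  [ "$rows" -gt 0 ] || { echo "users table is empty" >&2; return 1; }
  echo "ok: $rows users"
}
```

After the dry-run restore: verify_restored_db /tmp/restore-test.db.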

6. Credential rotation (via vault)

Vault integration

7. Emergency rollback

# code rollback (to the previous commit)
ssh root@112.124.27.213
cd /opt/xiangqin
git log --oneline -5
git reset --hard <previous commit>
systemctl restart xiangqin

# data rollback
bash /opt/xiangqin/scripts/restore.sh --date <date>
systemctl restart xiangqin

8. Fallback: epsilon is down

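  1. Symptom: xq health times out → clients hang
  2. ping 112.124.27.213 to confirm the network path
  3. Then ssh onto the machine (if ssh fails, VNC is available from the Alibaba Cloud console)
  4. If it will not come back up, restore the last successful backup onto another machine + change DNS (xq.agentaily.com)
  5. Announce to users: "Temporary outage, recovery within 1-6 hours, balances are not lost"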
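
Steps 2 and 3 above can be scripted so that the first failing layer is named right away. This is a sketch: triage is a hypothetical helper, the 5-second ssh timeout and the three pings are assumed values, and the extra systemctl is-active check is an addition that goes beyond the listed steps.

```shell
#!/usr/bin/env bash
# Sketch only: triage is a hypothetical helper; timeouts are assumed values.

triage() {
  local ip="${1:-112.124.27.213}"
  # Layer 1: is the host reachable at all?
  if ! ping -c 3 "$ip" >/dev/null 2>&1; then
    echo "network: no ping reply from $ip"
    return 1
  fi
  # Layer 2: can we log in? BatchMode avoids hanging on a password prompt.
  if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$ip" true 2>/dev/null; then
    echo "ssh: host answers ping but ssh failed; try VNC from the Alibaba Cloud console"
    return 2
  fi
  # Layer 3: is the service itself running?
  if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$ip" 'systemctl is-active --quiet xiangqin'; then
    echo "service: host is up but xiangqin is not active"
    return 3
  fi
  echo "host and service look up; check /health and the logs"
}
```

The exit code (1 network, 2 ssh, 3 service) shows where to continue: the cloud console, VNC, or the restart and restore sections above.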