

How AI Pose Detection Works: A Deep Dive

When you perform a squat in Fiwa AI, your phone can detect whether your knee angle is off, your shoulders are asymmetric, or your spine has lost its neutral position — all within 50 milliseconds. Behind that real-time feedback is a sophisticated AI pipeline. Here's how it works, layer by layer.

Layer 1: Keypoint Detection

Fiwa AI's pose recognition is built on a deep learning model whose core task is locating 33 skeletal keypoints on the human body in each camera frame — including the nose, shoulders, elbows, wrists, hips, knees, and ankles.

The model takes each RGB video frame as input and outputs the (x, y) coordinates of each keypoint along with a confidence score. Points with confidence below a threshold are ignored or interpolated to handle occlusion and blur.
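The confidence-gating step might look like the sketch below. The function name, the 0.5 threshold, and the hold-last-value fallback are illustrative assumptions; the model's actual parameters and interpolation strategy are not published.

```python
from typing import Optional, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)
Point = Tuple[float, float]

CONF_THRESHOLD = 0.5  # assumed cutoff; the real value isn't published

def gate_keypoint(prev: Optional[Point], curr: Keypoint) -> Optional[Point]:
    """Keep a keypoint if confident enough; otherwise hold the last good
    position (a simple stand-in for the interpolation described above)."""
    x, y, conf = curr
    if conf >= CONF_THRESHOLD:
        return (x, y)
    return prev  # None if this joint has never been seen clearly

# A knee point drops below threshold for one blurred frame:
last_good = (0.42, 0.63)
print(gate_keypoint(last_good, (0.10, 0.90, 0.2)))  # (0.42, 0.63)
```

Holding the last confident position is the simplest fallback; a production system would more likely interpolate across neighboring frames.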

This step runs entirely on-device with no image data uploaded, and latency is kept under 30ms.

Layer 2: Angle Computation

With keypoint coordinates in hand, the system calculates joint angles in real time. For the knee joint:

  • Take the coordinates of hip point H, knee point K, and ankle point A
  • Compute vectors KH (knee to hip) and KA (knee to ankle)
  • Apply the arc-cosine formula: θ = arccos((KH · KA) / (|KH| |KA|))
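The three steps above translate directly into code. This is a minimal sketch, not Fiwa AI's implementation; the function name and test coordinates are invented for illustration.

```python
import math

def joint_angle(h, k, a):
    """Angle at knee K formed by vectors K->H (to hip) and K->A (to ankle),
    in degrees, via the dot-product / arc-cosine formula."""
    khx, khy = h[0] - k[0], h[1] - k[1]
    kax, kay = a[0] - k[0], a[1] - k[1]
    dot = khx * kax + khy * kay
    norm = math.hypot(khx, khy) * math.hypot(kax, kay)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # clamp against float error
    return math.degrees(math.acos(cos_theta))

# A fully straight leg (hip above knee, ankle below) gives 180 degrees:
print(round(joint_angle((0, 2), (0, 1), (0, 0))))  # 180
```

The clamp before `acos` matters in practice: rounding can push the cosine infinitesimally outside [-1, 1], and `math.acos` raises on such inputs.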

A standard squat knee angle falls between 90° and 100°. Fiwa AI dynamically adjusts joint thresholds based on the exercise you've selected.

Layer 3: Rep State Machine

Once angles are available, the system needs to determine "what counts as a completed rep." Fiwa AI uses a finite state machine to track movement phases:

  • READY: Standing start position, knee angle > 160°
  • DOWN: Descending, knee angle steadily decreasing
  • BOTTOM: Lowest point reached, knee angle < 100° (squat)
  • UP: Rising, knee angle steadily increasing
  • COMPLETE: Returned to start — one valid rep counted

State transitions must satisfy both an angle threshold and a minimum time window, which filters out frame-to-frame jitter and false counts.
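The phases above can be sketched as a small state machine. The angle thresholds come from the phase list; the 0.2-second minimum dwell at the bottom is an illustrative stand-in for the time-window condition, not a production value.

```python
from enum import Enum, auto

class Phase(Enum):
    READY = auto()   # standing start
    DOWN = auto()    # descending
    BOTTOM = auto()  # at depth
    UP = auto()      # rising

STAND_ANGLE = 160.0
BOTTOM_ANGLE = 100.0
MIN_BOTTOM_DWELL = 0.2  # seconds; assumed, not the real value

class SquatCounter:
    def __init__(self) -> None:
        self.phase = Phase.READY
        self.reps = 0
        self._bottom_since = 0.0

    def update(self, knee_angle: float, t: float) -> None:
        """Feed one (knee angle, timestamp) sample per video frame."""
        if self.phase is Phase.READY and knee_angle < STAND_ANGLE:
            self.phase = Phase.DOWN
        elif self.phase is Phase.DOWN:
            if knee_angle < BOTTOM_ANGLE:
                self.phase = Phase.BOTTOM
                self._bottom_since = t
            elif knee_angle > STAND_ANGLE:
                self.phase = Phase.READY  # stood back up early: no rep
        elif self.phase is Phase.BOTTOM and knee_angle > BOTTOM_ANGLE:
            if t - self._bottom_since >= MIN_BOTTOM_DWELL:
                self.phase = Phase.UP
            else:
                self.phase = Phase.DOWN  # too brief at depth: a bounce
        elif self.phase is Phase.UP and knee_angle > STAND_ANGLE:
            self.phase = Phase.READY
            self.reps += 1  # COMPLETE: one valid rep

# One clean rep from a stream of (angle, timestamp) samples:
counter = SquatCounter()
for angle, t in [(170, 0.0), (130, 0.3), (95, 0.6), (92, 0.8),
                 (110, 0.9), (150, 1.1), (172, 1.4)]:
    counter.update(angle, t)
print(counter.reps)  # 1
```

Note how the dwell check implements the time-window condition: briefly dipping below 100° and immediately bouncing back up returns the machine to DOWN instead of counting a rep.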

Layer 4: Form Scoring

After each rep, the system calculates a 0–100 score based on:

  • Knee tracking consistency (deviation from toe direction)
  • Torso forward-lean angle stability
  • Left-right symmetry (angle delta between both sides)
  • Movement tempo (descent-to-ascent speed ratio)
  • Whether sufficient depth was achieved

Each dimension's score is weighted and summed into the final total; the higher the score, the closer the rep is to textbook form. Scores below 80 trigger specific improvement cues.
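A toy version of the weighted sum is below. The weights, metric names, and per-rep values are all invented for illustration; Fiwa AI's actual weighting scheme is internal and not published.

```python
# Hypothetical weights for the five dimensions listed above (sum to 1.0):
WEIGHTS = {
    "knee_tracking": 0.25,    # deviation from toe direction
    "torso_stability": 0.20,  # forward-lean consistency
    "symmetry": 0.20,         # left/right angle delta
    "tempo": 0.15,            # descent-to-ascent speed ratio
    "depth": 0.20,            # bottom-position depth
}

def form_score(metrics: dict) -> float:
    """Weighted sum of per-dimension scores, each normalized to 0-100."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

rep = {"knee_tracking": 90, "torso_stability": 85, "symmetry": 95,
       "tempo": 70, "depth": 100}
print(round(form_score(rep), 1))  # 89.0
```

With these made-up numbers the rep scores 89, so it would pass the 80-point bar without triggering a correction cue.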

Why No Video Upload?

The entire pipeline runs locally on your phone: camera → skeleton detection model → angle computation → state machine → scoring. No raw video or image data ever leaves your device. This protects your privacy and means Fiwa AI works fully offline.

What's Next

We're actively researching 3D pose estimation for Fiwa AI — using monocular depth inference to dramatically improve accuracy for lateral movements like side lunges and lateral raises. Stay tuned.