A live integrity tool hands you two things: a score and a timeline of flagged moments. Used well, they make you a sharper interviewer. Used badly, they turn you into a prosecutor working from one data point. The difference is entirely in how you read them.
One flag is not proof
Start with the math, because it is unforgiving. When real cheating is rare in your pool, false positives can outnumber true positives even with a very accurate detector. This is the base-rate problem documented by Tversky and Kahneman: people judge evidence without accounting for how rare the underlying event is. A vivid example: Vanderbilt disabled Turnitin's AI detector after noting that even a 1% false-positive rate, across its 75,000 papers, meant about 750 papers wrongly flagged. The detectors themselves are shaky: OpenAI shut down its own AI-text classifier, citing low accuracy, and in one study GPT detectors misclassified an average of 61% of essays by non-native English writers as AI-generated. Treat any single flag as a question, never an answer.
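To make the base-rate arithmetic concrete, here is a minimal sketch in Python. The Vanderbilt figures (1% and 75,000) come from the text above; the candidate pool numbers (1 in 200 cheating, 95% detection, 1% false positives) are purely illustrative assumptions, not measurements of any real tool.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """Share of flagged people who actually cheated, via Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Vanderbilt's arithmetic: a 1% false-positive rate across 75,000 papers.
wrongly_flagged = round(0.01 * 75_000)

# Hypothetical pool (assumed numbers): 0.5% of candidates cheat, the
# detector catches 95% of them and misfires on 1% of honest candidates.
ppv = positive_predictive_value(prevalence=0.005,
                                sensitivity=0.95,
                                false_positive_rate=0.01)

print(wrongly_flagged)   # 750
print(f"{ppv:.0%}")      # 32%
```

Under those assumed numbers, roughly two out of three flags land on an honest candidate, even though the detector looks excellent on paper.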
Read the signals together, not one "gotcha"
The reliable read comes from convergence. Selection science is explicit that you should weigh converging lines of evidence rather than a single indicator. The test-security field says the same thing about its own anomaly data: it "does not necessarily confirm cheating" and "must be supplemented with other information". A paste spike on its own means little. A paste spike that lines up with a focus switch, an off-screen gaze, and an answer that suddenly outpaces everything before it is a pattern worth a closer look.
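One way to picture convergence is as a search for moments where several different kinds of signal land close together. The sketch below is only an illustration: the `Flag` type, the signal names, the 20-second window, and the three-signal threshold are all assumptions chosen for the example, not how any particular tool works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    t: float   # seconds since the interview started
    kind: str  # hypothetical labels, e.g. "paste", "focus_switch"


def converging_windows(flags, window_s=20.0, min_kinds=3):
    """Return (start, end, kinds) for clusters where at least `min_kinds`
    distinct signal kinds occur within `window_s` seconds of the first."""
    ordered = sorted(flags, key=lambda f: f.t)
    hits = []
    i = 0
    while i < len(ordered):
        start = ordered[i].t
        group = [f for f in ordered[i:] if f.t - start <= window_s]
        kinds = sorted({f.kind for f in group})
        if len(kinds) >= min_kinds:
            hits.append((start, group[-1].t, kinds))
            i += len(group)  # skip past this cluster
        else:
            i += 1           # a lone signal is not a pattern
    return hits


timeline = [
    Flag(95.0, "paste"),             # isolated paste: ignored
    Flag(410.0, "paste"),
    Flag(415.0, "focus_switch"),
    Flag(421.0, "gaze_off_screen"),
]
clusters = converging_windows(timeline)
print(clusters)
```

Here the isolated paste at 95 seconds produces nothing, while the three different signals between 410 and 421 seconds come back as one cluster to review. Even then, the cluster marks a moment worth a closer look, not a verdict.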
Read the timeline in context
A flag is a timestamp plus what was happening around it. Your job is to separate normal behavior from a real pattern.
- A glance away is not a tell. Breaking eye contact is how people think — averting your gaze actually improves recall and accuracy on hard questions. A pause to think looks nothing like sustained reading off a fixed line.
- A flag can be an artifact. Automated proctoring has been shown to flag some groups, notably students with darker skin tones, far more often than others with no underlying difference in cheating. If a signal can fire on appearance or accent, discount it accordingly.
- Read it like a security analyst reads telemetry. Events that look benign alone can matter together — judge each flag against a baseline and across time, not as a single snapshot.
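Judging against a baseline can be as simple as comparing each answer to the candidate's own earlier answers rather than to a fixed threshold. The following is a rough sketch under stated assumptions: the metric (characters typed per minute), the three-answer warm-up, and the 2x ratio are hypothetical choices for illustration.

```python
from statistics import median


def outliers_vs_own_baseline(values, min_history=3, ratio=2.0):
    """Return indices of values that exceed `ratio` times the median of
    the same candidate's earlier values (their personal baseline)."""
    flagged = []
    for i in range(min_history, len(values)):
        baseline = median(values[:i])  # only what came before this answer
        if baseline > 0 and values[i] > ratio * baseline:
            flagged.append(i)
    return flagged


# Hypothetical characters-per-minute for six consecutive answers.
chars_per_min = [180, 210, 195, 200, 640, 190]
unusual = outliers_vs_own_baseline(chars_per_min)
print(unusual)  # [4]
```

Only the fifth answer stands out against this candidate's own pace. A naturally fast typist would set a higher baseline and would not be penalized for it, which is the point of comparing people to themselves.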
Keep due process
Once you act on a flag, the law expects a human to stay in charge. The EU AI Act requires human oversight of high-risk systems that guards against "automation bias," meaning over-reliance on the system's output. The GDPR gives candidates the right to contest a solely automated decision and obtain human intervention. In practice: do not auto-reject on a score. Give the candidate a chance to explain, because a flag can reflect a disability or a tool you did not anticipate; that is why the EEOC and DOJ both stress accommodations and human review. And remember that the employer, not the vendor, owns the decision and the liability.
Turn evidence into a fair decision
NIST frames it cleanly: an AI system can defer to a human or serve as "an additional opinion," and the human-AI loop can amplify bias if you let the score do the deciding. So treat the integrity score as one input alongside the strongest predictor you have, the structured interview (validity of about .42 in Sackett et al.'s revised estimates), and watch for confirmation bias once a flag appears. The score tells you where to look. The conversation tells you what it means. You make the call.
That is the whole point of a timeline: not to accuse, but to give you evidence you can actually reason about. Trueyy is built to surface those signals in context, with the timestamp, so the decision stays yours.
Sources
- Judgment under Uncertainty — Tversky & Kahneman, Science, 1974
- Why we disabled Turnitin's AI detector — Vanderbilt, 2023
- OpenAI scuttles its AI-text detector — TechCrunch, 2023
- GPT detectors are biased against non-native writers — Liang et al., 2023
- Validation of AI-based assessments — SIOP, 2023
- Using data forensics to detect potential fraud — ATP, 2025
- Averting the gaze facilitates remembering — Glenberg et al., 1998
- Disparities in automated proctoring — Yoder-Himes et al., Frontiers in Education, 2022
- EU AI Act, Article 14 (human oversight)
- GDPR, Article 22 (automated decisions)
- NIST AI Risk Management Framework, 2023
- Revised validity of selection methods — Sackett et al., 2022
