Recruiting
REFINe

Reasoning Enhancement With Feedback From a Generative AI in Nephrology (REFINe): A Randomized Evaluation of Generative AI Support in Nephrology Diagnosis

What is being tested

AI suggestion

Other
Who is being recruited

Disease

Pathologic Processes
Pathological Conditions, Signs and Symptoms
Over 18 Years
How is the trial designed

Diagnostic Study

Interventional
Study Start: November 2025

Summary

Principal Sponsor: University Hospital, Lille
Study Contact: Raphaël BENTEGEAC, MD, MPH
Last updated: January 21, 2026
Sourced from a government-validated database.
Study start date: November 20, 2025 (actual date on which the first participant was enrolled)

This study evaluates whether giving clinicians real-time diagnostic suggestions from a high-reasoning large language model (GPT-5) improves diagnostic accuracy, confidence, and efficiency when solving nephrology clinical vignettes. Before selecting the model for the trial, the research team benchmarked several state-of-the-art models on a pilot set of nephrology cases: GPT-5, GPT-5-mini, o3, GPT-4o, Llama-4 Maverick-17B, Gemini-2.5-Pro, Qwen-3 VL-235B Thinking, DeepSeek-V3.2-Exp, MedGemma-27B, Claude Sonnet-4.5, and Magistral-Medium-2509. GPT-5 (high-reasoning configuration) showed the highest diagnostic performance, stability, and interpretability, and was selected as the AI system for the intervention arm.

Participants include medical students, residents, fellows, and practicing physicians. After creating an account, participants complete a demographic questionnaire covering specialty, years of experience, practice type, age category, and AI familiarity. They must explicitly agree to the research use of these data before accessing the vignettes. No directly identifying information is collected. Participants are randomized, with stratification by professional status, to either the AI-supported arm or the control arm. Each participant is assigned 10 nephrology vignettes in French or English and may complete them over multiple sessions. Once a vignette is submitted, it cannot be revisited ("no backtracking"). Completion time per vignette is recorded automatically.

Control arm: participants view each vignette and provide up to three diagnoses ("Top-3"), followed by a confidence rating (0-10).

AI-supported arm: participants first enter an initial Top-3 diagnosis and confidence rating without AI assistance. The system then displays GPT-5's diagnostic suggestions, after which participants may revise their diagnoses once. The vignette is locked after submission.
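The listing states that participants are randomized with stratification by professional status, but it does not say how the allocation sequence is generated. One common approach is permuted-block randomization within each stratum. The sketch below assumes that method and a block size of 4. The names `StratifiedBlockRandomizer`, `assign`, and the arm labels `"control"` / `"ai_supported"` are hypothetical. It illustrates the technique and is not the investigators' implementation.

```python
import random
from collections import defaultdict

# Hypothetical arm labels for the two arms named in the listing.
ARMS = ("control", "ai_supported")


class StratifiedBlockRandomizer:
    """Permuted-block randomization run separately within each stratum.

    The block size (4 = two of each arm) is an illustrative assumption;
    the registry entry does not state the block size or method.
    """

    def __init__(self, block_size=4, seed=None):
        if block_size % len(ARMS) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        # One queue of pending assignments per stratum (e.g. "resident").
        self.pending = defaultdict(list)

    def _new_block(self):
        # Each block contains every arm equally often, in shuffled order.
        block = list(ARMS) * (self.block_size // len(ARMS))
        self.rng.shuffle(block)
        return block

    def assign(self, professional_status):
        """Return the arm for the next participant in this stratum."""
        queue = self.pending[professional_status]
        if not queue:
            queue.extend(self._new_block())
        return queue.pop(0)


if __name__ == "__main__":
    randomizer = StratifiedBlockRandomizer(seed=2025)
    for status in ["student", "resident", "student", "physician"]:
        print(status, randomizer.assign(status))
```

Each stratum keeps its own queue of shuffled blocks. As a result, the two arms are exactly balanced within every professional-status group after each completed block, which is the property stratification is meant to guarantee.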
The study collects:
* initial and final diagnoses,
* confidence ratings before and (if applicable) after AI suggestions,
* completion times,
* participant demographic variables,
* and the AI model's own diagnostic outputs.

Partial completion is permitted; all completed vignettes contribute to the analysis. Primary and secondary outcomes include:
* diagnostic accuracy (Top-3 and Top-1),
* accuracy improvement before vs. after AI,
* changes in diagnostic confidence,
* AI-induced diagnostic errors,
* human-versus-AI benchmarking,
* completion-time efficiency metrics,
* and the proportion of assigned vignettes completed.

The primary analysis will compare diagnostic accuracy between the control arm (physicians alone) and the experimental arm (physicians assisted by the AI model). Accuracy is analyzed as a binary outcome (correct vs. incorrect diagnosis). Because each participant evaluates multiple clinical vignettes, accuracy will be modeled with a mixed-effects logistic regression. The model includes a fixed effect for study arm and random intercepts for both participant and vignette, which accounts for clustering within participants and for varying difficulty across cases. The primary hypothesis test is two-sided with α = 0.05, and effect sizes will be reported as odds ratios with 95% confidence intervals. Secondary analyses will use interaction terms to explore whether accuracy varies by demographic factors (e.g., experience level, specialty).

For sample-size planning, the team performed simulation-based power analyses using mixed-effects logistic regression models with random intercepts for both participant and vignette, assuming an intra-participant ICC of 0.10. Under these assumptions, a total of 100 participants (50 per arm), each completing 10 vignettes, provides >99% power to detect a clinically meaningful improvement in diagnostic accuracy. The investigators therefore plan to enroll approximately 100 participants overall.
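A simulation-based power analysis of this kind can be sketched as follows. This is a simplified illustration, not the investigators' code. Data are generated from a logistic model with random intercepts for participant and vignette. However, the arm effect is tested with a Welch-type z-test on per-participant accuracy rather than by refitting a mixed-effects logistic regression, which would require a dedicated mixed-model library. The ICC of 0.10 is converted to a participant random-intercept SD with the latent-scale formula ICC = σ²/(σ² + π²/3), ignoring vignette variance; this is an assumption about how the ICC was defined. The inputs `baseline_logit`, `vignette_sd`, and `odds_ratio` are hypothetical. The listing does not report the effect size used, so this sketch does not reproduce the >99% figure.

```python
import math
import random
from statistics import NormalDist, mean, variance


def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def participant_sd_from_icc(icc):
    """Latent-scale random-intercept SD for a logistic model.

    Assumes ICC = s2 / (s2 + pi^2 / 3), ignoring vignette variance.
    """
    return math.sqrt(icc / (1.0 - icc) * math.pi ** 2 / 3.0)


def simulate_power(odds_ratio, baseline_logit=0.0, icc=0.10, vignette_sd=1.0,
                   n_per_arm=50, n_vignettes=10, n_sims=200, alpha=0.05, seed=0):
    """Estimate power as the share of simulated trials that reject H0.

    Data-generating model (participant i, vignette j):
        logit P(correct_ij) = baseline_logit + log(odds_ratio) * arm_i + u_i + v_j
    Test: Welch-type z-test on per-participant accuracy (a simplified
    stand-in for the mixed-effects logistic regression).
    """
    rng = random.Random(seed)
    beta_arm = math.log(odds_ratio)
    sd_u = participant_sd_from_icc(icc)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(n_sims):
        # Vignette difficulty is shared by all participants in one simulated trial.
        v = [rng.gauss(0.0, vignette_sd) for _ in range(n_vignettes)]
        accuracy = {0: [], 1: []}
        for arm in (0, 1):
            for _ in range(n_per_arm):
                u = rng.gauss(0.0, sd_u)  # participant random intercept
                correct = sum(
                    rng.random() < expit(baseline_logit + beta_arm * arm + u + v_j)
                    for v_j in v
                )
                accuracy[arm].append(correct / n_vignettes)
        diff = mean(accuracy[1]) - mean(accuracy[0])
        se = math.sqrt(variance(accuracy[0]) / n_per_arm
                       + variance(accuracy[1]) / n_per_arm)
        if se > 0 and abs(diff / se) > z_crit:
            rejections += 1
    return rejections / n_sims
```

For example, `simulate_power(odds_ratio=2.0)` estimates power for a doubling of the odds of a correct diagnosis under these assumptions. Increasing `n_sims` reduces Monte Carlo error.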
This study aims to quantify whether AI-augmented reasoning meaningfully improves diagnostic performance and decision-making when clinicians evaluate complex nephrology cases.

Official Title: Reasoning Enhancement With Feedback From a Generative AI in Nephrology (REFINe): A Randomized Evaluation of Generative AI Support in Nephrology Diagnosis

Protocol

This section provides details of the study plan, including how the study is designed and what the study is measuring.
Design Details
100 participants to be enrolled (total number of participants the trial aims to recruit)
Diagnostic Study
Diagnostic studies focus on improving how we detect or confirm a disease. They test new tools or techniques that could provide faster or more accurate diagnoses.

How participants are assigned to different groups/arms
In this clinical study, participants are placed into groups randomly, like flipping a coin. This ensures that the study is fair and unbiased, making the results more reliable. By assigning participants by chance, researchers can better compare treatments without external influences.

Other Ways to Assign Participants
Non-randomized allocation
: Participants are assigned based on specific factors, such as their medical condition or a doctor's decision.

None (Single-arm trial)
: If the study has only one group, all participants receive the same treatment, and no allocation is needed.

How treatments are given to participants
Participants are divided into different groups, each receiving a specific treatment at the same time. This helps researchers compare how well different treatments work against each other.

Other Ways to Assign Treatments
Single-group assignment
: Everyone gets the same treatment.

Cross-over assignment
: Participants switch between treatments during the study.

Factorial assignment
: Participants receive different combinations of treatments.

Sequential assignment
: Participants receive treatments one after another in a specific order, possibly based on individual responses.

Other assignment
: Treatment assignment does not follow a standard or predefined design.

How the effectiveness of the treatment is controlled
In a non-placebo-controlled study, no participants receive an inert substance (placebo) for comparison. Instead, all participants receive either the experimental treatment or an alternative treatment (often the standard of care). This design compares the effects of the experimental treatment with those of a different active intervention rather than a placebo.

Other Options
Placebo-Controlled
: A placebo is used to compare the effects of the experimental treatment with those of an inert substance, isolating the true treatment effect.

How the interventions assigned to participants are kept confidential
Everyone involved in the study knows which treatment is being given. This is typically used when it's not possible or necessary to hide the treatment details from participants or researchers.

Other Ways to Mask Information
Single-blind
: Participants do not know which treatment they are receiving, but researchers do.

Double-blind
: Neither participants nor researchers know which treatment is given.

Triple-blind
: Participants, researchers, and outcome assessors do not know which treatment is given.

Quadruple-blind
: None of the participants, researchers, outcome assessors, or care providers know which treatment is given.

Eligibility

Researchers look for people who fit a certain description, called eligibility criteria, such as a person's general health condition or prior treatments.
Any sex (biological sex of participants who are eligible to enroll)
Over 18 Years (range of ages for which participants are eligible to join)
Healthy volunteers allowed (individuals who are healthy and do not have the condition being studied can participate)
Conditions
Pathology
Disease
Pathologic Processes
Pathological Conditions, Signs and Symptoms
Criteria

Inclusion Criteria:
* Adults aged 18 years or older.
* Able to read and answer clinical vignettes in English or French.
* Access to a computer or smartphone with an internet connection.
* Provides informed consent online.
* Participants are expected to have at least basic medical training (e.g., medical students, residents, fellows, or practicing clinicians), although no formal verification is required.

Exclusion Criteria:
* Individuals under 18 years of age.
* Inability to complete online study procedures.
* Prior involvement in the design, development, or evaluation of the AI system used in this study.


Study Plan

Find out more about the interventions used in this study, their detailed description, and what they involve.
One intervention group (the AI-supported arm) is designated in this study; the other arm is a control arm without AI assistance.

This study does not include a placebo group.

Treatment Groups
Group I
Experimental
Participants in this arm will complete the same clinical case vignettes as the control group. For each case, they will receive a suggested diagnosis generated by a large language model (GPT-5, high-reasoning configuration), which was selected after internal benchmarking. Participants can review the AI suggestion before entering their own final diagnostic answer. No additional information, prompts, or coaching is provided. The intervention consists solely of displaying the AI-generated diagnostic suggestion during the case-solving task.

This intervention consists of displaying an AI-generated diagnostic suggestion during the clinical case-solving task. After reading each vignette, participants see the top diagnostic proposal produced by a large language model (GPT-5, high-reasoning configuration), selected after internal benchmarking. The AI suggestion appears once per vignette and cannot be requested again or modified. Participants may revise their diagnostic answer after viewing the suggestion, but they cannot return to the vignette later. No additional guidance, coaching, or interactive features are provided.
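The per-vignette answer flow described above can be modeled as a small state machine. In the AI-supported arm, participants give an initial Top-3 and confidence rating, see the AI suggestion once, revise once, and the vignette then locks with no backtracking. The class and method names below (`VignetteSession`, `submit_initial`, `show_ai_suggestion`, `submit_final`) are hypothetical. They only illustrate the rules stated in this listing and do not describe the study platform's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VignetteSession:
    """Hypothetical model of one vignette's answer flow.

    AI arm:      initial Top-3 -> AI suggestion (shown once) -> one revision -> locked
    Control arm: Top-3 -> locked
    """
    ai_arm: bool
    initial_top3: Optional[List[str]] = None
    initial_confidence: Optional[int] = None
    ai_suggestion: Optional[str] = None
    final_top3: Optional[List[str]] = None
    final_confidence: Optional[int] = None
    locked: bool = False

    def _check_answer(self, top3, confidence):
        if self.locked:
            raise RuntimeError("vignette is locked (no backtracking)")
        if not 1 <= len(top3) <= 3:
            raise ValueError("provide between one and three diagnoses")
        if not 0 <= confidence <= 10:
            raise ValueError("confidence must be between 0 and 10")

    def submit_initial(self, top3, confidence):
        self._check_answer(top3, confidence)
        if self.initial_top3 is not None:
            raise RuntimeError("initial answer already submitted")
        self.initial_top3, self.initial_confidence = list(top3), confidence
        if not self.ai_arm:
            # Control arm: the first answer is final and the vignette locks.
            self.final_top3, self.final_confidence = list(top3), confidence
            self.locked = True

    def show_ai_suggestion(self, suggestion):
        if not self.ai_arm or self.initial_top3 is None:
            raise RuntimeError("AI suggestion only follows an initial AI-arm answer")
        if self.ai_suggestion is not None:
            raise RuntimeError("AI suggestion is shown only once")
        self.ai_suggestion = suggestion

    def submit_final(self, top3, confidence):
        # The single permitted revision; it may repeat the initial answer.
        self._check_answer(top3, confidence)
        if self.ai_suggestion is None:
            raise RuntimeError("final answer requires the AI suggestion first")
        self.final_top3, self.final_confidence = list(top3), confidence
        self.locked = True
```

In the control arm, the first submission is final, so any later call raises an error. In the AI-supported arm, the vignette locks only after the revised answer is submitted.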
Study Objectives
Primary Objectives

Final Top-3 diagnostic accuracy: for each participant, the proportion of vignettes in which the correct main diagnosis is included in the participant's final Top-3 diagnoses. Final Top-3 accuracy is compared between the AI arm (after AI suggestions) and the control arm (no AI). Unit: percentage of correctly diagnosed cases (Top-3).
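The primary outcome can be computed per participant as sketched below, under two labeled assumptions. First, diagnoses are matched by normalized exact string comparison; the listing does not say how answers are adjudicated against the reference diagnosis. Second, the denominator is the participant's completed vignettes, following the statement that partial completion is permitted and all completed vignettes contribute. The function names are hypothetical. Setting `k=1` in `is_top_k_correct` gives the Top-1 accuracy also mentioned among the study outcomes.

```python
from typing import Dict, List


def normalize(diagnosis: str) -> str:
    """Case- and whitespace-insensitive form used for matching (a simplification)."""
    return " ".join(diagnosis.lower().split())


def is_top_k_correct(answer: List[str], reference: str, k: int = 3) -> bool:
    """True if the reference diagnosis appears among the first k listed diagnoses."""
    target = normalize(reference)
    return any(normalize(d) == target for d in answer[:k])


def participant_top3_accuracy(final_answers: Dict[str, List[str]],
                              references: Dict[str, str]) -> float:
    """Fraction of a participant's completed vignettes whose final Top-3
    contains the correct main diagnosis (a value in [0, 1])."""
    if not final_answers:
        raise ValueError("participant has no completed vignettes")
    hits = sum(is_top_k_correct(answer, references[vignette_id])
               for vignette_id, answer in final_answers.items())
    return hits / len(final_answers)
```

Multiply the returned fraction by 100 to express it as the percentage of correctly diagnosed cases.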

Study Centers

These are the hospitals, clinics, or research facilities where the trial is being conducted. You can find the location closest to you and its status.
This study has 1 location
Recruiting
Lille University Hospital (online study), Lille, France
