Does weaker research make weaker claims?: Towards automated detection of linguistic hedging

Questions – What is the relation between the quality of a scientific study, the decisiveness of its results, and the linguistic expression of (un)certainty in its scientific claims? What are the boundaries of exceptionally little hedging?

Data and Methods – “Hedging” refers to modifying the strength of claims. Natural language processing (NLP) can be used to quantify the amount of hedging in a text. Weaker research should make weaker claims; we therefore expected strong associations between study methods and findings on the one hand and hedging of claims on the other. We extracted data from 100 publications on RCTs taken from Cochrane reviews and assessed whether study quality and the magnitude and statistical precision of the main findings were associated with hedging scores. We assessed RCT quality using the ‘Risk of Bias’ (ROB) tool. We extracted outcome data on main results and their precision. NLP software determined hedging scores, corrected for word count. Hedging words, or combinations of them, were assigned a weight between 1 (weak hedge) and 5 (strong hedge). A hedging score of 0.03 means that, per 100 words, the weights of the hedges encountered sum to 3: for example, 3 hedges with weight 1 or 1 hedge with weight 3. We determined the 10th centile reference values for the hedging scores and assessed their relation to quality and decisiveness of results.
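
The word-count correction described above can be illustrated with a minimal sketch. The lexicon, its weights, and the function name `hedging_score` are hypothetical and are not taken from the NLP software used in the study; the sketch also matches single words only, whereas the study also weighted combinations of hedging words.

```python
import re

# Hypothetical lexicon: hedge word -> weight from 1 (weak) to 5 (strong).
# These cues and weights are illustrative assumptions, not the study's lexicon.
HEDGE_WEIGHTS = {
    "may": 1,
    "suggest": 2,
    "possibly": 3,
}


def hedging_score(text, weights=HEDGE_WEIGHTS):
    """Return the sum of hedge weights divided by the number of words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    total_weight = sum(weights.get(word, 0) for word in words)
    return total_weight / len(words)


# 100 words containing three weight-1 hedges.
three_weak = " ".join(["trial"] * 97 + ["may"] * 3)
# 100 words containing one weight-3 hedge.
one_strong = " ".join(["trial"] * 99 + ["possibly"])

print(hedging_score(three_weak))
print(hedging_score(one_strong))
```

Under these assumptions both example texts receive the same score, which mirrors the interpretation of a score of 0.03 given in the abstract.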

Results – We analyzed 98 RCTs published between 2005 and 2013. Word counts varied between 1,061 and 6,523 (mean 3,544). Word-count-corrected hedging scores varied between 0.021 and 0.075 (mean 0.045). The proportion of ROB items fulfilled varied between 0% and 100% (mean 56%). Hedging was not associated with ROB. Word-count-corrected hedging scores below 0.025 (at a ROB score of 0%) and below 0.032 (at a ROB score of 100%) seem exceptionally low (below the 10th centile) and may be a reason to check for overstatement of claims in future trial reports.
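
As a rough illustration of how such reference values might be used to screen a manuscript, the sketch below flags scores under a ROB-dependent threshold. The abstract reports only the two endpoint values (0.025 at a ROB score of 0 and 0.032 at a ROB score of 100); the linear interpolation between them, and the function names, are assumptions made here for illustration and may not match how the reference values were actually derived.

```python
def low_hedging_threshold(rob_percent, at_rob_0=0.025, at_rob_100=0.032):
    """Approximate 10th-centile hedging score for a given ROB score (0-100).

    Assumption: the threshold changes linearly between the two reported
    endpoints. Only the endpoints themselves come from the abstract.
    """
    if not 0 <= rob_percent <= 100:
        raise ValueError("rob_percent must lie between 0 and 100")
    return at_rob_0 + (at_rob_100 - at_rob_0) * rob_percent / 100


def is_exceptionally_low(score, rob_percent):
    """Return True if the hedging score falls below the assumed threshold."""
    return score < low_hedging_threshold(rob_percent)


# The lowest and the mean hedging score reported above, at the mean ROB score.
print(is_exceptionally_low(0.021, 56))
print(is_exceptionally_low(0.045, 56))
```

A flag from a check like this would only be a prompt to reread the claims for overstatement, not evidence of spin in itself.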

Implications – Automated detection of overstatement and spin seems useful for authors and editors of manuscripts, but it requires extension of this work. The absence of an association between hedging and either study quality or strength of findings suggests that authors may insufficiently adapt the strength of their claims to important study characteristics. Limitations: our results were obtained in RCTs, the assignment of hedging scores by the NLP software is somewhat subjective, and we focused solely on the primary outcome of each trial.

Location: Faculty of Electrical Engineering and Computing
Date: September 21, 2017
Time: 12:40 pm - 1:00 pm
Gerben Ter Riet, Sufia Amini, Lotty Hooft, Halil Kilicoglu