“Reasonable chance” has no hard quantitative definition to my knowledge, and I realize I violated my own caveat about using Kent’s estimation language by saying that. So there’s really no solid platform for discussing it, because I messed that up by even using that language. :D

Edit: Turns out I don’t use that language? LMK what you are referencing.

I don’t fully understand the second part of your comment. Here’s my response given my interpretation; happy to adjust if I got it wrong.

Given the “88%” likelihood, the panel concedes that they will be wrong 12% of the time. The only real thing to measure is calibration: how closely their accuracy (the % correct) matches their stated confidence (the % they believe they’ll be correct). If that’s what you’re saying: yes! There is nothing to measure on track record, because there isn’t one yet.

Going through this link will make this point of calibration very clear.

If they have never been wrong at 70% or more confidence, then that is an underconfidence problem and would demand work towards recalibration. A 70% forecast should fail about 30% of the time. So my reading of your comment is that it confuses track record with calibration, sort of like the Anchorman joke of “60% of the time it works every time,” but not entirely. In this case, if 70% certainty always turns out correct, there’s a larger problem(?).
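
To make the calibration point concrete, here is a rough Python sketch of how I’d check it. The function name `calibration_table` and the forecast data are made up for illustration; nothing here comes from the panel’s actual record (there isn’t one yet). It just groups forecasts by stated confidence and compares that with how often they came true.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Compare stated confidence with the observed hit rate.

    forecasts: iterable of (stated_confidence, was_correct) pairs,
    e.g. (0.7, True). Returns {confidence: (n, observed_rate, gap)},
    where gap = observed_rate - confidence.
    """
    buckets = defaultdict(list)
    for confidence, was_correct in forecasts:
        buckets[confidence].append(1 if was_correct else 0)

    table = {}
    for confidence, outcomes in sorted(buckets.items()):
        observed = sum(outcomes) / len(outcomes)
        table[confidence] = (len(outcomes), observed, observed - confidence)
    return table


# Hypothetical data (an assumption, not real forecasts):
# ten 70% forecasts that all came true, and ten 90% forecasts with nine hits.
made_up = [(0.7, True)] * 10 + [(0.9, True)] * 9 + [(0.9, False)]

table = calibration_table(made_up)
for confidence, (n, observed, gap) in table.items():
    print(f"stated {confidence:.0%}  n={n}  observed {observed:.0%}  gap {gap:+.0%}")
```

A positive gap means the forecaster was right more often than they claimed (underconfident); a negative gap means overconfident. With only a handful of forecasts per bucket the gap is mostly noise, so this only becomes meaningful once there is a real track record to feed it.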

That said, given enough telemetry to supply a model, yes, this stuff can be modeled automatically. However, there will always be a forecaster to adjust a complex model. Otherwise we wouldn’t have the field of operational meteorology: even data-rich models of complex issues need to be adjusted routinely by a human.

This is overall a recent subject for me, so I’m happy to be pointed towards other arguments, and I’m not really speaking with authority here.

