One of the impressive recent accomplishments of artificial intelligence (AI) is the use of AlphaFold to predict the three-dimensional structure of proteins. Proteins are composed of amino acid chains that fold in a specific way into a three-dimensional structure, and that structure is key to the resulting function of the protein. Knowing how proteins fold is thought to be a key step in drug development and associated therapeutics.
Prior to AI prediction of protein folding, painstaking experimental work identified protein structures at a rate of roughly one per five years for a given researcher. For example, whole PhD dissertations have been written on the structure of a single protein. So, although there are hundreds of millions of known proteins, only about 100,000 have documented experimental evidence of their folding. Hence, the "protein folding problem" became one of the grand challenges in biological research, a challenge taken up by computational approaches.
One interesting feature of the evolution of AlphaFold was that the learning data set on which it was based was rather small by AI standards: roughly those 100,000 records of experimentally validated three-dimensional protein structures.
The building of AlphaFold started with training on those 100,000 or so verified structures and then assessing how the model performed. It did well, but not that well. However, for each output of that first stage of AlphaFold development, the model also produced a confidence measure, based on its assessment of how likely it was that the given structure was indeed correct. If I understand this correctly, the confidence measure was a function of the model-based variance, as well as the level of match between various features of the predicted structure and those of other known proteins. Those predicted protein structures (along with their confidence scores) were added to the learning data base of about 100,000 verified structures, and a new learning cycle was conducted. By July 2022, AlphaFold had predicted rather accurately the structures of over 100 million proteins!
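As I understand it, the cycle described above resembles what machine-learning researchers call self-distillation. Here is a minimal sketch of that loop in Python; every name in it (train_model, predict_with_confidence, the 0.9 threshold) is a hypothetical stand-in for illustration, not AlphaFold's actual code or API:

```python
import random

CONFIDENCE_THRESHOLD = 0.9  # hypothetical cutoff for recycling a prediction

def train_model(training_set):
    # Placeholder for fitting a structure-prediction model.
    return {"n_training_examples": len(training_set)}

def predict_with_confidence(model, sequence):
    # Placeholder returning a predicted structure plus a confidence in [0, 1].
    return f"predicted_structure_of_{sequence}", random.random()

def self_distill(verified_structures, unlabeled_sequences, rounds=2):
    """Grow the training set with the model's own high-confidence predictions."""
    training_set = list(verified_structures)  # the ~100,000 solved structures
    model = train_model(training_set)
    for _ in range(rounds):
        for seq in unlabeled_sequences:
            structure, confidence = predict_with_confidence(model, seq)
            if confidence >= CONFIDENCE_THRESHOLD:
                # Only confident predictions are recycled as training data.
                training_set.append((seq, structure))
        model = train_model(training_set)  # new learning cycle on enlarged set
    return model
```

The key design choice is the confidence gate: only predictions the model itself trusts are fed back, which is what keeps the enlarged training set from being polluted by its own errors.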
Having confidence measures is a valued property of scientific evidence and of many other knowledge domains. Such a measure presents both the conclusion/answer and the confidence that the scientist has in that answer. But apparently, confidence measures are currently not a common automatic feature of large language models (LLMs).
For example, a question to ChatGPT about how confident it was in an answer about AlphaFold yielded, “As an AI language model, I don’t possess emotions or personal opinions, so I don’t experience confidence in the same way humans do.”
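That said, LLMs do compute probabilities internally, and one common (if imperfect) proxy for confidence is the probability the model assigned to the tokens of its own answer. The sketch below, with made-up numbers, illustrates only the aggregation idea; it is not a feature ChatGPT exposes:

```python
import math

def sequence_confidence(token_probs):
    """Geometric mean of per-token probabilities: one crude answer-level score."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(log_sum / len(token_probs))

# Hypothetical per-token probabilities for one generated answer; in practice
# they would be read off the model's output logits.
probs = [0.97, 0.88, 0.99, 0.61, 0.95]
print(f"answer-level confidence ~ {sequence_confidence(probs):.2f}")
```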
The literature in cognitive and social psychology has examined self-reports about a person's confidence. It's a large literature with interesting twists and turns, but there seem to be some common findings. Self-confidence varies across persons, with some personality traits being relevant. Social context tends to affect perceived confidence, with the views of others shaping our own. Some confidence measures seem to be related to self-consistency as well as to distance from some external truth. In that regard, confidence reports also reflect the prior expectations of the person making the assessment. But the correlations between confidence measures and truth seem to be highly variable.
Self-reported confidence among humans thus appears to be difficult to measure and a fallible state at best. In many contexts we're not very good at generating confidence assessments that are highly correlated with how correct our judgments actually are. It would be an interesting development if AI could be used to assist humans in assessing the confidence that a given solution or judgment deserves.
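To make "correlated with how correct our judgments actually are" concrete: calibration researchers typically bin judgments by stated confidence and compare each bin's average confidence to its observed accuracy. A toy sketch, with fabricated numbers purely for illustration:

```python
def calibration_table(confidences, correct, n_bins=5):
    """Group judgments by stated confidence; compare each bin to its accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    for group in bins:
        if group:
            mean_conf = sum(c for c, _ in group) / len(group)
            accuracy = sum(ok for _, ok in group) / len(group)
            print(f"stated {mean_conf:.2f} -> observed {accuracy:.2f} (n={len(group)})")

# Fabricated illustration: each judgment's stated confidence and whether it was right.
stated = [0.95, 0.90, 0.80, 0.85, 0.60, 0.55, 0.70, 0.30, 0.40, 0.75]
right  = [1,    1,    0,    1,    1,    0,    0,    0,    1,    1]
calibration_table(stated, right)
```

A well-calibrated judge would show the two columns roughly matching; the psychology literature above suggests humans often don't.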
It's amazing stuff. The AI algorithms used in AlphaFold get "trained" using the entire atomic-resolution PDB database of every known protein three-dimensional structure determined experimentally by X-ray or cryo-EM methods. This is computationally intense, yet labs at Georgetown can do this for an arbitrary linear protein sequence in under an hour using custom-built servers. Yet AlphaFold is not homology modelling; there is an interesting paper in this issue of P.N.A.S. that emphasizes how AlphaFold is very different from homology modelling:
More than just pattern recognition: Prediction of uncommon protein structure features by AI methods
Osnat Herzberg and John Moult. AlphaFold is truly revolutionary, but confidence in the method (as is the case for any new method in science) will take time to build in the scientific community. I think scientists have to use it and stare at the data before they fully respect it. Ditto for any other application of AI algorithms. It is not my field, but I wonder how often people build confidence based on "homology modelling" relative to other people's behavior versus more computationally intense de novo thought patterns derived from a training set, the way AI does things.
Best
Paul
Evaluating human self-confidence!! That'll be very interesting. I'm confident AI might help us?!?! Whole new world we live in. We need to learn to have enough confidence to meet new challenges! Just a human thought.