
I’ve combined text embeddings for both lithologies and minerals, generated from word co-occurrences within thousands of geological reports, in a 3D t-SNE plot. Following on from some of my recent posts, it may be interesting to explore the cosine similarity between the vectors for lithology-lithology, mineral-mineral and lithology-mineral pairs.
This is a technique anyone can apply to large volumes of reports. I’ve spotted some potentially interesting associations which I’m currently researching.
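To make the similarity measure concrete, here is a minimal sketch of ranking terms by cosine similarity. The vectors are invented 4-dimensional toy values, not output from my reports, and the names `embeddings`, `cosine_similarity` and `most_similar` are hypothetical helpers used only for illustration.

```python
import math

# Toy vectors invented for illustration; real embeddings would have
# hundreds of dimensions and come from a trained model.
embeddings = {
    "scheelite": [0.9, 0.1, 0.3, 0.0],
    "molybdenite": [0.8, 0.2, 0.4, 0.1],
    "marl": [0.0, 0.9, 0.1, 0.7],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(term, vectors, top_n=5):
    """Return the top_n other terms ranked by similarity to `term`."""
    query = vectors[term]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


for name, score in most_similar("scheelite", embeddings):
    print(f"{name}: {score:.3f}")
```

With these toy values, molybdenite ranks well above marl for scheelite, which is the kind of ordering the plot shows visually as proximity.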
I have also replicated the scheelite-molybdenite mineral-mineral association reported by Lawley et al. (2022) from Natural Resources Canada (NRC) in their paper “Geoscience language models and their intrinsic evaluation”, using a completely different collection of reports.
They state:
“Word embeddings provide a powerful framework for evaluating and predicting mineral groups based on thousands of observations in nature from multiple trained observers over time. Minerals from disparate classification groups that plot close together provide intriguing evidence for associations that require re-examination (e.g., the lesser known association between scheelite and molybdenite in porphyry-skarn mineral systems).”
My next step is to develop an algorithm that automatically scans such data-driven arrays to detect candidate pairs of closely associated entities whose association may not be well known. The algorithm would incorporate some geological model knowledge in the form of rules. For example, it would discount ‘obvious’ associations, both within and between entity types, where the structural, depositional or diagenetic/mineralization mechanism is well known, e.g. clays-marls, copper-zinc, chalcopyrite-galena, black shale-pyrite. The approach therefore combines data-driven and geological model techniques.
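As a rough sketch of how such a filter might look (this is not the finished algorithm, just one possible shape for it): score every pair by cosine similarity, keep those above a threshold, and drop pairs that appear in a hand-curated list of well-known associations. The toy vectors, the `KNOWN_ASSOCIATIONS` set, the 0.8 threshold and the function names are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "scheelite": [0.9, 0.1, 0.3],
    "molybdenite": [0.8, 0.2, 0.4],
    "chalcopyrite": [0.1, 0.9, 0.2],
    "galena": [0.2, 0.8, 0.1],
    "black shale": [0.1, 0.2, 0.9],
    "pyrite": [0.2, 0.3, 0.8],
}

# Rule base of 'obvious' pairs to discount (assumed, hand-curated).
KNOWN_ASSOCIATIONS = {
    frozenset(("chalcopyrite", "galena")),
    frozenset(("black shale", "pyrite")),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_candidates(vectors, known, threshold=0.8):
    """Return high-similarity pairs that are not in the known-association list."""
    candidates = []
    for a, b in combinations(sorted(vectors), 2):
        if frozenset((a, b)) in known:
            continue  # discount associations the rule base already explains
        score = cosine_similarity(vectors[a], vectors[b])
        if score >= threshold:
            candidates.append((a, b, score))
    # Strongest associations first
    return sorted(candidates, key=lambda item: item[2], reverse=True)


for a, b, score in find_candidates(vectors, KNOWN_ASSOCIATIONS):
    print(f"{a} / {b}: {score:.3f}")
```

In practice the rule base would need to be richer than a flat list of pairs, for example rules keyed on entity type or mineral system, but the same filter-then-rank structure would apply.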
As I mentioned in my previous post, I don’t have a specific question in mind. I’m just inductively exploring the data visualisation generated from millions of sentences, more than I could ever read, to see what catches the eye.