A Geodyssey – Enterprise Search Discovery, Text Mining, Machine Learning, 13 February
Text Embeddings for Rock Classifications

 

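By analysing a large number of geological reports, the article uses a text embeddings model generated through unsupervised machine learning to compare approximately 2,000 rock type names, explores their differences and associations, and offers some preliminary ideas and factors to consider.

Uses a text embeddings model to compare approximately 2,000 rock types

Some rock types, such as salts and volcanics, can be differentiated by the similarity of their surrounding words

Proposes a preliminary idea of building a geological embedding 'baseline'

Notes factors to consider, such as how generalisable the model is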

I tested whether rock types and their associations can be differentiated based on the patterns of words that occur around them in large archives of geological reports. Using a text embeddings model generated through unsupervised machine learning from thousands of geological survey reports, I compared approximately 2,000 rock type names to each other. The dimensionality of the embeddings was then reduced with t-SNE so the results could be plotted.
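To make the idea of comparing rock names by their surrounding words concrete, here is a minimal sketch. It is not the model used for the post: it swaps the learned embeddings for simple counts of neighbouring words, and the sentences in `REPORTS`, the `window` size and the function names are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy sentences standing in for text from geological reports.
REPORTS = [
    "the halite bed is interbedded with anhydrite in the evaporite sequence",
    "thick anhydrite and halite layers occur in the evaporite basin",
    "the basalt flow overlies tuff from an earlier volcanic eruption",
    "welded tuff and basalt lava record a volcanic eruption",
]

ROCK_TYPES = ["halite", "anhydrite", "basalt", "tuff"]


def context_vectors(sentences, targets, window=3):
    """Count the words found within `window` tokens of each target term."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token in targets:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = context_vectors(REPORTS, set(ROCK_TYPES))

# Print the similarity of every pair of rock type names.
for i, first in enumerate(ROCK_TYPES):
    for second in ROCK_TYPES[i + 1:]:
        print(f"{first:10s} {second:10s} {cosine(vectors[first], vectors[second]):.2f}")
```

On this toy input the two salts score closer to each other than to the volcanics, and vice versa. A learned embeddings model does something similar at scale, producing dense vectors that a method such as t-SNE can then project for plotting.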

Some rock types, such as salts, volcanics, organics and glacial deposits, appear to be clearly differentiated by the similarity of the words surrounding their names. So do igneous intrusions and metamorphic classifications, although these two overlap in places. Carbonates seem well differentiated, with a split in which some types are closely associated with organics. Extra-terrestrial rocks appear split, with micrometeorites perhaps differentiated from the rest, which requires further investigation.

Clastics seem split into two. At first glance, this may be differentiating superficial deposits, unconsolidated sediments and gravels (the left-hand group) from the larger clastic group on the right. Mudrocks (including fine-grained clastics) appear split into four: one group closely associated with organics, one with carbonates, and two associated with the two clastic groups just mentioned.

There is some overlap, or lack of differentiation, in the middle among many rock types, and there are plenty of intriguing ‘outliers’ to investigate and try to understand.

I don’t have a specific use case, question or problem in mind. I’m just inductively exploring the data, seeing what may be of interest and thinking about what use cases might emerge, if any. Each data point is a specific rock type classification (e.g. peridotite, schist, sandstone, marl, tuff); I have not labelled them on the chart, for readability.

One very early, poorly formed idea might be to have a geological embedding ‘baseline’ against which to revisit old reports or compare newly described rocks in reports. If these new occurrences plot well outside existing similarity tolerances built from vast collections, it might point to something worth examining, such as a potential misclassification, novel associations or something else that may be significant for some aspect of re-interpretation.
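As a rough illustration of what checking against such a baseline could look like, the sketch below flags a new occurrence when it lies further from the centre of the known occurrences than a tolerance. Everything in it is an assumption made for illustration: the points in `BASELINE` are invented, and defining the tolerance as the mean distance plus `k` standard deviations is just one simple choice among many.

```python
from math import dist
from statistics import mean, stdev

# Hypothetical embedding positions for known report occurrences of one rock type.
# Two dimensions are used for readability; the same code works for longer vectors.
BASELINE = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 0.8), (0.8, 1.0), (1.2, 1.1)]


def centroid(points):
    """Mean position of the points, computed one axis at a time."""
    return tuple(mean(axis) for axis in zip(*points))


def tolerance(points, k=3.0):
    """Distance threshold: mean distance to the centroid plus k standard deviations."""
    centre = centroid(points)
    distances = [dist(p, centre) for p in points]
    return mean(distances) + k * stdev(distances)


def is_outlier(point, points, k=3.0):
    """True if `point` lies outside the tolerance built from `points`."""
    return dist(point, centroid(points)) > tolerance(points, k)


print(is_outlier((1.05, 0.95), BASELINE))  # a new occurrence close to the baseline
print(is_outlier((4.0, -2.0), BASELINE))   # a new occurrence far from the baseline
```

A flagged occurrence would only be a prompt for a geologist to look more closely. It would not, on its own, indicate a misclassification.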

There are considerations to weigh. Would a single generalisable embeddings model ‘work’ globally, or, perhaps more likely, would several regional ones be needed? Another element may be how language changes between authors and as the science evolves, and the impact of different languages and nationalities. There are many other aspects, of course!

#geology #lithology #rock #geoscience #earthscience #machinelearning #artificialintelligence #ai #textembeddings #naturallanguageprocessing #analytics #datascience #bigdata #datadiscovery #mining #oilandgas #geothermal #hydrogeology #geotechnical #geologicalengineering #planetarygeology #geohazards #ccs #research #datamanagement #datainnovation #datamining #subsurface
