LessWrong, October 12, 2024
AI research assistants competition 2024Q3: Tie between Elicit and You.com

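The author tested several AI research tools on tasks that included finding papers and analyzing a question. You.com and Elicit did well on some tasks, while ChatGPT did poorly. The author also shares results on two test questions, one on gargling with water as an antiviral and one on serum iron, along with views on each tool.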

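🧐 You.com had a slight edge in finding papers, with Elicit and Google Scholar close behind, while ChatGPT did badly. In the search for papers on water gargling as an antiviral, every tool except Elicit and Google Scholar returned exactly 10 results. Those two load results indefinitely by design, so their first 10 results were used.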

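📄 On the serum iron question, Elicit, Perplexity, and You.com all surfaced the key information (the hormone hepcidin). Elicit's answer was the most concise. Perplexity mentioned hepcidin in its first bullet point, and You.com included it but buried it in a long answer.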

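🎨 The author plans to spend more time with You.com, which appears to support a wider range of uses than Perplexity. Whether its UI and expanded uses live up to expectations has not yet been tested in depth.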

Published on October 12, 2024 3:10 PM GMT

 

Summary

I make a large part of my living performing literature reviews to answer scientific questions. For years AI was unable to do anything to lower my research workload. Back in August, though, I tried Perplexity, and it immediately provided value far beyond what I’d gotten from other tools. This wasn’t a fair comparison, because I hadn’t tried any other AI research assistant in months, which is decades in AI time. In this post I right that wrong by running two test questions through every major tool. I also include a smaller tool recommended in the comments of my last post.

Spoiler: the result was a rough tie between You.com and Elicit. Each placed first on one task and among the top three on the other.

Tasks + Results

Tl;dr:

Finding papers on water gargling as an antiviral

I’m investigating gargling with water (salt or tap) as a potential antiviral. I asked each of the tools to find relevant papers for me.

I asked ChatGPT several versions of the question as I homed in on the right one to ask. Every other tool was asked: “Please list 10 scientific papers examining gargling with water as a prophylactic for upper respiratory infections. Exclude nasal rinsing.” This is tricky for two reasons. Almost all studies on gargling with salt water also include nasal rinsing, and saline is used as a control in many gargling studies.

Every tool except Elicit and Google Scholar returned exactly 10 results. Those two let you keep loading papers indefinitely by design, so for them I used the first 10 results.

Tool | Real, relevant results | Probable hallucinations | Notes
--- | --- | --- | ---
Perplexity (initial) | ? | | The formatting was bad, so I asked Perplexity to fix it
Perplexity (asked to reformat the above) | 4 | 2 |
ChatGPT 4o, asking for “papers” without specifying “scientific” | 0 | | Unusable
ChatGPT 4o, specifying “scientific papers” about gargling as a treatment | 2 | 8 |
ChatGPT 4o, specifying scientific papers about gargling as a prophylactic | 0 | | Unusable
ChatGPT o1 | 1 | 7 | Citation links went to completely unrelated papers
Claude 3.5 Sonnet | 2 | 2 |
Elicit | 3 | 1 |
You.com | 4 + 2 partial credits | 0 |
Google Scholar | 4 | 0 | Not AI
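The sketch below makes the table’s arithmetic concrete. It converts each tool’s counts into a share of relevant results out of 10 and then orders the tools. This is not the author’s scoring method. The 0.5 weight for a partial credit is an assumption, and so is ranking by fewest probable hallucinations first. Rows with missing counts are left out: Perplexity’s initial attempt and the two unusable ChatGPT 4o runs.

```python
# Hypothetical tally of the gargling-paper table.
# Assumptions (not from the post): a partial credit counts as half a real
# result, and every tool is judged on exactly 10 returned papers.
PARTIAL_WEIGHT = 0.5  # hypothetical weight for a partial credit
RESULTS_PER_TOOL = 10

# (tool, real relevant results, partial credits, probable hallucinations)
rows = [
    ("Perplexity (reformatted)", 4, 0, 2),
    ("ChatGPT 4o, scientific papers as treatment", 2, 0, 8),
    ("ChatGPT o1", 1, 0, 7),
    ("Claude 3.5 Sonnet", 2, 0, 2),
    ("Elicit", 3, 0, 1),
    ("You.com", 4, 2, 0),
    ("Google Scholar", 4, 0, 0),
]


def score(real, partial):
    """Share of the 10 returned papers that were real and relevant."""
    return (real + PARTIAL_WEIGHT * partial) / RESULTS_PER_TOOL


# Fewest probable hallucinations first; ties broken by higher relevance share.
ranked = sorted(rows, key=lambda r: (r[3], -score(r[1], r[2])))

for tool, real, partial, halluc in ranked:
    print(f"{tool:45s} relevant={score(real, partial):.2f} hallucinations={halluc}")
```

Under these assumptions the ordering starts with You.com, Google Scholar, and Elicit. Sorting by relevance share first would instead move Perplexity ahead of Elicit, so any single ranking here is a judgment call.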

You can see every response in full in this Google Doc.

I did not ask You.com for a picture but it gave me one anyway. It did not receive even partial credit for this.

Hepcidin

My serum iron levels dropped after a series of respiratory illnesses, and on a lark I asked Perplexity whether the two could be related. Perplexity pointed me toward the hormone hepcidin and this paper. Together they suggested that respiratory illness could durably raise hepcidin and thus lower blood iron. Knowing about hepcidin pointed me in the right direction to find a way to lower my hepcidin and thus raise my iron. (This appears to be working, although I don’t want to count my chickens before the second set of test results comes in.) I was very impressed. This was one of the two initial successes that made me fall in love with Perplexity.

I asked the other AI tools the same question. Elicit gave a crisp answer highlighting exactly the information I wanted and nothing else. Perplexity gave a long, meandering answer but mentioned hepcidin in its first bullet point. You.com gave an even longer answer in which hepcidin appeared but was hard to find. All the other tools gave long, meandering answers that never mentioned hepcidin, which made them worthless for this question.

You can see the full results in the same Google Doc.

(Lack of) Conflict of interest

I received no compensation from any of the companies involved. I have social ties to the Elicit team and have occasionally taken part in unpaid focus groups for them. Months or possibly years ago, I mentioned to an Elicit team member that I wanted to do a multi-tool comparison. At the time they offered me a free month for it, but their pricing structure has since made that unnecessary. So they’ll find out about this post when it comes out. I have Perplexity Pro through a promotion from Uber.

Conclusions

After seeing these results, I plan to spend more time with You.com. If the UI and expanded uses turn out as I hope, I might stay loyal to it for as long as three months before it’s surpassed.

There are two major capabilities I need before I could consider giving up reading papers myself (or sending them to my statistician). The first is determining whether a statistical tool was appropriate for the data. The second is determining whether an experimental design was appropriate for the question. I didn’t bother to formally test these this round, but it wouldn’t shock me if we got there soon.


