
LLM Showdown: Comparing ChatGPT, Gemini, and Grok for Automated News Research

LLM Showdown: Comparing ChatGPT, Gemini, and Grok for Automated News Research with a little help from Grok
The analyst’s day is full of research. Now, this is the age of AI and AI is here to help, isn’t it? As everyone is talking about copilots and AI agents, why not use the tools at hand to do a little research on research?

NB: No one really has a good definition of an AI agent, so this might become an additional topic for research.

But I digress.

Imagine the following project at hand, which is interesting not only for analysts, btw, but also for a variety of roles in the corporate world. Let’s call it vendor (competitor) monitoring. The job is the following:

  • Research reputable sites for news about a number of vendors, relating to a set of keywords. Reputable sites are high-quality news sites, high-quality tech publications, high-quality analyst sites and, of course, the news pages of the vendors in question.
  • Limit the time frame of the search to match the cadence of my information requirement, e.g., “yesterday” for a daily update or “last week” for a weekly update.
  • Provide a summary of the news
  • Give an assessment of how the news affects the positions of the vendors in the marketplace re the keywords in question
  • Provide the news items with their assessments as a prioritized list, sorted from high impact to low impact
  • Add an executive summary as a preface
  • Send it to me as an email
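The job description above can be sketched as a small pipeline skeleton. The following Python is a minimal sketch under stated assumptions, not a working agent: `NewsItem`, `within_window`, `build_report`, `as_email` and the impact scale from 1 to 5 are hypothetical names and choices made for illustration. The search, summary and assessment steps, which an LLM would perform, are assumed to have happened already, and actually sending the email is left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from email.message import EmailMessage


@dataclass
class NewsItem:
    vendor: str
    headline: str
    published: date
    summary: str      # assumed to be produced by the LLM
    assessment: str   # assumed to be produced by the LLM
    impact: int       # hypothetical scale: 1 (low) to 5 (high)


def within_window(items, today, days):
    """Keep only items published within the last `days` days up to `today`."""
    cutoff = today - timedelta(days=days)
    return [item for item in items if cutoff <= item.published <= today]


def build_report(items, executive_summary):
    """Render a plain-text report, sorted from high impact to low impact."""
    ranked = sorted(items, key=lambda item: item.impact, reverse=True)
    lines = ["Executive summary", executive_summary, ""]
    for position, item in enumerate(ranked, start=1):
        lines.append(f"{position}. [{item.vendor}] {item.headline} (impact {item.impact}/5)")
        lines.append(f"   Summary: {item.summary}")
        lines.append(f"   Assessment: {item.assessment}")
    return "\n".join(lines)


def as_email(report, recipient, report_date):
    """Wrap the report in an email message; sending is deliberately left out."""
    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"Vendor monitoring update {report_date.isoformat()}"
    message.set_content(report)
    return message


if __name__ == "__main__":
    # Placeholder data for illustration only, not real news.
    today = date(2025, 3, 10)
    items = [
        NewsItem("Vendor A", "Placeholder headline", date(2025, 3, 9),
                 "Placeholder summary", "Placeholder assessment", 2),
        NewsItem("Vendor B", "Another placeholder", date(2025, 3, 1),
                 "Placeholder summary", "Placeholder assessment", 5),
    ]
    recent = within_window(items, today, days=1)  # "yesterday" cadence
    print(as_email(build_report(recent, "Placeholder executive summary"),
                   "me@example.com", today))
```

Using `days=1` corresponds to the daily update and `days=7` to the weekly one. Keeping the time filter and the sorting in plain code, rather than asking the model to do them, is a design choice that makes these two steps easy to verify.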

So far, so simple. After all, a lot of folks, yours truly included, do this every day. And it takes quite some time. So, this job is a perfect one for an automated update beyond an RSS feed. And it seems like a perfect job for an LLM turned agent – or is it a copilot?

Now, the basic question is: Which one to use? After all, there are plenty, from free to not so free ones. Answering this question turns into yet another interesting experiment: Why not ask some LLMs for their evaluation of suitability? Kinda meta, but an interesting one.

So, I did just that: I asked ChatGPT 4.5, Grok 3, and Gemini in four versions (2.0 Flash Thinking Experimental, 2.0 Flash, 1.5 Pro with Deep Research, and 2.0 Pro Experimental) for their analysis of which of them is best suited for the research task at hand.

For this, I used the following simple prompt:

describe the different capabilities and limitations of Gemini 2.0 Flash Thinnking Experimental, 2.0 Flash, 1.5 pro with deep research, 2.0 pro experimental, Grok 3 and chatGPT 4.5, both with and without deep reasoning. Which model is best to support the following use case:   

research the web for news on a given set of companies and a given set of topics. The news shall cover the past 2 days only   

assess the news regarding their impact on the companies' market positions re the given set of topics   

create this in the form of a report   

do this as a daily scheduled task  

accuracy, reasoning and reliability are of high importance. Speed is of lower importance.  

generate a comparison table, give a recommendation and justify the recommendation.

The results are quite interesting. 

  • Gemini 1.5 Pro with Deep Research settles on Gemini 2.0 Flash because it offers “a balance of reasoning, accuracy, reliability, and tool use necessary for fulfilling the requirements of the specified use case. Its production-ready status, combined with its ability to handle complex analysis and generate comprehensive reports, makes it the ideal LLM for this task.” It sees Grok as the runner-up.
  • Gemini 2.0 Pro Experimental recommends Gemini 1.5 Pro with Deep Research as it “offers the best balance of accuracy, reasoning, and reliability for your demanding research and reporting task, leveraging Google's strengths in both LLMs and web search.” This is closely followed by Grok 3.
  • Gemini 2.0 Flash Thinking Experimental suggests doing a competitive evaluation of ChatGPT 4.5 with Deep Reasoning and Gemini 2.0 Pro Experimental, as it considers them tied. “Which is truly "better" in practice will depend on your specific prompts, data, and tolerance for latency and cost”.
  • ChatGPT 4.5 (without Deep Research) suggests going for ChatGPT 4.5 with Deep Research as the model “offers enhanced reasoning capabilities, reduced hallucination rates, and a broader knowledge base, aligning well with the requirements for accuracy, reasoning, and reliability in daily scheduled tasks. While models like Gemini 2.0 Flash Thinking Experimental and Grok 3 also provide advanced features, ChatGPT 4.5's maturity and proven track record make it a suitable choice for generating comprehensive and reliable reports.”
  • Grok 3 in Deep Research mode suggests using Gemini 2.0 Pro Experimental as its “advanced reasoning capabilities, as evidenced by its performance in complex tasks and 2 million token context window, make it ideal for researching news, assessing market impact, and generating daily reports (Gemini 2.0 Pro). The integration with Google Search ensures access to recent news, and as a Google product, it likely offers high reliability for scheduled tasks, aligning with the emphasis on accuracy and reasoning over speed. While Gemini 1.5 Pro with Deep Research is tailored for research, Gemini 2.0 Pro Experimental, being a newer model, likely offers superior capabilities”. Grok sees itself as the runner-up, as it offers advanced reasoning and Deep Search “but potential biases from X data integration”.

So, what does this tell me?

There is probably a bit of self-serving involved in the LLMs’ assessments and suggestions. At least the Google models consistently suggest a Google model and ChatGPT suggests itself. What is a bit confusing is that

An interesting side remark is that only Gemini 1.5 Pro with Deep Research, ChatGPT 4.5 and Grok 3 provide the sources used for the research. Perplexity does this, too. Providing references is important for validating the results.

Overall, the results delivered by these LLMs seem to favor Gemini 2.0 Pro Experimental and ChatGPT 4.5, although I am impressed by Grok 3. On the other hand, one needs to know that “experimental” means exactly that – the models are not yet fully stable.
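These impressions can be double-checked with a quick tally. The Python below is only a sketch: the dictionaries are a hand transcription of the recommendations listed above, a tie counts once for each tied model, and “ChatGPT 4.5 with Deep Reasoning” and “ChatGPT 4.5 with Deep Research” are folded into one label, `ChatGPT 4.5 (deep mode)`, which is a simplifying assumption. The helper `family` is a hypothetical shortcut that treats the first word of a model name as its vendor family.

```python
from collections import Counter

# Top recommendation(s) per model asked, transcribed from the list above.
recommendations = {
    "Gemini 1.5 Pro with Deep Research": ["Gemini 2.0 Flash"],
    "Gemini 2.0 Pro Experimental": ["Gemini 1.5 Pro with Deep Research"],
    "Gemini 2.0 Flash Thinking Experimental": [
        "ChatGPT 4.5 (deep mode)",
        "Gemini 2.0 Pro Experimental",
    ],
    "ChatGPT 4.5": ["ChatGPT 4.5 (deep mode)"],
    "Grok 3": ["Gemini 2.0 Pro Experimental"],
}

# Runner-up per model asked, where one was named.
runners_up = {
    "Gemini 1.5 Pro with Deep Research": "Grok 3",
    "Gemini 2.0 Pro Experimental": "Grok 3",
    "Grok 3": "Grok 3",
}


def family(model_name):
    """Hypothetical shortcut: the first word of the name is the vendor family."""
    return model_name.split()[0]


# How often each model was named as a top choice.
tally = Counter(pick for picks in recommendations.values() for pick in picks)

# How often each model was named as runner-up.
runner_up_tally = Counter(runners_up.values())

# How many models named at least one model of their own family as top choice.
self_recommending = sum(
    any(family(pick) == family(asked) for pick in picks)
    for asked, picks in recommendations.items()
)

for model, count in tally.most_common():
    print(f"{model}: {count}")
print("Runner-up mentions:", dict(runner_up_tally))
print(f"Recommended their own family: {self_recommending} of {len(recommendations)}")
```

Counted this way, Gemini 2.0 Pro Experimental and ChatGPT 4.5 in its deep mode are each named twice as a top choice, Grok 3 appears three times as runner-up, and four of the five models asked put a model from their own family on top.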

Having said this, if one needs to perform research tasks, as many of us do, environment matters. Smaller businesses in particular often run Google Workspace. If they subscribe to the Business Standard edition (as I do), Gemini is readily available, and there is probably no immediate need to purchase an additional ChatGPT license (I have a pro subscription) or a Grok or Perplexity subscription. This holds all the more as most of these tools use much of the data that users provide to improve their services, free services in particular. Grok’s privacy statement explicitly recommends not entering any personal data, as it will be used.

In summary, if and when I need to do research, I’ll use Google Gemini 1.5 Pro with Deep Research and Gemini 2.0 Pro Experimental as my preferred options, simply because ChatGPT 4.5 with Deep Research only offers a limited number of runs per month. As it doesn’t cost much, I will additionally run the same research, potentially with a slightly changed prompt to cater for model differences, with ChatGPT and (if no sensitive data is involved) Grok 3. Worst case, this gives me additional food for thought.

What do you think? 
