
JOBUZO

DeepSeek sheds light on data collection for AI training


Posted on 3 September 2025 By jobuzo
Chinese artificial intelligence start-up DeepSeek has lifted the veil on how it filters data to train its models, while warning of "hallucination" and "abuse" risks.


In a document published on Monday, the Hangzhou-based start-up said it "has always prioritised AI security" and had decided to make the disclosure to help people use its models, at a time when Beijing is ramping up oversight of the industry.

The company said data for the pre-training stage was "mainly" collected from publicly available online information and authorised third-party data, and that it had no intention of collecting personal data.

DeepSeek said it applied automated filters to remove raw data containing "hate speech, pornography, violence, spam and potentially infringing contents". It also used algorithmic detection combined with human review to identify "inherent statistical biases in large-scale data sets" and mitigate their impact on model values.

The company, founded by computer scientist Liang Wenfeng, said it was committed to reducing its models' "hallucinations" through research and techniques such as retrieval-augmented generation, but added that the problem remained "unavoidable".


“AI is still in its early stages and the technology is still immature … at this stage, we cannot guarantee that our models will not produce hallucinations,” it said, reminding users to seek professional advice when necessary and emphasising that its models predicted rather than retrieved answers based on user prompts.

AI firms such as OpenAI and DeepSeek have been criticised for their chatbots' hallucinations, in which the chatbots generate incorrect or misleading output. As the underlying AI models grow more powerful, concerns have also arisen about possible AI-induced psychosis and other problems stemming from overreliance on chatbots.


Copyright © 2025 JOBUZO.
