{"id":6400,"date":"2025-09-03T11:03:01","date_gmt":"2025-09-03T11:03:01","guid":{"rendered":"https:\/\/jobuzo.com\/en\/deepseek-sheds-light-on-data-collection-for-ai-training\/"},"modified":"2025-09-03T11:03:01","modified_gmt":"2025-09-03T11:03:01","slug":"deepseek-sheds-light-on-data-collection-for-ai-training","status":"publish","type":"post","link":"https:\/\/jobuzo.com\/en\/deepseek-sheds-light-on-data-collection-for-ai-training\/","title":{"rendered":"DeepSeek sheds light on data collection for AI training"},"content":{"rendered":"<div>\n<div><img decoding=\"async\" src=\"https:\/\/cdn.i-scmp.com\/sites\/default\/files\/styles\/og_image_scmp_generic\/public\/d8\/images\/canvas\/2025\/09\/03\/62e49b83-33ff-473d-bcb8-400b5c85cba0_badc407b.jpg?itok=Yrubn8dl&amp;v=1756878033\" class=\"ff-og-image-inserted\"><\/div>\n<div datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 css-1xdhyk6 ec74h0k0\" readability=\"10.511111111111\">Chinese artificial intelligence start-up <span data-qa=\"Component-Text\" class=\"css-0 ef9u0v00\">DeepSeek<\/span> has lifted the veil on how it filters data to train its models, raising red flags about &ldquo;hallucination&rdquo; and &ldquo;abuse&rdquo; risks.<\/div>\n<p datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 css-1c6uqr6 ec74h0k1\">In a document published on Monday, the Hangzhou-based start-up said it &ldquo;has always prioritised AI security&rdquo; and decided to make its disclosure to help people use its models, at a time when Beijing is ramping up oversight of the industry.<\/p>\n<p datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 css-1c6uqr6 ec74h0k1\">The company said data in the pre-training stage was &ldquo;mainly&rdquo; collected from publicly available online information as well as 
authorised third-party data, and DeepSeek had no intention to collect personal data.<\/p>\n<p datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 css-1c6uqr6 ec74h0k1\">DeepSeek said it applied automated filters to remove raw data containing &ldquo;hate speech, pornography, violence, spam and potentially infringing contents&rdquo;. Meanwhile, it applied algorithmic detection with human review to identify &ldquo;inherent statistical biases in large-scale data sets&rdquo; to mitigate the impact on model values.<\/p>\n<p datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 css-1c6uqr6 ec74h0k1\">The company, founded by computer scientist Liang Wenfeng, said it was committed to reducing the &ldquo;hallucinations&rdquo; of its models through research and techniques such as retrieval-augmented generation, but added that it remained an &ldquo;unavoidable&rdquo; problem.<\/p>\n<p datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 css-1c6uqr6 ec74h0k1\">&ldquo;AI is still in its early stages and the technology is still immature &hellip; at this stage, we cannot guarantee that our models will not produce hallucinations,&rdquo; it said, reminding users to seek professional advice when necessary and emphasising that its models predicted rather than retrieved answers based on user prompts.<\/p>\n<div datatype=\"p\" data-qa=\"Component-Component\" class=\"e8zc9q40 
css-1xdhyk6 ec74h0k0\" readability=\"10.791139240506\">AI firms like <span data-qa=\"Component-Text\" class=\"css-0 ef9u0v00\">OpenAI<\/span> and DeepSeek have been criticised for their chatbots&rsquo; hallucinations, where they generate incorrect or misleading results. As underlying AI models become more powerful, concerns have emerged about the possibility of AI-induced psychosis and other problems arising from over-reliance on chatbots.<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Chinese artificial intelligence start-up DeepSeek has lifted the veil on how it filters data to train its models, raising red flags about &ldquo;hallucination&rdquo; and &ldquo;abuse&rdquo; risks. In a document published on Monday, the Hangzhou-based start-up said it &ldquo;has always prioritised AI security&rdquo; and decided to make its disclosure to help people use its models,&#8230;<\/p>\n<p class=\"more-link-wrap\"><a href=\"https:\/\/jobuzo.com\/en\/deepseek-sheds-light-on-data-collection-for-ai-training\/\" class=\"more-link\">Read More<span class=\"screen-reader-text\"> &ldquo;DeepSeek sheds light on data collection for AI training&rdquo;<\/span> 
&raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":6401,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-6400","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/6400","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/comments?post=6400"}],"version-history":[{"count":0,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/posts\/6400\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media\/6401"}],"wp:attachment":[{"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/media?parent=6400"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/categories?post=6400"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jobuzo.com\/en\/wp-json\/wp\/v2\/tags?post=6400"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}