AI needs better human data, not larger models

Opinion by: Rowan Stone, CEO of Sapien

AI is a paper tiger without human expertise in data management and training practices. Despite massive growth projections, AI innovations will become irrelevant if models continue to be trained on poor-quality data.

In addition to improved data standards, AI models need human intervention, with the contextual understanding and critical thinking required to ensure ethical AI development and proper output generation.

AI has a “bad data” problem

Humans have nuanced consciousness: they draw on their experiences to reach conclusions and make logical decisions. AI models, however, are only as good as their training data.

An AI model's accuracy does not depend solely on the technical sophistication of its underlying algorithms or the volume of data processed. Instead, accurate AI performance depends on reliable, high-quality data during training and validation testing.

Bad data has manifold consequences for AI model training: models generate biased outputs and hallucinations from defective logic, and companies lose time and money retraining models to unlearn bad habits.

Partial and statistically underrepresented data disproportionately amplifies deficiencies and skews results in AI systems, particularly in healthcare and security surveillance.

For example, an Innocence Project report documents several cases of misidentification, including a former Detroit Police Chief's admission that relying solely on AI-based facial recognition would produce misidentifications roughly 96% of the time. And according to a Harvard Medical School report, an AI model used across US health systems prioritized healthier white patients over sicker Black patients.

AI models follow the "garbage in, garbage out" (GIGO) principle: defective and biased data inputs, or "garbage," generate poor-quality outputs. Bad input data also creates operational inefficiencies, as project teams face delays and higher costs cleaning data sets before model training can resume.
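To make the GIGO point concrete, a minimal pre-training quality gate might reject batches with missing or duplicated labels before they ever reach the model. This is an illustrative sketch only: the field names and the 5% tolerance are assumptions, not any specific pipeline's rules.

```python
# Sketch of a pre-training data quality gate: reject a batch if too many
# records are missing labels or are exact duplicates. Field names and the
# 5% tolerance are illustrative assumptions.

def passes_quality_gate(records: list[dict], max_bad_ratio: float = 0.05) -> bool:
    """Return True if the batch is clean enough to train on."""
    missing = sum(1 for r in records if not r.get("label"))
    seen, dupes = set(), 0
    for r in records:
        key = (r.get("text"), r.get("label"))
        if key in seen:
            dupes += 1
        seen.add(key)
    bad_ratio = (missing + dupes) / max(len(records), 1)
    return bad_ratio <= max_bad_ratio

batch = [
    {"text": "invoice due", "label": "finance"},
    {"text": "invoice due", "label": "finance"},  # duplicate record
    {"text": "hello", "label": ""},               # missing label
]
print(passes_quality_gate(batch))  # False: 2 of 3 records are bad
```

Gating batches this early is far cheaper than retraining a model after bad records have already shaped its weights.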

Beyond their operational effects, AI models trained on low-quality data erode corporate trust and confidence in deploying them, causing irreparable reputational damage. According to one research article, the hallucination rate for GPT-3.5 was 39.6%, emphasizing the need for researchers to validate model outputs.

Such reputational injuries have far-reaching consequences: they make it harder to attract investment and hurt a model's market positioning. At a CIO Network summit, 21% of America's top IT leaders cited lack of reliability as the most pressing concern keeping them from using AI.

Poor training data devalues AI projects and causes businesses huge financial losses. On average, incomplete and low-quality AI training data results in misinformed decision-making that costs companies 6% of their annual revenue.


Poor-quality training data hampers AI innovation and model training, making it important to look for alternative solutions.

The bad data problem has forced AI companies to redirect researchers toward data preparation. Data scientists spend nearly 67% of their time preparing correct data sets to keep AI models from producing misinformation.

AI/ML models can struggle to produce relevant output unless specialists, real people with proper credentials, work to refine them. This demonstrates the need for human experts to guide AI development by ensuring high-quality, curated data for training AI models.

Human frontier data is key

Elon Musk recently said, "The cumulative sum of human knowledge has been exhausted in AI training." Nothing could be further from the truth: human frontier data is the key to building stronger, more reliable and impartial AI models.

Musk's dismissal of human knowledge is a call to fine-tune AI models on artificially produced synthetic data. Unlike humans, however, synthetic data lacks real-world experience and has historically failed to make sound ethical judgments.

Human expertise ensures careful data review and validation to maintain an AI model's consistency, accuracy and reliability. People evaluate and interpret a model's outputs to identify biases or errors and to ensure they align with societal values and ethical standards.

In addition, human intelligence offers unique perspectives during data preparation, bringing contextual awareness, common sense and logical reasoning to data interpretation. This helps resolve ambiguity, capture nuance and solve problems in high-complexity AI model training.

The symbiotic relationship between artificial and human intelligence is crucial to realizing AI's potential as a transformative technology without causing societal harm. A collaborative human-machine approach helps channel human intuition and creativity into building new AI algorithms and architectures for the public good.

Decentralized networks could be the missing piece that finally solidifies this relationship worldwide.

Businesses lose time and resources when weak AI models require constant refinement from in-house data scientists and engineers. Through decentralized human intervention, companies can cut costs and increase efficiency by distributing the evaluation process across a global network of data trainers and contributors.

Decentralized reinforcement learning from human feedback (RLHF) makes AI model training a collaborative effort. Everyday users and domain specialists can contribute to training and receive financial incentives for accurate annotation, labeling, segmentation and classification.

A blockchain-based decentralized mechanism automates compensation: contributors receive rewards based on quantifiable AI model improvements rather than rigid quotas or benchmarks. Furthermore, decentralized RLHF democratizes data and model training by involving people from diverse backgrounds, reducing structural bias and improving generalized intelligence.
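As a rough illustration of improvement-based compensation, the payout logic might weight each contributor by the measured validation gain their labeled batches produced. All names and figures below are hypothetical; this is a sketch of the idea, not Sapien's or any network's actual mechanism.

```python
# Sketch: split a reward pool among contributors in proportion to the
# measured model improvement attributed to their labeled batches,
# instead of paying fixed per-label quotas. Names and figures are
# hypothetical.

def payouts(contributions: dict[str, float], reward_pool: float) -> dict[str, float]:
    """Split reward_pool proportionally to each contributor's
    validation-accuracy gain; negative gains earn nothing."""
    gains = {c: max(g, 0.0) for c, g in contributions.items()}
    total = sum(gains.values())
    if total == 0:
        return {c: 0.0 for c in gains}
    return {c: reward_pool * g / total for c, g in gains.items()}

# Example: accuracy gains (percentage points) attributed to each batch.
rewards = payouts({"ann": 1.2, "bob": 0.6, "carol": -0.3}, reward_pool=100.0)
print(rewards)  # ann and bob split the pool 2:1; carol earns nothing
```

On a blockchain, the same proportional split could be settled by a smart contract once the improvement measurements are posted, which is what removes the rigid quota from the loop.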

According to a Gartner survey, companies will abandon over 60% of AI projects by 2026 due to the unavailability of AI-ready data. Human skill and competence are therefore crucial to preparing AI training data if the industry is to contribute $15.7 trillion to the global economy by 2030.

The data infrastructure for AI model training requires continuous improvement based on fresh data and new use cases. Humans can ensure organizations maintain an AI-ready database through ongoing metadata control, observability and governance.

Without human supervision, enterprises will struggle with the vast volumes of data siloed across cloud and offshore storage. Companies must adopt a "human-in-the-loop" approach to fine-tuning data sets to build high-quality, performant and relevant AI models.
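A minimal sketch of what "human-in-the-loop" can mean in practice: route low-confidence model outputs to a human review queue, and fold the corrected labels back into the training set. The confidence threshold and record layout are illustrative assumptions, not any vendor's API.

```python
# Sketch of a human-in-the-loop filter: predictions below a confidence
# threshold are queued for human review; the human corrections become
# new, higher-quality training data. Threshold and record layout are
# illustrative assumptions.

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cutoff

def triage(predictions: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split predictions into auto-accepted and human-review queues."""
    accepted = [p for p in predictions if p["confidence"] >= CONFIDENCE_THRESHOLD]
    review_queue = [p for p in predictions if p["confidence"] < CONFIDENCE_THRESHOLD]
    return accepted, review_queue

preds = [
    {"id": 1, "label": "cat", "confidence": 0.97},
    {"id": 2, "label": "dog", "confidence": 0.55},  # ambiguous: needs a human
]
accepted, review_queue = triage(preds)
print(len(accepted), len(review_queue))  # 1 1
```

Only the ambiguous minority reaches human reviewers, which is how the approach stays affordable while still letting people correct exactly the cases where the model is least reliable.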

Opinion by: Rowan Stone, CEO of Sapien.

This article is for general information purposes and is not intended to be and should not be taken as legal or investment advice. The views, thoughts and opinions expressed here are the author's alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.