OpenAI is launching a new partnership program to gather data from third parties to train its AI models. The OpenAI Data Partnerships initiative aims to collect large-scale private and public datasets that are not easily accessible online. The company is looking for data in any format, including text, images, audio, or video, as long as it expresses human intention. This data will help improve tools such as its automatic transcription service and ChatGPT, which recently expanded to support voice and image-based queries. OpenAI is already working with organizations such as the Icelandic government to improve its AI models’ understanding of queries made in Icelandic.
Interested organizations can use the company’s website to submit details about the type and size of the data they want to share. There are two pathways for datasets: the Open-Source Archive, a public dataset intended for training language models, and the private dataset pathway, for companies or institutions that want to keep their data confidential. OpenAI emphasizes that it is not seeking datasets containing sensitive or personal information.
ChatGPT has a large user base, which makes data collection and privacy a focal point for OpenAI. The company does not use data submitted through its API to train its models unless the user explicitly shares it for that purpose, but its handling of data collected through this new initiative, especially the private datasets, will be closely watched.