Unite.AI, March 20
Or Lenchner, CEO of Bright Data – Interview Series

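Or Lenchner, CEO of Bright Data, has led the company since 2018 as it became a leading web data collection platform with annual revenue above USD 100 million. Bright Data helps enterprises, universities, and public sector bodies worldwide access public web data in real time and at scale, and Lenchner stresses the importance of open data for innovation. He discusses the challenges AI teams face in acquiring data and how Bright Data’s scalable, automated collection solutions ensure data quality and compliance so that AI teams can focus on building models. Lenchner also covers data ethics, privacy protection, and future trends in AI data collection, emphasizing that Bright Data adheres to responsible and transparent data practices while advancing AI.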

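🔑 Bright Data addresses the challenges AI teams face in acquiring large-scale public web data, such as sheer volume, uneven quality, and compliance, by offering a scalable, automated data collection platform. The platform delivers structured, real-time data and uses AI tools to clean and validate it to ensure accuracy.

🛡️ Bright Data places great weight on data ethics and privacy protection. It complies with global data privacy regulations such as GDPR, CPRA, and CCPA, and enforces Know Your Customer (KYC) protocols so that only legitimate users can access the platform. The company also maintains a clear Acceptable Use Policy that specifies which types of data may and may not be collected.

📈 Bright Data holds that business growth and ethical data collection practices do not conflict. The company vets its users strictly to ensure that collected data is used for ethical purposes. It also invests substantial time, effort, and resources in compliance and security to protect customers and the public, helping build a transparent and responsible AI ecosystem.

🤖 AI-powered agents and automation are changing the data collection landscape. Bright Data has built infrastructure that supports the deployment and evolution of AI agents, enabling smooth access to high-quality, real-time web data. This technology lets sophisticated AI systems continuously interact with dynamic web data, learn from it, and keep improving.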

Or Lenchner, CEO of Bright Data, has led the market-leading web data collection platform since 2018, driving its expansion, innovation, and growth to over USD 100 million in annual revenue. Bright Data enables Fortune 500 corporations, leading businesses, renowned universities, and public sector entities to access public web data in real time and at scale. Lenchner is a strong advocate for keeping public web data open and accessible, emphasizing its critical role in driving innovation.

What inspired your journey into the world of data and AI, and since becoming CEO in 2018, how have you shaped Bright Data’s mission and vision?

I’ve always been fascinated by the power of data, particularly with how it can drive decisions and fuel innovation. When used right, data can also drive transparency in business. Becoming CEO of Bright Data in 2018 gave me an opportunity to help shape how AI researchers and businesses go about sourcing and utilizing public web data.

What are the key challenges AI teams face in sourcing large-scale public web data, and how does Bright Data address them?

Scalability remains one of the biggest challenges for AI teams. Since AI models require massive amounts of data, efficient collection is no small task. And since AI models are only as good as the data they are trained on, ensuring teams have access to fresh, high-quality data is a constant challenge. This is especially true as the web evolves in real time.

Another major concern is compliance. Data privacy laws and requirements continuously evolve, so AI teams need to always be aware of those changes. They also have to understand how to deal with websites that enforce anti-bot mechanisms, which can complicate the data gathering process.

The platform that we’ve built at Bright Data takes care of these challenges. We provide scalable, automated data collection that delivers structured real-time data. Our AI-driven tools clean and validate data to ensure accuracy. We have strict measures in place to ensure legal and ethical data collection for compliance. The idea is to empower AI teams to focus on building great models, while we handle the complexities of data sourcing.

How does high-quality web data contribute to AI model performance, and what are the best practices for ensuring data accuracy?

High-quality data means data that is complete, free from biases, and most importantly, accurate. If data is lacking or mired in inconsistencies and mistakes, the resulting AI model won’t perform according to expectations.

To achieve accuracy, it’s best to source data from a variety of public sources that have established reliability. Using only a few, or worse, a single data source, results in problems such as incompleteness. Having multiple sources provides the ability to cross-reference data and build a more balanced and well-represented dataset. Additionally, organizations should consider automated data validation and cleansing, to efficiently get rid of erroneous and inconsistent data.
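The cross-referencing and automated validation described above can be illustrated with a minimal Python sketch. This is not Bright Data’s pipeline: the function name `cross_reference`, the `min_agreement` threshold, and the sample records are all hypothetical, chosen only to show the idea of keeping a value when several independent sources agree on it.

```python
from collections import Counter


def cross_reference(records_by_source, min_agreement=2):
    """Merge per-source records, keeping a field value only when
    at least `min_agreement` sources report the same value.

    records_by_source maps a source name to {record_key: {field: value}}.
    All names here are illustrative assumptions, not a real API.
    """
    # Collect every non-empty value reported for each (record, field) pair.
    observed = {}
    for records in records_by_source.values():
        for key, fields in records.items():
            for field, value in fields.items():
                if value is None or value == "":
                    continue  # a missing value is not a vote
                observed.setdefault((key, field), []).append(value)

    # Keep the most common value when enough sources agree on it.
    merged = {}
    for (key, field), values in observed.items():
        value, votes = Counter(values).most_common(1)[0]
        if votes >= min_agreement:
            merged.setdefault(key, {})[field] = value
    return merged


# Hypothetical product records gathered from three public sources.
sources = {
    "site_a": {"sku-1": {"price": 19.99, "brand": "Acme"}},
    "site_b": {"sku-1": {"price": 19.99, "brand": "Acme"}},
    "site_c": {"sku-1": {"price": 24.99, "brand": ""}},
}
result = cross_reference(sources)
print(result)
```

With a single source there is nothing to vote against, which is the incompleteness problem the answer points to. Raising `min_agreement` trades coverage for confidence.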

At Bright Data, we take all of these factors into account. We provide AI teams with structured and real-time data that has been validated for accuracy. That way, they can train models with confidence.

What are the biggest ethical concerns in public web data collection today?

Privacy remains one of the biggest concerns in public web data collection. People worry about their data being exposed to abuse and misuse. To make sure that data remains private, it is vital to emphasize transparency. Organizations that accumulate data must be upfront about the data they collect. It is important to assure the public that their data is used under strict ethical guidelines.

One other major concern is monopolization. Certain large companies have control over a vast amount of data, which creates an uneven playing field wherein only a select few have access to information necessary to train AI models and drive innovation. This is not how things should be. Public web data should remain accessible to businesses, researchers, and developers. That way, AI development is not concentrated in the hands of just a few major players.

Ethics are not an afterthought at Bright Data. They’re embedded into every decision we make. We do not just follow industry standards – we set them. We lead in the data collection industry in defining the right ethical standards. We want to ensure that public web data is accessed responsibly, transparently, and in full compliance with global regulations.

How does Bright Data ensure compliance with global data privacy regulations while still enabling large-scale data collection?

Our organization is committed to adhering to global legal and regulatory requirements on data gathering and utilization. We see to it that we comply with the requirements of GDPR, CPRA, CCPA, and other relevant regulations. Importantly, we strictly follow Know Your Customer (KYC) protocols to ensure that only legitimate users get to access our platform. Our data solutions may only be accessed by legitimate businesses and researchers.

Our Acceptable Use Policy is also clear in defining what data can and cannot be collected. This includes responsible use. We have a dedicated compliance team responsible for the continuous monitoring of regulations to ascertain that we are up to date with the latest legal and regulatory requirements.

Regardless, we still believe that public web data should remain accessible. Our goal is to provide AI teams with the data they need while ensuring compliance with privacy and legal standards.

How do you balance business growth with maintaining ethical data collection practices?

We do not see ethics and growth as mutually exclusive. The trust of our customers and the relationship we build with them are paramount concerns. We understand that we can achieve long-term success only if we collect data under transparent terms and in accordance with applicable laws.

Thus, we put in place a strict vetting protocol for our users. This is designed to ensure that the data we collect is used ethically. We allocate time, effort, and resources towards compliance and security to protect our customers and the public in general. By observing ethical data collection, we succeed business-wise while contributing to the establishment of a transparent and responsible AI ecosystem.

How does Bright Data stay ahead of regulatory changes in data privacy?

We understand that our data use processes and policies inevitably have to change to reflect changes in relevant laws and regulations. As such, we regularly consult legal experts and communicate with regulatory bodies. We also engage in discussions with legislators and others involved in policy building, providing input in the crafting of meaningful data regulations. We aim to strike a balance between innovation and data privacy.

Our data collection and use framework evolves as new laws are issued and regulations revised. We have a compliance team that proactively updates our data use policies to make sure that our platform is always fully compliant. Moreover, we operate customer education initiatives to promote ethical data use.

What are the emerging trends in AI data collection that companies should be aware of?

Real-time data collection is becoming a must for today’s AI models. It is crucial for them to access the latest or freshest data to deliver a high level of accuracy and provide better user experiences.

Another notable trend is the reliance on synthetic data used for data augmentation, wherein AI generates data that supplements datasets gathered from real-world scenarios.

I’m also seeing strong interest in pursuing explainable AI. Most AI models today suffer from the black box effect, or a lack of transparency in their decision-making processes. Companies are seeking to change this paradigm by creating AI models that can detail how they arrived at the outputs or decisions they make.

Lastly, companies are aware of growing data privacy concerns. That’s why AI techniques aimed at preserving data privacy, such as federated learning, are increasingly in demand. Organizations want to maximize AI model training without compromising user data privacy.
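For readers unfamiliar with federated learning, the following is a minimal Python sketch of federated averaging on a toy problem. It is an illustrative assumption rather than anything described in the interview: the function names, the one-dimensional linear model, the learning rate, and the two client datasets are all hypothetical. The point it shows is that each client trains on its own data and shares only model weights, never the raw records.

```python
def local_update(weights, data, lr):
    """One gradient-descent step for y = w*x + b on a client's private data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def train(clients, rounds=500, lr=0.05):
    """Run federated rounds; only weights leave each client, never its data."""
    weights = (0.0, 0.0)
    sizes = [len(data) for data in clients]
    for _ in range(rounds):
        updates = [local_update(weights, data, lr) for data in clients]
        weights = federated_average(updates, sizes)
    return weights


# Two hypothetical clients whose private points all lie on y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
w, b = train(clients)
print(round(w, 3), round(b, 3))
```

Production systems typically add secure aggregation, several local steps per round, and protections against leakage through the weights themselves. None of that is modeled here.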

We make sure we are on top of these trends, so we can build solutions that allow AI teams to keep a competitive edge.

How do you see AI-powered agents and automation changing the data collection landscape?

Currently, AI models make use of structured datasets that are mostly collected manually. These datasets also go through preprocessing, cleansing, and other procedures that usually involve human intervention. This is set to change in the near future with the rise of AI agents for autonomous collection and processing of data for AI training. They make it possible to automatically learn from real-time web data at an unprecedented scale.

We have created infrastructure that supports the deployment and evolution of AI agents, enabling smooth access to high-quality, real-time data on the web. This technology allows sophisticated AI systems to continuously interface with dynamic web data, learn from it, and grow bigger and better.

AI agents can transform industries as they allow AI systems to access and learn from constantly changing datasets on the web instead of relying on static and manually processed data. This can lead to banking or cybersecurity AI chatbots, for example, that are capable of coming up with decisions that reflect the most recent realities. This results in massive efficiency advances and more areas for automation.

At Bright Data, we are not only enabling this transformation in the data collection landscape. We believe we are at the forefront, introducing a technology that ushers in the next generation of artificial intelligence. We are excited to assist businesses and AI teams as they harness the full potential of AI agents for their operations.

Thank you for the great interview. Readers who wish to learn more should visit Bright Data.

The post Or Lenchner, CEO of Bright Data – Interview Series appeared first on Unite.AI.
