Network Topology

The Stream AI network employs a hierarchical structure to optimize data flow and task distribution:

a) Edge Nodes (ENs): Form the base layer of the network, responsible for executing scraping tasks and initial data processing.

b) Gateway Nodes (GNs): Act as intermediaries between Edge Nodes and the wider network, managing task distribution and data aggregation.

c) Validator Nodes (VNs): Oversee network integrity and data quality.

d) AI Data Agents: Receive and interpret public data requests from AI Models/Applications, exercise judgement, and either select an existing Data Formula or generate a new one.
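
This hierarchy can be modeled as a small set of role definitions. The sketch below is purely illustrative; the names (NodeRole, Node, children) are assumptions for exposition and not part of the Stream AI protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeRole(Enum):
    """Hypothetical role tags for the four layers described above."""
    EDGE = "edge"            # executes scraping tasks and initial processing
    GATEWAY = "gateway"      # distributes tasks and aggregates data
    VALIDATOR = "validator"  # verifies network integrity and data quality
    AI_DATA_AGENT = "agent"  # interprets requests, selects or generates Data Formulas


@dataclass
class Node:
    node_id: str
    role: NodeRole
    # Gateway Nodes keep references to the Edge Nodes they coordinate.
    children: list["Node"] = field(default_factory=list)
```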

Task Distribution and Execution

The process of data collection and processing in the Stream AI network follows these steps:

1. Task Creation: AI Data Agents receive and interpret public data requests from AI Models/Applications and create the corresponding data collection task.

2. Data Formula Generation: The AI Data Agent generates the requested Data Formula, including a task identifier, target URL, and executable parser (an illustrative sketch follows this list).

3. Task Distribution: Gateway Nodes break tasks down into smaller, manageable requests and distribute them among available Edge Nodes.

4. Task Execution: Edge Nodes execute Formula tasks, extracting the requested public data from their assigned sources.

5. Validation: Validator Nodes verify the integrity and quality of the collected data, and the resulting proofs are posted to the Stream Ledger (see the validation sketch after this list).

6. Data Registry: Verified data is added to the Stream AI Data Registry and stored in a decentralized Vector Database, ready for AI consumption.

7. Data Aggregation: The AI Data Registry aggregates the requested public data by Task ID and generates AI datasets that power AI Models/Applications for training or RAG building.
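
To make steps 1 through 4 concrete, the following Python sketch models a Data Formula as a task identifier, a target URL, and an executable parser, and shows a task being split into per-URL Formula requests that Edge Nodes execute. The names (DataFormula, create_task, execute_formula) and the use of the requests library are assumptions for illustration, not the actual Stream AI implementation.

```python
from dataclasses import dataclass
from typing import Callable
import uuid

import requests  # assumed HTTP client for this sketch


@dataclass
class DataFormula:
    """Illustrative Data Formula: task identifier, target URL, executable parser."""
    task_id: str
    target_url: str
    parser: Callable[[str], dict]  # turns raw HTML/JSON into a structured record


def create_task(target_urls: list[str], parser: Callable[[str], dict]) -> list[DataFormula]:
    """Steps 1-3 (sketch): a data collection task is created and broken into
    smaller, per-URL Formula requests that can be assigned to Edge Nodes."""
    task_id = str(uuid.uuid4())
    return [DataFormula(task_id=task_id, target_url=url, parser=parser) for url in target_urls]


def execute_formula(formula: DataFormula) -> dict:
    """Step 4 (sketch): an Edge Node fetches its assigned source and runs the parser."""
    response = requests.get(formula.target_url, timeout=10)
    response.raise_for_status()
    record = formula.parser(response.text)
    record["task_id"] = formula.task_id
    return record
```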
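
Steps 5 through 7 can likewise be sketched as deriving a content proof for each record, posting it to a ledger, storing the verified record in a registry, and aggregating records by Task ID into a dataset. The hash-based proof, the in-memory stand-ins for the Stream Ledger and the Vector Database, and the function names are assumptions; the actual mechanisms are not specified here.

```python
import hashlib
import json
from collections import defaultdict

# In-memory stand-ins for the Stream Ledger and the decentralized Vector Database.
ledger: list[dict] = []
registry: dict[str, list[dict]] = defaultdict(list)


def validate_and_register(record: dict) -> None:
    """Steps 5-6 (sketch): derive a content proof, post it to the ledger,
    then add the verified record to the Data Registry keyed by task_id."""
    payload = json.dumps(record, sort_keys=True).encode()
    proof = hashlib.sha256(payload).hexdigest()
    ledger.append({"task_id": record["task_id"], "proof": proof})
    registry[record["task_id"]].append(record)


def aggregate_dataset(task_id: str) -> list[dict]:
    """Step 7 (sketch): assemble all verified records for a Task ID into a dataset
    ready for model training or RAG indexing."""
    return list(registry[task_id])
```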

Privacy and Security Measures

Stream AI implements robust privacy and security measures to protect both network participants and data sources:

a) Anonymized Contributions: Edge Nodes contribute resources anonymously, preserving user privacy.

b) Encrypted Data Transfer: All data transferred within the network is encrypted end-to-end.

c) Ethical Scraping Practices: Adherence to robots.txt files and website terms of service ensures ethical data collection (a minimal compliance check is sketched after this list).

d) Decentralized Storage: Collected data is stored across the network, minimizing the risk of large-scale data breaches.
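
As one concrete example of the ethical scraping practice in (c), an Edge Node could consult a site's robots.txt before fetching a URL. The sketch below uses Python's standard urllib.robotparser; the user-agent string and helper name are illustrative assumptions.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_allowed(url: str, user_agent: str = "StreamAI-EdgeNode") -> bool:
    """Check the host's robots.txt before scraping the given URL (illustrative helper)."""
    root = urlparse(url)
    parser = RobotFileParser()
    parser.set_url(f"{root.scheme}://{root.netloc}/robots.txt")
    parser.read()
    return parser.can_fetch(user_agent, url)
```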