Reddit comments scraped in real time from /r/wallstreetbets. Some comments may be missing.


If you haven't already, install the Beneath library and authenticate your environment by following this guide.

Read the entire stream into memory

This snippet loads the entire stream into a Pandas DataFrame, which is useful for analysis in notebooks or scripts:

import beneath
df = await beneath.load_full("examples/reddit/r-wallstreetbets-comments")

The function accepts several optional arguments. The most common are to_dataframe=False to get records as a regular Python list, filter="..." to filter by key fields, and max_bytes=... to raise the cap on how much data is loaded (the cap exists to prevent runaway costs). For more details, see the API reference.
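The snippet above uses top-level await, which works in notebooks (IPython/Jupyter) but not in a plain Python script. The following is a minimal sketch of a script-friendly variant that uses only the arguments named above. The helper names load_records and newest_record are hypothetical, and the sketch assumes that records come back as dict-like objects with a comparable created_on field (the field name is taken from the key filter shown further down this page, not verified against the schema).

```python
import asyncio

STREAM_PATH = "examples/reddit/r-wallstreetbets-comments"


async def load_records(max_bytes=None):
    """Load the stream as a plain Python list instead of a DataFrame."""
    import beneath  # imported lazily so the helper below works without Beneath

    kwargs = {"to_dataframe": False}
    if max_bytes is not None:
        kwargs["max_bytes"] = max_bytes  # change the load cap only when asked to
    return await beneath.load_full(STREAM_PATH, **kwargs)


def newest_record(records):
    """Return the record with the largest `created_on`, or None if empty.

    Assumes records behave like dicts with a comparable `created_on` value.
    """
    return max(records, key=lambda r: r["created_on"], default=None)


# In a script (not a notebook), drive the coroutine with asyncio.run:
# records = asyncio.run(load_records())
# print(newest_record(records))
```

Keeping the Beneath call in its own coroutine means the post-processing helper can be exercised on any list of dicts, without network access.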

Replay the stream's history and subscribe to changes

This snippet replays the stream's historical records one by one and then stays subscribed to new records, which is useful for alerting and data enrichment:

import beneath
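async def callback(record):
    # Called once per record, first for the replayed history and then for each new record
    print(record)
await beneath.consume("examples/reddit/r-wallstreetbets-comments", callback)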

The function accepts several optional arguments. The most common are replay_only=True to stop the script once the replay has completed, changes_only=True to only subscribe to changes, and subscription_path="ORGANIZATION/PROJECT/subscription:NAME" to persist the consumer's progress.
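As an illustration of the alerting use case, here is a hedged sketch of a callback that collects comments mentioning certain keywords. The factory make_alert_callback is a hypothetical helper, and the field name "body" is an assumption about where the comment text lives; check the stream's schema for the actual column. Records are assumed to be dict-like.

```python
import asyncio

STREAM_PATH = "examples/reddit/r-wallstreetbets-comments"


def make_alert_callback(keywords, matches, field="body"):
    """Build a callback that appends records mentioning any keyword to `matches`.

    `field` is a hypothetical name for the comment text column.
    """
    lowered = [k.lower() for k in keywords]

    async def callback(record):
        # Treat a missing or null text field as an empty string
        text = str(record.get(field) or "").lower()
        if any(k in text for k in lowered):
            matches.append(record)

    return callback


async def main():
    import beneath  # imported lazily so the callback can be tested without Beneath

    matches = []
    callback = make_alert_callback(["GME", "AMC"], matches)
    # replay_only=True stops the script once the historical replay has completed
    await beneath.consume(STREAM_PATH, callback, replay_only=True)
    print(f"{len(matches)} matching comments")


# In a script (not a notebook): asyncio.run(main())
```

Dropping replay_only=True would keep the consumer subscribed, so the same callback would also fire for new comments as they arrive.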

Look up records by key

Use the snippet below to look up stream records by key.

import beneath
client = beneath.Client()
stream = await client.find_stream("examples/reddit/r-wallstreetbets-comments")
cursor = await stream.query_index(filter={"created_on": ..., "id": ...})
record = await cursor.read_one()
# records = await cursor.read_next() # for range or prefix filters that return multiple records

You can also pass filters that match multiple records based on a key range or key prefix. See the filter docs for syntax guidelines.
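If you look up records from several places, it can help to wrap the calls above in one small coroutine. The sketch below does that using only the calls shown in the snippet; lookup_comment and main are hypothetical names, and the key values are passed through untouched because their expected types depend on the stream's schema.

```python
import asyncio

STREAM_PATH = "examples/reddit/r-wallstreetbets-comments"


async def lookup_comment(stream, created_on, comment_id):
    """Fetch a single record by its full key (created_on, id)."""
    cursor = await stream.query_index(
        filter={"created_on": created_on, "id": comment_id}
    )
    return await cursor.read_one()


async def main(created_on, comment_id):
    import beneath  # imported lazily so lookup_comment can be tested with a stand-in

    client = beneath.Client()
    stream = await client.find_stream(STREAM_PATH)
    record = await lookup_comment(stream, created_on, comment_id)
    print(record)


# In a script (not a notebook): asyncio.run(main(<created_on>, <id>))
```

Because lookup_comment takes the stream as an argument, it can be tested against any object that offers an async query_index method returning a cursor.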

Analyze with SQL

This snippet runs a warehouse (OLAP) query on the stream's records and returns the result, which is useful for ad-hoc joins, aggregations, and visualizations:

import beneath
df = await beneath.query_warehouse("SELECT count(*) FROM `examples/reddit/r-wallstreetbets-comments`")

See the warehouse queries documentation for a guide to the SQL query syntax.
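To reuse the count query for other streams, one option is to build the SQL string from the stream path. This is a minimal sketch: count_query and count_records are hypothetical helpers, the backtick quoting simply mirrors the snippet above, and anything beyond count(*) should be checked against the warehouse queries documentation.

```python
import asyncio


def count_query(stream_path):
    """Build the count(*) query for a stream, quoting the path in backticks."""
    if "`" in stream_path:
        # A backtick in the path would end the quoted identifier early
        raise ValueError("stream path must not contain a backtick")
    return f"SELECT count(*) FROM `{stream_path}`"


async def count_records(stream_path):
    import beneath  # imported lazily so count_query can be tested without Beneath

    return await beneath.query_warehouse(count_query(stream_path))


# In a script (not a notebook):
# result = asyncio.run(count_records("examples/reddit/r-wallstreetbets-comments"))
# print(result)
```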


Consult the Beneath Python client API reference for details on all classes, methods and arguments.