r-trendingreddits-comments
Reddit comments scraped in real-time from /r/trendingreddits. Some comments may be missing.
Setup
If you've already installed the SDK, you can skip these steps. First, install the library:
pip install --upgrade beneath
Then authenticate your environment by running:
beneath auth
Read the entire table into memory
This snippet loads the entire table into a Pandas DataFrame, which is useful for analysis in notebooks or scripts:
import beneath
df = await beneath.load_full("examples/reddit/r-trendingreddits-comments")
The function accepts several optional arguments. The most common are to_dataframe=False to get records as a regular Python list, filter="..." to filter by key fields, and max_bytes=... to increase the cap on how many records to load (used to prevent runaway costs). For more details, see the API reference.
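The snippet above uses a top-level await, which works in notebooks (Jupyter already runs an event loop) but is a syntax error in a plain Python script. The following is a minimal sketch of one way to wrap the call for script use. Here fetch_table is a hypothetical stand-in, not part of the SDK, so that the example runs without credentials; the comment shows where the real call would go.

```python
import asyncio


async def fetch_table():
    # Hypothetical stand-in so this sketch runs offline. In a real script,
    # replace the body with:
    #     return await beneath.load_full("examples/reddit/r-trendingreddits-comments")
    await asyncio.sleep(0)
    return [{"id": "abc"}]


async def main():
    records = await fetch_table()
    print(f"loaded {len(records)} records")
    return records


if __name__ == "__main__":
    # asyncio.run starts an event loop, runs main() to completion, then closes the loop.
    asyncio.run(main())
```

The same wrapper pattern applies to the other snippets on this page, since they all rely on await.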
Replay the table's history and subscribe to changes
This snippet replays the table's historical records one-by-one and stays subscribed to new records (with at-least-once delivery), which is useful for alerting and data enrichment:
import beneath

async def callback(record):
    print(record)

await beneath.consume("examples/reddit/r-trendingreddits-comments", callback)
The function accepts several optional arguments. The most common are replay_only=True to stop the script once the replay has completed, changes_only=True to only subscribe to changes, and subscription_path="ORGANIZATION/PROJECT/subscription:NAME" to persist the consumer's progress.
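Because delivery is at-least-once, the callback may occasionally receive the same record more than once, so side effects such as alerts are safer when the callback is idempotent. The following is a minimal sketch of one way to do that. It assumes that the key fields created_on and id (the ones used for key lookups on this table) uniquely identify a comment; seen_keys and processed are illustrative names, not part of the SDK.

```python
import asyncio

seen_keys = set()   # keys of records that have already been handled
processed = []      # stands in for the real alerting or enrichment step


async def callback(record):
    # Assumption: (created_on, id) uniquely identifies a comment.
    key = (record["created_on"], record["id"])
    if key in seen_keys:
        return  # redelivered record, already handled
    seen_keys.add(key)
    processed.append(record)
```

This callback can be passed to beneath.consume in place of the printing callback above. The in-memory set grows with the table and is lost on restart, so a long-running consumer would likely need a bounded or persistent store instead.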
Look up records by key
Use the snippet below to look up records by their key fields.
import beneath
client = beneath.Client()
table = await client.find_table("examples/reddit/r-trendingreddits-comments")
cursor = await table.query_index(filter={"created_on": ..., "id": ...})
record = await cursor.read_one()
# records = await cursor.read_next() # for range or prefix filters that return multiple records
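For range or prefix filters, the matching records may not all arrive in a single read_next call. The following is a rough sketch of draining a cursor batch by batch. FakeCursor is a hypothetical stand-in so the example runs offline, and the stop condition (read_next returning None or an empty batch once the cursor is exhausted) is an assumption to check against the API reference.

```python
import asyncio


class FakeCursor:
    """Hypothetical stand-in for the cursor returned by table.query_index()."""

    def __init__(self, batches):
        self._batches = list(batches)

    async def read_next(self):
        # Assumed behavior: return the next batch, or None when exhausted.
        return self._batches.pop(0) if self._batches else None


async def read_all_batches(cursor):
    records = []
    while True:
        batch = await cursor.read_next()
        if not batch:  # covers both None and an empty batch
            break
        records.extend(batch)
    return records
```

With a real cursor, the call would be records = await read_all_batches(cursor).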
Analyze with SQL
This snippet runs a warehouse (OLAP) query on the table and returns the result, which is useful for ad-hoc joins, aggregations, and visualizations:
import beneath
df = await beneath.query_warehouse("SELECT count(*) FROM `examples/reddit/r-trendingreddits-comments`")
See the warehouse queries documentation for a guide to the SQL query syntax.
Reference
Consult the Beneath Python client API reference for details on all classes, methods and arguments.