gigl.scripts.load_gowalla_to_bq
Script to load the Gowalla bipartite graph dataset to BigQuery.
The Gowalla dataset is a bipartite graph of users and items (locations). This script downloads the data from the Neural Graph Collaborative Filtering (NGCF) repository and loads it into BigQuery as an edge table with one row per user-item interaction.
Data format: Each line represents a user and the items they have interacted with: <user_id> <item_id_1> <item_id_2> … <item_id_n>
- Example usage:
python -m gigl.scripts.load_gowalla_to_bq --project <gcp_project_id> --dataset <dataset_name> --table <table_name> --data_file_url https://raw.githubusercontent.com/xiangwang1223/neural_graph_collaborative_filtering/master/Data/gowalla/test.txt
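The flags in the example command line up with the `project`, `dataset`, `table` and `data_file_url` parameters of `load_gowalla_to_bigquery`. The code below is a hypothetical `argparse` reconstruction of that interface. It assumes only the four flags shown. `build_arg_parser` is an illustrative name, and the real entry point may accept more options or supply a default URL.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the script's CLI from the example usage."""
    parser = argparse.ArgumentParser(
        description="Load the Gowalla bipartite graph dataset to BigQuery."
    )
    # Flag names come from the example command; required=True is an assumption.
    parser.add_argument("--project", required=True, help="GCP project ID.")
    parser.add_argument("--dataset", required=True, help="BigQuery dataset name.")
    parser.add_argument("--table", required=True, help="BigQuery table name.")
    parser.add_argument(
        "--data_file_url", required=True, help="URL to download the Gowalla data from."
    )
    return parser
```

The parsed values could then be forwarded as `load_gowalla_to_bigquery(args.project, args.dataset, args.table, args.data_file_url)`.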
Attributes
Functions
| Function | Description |
| --- | --- |
| convert_edges_to_jsonl | Convert the Gowalla edge list to JSONL format for BigQuery loading. |
| download_file | Download a file from a URL to a local path. |
| load_gowalla_to_bigquery | Load the Gowalla dataset to BigQuery. |
|  | Main entry point for the script. |
| parse_gowalla_edges | Parse the Gowalla edge list format and yield edge dictionaries. |
Module Contents
- gigl.scripts.load_gowalla_to_bq.convert_edges_to_jsonl(input_file_path, output_file_path)
Convert the Gowalla edge list to JSONL format for BigQuery loading.
- Parameters:
input_file_path (str) – Path to the input edge list file.
output_file_path (str) – Path to write the JSONL output.
- Returns:
Number of edges and number of unique users processed.
- Return type:
tuple[int, int]
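Here is a minimal sketch of the conversion step. It assumes the whitespace-separated input format described above and writes one JSON object per edge, with `src` and `dst` keys, on its own line (newline-delimited JSON). `convert_edges_to_jsonl_sketch` and `_iter_edges` are hypothetical stand-ins, not the module's implementation. In particular, the sketch counts only users that contribute at least one edge; the real function may handle item-less users differently.

```python
import json
from typing import Iterator


def _iter_edges(input_file_path: str) -> Iterator[dict[str, int]]:
    # Each line: <user_id> <item_id_1> ... <item_id_n>; emit one edge per item.
    with open(input_file_path) as f:
        for line in f:
            tokens = line.split()
            if not tokens:
                continue  # assumption: blank lines are ignored
            user_id = int(tokens[0])
            for item_id in tokens[1:]:
                yield {"src": user_id, "dst": int(item_id)}


def convert_edges_to_jsonl_sketch(
    input_file_path: str, output_file_path: str
) -> tuple[int, int]:
    """Hypothetical stand-in: write one JSON object per line (JSONL)."""
    num_edges = 0
    users: set[int] = set()
    with open(output_file_path, "w") as out:
        for edge in _iter_edges(input_file_path):
            out.write(json.dumps(edge) + "\n")
            num_edges += 1
            users.add(edge["src"])  # only users with at least one edge are counted
    return num_edges, len(users)
```

For an input containing `0 10 11` and `1 10`, the sketch writes three JSONL rows and returns `(3, 2)`.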
- gigl.scripts.load_gowalla_to_bq.download_file(url, local_path)
Download a file from a URL to a local path.
- Parameters:
url (str) – URL to download from.
local_path (str) – Local path to save the file.
- Return type:
None
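Here is a minimal sketch of the download step using only the standard library. `urllib.request.urlopen` fetches the URL, and `shutil.copyfileobj` streams the response body to disk instead of holding it all in memory. `download_file_sketch` is a hypothetical name, and the real script may use a different HTTP client.

```python
import shutil
import urllib.request


def download_file_sketch(url: str, local_path: str) -> None:
    """Hypothetical stand-in: stream the resource at `url` into `local_path`."""
    # Both the response and the output file are closed when the block exits.
    with urllib.request.urlopen(url, timeout=60) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Because `urlopen` also accepts `file://` URLs, the sketch can be exercised offline against a local file.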
- gigl.scripts.load_gowalla_to_bq.load_gowalla_to_bigquery(project, dataset, table, data_file_url, src_column=DEFAULT_SRC_COLUMN, dst_column=DEFAULT_DST_COLUMN, recreate_table=True)
Load the Gowalla dataset to BigQuery.
- Parameters:
project (str) – GCP project ID.
dataset (str) – BigQuery dataset name.
table (str) – BigQuery table name.
data_file_url (str) – URL to download the Gowalla data from.
src_column (str) – Name of the source column. Defaults to ‘src’.
dst_column (str) – Name of the destination column. Defaults to ‘dst’.
recreate_table (bool) – Whether to recreate the table if it exists. Defaults to True.
- Return type:
None
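A few pure helpers can illustrate how the load parameters might be used. This is a hypothetical sketch that rests on three assumptions:

- The destination is addressed by BigQuery's fully qualified `project.dataset.table` ID.
- `src_column` and `dst_column` simply rename the parsed `src` and `dst` keys.
- `recreate_table` corresponds to replacing rather than appending to existing data (BigQuery's `WRITE_TRUNCATE` versus `WRITE_APPEND` write dispositions).

The actual script may instead drop and recreate the table explicitly. All helper names below are illustrative.

```python
from typing import Iterable, Iterator


def bigquery_table_id(project: str, dataset: str, table: str) -> str:
    """BigQuery addresses tables as `project.dataset.table`."""
    return f"{project}.{dataset}.{table}"


def write_disposition(recreate_table: bool) -> str:
    """Assumed mapping: replace existing contents, or append to them."""
    return "WRITE_TRUNCATE" if recreate_table else "WRITE_APPEND"


def rename_edge_columns(
    edges: Iterable[dict[str, int]], src_column: str = "src", dst_column: str = "dst"
) -> Iterator[dict[str, int]]:
    """Re-key parsed {'src', 'dst'} edges to the requested column names."""
    for edge in edges:
        yield {src_column: edge["src"], dst_column: edge["dst"]}
```

With the `google-cloud-bigquery` client, such values would typically feed a `bigquery.LoadJobConfig` with `source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON`. That config is then passed to `Client.load_table_from_file` together with the JSONL file opened in binary mode.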
- gigl.scripts.load_gowalla_to_bq.parse_gowalla_edges(file_path)
Parse the Gowalla edge list format and yield edge dictionaries.
Each line in the file represents a user and the items they have interacted with: <user_id> <item_id_1> <item_id_2> … <item_id_n>
This function yields one edge (user -> item) per interaction.
- Parameters:
file_path (str) – Path to the file containing edge data.
- Yields:
dict[str, int] – Dictionary with ‘src’ (user) and ‘dst’ (item) keys.
- Return type:
Iterator[dict[str, int]]
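The expansion from one adjacency line to many edges can be sketched as follows. For example, the line `0 10 11` yields two edges: user 0 to item 10 and user 0 to item 11. `parse_gowalla_edges_sketch` is a hypothetical reimplementation. Skipping blank lines and users with no items is an assumption about how the real parser treats them.

```python
from typing import Iterator


def parse_gowalla_edges_sketch(file_path: str) -> Iterator[dict[str, int]]:
    """Hypothetical reimplementation: yield one user -> item edge per interaction."""
    with open(file_path) as f:
        for line in f:
            tokens = line.split()  # any whitespace; also drops the trailing newline
            if len(tokens) < 2:
                continue  # assumption: blank lines and item-less users yield nothing
            user_id = int(tokens[0])
            for item_id in tokens[1:]:
                yield {"src": user_id, "dst": int(item_id)}
```

Because it is a generator, edges are produced lazily, so even large files can be streamed into a JSONL writer without materializing every edge in memory.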
- gigl.scripts.load_gowalla_to_bq.DEFAULT_TEST_URL: Final[str] = 'https://raw.githubusercontent.com/xiangwang1223/neural_graph_collaborative_filtering/master/Data...