gigl.scripts.load_gowalla_to_bq

Script to load the Gowalla bipartite graph dataset to BigQuery.

The Gowalla dataset is a bipartite graph of users and items (locations). This script downloads the data from the xiangwang1223/neural_graph_collaborative_filtering repository on GitHub and loads it into BigQuery as an edge table with one row per user-item interaction.

Data format: Each line represents a user and the items they have interacted with: <user_id> <item_id_1> <item_id_2> … <item_id_n>

Example usage:

python -m gigl.scripts.load_gowalla_to_bq --project <gcp_project_id> --dataset <dataset_name> --table <table_name> --data_file_url https://raw.githubusercontent.com/xiangwang1223/neural_graph_collaborative_filtering/master/Data/gowalla/test.txt

Attributes

DEFAULT_DST_COLUMN

DEFAULT_SRC_COLUMN

DEFAULT_TEST_URL

DEFAULT_TRAIN_URL

logger

Functions

convert_edges_to_jsonl(input_file_path, output_file_path)

Convert the Gowalla edge list to JSONL format for BigQuery loading.

download_file(url, local_path)

Download a file from a URL to a local path.

load_gowalla_to_bigquery(project, dataset, table, ...)

Load the Gowalla dataset to BigQuery.

main()

Main entry point for the script.

parse_gowalla_edges(file_path)

Parse the Gowalla edge list format and yield edge dictionaries.

Module Contents

gigl.scripts.load_gowalla_to_bq.convert_edges_to_jsonl(input_file_path, output_file_path)

Convert the Gowalla edge list to JSONL format for BigQuery loading.

Parameters:
  • input_file_path (str) – Path to the input edge list file.

  • output_file_path (str) – Path to write the JSONL output.

Returns:

The number of edges written and the number of unique users processed, in that order.

Return type:

tuple[int, int]
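
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: the function name `convert_edges_to_jsonl_sketch`, the `src`/`dst` JSON keys, and the choice to skip blank lines and users without items are all assumptions.

```python
import json


def convert_edges_to_jsonl_sketch(input_file_path: str, output_file_path: str) -> tuple[int, int]:
    """Write one JSON object per (user, item) edge; return (edges, unique users)."""
    num_edges = 0
    users = set()
    with open(input_file_path) as src_file, open(output_file_path, "w") as out_file:
        for line in src_file:
            tokens = line.split()
            if len(tokens) < 2:
                # Assumption: blank lines and users with no items are skipped.
                continue
            user_id = int(tokens[0])
            users.add(user_id)
            for item_id in tokens[1:]:
                # Assumption: 'src'/'dst' keys; the real script may use the
                # configured BigQuery column names instead.
                edge = {"src": user_id, "dst": int(item_id)}
                out_file.write(json.dumps(edge) + "\n")
                num_edges += 1
    return num_edges, len(users)
```

For an input file containing the two lines `0 10 11` and `1 10`, this sketch writes three JSON lines and returns `(3, 2)`.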

gigl.scripts.load_gowalla_to_bq.download_file(url, local_path)

Download a file from a URL to a local path.

Parameters:
  • url (str) – URL to download from.

  • local_path (str) – Local path to save the file.

Return type:

None
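
One plausible way to implement such a download with only the standard library is shown below. The name `download_file_sketch` is hypothetical, and the real function may use a different HTTP client, add retries, or log progress.

```python
import shutil
import urllib.request


def download_file_sketch(url: str, local_path: str) -> None:
    """Stream the response body at `url` into the file at `local_path`."""
    # urlopen returns a file-like response; copyfileobj streams it in chunks
    # so the whole payload is never held in memory at once.
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
```

Streaming keeps memory use flat regardless of file size, which matters little for the Gowalla text files but is a reasonable default for a generic download helper.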

gigl.scripts.load_gowalla_to_bq.load_gowalla_to_bigquery(project, dataset, table, data_file_url, src_column=DEFAULT_SRC_COLUMN, dst_column=DEFAULT_DST_COLUMN, recreate_table=True)

Load the Gowalla dataset to BigQuery.

Parameters:
  • project (str) – GCP project ID.

  • dataset (str) – BigQuery dataset name.

  • table (str) – BigQuery table name.

  • data_file_url (str) – URL to download the Gowalla data from.

  • src_column (str) – Name of the source column. Defaults to DEFAULT_SRC_COLUMN (‘from_user_id’).

  • dst_column (str) – Name of the destination column. Defaults to DEFAULT_DST_COLUMN (‘to_item_id’).

  • recreate_table (bool) – Whether to recreate the table if it exists. Defaults to True.

Return type:

None
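
A hedged usage sketch of a programmatic call, using only the parameters documented above. The project, dataset, and table values are placeholders, and the helper name `run_load` is hypothetical. The column names are spelled out to match the module constants, and the import is deferred so the snippet can be read without GiGL installed. Running it presumably requires valid GCP credentials.

```python
# Placeholder values; replace them with a real project, dataset, and table.
LOAD_KWARGS = {
    "project": "my-gcp-project",
    "dataset": "my_dataset",
    "table": "gowalla_test_edges",
    "data_file_url": (
        "https://raw.githubusercontent.com/xiangwang1223/"
        "neural_graph_collaborative_filtering/master/Data/gowalla/test.txt"
    ),
    "src_column": "from_user_id",  # value of DEFAULT_SRC_COLUMN
    "dst_column": "to_item_id",    # value of DEFAULT_DST_COLUMN
    "recreate_table": True,        # documented default
}


def run_load() -> None:
    """Call the loader with the keyword arguments above."""
    # Deferred import: only needed when the load is actually executed.
    from gigl.scripts.load_gowalla_to_bq import load_gowalla_to_bigquery

    load_gowalla_to_bigquery(**LOAD_KWARGS)
```

Calling `run_load()` would download the test split and write it to `my_dataset.gowalla_test_edges` in the placeholder project.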

gigl.scripts.load_gowalla_to_bq.main()

Main entry point for the script.
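
The command-line interface is not documented here beyond the flags in the example usage. A parser accepting those four flags might look like the sketch below. The name `build_parser_sketch`, the help strings, and marking every flag as required are assumptions; the real script may define defaults or additional options.

```python
import argparse


def build_parser_sketch() -> argparse.ArgumentParser:
    """Build a parser for the four flags shown in the example usage."""
    parser = argparse.ArgumentParser(
        description="Load the Gowalla bipartite graph dataset to BigQuery."
    )
    # Assumption: all flags are required; the actual script may differ.
    parser.add_argument("--project", required=True, help="GCP project ID.")
    parser.add_argument("--dataset", required=True, help="BigQuery dataset name.")
    parser.add_argument("--table", required=True, help="BigQuery table name.")
    parser.add_argument(
        "--data_file_url", required=True, help="URL to download the Gowalla data from."
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser_sketch().parse_args(
    ["--project", "p", "--dataset", "d", "--table", "t", "--data_file_url", "u"]
)
```

The parsed values are then available as `args.project`, `args.dataset`, `args.table`, and `args.data_file_url`.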

gigl.scripts.load_gowalla_to_bq.parse_gowalla_edges(file_path)

Parse the Gowalla edge list format and yield edge dictionaries.

Each line in the file represents a user and the items they have interacted with: <user_id> <item_id_1> <item_id_2> … <item_id_n>

This function yields one edge (user -> item) per interaction.

Parameters:

file_path (str) – Path to the file containing edge data.

Yields:

dict[str, int] – Dictionary with ‘src’ (user) and ‘dst’ (item) keys.

Return type:

Iterator[dict[str, int]]
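
The line-to-edges expansion described above can be illustrated with a small generator. This is a sketch under assumptions, not the module's source: the name `parse_gowalla_edges_sketch` is hypothetical, and skipping blank lines is a guess at how malformed input is handled.

```python
from typing import Iterator


def parse_gowalla_edges_sketch(file_path: str) -> Iterator[dict[str, int]]:
    """Lazily yield one {'src': user, 'dst': item} dict per interaction."""
    with open(file_path) as edge_file:
        for line in edge_file:
            tokens = line.split()
            if not tokens:
                # Assumption: blank lines are skipped.
                continue
            # The first token is the user; every remaining token is an item.
            user_id = int(tokens[0])
            for item_id in tokens[1:]:
                yield {"src": user_id, "dst": int(item_id)}
```

A line such as `5 7 8 9` yields three edges, all with `src` equal to 5, while a line holding only a user ID yields none. Because the function is a generator, edges are produced one at a time without loading the whole file into memory.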

gigl.scripts.load_gowalla_to_bq.DEFAULT_DST_COLUMN: Final[str] = 'to_item_id'
gigl.scripts.load_gowalla_to_bq.DEFAULT_SRC_COLUMN: Final[str] = 'from_user_id'
gigl.scripts.load_gowalla_to_bq.DEFAULT_TEST_URL: Final[str] = 'https://raw.githubusercontent.com/xiangwang1223/neural_graph_collaborative_filtering/master/Data...
gigl.scripts.load_gowalla_to_bq.DEFAULT_TRAIN_URL: Final[str] = 'https://raw.githubusercontent.com/xiangwang1223/neural_graph_collaborative_filtering/master/Data...
gigl.scripts.load_gowalla_to_bq.logger