E-Commerce Behavior Data
Note
This page is part of the Dataset Import section, which includes examples of loading popular datasets into YDB. Before starting, please review the general information on requirements and the import process.
User behavior data from a multi-category online store.
Source: Kaggle - E-commerce behavior data
Size: 9 GB
Loading Example
-
Download and unzip the
2019-Nov.csvfile from Kaggle. -
The dataset includes completely identical rows. Since YDB requires unique primary key values, add a new column named
row_idto the file, where the key value will be equal to the row number in the original file. This prevents the removal of duplicate data. This operation can be carried out using the awk command:awk 'NR==1 {print "row_id," \$0; next} {print NR-1 "," \$0}' 2019-Nov.csv > temp.csv && mv temp.csv 2019-Nov.csv -
Create a table in YDB using one of the following methods:
Embedded UIYDB CLIFor more information on Embedded UI.
CREATE TABLE `ecommerce_table` ( `row_id` Uint64 NOT NULL, `event_time` Text NOT NULL, `event_type` Text NOT NULL, `product_id` Uint64 NOT NULL, `category_id` Uint64, `category_code` Text, `brand` Text, `price` Double NOT NULL, `user_id` Uint64 NOT NULL, `user_session` Text NOT NULL, PRIMARY KEY (`row_id`) ) WITH ( STORE = COLUMN, UNIFORM_PARTITIONS = 50 );ydb sql -s \ 'CREATE TABLE `ecommerce_table` ( `row_id` Uint64 NOT NULL, `event_time` Text NOT NULL, `event_type` Text NOT NULL, `product_id` Uint64 NOT NULL, `category_id` Uint64, `category_code` Text, `brand` Text, `price` Double NOT NULL, `user_id` Uint64 NOT NULL, `user_session` Text NOT NULL, PRIMARY KEY (`row_id`) ) WITH ( STORE = COLUMN, UNIFORM_PARTITIONS = 50 );' -
Execute the import command:
ydb import file csv --header --null-value "" --path ecommerce_table 2019-Nov.csv
Analytical Query Example
Identify the most popular product categories on November 1, 2019:
SELECT
category_code,
COUNT(*) AS view_count
FROM ecommerce_table
WHERE
SUBSTRING(CAST(event_time AS String), 0, 10) = '2019-11-01'
AND event_type = 'view'
GROUP BY category_code
ORDER BY view_count DESC
LIMIT 10;
ydb sql -s \
'SELECT
category_code,
COUNT(*) AS view_count
FROM ecommerce_table
WHERE
SUBSTRING(CAST(event_time AS String), 0, 10) = "2019-11-01"
AND event_type = "view"
GROUP BY category_code
ORDER BY view_count DESC
LIMIT 10;'
Result:
┌────────────────────────────────────┬────────────┐
│ category_code │ view_count │
├────────────────────────────────────┼────────────┤
│ null │ 453024 │
├────────────────────────────────────┼────────────┤
│ "electronics.smartphone" │ 360650 │
├────────────────────────────────────┼────────────┤
│ "electronics.clocks" │ 43581 │
├────────────────────────────────────┼────────────┤
│ "computers.notebook" │ 40878 │
├────────────────────────────────────┼────────────┤
│ "electronics.video.tv" │ 40383 │
├────────────────────────────────────┼────────────┤
│ "electronics.audio.headphone" │ 37489 │
├────────────────────────────────────┼────────────┤
│ "apparel.shoes" │ 31013 │
├────────────────────────────────────┼────────────┤
│ "appliances.kitchen.washer" │ 28028 │
├────────────────────────────────────┼────────────┤
│ "appliances.kitchen.refrigerators" │ 27808 │
├────────────────────────────────────┼────────────┤
│ "appliances.environment.vacuum" │ 26477 │
└────────────────────────────────────┴────────────┘