COVID-19 Open Research Dataset

Note

This page is part of the Dataset Import section, which includes examples of loading popular datasets into YDB. Before starting, please review the general information on requirements and the import process.

An open dataset of COVID-19 research.

Source: Kaggle - COVID-19 Open Research Dataset Challenge

Size: 1.65 GB (metadata.csv file)

Loading Example

Download and unzip the metadata.csv file from Kaggle.
The dataset includes completely identical rows. Since YDB requires unique primary key values, add a new column named row_id to the file, where the key value will be equal to the row number in the original file. This prevents the removal of duplicate data. This operation can be carried out using the awk command:
```
awk 'NR==1 {print "row_id," \$0; next} {print NR-1 "," \$0}' metadata.csv > temp.csv && mv temp.csv metadata.csv
```

Create a table in YDB using one of the following methods:

Embedded UI

YDB CLI

For more information on Embedded UI.

CREATE TABLE `covid_research` (
    `row_id` Uint64 NOT NULL,
    `cord_uid` Text NOT NULL,
    `sha` Text NOT NULL,
    `source_x` Text NOT NULL,
    `title` Text NOT NULL,
    `doi` Text NOT NULL,
    `pmcid` Text NOT NULL,
    `pubmed_id` Text NOT NULL,
    `license` Text NOT NULL,
    `abstract` Text NOT NULL,
    `publish_time` Text NOT NULL,
    `authors` Text NOT NULL,
    `journal` Text NOT NULL,
    `mag_id` Text,
    `who_covidence_id` Text,
    `arxiv_id` Text,
    `pdf_json_files` Text NOT NULL,
    `pmc_json_files` Text NOT NULL,
    `url` Text NOT NULL,
    `s2_id` Uint64,
    PRIMARY KEY (`row_id`)
)
WITH (
    STORE = COLUMN
);

ydb sql -s \
'CREATE TABLE `covid_research` (
    `row_id` Uint64 NOT NULL,
    `cord_uid` Text NOT NULL,
    `sha` Text NOT NULL,
    `source_x` Text NOT NULL,
    `title` Text NOT NULL,
    `doi` Text NOT NULL,
    `pmcid` Text NOT NULL,
    `pubmed_id` Text NOT NULL,
    `license` Text NOT NULL,
    `abstract` Text NOT NULL,
    `publish_time` Text NOT NULL,
    `authors` Text NOT NULL,
    `journal` Text NOT NULL,
    `mag_id` Text,
    `who_covidence_id` Text,
    `arxiv_id` Text,
    `pdf_json_files` Text NOT NULL,
    `pmc_json_files` Text NOT NULL,
    `url` Text NOT NULL,
    `s2_id` Uint64,
    PRIMARY KEY (`row_id`)
)
WITH (
    STORE = COLUMN
);'

Execute the import command:

ydb import file csv --header --null-value "" --path covid_research metadata.csv

Analytical Query Example

Run a query to determine the journals with the highest number of publications:

Embedded UI

YDB CLI

SELECT
    journal,
    COUNT(*) AS publication_count
FROM covid_research
WHERE journal IS NOT NULL AND journal != ''
GROUP BY journal
ORDER BY publication_count DESC
LIMIT 10;

ydb sql -s \
'SELECT
    journal,
    COUNT(*) AS publication_count
FROM covid_research
WHERE journal IS NOT NULL AND journal != ""
GROUP BY journal
ORDER BY publication_count DESC
LIMIT 10;'

Result:

┌───────────────────────────────────┬───────────────────┐
│ journal                           │ publication_count │
├───────────────────────────────────┼───────────────────┤
│ "PLoS One"                        │ 9953              │
├───────────────────────────────────┼───────────────────┤
│ "bioRxiv"                         │ 8961              │
├───────────────────────────────────┼───────────────────┤
│ "Int J Environ Res Public Health" │ 8201              │
├───────────────────────────────────┼───────────────────┤
│ "BMJ"                             │ 6928              │
├───────────────────────────────────┼───────────────────┤
│ "Sci Rep"                         │ 5935              │
├───────────────────────────────────┼───────────────────┤
│ "Cureus"                          │ 4212              │
├───────────────────────────────────┼───────────────────┤
│ "Reactions Weekly"                │ 3891              │
├───────────────────────────────────┼───────────────────┤
│ "Front Psychol"                   │ 3541              │
├───────────────────────────────────┼───────────────────┤
│ "BMJ Open"                        │ 3515              │
├───────────────────────────────────┼───────────────────┤
│ "Front Immunol"                   │ 3442              │
└───────────────────────────────────┴───────────────────┘

Was the article helpful?

E-Commerce Behavior Data

Netflix Movies and TV Shows