Add DuckDB zipfile support example #22

@OlivierBinette

DuckDB can read zipped TSV files directly from PatentsView bulk data download URLs using the new zipfs community extension.

For example, using the DuckDB CLI:

INSTALL zipfs FROM community;
LOAD zipfs;

SELECT * FROM read_csv('zip://https://s3.amazonaws.com/data.patentsview.org/download/g_patent.tsv.zip/g_patent.tsv') LIMIT 5;
┌───────────┬─────────────┬─────────────┬───┬────────────┬───────────┬───────────────┐
│ patent_id │ patent_type │ patent_date │ … │ num_claims │ withdrawn │   filename    │
│   int64   │   varchar   │    date     │   │   int64    │   int64   │    varchar    │
├───────────┼─────────────┼─────────────┼───┼────────────┼───────────┼───────────────┤
│  10000000 │ utility     │ 2018-06-19  │ … │         20 │         0 │ ipg180619.xml │
│  10000001 │ utility     │ 2018-06-19  │ … │         12 │         0 │ ipg180619.xml │
│  10000002 │ utility     │ 2018-06-19  │ … │          9 │         0 │ ipg180619.xml │
│  10000003 │ utility     │ 2018-06-19  │ … │         18 │         0 │ ipg180619.xml │
│  10000004 │ utility     │ 2018-06-19  │ … │          6 │         0 │ ipg180619.xml │
├───────────┴─────────────┴─────────────┴───┴────────────┴───────────┴───────────────┤
│ 5 rows                                                         8 columns (6 shown) │
└────────────────────────────────────────────────────────────────────────────────────┘

We could use this to simplify the DuckDB read-in example.
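
As a rough sketch of what a simplified read-in example might look like from Python, the snippet below builds the same `zip://` path and query as the CLI session above. The helper names (`zipfs_path`, `preview_query`, `preview`) are hypothetical, and the path layout (`zip://` + archive URL + `/` + member name) is inferred only from the single example in this issue, so treat it as an assumption rather than a documented contract. The `preview` function assumes the `duckdb` Python package is installed and that network access is available; it is not called automatically.

```python
BASE_URL = "https://s3.amazonaws.com/data.patentsview.org/download"


def zipfs_path(archive_url: str, member: str) -> str:
    """Build a zip:// path: archive URL followed by the file name inside it.

    Assumption: this mirrors the pattern used in the CLI example above.
    """
    return f"zip://{archive_url.rstrip('/')}/{member.lstrip('/')}"


def preview_query(archive_url: str, member: str, limit: int = 5) -> str:
    """Return a SELECT statement that previews the first `limit` rows."""
    # Escape single quotes so the path is a valid SQL string literal.
    path = zipfs_path(archive_url, member).replace("'", "''")
    return f"SELECT * FROM read_csv('{path}') LIMIT {int(limit)};"


def preview(table: str = "g_patent", limit: int = 5):
    """Run the preview query with DuckDB (needs `duckdb` and network access)."""
    import duckdb  # imported lazily so the helpers above work without it

    con = duckdb.connect()
    con.execute("INSTALL zipfs FROM community")
    con.execute("LOAD zipfs")
    sql = preview_query(f"{BASE_URL}/{table}.tsv.zip", f"{table}.tsv", limit)
    return con.execute(sql).fetchall()
```

Calling `preview()` should return the same five `g_patent` rows as the CLI session, as a list of tuples. Other bulk tables could presumably be read by changing `table`, provided their archives follow the same `<name>.tsv.zip` / `<name>.tsv` naming, which this issue does not confirm.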
