    Repositories list

    • A robust web archive analytics toolkit
      Cython
      Apache License 2.0
      Updated Apr 24, 2026
    • Python
      Updated Apr 18, 2026
    • ChatNoir Web Frontend
      Python
      Apache License 2.0
      Updated Apr 18, 2026
    • 🔍 Simple, type-safe access to the ChatNoir search API.
      Python
      MIT License
      Updated Mar 1, 2026
    • 🔍 Use the ChatNoir search engine in PyTerrier.
      Python
      MIT License
      Updated Feb 9, 2026
    • web-content-extraction-benchmark

      Public
      Web Content Extraction Benchmark
      Python
      Apache License 2.0
      Updated Dec 16, 2025
    • Jupyter Notebook
      Updated Dec 21, 2023
    • This pipeline allows extracting data from WARC files on a CPU cluster and streaming it to a GPU server, where it is processed.
      Python
      MIT License
      Updated May 7, 2023
    • ChatNoir Indexer
      Python
      Updated Dec 2, 2022
    • ChatNoir Web Frontend
      Java
      MIT License
      Updated Mar 25, 2022
    • chatnoir2-indexer

      Public archive
      ChatNoir Indexer
      Java
      MIT License
      Updated Nov 5, 2021
    • CopyCat is a resource for deduplication in TREC-style experimental setups.
      Arc
      MIT License
      Updated Nov 3, 2021
    • webis-uuid

      Public archive
      Webis UUID Generation Tool
      Java
      MIT License
      Updated Jun 23, 2020
    • ChatNoir HDFS Map File Generator
      Java
      Apache License 2.0
      Updated Dec 14, 2018
    • Java
      Updated Nov 16, 2017
    • aitools3-ie-stopwords

      Public archive
      Java
      Updated Nov 16, 2017
    • Java
      Updated Nov 16, 2017