Getting Started

Running an Ethereum full node

To run the crawler effectively, you’ll have to either:

  • create an account with a cloud-hosted Ethereum full node provider, or

  • run a full node yourself.

To inform this decision, consider that crawling Ethereum through a cloud provider is considerably slower than co-locating the crawler and the Ethereum full node on one machine. Co-location is faster simply because the two processes sit next to each other and requests don’t have to travel over a wide-area network, which adds latency and introduces opportunities for failure.

Internally, we’ve successfully run the crawler over data sets larger than 1 TB by hosting it on the same machine as a fully-synced Erigon Ethereum full node. For testing and experimentation, however, Infura or Alchemy are more than sufficient!

Installation

First, download the source code:

# EITHER: Clone the repository if you want to use the CLI
git clone git@github.com:attestate/crawler.git

# OR: install the dependency via npm
npm install @attestate/crawler

# If you cloned the repository: copy the example .env file
# ⚠️ Be sure to update the variables in `.env` with the appropriate values!
cp .env-copy .env

# ... and install the repository's dependencies
npm i
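
At this point you can check that the command line interface works by printing its help text. This assumes you cloned the repository and are in its root directory; if the crawler complains about missing environment variables, configure them first as described below.

# Print the CLI's help text to verify the installation
node crawler.mjs --help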

Before we can run the crawler, however, we’ll have to make sure all mandatory environment variables are set in the .env file.

Configuring environment variables

The following environment variables are required:

RPC_HTTP_HOST=https://
DATA_DIR=data
IPFS_HTTPS_GATEWAY=https://
ARWEAVE_HTTPS_GATEWAY=https://
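
For illustration, a filled-in .env file could look like the following sketch. All endpoint values below are placeholders, not defaults: substitute your own provider URL (including any API key or project ID it requires) and your preferred IPFS and Arweave gateways.

# Example values only; replace them with your own endpoints
RPC_HTTP_HOST=https://mainnet.infura.io/v3/<your-project-id>
DATA_DIR=data
IPFS_HTTPS_GATEWAY=https://ipfs.io/
ARWEAVE_HTTPS_GATEWAY=https://arweave.net/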

For more details on the crawler’s environment variable configuration, head over to the environment variable reference docs.

Using the command line interface

Alternatively to using it as a library, the crawler can be operated through a UNIX-compatible command line interface. The crawler.mjs entry point is located in the root of the source code directory.

crawler.mjs [command]

Commands:
  crawler.mjs run [path]                  Run a crawl given a path
  crawler.mjs range [path] [table] [key]  Query an LMDB key range in a table

Options:
  --help     Show help                                                 [boolean]
  --version  Show version number                                       [boolean]
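
For example, assuming your crawl path configuration lives in a file called config.mjs (a hypothetical name; use whatever path your configuration actually has), the two commands could be invoked like this:

# Start a crawl from a (hypothetical) path configuration file
node crawler.mjs run ./config.mjs

# Query a key range from a table of the resulting LMDB database
# ("mytable" and "mykey" are placeholders for your actual table and key)
node crawler.mjs range ./config.mjs mytable mykey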

Configuring your first crawl

Next, to run your first crawl, configure the crawl path.