How to Use the Ethereum Blockchain as a Filesystem

By: Omar Metwally, MD

Principal Investigator

Analog Labs, Logisome, Inc. R&D

omar@analog.earth


Purpose

To securely and permanently store information and make it available to anyone in the world.

Technical Objectives

  1. To encrypt data using the AES-256 cipher.

  2. To upload data to the public Ethereum blockchain.

  3. To download data from the public Ethereum blockchain.

Prerequisites

You’ll need to download and install the Ethereum client written in the Go programming language, also known as the “Go Ethereum client.”


Introduction

Using a public blockchain as a filesystem is a relatively costly method of storing data, but it confers the advantages of permanence and global accessibility from any computer with an internet connection [1]. A public blockchain is maintained collectively by its nodes, an estimated 8,194 at the time of writing, of which 2,675 (32.65%) are located in the United States, 1,068 (13.03%) in Germany, and 661 (8.07%) in China [2]. Because each node stores either an identical copy of the blockchain in its entirety or a cryptographic summary of the blockchain’s state, data stored in this manner are not susceptible to the loss and corruption that can affect data stored on centralized servers [3]. Ensuring the permanence and authenticity of data is critical to the operation of corporations and governmental bodies, for instance when storing customer information and internal documents or when enforcing intellectual property laws. Analog Labs, the Research and Development Division of Logisome, Inc., uses these properties of the blockchain data structure to securely archive information generated by research activities and to make this information publicly accessible.

Methods

All data on a public blockchain can be freely downloaded by anyone running the client software. Files can optionally be encrypted before being uploaded to the blockchain, which restricts access to individuals capable of decrypting them. To encrypt data using the AES-256 cipher as implemented in the OpenSSL library:

openssl aes-256-cbc -a -salt -in file.txt -out file.txt.enc

Where file.txt is the name of the input file and file.txt.enc is the encrypted output file.
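The text shows only the encryption command. As a minimal sketch, the Python helper below builds the same command and its decryption counterpart, which adds OpenSSL's -d flag. The function names openssl_command and run_openssl are hypothetical and are not part of the Blackswan repository; the sketch assumes the openssl binary is installed and on the PATH.

```python
import subprocess


def openssl_command(src, dst, decrypt=False):
    """Build the openssl argument list for AES-256-CBC with base64 (-a) output.

    Hypothetical helper for illustration; not part of the Blackswan repository.
    """
    if decrypt:
        # -d switches openssl from encryption to decryption.
        return ["openssl", "aes-256-cbc", "-d", "-a", "-in", src, "-out", dst]
    return ["openssl", "aes-256-cbc", "-a", "-salt", "-in", src, "-out", dst]


def run_openssl(src, dst, decrypt=False):
    """Run the command; openssl itself prompts for the passphrase."""
    subprocess.run(openssl_command(src, dst, decrypt), check=True)


if __name__ == "__main__":
    # Print both commands without running them.
    print(" ".join(openssl_command("file.txt", "file.txt.enc")))
    print(" ".join(openssl_command("file.txt.enc", "file.txt", decrypt=True)))
```

Decryption requires the same passphrase that was entered during encryption.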

The code contained in the Analog Labs GitHub repository chunks an input file into a sequence of 32-byte segments and uploads them one by one to the public Ethereum blockchain [4]. Consecutive transactions are separated by a user-defined time interval (120 seconds by default) to ensure that the chunks are mined in the correct order.

A file can be downloaded from the public blockchain, once again in 32-byte chunks, and reconstituted in a similar manner.
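The chunking and pacing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in bytelock.py: the names chunk_bytes and upload_chunks are hypothetical, and the send argument stands in for whatever function actually submits one chunk as a transaction.

```python
import time

CHUNK_SIZE = 32         # bytes per chunk, as described in the text
DEFAULT_INTERVAL = 120  # seconds between transactions, the stated default


def chunk_bytes(data, size=CHUNK_SIZE):
    """Split data into consecutive segments of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_chunks(chunks, send, interval=DEFAULT_INTERVAL, sleep=time.sleep):
    """Submit chunks in order, pausing `interval` seconds between them.

    `send` is a hypothetical callable that submits one chunk as a
    transaction; `sleep` is injectable so the pacing can be tested.
    """
    results = []
    for position, chunk in enumerate(chunks):
        if position > 0:
            sleep(interval)  # wait so the previous chunk is mined first
        results.append(send(chunk))
    return results


if __name__ == "__main__":
    chunks = chunk_bytes(b"x" * 70)
    # Stand-in sender that only prints; no transaction is submitted.
    upload_chunks(chunks, send=print, interval=0)
```

Because the chunks are stored in order, joining them again (for example with b"".join(chunks)) restores the original bytes.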

Usage

Interaction with the public blockchain requires an internet connection and a running client. The following command, entered in a Linux or macOS terminal, starts the Go Ethereum client in interactive mode:

geth --syncmode light console

Clone or download the Blackswan repository and navigate to the local copy on your machine. In this example, the repository was cloned to the Desktop:

mkdir ~/Desktop/blackswan
cd ~/Desktop/blackswan
git init
git remote add origin https://github.com/AnalogLabs/blackswan
git pull origin master

Upload

To upload file.txt in unencrypted form to the public blockchain, the Ethereum account that will be used must have sufficient funds and must be unlocked before running the script that uploads the file in 32-byte chunks. Run these commands at your own discretion and only when you understand what they entail [5]. The index 0 in this command selects the first account (accounts are zero-indexed) on the local host:

> personal.unlockAccount(eth.accounts[0])

In a separate Terminal window, run the bytelock.py script to chunk and sequentially upload the file, transaction by transaction, to the blockchain. Note that this requires a variable amount of funds and time and is generally a far more expensive, albeit permanent, way of storing information compared to the Interplanetary Filesystem (IPFS) [6].

python3 bytelock.py

The script will prompt the user to specify a file path and name as well as the Ethereum account password for the zero-indexed account. Note that this is not a secure way of handling passwords and is provided for demonstration purposes only. A more secure way is to unlock the account using the client, as described above, and to leave the password prompt blank (by pressing Enter).
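To make the time cost concrete, the following sketch estimates the number of transactions and the minimum waiting time for a file of a given size. It uses only the 32-byte chunk size and the 120-second default interval stated in this article. The function names are hypothetical, and transaction fees are deliberately left out because they depend on network conditions.

```python
import math


def chunk_count(file_size, chunk_size=32):
    """Number of transactions needed for a file of `file_size` bytes."""
    return math.ceil(file_size / chunk_size)


def min_upload_seconds(file_size, interval=120, chunk_size=32):
    """Minimum total wait: one interval between each pair of chunks."""
    return max(chunk_count(file_size, chunk_size) - 1, 0) * interval


if __name__ == "__main__":
    size = 1024 * 1024  # a 1 MiB file
    print(chunk_count(size), "transactions")
    print(min_upload_seconds(size) / 86400, "days of waiting at minimum")
```

For a 1 MiB file this gives 32,768 transactions and at least 3,932,040 seconds of waiting, or roughly 45.5 days at the default interval.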

Download

Downloading information from the blockchain does not require an unlocked account or any funds. The Blackswan contract (bxs.sol) exposes the following methods to retrieve a record, to query the number of records in the ledger, and to update a record; only the two read-only (view) methods are needed for downloading:

get_record(address _author, uint _index) public view returns(string memory _desc)
get_num_records(address _author) public view returns (uint)
update_record(uint _index, string memory _desc, uint start_index, uint endindex) public returns (bool) 

The following script calls these methods from Python, so they need not be invoked manually:

python3 download.py

The above script prompts the user for an output file name, a start index, and a terminal index, the latter two corresponding to the indices of the first and last chunks on the blockchain. The get_chunk method is called iteratively to download each chunk in sequence, and the chunks are concatenated and saved under the given output file name.
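The download loop described above can be sketched as follows. This is not the code in download.py: download_range is a hypothetical name, and the fetch argument is a stand-in for the call that retrieves one record from the contract. Records are assumed to be strings, consistent with the string return type of get_record.

```python
def download_range(fetch, start_index, end_index):
    """Fetch records start_index..end_index (inclusive) and concatenate them.

    `fetch` is a hypothetical callable that takes a record index and
    returns that record's text, standing in for the contract call.
    """
    if end_index < start_index:
        raise ValueError("end_index must not be smaller than start_index")
    return "".join(fetch(i) for i in range(start_index, end_index + 1))


if __name__ == "__main__":
    # Stand-in ledger used in place of the blockchain for illustration.
    ledger = {0: "Hello, ", 1: "block", 2: "chain"}
    print(download_range(ledger.__getitem__, 0, 2))
```

Passing the fetch function in as an argument keeps the loop independent of any particular client library.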

Conclusion

Storing information directly on the public Ethereum blockchain makes it permanently and universally accessible to anyone. The blockchain data structure ensures the authenticity of the information so long as no colluding group controls a majority of the network’s mining power, which would allow it to execute a 51% attack and overwrite the blockchain. The slowness and high cost of using a public blockchain as a filesystem limit its utility to situations in which a cost-benefit analysis justifies this use.

References

  1. Trolling for a Wealthier World. Omar Metwally. October 16th, 2017. https://omarmetwally.blog/2017/10/16/trolling-for-a-wealthier-world/

  2. Ethereum Node Tracker. https://etherscan.io/nodetracker

  3. On the Economics of Knowledge Creation and Sharing. Omar Metwally. September 12th, 2017. arXiv:1709.07390. http://adsabs.harvard.edu/abs/2017arXiv170907390M

  4. Black Swan Intellectual Property Ledger. Analog Labs. https://github.com/AnalogLabs/blackswan.

  5. Go-Ethereum Command Line Options. https://github.com/ethereum/go-ethereum/wiki/Command-Line-Options.

  6. Interplanetary Filesystem. https://ipfs.io/.