Fedora Magazine: Get started with Apache Cassandra on Fedora

NoSQL databases are every bit as popular today as more conventional, relational ones. One of the most popular NoSQL systems is Apache Cassandra. It’s designed to deal with big data, and can be scaled across large numbers of servers. This makes it resilient and highly available.

This package is relatively new on Fedora, since it was introduced on Fedora 26. The following article is a short tutorial to set up Cassandra on Fedora for a development environment. Production deployments should use a different set-up to harden the service.

Install and configure Cassandra

The set of database packages in Fedora’s stable repositories are the client tools in the cassandra package. The common library is in the cassandra-java-libs package (required by both client and server). The most important part of the database, the daemon, is available in the cassandra-server package. Some more supporting packages may be listed by running the following command in a terminal.

dnf list cassandra*

First, install and start the service:

$ sudo dnf install cassandra cassandra-server
$ sudo systemctl start cassandra

To enable the service to automatically start at boot time, run:

$ sudo systemctl enable cassandra

Finally, test the server initialization using the client:

$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE k1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> USE k1;
cqlsh:k1> CREATE TABLE users (user_name varchar, password varchar, gender varchar, PRIMARY KEY (user_name));
cqlsh:k1> INSERT INTO users (user_name, password, gender) VALUES ('John', 'test123', 'male');
cqlsh:k1> SELECT * from users;

 user_name | gender | password
-----------+--------+----------
      John |   male |  test123

(1 rows)

To configure the server, edit the file /etc/cassandra/cassandra.yaml. For more information about how to change configuration, see the the upstream documentation.

Controlling access with users and passwords

By default, authentication is disabled. To enable it, follow these steps:

  1. By default, the authenticator option is set to AllowAllAuthenticator. Change the authenticator option in the cassandra.yaml file to PasswordAuthenticator:
authenticator: PasswordAuthenticator
  1. Restart the service.
$ sudo systemctl restart cassandra
  1. Start cqlsh using the default superuser name and password:
$ cqlsh -u cassandra -p cassandra
  1. Create a new superuser:
cqlsh> CREATE ROLE <new_super_user> WITH PASSWORD = '<some_secure_password>' 
    AND SUPERUSER = true 
    AND LOGIN = true;
  1. Log in as the newly created superuser:
$ cqlsh -u <new_super_user> -p <some_secure_password>
  1. The superuser cannot be deleted. To neutralize the account, change the password to something long and incomprehensible, and alter the user’s status to NOSUPERUSER:
cqlsh> ALTER ROLE cassandra WITH PASSWORD='SomeNonsenseThatNoOneWillThinkOf'
    AND SUPERUSER=false;

Enabling remote access to the server

Edit the /etc/cassandra/cassandra.yaml file, and change the following parameters:

listen_address: external_ip
rpc_address: external_ip
seed_provider/seeds: "<external_ip>"

Then restart the service:

$ sudo systemctl restart cassandra

Other common configuration

There are quite a few more common configuration parameters. For instance, to set the cluster name, which must be consistent for all nodes in the cluster:

cluster_name: 'Test Cluster'

The data_file_directories option sets the directory where the service will write data. Below is the default that is used if unset. If possible, set this to a disk used only for storing Cassandra data.

data_file_directories:
    - /var/lib/cassandra/data

To set the type of disk used to store data (SSD or spinning):

disk_optimization_strategy: ssd|spinning

Running a Cassandra cluster

One of the main features of Cassandra is the ability to run in a multi-node setup. A cluster setup brings the following benefits:

The following sections describe how to setup a simple two-node cluster.

Clearing existing data

First, if the server is running now or has ever run before, you must delete all the existing data (make a backup first). This is because all nodes must have the same cluster name and it’s better to choose a different one from the default Test cluster name.

Run the following commands on each node:

$ sudo systemctl stop cassandra
$ sudo rm -rf /var/lib/cassandra/data/system/*

If you deploy a large cluster, you can do this via automation using Ansible.

Configuring the cluster

To setup the cluster, edit the main configuration file /etc/cassandra/cassandra.yaml. Modify these parameters:

Configuration files for a two-node cluster follow.

Node 1:

cluster_name: 'My Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        - seeds: 10.0.0.1, 10.0.0.2
listen_address: 10.0.0.1
rpc_address: 10.0.0.1
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false

Node 2:

cluster_name: 'My Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        - seeds: 10.0.0.1, 10.0.0.2
listen_address: 10.0.0.2
rpc_address: 10.0.0.2
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false

Starting the cluster

The final step is to start each instance of the cluster. Start the seed instances first, then the remaining nodes.

$ sudo systemctl start cassandra

Checking the cluster status

In conclusion, you can check the cluster status with the nodetool utility:

$ sudo nodetool status

Datacenter: datacenter1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns    Host ID                               Rack
UN  10.0.0.2  147.48 KB  256          ?       f50799ee-8589-4eb8-a0c8-241cd254e424  rack1
UN  10.0.0.1  139.04 KB  256          ?       54b16af1-ad0a-4288-b34e-cacab39caeec  rack1
 
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Cassandra in a container

Linux containers are becoming more popular. You can find a Cassandra container image here on DockerHub:

centos/cassandra-3-centos7

It’s easy to start a container for this purpose without touching the rest of the system. First, install and run the Docker daemon:

$ sudo dnf install docker
$ sudo systemctl start docker

Next, pull the image:

$ sudo docker pull centos/cassandra-3-centos7

Now prepare a directory for data:

$ sudo mkdir data
$ sudo chown 143:143 data

Finally, start the container with a few arguments. The container uses the prepared directory to store data into and creates a user and database.

$ docker run --name cassandra -d -e CASSANDRA_ADMIN_PASSWORD=secret -p 9042:9042 -v `pwd`/data:/var/opt/rh/sclo-cassandra3/lib/cassandra:Z centos/cassandra-3-centos7

Now you have the service running in a container while storing data into the data directory in the current working directory. If the cqlsh client is not installed on your host system, run the one provided by the image with the following command:

$ docker exec -it cassandra 'bash' -c 'cqlsh '`docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' cassandra`' -u admin -p secret'

Conclusion

The Cassandra maintainers in Fedora seek co-maintainers to help keep the package fresh on Fedora. If you’d like to help, simply send them email.


Photo by Glen Jackson on Unsplash.


Source From: fedoraplanet.org.
Original article title: Fedora Magazine: Get started with Apache Cassandra on Fedora.
This full article can be read at: Fedora Magazine: Get started with Apache Cassandra on Fedora.

Advertisement
Boost your Website Traffic with iNeedHits Today! Shop Now!


Random Article You May Like

Leave a Reply

Your email address will not be published. Required fields are marked *

*
*