PythonEnthusiast - 4 years ago
Python Question

Cassandra Replicates data on all nodes when RF = 2

I've set up a Cassandra cluster with 4 nodes in total: 2 seed nodes and 2 normal nodes. I've set the replication factor to 2.

Here is my cassandra.yaml. Apart from the following values, every setting keeps its default value.

rpc_address: 0.0.0.0
endpoint_snitch: GossipingPropertyFileSnitch


I installed Cassandra on all 4 nodes with the above configuration (of course with a different listen_address on each).

Next, I ran the sync_tables.py file on all 4 nodes.

Following is the sync_tables.py file:

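# Imports assume the cqlengine bundled with the DataStax cassandra-driver.
from cassandra.cqlengine import connection
from cassandra.cqlengine.management import create_keyspace_simple, sync_table

# CLUSTER_NODES_LIST (the node addresses) is defined elsewhere.
connection.setup(CLUSTER_NODES_LIST, "mad")

create_keyspace_simple("mad", replication_factor=2)
models_list = []
sync = True
if sync:
    for model in models_list:
        sync_table(model)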


It created KEYSPACE 'mad' and N column families.

Now when I ran a query to insert data on seed1, the data was replicated to all 4 nodes. Why is this? I set RF to 2, yet the data is still being replicated to all 4 nodes.

When I ran DESCRIBE KEYSPACE mad; it returned the following:

CREATE KEYSPACE mad WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': '2'
};


which clearly shows that RF has been set to 2. Is this normal behaviour? Why is the data being replicated to all 4 nodes even when RF is set to 2?

Answer Source

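The data is only replicated to two nodes, but you can read from or write to any node in the cluster. Whichever node receives the request acts as the coordinator and forwards it to the replicas that own the data, so a row looks "present" on every node even though only two of them store it. See the Cassandra documentation on request coordination for more information.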
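To make the distinction between coordinating a request and storing a replica concrete, here is a rough toy model of SimpleStrategy placement with RF = 2 on a four-node ring. It is only a sketch under several assumptions: the node names other than seed1 are made up, each node gets a single token (no vnodes), and MD5 stands in for Cassandra's real partitioner, so the actual placement in a real cluster will differ.

```python
import hashlib
from bisect import bisect_left
from typing import List

# Hypothetical node names; only "seed1" appears in the question.
NODES = ["seed1", "seed2", "node3", "node4"]
RF = 2


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in for the real partitioner)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


# One token per node, sorted clockwise around the ring (assumption: no vnodes).
RING = sorted((token(name), name) for name in NODES)
TOKENS = [t for t, _ in RING]


def replicas(partition_key: str, rf: int = RF) -> List[str]:
    """SimpleStrategy-style placement: the first node whose token is at or after
    the key's token, followed by the next rf - 1 nodes clockwise."""
    start = bisect_left(TOKENS, token(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def write(coordinator: str, partition_key: str) -> List[str]:
    """Any node can coordinate the write; the data still lands only on the replicas."""
    if coordinator not in NODES:
        raise ValueError("unknown node: " + coordinator)
    return replicas(partition_key)


if __name__ == "__main__":
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "stored on", write("seed1", key))
```

In this model every key ends up on exactly two nodes, and the result is the same no matter which node is passed as the coordinator, which matches the behaviour described in the question.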

To check which nodes in the cluster have replicas, you can use nodetool:

nodetool getendpoints <keyspace> <table> <key value>
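
If you want to run that check from Python, one possible approach is to shell out to nodetool and collect the addresses it prints. This is only a sketch: it assumes nodetool is on the PATH of the machine running the script, and the helper names get_endpoints and parse_endpoints are made up for illustration.

```python
import subprocess
from typing import List


def parse_endpoints(output: str) -> List[str]:
    """Collect the non-empty lines of the output, one replica address per line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def get_endpoints(keyspace: str, table: str, key: str) -> List[str]:
    """Run `nodetool getendpoints <keyspace> <table> <key value>` and return the replicas."""
    result = subprocess.run(
        ["nodetool", "getendpoints", keyspace, table, key],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nodetool fails
    )
    return parse_endpoints(result.stdout)
```

With RF = 2, a call such as get_endpoints("mad", "<table>", "<key value>") should return two addresses.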