07:28:36 GMT Hi, is this error harmful? I am getting this in my test env. class Redis { public $socket = resource(21) of type (Redis Socket Buffer)
09:36:24 GMT dudeji: it is your test environment, how should we know?
11:12:38 GMT Hi, my slave got out of sync with the master, and the master does not have enough disk space to write the RDB file, so the slaves are not able to get back in sync with the master.
11:13:10 GMT Can I use partial replication to get the slaves back in sync?
11:25:00 GMT Anyone here?
11:34:20 GMT is there a good redis book?
12:34:26 GMT gautamsomani: nope
12:34:48 GMT mstevens: Redis in Action
12:35:47 GMT badboy_: it's 2013, things haven't changed too much?
12:36:47 GMT well, it can't cover more recent additions, but it's still a pretty strong book on what was there in 2013
13:07:46 GMT badboy - so what other option do I have?
13:08:32 GMT get a bigger disk onto the live system
13:08:40 GMT or mount one over the network^^
13:09:16 GMT But then do I have to restart Redis, or would changing the "dir" option using redis-cli make Redis dump files there?
13:09:27 GMT In case I can mount, that is.
13:10:37 GMT yes
13:10:53 GMT Which one? Restart? Or runtime change via redis-cli?
13:10:53 GMT restarting seems not to be an option
13:10:58 GMT No, not at all.
13:10:58 GMT runtime change
13:11:00 GMT Okay
13:11:04 GMT Thanks, will try and update
13:11:05 GMT :)
13:11:18 GMT once you've done that get a strategy so it does not happen again
13:18:27 GMT Have told the management, their call, my job is to fix (and learn in the process). Though sometimes I really do want to enforce things
13:28:01 GMT Hi, is there any chance of redis crashing after a high load? What will happen in case of very high load for redis?
13:28:59 GMT I don't see how that would happen
13:36:51 GMT I have cloned a redis cluster of 6 nodes (copied block devices) and I can't get the clone cluster in a healthy state..
the 3 slaves seem to get data from the master but every instance seems to be in "fail?" state even though they're all connected to each other... any idea?
13:40:12 GMT badboy - yeah it's working, thanks a lot man :)
13:40:16 GMT Have more questions though
13:40:34 GMT Master is dumping data in chunks, can I increase the size of these chunks?
13:45:53 GMT The cluster bus port (the client port + 10000) must be reachable from all the other cluster nodes.
13:45:56 GMT This was the issue
13:48:02 GMT dudeji: high load in what regard?
13:48:12 GMT gautamsomani: what chunks do you mean?
13:48:27 GMT In logs I see the line "RDB: 157 MB of memory used by copy-on-write"
13:48:36 GMT commands at a very high pace, in that case too many connections open and close
13:48:43 GMT Also, when redis master finishes dumping data, it is then that slave starts to read
13:48:46 GMT and catches up
13:48:50 GMT and once it catches up
13:49:09 GMT Redis master again starts - "Starting BGSAVE for SYNC with target: disk"
13:49:12 GMT dudeji: it should not crash, no, that would be a serious bug
13:49:32 GMT This means that master is dumping the data to be replicated in chunks
13:49:46 GMT the line you posted just says the additional memory that was used due to how copy-on-write on linux works
13:49:55 GMT Oh
13:50:07 GMT But I have 20 GB more of free RAM
13:50:09 GMT normally it should just dump the rdb once, send that over and once the slave is synced it will get a stream of write commands
13:50:14 GMT no chunking
13:50:20 GMT gautamsomani: so?
13:50:36 GMT redis didn't need that, it only needed 157 MB on top of what it is already using
13:50:51 GMT Okay, so it is not related to chunking - I got confused there
13:50:57 GMT right
13:51:07 GMT So the question then becomes - how do I make redis slave stream all the data from redis master
13:51:09 GMT if it immediately wants a full sync again, something else is wrong
13:51:22 GMT it does that by itself under normal circumstances
13:51:31 GMT can you upload the logs of both the master and slave?
13:51:33 GMT Okay, let me check what things I had changed while troubleshooting, will get back in 5 mins
13:51:40 GMT (only the parts starting from the sync)
13:51:44 GMT Okay
13:51:46 GMT Doing
13:51:53 GMT on pastebin, right?
13:52:18 GMT right
13:55:52 GMT Master - http://pastebin.com/4SF2MBsv
13:56:29 GMT Slave - http://pastebin.com/ua2Y68fB
13:58:28 GMT uhm, that's weird
14:00:15 GMT slow or broken disk on the slave side?
14:01:06 GMT Nope
14:01:08 GMT Not at all
14:02:37 GMT not even sure how it could get an EINPROGRESS at that point in the code path
14:04:10 GMT vmstat and top show no WAIT, which means disk is not an issue, right?
14:04:50 GMT weeeell,
14:08:42 GMT Okay, one thing that I had done was - config set save ""
14:09:11 GMT Had done this to disable RDB, to force Redis to switch to stream-based replication
14:09:22 GMT Would a stack trace help here?
14:10:25 GMT redis always uses a stream-based replication except for the initial sync
14:10:32 GMT the background saving has nothing to do with the replication
14:21:05 GMT So in that case why does the slave do some catching up, then stop, and then when the master log says "DB saved on disk", the slave starts to catch up again?
14:21:29 GMT Can I increase verbose logging at runtime on the master?
14:32:07 GMT Increased that
14:43:52 GMT 21663:S 13 Jan 19:22:03.939 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchronization: Operation now in progress
14:43:56 GMT because of that
14:44:09 GMT for some reason it fails to write, thus aborts the sync, then retries a sync
14:55:24 GMT So you mean it's not progressing in the first place?
15:10:52 GMT Found the issue, disk issue on slave as well :P
15:11:02 GMT Mounted more there too, started working
15:11:17 GMT But, partial replication or stream replication is still not happening
17:02:31 GMT aaand that's why I asked ...
17:02:53 GMT partial replication can only happen if it synced once and then briefly loses connection with the master
17:03:06 GMT how do you know the streaming after initial sync is not working?
17:03:12 GMT oh, not around anymore
17:49:49 GMT how long a connection loss can it survive with partial replication, badboy_?
17:50:15 GMT depends on your configuration
17:50:27 GMT is there some backlog setting?
17:51:00 GMT https://github.com/antirez/redis/blob/unstable/redis.conf#L395-L419
17:51:42 GMT gautamsomani:
17:51:45 GMT 18:02 @ badboy_> aaand that's why I asked ...
17:51:45 GMT 18:02 @ badboy_> partial replication can only happen if it synced once and then briefly loses connection with the master
17:51:48 GMT 18:03 @ badboy_> how do you know the streaming after initial sync is not working?
17:53:08 GMT So I know cause the logs say so … "Partial resynchronization not possible (no cached master)"
17:53:25 GMT Sorry
17:53:34 GMT This was not about Streaming
17:53:46 GMT 16:11 gautamsomani> But, partial replication or stream replication is still not happening
17:53:53 GMT yes
17:53:55 GMT Gimme a min
17:53:59 GMT Will get the logs
17:55:44 GMT 27319:S 13 Jan 20:38:12.252 * Connecting to MASTER 10.85.139.202:6379
17:55:44 GMT 27319:S 13 Jan 20:38:12.252 * MASTER <-> SLAVE sync started
17:55:44 GMT 27319:S 13 Jan 20:38:12.253 * Non blocking connect for SYNC fired the event.
17:55:44 GMT 27319:S 13 Jan 20:38:12.254 * Master replied to PING, replication can continue...
17:55:44 GMT 27319:S 13 Jan 20:38:12.254 * Partial resynchronization not possible (no cached master)
17:55:45 GMT 27319:S 13 Jan 20:38:12.805 * Full resync from master: 834ab3878b72145a774cb38db7ff4f97142d9e6e:372136784998
17:55:45 GMT 27319:S 13 Jan 20:39:34.179 * MASTER <-> SLAVE sync: receiving 5455810147 bytes from master
17:55:46 GMT 27319:S 13 Jan 20:40:25.946 * MASTER <-> SLAVE sync: Flushing old data
17:55:46 GMT 27319:S 13 Jan 20:41:15.204 * MASTER <-> SLAVE sync: Loading DB in memory
17:55:47 GMT 27319:S 13 Jan 20:42:07.559 * MASTER <-> SLAVE sync: Finished with success
17:56:01 GMT please don't paste logs in the channel
17:56:04 GMT Sorry
17:56:06 GMT also that looks just fine.
17:56:57 GMT So after I added space on the slave, it is then that slaves caught up with the master, which means they used disk space, right?
17:57:26 GMT And that would mean that streaming, which should not use or depend on disk space at all, did not happen.
17:57:29 GMT yes. the slave received the full rdb from the master, wrote it to disk, then loaded into memory and is now receiving a stream of write commands to stay in sync
17:58:00 GMT I still don't know why you think "streaming did not happen"
17:58:07 GMT Oh Okay
17:58:26 GMT I am assuming that initial sync should also happen via Streaming only.
17:58:54 GMT well, the rdb dump is "streamed", but then stored away
17:59:31 GMT there's diskless replication, but that only affects the master for now
18:01:47 GMT Meaning master does not need to store data on disk for replication to happen for slaves, but slaves have to? No way to stop slave from storing on disk at all?
18:01:58 GMT no
18:02:07 GMT afaik there's code for that, but it was never merged
18:03:28 GMT Okay
18:13:49 GMT thanks a lot badboy_ for all the help, sorted out all, thanks a lot :)
18:14:29 GMT you're welcome
20:35:21 GMT +1 to badboy_'s recommendation of Redis in Action
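
Editor's note: the slave log pasted at 17:55 shows the full resync described above (RDB received, written to disk, old data flushed, dump loaded). As a rough illustration of how to read such a log, here is a minimal Python sketch that measures the gap between consecutive log lines. The helper names (`clock_to_ms`, `phase_durations_ms`) are made up for this example and are not part of Redis or any client library, and the sketch assumes all lines fall on the same day and use the `pid:role day month HH:MM:SS.mmm * message` layout seen in the paste.

```python
# A subset of the slave log lines pasted in the channel at 17:55.
SLAVE_LOG = """\
27319:S 13 Jan 20:38:12.252 * Connecting to MASTER 10.85.139.202:6379
27319:S 13 Jan 20:38:12.805 * Full resync from master: 834ab3878b72145a774cb38db7ff4f97142d9e6e:372136784998
27319:S 13 Jan 20:39:34.179 * MASTER <-> SLAVE sync: receiving 5455810147 bytes from master
27319:S 13 Jan 20:40:25.946 * MASTER <-> SLAVE sync: Flushing old data
27319:S 13 Jan 20:41:15.204 * MASTER <-> SLAVE sync: Loading DB in memory
27319:S 13 Jan 20:42:07.559 * MASTER <-> SLAVE sync: Finished with success
"""


def clock_to_ms(clock):
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, millis = clock.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def phase_durations_ms(log_text):
    """Return (message, ms until the next logged line) for each line but the last.

    Assumption: every line is a notice-level line containing ' * ' and all
    lines were logged on the same day.
    """
    events = []
    for line in log_text.splitlines():
        if " * " not in line:
            continue  # skip blank lines and other log levels
        head, message = line.split(" * ", 1)
        clock = head.split()[-1]  # last field of the prefix is the clock time
        events.append((clock_to_ms(clock), message))
    return [
        (message, next_ms - ms)
        for (ms, message), (next_ms, _) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for message, ms in phase_durations_ms(SLAVE_LOG):
        print(f"{ms / 1000:8.3f}s  {message}")
```

Read this way, the gap after "Full resync from master" is most likely the master producing the RDB dump, the gap after "receiving ... bytes" is the slave writing that dump to its own disk, and the last two gaps are flushing the old dataset and loading the new one. That matches the explanation in the channel: only the initial sync touches the slave's disk, and afterwards the slave follows the stream of write commands.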