01:43:26 GMT This channel active? 05:23:44 GMT Antirez: about that 32bit struct padding issue, i'm sorry to say, "your compiler fooled you" 05:24:00 GMT C standard requires that any native type supported by the language be memory aligned to its size. (regardless of the fact that 'long long' on 32bit arch is implemented with 32bit ASM instructions. 05:24:17 GMT you can see that in https://en.wikipedia.org/wiki/Data_structure_alignment: "32-bit x86".... "A long long (eight bytes) will be 8-byte aligned." 05:24:33 GMT it also requires that the struct size will be padded to aligned with the size of it's largest member (so that they can be put in an array and all instances will obey the rules) similarly it requires that malloc always return addresses aligned to the largest size the language support. 05:25:00 GMT in your experiment you allocated the struct on the stack and did no other use with it. 05:25:23 GMT the compiler saw that and decided to pack that struct (to be 16 bytes size rather than 24) and allocate it on the stack in a *non-8byte aligned address*, in a way which will cause the 64bit member in that struct to reside in an 8 byte aligned address. 05:25:34 GMT if you change your experiment to use malloc, i'm sure you'll find different results. 05:25:51 GMT never underestimate the power of the force (or in this case the compiler) 05:26:02 GMT 8-) 05:41:21 GMT oranagra: you are right! It's kinda strange since my guess was that anyway it's accessed as 2 32-bit halves, but apparently it's not the case if mmx instructions are used even with 32 bit processors. 05:41:32 GMT well anyway, we have the proof that's worth to change it. 05:42:07 GMT oranagra: btw thanks for the DEBUG LOAD PR :-)) 05:43:10 GMT it's an old C spec thing (largest native type supported by the compiler) thing... i guess they invented it before the MMX existed.. 05:43:24 GMT but it's amazing to see what the compiler can do 05:43:41 GMT i.e. put the struct in a non aligned address so that a certain member will be aligned. 05:44:58 GMT i wasn't able to do 32 bit compilation myself (some trouble with my Ubuntu), but i remember being overwhelmed but this exact discovery few years ago. 05:45:43 GMT (the stack thing) 05:46:45 GMT i tested that DEBUG LOAD pr on both aof and rdb, and wrote you some notes in the PR. 06:30:04 GMT hey guys. anyone know when the redisconf videos are getting published? 06:45:47 GMT oranagra: indeed sizeof(struct ...) tells the truth about the alignment 06:45:58 GMT slact: no idea, but I guess a matter of 1/2 weeks? 06:46:44 GMT checking the LOAD PR 06:49:46 GMT oranagra: in the current incarnation DEBUG LOAD is too much "powerful" IMHO. It will crash the server if there are duplicated keys 06:50:10 GMT If we want to do it powerful, the way to go is probably to provide options to DEBUG RELOAD 06:50:22 GMT like "NOSAVE", "NOFLUSH", ... 06:50:34 GMT in the case of "NOFLUSH" however we need to make sure we can pass flags to loading functions 06:50:40 GMT so that no crash will happen on duplicated keys 06:50:53 GMT just the keys already existing could be ignored 07:16:55 GMT oranagra: btw pausing this LOAD thing since I'm focusing on your other memory PR that looks like having interesting ideas especially around dict.c/h 07:22:19 GMT antirez: Even though what oran claimed seems logical and is backed by that wikipedia page, a test Oran and I just did using malloc (and also printing dictEntry size with a 32bit redis build) shows the struct is 16bytes and the 64bit union is 4 bytes aligned. So in practice there's no need for any change (tested with gcc 5.3.1). 07:23:15 GMT yoav_s: yes this is because of what I said (that I was not aware of): requirements for alignment between base types and structures members is different according to C standard 07:23:27 GMT minimum alignment required for 8 bytes structure fields: 4 bytes 07:23:38 GMT minimum alignment required for 8 bytes types: 8 bytes 07:23:46 GMT Dont Ask Me Why (TM) 07:24:03 GMT But after all given that it's so fucked up, maybe it's better to follow Oran advice? 07:24:18 GMT So that it makes sense at least for the limitations of our poor non-ANSI minds 07:33:52 GMT i'm very confused.. i sware i saw that a few years ago.... and as you said, the rule you mentioned above doesn't make any sense. 07:34:23 GMT i feel the urge to get to the bottom of that, but there are currently too many other things to do (as always). 07:37:57 GMT regarding the DEBUG LOAD, in the light of what you said, i suggest to keep the 2 minutes of work guideline we had last night.. i's not a core feature / command of redis, let's just add the emptyDb call back.. 07:38:45 GMT if you aggree i can update the pr, or you can make that change when merging (i don't think it needs testing) 08:18:47 GMT @oranagra at this point I've the feeling that there is "in the air" some willingness to simplify the black parts of the ANSI standard. For example there are people that are starting to advocate strongly that the fact signed overflow is unspecified is pretty bad... Let's hope we'll see some simplification. Btw I think that it's best to follow your initial advice and align the structure more naturally, so that 08:18:48 GMT there are no doubts on the behavior. After all it should not make any difference. 08:20:26 GMT But actually... it also makes sense to leave things as they are 08:20:41 GMT given that the effect we wanted (memory efficiency) is already achieved, better if we don't tough anything? 08:21:29 GMT Or, given we are in a dict.c/h restructuring phase, we may just change the code in order to group dict entries together and get more memory efficiency, less misses, and so forth, very easily. 08:38:53 GMT i don't see any reason to keep it as is.. (e.g. we don't need binary compatibility with old version in that struct)... let's change it even if it's just because we are ignorant. 08:40:26 GMT just pity i wasted so much of our time on nothing. (although i do feel i learned something... but not sure what that is) 08:41:55 GMT btw, after spotting these issues by chance i went looking for a tool that can find them. 08:42:29 GMT found http://linux.die.net/man/1/pahole, which uses the debug info in an executable to locate padding and even cache line inefficiencies. 08:44:51 GMT it did find some holes in areas that we don't care (server struct, client struct, etc).. and also in slowlog and some other structs of which we have several hundred instances.. but it really didn't think it indicated any problem in dictEntry. 08:46:15 GMT ohh, i was running it on a 64bit instance.. 08:47:01 GMT yoav: can you try it on a 32bit binary? 08:49:04 GMT actually never mind.. we already know what the compiler generates for it. 08:56:06 GMT antirez: I found a bug and optimization in module automemory: 08:56:32 GMT if I'm in a tight loop and I'm freeing objects manually, there are two problems with autoMemoryFreed: 08:57:07 GMT 1. it counts from 0 to the top of the "stack", which means if I'm releasing the latest element allocated (which is usually the case), I'm traversing it for nothing 08:57:36 GMT 2. it had no break after we've found the element freed, causing it to slow down proportionally to the size of the stack 08:58:02 GMT after reversing the loop and fixing the bug, it accelerated a use case in a very tight loop by ~X1000 08:58:28 GMT well X100 08:58:30 GMT but still 08:59:18 GMT I'll make a PR 09:20:29 GMT antirez, enjoy :) https://github.com/antirez/redis/pull/3244 09:52:39 GMT dvirsky__: good work, I agree that to optimize to release the latest element is the way to go 09:53:35 GMT thanks. BTW a future optimization might be to detect cases where the stack is huge but I'm releasing one of the latest elements, not necessarily the lates 09:53:59 GMT and in this case it might be worth switching the latest element and the deleted element and reducing the size 09:54:23 GMT actually, it's an O(1) action because we've already found the deleted element. 09:54:41 GMT let me try to just do that :) 09:57:44 GMT antirez: how about this 09:58:21 GMT http://pastebin.com/bNx25aDK 09:58:22 GMT dvirsky__: I think it's not the case to over complicate it, who wants performances free stuff ASAP 09:58:35 GMT dvirsky__: the only thing I would do, is to always check first and last element as a first try 09:58:49 GMT so that your current patch does not lower the performance in the other use case whcih is obvious that is... 09:58:51 GMT for (...) alloc 09:58:53 GMT for (...) dealloc 09:59:04 GMT so that elements are allocated and dellocated in the same order 09:59:12 GMT true 09:59:21 GMT other than that IMHO it's the module writer that should care about not using a huge pull 09:59:23 GMT it's also a good heuristic 09:59:32 GMT pool 09:59:46 GMT actually, if I would have freed everything manually this wasn't a problem 09:59:58 GMT and if I used auto completely, it also wasn't a problem 10:00:07 GMT the problem is only when mixing them together :) 10:00:39 GMT ok, let me add the heuristic you suggested. WDYT about the idea in the pastebin? 10:00:42 GMT yep, but most of the times, now that you fixed that break bug, and if we add the head/tail first check, it will be hard to do wrong things 10:00:57 GMT it will reduce realloc of the whole stack 10:01:08 GMT because mixing will most likely be in the flavor of "let's free it at least inside this loop" hopefully :-) 10:01:30 GMT exactly what I did - I wrote a scan iterator of the keys 10:01:51 GMT I wanted to trim the inverted index buffers to their actual size after indexing is done (saves ~15% ram) 10:01:54 GMT yep, IMHO totally makes sense to always use automatic memory management *but* during big loops 10:02:12 GMT otherwise why to bother releasing memory :-) 10:02:21 GMT I even demonstrated it in my talk 10:02:27 GMT in the zset iter example 10:02:52 GMT GC + tight loop = BAD. always :) 10:03:07 GMT so let me revise my PR 10:07:00 GMT Ok, I'll merge in a latter time anyway since I'm trying to focus a bit on modules work 10:07:11 GMT need to limit interruptions for some time to get this done 10:18:14 GMT antirez - cool. hope I didn't complicate it too much 10:18:33 GMT worst case - revert to my original PR 10:40:30 GMT antirez, one last interrupt about this - adding your heuristic really complicates things because for it to be optimal I now have to remember the bottom element in the queue 10:40:40 GMT So I'm making 2 PRs, simple and super optimized 10:40:43 GMT you choose :) 10:56:55 GMT the second approach - https://github.com/antirez/redis/pull/3245 10:57:13 GMT I think we should go with the first, it will cover most cases and is much simpler 12:45:35 GMT dvirsky__: I'll make sure to analyze both approaches 12:46:06 GMT antirez: cool. there's a third one as well, but really... :P 12:46:07 GMT btw modules types need to export also a 'digest' method that I forgot initially, for DEBUG DIGEST concerns 12:46:19 GMT what does this mean? 12:46:23 GMT not too bad but we are at 5 methods now :-) 12:46:37 GMT rdb_save, rdb_load, aof_rewrite, digest, free 12:46:44 GMT digest is a signature of the value? 12:46:56 GMT yes, that depending on the type is an unordered or ordered signature 12:47:15 GMT ordered signatures are SHA1(element || SHA1(element || ...) 12:47:18 GMT while unordered use XOR 12:47:43 GMT yep, and I can keep a state somewhere if I don't want to calculate that 12:47:56 GMT like in a trie update the digest on writes 12:48:37 GMT maybe have it default to just SHA1 on the serialized data if not specified? 12:48:49 GMT nah, bad idea 13:01:51 GMT antirez: I think I'm going to open source redisearch at the end of the day today 13:02:11 GMT a few tiny fixes left for version 0.0001 :) 13:43:55 GMT dvirsky__: cool! 13:44:00 GMT happy hacking 14:23:07 GMT antirez: the unstable redis scli, if you specify a port, and that server is not running, will connect silently to 6379, and not tell you that 14:23:16 GMT are you aware of that or should I open a bug? 14:23:33 GMT also it prints a bunch of stuff 14:24:41 GMT in fact it always connects to 6379 no matter what port you asked, but writes the desired port in the prompt. quite the mind fuck! 14:41:05 GMT is it safe to copy a large-ish rdb file out from a running redis instance? 14:41:25 GMT is there some sort of file locking risk that may be introduced? 14:55:18 GMT is there anythng i can do to improve redis performence? i can only get to 40 k ops/sec(no pipelining)? 14:57:50 GMT BrownPanick, no risk at all. a dump.rdb file is closed. when writing a new one it gets written as a temporary file then swapped 14:58:36 GMT ws2k3, provided you are running it on a decent machine that is not overloaded, this is probably a network issue and not a redis issue 14:58:48 GMT what results are you getting with redis-benchmark? 15:00:44 GMT thanks dvirsky__ 15:01:22 GMT of course you are better off using replication, but what replication sync does basically is write an rdb then send it to the slave 15:12:53 GMT RediSearch is now open. A high performance full text search engine as a redis module. Still not stable or production ready, but it works and fast :) https://github.com/RedisLabsModules/RediSearch 15:28:12 GMT minus: yeah, that's a good point. sorry, i stepped off yesterday before you responded. i'll look into using the monitor command for now until we can get downstream more testable; thanks!