Issue 652: Make repository smaller by rewriting its history

Title	Make repository smaller by rewriting its history
Priority	wish	Status	resolved
Superseder		Nosy List	atorralba, florian, gabi, jendrik, malte, patfer, silvan
Assigned To	patfer	Keywords
Optional summary

Created on 2016-05-03.13:42:03 by jendrik, last changed by patfer.

Files
File name	Uploaded	Type
README	jendrik, 2016-05-03.13:54:17	application/octet-stream
analyze_repo_stats.py	jendrik, 2016-05-03.13:53:59	text/x-python
generate_repo_stats.sh	jendrik, 2016-05-03.13:54:09	application/x-shellscript
repo_stats_20_12_20.txt	patfer, 2019-12-20.10:19:30	text/plain

Messages
msg9711 (view)	Author: malte	Date: 2020-08-06.13:45:56
Thanks! As a final step, can we also create a tag, for example 1.1? Otherwise it is very inviting to download the only tag (1.0), and that version doesn't work. Once this is done, I would suggest marking this issue as resolved.
msg9710 (view)	Author: patfer	Date: 2020-08-06.13:22:08
issue5 is merged into the main branch. All branches except the main branch have been deleted (empty, pull-request-target, 2019-12-Convert2Git, 2019-12-Cleanup, issue5).
msg9709 (view)	Author: florian	Date: 2020-08-06.13:16:57
> Florian has approved the branch on github, though I don't know if this approval
> is current.

Yes, nothing changed after my approval.
msg9708 (view)	Author: malte	Date: 2020-08-06.13:06:06
Hi Patrick, are you OK to merge the "issue5" branch of the convert repository now and tag a new release? I don't think the current master version (without the reordering of commits) really works, it would be good to fix this ASAP. Florian has approved the branch on github, though I don't know if this approval is current. BTW, to keep things tidy I would also recommend deleting the following branches on github: empty, pull-request-target, 2019-12-Convert2Git, 2019-12-Cleanup.
msg9697 (view)	Author: malte	Date: 2020-07-29.09:40:59
Yes, I can see them now.
msg9696 (view)	Author: patfer	Date: 2020-07-29.09:27:36
For me the changes show up in the pull request (both in the commit of the branch and I see the changes in the file diff view). Can you see the changes now?
msg9687 (view)	Author: malte	Date: 2020-07-28.19:37:47
The version with my changes works in my tests. I tested it on one repository, and the history was compatible both after Mercurial clean-up and after git conversion. I've also tested the two-step and single-step way of conversion as well as the redirect option, and all resulting repositories look good. I've pushed the change to the convert-downward repository, and the commit shows up in the commit list, but not in the pull request. Perhaps the repository owner needs to do something for this?
msg9686 (view)	Author: malte	Date: 2020-07-28.19:11:55
My understanding from the discussion between Patrick, Silvan, Jendrik and me was that Silvan, Jendrik and I thought that the reordering should be an implementation detail, so I agree with Florian's comment. I also agree with the other things he mentioned on the pull request. I've worked on a small revision that hopefully addresses these points and am currently testing it. If it works, I'll push it to github.
msg9684 (view)	Author: patfer	Date: 2020-07-28.15:46:34
The current signature of our run-cleanup.sh script is:

    ./run-cleanup.sh MERCURIAL_REPOSITORY ORDERED_REPOSITORY CLEANED_MERCURIAL_REPOSITORY

where ORDERED_REPOSITORY is created by the script and used internally to order the commits. I added it to the signature (when merging run-order.sh and run-cleanup.sh) so that the user has full control over where it is stored and can investigate it (it won't be deleted), especially if something goes wrong. As we now enforce that the user's repository already contains all commits from the latest Mercurial master, I do not know what could go wrong in the ordering. One of Florian's suggestions was to make this a temporary directory for the 'run-cleanup.sh' script (and remove it from its signature). I also dislike the current signature, but I like to have every intermediate directory available to investigate if something is faulty. What are your opinions?
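One possible middle ground between the two positions can be sketched as follows: the intermediate repository lives in a temporary directory by default, but can be kept on request. This is only an illustration; `with_scratch_dir` and its `keep` argument are hypothetical names and are not part of run-cleanup.sh.

```shell
#!/bin/sh
# Hypothetical wrapper, not the real run-cleanup.sh: runs the command given
# after the first argument with a fresh scratch directory appended as its
# last argument. The directory is removed afterwards unless keep is "yes".
with_scratch_dir() {
    keep=$1
    shift
    scratch=$(mktemp -d) || return 1
    "$@" "$scratch"
    status=$?
    if [ "$keep" = "yes" ]; then
        # Leave the intermediate result in place for later inspection.
        echo "Intermediate repository kept in: $scratch" >&2
    else
        rm -rf "$scratch"
    fi
    return "$status"
}
```

With such a wrapper, the ordered repository would drop out of the user-facing signature while still being available for debugging when the user asks for it.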
msg9683 (view)	Author: florian	Date: 2020-07-28.14:20:28
I left a few comments on GitHub. One thing I find a bit strange is the way the two steps (ordering and cleanup) were merged into one. The script for this merged step now takes two parameters for the two intermediate results. This makes it still feel like two steps instead of a single step. For me this was the reason to argue for separate steps: each step would have one version of the repository as its output. In the current state, the first step generates two versions of the repository. I like this less than the previous version, but from msg9669 it sounds like you already discussed this. If so, I'm fine with leaving things as they are.
msg9682 (view)	Author: silvan	Date: 2020-07-27.22:07:05
I tested the version from the branch for converting a repository and found no problems.
msg9680 (view)	Author: malte	Date: 2020-07-27.18:57:57
> Has someone done a review for the modification: "merge run-order.sh
> and run-cleanup.sh"
>
> Can I merge this into the main branch and tag version 1.1?

I think Florian said he'd have a look.
msg9679 (view)	Author: patfer	Date: 2020-07-27.17:23:18
I think all desired changes are incorporated into the pull request, but I do not know what the state of the reviews is. Has someone done a review for the modification "merge run-order.sh and run-cleanup.sh"? Can I merge this into the main branch and tag version 1.1?
msg9678 (view)	Author: malte	Date: 2020-07-27.17:22:05
With issue950 marked as resolved, I think this one should also be marked as resolved. I haven't looked at the code again and haven't tested the script, but given that nobody complained after the last round of revisions, I assume everybody is happy with the current version. If not, please speak up. One final request from me: can we tag the current code as version 1.1 of convert-downward on github? The only tagged version is 1.0, so people might be tempted to use this. But I think we've found that it doesn't work reliably enough.
msg9669 (view)	Author: patfer	Date: 2020-07-23.12:29:10
We decided today that the steps 'ordering of the commits' and 'cleaning up the repository' will be merged into one step.
msg9661 (view)	Author: florian	Date: 2020-07-20.23:11:16
Fine with me. I don't think it's such a big deal either way.
msg9660 (view)	Author: malte	Date: 2020-07-20.15:08:31
We had a brief discussion over Zoom and decided to only support converting repositories that include all commits from hg.fast-downward.org. This will be checked at the start of the script to avoid long waiting times before an error is reported. @Florian: if you would like this to be more general, also supporting repositories that don't have all commits, we can always develop the script further after our initial announcement and make further script releases at a later time. In our discussion, Jendrik, Silvan and I were against supporting this.
msg9659 (view)	Author: malte	Date: 2020-07-20.13:32:04
Florian is on holiday, and it doesn't look like we agree on how to proceed. Can we perhaps schedule a Zoom call today between those who are available and have participated in the discussion so far (Silvan, Jendrik, Patrick, Malte)? Of course others are also welcome to join. I'll write a message on Discord to schedule the call.
msg9657 (view)	Author: patfer	Date: 2020-07-20.10:37:56
If I understand this correctly, you concluded that no changes are needed (the multiple-heads warning is already in the readme; for missing branches, the readme says to pull from hg.fast-downward.org). I added the test suggested by Florian for the issue323 and ipc-2011-fixes branches. @Florian: you said issue323 is not required? Does this mean we could remove the check for it and skip the strip if issue323 is not present, or shall we still enforce that issue323 is present?
msg9656 (view)	Author: florian	Date: 2020-07-18.21:49:10
Ok, I understand that view and I think the readme already covers it by saying that if you run into problems you should pull from the official repository before conversion. I wasn't suggesting to add any functionality beyond that, just to add a check at the beginning that the required branches are there and to exit if they are not. So instead of running for 5 minutes and producing an incompatible repository we could immediately tell the user "this is not going to work". This would save people with repositories like mine some work. It's not that important, though, so I'm fine with not having the check there.
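The early check described in this message could look roughly like the following sketch. The helper name `missing_branches` and the way the branch list is obtained are assumptions made for illustration and are not taken from the conversion script; only the two branch names come from the discussion.

```shell
#!/bin/sh
# Hypothetical helper (not from convert-downward): reads the repository's
# branch names from stdin, one per line, and prints every required branch
# (given as arguments) that is absent. Returns non-zero if any is missing.
missing_branches() {
    present=$(cat)
    status=0
    for branch in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        if ! printf '%s\n' "$present" | grep -qxF -- "$branch"; then
            printf '%s\n' "$branch"
            status=1
        fi
    done
    return "$status"
}

# Assumed usage at the very start of the cleanup step. The branch list is
# assumed to come from something like
#     hg -R "$repo" branches --closed --template '{branch}\n'
# and the script would exit with an explanation if anything is printed.
```

Because the helper only reads names from stdin, the source repository is never modified, which keeps the property mentioned in msg9651.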
msg9655 (view)	Author: malte	Date: 2020-07-18.15:57:39
> About the first issue: I don't think this has to do with whether the repository is
> behind our final hg version. The situation I mean can be created by
>
> hg clone hg.fast-downward.org -r default

I would view such a repository as behind our repository. It lacks commits from our repository. Basically, if

    $ hg incoming http://hg.fast-downward.org

lists something, you're behind in the sense I meant it. In this case, you're missing 16 commits from the branches you mention as well as from the two release branches.
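The "behind" test described here can be turned into a small pre-flight check. The following is a sketch under assumptions: `repo_has_all_commits` is a hypothetical name, and the real script may perform this check differently. It relies on the documented behaviour that `hg incoming` exits with status 0 when there are incoming changesets and 1 when there are none.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the repository at $1 has nothing
# left to pull from the reference URL $2.
#   return 0: nothing incoming, the repository has all commits
#   return 1: incoming changesets exist, the repository is behind
#   return 2: hg itself failed (for example, the URL is unreachable)
repo_has_all_commits() {
    repo=$1
    url=$2
    hg -R "$repo" incoming --quiet "$url" >/dev/null 2>&1
    case $? in
        0) return 1 ;;
        1) return 0 ;;
        *) return 2 ;;
    esac
}

# Assumed usage:
# repo_has_all_commits "$MERCURIAL_REPOSITORY" http://hg.fast-downward.org \
#     || { echo "Please pull from hg.fast-downward.org first." >&2; exit 1; }
```

Treating any other exit status as a separate error avoids mistaking a network failure for an up-to-date repository.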
msg9654 (view)	Author: florian	Date: 2020-07-18.15:51:55
About the first issue: I don't think this has to do with whether the repository is behind our final hg version. The situation I mean can be created by

    hg clone hg.fast-downward.org -r default

This would give you a perfectly up-to-date repository but the conversion would fail because it relies on the branches issue323 and ipc-2011-fixes being there. They are open branches in the official repository that we delete after the conversion but the hashes are different if those branches are present during the conversion (well, the ipc branch at least, the other one doesn't matter).

Thanks for the advice on the other case. I'll give renaming the branch a try. I will try it with an extension to hg convert first and see if I can fix this on the Mercurial side. I already wrote an extension for hg convert, so I know where to look there and don't have to read up on the fast-export plugins. But it seems there are a lot of options if this fails.
msg9653 (view)	Author: malte	Date: 2020-07-18.15:35:11
> (as in 1.).

I meant "as in 2.".
msg9652 (view)	Author: malte	Date: 2020-07-18.15:30:24
> * If the two branches (issue323, ipc-2011-fixes) that will be deleted are not in the
> repository, the command to delete them will fail. For issue323, we could delete the branch
> only if it is present but ipc-2011-fixes has to be there, otherwise the converted
> repository will be incompatible. I suggest we check at the start of the cleanup step if
> both branches exist and exit with a suitable explanation if they do not. I wouldn't
> automatically pull them in the source repository because so far we do not modify the
> source of the conversion and I think that is a good property to have. We could pull them
> in an additional intermediate step or exclude them from the strip but I don't think it's
> worth the effort.

As I said before, I think it's a bad idea to try to support converting repositories that are behind our final hg repository. If you (and Patrick, I think?) think it's important, I don't have a preference regarding what exactly it does. So in that sense the suggestion works for me.

> In my case this was the repository of Metis where the code builds on some implementation of
> strong stubborn sets which was started on the default branch and only later on used its own
> branch. I actually do not know how to fix this. Is there a way to rename a branch for only
> some commits?

There is no particularly easy way to actually change the branch of these commits because you'd have to rewrite history for all descendants of the changed commits. Three options:

1. You could use a filter in fast-export to change the branch name of the relevant commits before the conversion tool sees it. It doesn't look very complicated (see "Plugins" on https://github.com/frej/fast-export).

2. Before conversion, you could merge the additional head of default into the head of default you like, keeping all code of the head you like. If the head you like is the first parent, you can use the merge-tool ":local" to keep the version you like for all modified files (analogous to git's "ours" merge strategy). Something like (only tested a little):

    $ hg update the-head-i-want
    $ hg merge the-head-i-don't-want --tool :local

   This will still add the files that only exist in the other head to the merge commit, so you will have to "hg rm --force" remove them before committing. (Similarly for files that are deleted without conflict in the other head, but that's perhaps less likely.) Use

    $ hg diff -r the-head-i-want

   to make sure you have an empty diff before committing the merge.

3. There must be some cases of multiple heads on the same branch that don't bother fast-export because our cleaned up hg.fast-downward.org repository actually has multiple heads on several branches, and these convert without issue:

    [on the cleaned-up repository]
    $ hg heads --closed | grep ^branch | sort | uniq -cd | sort -n
          2 branch: issue104
          2 branch: issue114
          2 branch: issue120
          2 branch: issue139
          2 branch: issue170
          2 branch: issue172
          2 branch: issue181
          2 branch: issue203

   These branches are all closed, and they are not topological heads. Not sure which of these two properties fast-export cares about; perhaps both. If your additional head is not closed, just update to it and close it with "hg commit --close-branch". Unfortunately this definitely creates a new topological head, so if fast-export doesn't like additional topological heads even when they are closed, you'll have to merge it into something (as in 1.).
msg9651 (view)	Author: florian	Date: 2020-07-17.23:10:52
I tested the code on 17 of my repositories and found two more issues:

* If the two branches (issue323, ipc-2011-fixes) that will be deleted are not in the repository, the command to delete them will fail. For issue323, we could delete the branch only if it is present but ipc-2011-fixes has to be there, otherwise the converted repository will be incompatible. I suggest we check at the start of the cleanup step if both branches exist and exit with a suitable explanation if they do not. I wouldn't automatically pull them in the source repository because so far we do not modify the source of the conversion and I think that is a good property to have. We could pull them in an additional intermediate step or exclude them from the strip but I don't think it's worth the effort.

* The problem that cost way more time to track down was that two of my repositories have an additional head on default. The ordering and cleanup worked as expected there but the git conversion does something with the additional head that I don't understand. Some issue branches were duplicated and could then not be removed because they were "not fully merged". In my case this was the repository of Metis where the code builds on some implementation of strong stubborn sets which was started on the default branch and only later on used its own branch. I actually do not know how to fix this. Is there a way to rename a branch for only some commits?

The problematic thing about the second case is that the conversion script finishes fine and reports that the conversion was successful. The errors that happen on the way are the ones we tell people to ignore. If multiple heads on the same branch are not supported as the readme says, we should check for this and exit the script.

Also, I mentioned this by email already but for the record: Ceterum censeo refs/original esse delendam.
https://stackoverflow.com/a/7654880/892961
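A check for multiple heads on the same branch, as proposed at the end of this message, might be sketched like this. The helper name `branches_with_multiple_heads` is hypothetical, and whether closed heads should be included is left open here, since msg9652 reports that closed, non-topological extra heads converted without issue.

```shell
#!/bin/sh
# Hypothetical helper: reads one branch name per head from stdin and prints
# each branch that occurs more than once, i.e. each branch with several heads.
branches_with_multiple_heads() {
    sort | uniq -d
}

# Assumed usage (open branch heads only; the template prints the branch name
# of every head, including heads on default):
# dup=$(hg -R "$repo" heads --template '{branch}\n' | branches_with_multiple_heads)
# if [ -n "$dup" ]; then
#     echo "Multiple heads on the following branches are not supported:" >&2
#     echo "$dup" >&2
#     exit 1
# fi
```

Running this before the conversion would turn the silent failure described above into an immediate, explicit error.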
msg9646 (view)	Author: patfer	Date: 2020-07-17.13:58:30
The comments that Florian and Jendrik had made up to that point have been incorporated. The code is ready for the next and hopefully last round of reviewing.
msg9640 (view)	Author: malte	Date: 2020-07-16.17:07:25
Any volunteers to review and/or test the code? We want to point to the conversion script when we announce the new release, so this should be looked at for the release.
msg9639 (view)	Author: patfer	Date: 2020-07-16.13:27:58
Indeed we used the GitHub issue tracker (mostly to get an issue number :).
https://github.com/aibasel/convert-downward/issues/5

We detected that the order in which the changesets are in the history is important. Thus, we need to ensure that the repository to convert has all commits in a valid order:

1. clone the Mercurial master repository anew (do not allow the user to use a local version, because until recently the clone from hg.fast-downward.org was also in a wrong order)
2. strip away new commits from the master repository (important if the user has made commits on the default branch)
3. pull the user repository into the stripped master repository

PR: https://github.com/aibasel/convert-downward/pull/6
msg9635 (view)	Author: malte	Date: 2020-07-16.12:39:05
We've detected some further things we would like to change in the clean-up (this issue) and conversion (issue950) scripts to ensure that the clean-up results in compatible history. I see that Patrick is using the github issue tracker of convert-downward to track this, but I think not all of you receive notifications from this, so perhaps you want to have a look and hop over. I'd be happier to keep things in one place, but that's not for me to decide. But we should always tell people when we're moving the discussion to a new place.
msg9563 (view)	Author: malte	Date: 2020-07-09.14:52:50
Thanks for all your work on this, Patrick, and also thanks to the others who contributed!
msg9547 (view)	Author: patfer	Date: 2020-07-09.11:21:32
I added a tag for version v1.0
msg9536 (view)	Author: malte	Date: 2020-07-08.12:45:05
Patrick has made the convert-downward repository public, and we have agreed to tag a release for the version we have used for the conversion. Patrick, can you create the tag?
msg9534 (view)	Author: silvan	Date: 2020-07-08.10:52:58
I fixed the address and I'm happy to leave the rest as it is.
msg9533 (view)	Author: malte	Date: 2020-07-08.10:50:26
Good catch, we should change Manuel's email address to the one without "stud". Regarding which email addresses to use in general and more specifically for Manuela: I think it's not a problem to use a legacy email address that doesn't work any more for people that contributed in the past. For Emil, it's a bit different because we previously had no email address mentioned at all for him. The email address we have now added for him is also the one we would have used back when he made his commits. I'm not strongly against contacting Manuela, but it would open a bit of a can of worms, as I think the same rationale would apply to Cedric, Manuel (even after removing "stud"), Martin and Moritz Gö. The used email address for Manuela reflects the situation when she contributed to Fast Downward, so from my perspective it's fine.
msg9532 (view)	Author: silvan	Date: 2020-07-08.10:42:13
The comparison of manifests looks good to me. Yesterday, I left two comments regarding the authors which seem to have gotten lost: I saw that Manuel's email address is mapped from manuel.heusner@unibas.ch to manuel.heusner@stud.unibas.ch and found this surprising. Is this intended? For Manuela, we use ortlieb@informatik.uni-freiburg.de, where I'm not sure if this is a valid email address anymore (of course, it doesn't need to be, but for others like Emil we contacted people and asked, so we could do the same here).
msg9531 (view)	Author: malte	Date: 2020-07-07.22:50:18
Patrick, Florian, Jendrik and I discussed this further on Discord and Zoom, and Florian and Jendrik have pushed some further commits afterwards. We now additionally exclude all the files from msg9530 except for results/preprocess/PROBLEMS (which I had in the "maybe" list) and the last list of files which I listed as candidates but recommended keeping. Jendrik changed the handling of branches merged into other branches not merged into main.

I again compared "hg manifest --all" before and after cleanup in meld, and I think it now looks very nice. You can have a look with

    meld <(hg manifest --all -R master) <(hg manifest --all -R master-cleaned-up)

assuming the before/after cleanup repositories are "master" and "master-cleaned-up" in the current directory.

The list of users with commits in master-cleaned-up looks clean, too. You can check it with

    hg log | grep ^user | sort | uniq -c

from the cleaned-up hg repository.

The code makes sense. The working copy on the head of default/main before/after cleanup only differs in ~/.hgtags, which is as it should be. I didn't do much to check the branch structure or commit messages, but at a glance they make sense.

Compared to yesterday's script version, the final .git directory size is down from 17232 KiB to 15060 KiB, which is nice. I didn't do much to test the git repository, but if everyone else is happy with it, then I am too.
Some statistics:

$ git-sizer --threshold=0
Processing blobs: 28632
Processing trees: 37451
Processing commits: 22441
Matching commits to trees: 22441
Processing annotated tags: 0
Processing references: 1310
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  22.4 k   |                                |
|   * Total size               |  6.14 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  37.5 k   |                                |
|   * Total size               |  49.2 MiB |                                |
|   * Total tree entries       |  1.22 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  28.6 k   |                                |
|   * Total size               |   213 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |  1.31 k   |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  1.50 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   171     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   460 KiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  4.54 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |   220     |                                |
| * Maximum path depth     [6] |     6     |                                |
| * Maximum path length    [7] |    72 B   |                                |
| * Number of files        [8] |  1.50 k   |                                |
| * Total size of files    [9] |  6.80 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1] fba4b4dd03e1193ab081a2a1a7f0bbe053d5bb38
[2] 5c9e027316e88ef6eee251c907e506218f4c4324
[3] 526c630c44922a716f2fae47a79b9dad820207c7 (refs/heads/main:experiments)
[4] 28a41ec16ee92fcd72f4389f697f6c47b5f9b30d (b28f422155c919a0ab2b7a1c057d703c19fab125:src/dist/data/doc/fast-downward.pdf)
[5] 43d7171afef8fcc22ce88567ce27fe97fd503427 (refs/heads/main^{tree})
[6] af0d711aaf96396d8b1ba71cd5122d7c4cf75fc6 (effc886eb88892099e13283fc6b61213531faf65^{tree})
[7] ec21e3c1f977abf9a360534ecf90e167cf140288 (b502abec56be63fa100966ae9bcde269a545e266^{tree})
[8] d0bb5c4140228aa3c29c7437e8330611e0cf0d2d (5c9e027316e88ef6eee251c907e506218f4c4324^{tree})
[9] 0644851e90592fdbfac7fbe69885cd850325d7e5 (cc2d79c787382edef1446da766add08af188f49c^{tree})

top 20 largest files by packed size, bearing in mind that for deltified files, we only see the size of the delta, not the size of the file itself:

All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size  pack  SHA                                       location
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
61    29    fa451a18288c242ab9dbc99e300ab5078dc624e9  misc/autodoc/external/txt2tags.py
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
55    15    4937766fdd188bf13f510b9e9d1c5c2fa57e4b5c  downward/search/Doxyfile
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
63    12    fbea03f3dfebc74d03bea3957a644cac83b3706a  src/search/merge_and_shrink/transition_system.cc
57    11    d84bc16fc57c23cfa7e8dada24185c835cae94ba  src/search/landmarks/landmarks_graph.cc
34    11    2fb2e74d8d7fa1c9286b18af0afa5c00402f56e3  LICENSE.md
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
42    9     89c6f4292e3a73db554bc392bdee2f26aa807d96  src/search/landmarks/landmark_factory.cc
34    8     c7b764240bf12690cf5cd332d005c8c473f475e0  src/translate/translate.py
38    8     642f109eadcd0295dc52c85d08053c2816152841  src/search/landmarks/h_m_landmarks.cc
36    8     66b86f656a53f4904bf212ea0eabdbba160d8e1b  src/search/cegar/abstraction.cc
32    7     fc89e588d16fdf09b6c2c685ab67f1023e0adf45  src/search/merge_and_shrink/merge_and_shrink_heuristic.cc
35    7     af25e562c9016ed0891e9605acbe025cb0f1c840  src/search/landmarks/landmark_factory_h_m.cc
14    7     526b9556dcdc3b49403242d132198d78dbcf87af  misc/cpplint.py

top 20 largest files by unpacked size, bearing in mind that for deltified files, we only see the size of the delta, not the size of the file itself:

All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size  pack  SHA                                       location
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
72    5     5f1c513dc656fba6c9179cec29b29d8c88938225  src/bugs/psr-strips-derived-predicates/domain.pddl
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py
66    2     fc153719b7bafadf96c949828f9e51dae9c16fdb  src/translate/regression-tests/issue73-domain.pddl
63    12    fbea03f3dfebc74d03bea3957a644cac83b3706a  src/search/merge_and_shrink/transition_system.cc
61    29    fa451a18288c242ab9dbc99e300ab5078dc624e9  misc/autodoc/external/txt2tags.py
57    11    d84bc16fc57c23cfa7e8dada24185c835cae94ba  src/search/landmarks/landmarks_graph.cc
55    15    4937766fdd188bf13f510b9e9d1c5c2fa57e4b5c  downward/search/Doxyfile
53    5     4a9f7ed1ae9a3cb2b9dd0c880b4bf214b493fb77  src/search/successor_generator.cc
42    9     89c6f4292e3a73db554bc392bdee2f26aa807d96  src/search/landmarks/landmark_factory.cc
38    8     642f109eadcd0295dc52c85d08053c2816152841  src/search/landmarks/h_m_landmarks.cc
36    8     66b86f656a53f4904bf212ea0eabdbba160d8e1b  src/search/cegar/abstraction.cc

This all looks fine to me. From my side, this is ready to use. To emphasize, I haven't really tested the actual hg to git conversion in terms of investigating the resulting git repository. But if the others are happy to move forward, so am I.

We've also discussed what to do about our current hg repository on airepos and on hg.fast-downward.org once we convert. The idea is to leave hg.fast-downward.org alone for quite some time to give people time to transition. We haven't settled on an exact amount yet, and perhaps we want to discuss this in tomorrow morning's sprint meeting. Keeping hg.fast-downward.org around for a while doesn't cause us any effort, and the repository is read-only anyway. But it will also be nice to be able to switch it off eventually and have one less thing to maintain.

We will set the repository on airepos to read-only. This is already prepared, we just need to uncomment a hook in ~/.repo/repositories/ai/downward/.hg/hgrc to flip the switch.
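For readers without meld, the before/after manifest comparison from this message can also be done on the command line. The following is a minimal sketch; `removed_by_cleanup` is a hypothetical helper name, and the two input files are assumed to hold the output of "hg manifest --all" for the original and the cleaned-up repository, one path per line.

```shell
#!/bin/sh
# Hypothetical helper: prints the paths listed in manifest file $1 but not
# in manifest file $2, i.e. the files that the cleanup removed.
removed_by_cleanup() {
    before_sorted=$(mktemp) || return 1
    after_sorted=$(mktemp) || { rm -f "$before_sorted"; return 1; }
    # comm needs sorted input; a fixed locale keeps the order consistent.
    LC_ALL=C sort -u "$1" >"$before_sorted"
    LC_ALL=C sort -u "$2" >"$after_sorted"
    # -23 suppresses lines unique to the second file and lines common to both.
    LC_ALL=C comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}

# Assumed usage:
# hg manifest --all -R master >before.txt
# hg manifest --all -R master-cleaned-up >after.txt
# removed_by_cleanup before.txt after.txt
```

Swapping the two arguments would list files that exist only after the cleanup, which should normally print nothing.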
msg9530 (view)	Author: malte	Date: 2020-07-07.17:18:24
I looked at "hg manifest --all" after the cleanup to look for additional files that I would remove. To also check that we are not deleting too much, I also looked at "hg manifest --all" before the cleanup and looked at the diff between the two manifests in meld. I think this was interesting, so perhaps others want to try that too to double-check. But I didn't find any deletes that I wouldn't delete.

List of files in the cleaned-up repository that I recommend removing:

- the whole docs directory, consisting of:
  - docs/GA-NOTES
  - docs/Haslum_Bemerkungen.txt
  - docs/Haslum_Vergleich-mit-Paper
  - docs/Pattern-Selektion (Edelkamp)
  - docs/Profiling_Haslum
  - docs/more iPDB questions 1
  - docs/more iPDB questions 2
  - docs/more iPDB questions 3
  - docs/prof_logistics_7-0_actual
  - docs/prof_logistics_7-0_v3
  - docs/prof_logistics_v2
  - docs/prof_logistics_v3
  (Not to be confused with downward/doc and some other "doc" directories.) I think these are all remnants of the iPDB code integration that should not have been merged, similarly to the iPDB evaluation results that we are already partially removing. Note: quotes are needed in the config file to deal with the filenames in this list that contain spaces. I don't know.

- downward/dist/archive/downward-2006-09-26.tar.gz
- downward/dist/archive/downward-2006-09-29.tar.gz
  - We already discussed these. Previously I said I'd look into possible inclusion in "downward-ancient", but based on the commit message for a3fd3d95fca2 (in the unconverted repository), I think they can just go.

- temp files that were accidentally committed:
  - src/all.groups
  - src/output
  - src/output.sas
  - src/sas_plan
  - src/test.groups

- src/validate: a VAL wrapper script that we used temporarily; doesn't make much sense any more if we remove VAL, I think

- downward/bugs/ff-no-preconditions/search
- downward/bugs/safety-net/search
- src/bugs/ff-no-preconditions/search
- src/bugs/safety-net/search
  - We already discussed two of these, and the other two are identical.

- Under experiments and results_overview, I think that most or all of the files related to the original iPDB experiments should not be retained. If you look at "hg manifest --all" before and after the clean-up, you'll see all the partial ignores for these. I think it's better to remove these altogether rather than in the current more selective way. (It's not that these experiments could be easily rerun; I think they depend on the "scripts" or "new-scripts" that we remove.) Here is the list of additional files I would prune in this category:
  - experiments/IPDB-Vergleiche/IPDB_Vergleich.tex
  - experiments/IPDB-Vergleiche/IPDB_max_cliques.tex
  - experiments/IPDB-Vergleiche/get_tex_table.py
  - experiments/IPDB-Vergleiche/get_tex_table_downward_only.py
  - experiments/IPDB-Vergleiche/longtable_ipdb.tex
  - experiments/IPDB-Vergleiche/longtable_ipdb_max_cliques.tex
  - experiments/IPDB_Vergleich.tex
  - experiments/PDB-Experimente-Auswertung/PDB_Experimente.tex
  - experiments/PDB_Experimente.tex
  - experiments/gapdb_comparison_20110530/INFO
  - experiments/gapdb_comparison_extended_20110531/INFO
  - experiments/gapdb_comparison_haslum_20110604/INFO
  - experiments/gapdb_v0_20110414/INFO
  - experiments/gapdb_v0_20110505/INFO
  - experiments/gapdb_v1_20110421/mo-gapdb_1.sh
  - experiments/gapdb_v1_20110505/INFO
  - experiments/gapdb_v2_20110421/mo-gapdb_2.sh
  - experiments/gapdb_v2_20110505/INFO
  - experiments/gapdb_v3_20110421/mo-gapdb_3.sh
  - experiments/gapdb_v3_20110505/INFO
  - experiments/gapdb_v4_20110421/mo-gapdb_4.sh
  - experiments/gapdb_v4_20110505/INFO
  - experiments/gapdb_wrv_v0_20110507/INFO
  - experiments/gapdb_wrv_v1_20110507/INFO
  - experiments/gapdb_wrv_v2_20110507/INFO
  - experiments/gapdb_wrv_v3_20110507/INFO
  - experiments/gapdb_wrv_v4_20110507/INFO
  - experiments/get_tex_table.py
  - experiments/hhh_20110505/INFO
  - experiments/hhh_ap_20110505/INFO
  - experiments/hhh_pw_20110505/INFO
  - experiments/ipdb_hhh_old/airport-b.txt
  - experiments/ipdb_hhh_old/logistics00-iPDB-default.txt
  - experiments/ipdb_hhh_old/pnt-iPDB-best.txt
  - experiments/ipdb_hhh_old/psr-iPDB-default.txt
  - experiments/ipdb_hhh_old/pwt-iPDB-best.txt
  - experiments/ipdb_hhh_old/sat-iPDB-default.txt
  - experiments/ipdb_hhh_old/tpp-iPDB-default.txt
  - experiments/ipdb_v0_20110308/INFO
  - experiments/ipdb_v0_20110308/mo-ipdb.sh
  - experiments/ipdb_v1_20010314/INFO
  - experiments/ipdb_v1_20010314/mo-ipdb_1.sh
  - experiments/ipdb_v1_20110311/INFO
  - experiments/ipdb_v2_20110317/INFO
  - experiments/ipdb_v2_20110317/mo-ipdb_2.sh
  - experiments/ipdb_v3_20110317/INFO
  - experiments/ipdb_v3_20110317/mo-ipdb_3.sh
  - experiments/ipdb_v4_20110324/INFO
  - experiments/ipdb_v4_20110324/mo-ipdb_4.sh
  - experiments/ipdb_v5_20110331/INFO
  - experiments/ipdb_v6_20110331/INFO
  - experiments/ipdb_v6_20110406/INFO
  - experiments/ipdb_v6_20110406/mo-ipdb_6.sh
  - experiments/ipdb_v7_20110408/INFO
  - experiments/ipdb_v7_20110408/mo-ipdb_7.sh
  - experiments/ipdb_v8_20110413/INFO
  - experiments/ipdb_v9_20110413/INFO
  - experiments/longtable_ipdb.tex
  - experiments/max_cliques_exp/bug_vs_end.tex
  - experiments/max_cliques_exp/get_tex_table_downward_only.py
  - experiments/max_cliques_exp/longtable_bug_vs_end.tex
  - experiments/pdb_v0_20110118/INFO
  - experiments/pdb_v0_20110118/ss-pdb.sh
  - experiments/pdb_v1_20110301/INFO
  - experiments/pdb_v1_20110301/ss-pdb-again.sh
  - experiments/pdb_v1_20110408/INFO
  - experiments/pdb_v1_withouth_search_20110405/INFO
  - experiments/pdb_v1_withouth_search_20110405/ss-pdb-v1x.sh
  - experiments/pdb_v1a_20110311/INFO
  - experiments/pdb_v1a_20110311/ss-pdb-short.sh
  - experiments/pdb_v1x_20110413/INFO
  - experiments/pdb_v2_20110316/INFO
  - experiments/pdb_v2_20110316/ss-pdb-v2.sh
  - experiments/pdb_v2_20110410/INFO
  - experiments/pdb_v2x_20110410/INFO
  - experiments/pdb_v3_20110413/INFO
  - experiments/pdb_v3x_20110414/INFO
  - experiments/pdb_v4_20110415/INFO
  - experiments/pdb_v4x_20110420/INFO
  - experiments/pdbs_bug_20110526/INFO
  - experiments/pdbs_end_20110526/INFO
  - experiments/pdbs_v0_20110308/ss-pdbs.sh
  - experiments/pdbs_vs_ipdb/get_tex_table_downward_only.py
  - experiments/pdbs_vs_ipdb/longtable_pdbs_vs_ipdb.tex
  - experiments/pdbs_vs_ipdb/pdbs_vs_ipdbs.tex
  - results_overview/.DS_Store
  - results_overview/IPDB_Vergleich.tex
  - results_overview/gapdb_vs_ipdb.tex
  - results_overview/longtable_gapdb_vs_ipdb.tex
  - results_overview/longtable_ipdb.tex
  - results_overview/longtable_pdbs_vs_ipdb.tex
  - results_overview/pdbs_vs_ipdbs.tex

Files we currently keep where I am not sure if I want to recommend deleting them and wouldn't really mind either way:
- ref-results (whole directory)
- results/preprocess/PROBLEMS

Files/directories that I would keep but that others might prefer to remove:
- downward/search/lp/setup
- downward/translate/no-invariants.patch
- downward/translate/run-additive-hmax
- src/search/border_cases (from the iPDB integration)
- src/search/easy-instances (from issue600)
- src/search/ext
- src/search/testcases (from the iPDB integration)
- src/translate/no-invariants.patch
- src/translate/run-additive-hmax
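A prune list like the one above eventually has to reach the clean-up script in machine-readable form. The sketch below assumes the clean-up is driven by an `hg convert --filemap` file, whose `exclude <path>` directive drops a file or directory from the converted history. The helper name `filemap_excludes` and the sample list are hypothetical and not taken from the actual conversion scripts.

```python
def filemap_excludes(paths):
    """Build filemap text with one 'exclude' line per unique path."""
    # Normalize: drop blank entries, surrounding whitespace and trailing slashes.
    unique = sorted({p.strip().rstrip("/") for p in paths if p.strip()})
    return "".join(f"exclude {path}\n" for path in unique)


# Hypothetical excerpt of a prune list (illustration only).
prune_list = [
    "experiments/IPDB_Vergleich.tex",
    "results_overview/.DS_Store",
    "experiments/ipdb_hhh_old/",
    "experiments/IPDB_Vergleich.tex",  # duplicates are collapsed
]

print(filemap_excludes(prune_list), end="")
```

Sorting and de-duplicating the entries keeps the generated file stable between runs, which makes changes to the prune list easy to review as a diff.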
msg9519 (view)	Author: malte	Date: 2020-07-07.12:15:21
Thanks, Florian! All sounds good. Yes, adding the config line to disable the sparse-revlog feature should address what I mentioned. I agree about keeping src/search/ext, but will have a closer look at it. I'm currently rerunning the cleanup based on the newest version and will look at the manifest of the converted repository next.
msg9511 (view)	Author: florian	Date: 2020-07-07.05:00:21
We use a specific Mercurial version installed in a venv because fast-convert is very particular about which version is compatible with which Mercurial version. One way to access the created Mercurial repository would be to use the Mercurial installed in the venv, but I guess that just moves the problem further down the line. I added your patch to the script. Is this a sufficient solution for your use case? I wouldn't document how to delete/recreate the venv because under normal operation this should never be necessary.

About the tags: there is one commit that adds an empty line to .hgtags which is later removed; I assume this is where the message about tag "" comes from. The tags "seipp-helmert" are added, moved, renamed, and deleted multiple times in issue600. In the converted git repo, they point to the final location of these tags in the Mercurial history, so it looks like the warnings can be ignored. issue551-base was also moved once. After the move it pointed to an empty "start branch" commit, which is removed by the conversion to git. In git it points to the parent of the empty commit, which is what we want. issue794-base looks to be broken in the source repository already.

I added a note about them to the readme. I also added a note about not supporting paths with spaces and fixed the usage example. I think this just leaves the final list of files we want to get rid of.

About "src/search/ext": I would argue that the other files in here are less known than Boost, so it would be harder to find a compatible version later on. But I don't have a strong opinion on this and am fine with removing the others as well (except for the ones we are still using, of course).

About VAL: I added downward/VAL/* to the list of files to ignore (it snuck back in when we unignored ./downward).

Here is the updated table after the changes (17MB in .git/objects):

size  pack  SHA                                       location
717   718   7ac3af46ce2dc615319200a589f810645d824949  downward/dist/archive/downward-2006-09-29.tar.gz
714   714   0ad45840a1a3905146ac7b5927e4123cd40dd869  downward/dist/archive/downward-2006-09-26.tar.gz
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
418   34    1bf9638ce6d0976210361ecb127d1ddf0c0ecaf5  docs/prof_logistics_7-0_actual
417   34    5b3cb57d1cd9bcabdf524e9c4432ba7658113afa  docs/prof_logistics_v3
400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
113   56    63b5e223e29cbb3bb6a41d366707b1c583971ccf  src/bugs/safety-net/search
91    19    f520b9a70b579738039f04343a1128d8d3656280  downward/val/Validator.cpp
89    16    7b9cf1d23eb191fbec8c8f9033c432138915c309  downward/val/Proposition.cpp
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
72    5     5f1c513dc656fba6c9179cec29b29d8c88938225  src/bugs/psr-strips-derived-predicates/domain.pddl
71    13    72d6b85b24ac1695a2cdb9baf8522c75bdd244b5  downward/val/Events.cpp
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py

And here it is after also excluding downward/val/, downward/dist/archive/, three profiles, src/bugs/*, and downward/bugs (15MB in .git/objects):

size  pack  SHA                                       location
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
417   34    cfb10a2b5c87bde2b6b55a694853a49b650c64c1  docs/prof_logistics_7-0_v3
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py
66    2     fc153719b7bafadf96c949828f9e51dae9c16fdb  src/translate/regression-tests/issue73-domain.pddl
63    12    fbea03f3dfebc74d03bea3957a644cac83b3706a  src/search/merge_and_shrink/transition_system.cc
61    29    fa451a18288c242ab9dbc99e300ab5078dc624e9  misc/autodoc/external/txt2tags.py
57    11    d84bc16fc57c23cfa7e8dada24185c835cae94ba  src/search/landmarks/landmarks_graph.cc
55    15    4937766fdd188bf13f510b9e9d1c5c2fa57e4b5c  downward/search/Doxyfile
53    5     4a9f7ed1ae9a3cb2b9dd0c880b4bf214b493fb77  src/search/successor_generator.cc
42    9     89c6f4292e3a73db554bc392bdee2f26aa807d96  src/search/landmarks/landmark_factory.cc
38    8     642f109eadcd0295dc52c85d08053c2816152841  src/search/landmarks/h_m_landmarks.cc
36    8     66b86f656a53f4904bf212ea0eabdbba160d8e1b  src/search/cegar/abstraction.cc

One more profile that should probably go, but otherwise the list looks good to me.
msg9510 (view)	Author: malte	Date: 2020-07-07.01:51:14
One last thing for today: I converted the master to git using Patrick's latest version and played around a bit with Florian's "largest files in git" script. Some observations:

A) My converted .git is 17 MB, almost all of it in "objects". All objects are packed.

B) The list of largest files is similar to, but slightly different from, what Florian posted earlier. I think it is identical to what Florian posted in a Zoom discussion today:

size  pack  SHA                                       location
717   718   7ac3af46ce2dc615319200a589f810645d824949  downward/dist/archive/downward-2006-09-29.tar.gz
714   714   0ad45840a1a3905146ac7b5927e4123cd40dd869  downward/dist/archive/downward-2006-09-26.tar.gz
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
418   34    1bf9638ce6d0976210361ecb127d1ddf0c0ecaf5  docs/prof_logistics_7-0_actual
417   34    5b3cb57d1cd9bcabdf524e9c4432ba7658113afa  docs/prof_logistics_v3
400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
113   56    63b5e223e29cbb3bb6a41d366707b1c583971ccf  src/bugs/safety-net/search
91    19    f520b9a70b579738039f04343a1128d8d3656280  downward/VAL/Validator.cpp
89    16    7b9cf1d23eb191fbec8c8f9033c432138915c309  downward/VAL/Proposition.cpp
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
72    5     5f1c513dc656fba6c9179cec29b29d8c88938225  src/bugs/psr-strips-derived-predicates/domain.pddl
71    13    72d6b85b24ac1695a2cdb9baf8522c75bdd244b5  downward/VAL/Events.cpp
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py

C) Looking at that list: We already discussed that we probably want to get rid of the tar.gz archives (entries #1 and #2), of the profiles (entries #4 and #5), and of the executables in src/bugs (entries #6 and #10). I said I'd like to look into what exactly these files and directories are about, but I haven't done that yet. I assume we want to keep src/search/ext, but it's kind-of similar to the Boost libraries we're pruning, so perhaps someone wants to weigh in with their opinion if they would prefer to remove it. Didn't we want to remove VAL? There are several VAL files in the top 20 as we can see, and I think VAL also has many source files in addition to having some large ones. (But perhaps not that many, not sure.)

D) We were surprised by some of the "size" and "pack" entries, e.g. the executables like src/bugs/safety-net/search being so small. Indeed, that file isn't just 113K.

    $ git cat-file -p 63b5e223e29cbb3bb6a41d366707b1c583971ccf | wc -c
    411891

BTW, these 411K are indeed the whole Fast Downward search executable at the time, and it wasn't even stripped. :-)

I've dug into this a bit more deeply. The output of "git verify-pack", on which the numbers reported by the script are based, also includes information on which files a given file is deltified against. If we look at the two executables in the above list:

400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
113   56    63b5e223e29cbb3bb6a41d366707b1c583971ccf  src/bugs/safety-net/search

then closer inspection shows that the first one is not deltified, and the second one is deltified against the first.

For the first one, the reported 400K exactly matches the actual file size of 410545 (which is the size that verify-pack shows) -- the output is in KiB, not KB, and it's rounded down. 410545 bytes is roughly 400.9 KiB. I assume that the reported 158 KiB are then the size of the file after compression, which roughly matches the amount of compression that bzip2 can achieve on this file (down to 136 KiB, compared to 158 KiB reported here by git), so sounds plausible:

    $ git cat-file -p 96f3f49f6e3dabdda338b69f5d1ca536fe1b518f | bzip2 | wc -c
    139671

The second file is also roughly 400 KiB in size in truth, but like I said it's deltified against the first file by git. My guess is that the first number is the size of the uncompressed delta, and the second number is the size of the compressed delta. To test this hypothesis, I concatenated the two uncompressed files, ran bzip2 on the concatenation, and subtracted the compressed size of the first file alone from this. This should give us a rough estimate of the compressed delta size. The difference is 53438 bytes, which is reasonably close to the 56 KiB reported by git here.

This also makes some sense for the top two entries in the large file list, where the packed size more or less matches the basic size: these are gzipped archives, and we can expect that such compressed archives cannot be compressed or deltified well.

So I think this explanation makes sense, and if it does it gives us a better idea of how to interpret these numbers now. One consequence is that the sizes cannot be interpreted in isolation. We probably don't need to worry about this too much -- I think they still serve the purpose of identifying what the big space wasters are. But another consequence is that if we remove a very large waster according to this list, it may not actually give us any benefit at all unless we also remove all other files in the repository that are deltified against it. These other files won't show up as large wasters in the script while their deltification source is still present, but at least one of them will after we have removed it. But this can be addressed by iterating the removal of big wasters. And of course if we do this properly, when we remove a file, we will likely also remove the most closely semantically related files (e.g. removing all VAL source files, not just the ones that show up near the top), and these are also the likely deltification sources.
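The two calculations in point D can be replayed in a few lines. The sketch below shows the KiB rounding and the "compress the concatenation, subtract the base" estimate of a compressed delta. It uses Python's `bz2` module on synthetic byte strings as stand-ins for the two executables, so the numbers it prints are not the ones from the repository, and the function names are made up for illustration.

```python
import bz2
import random


def kib_floor(num_bytes):
    """Size in KiB, rounded down (the convention inferred in the message)."""
    return num_bytes // 1024


def estimated_delta_size(base, other):
    """Rough size of 'other' when stored as a compressed delta against 'base'.

    Estimated as compressed(base + other) - compressed(base).
    """
    return len(bz2.compress(base + other)) - len(bz2.compress(base))


# Synthetic stand-ins for two similar executables (not the real files).
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(20000))
other = base[:10000] + b"patched" + base[10000:]

print(kib_floor(410545))                   # 400, the "size" column in the thread
print(len(bz2.compress(other)))            # cost of storing 'other' on its own
print(estimated_delta_size(base, other))   # cost of storing it next to 'base'
```

The estimate is only a proxy: git computes real binary deltas and compresses with zlib, so agreement to within a few KiB, as in the message, is about as much as can be expected.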
msg9509 (view)	Author: malte	Date: 2020-07-07.00:36:09
(All the following comments are about the hg-to-hg clean-up script, not the hg-to-git conversion script.)

1) I tried the clean-up script, and it ran successfully. :-) I then wanted to look a bit at the resulting hg repository. It doesn't have a checked-out working directory, and I assume this is intentional. But when I try to create one with "hg up", I get:

    abort: repository requires features unknown to this Mercurial: sparserevlog!
    (see https://mercurial-scm.org/wiki/MissingRequirement for more information)

Same reply from "hg log" and "hg status". My guess is that the script uses a hg version that makes use of a repository format that the hg version installed on my computer (running Ubuntu 18.04) doesn't understand. Any ideas what to do? I guess I need some venv-based solution that installs the same hg version that the conversion script uses? [Addendum: I found some answers below.] It looks like sparserevlog requires Mercurial >= 4.7, but Ubuntu 18.04 has Mercurial 4.5. Could the script use a 4.5 or older Mercurial version instead? [Addendum: probably not, see point 4 below.]

2) The clean-up script produces 156 errors on stderr, one about "" being an invalid tag entry and the rest about missing tag entries, referring to tags seipp-helmert@icaps-2013, seipp-helmert@icaps-2014, seipp-helmert-icaps-2013, seipp-helmert-icaps-2014, issue551-base and issue794-base. Are these expected errors? Then I think it would be good to mention this in the readme.

3) I tentatively changed the hg version in the setup script and reran it, but that failed because the venv was already set up and wasn't recreated properly with the (now different) hg version. I deleted the venv, but it wasn't clear that this needed to be done because it is ignored by "git status" and hidden away behind "data" (FWIW, I find "data" an odd spot for the venv). It might be useful to add some information to the README about what needs to be done for a "clean" re-run. Then again, perhaps we don't need to cater to people that fiddle with the scripts. :-) Just wanted to share my experience in case it is helpful for others in a similar situation.

4) The script didn't work with the older hg version, perhaps because renaming_mercurial_source.py does some version-specific monkey-patching? In that case, is there a way to instruct hg to create a repository that works with older Mercurial versions?

5) I played around with possible answers to #4. Creating a bundle at the end of the script with "hg bundle" that can then be unbundled with the older hg version is an indirect solution that works, but Section 2.3 of https://www.mercurial-scm.org/wiki/MissingRequirement suggested a more direct idea, namely disabling the sparse-revlog to start with. Specifically, the following diff seems to have done the trick for me:

    $ git diff
    diff --git a/run-cleanup.sh b/run-cleanup.sh
    index ca661ba..044e649 100755
    --- a/run-cleanup.sh
    +++ b/run-cleanup.sh
    @@ -29,6 +29,7 @@ fi
     source "$VIRTUALENV/bin/activate"
     hg \
    + --config format.sparse-revlog=0 \
     --config extensions.renaming_mercurial_source="${BASE}/renaming_mercurial_source.py" \
     convert $1 $2 \
     --config extensions.hgext.convert= \

So I now have a cleaned up hg repository I can work with. :-) I started poking at it a little bit, but I'd rather poke at it some more before reporting details. I also converted a second (derived from master) repository, and the relationship between the two cleaned up repositories looked fine.

6) So that it doesn't get lost, let me repeat one earlier comment: The script fails when the source path includes a space. I don't think we need to fix it, but then the README should mention it as a limitation. (Looking at the script, I think the same will happen if the destination path contains a space.)

7) In the README, the usage looks a bit odd:

    Usage:
    ./run-cleanup.sh [MERCURIAL REPOSITORY] [CLEANED MERCURIAL REPOSITORY]
    ./run-conversion.sh [MERCURIAL REPOSITORY] [CONVERTED GIT REPOSITORY]
    ./run-cleanup-and-conversion.sh [MERCURIAL REPOSITORY] \
    [CLEANED MERCURIAL REPOSITORY] \
    [CONVERTED GIT REPOSITORY]

I assume these are supposed to be indented to the same level, and -- more importantly -- shouldn't the first argument to run-conversion.sh be described as "CLEANED MERCURIAL REPOSITORY" instead?
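The "requires features unknown to this Mercurial" abort from point 1 can be anticipated before handing a cleaned repository to someone with an older client. Mercurial records the on-disk format features a repository needs in the plain-text file `.hg/requires`, one name per line. The sketch below compares that file against a set of names an older client is assumed to understand; the contents of `KNOWN_TO_OLD_CLIENT` are an illustrative assumption and not an authoritative list for any specific Mercurial release.

```python
import tempfile
from pathlib import Path

# Assumption for illustration: the requirement names an older client accepts.
KNOWN_TO_OLD_CLIENT = frozenset(
    {"revlogv1", "store", "fncache", "dotencode", "generaldelta"}
)


def unknown_requirements(repo, known=KNOWN_TO_OLD_CLIENT):
    """Return the entries of <repo>/.hg/requires that are not in 'known'."""
    requires = Path(repo) / ".hg" / "requires"
    if not requires.is_file():
        return []
    entries = (line.strip() for line in requires.read_text().splitlines())
    return sorted(entry for entry in entries if entry and entry not in known)


def demo():
    """Create a fake repository directory and check it."""
    with tempfile.TemporaryDirectory() as repo:
        hg_dir = Path(repo) / ".hg"
        hg_dir.mkdir()
        (hg_dir / "requires").write_text("revlogv1\nstore\nsparserevlog\n")
        return unknown_requirements(repo)


print(demo())  # ['sparserevlog']
```

An empty result would suggest the repository can be opened by the older client; a non-empty one points at the features to disable at conversion time, as the `format.sparse-revlog=0` patch in point 5 does.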
msg9479 (view)	Author: patfer	Date: 2020-07-06.13:37:16
Jendrik and I chatted briefly about filtering the output of fast-export (yes, fast-export writes everything to stderr). Currently, we both think we should not filter it, and that we should discuss this topic live.
msg9478 (view)	Author: patfer	Date: 2020-07-06.12:49:16
- I split the scripts into 2 standalone scripts (run_cleanup.sh, run_conversion.sh) and one script that runs both one after the other (but we can also remove this one; everyone should be able to execute 2 scripts one after another in bash)
- the -x option is removed

Question: I checked the stdout/stderr of fast-export. Do you also see that fast-export prints only to stderr? That is quite annoying.
msg9472 (view)	Author: malte	Date: 2020-07-03.19:39:06
[This crossed with Florian's message, but now I have to leave for today.]

Some initial general comments before we have a code review link set up:

- If I understand correctly, the script only keeps the final git repository. I would prefer it to also keep the intermediate hg repository. I think it's really useful to debug things, and also for our final conversion I would actually like to keep that version of the hg repository. My suggestion would be to make it two top-level scripts: an hg cleanup one, and an hg to git conversion one. I don't think a script that does both is needed, and I think it's easier to understand what goes on if they are kept separate.
- The script fails when the source path includes a space. I don't think we need to fix it, but then the README should mention it as a limitation.
- I'm not sure the -x tracing output is a good idea, and I think Jendrik already requested removing this in an early code review. Do we really want to keep it? The output is very noisy. The fact that the tracing output is on stderr makes it even more distracting; I'll generally look at stderr for errors.
- Also besides -x, there is lots of output, including many things about missing tags that look like errors. If we want others to use this and if this output is not an error, the README should perhaps point this out. But I think a better solution would be to try to make the output much less noisy, without swallowing actual errors. I get more output than the size of my shell history, so I couldn't even scroll back to the start to see all of it. This is something that is perhaps better discussed interactively.
msg9471 (view)	Author: jendrik	Date: 2020-07-03.19:33:07
You could just resolve the conversations on the pull request page: https://github.com/aibasel/convert-downward/pull/3/files
msg9470 (view)	Author: florian	Date: 2020-07-03.19:32:50
I couldn't close the old pull request (only Patrick can, I suspect) but I marked it as closed and could create a new one: https://github.com/aibasel/convert-downward/pull/4/files
msg9469 (view)	Author: florian	Date: 2020-07-03.19:29:20
I made the changes and added one more file to the exclude list: ./src/search/lp/coinutils-configure.patched is a patch for the osi installer that we removed. It is not useful without the installer and we also keep a copy around attached to the issue where the patch was created. I'll try to recreate the pull request.
msg9468 (view)	Author: malte	Date: 2020-07-03.19:20:39
Hi Patrick, from my side this is ready for final code review, but the code review I see behind the last link is already really cluttered with all the comments, and I think we cannot easily review this way. How can we start in a fresh way, does this require a new pull request? I agree with Florian about adding the aggressive gc. One thing I noticed is that the README file misspells Fast Downward twice (no hyphen).
msg9467 (view)	Author: malte	Date: 2020-07-03.19:11:29
I don't see the filter-branch command as dangerous in the context in which we use it. The problem with it is that it's easy to shoot yourself in the foot, but we keep backups of our feet and are inspecting them carefully before and after. Also, pruning empty commits is a really simple transformation and I think the documentation explains how the tool works in this case nicely. But I haven't reviewed the script yet.

One thing I would recommend to anyone doing size comparisons is to do the "git clone with a file URL" trick. It avoids many potential pitfalls that can make size comparisons off, such as hardlinks. Cautionary tale, this may really surprise you:

    $ git clone grounder-2 grounder-3
    Cloning into 'grounder-3'...
    done.
    $ du -sch grounder-2
    828K grounder-2
    828K total
    => OK!
    $ du -sch grounder-3
    828K grounder-3
    828K total
    => Makes sense!
    $ du -sch grounder-2 grounder-3
    828K grounder-2
    664K grounder-3
    1,5M total
    => What?
    $ du -sch grounder-3 grounder-2
    828K grounder-3
    664K grounder-2
    1,5M total
    => What???

Long story short, if you clone repositories this way, they will share certain files. "du" will account any shared files to whatever repository it looks at first, so depending on the order in which you look at them, the second one it looks at will appear smaller. But they are really identical.

Now with using file URLs:

    $ git clone file:///home/helmert/repos/grounder-2 grounder-2b
    Cloning into 'grounder-2b'...
    remote: Counting objects: 490, done.
    remote: Compressing objects: 100% (214/214), done.
    remote: Total 490 (delta 256), reused 490 (delta 256)
    Receiving objects: 100% (490/490), 144.18 KiB | 5.77 MiB/s, done.
    Resolving deltas: 100% (256/256), done.
    => With the file URL, this is handled like a clone from a remote, so no hardlinking (= sharing of files).
    $ du -sch grounder-2 grounder-2b
    828K grounder-2
    828K grounder-2b
    1,7M total
    => OK!
    $ du -sch grounder-2b grounder-2
    828K grounder-2b
    828K grounder-2
    1,7M total
    => OK!
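The order-dependent numbers in the cautionary tale follow from counting each hardlinked file only once, in whichever directory is visited first. The sketch below reproduces that accounting on two throwaway directories. It is a simplified model that sums file sizes only (real "du" counts allocated blocks and directories as well), and the directory and file names are invented for the demonstration.

```python
import os
import tempfile


def du_like(directories):
    """Sum file sizes per directory, counting each hardlinked file only once."""
    seen = set()  # (device, inode) pairs already accounted for
    totals = {}
    for directory in directories:
        total = 0
        for root, _dirs, files in os.walk(directory):
            for name in files:
                info = os.lstat(os.path.join(root, name))
                key = (info.st_dev, info.st_ino)
                if key not in seen:
                    seen.add(key)
                    total += info.st_size
        totals[os.path.basename(directory)] = total
    return totals


def demo():
    """Build an 'original' and a hardlinking 'clone', then measure both orders."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        clone = os.path.join(tmp, "clone")
        os.mkdir(original)
        os.mkdir(clone)
        with open(os.path.join(original, "pack"), "wb") as handle:
            handle.write(b"x" * 1000)
        # A path-based clone may hardlink object files instead of copying them.
        os.link(os.path.join(original, "pack"), os.path.join(clone, "pack"))
        with open(os.path.join(clone, "config"), "wb") as handle:
            handle.write(b"y" * 10)
        return du_like([original, clone]), du_like([clone, original])


first_order, second_order = demo()
print(first_order)   # {'original': 1000, 'clone': 10}
print(second_order)  # {'clone': 1010, 'original': 0}
```

Whichever directory comes first is charged for the shared file, which is the same effect as the 828K/664K swap above.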
msg9466 (view)	Author: florian	Date: 2020-07-03.19:00:16
The repository was a clone of something else (cloned from a path, not the file:// url) but not the other way around. Unfortunately, I recreated the repositories in the meantime to start from the cleanly converted repository again. This time, I used a version of the conversion script that included "git gc --aggressive" as a last step and now both repositories have around 30MB. The repository without the empty commits is still a bit larger but this time the difference is small and not in the unpacked objects. I guess my stubborn unpacked files that were ignored by repack must have been a problem that occurred through something I did while analyzing the repositories. If you think the use of git filter-branch is fine, I don't think it is worth digging deeper. I suggest adding the call to git gc --aggressive, excluding downward/validate, and leaving it at that.
msg9465 (view)	Author: malte	Date: 2020-07-03.18:29:35
There is also the possibility that the files are not removed because they are also referenced from a second repository via hardlinks. Does a clone of the repository exist? To get a clone with no sharing, clone with a "file" URL rather than a filename (path). For example, try "git clone file:///path/to/repo new-repo" and look at new-repo.
msg9464 (view)	Author: malte	Date: 2020-07-03.18:23:11
Garbage collection works the other way round. Everything is deleted unless it is proven to be reachable. git has no notion of which data "should" be there other than the direct references (branches, tags, reflogs, which all point to one specific revision) and ancestry. Garbage collection marks everything reachable from these and deletes everything else. But all this isn't really relevant here. This is about the packed/unpacked relationship, which is an unrelated thing. Packed objects are added to packs and should then be deleted. That requires no reasoning about what the objects even are; they are just blobs of data. There isn't really a way in which this can be corrupted. Also, to be clear, I think the warnings on git filter-branch are really just the usual warnings about "if you leave something unreachable, it will be gone after gc", or relatedly, "if you delete some parents of a merge commit, weird things will happen". They are really about shooting yourself in the foot in the "normal" git ways, except that you are using a power tool here.
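The "mark everything reachable, delete the rest" rule can be written down as a small mark-and-sweep over a toy object graph. This is a conceptual sketch only: the object names are invented, each object simply lists what it points to, and real git additionally honours reflog entries and grace periods before pruning.

```python
def reachable(objects, roots):
    """Return the ids of all objects reachable from the given root ids."""
    marked = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in marked or oid not in objects:
            continue
        marked.add(oid)
        stack.extend(objects[oid])  # follow tree, blob and parent links
    return marked


def collect_garbage(objects, roots):
    """Keep only the objects reachable from the roots."""
    keep = reachable(objects, roots)
    return {oid: links for oid, links in objects.items() if oid in keep}


# Toy store: each object maps to the objects it references.
objects = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "old-commit": ["tree-old", "commit1"],  # left behind by a history rewrite
    "tree2": ["blob-a", "blob-b"],
    "tree1": ["blob-a"],
    "tree-old": ["blob-a", "blob-big"],
    "blob-a": [],
    "blob-b": [],
    "blob-big": [],
}
refs = {"refs/heads/main": "commit2"}

kept = collect_garbage(objects, refs.values())
print(sorted(kept))
# ['blob-a', 'blob-b', 'commit1', 'commit2', 'tree1', 'tree2']
```

Note that `blob-a` survives even though the rewritten-away `old-commit` also used it: reachability is decided from the references, never from what "should" be there.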
msg9463 (view)	Author: florian	Date: 2020-07-03.18:14:30
This may be one of the problems that using git-filter-branch brings. Maybe the files are no longer associated with the represented tree and thus not considered by git-repack? I'm fine with continuing this on Monday. But I would also be fine with dropping it: we know what those files are and where the difference in size between the repositories comes from. If we don't strip empty commits, and use garbage collection as a last step of the conversion, we get a very small repo. Is it worth the effort to get rid of the empty commits and save another 300KB?
msg9462 (view)	Author: malte	Date: 2020-07-03.18:09:06
Hmmm, worked for me. :-( (On another repository.) Perhaps we should really continue this in a Zoom session at some point, sorry for the slew of messages to everyone on this unusually large nosy list!
msg9461 (view)	Author: florian	Date: 2020-07-03.18:07:42
No dice, unfortunately. I tried that earlier and I think "git gc" internally also uses git-repack. I tried it again as you suggested but the files remain there.
msg9460 (view)	Author: malte	Date: 2020-07-03.18:03:49
Elaborating a bit: I think the normal mode of operation is that all files added by git are initially added unpacked and then periodically packed once enough of them have been accumulated (or perhaps it's also time-based). The point of packing is to apply delta compression (i.e., two similar files will be smaller than the sum of their two sizes) and also to reduce file system wastage due to having lots of small files. Try "git repack -ad" followed by "git gc".
msg9459 (view)	Author: malte	Date: 2020-07-03.17:57:53
These files (just like the others) can be "packed", in which case they end up compressed in the pack. Some invocation of the "repack" command I mentioned a few messages down ought to help with that.
msg9458 (view)	Author: florian	Date: 2020-07-03.17:41:02
Looks like downward/validate is missing from the filemap. Patrick, can you add it?

I looked closer at the files in .git/objects/xx/*. As this article (https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) explains, you can view the content of file .git/objects/xx/yyyyy using "git cat-file -p xxyyyyy". It looks like these are the commit metadata. There are roughly 11000 of them, which matches with the number of commits, and printing them shows things like this:

    ============
    tree 0d0df310b249d01fbdfeb8ff3c0ebfefb675956d
    parent aca0d632b2a4fd69ee4deebd58ee0228ab01a0ae
    author Jendrik Seipp <jendrik.seipp@unibas.ch> 1493220240 +0200
    committer Jendrik Seipp <jendrik.seipp@unibas.ch> 1493220240 +0200

    [issue719] clean up token parser
    ===========

The question is why this data is stored in this way in one but not both repositories.
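The mapping between an object and its .git/objects/xx/yyyyy file is mechanical, which is why the loose files can be inspected this way. The sketch below assumes the default SHA-1 object format: the id is the SHA-1 of a small header plus the content, the first two hex digits name the directory, and the file holds the zlib-compressed header and content. The function names are illustrative and not part of git.

```python
import hashlib
import zlib


def git_object_id(kind, body):
    """Object id: SHA-1 over '<kind> <size>' + NUL + body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_path(oid):
    """Where a loose (unpacked) object with this id is stored."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def encode_loose_object(kind, body):
    """Bytes as they would be written to the loose object file."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return zlib.compress(header + body)


def parse_loose_object(raw):
    """Inverse of encode_loose_object: return (kind, body)."""
    header, _, body = zlib.decompress(raw).partition(b"\0")
    kind, size = header.split()
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode(), body


oid = git_object_id("blob", b"hello\n")
print(oid)                     # ce013625030ba8dba906f756967f9e9ca394464a
print(loose_object_path(oid))  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
print(parse_loose_object(encode_loose_object("blob", b"hello\n")))
```

Commits are stored the same way with kind "commit", so roughly one loose file per commit is what an unpacked history would produce.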
msg9456 (view)	Author: malte	Date: 2020-07-03.17:19:13
Thanks, Florian! The output makes a lot of sense. Blobs and trees should be identical, and they are. Commits should be lower without the empty commits, and they are. Not 100% sure what "references" means, but I'd guess it's tags, branches and reflog (plus perhaps a few similar things). The order of magnitude makes sense for this, and it makes sense that it's the same number. The only number that differs where I have no clue what it means is "maximum history depth". Ah, perhaps that's the length of the longest path in the repository DAG. If that's the case, it makes sense that removing the empty commits reduces it. It looks from the output like the repository includes a binary of VAL (see Blobs/maximum size and footnote 4). Shouldn't that be pruned by the conversion? Or is this based on a non-cleaned-up hg repository?
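If "maximum history depth" is read as the length of the longest ancestry chain in the commit DAG (the guess made above, not something confirmed against git-sizer's documentation), it can be computed with a memoized traversal. The sketch below works iteratively, since a chain of several thousand commits would exceed Python's default recursion limit, and shows on an invented five-commit history how dropping an empty commit shortens the longest chain.

```python
def max_history_depth(parents):
    """Number of commits on the longest ancestry chain.

    'parents' maps each commit to the list of its parent commits; every
    parent must itself be a key of the mapping.
    """
    depth = {}
    for start in parents:
        stack = [start]
        while stack:
            commit = stack[-1]
            if commit in depth:
                stack.pop()
                continue
            pending = [p for p in parents[commit] if p not in depth]
            if pending:
                stack.extend(pending)  # compute the parents first
            else:
                depth[commit] = 1 + max(
                    (depth[p] for p in parents[commit]), default=0
                )
                stack.pop()
    return max(depth.values(), default=0)


# Invented history: 'start' is an empty "start branch" commit.
with_empty = {
    "a": [],
    "b": ["a"],
    "start": ["b"],
    "c": ["start"],
    "merge": ["b", "c"],
}
# Same history with the empty commit removed and 'c' reparented onto 'b'.
without_empty = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "merge": ["b", "c"],
}

print(max_history_depth(with_empty))     # 5
print(max_history_depth(without_empty))  # 4
```

Under this reading, the drop from 4.60 k to 4.09 k in the two reports would mean that about 500 of the removed empty commits sat on the longest chain.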
msg9455 (view)	Author: florian	Date: 2020-07-03.17:11:25
I used git-sizer for msg9444 and saw no significant differences between the repositories. For completeness' sake, here is the output of ./../git-sizer/git-sizer --threshold 0

Processing blobs: 28874
Processing trees: 37661
Processing commits: 12147
Matching commits to trees: 12147
Processing annotated tags: 0
Processing references: 657
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    | 12.1 k    |                                |
|   * Total size               | 3.33 MiB  |                                |
| * Trees                      |           |                                |
|   * Count                    | 37.7 k    |                                |
|   * Total size               | 49.4 MiB  |                                |
|   * Total tree entries       | 1.22 M    |                                |
| * Blobs                      |           |                                |
|   * Count                    | 28.9 k    |                                |
|   * Total size               | 225 MiB   |                                |
| * Annotated tags             |           |                                |
|   * Count                    | 0         |                                |
| * References                 |           |                                |
|   * Count                    | 657       |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size [1]         | 1.50 KiB  |                                |
|   * Maximum parents [2]      | 2         |                                |
| * Trees                      |           |                                |
|   * Maximum entries [3]      | 171       |                                |
| * Blobs                      |           |                                |
|   * Maximum size [4]         | 6.97 MiB  |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      | 4.60 k    |                                |
| * Maximum tag depth          | 0         |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories [5]  | 220       |                                |
| * Maximum path depth [6]     | 6         |                                |
| * Maximum path length [7]    | 72 B      |                                |
| * Number of files [8]        | 1.50 k    |                                |
| * Total size of files [9]    | 10.6 MiB  |                                |
| * Number of symlinks         | 0         |                                |
| * Number of submodules       | 0         |                                |

[1] dd349a6f7f19776bbfb25d432f681d91152e4891
[2] 140a15e7c437b712267a7ed1d95242c44217c81e
[3] 1da1890c77269a7c30cf4e745d815a9ca5a78760 (refs/heads/main:experiments)
[4] 271631eb6d7c7bd5fac7cd9ac7d3d765badb4b20 (9c2ff8dffd48e7ab58f9cae203d4e1140d8bef85:downward/validate)
[5] 9c89de671e544cb9753ff292ca3f81a3ff864f46 (refs/heads/main^{tree})
[6] af0d711aaf96396d8b1ba71cd5122d7c4cf75fc6 (3746f02337a08355cb4c0a5d24470c4917c4ca86^{tree})
[7] ec21e3c1f977abf9a360534ecf90e167cf140288 (6d2aa8c5d3fc942531536f6a323ac9d2898e36f6^{tree})
[8] 0644851e90592fdbfac7fbe69885cd850325d7e5 (82cb6d4d7516ede94b85fe150a3035dfae4884e6^{tree})
[9] 9d240a799d3efe530fafa4db4348dea9604a0301 (7b5c6b8c73b590594f440e5514c46b0d628bc1f9^{tree})

... and for the repo without empty commits:

Processing blobs: 28874
Processing trees: 37661
Processing commits: 10877
Matching commits to trees: 10877
Processing annotated tags: 0
Processing references: 657
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    | 10.9 k    |                                |
|   * Total size               | 2.99 MiB  |                                |
| * Trees                      |           |                                |
|   * Count                    | 37.7 k    |                                |
|   * Total size               | 49.4 MiB  |                                |
|   * Total tree entries       | 1.22 M    |                                |
| * Blobs                      |           |                                |
|   * Count                    | 28.9 k    |                                |
|   * Total size               | 225 MiB   |                                |
| * Annotated tags             |           |                                |
|   * Count                    | 0         |                                |
| * References                 |           |                                |
|   * Count                    | 657       |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size [1]         | 1.50 KiB  |                                |
|   * Maximum parents [2]      | 2         |                                |
| * Trees                      |           |                                |
|   * Maximum entries [3]      | 171       |                                |
| * Blobs                      |           |                                |
|   * Maximum size [4]         | 6.97 MiB  |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      | 4.09 k    |                                |
| * Maximum tag depth          | 0         |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories [5]  | 220       |                                |
| * Maximum path depth [6]     | 6         |                                |
| * Maximum path length [7]    | 72 B      |                                |
| * Number of files [8]        | 1.50 k    |                                |
| * Total size of files [9]    | 10.6 MiB  |                                |
| * Number of symlinks         | 0         |                                |
| * Number of submodules       | 0         |                                |

[1] dd349a6f7f19776bbfb25d432f681d91152e4891
[2] 01d8a6d4ffcfdf935621bd90b91fdba051336048
[3] 1da1890c77269a7c30cf4e745d815a9ca5a78760 (refs/heads/main:experiments)
[4] 271631eb6d7c7bd5fac7cd9ac7d3d765badb4b20 (9c2ff8dffd48e7ab58f9cae203d4e1140d8bef85:downward/validate)
[5] 9c89de671e544cb9753ff292ca3f81a3ff864f46 (refs/heads/main^{tree})
[6] af0d711aaf96396d8b1ba71cd5122d7c4cf75fc6 (f2b3ba73f838a1527c2f6bbf5b2d34f7940923e5^{tree})
[7] ec21e3c1f977abf9a360534ecf90e167cf140288 (826823841e18bf7bb09efe4b3a48d0df7424acec^{tree})
[8] 0644851e90592fdbfac7fbe69885cd850325d7e5 (11d5fb6b66340f3637eea04a3be813fa6dd6b02c^{tree})
[9] 9d240a799d3efe530fafa4db4348dea9604a0301 (7b5c6b8c73b590594f440e5514c46b0d628bc1f9^{tree})
msg9454 (view)	Author: malte	Date: 2020-07-03.17:09:26
[This is the reply.] OK, this does indeed seem to be related to packs. My understanding is that files can be available in packed state, unpacked state, or both. It looks like in one of the repositories, all files are available only in packed state, and in the other one it's a mixture. But this is something I don't currently know enough about and would like to read more. A clean comparison (but perhaps not the most relevant one) could be made if none of the files are packed because then the data format is very clear and deterministic. I don't know enough about git to know if it's normal for most or all files to be packed or not, but I would assume that it makes sense for all "historical" files to be packed and only recent additions not to be. Anyway, I don't know a lot about this (yet), but would like to learn more about it anyway because I think it's useful git knowledge.
msg9453 (view)	Author: malte	Date: 2020-07-03.17:04:08
[This overlapped with Florian's message, so this is not a reply. I'll send a reply next.] PS: There is also the issue of packs (see "git repack" and "git prune-packed"), and I'm not actually sure what the "best" way to measure repository size is. The git-sizer tool looks useful: https://github.com/github/git-sizer/ I don't care very much about pruning empty commits or not, but I would like to get to the bottom (or at least a bit closer to the bottom) of the question of what our repository size is, no matter whether we prune empty commits or not, because that seems to be a good thing to consider right now when we still have some chance to influence it. @Florian: if you're interested, it might be interesting to look into these things together via zoom with a screen share so that we can pool ideas etc. But I don't have much time today, as I'm meeting friends at 19:00. It may have to wait until Monday. Would that be OK?
msg9452 (view)	Author: florian	Date: 2020-07-03.16:59:43
In the clone with empty commits, I get the following after garbage collection: $ ls .git/objects info pack $ ls -la .git/objects/pack -r--r--r-- 1 pommeren pommeren 2.2M Jul 3 16:25 pack-a12509d779c195adac8215625482cfd49a1e3a9f.idx -r--r--r-- 1 pommeren pommeren 14M Jul 3 16:25 pack-a12509d779c195adac8215625482cfd49a1e3a9f.pack For the clone without empty commits, I get: $ ls .git/objects 00 06 0c 12 18 1e 24 2a 30 36 3c 42 48 4e 54 5a 60 66 6c 72 78 7e 84 8a 90 96 9c a2 a8 ae b4 ba c0 c6 cc d2 d8 de e4 ea f0 f6 fc 01 07 0d 13 19 1f 25 2b 31 37 3d 43 49 4f 55 5b 61 67 6d 73 79 7f 85 8b 91 97 9d a3 a9 af b5 bb c1 c7 cd d3 d9 df e5 eb f1 f7 fd 02 08 0e 14 1a 20 26 2c 32 38 3e 44 4a 50 56 5c 62 68 6e 74 7a 80 86 8c 92 98 9e a4 aa b0 b6 bc c2 c8 ce d4 da e0 e6 ec f2 f8 fe 03 09 0f 15 1b 21 27 2d 33 39 3f 45 4b 51 57 5d 63 69 6f 75 7b 81 87 8d 93 99 9f a5 ab b1 b7 bd c3 c9 cf d5 db e1 e7 ed f3 f9 ff 04 0a 10 16 1c 22 28 2e 34 3a 40 46 4c 52 58 5e 64 6a 70 76 7c 82 88 8e 94 9a a0 a6 ac b2 b8 be c4 ca d0 d6 dc e2 e8 ee f4 fa info 05 0b 11 17 1d 23 29 2f 35 3b 41 47 4d 53 59 5f 65 6b 71 77 7d 83 89 8f 95 9b a1 a7 ad b3 b9 bf c5 cb d1 d7 dd e3 e9 ef f5 fb pack $ ls -la .git/objects/pack -r--r--r-- 1 pommeren pommeren 2.1M Jul 3 16:40 pack-f226f56ac61da073ac7184b8eb48c8105b88cf57.idx -r--r--r-- 1 pommeren pommeren 14M Jul 3 16:40 pack-f226f56ac61da073ac7184b8eb48c8105b88cf57.pack All of the other directories (00, 06, 0c, ...) contain ~40-50 files of roughly 200KB each.
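The listing above distinguishes loose objects (the two-character directories 00 to ff) from pack files. A minimal Python sketch for totalling the two kinds separately is shown here; the function name `object_store_usage` is made up for illustration, and it only sums file sizes on disk rather than interpreting the objects.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def object_store_usage(git_dir):
    """Sum file counts and sizes under <git_dir>/objects, split into loose objects and packs."""
    objects = Path(git_dir) / "objects"
    usage = {"loose_files": 0, "loose_bytes": 0, "pack_files": 0, "pack_bytes": 0}
    for path in objects.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(objects).parts
        if parts[0] == "pack":
            kind = "pack"  # pack-*.pack and pack-*.idx files
        elif len(parts) == 2 and len(parts[0]) == 2 and set(parts[0]) <= HEX_DIGITS:
            kind = "loose"  # objects/ab/cdef... style loose object
        else:
            continue  # e.g. objects/info
        usage[kind + "_files"] += 1
        usage[kind + "_bytes"] += path.stat().st_size
    return usage
```

Calling `object_store_usage(".git")` in each clone would give numbers that can be compared directly. Git's built-in `git count-objects -v` reports similar information.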
msg9451 (view)	Author: malte	Date: 2020-07-03.16:48:21
Then it really does sound like something slightly fishy is going on. I read up on git's storage system last year, and it's really fairly straightforward. When comparing two repositories with the same file contents (forming the same union over all commits), the set of filenames in .git/objects should really be the same because .git/objects is ultimately a hash map indexed by the (hash sum of the) file contents of the files. If these sets of filenames are wildly different between two logically identical repositories, there is something really fishy going on. A discrepancy in the size of the files can potentially be explained by more/less successful delta compression. But if I understand you correctly the difference is that one of the repositories has a lot more files? That's quite weird.
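To make the "hash map indexed by file contents" description concrete, here is a small Python sketch of how git derives the ID of a blob and the path of the corresponding loose object. It assumes the default SHA-1 object format; the helper names are made up for illustration. Note that this naming only applies to loose objects: once objects are packed, they live inside the pack files and no longer appear as individual files.

```python
import hashlib


def git_blob_id(content):
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id):
    """Return where a loose object with this ID is stored, relative to the repository root."""
    # The first two hex digits name the directory, the rest the file.
    return ".git/objects/{}/{}".format(object_id[:2], object_id[2:])


oid = git_blob_id(b"hello world\n")
print(oid, loose_object_path(oid))
```

The result can be cross-checked against `git hash-object <file>`, which computes the same ID for a file's contents.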
msg9450 (view)	Author: florian	Date: 2020-07-03.16:41:29
Expiring the reflogs did not affect the size, even when calling the garbage collection afterwards.
msg9449 (view)	Author: malte	Date: 2020-07-03.16:40:46
As long as the old (before pruning empty commits) commits are still reachable from your reflog, they are still alive and cannot be garbage-collected. After pruning the empty commits, you need to make sure that your HEAD is pointed at a "new" commit (i.e. in the part of the DAG that references the state after pruning the empty commits). You also must make sure that none of the commits before the conversion are referenced in the reflog. Normally they would be because we were looking at them before the conversion, and the reflog tracks what we have recently been looking at. You can expire the reflog using the info in msg9447. You can check that your reflog has been cleaned with "git reflog". (Ideally, run before and after expiring it to see the difference.) Only when none of the pruned commits are reachable any more should we run garbage collection.
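The order of steps described above (expire the reflog first, then collect garbage) could be scripted roughly as follows. This is only a sketch: the exact options are an assumption, since the thread only links to a Stack Overflow answer, and the function names are made up. The default is a dry run that just lists the commands.

```python
import subprocess


def cleanup_commands():
    """Commands to drop reflog entries and then garbage-collect unreachable objects."""
    return [
        # Expire all reflog entries immediately, for all refs.
        ["git", "reflog", "expire", "--expire=now", "--all"],
        # Prune unreachable objects without the usual grace period and repack thoroughly.
        ["git", "gc", "--prune=now", "--aggressive"],
    ]


def run_cleanup(repo_dir, dry_run=True):
    """Return the command lines; execute them in repo_dir unless dry_run is set."""
    # Not handled here: backup refs under refs/original/ that git filter-branch
    # leaves behind also keep the old commits reachable.
    printed = []
    for cmd in cleanup_commands():
        printed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return printed
```

Running "git reflog" before and after, as suggested above, remains the simplest check that the expiry had an effect.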
msg9448 (view)	Author: florian	Date: 2020-07-03.16:35:01
"git gc --aggressive" reduced the size of both of my repositories (with/without removing empty commits). The one without empty commits still has the additional files in .git/objects, so the new sizes for me are 29MB (with empty commits) and 77MB (without empty commits). I think it would make sense to call "git gc --aggressive" at the end of the conversion, but I also don't think it is necessary for us to dig down to the level of .git/objects. This seems to be internal state of the repository which goes up and down during the lifetime of a repository. The garbage collection will be called from time to time anyway if people work with the repository.
msg9447 (view)	Author: malte	Date: 2020-07-03.16:34:36
See also here: https://stackoverflow.com/questions/49067898/git-remove-old-reflog-entries Sorry for the spam.
msg9446 (view)	Author: malte	Date: 2020-07-03.16:21:55
(I wasn't clear: the reflog thing and the --aggressive recommendation are unrelated points. "git help gc" has some more info.)
msg9445 (view)	Author: malte	Date: 2020-07-03.16:21:13
Garbage collection won't remove revisions that are still reachable through your reflog. That is fixable. Not sure if cloning removes the reflog, but if it does, cloning should resolve this. It's also worth trying the --aggressive option.
msg9444 (view)	Author: florian	Date: 2020-07-03.16:18:24
I did a couple of tests with and without removing empty commits and with and without cloning after the conversion. I could not reproduce the effect that the repository got smaller after cloning. Removing the empty commits actually made the repository bigger for me (83MB to 133MB). All additional files were in .git/objects and I suspect that this came out of the step removing the empty commits. I also tried calling "git gc" to remove unused data but it didn't change anything. Other than the additional files, the size seems to have gone down by ~300KB (measured with git-sizer and this matches the size change of .git/objects/pack). All in all, I think removing the empty commits is a risky step: the effect on the repository size seems to be unpredictable and the tool used to remove them (git filter-branch) is known to be fragile and to have a tendency to mess things up. I would opt for leaving the empty commits in the repository. What do you think?
msg9430 (view)	Author: patfer	Date: 2020-07-03.11:41:50
The script is in good shape now. After multiple reviews, the current version can be seen at https://github.com/aibasel/convert-downward/pull/3 Only a final approval is missing, and then we can ship it.
msg9391 (view)	Author: malte	Date: 2020-07-01.20:26:21
Thanks, Jendrik! I'm done with my comments.
msg9389 (view)	Author: jendrik	Date: 2020-07-01.19:51:16
Here you go: https://github.com/aibasel/convert-downward/pull/3/files
msg9388 (view)	Author: malte	Date: 2020-07-01.19:44:56
I have a comment or two on the list. Do we have it in commentable form somewhere, e.g. as a pull request?
msg9366 (view)	Author: florian	Date: 2020-07-01.09:39:48
The impact of removing files is predicted well by their "packed" size (second column), so in this case, removing the first 7 files would only save us 1.1MB. It's probably better to leave them in. Since those were the largest files after the conversion, does that mean that we can treat the list of removed files as final? Here is the current list, in case you want to check that it is not too aggressive: https://github.com/aibasel/convert-downward/blob/master/data/downward_filemap.txt
msg9365 (view)	Author: malte	Date: 2020-06-30.19:36:31
I wouldn't go overboard with removing things from history for another megabyte or two. Once we've gone after the big fish, I think there is a point where it's best to call it done. If you remove the ones you suggest, what is the impact on repository size? Several of these involve some kind of breakage when removed. For example, the PDF files in "docs" are part of the documentation of these old versions, and I think they are referenced in READMEs etc. They are not incredibly valuable, but it's still not a common thing to retroactively break old repository revisions, and we should really only do it if there is a substantial benefit. For example, if it's 54 or 57 KiB for the txt2tags changes, I'm not sure it's a good tradeoff. Some version of txt2tags is available elsewhere, sure, but removing them still means that the autodoc code in these old revisions that currently works (with the correct python versions etc.) will no longer work because txt2tags is expected to be there but not present, and there is no way to find out which version is needed.
msg9363 (view)	Author: jendrik	Date: 2020-06-30.18:13:53
I also think that the first seven entries can be removed. Also, we don't use cpplint anymore and can remove it. Txt2tags is available on PyPI now, so we could remove our copy from the repo and install it on the buildbot (and in the tox environment that tests building the docs under different Python versions). Should I prepare a patch with that change?
msg9361 (view)	Author: florian	Date: 2020-06-30.17:03:33
The converted git repository has a size of 83.4 MB:
* 71.8 MB in .git (see below)
* 7.2 MB in experiments
* 3.6 MB in code
* 0.8 MB in driver, misc, etc.

In the history (.git) the largest space users are (size and packed size in kB):

size  packed  location
459   384     src/dist/data/doc/fast-downward.pdf
418   34      docs/prof_logistics_7-0_actual
417   34      docs/prof_logistics_7-0_v3
402   159     src/bugs/safety-net/search
400   158     src/bugs/ff-no-preconditions/search
395   44      docs/prof_logistics_v2
278   240     src/dist/data/doc/translator.pdf
231   57      misc/autodoc/external/txt2tags.py
231   57      misc/autodoc/external/txt2tags.py
231   57      misc/autodoc/external/txt2tags.py
189   54      misc/autodoc/external/txt2tags.py
177   54      misc/cpplint.py
130   38      misc/cpplint.py

Sorting by packed size shows more or less the same files in a different order and another profile (docs/prof_logistics_v3). The next largest files are actual code files and PDDL files from src/translate/regression-tests which we should keep. Should we remove anything from the list above? To me it looks like the first 7 entries could be deleted but as far as I know, we are still using txt2tags and cpplint, so we cannot remove them. For reference, here is the script I used to generate the data: https://stackoverflow.com/questions/10622179/
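For readers who want to reproduce a "size / packed size" ranking like the one above, here is a hedged Python sketch that parses the output of "git verify-pack -v" and sorts blobs by their size in the pack. It is not necessarily what the linked script does; the function names are made up, and the sample text uses invented object IDs and sizes purely for illustration.

```python
OBJECT_TYPES = {"commit", "tree", "blob", "tag"}

# Made-up sample in the column layout of "git verify-pack -v":
# object ID, type, size, size in pack file, offset [, delta depth, base object ID]
SAMPLE = """\
1111111111111111111111111111111111111111 blob   470016 393216 12
2222222222222222222222222222222222222222 blob   428032 34816 393228
3333333333333333333333333333333333333333 commit 250 170 428044
4444444444444444444444444444444444444444 blob   1200 90 428214 1 1111111111111111111111111111111111111111
non delta: 3 objects
chain length = 1: 1 object
pack-sample.pack: ok
"""


def parse_verify_pack(output):
    """Return (object_id, type, size, packed_size) tuples, skipping summary lines."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[1] not in OBJECT_TYPES:
            continue
        entries.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return entries


def largest_blobs(output, limit=10):
    """Return the blobs with the largest packed size, biggest first."""
    blobs = [entry for entry in parse_verify_pack(output) if entry[1] == "blob"]
    return sorted(blobs, key=lambda entry: entry[3], reverse=True)[:limit]


for object_id, _, size, packed in largest_blobs(SAMPLE, limit=3):
    print(object_id[:8], size, packed)
```

Mapping the object IDs back to paths needs a second step, for example matching them against the output of "git rev-list --objects --all".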
msg9143 (view)	Author: patfer	Date: 2020-01-10.16:14:51
The cleanup script now also removes the branches issue323 & ipc-2011-fixes. Thank you, Jendrik, for uploading the IPC tarball. The cleanup script is ready for another review: https://github.com/aibasel/convert-downward/pull/1/commits The conversion script is mostly finished and ready for review. Currently all branches closed in Mercurial are open. After we have decided on a Git workflow (I suggest this for the next FastDownward meeting): https://github.com/aibasel/convert-downward/pull/2
msg9142 (view)	Author: jendrik	Date: 2020-01-08.22:53:22
Done.
msg9141 (view)	Author: silvan	Date: 2020-01-08.22:49:35
Good idea.
msg9140 (view)	Author: malte	Date: 2020-01-08.19:17:54
Works for me.
msg9139 (view)	Author: jendrik	Date: 2020-01-07.19:30:28
What do you think about exporting the latest revision on branch ipc-2011-fixes and uploading the tarball to http://www.fast-downward.org/IpcPlanners ?
msg9135 (view)	Author: malte	Date: 2019-12-24.14:06:44
Pruning the issue323 branch isn't a problem. That code can live somewhere else until the issue is completed. Regarding the ipc-2011-fixes branch, if we remove it from the repository, we need to decide how/where to host it instead and what to do with the documentation page that relates to it (http://www.fast-downward.org/IpcPlanners). Those in favour of removal, what is your suggestion?
msg9130 (view)	Author: patfer	Date: 2019-12-20.10:19:30
I agree with Jendrik about removing the branches ipc-2011-fixes and issue323 from the master repository. @Jendrik: They do not consume space; the size log is not up to date with the excluded files. Here is an update:
rev 24: 2400 Renamed 'downward' directory to 'src'.
rev 5469: 116 Move small heuristics to subdirectory
rev 785: 96 added first testcases
rev 347: 92 Merged merge-and-shrink implementation by Raz
rev 754: 92 Moved merge-and-shrink stuff to a subdirectory.
rev 5386: 92 Move files according to new class names.
rev 4846: 84 Move driver out of src
rev 0: 80 moved everything to trunk
rev 1881: 80 Moved auto_doc script to misc/autodoc.
rev 5650: 76 Move utility files to subdirectories (no code chan...
rev 787: 72 added border cases
rev 964: 72 Added profiling information from comparison of ver...
rev 1142: 72 Moved PDB code to subdirectory.
rev 7846: 72 Replace merge dfp by a set of new classes.
msg9129 (view)	Author: jendrik	Date: 2019-12-19.23:57:15
@Florian: I think the branches you mention are merged into the default branch. So I guess the question is whether we want to rename them to the issueXXX format or not. I'm also in favor of moving the ipc-2011-fixes branch out of the master repo and propose to do the same for the issue323 branch. Why does the gtest commit ("Weave gtest into codebase") still consume so much space even after we exclude the src/search/ext/gtest directory?
msg9123 (view)	Author: florian	Date: 2019-12-19.16:41:13
I don't mind much either way, but the branches emil-new-integration, hcea-cleanup, issue133new, issue329test, and raz-ipc-integration all sound like things we planned to do but then gave up on or abandoned at some time. If this is the case, I would prefer if we could decide whether we still want to do the integration or not. In the first case, I would create an issue for it, rebase the branches to that issue, push the issue branches to a private repository and delete them from the main repository (i.e., deal with them in the same way we deal with all other incomplete issue branches). In the second case, I would delete them. But like I said, I don't mind if we decide to keep the branches around.
msg9121 (view)	Author: silvan	Date: 2019-12-19.16:38:07
I would vote for removing the ipc-2011-fixes branch because I don't think we necessarily need to have IPC planners as part of the official repository.
msg9120 (view)	Author: patfer	Date: 2019-12-19.16:32:39
After-sprint summary: We have a version of the cleanup script which
- removes large chunks of files we do not want anymore
- corrects authors in commits
- fixes typos in branch names and merges branches issue133 and issue133new
If someone wants to get rid of or clean up some of the remaining branches which are not in line with our naming convention, please speak up. If someone wants something else fixed, please speak up.
msg9096 (view)	Author: patfer	Date: 2019-12-12.15:21:03
Thank you, Malte, for the clarification. Regarding the typos in branch names: I will fix them. That is easy to do.
msg9095 (view)	Author: malte	Date: 2019-12-12.15:12:02
> The script does not rework on typos (Malte wanted to correct some messages).
No, we didn't want to change commit messages. I am confused where this idea came up. I think this makes no sense, there will always be errors in such messages. What we discussed to clean up are:
- files we don't want to have any more; this includes unnecessary massive renames like the one that is now the largest commit by far
- the author names that you mention
- the misspelled branch names
The clearly misspelled branch names are issu139, issue-114, issue-149, issue-289. I am still in favour of fixing these. Besides these, there are a few other branch names that don't follow any special convention (emil-new-integration, hcea-cleanup, issue133new, issue329test, raz-ipc-integration), and a few that follow a largely non-existing convention (ijcai-2011, ipc-2011-fixes). They don't bother me, but I mention them in case they bother someone else. You can see the list of all branch names like this:
$ hg branches --closed | sort | less
and filter out the ones that follow our issueXYZ convention like this:
$ hg branches --closed | grep -vE '^issue[1-9][0-9]{0,2} ' | sort
Not all of these are wrong branch names. For example, "default" and "release-19.06" are fine.
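The grep filter above can also be expressed in Python, which may be handy inside a conversion script. This is a sketch under the assumption that the branch names are already available as plain strings (without the revision column that "hg branches" prints); the function name is made up.

```python
import re

# Same convention as the grep pattern: "issue" followed by a number from 1 to 999.
ISSUE_BRANCH = re.compile(r"issue[1-9][0-9]{0,2}")


def nonconforming(branches):
    """Return the branch names that do not follow the issueXYZ convention, sorted."""
    return sorted(name for name in branches if not ISSUE_BRANCH.fullmatch(name))


names = ["issue139", "issu139", "issue-114", "issue133new", "default", "release-19.06", "issue5"]
print(nonconforming(names))
```

As with the grep version, the result is a list of candidates to look at, not a list of errors: names such as "default" show up too. The pattern also only accepts issue numbers with up to three digits.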
msg9094 (view)	Author: patfer	Date: 2019-12-12.14:04:20
Here is a PR and yes we can do comments. https://github.com/aibasel/convert-downward/pull/1/files.
msg9093 (view)	Author: silvan	Date: 2019-12-12.14:00:37
I would like to leave a few comments on github; do you know how I can do so? Do we need a pull-request first? Other than that, I think the script does what it should do. I don't see added value in correcting typos even if this could be done easily.
msg9092 (view)	Author: patfer	Date: 2019-12-12.12:54:56
A first script for rewriting history does the following:
1. It fixes the author names.
2. It removes files which should never have been added or which are large and unused for too long.
The script does not rework on typos (Malte wanted to correct some messages). For this, we would need to write our own Mercurial extension, which I started on my machine, but I immediately ran into issues (an error about the Mercurial version). I think we should not do this. Everyone who has Mercurial can now simply run the script. If we write our own extension, everyone has to add this to his/her configuration and might need to fix version problems. The script can be inspected at: https://github.com/aibasel/convert-downward/tree/2019-12-Cleanup
The repo size decreases from 103 MB to 28 MB. The most space is now used up by the history of:
1.4 MB Merge and shrink
1 MB Cegar
And the most space-consuming commits are:
rev 24: 2400 Renamed 'downward' directory to 'src'.
rev 5844: 344 weave gtest into codebase
rev 5469: 116 Move small heuristics to subdirectory
rev 785: 96 added first testcases
rev 347: 92 Merged merge-and-shrink implementation by Raz
rev 754: 92 Moved merge-and-shrink stuff to a subdirectory.
rev 4846: 88 Move driver out of src
rev 5386: 88 Move files according to new class names.
rev 0: 80 moved everything to trunk
msg5326 (view)	Author: malte	Date: 2016-05-10.16:12:09
I think keeping a read-only clone of the old repository is a good idea, and it only takes a few minutes to set that up. (I suggest we host it on bitbucket.) Let's do that at the time that we perform the transition.
msg5325 (view)	Author: atorralba	Date: 2016-05-10.16:10:08
A relevant question is how to keep our repositories synchronized with the main FD repository after the history has been removed. As suggested by Malte, it would be useful to have a script that removes the history from a repository to make it compatible with the new version. Also, does it make sense to keep a copy of the old repository (just before removing the history) so that people can update to that one before removing the history?
msg5301 (view)	Author: malte	Date: 2016-05-03.14:08:10
Gabi suggested removing the preprocessor before we rewrite the history. (This might get done this week.) I've added her to the nosy list.
msg5300 (view)	Author: jendrik	Date: 2016-05-03.13:53:06
Sure, here they are (I'll attach the mentioned scripts to this issue): =================================================================== Hi everyone, we've discussed the idea of separating the benchmarks from the rest of the Fast Downward repository a few times. One advantage of this would be that we could make operations like cloning the repository faster (especially over the network) and reduce space usage significantly. This might also speed up certain operations where repository operations happen under the hood (e.g. lab experiments, buildbot stuff). I suggest that we defer the discussion of whether we want to perform such a separation until a time when we're all back from holidays and don't have so many urgent deadlines, but since I have a bit of time right now, I already gathered some data. Some basic size information: $ hg clone ~/downward/master/ master-clone -U && du -sk master-clone 71076 master-clone => This is the pure repository size in KiB without a working directory: around 70 MiB. $ hg update -R master-clone && du -sk master-clone 4636 files updated, 0 files merged, 0 files removed, 0 files unresolved 358416 master-clone => This is repository size with a working directory: around 351 MiB. $ (cd master-clone/src && ./build_all -j4) && du -sk master-clone [...] 736948 master-clone => This is repository size with a working directory after the code has been compiled (release mode only): around 720 MiB. So, in summary: - repository only, no working directory: 70 MiB - with a clean working directory: 351 MiB - after compiling for release mode: 720 MiB (This is without the USE_LP option.) I think this is already some good news regarding the actual repository size: the complete repository with several thousand revisions in it takes only 70 MiB, while a checkout of a single revision adds roughly 281 MiB. This shows that Mercurial compresses the revisions well. 
Of the 281 MiB added by the working directory, the main culprits are: 263M benchmarks 15M src and drilling a bit more deeply into src, we have: 1.3M src/search/ext/boost 1.8M src/VAL 8.1M src/search/lp so the bulk of the 15 MiB is due to our external dependencies. So benchmarks make up more than 90% of the working directory, and external dependencies make up more than half of what remains. It's also worth looking why space usage doubles after compilation. Our compilation artifacts use up around 370 MiB of memory. This is split as follows: 184M src/search/.obj 52M src/VAL/.o 30M src/search/downward-1 30M src/search/downward-2 30M src/search/downward-4 19M src/VAL/validate 19M src/validate 4.7M src/preprocess/.obj 3.7M src/preprocess/preprocess 444K src/search/Makefile.depend 88K src/VAL/lex.yy.cc 4K src/preprocess/Makefile.depend (We might add another 228K for the translator's bytecode. While build_all doesn't byte-compile, these will be present if we run the planner.) In cases where we are just interested in the output of compilation, but not in the intermediate build artifacts, such as in lab's revision cache, the .obj directories and everything in the VAL subdirectory is a dead weight. By running "make clean" in the preprocess and search directories and "make distclean" in the VAL directory, we can eliminate roughly 260 MiB, so more than two thirds, of the space usage. This would leave only: 30M src/search/downward-1 30M src/search/downward-2 30M src/search/downward-4 19M src/validate 3.7M src/preprocess/preprocess If we don't plan to run a debugger, we can again reduce these much further by stripping the executables. $ strip src/search/downward-{1,2,4} src/validate src/preprocess/preprocess After which we only have: 3.0M src/validate 2.7M src/search/downward-1 2.7M src/search/downward-2 2.7M src/search/downward-4 1.2M src/preprocess/preprocess So another reduction by almost 90%. 
If I understand the various debug facilities correctly, stripping would have no negative effect unless we start using something like gdb, so I think this is something else we ought to be doing in lab experiments (or, alternatively, change the compile flags in this case so that we get stripped executables from the start). I also had a look at what happens to executable sizes if we link dynamically instead of statically. We get the following file sizes before stripping: 29M src/search/downward-1 29M src/search/downward-2 29M src/search/downward-4 18M src/validate 2.5M src/preprocess/preprocess and these file sizes after stripping: 1.7M src/validate 1.6M src/search/downward-1 1.6M src/search/downward-2 1.6M src/search/downward-4 176K src/preprocess/preprocess So the difference percentage-wise is small if we don't strip the executables, but quite substantial (close to another 50% reduction) if we do. Cheers, Malte =================================================================== On 12.03.2014 16:46, Florian Pommerening wrote: > Hi Malte, > > I just wanted to add that the size for build artifacts and executables > will be reduced to roughly a third once we merge issue214. > > Cheers > Florian Yes, I had that in the back of my mind and am looking forward to it. :-) We currently have something like: 370 MiB total build artifacts 112 MiB after "make clean" for our code; "make distclean" for VAL (keeping the copy of the VAL executable in ~/src) 11 MiB after stripping After issue214, we should have roughly: 147 MiB total build artifacts 53 MiB after "make clean" for our code; "make distclean" for VAL 7 MiB after stripping Cheers, Malte =================================================================== Hi again, the previous message was more of a detour from the original plan. 
What I actually wanted to do is measure which things contributed to the actual repository size (not the working directory), since this is what limits the speed of clone operations over the network etc. So this email is only about the 70 MiB pure repository size. I wrote a script (attached) which measures which commits to the repository added how much to the overall size. The following list shows which changesets contributed 100K or more to the overall repository size. The middle column shows how much the repository grew with the given changeset (in KiB); the third column shows the start of the log message for that changeset. rev 1: 14480 moved everything to trunk rev 653: 10168 Renamed 'downward' directory to 'src'. rev 392: 7192 Added lp solver and change Makefile to do static l... rev 2774: 5560 add ipc 2011 domains rev 106: 4148 trunk benchmarks: rev 652: 3800 Added IPC 2008 domains (apart from the humungous c... rev 103: 2380 Added openstacks-strips to trunk benchmarks. rev 1472: 1176 Added Michael's implicit abstractions code. rev 18: 1020 benchmarks: added propositional IPC5 domains (rove... rev 2230: 1008 created experiments directory and added first expe... rev 1566: 816 added boost dependencies rev 2391: 668 added ga-experiments with cost partitioning rev 2393: 664 added ga-experiments without relevant vars detection rev 55: 624 Renamed "val" directory to "VAL". Firstly, because... rev 51: 620 Added val source distribution to repository. rev 1447: 616 Benchmarks from the 2009 ASP competition rev 2327: 596 Removed some old experiemnts, added new ones. rev 2370: 552 Added results of gapbd experiments. rev 19: 508 benchmarks: added airport-adl rev 2368: 396 added pdb experiments v4 and v4x rev 98: 292 pathways-noneg: rev 122: 288 Moved scripts for merge-and-shrink, hcea and lmcut... rev 2361: 284 Added results of two ipdb experiments. rev 20: 280 benchmarks: added decoded variants of "mystery" an... 
rev 2483: 264 added gapdb-experiment for compairison with ipdb-r...
rev 118: 248 Recreated trunk from everything.
rev 2243: 224 experiments
rev 2348: 216 Added pdb experiment version 3, MatchTree is worki...
rev 2283: 200 experiment pdb v2 results added
rev 2718: 180 Hopefully fixed issue295.
rev 2293: 176 added experiment pdb v1 withouth search
rev 2337: 172 replaced old experiment by new one
rev 2359: 172 added pdb experiment version 3, with the introduct...
rev 2462: 164 added gapdb-comparison experiments
rev 2282: 148 Added experiment v5 und v6.
rev 2235: 144 Added experiments.
rev 204: 132 Added selective-max.
rev 2339: 132 Added results of an experiment.
rev 2388: 132 new hhh-experiments (for comparison with old hsp_f...
rev 2474: 132 added more gapdb configuration test experiments
rev 1449: 124 Recommit of ASP's Airport and Sokoban
rev 2087: 124 started
rev 2319: 124 Added new experiment results.
rev 2375: 120 Added results of HHH_ipdb experiments.
rev 2456: 120 Added results of pdbs experiment.
rev 2489: 120 Added results of ipdb experiment.
rev 2445: 116 Added results of pdbs experiment.
rev 743: 112 VAL: Updated to most recent version (4.2.07 plus some
rev 2463: 112 Added results of hhh experiment.
rev 2488: 112 Added results of pdbs experiment.
rev 3613: 108 Removed scripts made redundant by updated buildbot...
rev 2112: 100 added first testcases

The first commit was the largest one and is hard to interpret because it included a mixture of code, benchmarks, etc. I hope that it would have been much smaller without the benchmarks.

The second-largest one is surprising to me since I would have thought that Mercurial would handle such a renaming more gracefully. Maybe it has something to do with the fact that we originally renamed in Subversion, and maybe there is something that can be done about this space usage.

Number 3 is about the LP solver, which is a large chunk of code and hence not very surprising.

The remaining ones are mostly about adding benchmarks and some things that probably shouldn't be part of our history. Michael's implicit-abstractions code was never merged, but apparently uses a large chunk of space. Presumably we added it at some point and then deleted it again.

Also, there are many commits about experiments related to the iPDB implementation. When we merged iPDB, in retrospect we should have been careful only to include the code, but not all the experiment data. (Note that it's the *data* for the experiments that uses so much space here. We now add the experiments scripts, but that's orders of magnitude smaller.)

So long story short, there seems to be scope for cleanup here if we ever want to go to that trouble.

Cheers,
Malte

===================================================================

On 12.03.2014 17:04, Malte Helmert wrote:
> The second-largest one is surprising to me since I would have thought
> that Mercurial would handle such a renaming more gracefully. Maybe it
> has something to do with the fact that we originally renamed in
> Subversion, and maybe there is something that can be done about this
> space usage.

OK, I investigated this a bit more closely by performing some experiments and learning a bit about Mercurial's repository layout.

The size increase due to renaming is indeed by design: if a file changes its name, hg will store it twice in the repository. The reason we added 10 MiB (around 1/7 of the overall repository size!) with this single renaming from "downward" to "src" is because the directory contains some large files (such as the LP solver tarball) that end up being stored twice.

One nice thing about Mercurial's repository layout is that it's quite easy to see how much space is taken up by what by simply running "baobab .hg" and exploring. This also shows how much garbage the repository has accumulated over the years. :-)

I think if we removed the benchmarks and external dependencies and cleaned up some cruft (scripts, new-scripts and code that ended up in the repository by historical accidents, such as experimental results and the implicit-abstraction code), we should be able to cut things down from 70 MiB to 7-8 MiB. I guess it's a discussion for another time whether this would be a good or bad idea.

Cheers,
Malte

===================================================================
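For readers without a graphical tool like baobab, the same kind of exploration can be approximated with a small script. The following is only a rough sketch, not anything taken from the attached scripts: it assumes that Mercurial keeps its per-file history under ".hg/store/data" in a directory tree that roughly mirrors the tracked paths (with some name encoding), and the function names usage_by_top_level and format_report are made up for this illustration. If that assumption holds, a renamed directory such as "downward" -> "src" should show up as two separate top-level entries, each with its own share of the space.

```python
import os
from collections import defaultdict


def usage_by_top_level(root):
    """Sum file sizes in bytes under root, grouped by first path component.

    Files that sit directly in root are grouped under ".".
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks, count only regular files
            totals[top] += os.path.getsize(path)
    return dict(totals)


def format_report(totals):
    """Return report lines, largest entry first, sizes in MiB."""
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{size / 2**20:8.2f} MiB  {top}" for top, size in ordered]
```

To try it on a clone, one could call print("\n".join(format_report(usage_by_top_level(".hg/store/data")))) from the repository root. The numbers are plain on-disk file sizes, so they only indicate where the space goes and are not an exact per-revision accounting.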
msg5297 (view)	Author: malte	Date: 2016-05-03.13:43:35
I wrote some emails about this long ago. They included discussion of some nonobvious space wasters, I think. Do you still have these emails? If yes, can you paste them here? If not, I can also try to find them.
msg5296 (view)	Author: jendrik	Date: 2016-05-03.13:42:03
Now that the benchmarks are gone from the repo (issue581) and VAL is soon to be removed (issue651), we can think about making the repository smaller by rewriting its history. This should make the repository about 10x smaller and thus speed up cloning.

History
Date	User	Action	Args
2020-08-06 14:22:50	patfer	set	status: reviewing -> resolved
2020-08-06 13:45:56	malte	set	messages: + msg9711
2020-08-06 13:22:08	patfer	set	messages: + msg9710
2020-08-06 13:16:57	florian	set	messages: + msg9709
2020-08-06 13:06:06	malte	set	messages: + msg9708
2020-07-29 09:40:59	malte	set	messages: + msg9697
2020-07-29 09:27:36	patfer	set	messages: + msg9696
2020-07-28 19:37:47	malte	set	messages: + msg9687
2020-07-28 19:11:55	malte	set	messages: + msg9686
2020-07-28 15:46:34	patfer	set	messages: + msg9684
2020-07-28 14:20:29	florian	set	messages: + msg9683
2020-07-27 22:07:05	silvan	set	messages: + msg9682
2020-07-27 18:57:57	malte	set	messages: + msg9680
2020-07-27 17:23:18	patfer	set	messages: + msg9679
2020-07-27 17:22:05	malte	set	messages: + msg9678
2020-07-23 12:29:10	patfer	set	messages: + msg9669
2020-07-20 23:11:16	florian	set	messages: + msg9661
2020-07-20 15:08:32	malte	set	messages: + msg9660
2020-07-20 13:32:04	malte	set	messages: + msg9659
2020-07-20 10:37:57	patfer	set	messages: + msg9657
2020-07-18 21:49:11	florian	set	messages: + msg9656
2020-07-18 15:57:39	malte	set	messages: + msg9655
2020-07-18 15:51:55	florian	set	messages: + msg9654
2020-07-18 15:35:11	malte	set	messages: + msg9653
2020-07-18 15:30:24	malte	set	messages: + msg9652
2020-07-17 23:10:52	florian	set	messages: + msg9651
2020-07-17 13:58:30	patfer	set	messages: + msg9646
2020-07-16 17:07:25	malte	set	status: resolved -> reviewing messages: + msg9640
2020-07-16 13:27:58	patfer	set	messages: + msg9639
2020-07-16 12:39:05	malte	set	messages: + msg9635
2020-07-09 14:52:50	malte	set	messages: + msg9563
2020-07-09 11:21:32	patfer	set	status: in-progress -> resolved messages: + msg9547
2020-07-08 12:45:05	malte	set	messages: + msg9536
2020-07-08 10:52:58	silvan	set	messages: + msg9534
2020-07-08 10:50:26	malte	set	messages: + msg9533
2020-07-08 10:42:13	silvan	set	messages: + msg9532
2020-07-07 22:50:18	malte	set	messages: + msg9531
2020-07-07 17:18:24	malte	set	messages: + msg9530
2020-07-07 12:15:21	malte	set	messages: + msg9519
2020-07-07 05:00:21	florian	set	messages: + msg9511
2020-07-07 01:51:14	malte	set	messages: + msg9510
2020-07-07 00:36:09	malte	set	messages: + msg9509
2020-07-06 13:37:16	patfer	set	messages: + msg9479
2020-07-06 12:49:16	patfer	set	messages: + msg9478
2020-07-03 19:39:06	malte	set	messages: + msg9472
2020-07-03 19:33:07	jendrik	set	messages: + msg9471
2020-07-03 19:32:50	florian	set	messages: + msg9470
2020-07-03 19:29:20	florian	set	messages: + msg9469
2020-07-03 19:20:39	malte	set	messages: + msg9468
2020-07-03 19:11:29	malte	set	messages: + msg9467
2020-07-03 19:00:16	florian	set	messages: + msg9466
2020-07-03 18:29:35	malte	set	messages: + msg9465
2020-07-03 18:23:11	malte	set	messages: + msg9464
2020-07-03 18:14:30	florian	set	messages: + msg9463
2020-07-03 18:09:06	malte	set	messages: + msg9462
2020-07-03 18:07:42	florian	set	messages: + msg9461
2020-07-03 18:03:49	malte	set	messages: + msg9460
2020-07-03 17:57:53	malte	set	messages: + msg9459
2020-07-03 17:41:02	florian	set	messages: + msg9458
2020-07-03 17:19:13	malte	set	messages: + msg9456
2020-07-03 17:11:25	florian	set	messages: + msg9455
2020-07-03 17:09:26	malte	set	messages: + msg9454
2020-07-03 17:04:08	malte	set	messages: + msg9453
2020-07-03 16:59:43	florian	set	messages: + msg9452
2020-07-03 16:48:21	malte	set	messages: + msg9451
2020-07-03 16:41:29	florian	set	messages: + msg9450
2020-07-03 16:40:46	malte	set	messages: + msg9449
2020-07-03 16:35:01	florian	set	messages: + msg9448
2020-07-03 16:34:36	malte	set	messages: + msg9447
2020-07-03 16:21:55	malte	set	messages: + msg9446
2020-07-03 16:21:13	malte	set	messages: + msg9445
2020-07-03 16:18:24	florian	set	messages: + msg9444
2020-07-03 11:41:50	patfer	set	messages: + msg9430
2020-07-01 20:26:21	malte	set	messages: + msg9391
2020-07-01 19:51:16	jendrik	set	messages: + msg9389
2020-07-01 19:44:56	malte	set	messages: + msg9388
2020-07-01 09:39:48	florian	set	messages: + msg9366
2020-06-30 19:36:31	malte	set	messages: + msg9365
2020-06-30 18:13:54	jendrik	set	messages: + msg9363
2020-06-30 17:03:34	florian	set	messages: + msg9361
2020-01-10 16:14:52	patfer	set	messages: + msg9143
2020-01-08 22:53:22	jendrik	set	messages: + msg9142
2020-01-08 22:49:35	silvan	set	messages: + msg9141
2020-01-08 19:17:54	malte	set	messages: + msg9140
2020-01-07 19:30:29	jendrik	set	messages: + msg9139
2019-12-24 14:06:44	malte	set	messages: + msg9135
2019-12-20 10:19:31	patfer	set	files: + repo_stats_20_12_20.txt messages: + msg9130
2019-12-19 23:57:15	jendrik	set	messages: + msg9129
2019-12-19 16:41:13	florian	set	messages: + msg9123
2019-12-19 16:38:07	silvan	set	messages: + msg9121
2019-12-19 16:32:39	patfer	set	messages: + msg9120
2019-12-12 15:21:03	patfer	set	messages: + msg9096
2019-12-12 15:12:02	malte	set	messages: + msg9095
2019-12-12 14:04:20	patfer	set	messages: + msg9094
2019-12-12 14:00:37	silvan	set	messages: + msg9093
2019-12-12 12:54:56	patfer	set	messages: + msg9092
2019-12-11 11:12:20	patfer	set	status: chatting -> in-progress nosy: + patfer assignedto: patfer
2016-05-10 16:12:09	malte	set	messages: + msg5326
2016-05-10 16:10:08	atorralba	set	nosy: + atorralba messages: + msg5325
2016-05-03 14:08:10	malte	set	nosy: + gabi messages: + msg5301
2016-05-03 13:58:59	silvan	set	nosy: + silvan
2016-05-03 13:54:17	jendrik	set	files: + README
2016-05-03 13:54:09	jendrik	set	files: + generate_repo_stats.sh
2016-05-03 13:53:59	jendrik	set	files: + analyze_repo_stats.py
2016-05-03 13:53:06	jendrik	set	messages: + msg5300
2016-05-03 13:50:47	florian	set	nosy: + florian
2016-05-03 13:43:35	malte	set	status: unread -> chatting messages: + msg5297
2016-05-03 13:42:03	jendrik	create
