Issue652

Title Make repository smaller by rewriting its history
Priority wish Status resolved
Superseder Nosy List atorralba, florian, gabi, jendrik, malte, patfer, silvan
Assigned To patfer Keywords
Optional summary

Created on 2016-05-03.13:42:03 by jendrik, last changed by patfer.

Files
File name Uploaded Type
README jendrik, 2016-05-03.13:54:17 application/octet-stream
analyze_repo_stats.py jendrik, 2016-05-03.13:53:59 text/x-python
generate_repo_stats.sh jendrik, 2016-05-03.13:54:09 application/x-shellscript
repo_stats_20_12_20.txt patfer, 2019-12-20.10:19:30 text/plain
Messages
msg9711 (view) Author: malte Date: 2020-08-06.13:45:56
Thanks! As a final step, can we also create a tag, for example 1.1? Otherwise it is very inviting to download the only tag (1.0), and that version doesn't work. Once this is done, I would suggest marking this issue as resolved.
msg9710 (view) Author: patfer Date: 2020-08-06.13:22:08
The issue5 branch has been merged into the main branch.
All branches except the main branch have been deleted (empty, pull-request-target, 2019-12-Convert2Git, 2019-12-Cleanup, issue5).
msg9709 (view) Author: florian Date: 2020-08-06.13:16:57
> Florian has approved the branch on github, though I don't know if this approval 
> is current.
Yes, nothing changed after my approval.
msg9708 (view) Author: malte Date: 2020-08-06.13:06:06
Hi Patrick, are you OK to merge the "issue5" branch of the convert repository now and tag a new release? I don't think the current master version (without the reordering of commits) really works, it would be good to fix this ASAP.

Florian has approved the branch on github, though I don't know if this approval is current.

BTW, to keep things tidy I would also recommend deleting the following branches on github: empty, pull-request-target, 2019-12-Convert2Git, 2019-12-Cleanup.
msg9697 (view) Author: malte Date: 2020-07-29.09:40:59
Yes, I can see them now.
msg9696 (view) Author: patfer Date: 2020-07-29.09:27:36
For me, the changes show up in the pull request (both in the commit on the branch and in the file diff view).
Can you see the changes now?
msg9687 (view) Author: malte Date: 2020-07-28.19:37:47
The version with my changes works in my tests.

I tested it on one repository, and the history was compatible both after Mercurial clean-up and after git conversion. I've also tested the two-step and single-step way of conversion as well as the redirect option, and all resulting repositories look good.

I've pushed the change to the convert-downward repository, and the commit shows up in the commit list, but not in the pull request. Perhaps the repository owner needs to do something for this?
msg9686 (view) Author: malte Date: 2020-07-28.19:11:55
My understanding from the discussion between Patrick, Silvan, Jendrik and me was that Silvan, Jendrik and I thought that the reordering should be an implementation detail, so I agree with Florian's comment. I also agree with the other things he mentioned on the pull request. I've worked on a small revision that hopefully addresses these points and am currently testing it. If it works, I'll push it to github.
msg9684 (view) Author: patfer Date: 2020-07-28.15:46:34
The current signature of our run-cleanup.sh script is:
./run-cleanup.sh MERCURIAL_REPOSITORY ORDERED_REPOSITORY CLEANED_MERCURIAL_REPOSITORY

where ORDERED_REPOSITORY is created by the script and used internally to order the commits.

I added it to the signature (when merging run-order.sh and run-cleanup.sh) so that the user has full control over where it is stored and can inspect it (it won't be deleted), especially if something goes wrong.

Since we now enforce that the user's repository already contains all commits from the latest Mercurial master, I do not know what could go wrong in the ordering.

One of Florian's suggestions was to make this a temporary directory for the 'run-cleanup.sh' script (and remove it from the signature). I also dislike the current signature, but I like having every intermediate directory available to investigate if something is faulty.

What are your opinions?
msg9683 (view) Author: florian Date: 2020-07-28.14:20:28
I left a few comments on GitHub. One thing I find a bit strange is the way the two steps (ordering and cleanup) were merged into one. The script for this merged step now takes two parameters for the two intermediate results. This makes it still feel like two steps instead of a single step.

For me this was the reason to argue for separate steps: each step would have one version of the repository as its output. In the current state, the first step generates two versions of the repository. I like this less than the previous version but from msg9669 it sounds like you already discussed this. If so, I'm fine with leaving things as they are.
msg9682 (view) Author: silvan Date: 2020-07-27.22:07:05
I tested the version from the branch for converting a repository and found no problems.
msg9680 (view) Author: malte Date: 2020-07-27.18:57:57
> Has someone done a review for the modification: "merge run-order.sh
> and run-cleanup.sh"
>
> Can I merge this into the main branch and tag version 1.1?

I think Florian said he'd have a look.
msg9679 (view) Author: patfer Date: 2020-07-27.17:23:18
I think all desired changes are incorporated into the pull request, but I do not know what the state of the reviews is.
Has someone done a review for the modification: "merge run-order.sh and run-cleanup.sh"
Can I merge this into the main branch and tag version 1.1?
msg9678 (view) Author: malte Date: 2020-07-27.17:22:05
With issue950 marked as resolved, I think this one should also be marked as resolved. I haven't looked at the code again and haven't tested the script, but given that nobody complained after the last round of revisions, I assume everybody is happy with the current version. If not, please speak up.

One final request from me: can we tag the current code as version 1.1 of convert-downward on github? The only tagged version is 1.0, so people might be tempted to use this. But I think we've found that it doesn't work reliably enough.
msg9669 (view) Author: patfer Date: 2020-07-23.12:29:10
We decided today that the steps 'ordering of the commits' and 'cleaning up the repository' will be merged into one step.
msg9661 (view) Author: florian Date: 2020-07-20.23:11:16
Fine with me. I don't think it's such a big deal either way.
msg9660 (view) Author: malte Date: 2020-07-20.15:08:31
We had a brief discussion over Zoom and decided to only support converting repositories that include all commits from hg.fast-downward.org. This will be checked at the start of the script to avoid long waiting times before an error is reported.

@Florian: if you would like this to be more general, also supporting repositories that don't have all commits, we can always develop the script further after our initial announcement and make further script releases at a later time. In our discussion, Jendrik, Silvan and I were against supporting this.
msg9659 (view) Author: malte Date: 2020-07-20.13:32:04
Florian is on holiday, and it doesn't look like we agree on how to proceed. Can we perhaps schedule a Zoom call today between those who are available and have participated in the discussion so far (Silvan, Jendrik, Patrick, Malte)? Of course others are also welcome to join. I'll write a message on Discord to schedule the call.
msg9657 (view) Author: patfer Date: 2020-07-20.10:37:56
If I understand this correctly, you concluded that no changes are needed (the multiple-heads warning is already in the readme; for missing branches, the readme says to pull from hg.fast-downward.org).
I added the test suggested by Florian for the issue323 and ipc-2011-fixes branches.
@Florian: you said issue323 is not required? Does this mean we could remove the check for it and skip the strip if issue323 is not present, or shall we still enforce that issue323 is present?
msg9656 (view) Author: florian Date: 2020-07-18.21:49:10
Ok, I understand that view and I think the readme already covers it by saying that if you run into problems you should pull from the official repository before conversion. I wasn't suggesting to add any functionality beyond that, just to add a check at the beginning that the required branches are there and to exit if they are not. So instead of running for 5 minutes and producing an incompatible repository we could immediately tell the user "this is not going to work". This would save people with repositories like mine some work. It's not that important, though, so I'm fine with not having the check there.
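
A minimal sketch of such an up-front check, written in Python for illustration only (the actual scripts are shell scripts). The function name is hypothetical, and how the branch names are collected is an assumption; they could come from something like `hg branches --closed --template "{branch}\n"`.

```python
# Branches the cleanup step expects to find, according to this discussion.
REQUIRED_BRANCHES = ("issue323", "ipc-2011-fixes")


def missing_branches(branch_names, required=REQUIRED_BRANCHES):
    """Return the required branches that are absent from branch_names."""
    present = set(branch_names)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Hypothetical repository cloned with "-r default": only one branch exists.
    missing = missing_branches(["default"])
    if missing:
        print("Cannot convert, missing branches: " + ", ".join(missing))
```

Running the check before any other work lets the script stop within seconds instead of producing an incompatible repository after several minutes.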
msg9655 (view) Author: malte Date: 2020-07-18.15:57:39
> About the first issue: I don't think this has to do with whether the repository is
> behind our final hg version. The situation I mean can be created by
>
> hg clone hg.fast-downward.org -r default

I would view such a repository as behind our repository. It lacks commits from our repository. Basically, if

$ hg incoming http://hg.fast-downward.org

lists something, you're behind in the sense I meant it. In this case, you're missing 16 commits from the branches you mention as well as from the two release branches.
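
The "behind" test described here could be automated roughly as follows. This is only a sketch: the function name and the injectable `run` parameter are illustrative assumptions, not part of the conversion scripts. It relies on `hg incoming` exiting with 0 when there are incoming changes and 1 when there are none.

```python
import subprocess


def is_behind(repo, source="http://hg.fast-downward.org", run=subprocess.run):
    """Return True if source contains commits that repo lacks."""
    result = run(
        ["hg", "incoming", "--quiet", "-R", repo, source],
        stdout=subprocess.DEVNULL,
    )
    # "hg incoming" exits with 0 if there are incoming changes, 1 if none.
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"hg incoming failed with exit code {result.returncode}")
```

Passing a different `run` callable makes it possible to exercise the logic without a Mercurial installation or network access.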
msg9654 (view) Author: florian Date: 2020-07-18.15:51:55
About the first issue: I don't think this has to do with whether the repository is behind our final hg version. The situation I mean can be created by 

hg clone hg.fast-downward.org -r default

This would give you a perfectly up-to-date repository but the conversion would fail because it relies on the branches issue323 and ipc-2011-fixes being there. They are open branches in the official repository that we delete after the conversion but the hashes are different if those branches are present during the conversion (well, the ipc branch at least, the other one doesn't matter).

Thanks for the advice on the other case. I'll give renaming the branch a try. I will try it with an extension to hg convert first and see if I can fix this on the Mercurial side. I already wrote an extension for hg convert, so I know where to look there and don't have to read up on the fast-export plugins. But it seems there are a lot of options if this fails.
msg9653 (view) Author: malte Date: 2020-07-18.15:35:11
> (as in 1.).

I meant "as in 2.".
msg9652 (view) Author: malte Date: 2020-07-18.15:30:24
> * If the two branches (issue323, ipc-2011-fixes) that will be deleted are not in the
> repository, the command to delete them will fail. For issue323, we could delete the branch
> only if it is present but ipc-2011-fixes has to be there, otherwise the converted
> repository will be incompatible. I suggest we check at the start of the cleanup step if
> both branches exist and exit with a suitable explanation if they do not. I wouldn't
> automatically pull them in the source repository because so far we do not modify the
> source of the conversion and I think that is a good property to have. We could pull them
> in an additional intermediate step or exclude them from the strip but I don't think it's
> worth the effort.

As I said before, I think it's a bad idea to try to support converting repositories that are behind our final hg repository. If you (and Patrick, I think?) think it's important, I don't have a preference regarding what exactly it does. So in that sense the suggestion works for me.

> In my case this was the repository of Metis where the code builds on some implementation of
> strong stubborn sets which was started on the default branch and only later on used its own
> branch. I actually do not know how to fix this. Is there a way to rename a branch for only
> some commits? 

There is no particularly easy way to actually change the branch of these commits because you'd have to rewrite history for all descendants of the changed commits.


Three options:

1. You could use a filter in fast-export to change the branch name of the relevant commits before the conversion tool sees it. It doesn't look very complicated (see "Plugins" on https://github.com/frej/fast-export).


2. Before conversion, you could merge the additional head of default into the head of default you like, keeping all code of the head you like. If the head you like is the first parent, you can use the merge-tool ":local" to keep the version you like for all modified files (analogous to git's "ours" merge strategy). Something like (only tested a little):

$ hg update the-head-i-want
$ hg merge the-head-i-dont-want --tool :local

This will still add the files that only exist in the other head to the merge commit, so you will have to remove them with "hg rm --force" before committing. (Similarly for files that are deleted without conflict in the other head, but that's perhaps less likely.) Use "hg diff -r the-head-i-want" to make sure you have an empty diff before committing the merge.


3. There must be *some* cases of multiple heads on the same branch that don't bother fast-export because our cleaned up hg.fast-downward.org repository actually *has* multiple heads on several branches, and these convert without issue:

[on the cleaned-up repository]
$ hg heads --closed | grep ^branch | sort | uniq -cd | sort -n
      2 branch:      issue104
      2 branch:      issue114
      2 branch:      issue120
      2 branch:      issue139
      2 branch:      issue170
      2 branch:      issue172
      2 branch:      issue181
      2 branch:      issue203

These branches are all closed, and they are not topological heads. Not sure which of these two properties fast-export cares about; perhaps both. If your additional head is not closed, just update to it and close it with "hg commit --close-branch". Unfortunately this definitely creates a new topological head, so if fast-export doesn't like additional topological heads even when they are closed, you'll have to merge it into something (as in 1.).
msg9651 (view) Author: florian Date: 2020-07-17.23:10:52
I tested the code on 17 of my repositories and found two more issues:

* If the two branches (issue323, ipc-2011-fixes) that will be deleted are not in the repository, the command to delete them will fail. For issue323, we could delete the branch only if it is present but ipc-2011-fixes has to be there, otherwise the converted repository will be incompatible. I suggest we check at the start of the cleanup step if both branches exist and exit with a suitable explanation if they do not. I wouldn't automatically pull them in the source repository because so far we do not modify the source of the conversion and I think that is a good property to have. We could pull them in an additional intermediate step or exclude them from the strip but I don't think it's worth the effort.

* The problem that cost way more time to track down was that two of my repositories have an additional head on default. The ordering and cleanup worked as expected there but the git conversion does something with the additional head that I don't understand. Some issue branches were duplicated and could then not be removed because they were "not fully merged". In my case this was the repository of Metis where the code builds on some implementation of strong stubborn sets which was started on the default branch and only later on used its own branch. I actually do not know how to fix this. Is there a way to rename a branch for only some commits? 

The problematic thing about the second case is that the conversion script finishes fine and reports that the conversion was successful. The errors that happen on the way are the ones we tell people to ignore. If multiple heads on the same branch are not supported as the readme says, we should check for this and exit the script.
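
A rough sketch of what such a check could look like, in Python for illustration (the function name is hypothetical). It assumes the caller supplies one branch name per head, for example collected with `hg heads --closed --template "{branch}\n"`, and it reports every branch that has more than one head. Whether closed heads should count is left open, since the thread has not settled which heads fast-export objects to.

```python
from collections import Counter


def branches_with_multiple_heads(head_branches):
    """Map each branch with more than one head to its number of heads."""
    counts = Counter(head_branches)
    return {branch: n for branch, n in sorted(counts.items()) if n > 1}


if __name__ == "__main__":
    # Hypothetical input: two heads on default, one head on issue5.
    offenders = branches_with_multiple_heads(["default", "issue5", "default"])
    for branch, n in offenders.items():
        print(f"{branch}: {n} heads")
```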


Also, I mentioned this by email already but for the record:
Ceterum censeo refs/original esse delendam.
https://stackoverflow.com/a/7654880/892961
msg9646 (view) Author: patfer Date: 2020-07-17.13:58:30
The comments that Florian and Jendrik had made up to that point have been incorporated.
The code is ready for the next and hopefully last round of reviewing.
msg9640 (view) Author: malte Date: 2020-07-16.17:07:25
Any volunteers to review and/or test the code? We want to point to the conversion script when we announce the new release, so this should be looked at for the release.
msg9639 (view) Author: patfer Date: 2020-07-16.13:27:58
Indeed, we used the GitHub issue tracker (mostly to get an issue number :).
https://github.com/aibasel/convert-downward/issues/5

We found that the order of the changesets in the history matters. Thus, we need to ensure that the repository to convert has all commits in a valid order. The approach is:


1. clone the Mercurial master repository anew (do not allow the user to use a local version, because until recently the clone from hg.fast-downward.org was also in a wrong order)
2. strip away new commits from master repository (important if the user has made commits on the default branch)
3. pull the user repository in the stripped master repository

PR: https://github.com/aibasel/convert-downward/pull/6
msg9635 (view) Author: malte Date: 2020-07-16.12:39:05
We've detected some further things we would like to change in the clean-up (this issue) and conversion (issue950) scripts to ensure that the clean-up results in compatible history.

I see that Patrick is using the github issue tracker of convert-downward to track this, but I think not all of you receive notifications from this, so perhaps you want to have a look and hop over.

I'd be happier to keep things in one place, but that's not for me to decide. But we should always tell people when we're moving the discussion to a new place.
msg9563 (view) Author: malte Date: 2020-07-09.14:52:50
Thanks for all your work on this, Patrick, and also thanks to the others who contributed!
msg9547 (view) Author: patfer Date: 2020-07-09.11:21:32
I added a tag for version v1.0
msg9536 (view) Author: malte Date: 2020-07-08.12:45:05
Patrick has made the convert-downward repository public, and we have agreed to tag a release for the version we have used for the conversion. Patrick, can you create the tag?
msg9534 (view) Author: silvan Date: 2020-07-08.10:52:58
I fixed the address and I'm happy to leave the rest as it is.
msg9533 (view) Author: malte Date: 2020-07-08.10:50:26
Good catch, we should change Manuel's email address to the one without "stud".

Regarding which email addresses to use in general and more specifically for Manuela: I think it's not a problem to use a legacy email address that doesn't work any more for people that contributed in the past. For Emil, it's a bit different because we previously had no email address mentioned at all for him. The email address we have now added for him is also the one we would have used back when he made his commits.

I'm not strongly against contacting Manuela, but it would open a bit of a can of worms, as I think the same rationale would apply to Cedric, Manuel (even after removing "stud"), Martin and Moritz Gö. The used email address for Manuela reflects the situation when she contributed to Fast Downward, so from my perspective it's fine.
msg9532 (view) Author: silvan Date: 2020-07-08.10:42:13
The comparison of manifests looks good to me.

Yesterday, I left two comments regarding the authors which seem to have gotten lost: I saw that Manuel's email address is mapped from manuel.heusner@unibas.ch to manuel.heusner@stud.unibas.ch and found this surprising. Is this intended? For Manuela, we use ortlieb@informatik.uni-freiburg.de, and I'm not sure whether this is still a valid email address (of course, it doesn't need to be, but for others like Emil we contacted people and asked, so we could do the same here).
msg9531 (view) Author: malte Date: 2020-07-07.22:50:18
Patrick, Florian, Jendrik and I discussed this further on Discord and Zoom, and Florian and Jendrik have pushed some further commits afterwards. We now additionally exclude all the files from msg9530 except for results/preprocess/PROBLEMS (which I had in the "maybe" list) and the last list of files which I listed as candidates but recommended keeping.

Jendrik changed the handling of branches merged into other branches not merged into main.

I again compared "hg manifest --all" before and after cleanup in meld, and I think it now looks very nice. You can have a look with

    meld <(hg manifest --all -R master) <(hg manifest --all -R master-cleaned-up)

assuming the before/after cleanup repositories are "master" and "master-cleaned-up" in the current directory.
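
For a non-graphical variant of the same comparison, a small Python sketch is given below. The function name is hypothetical; it assumes the two manifests have already been read into lists of paths, for example from the output of `hg manifest --all` on each repository.

```python
def manifest_changes(before, after):
    """Return (removed, added) paths between two manifests, each sorted."""
    before_set, after_set = set(before), set(after)
    removed = sorted(before_set - after_set)
    added = sorted(after_set - before_set)
    return removed, added


if __name__ == "__main__":
    # Hypothetical manifests with one file removed by the cleanup.
    removed, added = manifest_changes(
        ["src/output.sas", "src/translate/translate.py"],
        ["src/translate/translate.py"],
    )
    print("removed:", removed)
    print("added:", added)
```

A cleanup that only deletes files should leave the "added" list empty, which makes an unexpected addition easy to spot.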

The list of users with commits in master-cleaned-up looks clean, too. You can check it with

    hg log | grep ^user | sort | uniq -c

from the cleaned-up hg repository.
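
The same count can be computed in Python, sketched here under the assumption that the text of `hg log` in its default output format is already available as a string (the function name is hypothetical):

```python
from collections import Counter


def count_users(log_text):
    """Count commits per author from lines starting with "user:"."""
    counts = Counter()
    for line in log_text.splitlines():
        if line.startswith("user:"):
            counts[line[len("user:"):].strip()] += 1
    return dict(counts)
```

Like the grep pipeline, this treats every distinct spelling of a name or email address as a separate author, which is exactly what makes leftover unmapped authors visible.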

The code makes sense. The working copy on the head of default/main before/after cleanup only differs in ~/.hgtags, which is as it should be.

I didn't do much to check the branch structure or commit messages, but at a glance they make sense.

Compared to yesterday's script version, the final .git directory size is down from 17232 KiB to 15060 KiB, which is nice.

I didn't do much to test the git repository, but if everyone else is happy with it, then I am too.

Some statistics:

$ git-sizer --threshold=0
Processing blobs: 28632                        
Processing trees: 37451                        
Processing commits: 22441                        
Matching commits to trees: 22441                        
Processing annotated tags: 0                        
Processing references: 1310                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  22.4 k   |                                |
|   * Total size               |  6.14 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  37.5 k   |                                |
|   * Total size               |  49.2 MiB |                                |
|   * Total tree entries       |  1.22 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  28.6 k   |                                |
|   * Total size               |   213 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |  1.31 k   |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  1.50 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   171     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |   460 KiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  4.54 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |   220     |                                |
| * Maximum path depth     [6] |     6     |                                |
| * Maximum path length    [7] |    72 B   |                                |
| * Number of files        [8] |  1.50 k   |                                |
| * Total size of files    [9] |  6.80 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1]  fba4b4dd03e1193ab081a2a1a7f0bbe053d5bb38
[2]  5c9e027316e88ef6eee251c907e506218f4c4324
[3]  526c630c44922a716f2fae47a79b9dad820207c7 (refs/heads/main:experiments)
[4]  28a41ec16ee92fcd72f4389f697f6c47b5f9b30d (b28f422155c919a0ab2b7a1c057d703c19fab125:src/dist/data/doc/fast-downward.pdf)
[5]  43d7171afef8fcc22ce88567ce27fe97fd503427 (refs/heads/main^{tree})
[6]  af0d711aaf96396d8b1ba71cd5122d7c4cf75fc6 (effc886eb88892099e13283fc6b61213531faf65^{tree})
[7]  ec21e3c1f977abf9a360534ecf90e167cf140288 (b502abec56be63fa100966ae9bcde269a545e266^{tree})
[8]  d0bb5c4140228aa3c29c7437e8330611e0cf0d2d (5c9e027316e88ef6eee251c907e506218f4c4324^{tree})
[9]  0644851e90592fdbfac7fbe69885cd850325d7e5 (cc2d79c787382edef1446da766add08af188f49c^{tree})


top 20 largest files by *packed* size, bearing in mind that for deltified files, we only see the size of the delta, not the size of the file itself:


All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size  pack  SHA                                       location
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
61    29    fa451a18288c242ab9dbc99e300ab5078dc624e9  misc/autodoc/external/txt2tags.py
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
55    15    4937766fdd188bf13f510b9e9d1c5c2fa57e4b5c  downward/search/Doxyfile
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
63    12    fbea03f3dfebc74d03bea3957a644cac83b3706a  src/search/merge_and_shrink/transition_system.cc
57    11    d84bc16fc57c23cfa7e8dada24185c835cae94ba  src/search/landmarks/landmarks_graph.cc
34    11    2fb2e74d8d7fa1c9286b18af0afa5c00402f56e3  LICENSE.md
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
42    9     89c6f4292e3a73db554bc392bdee2f26aa807d96  src/search/landmarks/landmark_factory.cc
34    8     c7b764240bf12690cf5cd332d005c8c473f475e0  src/translate/translate.py
38    8     642f109eadcd0295dc52c85d08053c2816152841  src/search/landmarks/h_m_landmarks.cc
36    8     66b86f656a53f4904bf212ea0eabdbba160d8e1b  src/search/cegar/abstraction.cc
32    7     fc89e588d16fdf09b6c2c685ab67f1023e0adf45  src/search/merge_and_shrink/merge_and_shrink_heuristic.cc
35    7     af25e562c9016ed0891e9605acbe025cb0f1c840  src/search/landmarks/landmark_factory_h_m.cc
14    7     526b9556dcdc3b49403242d132198d78dbcf87af  misc/cpplint.py


top 20 largest files by *unpacked* size, bearing in mind that for deltified files, we only see the size of the delta, not the size of the file itself:

All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file.
size  pack  SHA                                       location
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
72    5     5f1c513dc656fba6c9179cec29b29d8c88938225  src/bugs/psr-strips-derived-predicates/domain.pddl
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py
66    2     fc153719b7bafadf96c949828f9e51dae9c16fdb  src/translate/regression-tests/issue73-domain.pddl
63    12    fbea03f3dfebc74d03bea3957a644cac83b3706a  src/search/merge_and_shrink/transition_system.cc
61    29    fa451a18288c242ab9dbc99e300ab5078dc624e9  misc/autodoc/external/txt2tags.py
57    11    d84bc16fc57c23cfa7e8dada24185c835cae94ba  src/search/landmarks/landmarks_graph.cc
55    15    4937766fdd188bf13f510b9e9d1c5c2fa57e4b5c  downward/search/Doxyfile
53    5     4a9f7ed1ae9a3cb2b9dd0c880b4bf214b493fb77  src/search/successor_generator.cc
42    9     89c6f4292e3a73db554bc392bdee2f26aa807d96  src/search/landmarks/landmark_factory.cc
38    8     642f109eadcd0295dc52c85d08053c2816152841  src/search/landmarks/h_m_landmarks.cc
36    8     66b86f656a53f4904bf212ea0eabdbba160d8e1b  src/search/cegar/abstraction.cc


This all looks fine to me. From my side, this is ready to use. To emphasize, I haven't really tested the actual hg to git conversion in terms of investigating the resulting git repository. But if the others are happy to move forward, so am I.


We've also discussed what to do about our current hg repository on airepos and on hg.fast-downward.org once we convert. The idea is to leave hg.fast-downward.org alone for quite some time to give people time to transition. We haven't settled on an exact amount yet, and perhaps we want to discuss this in tomorrow morning's sprint meeting.

Keeping hg.fast-downward.org around for a while doesn't cause us any effort, and the repository is read-only anyway. But it will also be nice to be able to switch it off eventually and have one less thing to maintain.

We will set the repository on airepos to read-only. This is already prepared, we just need to uncomment a hook in ~/.repo/repositories/ai/downward/.hg/hgrc to flip the switch.
msg9530 (view) Author: malte Date: 2020-07-07.17:18:24
I looked at "hg manifest --all" after the cleanup to look for
additional files that I would remove.

To also check that we are not deleting too much, I also looked at "hg
manifest --all" before the cleanup and looked at the diff between the
two manifests in meld. I think this was interesting, so perhaps others want
to try that too to double-check. But I didn't find any deletes that I
wouldn't delete.


List of files in the cleaned-up repository that I recommend removing:

- the whole docs directory, consisting of:
  - docs/GA-NOTES
  - docs/Haslum_Bemerkungen.txt
  - docs/Haslum_Vergleich-mit-Paper
  - docs/Pattern-Selektion (Edelkamp)
  - docs/Profiling_Haslum
  - docs/more iPDB questions 1
  - docs/more iPDB questions 2
  - docs/more iPDB questions 3
  - docs/prof_logistics_7-0_actual
  - docs/prof_logistics_7-0_v3
  - docs/prof_logistics_v2
  - docs/prof_logistics_v3
  (Not to be confused with downward/doc and some other "doc"
  directories.)
  I think these are all remnants of the iPDB code integration that
  should not have been merged, similarly to the iPDB evaluation
  results that we are already partially removing.
  Note: quotes may be needed in the config file to deal with the
  filenames in this list that contain spaces; I don't know.

- downward/dist/archive/downward-2006-09-26.tar.gz
- downward/dist/archive/downward-2006-09-29.tar.gz
  - We already discussed these. Previously I said I'd look into
    possible inclusion in "downward-ancient", but based on the commit
    message for a3fd3d95fca2 (in the unconverted repository), I think
    they can just go.

- temp files that were accidentally committed:
  - src/all.groups
  - src/output
  - src/output.sas
  - src/sas_plan
  - src/test.groups

- src/validate: a VAL wrapper script that we used temporarily; doesn't
  make much sense any more if we remove VAL, I think

- downward/bugs/ff-no-preconditions/search
- downward/bugs/safety-net/search
- src/bugs/ff-no-preconditions/search
- src/bugs/safety-net/search
  - We already discussed two of these, and the other two are
    identical.

- Under experiments and results_overview, I think that most or all of
  the files related to the original iPDB experiments should not be
  retained. If you look at "hg manifest --all" before and after the
  clean-up, you'll see all the partial ignores for these. I think it's
  better to remove these altogether rather than in the current more
  selective way. (It's not that these experiments could be easily
  rerun; I think they depend on the "scripts" or "new-scripts" that we
  remove.)

  Here is the list of additional files I would prune in this category:
  - experiments/IPDB-Vergleiche/IPDB_Vergleich.tex
  - experiments/IPDB-Vergleiche/IPDB_max_cliques.tex
  - experiments/IPDB-Vergleiche/get_tex_table.py
  - experiments/IPDB-Vergleiche/get_tex_table_downward_only.py
  - experiments/IPDB-Vergleiche/longtable_ipdb.tex
  - experiments/IPDB-Vergleiche/longtable_ipdb_max_cliques.tex
  - experiments/IPDB_Vergleich.tex
  - experiments/PDB-Experimente-Auswertung/PDB_Experimente.tex
  - experiments/PDB_Experimente.tex
  - experiments/gapdb_comparison_20110530/INFO
  - experiments/gapdb_comparison_extended_20110531/INFO
  - experiments/gapdb_comparison_haslum_20110604/INFO
  - experiments/gapdb_v0_20110414/INFO
  - experiments/gapdb_v0_20110505/INFO
  - experiments/gapdb_v1_20110421/mo-gapdb_1.sh
  - experiments/gapdb_v1_20110505/INFO
  - experiments/gapdb_v2_20110421/mo-gapdb_2.sh
  - experiments/gapdb_v2_20110505/INFO
  - experiments/gapdb_v3_20110421/mo-gapdb_3.sh
  - experiments/gapdb_v3_20110505/INFO
  - experiments/gapdb_v4_20110421/mo-gapdb_4.sh
  - experiments/gapdb_v4_20110505/INFO
  - experiments/gapdb_wrv_v0_20110507/INFO
  - experiments/gapdb_wrv_v1_20110507/INFO
  - experiments/gapdb_wrv_v2_20110507/INFO
  - experiments/gapdb_wrv_v3_20110507/INFO
  - experiments/gapdb_wrv_v4_20110507/INFO
  - experiments/get_tex_table.py
  - experiments/hhh_20110505/INFO
  - experiments/hhh_ap_20110505/INFO
  - experiments/hhh_pw_20110505/INFO
  - experiments/ipdb_hhh_old/airport-b.txt
  - experiments/ipdb_hhh_old/logistics00-iPDB-default.txt
  - experiments/ipdb_hhh_old/pnt-iPDB-best.txt
  - experiments/ipdb_hhh_old/psr-iPDB-default.txt
  - experiments/ipdb_hhh_old/pwt-iPDB-best.txt
  - experiments/ipdb_hhh_old/sat-iPDB-default.txt
  - experiments/ipdb_hhh_old/tpp-iPDB-default.txt
  - experiments/ipdb_v0_20110308/INFO
  - experiments/ipdb_v0_20110308/mo-ipdb.sh
  - experiments/ipdb_v1_20010314/INFO
  - experiments/ipdb_v1_20010314/mo-ipdb_1.sh
  - experiments/ipdb_v1_20110311/INFO
  - experiments/ipdb_v2_20110317/INFO
  - experiments/ipdb_v2_20110317/mo-ipdb_2.sh
  - experiments/ipdb_v3_20110317/INFO
  - experiments/ipdb_v3_20110317/mo-ipdb_3.sh
  - experiments/ipdb_v4_20110324/INFO
  - experiments/ipdb_v4_20110324/mo-ipdb_4.sh
  - experiments/ipdb_v5_20110331/INFO
  - experiments/ipdb_v6_20110331/INFO
  - experiments/ipdb_v6_20110406/INFO
  - experiments/ipdb_v6_20110406/mo-ipdb_6.sh
  - experiments/ipdb_v7_20110408/INFO
  - experiments/ipdb_v7_20110408/mo-ipdb_7.sh
  - experiments/ipdb_v8_20110413/INFO
  - experiments/ipdb_v9_20110413/INFO
  - experiments/longtable_ipdb.tex
  - experiments/max_cliques_exp/bug_vs_end.tex
  - experiments/max_cliques_exp/get_tex_table_downward_only.py
  - experiments/max_cliques_exp/longtable_bug_vs_end.tex
  - experiments/pdb_v0_20110118/INFO
  - experiments/pdb_v0_20110118/ss-pdb.sh
  - experiments/pdb_v1_20110301/INFO
  - experiments/pdb_v1_20110301/ss-pdb-again.sh
  - experiments/pdb_v1_20110408/INFO
  - experiments/pdb_v1_withouth_search_20110405/INFO
  - experiments/pdb_v1_withouth_search_20110405/ss-pdb-v1x.sh
  - experiments/pdb_v1a_20110311/INFO
  - experiments/pdb_v1a_20110311/ss-pdb-short.sh
  - experiments/pdb_v1x_20110413/INFO
  - experiments/pdb_v2_20110316/INFO
  - experiments/pdb_v2_20110316/ss-pdb-v2.sh
  - experiments/pdb_v2_20110410/INFO
  - experiments/pdb_v2x_20110410/INFO
  - experiments/pdb_v3_20110413/INFO
  - experiments/pdb_v3x_20110414/INFO
  - experiments/pdb_v4_20110415/INFO
  - experiments/pdb_v4x_20110420/INFO
  - experiments/pdbs_bug_20110526/INFO
  - experiments/pdbs_end_20110526/INFO
  - experiments/pdbs_v0_20110308/ss-pdbs.sh
  - experiments/pdbs_vs_ipdb/get_tex_table_downward_only.py
  - experiments/pdbs_vs_ipdb/longtable_pdbs_vs_ipdb.tex
  - experiments/pdbs_vs_ipdb/pdbs_vs_ipdbs.tex
  - results_overview/.DS_Store
  - results_overview/IPDB_Vergleich.tex
  - results_overview/gapdb_vs_ipdb.tex
  - results_overview/longtable_gapdb_vs_ipdb.tex
  - results_overview/longtable_ipdb.tex
  - results_overview/longtable_pdbs_vs_ipdb.tex
  - results_overview/pdbs_vs_ipdbs.tex
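
Regarding the open question about filenames with spaces in the docs list: the entries could be generated rather than typed by hand. The sketch below assumes that the config file is an "hg convert" filemap with one "exclude" directive per path and that double quotes protect embedded spaces. Neither assumption is confirmed in this thread, so the output should be tested on a small repository first.

```python
def filemap_exclude_lines(paths):
    """Build one 'exclude' line per path, quoting paths that contain spaces."""
    lines = []
    for path in paths:
        if any(ch.isspace() for ch in path):
            # Assumption: the filemap parser accepts shell-style double
            # quotes. Paths containing a double quote are not handled.
            path = '"' + path + '"'
        lines.append("exclude " + path)
    return lines


if __name__ == "__main__":
    for line in filemap_exclude_lines([
        "docs/GA-NOTES",
        "docs/Pattern-Selektion (Edelkamp)",
        "docs/more iPDB questions 1",
    ]):
        print(line)
```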

Files we currently keep where I am not sure if I want to recommend
deleting them and wouldn't really mind either way:
- ref-results (whole directory)
- results/preprocess/PROBLEMS

Files/directories that I would keep but that others might prefer to
remove:
- downward/search/lp/setup
- downward/translate/no-invariants.patch
- downward/translate/run-additive-hmax
- src/search/border_cases (from the iPDB integration)
- src/search/easy-instances (from issue600)
- src/search/ext
- src/search/testcases (from the iPDB integration)
- src/translate/no-invariants.patch
- src/translate/run-additive-hmax
msg9519 (view) Author: malte Date: 2020-07-07.12:15:21
Thanks, Florian! All sounds good. Yes, adding the config line to disable the sparse-revlog feature should address what I mentioned. I agree about keeping src/search/ext, but will have a closer look at it.

I'm currently rerunning the cleanup based on the newest version and will look at the manifest of the converted repository next.
msg9511 (view) Author: florian Date: 2020-07-07.05:00:21
We use a specific Mercurial version installed in a venv because fast-convert is very particular about which version is compatible with which Mercurial version. One way to access the created Mercurial repository would be to use the Mercurial installed in the venv but I guess that just moves the problem further down the line. I added your patch to the script. Is this a sufficient solution for your use case?

I wouldn't document how to delete/recreate the venv because under normal operation this should never be necessary.

About the tags: there is one commit that adds an empty line to .hgtags which is later removed, I assume it is where the message about tag "" comes from. The tags "seipp-helmert*" are added, moved, renamed, and deleted multiple times in issue600. In the converted git repo, they point to the final location of these tags in the Mercurial history, so it looks like the warnings can be ignored. issue551-base was also moved once. After the move it pointed to an empty "start branch" commit, which is removed by the conversion to git. In git it points to the parent of the empty commit which is what we want. issue794-base looks to be broken in the source repository already. I added a note about them to the readme.

I also added a note about not supporting paths with spaces and fixed the usage example.



I think this just leaves the final list of files we want to get rid of.

* About "src/search/ext": I would argue that the other files in here are less known than Boost, so it would be harder to find a compatible version later on. But I don't have a strong opinion on this and am fine with removing the others as well (except for the ones we are still using, of course).

* About VAL: I added downward/VAL/* to the list of files to ignore (it snuck back in when we unignored ./downward).


Here is the updated table after the changes (17MB in .git/objects):

size  pack  SHA                                       location
717   718   7ac3af46ce2dc615319200a589f810645d824949  downward/dist/archive/downward-2006-09-29.tar.gz
714   714   0ad45840a1a3905146ac7b5927e4123cd40dd869  downward/dist/archive/downward-2006-09-26.tar.gz
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
418   34    1bf9638ce6d0976210361ecb127d1ddf0c0ecaf5  docs/prof_logistics_7-0_actual
417   34    5b3cb57d1cd9bcabdf524e9c4432ba7658113afa  docs/prof_logistics_v3
400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
113   56    63b5e223e29cbb3bb6a41d366707b1c583971ccf  src/bugs/safety-net/search
91    19    f520b9a70b579738039f04343a1128d8d3656280  downward/val/Validator.cpp
89    16    7b9cf1d23eb191fbec8c8f9033c432138915c309  downward/val/Proposition.cpp
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
72    5     5f1c513dc656fba6c9179cec29b29d8c88938225  src/bugs/psr-strips-derived-predicates/domain.pddl
71    13    72d6b85b24ac1695a2cdb9baf8522c75bdd244b5  downward/val/Events.cpp
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py

And here it is after also excluding downward/val/*, downward/dist/archive/*, three profiles, src/bugs/*, and downward/bugs (15MB in .git/objects):

size  pack  SHA                                       location
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
417   34    cfb10a2b5c87bde2b6b55a694853a49b650c64c1  docs/prof_logistics_7-0_v3
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py
66    2     fc153719b7bafadf96c949828f9e51dae9c16fdb  src/translate/regression-tests/issue73-domain.pddl
63    12    fbea03f3dfebc74d03bea3957a644cac83b3706a  src/search/merge_and_shrink/transition_system.cc
61    29    fa451a18288c242ab9dbc99e300ab5078dc624e9  misc/autodoc/external/txt2tags.py
57    11    d84bc16fc57c23cfa7e8dada24185c835cae94ba  src/search/landmarks/landmarks_graph.cc
55    15    4937766fdd188bf13f510b9e9d1c5c2fa57e4b5c  downward/search/Doxyfile
53    5     4a9f7ed1ae9a3cb2b9dd0c880b4bf214b493fb77  src/search/successor_generator.cc
42    9     89c6f4292e3a73db554bc392bdee2f26aa807d96  src/search/landmarks/landmark_factory.cc
38    8     642f109eadcd0295dc52c85d08053c2816152841  src/search/landmarks/h_m_landmarks.cc
36    8     66b86f656a53f4904bf212ea0eabdbba160d8e1b  src/search/cegar/abstraction.cc

One more profile that should probably go, but otherwise the list looks good to me.
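
To preview how a table like the ones above changes when more paths are excluded, the rows can be filtered with glob patterns before rerunning the conversion. This is a rough sketch, not part of the conversion scripts: it assumes the four whitespace-separated columns shown above, and the pattern syntax is Python's fnmatch, which may differ from what the filemap accepts. As noted elsewhere in this thread, the real numbers can shift after a removal because of deltification, so this is only a preview.

```python
from fnmatch import fnmatchcase


def parse_rows(text):
    """Parse 'size pack SHA location' rows, skipping the header and blanks."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        size, pack, sha, location = parts
        rows.append((int(size), int(pack), sha, location.strip()))
    return rows


def without_excluded(rows, patterns):
    """Drop rows whose location matches any of the glob patterns."""
    return [
        row for row in rows
        if not any(fnmatchcase(row[3], pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    table = """size  pack  SHA  location
717   718   7ac3af46ce2dc615319200a589f810645d824949  downward/dist/archive/downward-2006-09-29.tar.gz
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
"""
    kept = without_excluded(
        parse_rows(table), ["downward/dist/archive/*", "src/bugs/*"])
    for size, pack, sha, location in kept:
        print(size, pack, sha, location)
```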
msg9510 (view) Author: malte Date: 2020-07-07.01:51:14
One last thing for today: I converted the master to git using Patrick's latest version and played around a bit with Florian's "largest files in git" script. Some observations:


A) My converted .git is 17 MB, almost all of it in "objects". All objects are packed.


B) The list of largest files is similar to, but slightly different from, what Florian posted earlier. I think it is identical to what Florian posted in a Zoom discussion today:

size  pack  SHA                                       location
717   718   7ac3af46ce2dc615319200a589f810645d824949  downward/dist/archive/downward-2006-09-29.tar.gz
714   714   0ad45840a1a3905146ac7b5927e4123cd40dd869  downward/dist/archive/downward-2006-09-26.tar.gz
459   384   28a41ec16ee92fcd72f4389f697f6c47b5f9b30d  src/dist/data/doc/fast-downward.pdf
418   34    1bf9638ce6d0976210361ecb127d1ddf0c0ecaf5  docs/prof_logistics_7-0_actual
417   34    5b3cb57d1cd9bcabdf524e9c4432ba7658113afa  docs/prof_logistics_v3
400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
278   240   163a1ceeb56a870fdf9ce11ad6cc72b746f21aeb  src/dist/data/doc/translator.pdf
231   57    27aaf7516e9a117b1110f6510221c08ee9da8d0b  misc/autodoc/external/txt2tags.py
177   50    b5493b6d986443c7650e169eb160d0c4642d86a2  misc/cpplint.py
113   56    63b5e223e29cbb3bb6a41d366707b1c583971ccf  src/bugs/safety-net/search
91    19    f520b9a70b579738039f04343a1128d8d3656280  downward/VAL/Validator.cpp
89    16    7b9cf1d23eb191fbec8c8f9033c432138915c309  downward/VAL/Proposition.cpp
87    14    e58a508f48231009276a9aa106046aca205bc9bb  src/search/raz_abstraction.cc
84    5     201f338aa38eca7cf1a4fd1ccb75654b1e497c86  src/translate/regression-tests/issue49-orig-domain.pddl
82    11    15302fb44458148674d67d1e15e45eaef68561ce  src/search/ext/optional.hh
81    12    b2f300535a7febfd36f8c605c2ad89c5a637d474  src/search/ext/tree.hh
80    17    e78b49bdc05e17d76429e10cb28276976c4e76a5  src/search/ext/btree/btree.h
72    5     5f1c513dc656fba6c9179cec29b29d8c88938225  src/bugs/psr-strips-derived-predicates/domain.pddl
71    13    72d6b85b24ac1695a2cdb9baf8522c75bdd244b5  downward/VAL/Events.cpp
67    4     845ccc0d02af34c5c25b12745c991b09daab011b  driver/portfolios/seq_sat_remix.py


C) Looking at that list:

We already discussed that we probably want to get rid of the tar.gz archives (entries #1 and #2), of the profiles (entries #4 and #5), and of the executables in src/bugs (entries #6 and #10). I said I'd like to look into what exactly these files and directories are about, but I haven't done that yet.

I assume we want to keep src/search/ext, but it's kind-of similar to the Boost libraries we're pruning, so perhaps someone wants to weigh in with their opinion if they would prefer to remove it.

Didn't we want to remove VAL? There are several VAL files in the top 20 as we can see, and I think VAL also has many source files in addition to having some large ones. (But perhaps not *that* many, not sure.)


D) We were surprised by some of the "size" and "pack" entries, e.g. the executables like src/bugs/safety-net/search being so small. Indeed, that file *isn't* just 113K.

$ git cat-file -p 63b5e223e29cbb3bb6a41d366707b1c583971ccf | wc -c
411891

BTW, these 411K are indeed the whole Fast Downward search executable at the time, and it wasn't even stripped. :-)

I've dug into this a bit more deeply. The output of "git verify-pack", on which the numbers reported by the script are based, also includes information on which files a given file is deltified against. If we look at the two executables in the above list:

400   158   96f3f49f6e3dabdda338b69f5d1ca536fe1b518f  src/bugs/ff-no-preconditions/search
113   56    63b5e223e29cbb3bb6a41d366707b1c583971ccf  src/bugs/safety-net/search

then closer inspection shows that the first one is not deltified, and the second one is deltified against the first. For the first one, the reported 400K exactly matches the actual file size of 410545 bytes (which is the size that verify-pack shows) -- the output is in KiB, not KB, and it's rounded down. 410545 bytes is roughly 400.9 KiB. I assume that the reported 158 KiB is then the size of the file after compression, which roughly matches the amount of compression that bzip2 can achieve on this file (down to 136 KiB, compared to 158 KiB reported here by git), so this sounds plausible:

$ git cat-file -p 96f3f49f6e3dabdda338b69f5d1ca536fe1b518f | bzip2  | wc -c
139671
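
The unit conversion used in this argument can be checked with a few lines of Python. The byte counts are the ones quoted in this message; the helper name is made up for illustration.

```python
def kib_rounded_down(num_bytes):
    """Convert bytes to whole KiB (1 KiB = 1024 bytes), rounding down."""
    return num_bytes // 1024


# Byte counts quoted in this message.
uncompressed = 410545      # size that verify-pack shows for the first executable
bzip2_compressed = 139671  # output of "bzip2 | wc -c" above

print(kib_rounded_down(uncompressed))      # 400
print(round(uncompressed / 1024, 1))       # 400.9
print(kib_rounded_down(bzip2_compressed))  # 136
```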

The second file is also roughly 400 KiB in size in truth, but like I said it's deltified against the first file by git. My guess is that the first number is the size of the uncompressed delta, and the second size is the size of the compressed delta. To test this hypothesis, I concatenated the two uncompressed files, ran bzip2 on the concatenation, and subtracted the compressed size of the first file alone from this. This should give us a rough estimate of the compressed delta size. The difference is 53438 bytes, which is reasonably close to the 56 KiB reported by git here.
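
The concatenation experiment can be sketched in Python with the standard bz2 module. This is an approximation of what I did by hand, not what git does internally: git uses its own delta format plus zlib, so the result is only a rough estimate. The function name is made up.

```python
import bz2


def estimated_delta_size(base, other):
    """Estimate the compressed size of `other` given that `base` is stored.

    Compresses base + other together and subtracts the compressed size of
    base alone. The remainder approximates what `other` costs on top.
    """
    both = len(bz2.compress(base + other))
    alone = len(bz2.compress(base))
    return both - alone


if __name__ == "__main__":
    # Synthetic stand-ins for two similar files; with the real executables,
    # read the bytes from "git cat-file -p <sha>" output instead.
    base = b"".join(b"line %d of the first file\n" % i for i in range(2000))
    other = base.replace(b"line 1000 ", b"LINE 1000 ")
    print(estimated_delta_size(base, other), "bytes (estimate)")
    print(len(bz2.compress(other)), "bytes when compressed on its own")
```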

This also makes some sense for the top two entries in the large file list, where the packed size more or less matches the basic size: these are gzipped archives, and we can expect that such compressed archives cannot be compressed or deltified well.

So I think this explanation makes sense, and if it does it gives us a better idea of how to interpret these numbers now. One consequence is that the sizes cannot be interpreted in isolation. We probably don't need to worry about this too much -- I think they still serve the purpose of identifying what the big space wasters are. But one consequence is that if we remove a very large waster according to this list, it may not actually give us any benefit at all unless we also remove all other files in the repository that are deltified against them. These other files won't show up as large wasters in the script while their deltification source is still present, but at least one of them will after we have removed it. But this can be addressed by iterating the removal of big wasters. And of course if we do this properly, when we remove a file, we will likely also remove the most closely semantically related files (e.g. removing all VAL source files, not just the ones that show up near the top), and these are also the likely deltification sources.
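
To find out which objects are deltified against a file before removing it, the "git verify-pack -v" output can be parsed. The sketch below assumes the documented line format "SHA type size size-in-packfile offset [depth base-SHA]" and 40-character SHA-1 names. The sample lines in the example use made-up SHAs and numbers.

```python
def parse_verify_pack(lines):
    """Parse the object lines of `git verify-pack -v` output.

    Returns a dict mapping SHA to its type, size, packed size and
    delta base (None if the object is not deltified).
    """
    objects = {}
    for line in lines:
        fields = line.split()
        # Object lines start with a 40-character hex SHA; skip summaries.
        if len(fields) < 5 or len(fields[0]) != 40:
            continue
        objects[fields[0]] = {
            "type": fields[1],
            "size": int(fields[2]),
            "packed_size": int(fields[3]),
            "base": fields[6] if len(fields) >= 7 else None,
        }
    return objects


def deltified_against(objects, base_sha):
    """Return the SHAs of all objects stored as deltas against base_sha."""
    return sorted(
        sha for sha, info in objects.items() if info["base"] == base_sha)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real repository.
    sample = [
        "a" * 40 + " blob 1000 400 12",
        "b" * 40 + " blob 200 90 412 1 " + "a" * 40,
        "non delta: 1 object",
    ]
    print(deltified_against(parse_verify_pack(sample), "a" * 40))
```

If the list for a large file is non-empty, those dependants are the candidates that will show up as new big entries once the base is gone.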
msg9509 (view) Author: malte Date: 2020-07-07.00:36:09
(All the following comments are about the hg-to-hg clean-up script, not the hg-to-git conversion script.)

1) I tried the clean-up script, and it ran successfully. :-) I then wanted to look a bit at the resulting hg repository. It doesn't have a checked-out working directory, and I assume this is intentional. But when I try to create one with "hg up", I get:

abort: repository requires features unknown to this Mercurial: sparserevlog!
(see https://mercurial-scm.org/wiki/MissingRequirement for more information)

Same reply from "hg log" and "hg status".

My guess is that the script uses a hg version that makes use of a repository format that the hg version installed on my computer (running Ubuntu 18.04) doesn't understand. Any ideas what to do? I guess I need some venv-based solution that installs the same hg version that the conversion script uses? [Addendum: I found some answers below.]

It looks like sparserevlog requires Mercurial >= 4.7, but Ubuntu 18.04 has Mercurial 4.5. Could the script use a 4.5 or older Mercurial version instead? [Addendum: probably not, see point 4 below.]


2) The clean-up script produces 156 errors on stderr, one about "" being an invalid tag entry and the rest about missing tag entries, referring to tags seipp-helmert@icaps-2013, seipp-helmert@icaps-2014, seipp-helmert-icaps-2013, seipp-helmert-icaps-2014, issue551-base and issue794-base. Are these expected errors? Then I think it would be good to mention this in the readme.


3) I tentatively changed the hg version in the setup script and reran it, but that failed because the venv was already set up and wasn't recreated properly with the (now different) hg version. I deleted the venv, but it wasn't clear that this needed to be done because it is ignored by "git status" and hidden away behind "data" (FWIW, I find "data" an odd spot for the venv). It might be useful to add some information to the README about what needs to be done for a "clean" re-run. Then again, perhaps we don't need to cater to people that fiddle with the scripts. :-) Just wanted to share my experience in case it is helpful for others in a similar situation.


4) The script didn't work with the older hg version, perhaps because renaming_mercurial_source.py does some version-specific monkey-patching? In that case, is there a way to instruct hg to create a version that works with older Mercurials?


5) I played around with possible answers to #4. Creating a bundle at the end of the script with "hg bundle" that can then be unbundled with the older hg version is an indirect solution that works, but Section 2.3 of https://www.mercurial-scm.org/wiki/MissingRequirement suggested a more direct idea, namely disabling the sparse-revlog to start with. Specifically, the following diff seems to have done the trick for me:

$ git diff
diff --git a/run-cleanup.sh b/run-cleanup.sh
index ca661ba..044e649 100755
--- a/run-cleanup.sh
+++ b/run-cleanup.sh
@@ -29,6 +29,7 @@ fi
 source "$VIRTUALENV/bin/activate"
 
 hg \
+ --config format.sparse-revlog=0 \
  --config extensions.renaming_mercurial_source="${BASE}/renaming_mercurial_source.py" \
  convert $1 $2 \
  --config extensions.hgext.convert= \

So I now have a cleaned up hg repository I can work with. :-) I started poking at it a little bit, but I'd rather poke at it some more before reporting details. I also converted a second (derived from master) repository, and the relationship between the two cleaned up repositories looked fine.


6) So that it doesn't get lost, let me repeat one earlier comment:

The script fails when the source path includes a space. I don't think we need to fix it, but then the README should mention it as a limitation.

(Looking at the script, I think the same will happen if the destination path contains a space.)
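
My assumption is that the failure comes from the unquoted $1 and $2 visible in the diff under point 5, which the shell splits on whitespace. The effect can be illustrated in Python. The two helper functions are made up and only mimic how the shell would build the argument list; they do not run hg.

```python
def convert_args_unquoted(source, destination):
    """Mimic `hg convert $1 $2`: unquoted expansions are split on whitespace."""
    return ["hg", "convert"] + source.split() + destination.split()


def convert_args_quoted(source, destination):
    """Mimic `hg convert "$1" "$2"`: each path stays a single argument."""
    return ["hg", "convert", source, destination]


if __name__ == "__main__":
    # Hypothetical paths; the source path contains a space.
    print(convert_args_unquoted("my repo", "cleaned"))
    print(convert_args_quoted("my repo", "cleaned"))
```

With a space in the source path, the unquoted form hands hg three path arguments instead of two, which would explain why the script fails.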


7) In the README, the usage looks a bit odd:

Usage:
    
    ./run-cleanup.sh [MERCURIAL REPOSITORY] [CLEANED MERCURIAL REPOSITORY]
    
    ./run-conversion.sh [MERCURIAL REPOSITORY] [CONVERTED GIT REPOSITORY]
    
        ./run-cleanup-and-conversion.sh [MERCURIAL REPOSITORY] \
                                        [CLEANED MERCURIAL REPOSITORY] \
                                        [CONVERTED GIT REPOSITORY]

I assume these are supposed to be indented to the same level, and -- more importantly -- shouldn't the first argument to run-conversion.sh be described as "CLEANED MERCURIAL REPOSITORY" instead?
msg9479 (view) Author: patfer Date: 2020-07-06.13:37:16
Jendrik and I chatted briefly about filtering the output of fast-export (yes, 
fast-export writes everything to stderr). Currently, we both think we should 
not filter it and should have a live discussion about this topic.
msg9478 (view) Author: patfer Date: 2020-07-06.12:49:16
- I split the scripts into 2 standalone scripts (run_cleanup.sh, 
run_conversion.sh) and one script that runs both one after the other (but we 
can also remove this one, everyone should be able to execute 2 scripts after 
another in bash)
- the -x option is removed

Question: I checked the stdout/stderr of fast-export. Do you also see that 
fast-export only prints to stderr? That is quite annoying.
msg9472 (view) Author: malte Date: 2020-07-03.19:39:06
[This crossed with Florian's message, but now I have to leave for today.]

Some initial general comments before we have a code review link set up:

- If I understand correctly, the script only keeps the final git repository. I would prefer it to also keep the intermediate hg repository. I think it's really useful to debug things, and also for our final conversion I would actually like to keep that version of the hg repository. My suggestion would be to make it two top-level scripts: an hg cleanup one, and an hg to git conversion one. I don't think a script that does both is needed, and I think it's easier to understand what goes on if they are kept separate.

- The script fails when the source path includes a space. I don't think we need to fix it, but then the README should mention it as a limitation.

- I'm not sure the -x tracing output is a good idea, and I think Jendrik already requested removing this in an early code review. Do we really want to keep it? The output is very noisy. The fact that the tracing output is on stderr makes it even more distracting; I'll generally look at stderr for errors.

- Also besides -x, there is lots of output, including many things about missing tags that look like errors. If we want others to use this and if this output is not an error, the README should perhaps point this out. But I think a better solution would be to try to make the output much less noisy, without swallowing actual errors.  I get more output than the size of my shell history, so I couldn't even scroll back to the start to see all of it. This is something that is perhaps better discussed interactively.
msg9471 (view) Author: jendrik Date: 2020-07-03.19:33:07
You could just resolve the conversations on the pull request page: https://github.com/aibasel/convert-downward/pull/3/files
msg9470 (view) Author: florian Date: 2020-07-03.19:32:50
I couldn't close the old pull request (only Patrick can, I suspect) but I marked it as closed and could create a new one:
https://github.com/aibasel/convert-downward/pull/4/files
msg9469 (view) Author: florian Date: 2020-07-03.19:29:20
I made the changes and added one more file to the exclude list: 
./src/search/lp/coinutils-configure.patched is a patch for the osi installer that we removed. It is not useful without the installer and we also keep a copy around attached to the issue where the patch was created.

I'll try to recreate the pull request.
msg9468 (view) Author: malte Date: 2020-07-03.19:20:39
Hi Patrick, from my side this is ready for final code review, but the code review I see behind the last link is already really cluttered with all the comments, and I think we cannot easily review this way. How can we start in a fresh way, does this require a new pull request?

I agree with Florian about adding the aggressive gc.

One thing I noticed is that the README file misspells Fast Downward twice (no hyphen).
msg9467 (view) Author: malte Date: 2020-07-03.19:11:29
I don't see the filter-branch command as dangerous in the context in which we use it. The problem with it is that it's easy to shoot yourself in the foot, but we keep backups of our feet and are inspecting them carefully before and after. Also, pruning empty commits is a really simple transformation and I think the documentation explains how the tool works in this case nicely. But I haven't reviewed the script yet.

One thing I would recommend to anyone doing size comparisons is to do the "git clone with a file URL" trick. It avoids many potential pitfalls that can make size comparisons off, such as hardlinks.

Cautionary tale, this may really surprise you:

$ git clone grounder-2 grounder-3
Cloning into 'grounder-3'...
done.

$ du -sch grounder-2
828K	grounder-2
828K	total

=> OK!

$ du -sch grounder-3
828K	grounder-3
828K	total

=> Makes sense!


$ du -sch grounder-2 grounder-3
828K	grounder-2
664K	grounder-3
1,5M	total

=> What?

$ du -sch grounder-3 grounder-2
828K	grounder-3
664K	grounder-2
1,5M	total

=> What???


Long story short, if you clone repositories this way, they will share certain files. "du" will account any shared files to whatever repository it looks at first, so depending on the order in which you look at them, the second one it looks at will appear smaller. But they are really identical.
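
The accounting rule behind this can be reproduced with a small Python sketch. It is a simplified model of "du", not its implementation: it sums byte sizes instead of disk blocks and remembers inodes across the directories given in one call. The directory and file names in the demo are made up.

```python
import os
import tempfile


def du_like(directories):
    """Sum file sizes per directory, charging each inode only once per call."""
    seen = set()
    totals = {}
    for directory in directories:
        total = 0
        for root, _dirs, files in os.walk(directory):
            for name in files:
                info = os.lstat(os.path.join(root, name))
                key = (info.st_dev, info.st_ino)
                if key in seen:
                    continue  # already charged to an earlier directory
                seen.add(key)
                total += info.st_size
        totals[directory] = total
    return totals


def demo():
    """Two directories sharing one hardlinked file, measured in both orders."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "repo-a")
        second = os.path.join(tmp, "repo-b")
        os.mkdir(first)
        os.mkdir(second)
        with open(os.path.join(first, "pack"), "wb") as f:
            f.write(b"x" * 1000)
        # A path-based clone hardlinks object files like this.
        os.link(os.path.join(first, "pack"), os.path.join(second, "pack"))
        with open(os.path.join(second, "HEAD"), "wb") as f:
            f.write(b"y" * 10)
        forward = list(du_like([first, second]).values())
        backward = list(du_like([second, first]).values())
        return forward, backward


if __name__ == "__main__":
    print(demo())
```

Whichever directory is listed first is charged for the shared file, which is the same order dependence as in the transcript above.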



Now with using file URLs:

$ git clone file:///home/helmert/repos/grounder-2 grounder-2b
Cloning into 'grounder-2b'...
remote: Counting objects: 490, done.
remote: Compressing objects: 100% (214/214), done.
remote: Total 490 (delta 256), reused 490 (delta 256)
Receiving objects: 100% (490/490), 144.18 KiB | 5.77 MiB/s, done.
Resolving deltas: 100% (256/256), done.

=> With the file URL, this is handled like a clone from a remote, so no hardlinking (= sharing of files).


$ du -sch grounder-2 grounder-2b
828K	grounder-2
828K	grounder-2b
1,7M	total

=> OK!

$ du -sch grounder-2b grounder-2

828K	grounder-2b
828K	grounder-2
1,7M	total

=> OK!
msg9466 (view) Author: florian Date: 2020-07-03.19:00:16
The repository was a clone of something else (cloned from a path, not the file:// url) but not the other way around. Unfortunately, I recreated the repositories in the meantime to start from the cleanly converted repository again. This time, I used a version of the conversion script that included "git gc --aggressive" as a last step and now both repositories have around 30MB. The repository without the empty commits is still a bit larger but this time the difference is small and not in the unpacked objects. I guess my stubborn unpacked files that were ignored by repack must have been a problem that occurred through something I did while analyzing the repositories.

If you think the use of git filter-branch is fine, I don't think it is worth digging deeper. I suggest adding the call to git gc --aggressive, excluding downward/validate, and leaving it at that.
msg9465 (view) Author: malte Date: 2020-07-03.18:29:35
There is also the possibility that the files are not removed because they are also referenced from a second repository via hardlinks. Does a clone of the repository exist? To get a clone with no sharing, clone with a "file" URL rather than a filename (path). For example, try "git clone file:///path/to/repo new-repo" and look at new-repo.
msg9464 (view) Author: malte Date: 2020-07-03.18:23:11
Garbage collection works the other way round. Everything is deleted until it is proven to be reachable.

git has no notion of which data "should" be there other than the direct references (branches, tags, reflogs, which all point to one specific revision) and ancestry. Garbage collection marks everything reachable from these and deletes everything else.
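
As a toy model of this mark-and-sweep idea (not git's actual implementation, which also follows trees and blobs and honours reflogs and grace periods), consider a commit graph stored as a dict from commit to parents. All names are made up.

```python
def reachable(refs, parents):
    """Mark every commit reachable from the refs by following parent links."""
    marked = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in marked:
            continue
        marked.add(commit)
        stack.extend(parents.get(commit, ()))
    return marked


def collect_garbage(objects, refs, parents):
    """Keep only the marked objects; everything else is swept away."""
    keep = reachable(refs, parents)
    return {obj for obj in objects if obj in keep}


if __name__ == "__main__":
    parents = {"c3": ["c2"], "c2": ["c1"], "c1": [], "orphan": ["c1"]}
    refs = {"main": "c3"}
    # "orphan" is not reachable from any ref, so it does not survive.
    print(sorted(collect_garbage(set(parents), refs, parents)))
```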

But all this isn't really relevant here. This is about the packed/unpacked relationship, which is an unrelated thing. Packed objects are added to packs and should then be deleted. That requires no reasoning about what the objects even are; they are just blobs of data. There isn't really a way in which this can be corrupted.

Also, to be clear, I think the warnings on git filter-branch are really just the usual warnings about "if you leave something unreachable, it will be gone after gc", or relatedly, "if you delete some parents of a merge commit, weird things will happen". They are really about shooting yourself in the foot in the "normal" git ways, except that you are using a power tool here.
msg9463 (view) Author: florian Date: 2020-07-03.18:14:30
This may be one of the problems that using git-filter-branch brings. Maybe the files are no longer associated with the represented tree and thus not considered by git-repack?

I'm fine with continuing this on Monday. But I would also be fine with dropping it: we know what those files are and where the difference in size between the repositories comes from. If we don't strip empty commits, and use garbage collection as a last step of the conversion, we get a very small repo. Is it worth the effort to get rid of the empty commits and save another 300KB?
msg9462 (view) Author: malte Date: 2020-07-03.18:09:06
Hmmm, worked for me. :-( (On another repository.) Perhaps we should really continue this in a Zoom session at some point, sorry for the slew of messages to everyone on this unusually large nosy list!
msg9461 (view) Author: florian Date: 2020-07-03.18:07:42
No dice, unfortunately. I tried that earlier and I think "git gc" internally also uses git-repack. I tried it again as you suggested but the files remain there.
msg9460 (view) Author: malte Date: 2020-07-03.18:03:49
Elaborating a bit: I think the normal mode of operation is that all files added by git are initially added unpacked and then periodically packed once enough of them have been accumulated (or perhaps it's also time-based).

The point of packing is to apply delta compression (i.e., two similar files stored together will be smaller than the sum of their two sizes) and also to reduce file system wastage due to having lots of small files.
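
A rough illustration of the "two similar files" effect, using plain zlib as a stand-in. This is only an analogy: git's pack format uses its own delta encoding in addition to zlib, and the file contents below are made up.

```python
import random
import zlib

# Build a made-up "file" of 200 lines and a second version with one line changed.
rng = random.Random(0)
lines = [f"line {i}: {rng.random()}" for i in range(200)]
version_a = "\n".join(lines).encode()
lines[100] = "line 100: changed"
version_b = "\n".join(lines).encode()

# Compressing each version on its own cannot exploit their similarity.
separate = len(zlib.compress(version_a)) + len(zlib.compress(version_b))

# Compressing both in one stream lets the second version refer back to the first.
together = len(zlib.compress(version_a + version_b))

print(f"separate: {separate} bytes, together: {together} bytes")
```

The combined stream should come out clearly smaller than the two separate ones, which is the same kind of saving packing aims for.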

Try "git repack -ad" followed by "git gc".
msg9459 (view) Author: malte Date: 2020-07-03.17:57:53
These files (just like the others) can be "packed", in which case they end up compressed in the pack. Some invocation of the "repack" command I mentioned a few messages down ought to help with that.
msg9458 (view) Author: florian Date: 2020-07-03.17:41:02
Looks like downward/validate is missing from the filemap. Patrick, can you add it?

I looked closer at the files in .git/objects/xx/*. As this article (https://git-scm.com/book/en/v2/Git-Internals-Git-Objects) explains, you can view the content of file .git/objects/xx/yyyyy using "git cat-file -p xxyyyyy".
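
For readers who want to see what such a file contains without git, here is a minimal sketch. It relies on the loose object format described in the linked article (zlib-compressed data consisting of a "<type> <size>" header, a NUL byte, and the content). The function names are made up for illustration, and the object is built in memory as a shortened stand-in rather than read from a real repository.

```python
import zlib

def loose_object_path(object_id):
    """The first two hex digits name the directory, the rest the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"

def parse_loose_object(raw):
    """Decompress a loose object and split the header from the content."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match content length")
    return kind.decode(), body

# In-memory stand-in for a loose commit object (shortened, not a real commit).
body = b"tree 0d0df310b249d01fbdfeb8ff3c0ebfefb675956d\n\n[issue719] clean up token parser\n"
raw = zlib.compress(b"commit %d\0" % len(body) + body)

kind, content = parse_loose_object(raw)
print(loose_object_path("0d0df310b249d01fbdfeb8ff3c0ebfefb675956d"))
print(kind)
print(content.decode())
```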

It looks like these are the commit metadata. There are roughly 11000 of them, which matches the number of commits, and printing them shows things like this:

============
tree 0d0df310b249d01fbdfeb8ff3c0ebfefb675956d
parent aca0d632b2a4fd69ee4deebd58ee0228ab01a0ae
author Jendrik Seipp <jendrik.seipp@unibas.ch> 1493220240 +0200
committer Jendrik Seipp <jendrik.seipp@unibas.ch> 1493220240 +0200

[issue719] clean up token parser
===========

The question is why this data is stored in this way in one but not both repositories.
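
A rough back-of-the-envelope check of how much disk space that many loose commit objects could occupy. It uses the commit count of 10877 that git-sizer reports for the repository without empty commits, and it assumes a file system block size of 4 KiB. The block size is an assumption; nothing in this thread measures it.

```python
# Figures taken from this thread; the block size is an assumption.
loose_objects = 10877   # commits reported by git-sizer (repo without empty commits)
directories = 256       # .git/objects/00 .. ff
block_size_kib = 4      # ASSUMED file system block size, not measured here

per_directory = loose_objects / directories
disk_usage_mib = loose_objects * block_size_kib / 1024

print(f"{per_directory:.1f} files per directory")
print(f"{disk_usage_mib:.1f} MiB if every object occupies one block")
```

If the assumption holds, the result is in the same range as the gap between the two repositories after "git gc --aggressive" (29MB versus 77MB), but this is only a plausibility check, not a measurement.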
msg9456 (view) Author: malte Date: 2020-07-03.17:19:13
Thanks, Florian! The output makes a lot of sense. Blobs and trees should be identical, and they are. Commits should be lower without the empty commits, and they are. Not 100% sure what "references" means, but I'd guess it's tags, branches and reflog (plus perhaps a few similar things). The order of magnitude makes sense for this, and it makes sense that it's the same number.

The only number that differs where I have no clue what it means is "maximum history depth". Ah, perhaps that's the length of the longest path in the repository DAG. If that's the case, it makes sense that removing the empty commits reduces it.
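
To make that guess concrete, here is a small sketch that computes the longest ancestry chain in a toy commit graph, before and after splicing out an empty commit. The graph and its commit names are made up, and whether git-sizer computes the value in exactly this way is an assumption.

```python
from functools import lru_cache

# Toy commit graph: commit -> list of parents (all names are made up).
parents = {
    "a": [],
    "b": ["a"],
    "empty1": ["b"],
    "c": ["empty1"],
    "d": ["b"],
    "merge": ["c", "d"],
}

def max_depth(graph):
    """Number of commits on the longest ancestry chain in the graph."""
    @lru_cache(maxsize=None)
    def depth(commit):
        return 1 + max((depth(p) for p in graph[commit]), default=0)
    return max(depth(c) for c in graph)

# Same history with the empty commit spliced out: c now points directly at b.
pruned = {k: v for k, v in parents.items() if k != "empty1"}
pruned["c"] = ["b"]

print(max_depth(parents), max_depth(pruned))
```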

It looks from the output like the repository includes a binary of VAL (see Blobs/maximum size and footnote 4). Shouldn't that be pruned by the conversion? Or is this based on a non-cleaned-up hg repository?
msg9455 (view) Author: florian Date: 2020-07-03.17:11:25
I used git-sizer for msg9444 and saw no significant differences between the repositories. For completeness' sake, here is the output of ./../git-sizer/git-sizer --threshold 0

Processing blobs: 28874                        
Processing trees: 37661                        
Processing commits: 12147                        
Matching commits to trees: 12147                        
Processing annotated tags: 0                        
Processing references: 657                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  12.1 k   |                                |
|   * Total size               |  3.33 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  37.7 k   |                                |
|   * Total size               |  49.4 MiB |                                |
|   * Total tree entries       |  1.22 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  28.9 k   |                                |
|   * Total size               |   225 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |   657     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  1.50 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   171     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  6.97 MiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  4.60 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |   220     |                                |
| * Maximum path depth     [6] |     6     |                                |
| * Maximum path length    [7] |    72 B   |                                |
| * Number of files        [8] |  1.50 k   |                                |
| * Total size of files    [9] |  10.6 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1]  dd349a6f7f19776bbfb25d432f681d91152e4891
[2]  140a15e7c437b712267a7ed1d95242c44217c81e
[3]  1da1890c77269a7c30cf4e745d815a9ca5a78760 (refs/heads/main:experiments)
[4]  271631eb6d7c7bd5fac7cd9ac7d3d765badb4b20 (9c2ff8dffd48e7ab58f9cae203d4e1140d8bef85:downward/validate)
[5]  9c89de671e544cb9753ff292ca3f81a3ff864f46 (refs/heads/main^{tree})
[6]  af0d711aaf96396d8b1ba71cd5122d7c4cf75fc6 (3746f02337a08355cb4c0a5d24470c4917c4ca86^{tree})
[7]  ec21e3c1f977abf9a360534ecf90e167cf140288 (6d2aa8c5d3fc942531536f6a323ac9d2898e36f6^{tree})
[8]  0644851e90592fdbfac7fbe69885cd850325d7e5 (82cb6d4d7516ede94b85fe150a3035dfae4884e6^{tree})
[9]  9d240a799d3efe530fafa4db4348dea9604a0301 (7b5c6b8c73b590594f440e5514c46b0d628bc1f9^{tree})


... and for the repo without empty commits:

Processing blobs: 28874                        
Processing trees: 37661                        
Processing commits: 10877                        
Matching commits to trees: 10877                        
Processing annotated tags: 0                        
Processing references: 657                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  10.9 k   |                                |
|   * Total size               |  2.99 MiB |                                |
| * Trees                      |           |                                |
|   * Count                    |  37.7 k   |                                |
|   * Total size               |  49.4 MiB |                                |
|   * Total tree entries       |  1.22 M   |                                |
| * Blobs                      |           |                                |
|   * Count                    |  28.9 k   |                                |
|   * Total size               |   225 MiB |                                |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |   657     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |  1.50 KiB |                                |
|   * Maximum parents      [2] |     2     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |   171     |                                |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  6.97 MiB |                                |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |  4.09 k   |                                |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |   220     |                                |
| * Maximum path depth     [6] |     6     |                                |
| * Maximum path length    [7] |    72 B   |                                |
| * Number of files        [8] |  1.50 k   |                                |
| * Total size of files    [9] |  10.6 MiB |                                |
| * Number of symlinks         |     0     |                                |
| * Number of submodules       |     0     |                                |

[1]  dd349a6f7f19776bbfb25d432f681d91152e4891
[2]  01d8a6d4ffcfdf935621bd90b91fdba051336048
[3]  1da1890c77269a7c30cf4e745d815a9ca5a78760 (refs/heads/main:experiments)
[4]  271631eb6d7c7bd5fac7cd9ac7d3d765badb4b20 (9c2ff8dffd48e7ab58f9cae203d4e1140d8bef85:downward/validate)
[5]  9c89de671e544cb9753ff292ca3f81a3ff864f46 (refs/heads/main^{tree})
[6]  af0d711aaf96396d8b1ba71cd5122d7c4cf75fc6 (f2b3ba73f838a1527c2f6bbf5b2d34f7940923e5^{tree})
[7]  ec21e3c1f977abf9a360534ecf90e167cf140288 (826823841e18bf7bb09efe4b3a48d0df7424acec^{tree})
[8]  0644851e90592fdbfac7fbe69885cd850325d7e5 (11d5fb6b66340f3637eea04a3be813fa6dd6b02c^{tree})
[9]  9d240a799d3efe530fafa4db4348dea9604a0301 (7b5c6b8c73b590594f440e5514c46b0d628bc1f9^{tree})
msg9454 (view) Author: malte Date: 2020-07-03.17:09:26
[This is the reply.]

OK, this does indeed seem to be related to packs. My understanding is that files can be available in packed state, unpacked state, or both. It looks like in one of the repositories, all files are available only in packed state, and in the other one it's a mixture. But this is something I don't currently know enough about and would like to read more about.

A clean comparison (but perhaps not the most relevant one) could be made if none of the files are packed because then the data format is very clear and deterministic.

I don't know enough about git to know if it's normal for most or all files to be packed or not, but I would assume that it makes sense for all "historical" files to be packed and only recent additions not to be. Anyway, I don't know a lot about this (yet), but would like to learn more about it anyway because I think it's useful git knowledge.
msg9453 (view) Author: malte Date: 2020-07-03.17:04:08
[This overlapped with Florian's message, so this is not a reply. I'll send a reply next.]

PS: There is also the issue of packs (see "git repack" and "git prune-packed"), and I'm not actually sure what the "best" way to measure repository size is. The git-sizer tool looks useful:

    https://github.com/github/git-sizer/

I don't care very much about pruning empty commits or not, but I would like to get to the bottom (or at least a bit closer to the bottom) of the question of what our repository size is, no matter whether we prune empty commits or not, because that seems to be a good thing to consider right now when we still have some chance to influence it.


@Florian: if you're interested, it might be interesting to look into these things together via zoom with a screen share so that we can pool ideas etc. But I don't have much time today, as I'm meeting friends at 19:00. It may have to wait until Monday. Would that be OK?
msg9452 (view) Author: florian Date: 2020-07-03.16:59:43
In the clone with empty commits, I get the following after garbage collection:

$ ls .git/objects 
info
pack

$ ls -la .git/objects/pack
-r--r--r-- 1 pommeren pommeren 2.2M Jul  3 16:25 pack-a12509d779c195adac8215625482cfd49a1e3a9f.idx
-r--r--r-- 1 pommeren pommeren  14M Jul  3 16:25 pack-a12509d779c195adac8215625482cfd49a1e3a9f.pack


For the clone without empty commits, I get:

$ ls .git/objects 
00  06  0c  12  18  1e  24  2a  30  36  3c  42  48  4e  54  5a  60  66  6c  72  78  7e  84  8a  90  96  9c  a2  a8  ae  b4  ba  c0  c6  cc  d2  d8  de  e4  ea  f0  f6  fc
01  07  0d  13  19  1f  25  2b  31  37  3d  43  49  4f  55  5b  61  67  6d  73  79  7f  85  8b  91  97  9d  a3  a9  af  b5  bb  c1  c7  cd  d3  d9  df  e5  eb  f1  f7  fd
02  08  0e  14  1a  20  26  2c  32  38  3e  44  4a  50  56  5c  62  68  6e  74  7a  80  86  8c  92  98  9e  a4  aa  b0  b6  bc  c2  c8  ce  d4  da  e0  e6  ec  f2  f8  fe
03  09  0f  15  1b  21  27  2d  33  39  3f  45  4b  51  57  5d  63  69  6f  75  7b  81  87  8d  93  99  9f  a5  ab  b1  b7  bd  c3  c9  cf  d5  db  e1  e7  ed  f3  f9  ff
04  0a  10  16  1c  22  28  2e  34  3a  40  46  4c  52  58  5e  64  6a  70  76  7c  82  88  8e  94  9a  a0  a6  ac  b2  b8  be  c4  ca  d0  d6  dc  e2  e8  ee  f4  fa  info
05  0b  11  17  1d  23  29  2f  35  3b  41  47  4d  53  59  5f  65  6b  71  77  7d  83  89  8f  95  9b  a1  a7  ad  b3  b9  bf  c5  cb  d1  d7  dd  e3  e9  ef  f5  fb  pack

$ ls -la .git/objects/pack
-r--r--r-- 1 pommeren pommeren 2.1M Jul  3 16:40 pack-f226f56ac61da073ac7184b8eb48c8105b88cf57.idx
-r--r--r-- 1 pommeren pommeren  14M Jul  3 16:40 pack-f226f56ac61da073ac7184b8eb48c8105b88cf57.pack

All of the other directories (00, 06, 0c, ...) contain ~40-50 files of roughly 200KB each.
msg9451 (view) Author: malte Date: 2020-07-03.16:48:21
Then it really does sound like something slightly fishy is going on.

I read up on git's storage system last year, and it's really fairly straightforward. When comparing two repositories with the same file contents (forming the same union over all commits), the set of filenames in .git/objects should really be the same because .git/objects is ultimately a hash map indexed by the (hash sum of the) file contents of the files. If these sets of filenames are wildly different between two logically identical repositories, there is something really fishy going on.
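
A minimal sketch of the "hash map indexed by content" idea for file contents (blobs), assuming a repository that uses git's default SHA-1 object format. The function name is made up for illustration.

```python
import hashlib

def blob_id(content):
    """Object ID of a blob: SHA-1 over "blob <size>", a NUL byte, and the content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Identical content always yields the same ID, regardless of the repository,
# the file name, or the commit it belongs to.
first = blob_id(b"same content\n")
second = blob_id(b"same content\n")
other = blob_id(b"different content\n")

print(first == second, first == other)
print(blob_id(b""))
```

The ID only determines the file name for objects stored unpacked; objects inside a pack do not show up as individual files.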

A discrepancy in the *size* of the files can potentially be explained by more/less successful delta compression. But if I understand you correctly, the difference is that one of the repositories has a lot *more* files? That's quite weird.
msg9450 (view) Author: florian Date: 2020-07-03.16:41:29
Expiring the reflogs did not affect the size, even when calling the garbage collection afterwards.
msg9449 (view) Author: malte Date: 2020-07-03.16:40:46
As long as the old (before pruning empty commits) commits are still reachable from your reflog, they are still alive and cannot be garbage-collected.

After pruning the empty commits, you need to make sure that your HEAD is pointed at a "new" commit (i.e. in the part of the DAG that references the state after pruning the empty commits). You also must make sure that none of the commits before the conversion are referenced in the reflog. Normally they would be because we were looking at them before the conversion, and the reflog tracks what we have recently been looking at.

You can expire the reflog using the info in msg9447. You can check that your reflog has been cleaned with "git reflog". (Ideally, run before and after expiring it to see the difference.)

Only when none of the pruned commits are reachable any more should we run garbage collection.
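
The reasoning above can be sketched as a toy mark-and-sweep over a commit graph. This is a simplified model with made-up commit names, not git's actual implementation.

```python
# Toy object graph: commit -> parents. "old*" is the history before pruning,
# "new*" the rewritten history. All names are made up.
parents = {
    "old1": [], "old2": ["old1"], "old3": ["old2"],
    "new1": [], "new2": ["new1"],
}

def reachable(roots):
    """Mark phase: everything reachable from the given roots via parent links."""
    seen, stack = set(), list(roots)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def collectable(branches, reflog):
    """Sweep phase: the commits a garbage collection would be allowed to delete."""
    return set(parents) - reachable(branches + reflog)

# The branch points at the new history, but the reflog still remembers old3,
# so nothing can be deleted yet.
print(sorted(collectable(branches=["new2"], reflog=["old3"])))
# After expiring the reflog, the old history becomes unreachable.
print(sorted(collectable(branches=["new2"], reflog=[])))
```

The sketch ignores one detail of real git: by default, "git gc" only deletes unreachable objects that are older than a grace period of two weeks (gc.pruneExpire), unless it is told otherwise with --prune=now.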
msg9448 (view) Author: florian Date: 2020-07-03.16:35:01
"git gc --aggressive" reduced the size of both of my repositories (with/without removing empty commits). The one without empty commits still has the additional files in .git/objects, so the new sizes for me are
 29MB (with empty commits)
 77MB (without empty commits)

I think it would make sense to call "git gc --aggressive" at the end of the conversion, but I also don't think it is necessary for us to dig down to the level of .git/objects. This seems to be internal state of the repository which goes up and down during the lifetime of a repository. The garbage collection will be called from time to time anyway if people work with the repository.
msg9447 (view) Author: malte Date: 2020-07-03.16:34:36
See also here:

https://stackoverflow.com/questions/49067898/git-remove-old-reflog-entries

Sorry for the spam.
msg9446 (view) Author: malte Date: 2020-07-03.16:21:55
(I wasn't clear: the reflog thing and the --aggressive recommendation are unrelated points. "git help gc" has some more info.)
msg9445 (view) Author: malte Date: 2020-07-03.16:21:13
Garbage collection won't remove revisions that are still reachable through your reflog. That is fixable. Not sure if cloning removes the reflog, but if it does, cloning should resolve this. It's also worth trying the --aggressive option.
msg9444 (view) Author: florian Date: 2020-07-03.16:18:24
I did a couple of tests with and without removing empty commits and with and without cloning after the conversion. I could not reproduce the effect that the repository got smaller after cloning.

Removing the empty commits actually made the repository bigger for me (83MB to 133MB). All additional files were in .git/objects and I suspect that this came out of the step removing the empty commits. I also tried calling "git gc" to remove unused data but it didn't change anything. Other than the additional files, the size seems to have gone down by ~300KB (measured with git-sizer and this matches the size change of .git/objects/pack).

All in all, I think removing the empty commits is a risky step: the effect on the repository size seems to be unpredictable and the tool used to remove them (git filter-branch) is known to be fragile and to have a tendency to mess things up. I would opt for leaving the empty commits in the repository. What do you think?
msg9430 (view) Author: patfer Date: 2020-07-03.11:41:50
The script is in good shape now. After multiple reviews, the current version
can be seen at
https://github.com/aibasel/convert-downward/pull/3

Only a final approval is missing and we can ship it.
msg9391 (view) Author: malte Date: 2020-07-01.20:26:21
Thanks, Jendrik! I'm done with my comments.
msg9389 (view) Author: jendrik Date: 2020-07-01.19:51:16
Here you go: https://github.com/aibasel/convert-downward/pull/3/files
msg9388 (view) Author: malte Date: 2020-07-01.19:44:56
I have a comment or two on the list. Do we have it in commentable form somewhere, e.g. as a pull request?
msg9366 (view) Author: florian Date: 2020-07-01.09:39:48
The impact of removing files is predicted well by their "packed" size (second column), so in this case, removing the first 7 files would only save us 1.1MB. It's probably better to leave them in. Since those were the largest files after the conversion, does that mean that we can treat the list of removed files as final?

Here is the current list, in case you want to check that it is not too aggressive:
https://github.com/aibasel/convert-downward/blob/master/data/downward_filemap.txt
msg9365 (view) Author: malte Date: 2020-06-30.19:36:31
I wouldn't go overboard with removing things from history for another megabyte or two. Once we've gone after the big fish, I think there is a point where it's best to call it done.

If you remove the ones you suggest, what is the impact on repository size?

Several of these involve some kind of breakage when removed. For example, the PDF files in "docs" are part of the documentation of these old versions, and I think they are referenced in READMEs etc. They are not incredibly valuable, but it's still not a common thing to retroactively break old repository revisions, and we should really only do it if there is a substantial benefit.

For example, if it's 54 or 57 KiB for the txt2tags changes, I'm not sure it's a good tradeoff. *Some* version of txt2tags is available elsewhere, sure, but removing them still means that the autodoc code in these old revisions that currently works (with the correct python versions etc.) will no longer work because txt2tags is expected to be there but not present, and there is no way to find out which version is needed.
msg9363 (view) Author: jendrik Date: 2020-06-30.18:13:53
I also think that the first seven entries can be removed. Also, we don't use cpplint anymore and can remove it. Txt2tags is available on the PyPI now, so we could remove our copy from the repo and install it on the buildbot (and in the tox environment that tests building the docs under different Python versions). Should I prepare a patch with that change?
msg9361 (view) Author: florian Date: 2020-06-30.17:03:33
The converted git repository has a size of 83.4 MB:
* 71.8 MB in .git (see below)
* 7.2 MB in experiments
* 3.6 MB in code
* 0.8 MB in driver, misc, etc.

In the history (.git) the largest space users are:

size  packed  location    (size and packed size in kB)
459   384     src/dist/data/doc/fast-downward.pdf
418   34      docs/prof_logistics_7-0_actual
417   34      docs/prof_logistics_7-0_v3
402   159     src/bugs/safety-net/search
400   158     src/bugs/ff-no-preconditions/search
395   44      docs/prof_logistics_v2
278   240     src/dist/data/doc/translator.pdf
231   57      misc/autodoc/external/txt2tags.py
231   57      misc/autodoc/external/txt2tags.py
231   57      misc/autodoc/external/txt2tags.py
189   54      misc/autodoc/external/txt2tags.py
177   54      misc/cpplint.py
130   38      misc/cpplint.py

Sorting by packed size shows more or less the same files in a different order and another profile (docs/prof_logistics_v3). The next largest files are actual code files and PDDL files from src/translate/regression-tests which we should keep.
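
As a quick check of what removing the first seven entries would save, the figures from the table above can be summed and re-sorted as follows. The numbers are copied from the table; nothing else is assumed.

```python
# (size kB, packed kB, path) for the first seven rows of the table above.
entries = [
    (459, 384, "src/dist/data/doc/fast-downward.pdf"),
    (418, 34, "docs/prof_logistics_7-0_actual"),
    (417, 34, "docs/prof_logistics_7-0_v3"),
    (402, 159, "src/bugs/safety-net/search"),
    (400, 158, "src/bugs/ff-no-preconditions/search"),
    (395, 44, "docs/prof_logistics_v2"),
    (278, 240, "src/dist/data/doc/translator.pdf"),
]

total_size = sum(size for size, _, _ in entries)
total_packed = sum(packed for _, packed, _ in entries)
print(f"unpacked: {total_size} kB, packed: {total_packed} kB")

# The same rows ordered by packed size instead of unpacked size.
for size, packed, path in sorted(entries, key=lambda e: e[1], reverse=True):
    print(f"{packed:4d}  {size:4d}  {path}")
```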

Should we remove anything from the list above? To me it looks like the first 7 entries could be deleted but as far as I know, we are still using txt2tags and cpplint, so we cannot remove them.




For reference, here is the script I used to generate the data:
https://stackoverflow.com/questions/10622179/
msg9143 (view) Author: patfer Date: 2020-01-10.16:14:51
The clean up script now also removes the branches issue323 & ipc-2011-fixes.
Thank you Jendrik for uploading the ipc tarball.

The cleanup script is ready for another review:
https://github.com/aibasel/convert-downward/pull/1/commits

The conversion script is mostly ready for review. Currently, all branches
closed in Mercurial are open. We can revisit this after we have decided on a Git
workflow (I suggest this for the next Fast Downward meeting):
https://github.com/aibasel/convert-downward/pull/2
msg9142 (view) Author: jendrik Date: 2020-01-08.22:53:22
Done.
msg9141 (view) Author: silvan Date: 2020-01-08.22:49:35
Good idea.
msg9140 (view) Author: malte Date: 2020-01-08.19:17:54
Works for me.
msg9139 (view) Author: jendrik Date: 2020-01-07.19:30:28
What do you think about exporting the latest revision on branch ipc-2011-fixes and uploading the tarball to http://www.fast-downward.org/IpcPlanners ?
msg9135 (view) Author: malte Date: 2019-12-24.14:06:44
Pruning the issue323 branch isn't a problem. That code can live somewhere else until the issue is completed.

Regarding the ipc-2011-fixes branch, if we remove it from the repository, we need to decide how/where to host it instead and what to do with the documentation page that relates to it (http://www.fast-downward.org/IpcPlanners). Those in favour of removal, what is your suggestion?
msg9130 (view) Author: patfer Date: 2019-12-20.10:19:30
I agree with Jendrik about removing the branches ipc-2011-fixes and issue323
from the master repository.

@Jendrik: They do not consume space; the size log is not up to date with the
excluded files. Here is an update:

rev   24:   2400  Renamed 'downward' directory to 'src'.
rev 5469:    116  Move small heuristics to subdirectory
rev  785:     96  added first testcases
rev  347:     92  Merged merge-and-shrink implementation by Raz
rev  754:     92  Moved merge-and-shrink stuff to a subdirectory.
rev 5386:     92  Move files according to new class names.
rev 4846:     84  Move driver out of src
rev    0:     80  moved everything to trunk
rev 1881:     80  Moved auto_doc script to misc/autodoc.
rev 5650:     76  Move utility files to subdirectories (no code chan...
rev  787:     72  added border cases
rev  964:     72  Added profiling information from comparison of ver...
rev 1142:     72  Moved PDB code to subdirectory.
rev 7846:     72  Replace merge dfp by a set of new classes.
msg9129 (view) Author: jendrik Date: 2019-12-19.23:57:15
@Florian: I think the branches you mention are merged into the default branch. So I guess the question is whether we want to rename them to the issueXXX format or not.

I'm also in favor of moving the ipc-2011-fixes branch out of the master repo  and propose to do the same for the issue323 branch.

Why does the gtest commit ("Weave gtest into codebase") still consume so much space even after we exclude the src/search/ext/gtest directory?
msg9123 (view) Author: florian Date: 2019-12-19.16:41:13
I don't mind much either way but the branches
  emil-new-integration
  hcea-cleanup
  issue133new
  issue329test
  raz-ipc-integration
all sound like things we planned to do but then gave up on or abandoned at some
time. If this is the case I would prefer if we could decide if we still want to
do the integration or not. In the first case, I would create an issue for it,
rebase the branches to that issue, push the issue branches to a private
repository and delete them from the main repository (e.g., deal with them in the
same way we deal with all other incomplete issue branches). In the second case,
I would delete them. But like I said, I don't mind if we decide to keep the
branches around.
msg9121 (view) Author: silvan Date: 2019-12-19.16:38:07
I would vote for removing the ipc-2011-fixes branch because I don't think we necessarily need to have IPC planners as part of the official repository.
msg9120 (view) Author: patfer Date: 2019-12-19.16:32:39
After sprint summary:
We have a version of the cleanup script which
- removes large chunks of files we do not want anymore
- corrects authors in commits
- fixes typos in branch names and merges branches issue133 and issue133new

If someone wants to get rid/clean up some of the remaining branches which are
not in line with our naming convention, please speak up.
If someone wants something else fixed, please speak up.
msg9096 (view) Author: patfer Date: 2019-12-12.15:21:03
Thank you, Malte, for the clarification: typos in branch names. I will fix
them. That is easy to do.
msg9095 (view) Author: malte Date: 2019-12-12.15:12:02
> The script does not rework on typos (Malte wanted to correct some messages).

No, we didn't want to change commit messages. I am confused where this idea came up. I think this makes no sense, there will always be errors in such messages.

What we discussed to clean up are:

- files we don't want to have any more; this includes unnecessary massive renames like the one that is now the largest commit by far
- the author names that you mention
- the misspelled branch names

The clearly misspelled branch names are issu139, issue-114, issue-149, issue-289. I am still in favour of fixing these. Besides these, there are a few other branch names that don't follow any special convention (emil-new-integration, hcea-cleanup, issue133new, issue329test, raz-ipc-integration), and a few that follow a largely non-existing convention (ijcai-2011, ipc-2011-fixes). They don't bother me, but I mention them in case they bother someone else. You can see the list of all branch names like this:

$ hg branches --closed | sort | less

and filter out the ones that follow our issueXYZ convention like this:

$ hg branches --closed | grep -vE '^issue[1-9][0-9]{0,2} ' | sort

Not all of these are wrong branch names. For example, "default" and "release-19.06" are fine.
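
For anyone who prefers to do the filtering outside the shell, here is a rough Python equivalent of the grep call. It works on plain branch names rather than on the output of "hg branches", and the list contains only names mentioned in this issue (issue133 and issue323 are included as examples of well-formed names).

```python
import re

# Branch names mentioned in this issue; not the full list from the repository.
branches = [
    "default", "release-19.06", "issue133", "issue323",
    "issu139", "issue-114", "issue-149", "issue-289",
    "emil-new-integration", "hcea-cleanup", "issue133new", "issue329test",
    "raz-ipc-integration", "ijcai-2011", "ipc-2011-fixes",
]

# Same pattern as in the grep call: "issue" followed by a number from 1 to 999.
ISSUE_BRANCH = re.compile(r"issue[1-9][0-9]{0,2}")

# Keep everything that does NOT follow the issueXYZ convention (like grep -v).
unconventional = sorted(b for b in branches if not ISSUE_BRANCH.fullmatch(b))
print("\n".join(unconventional))
```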
msg9094 (view) Author: patfer Date: 2019-12-12.14:04:20
Here is a PR and yes we can do comments.
https://github.com/aibasel/convert-downward/pull/1/files.
msg9093 (view) Author: silvan Date: 2019-12-12.14:00:37
I would like to leave a few comments on github; do you know how I can do so? Do we need a pull-request first?

Other than that, I think the script does what it should do. I don't see added value in correcting typos even if this could be done easily.
msg9092 (view) Author: patfer Date: 2019-12-12.12:54:56
A first script for rewriting history:
1. fixes the author names
2. removes files which should never have been added or which are large and
unused for too long.

The script does not rework on typos (Malte wanted to correct some messages). For
this, we would need to write our own mercurial extensions which I started on my
machine, but directly got issues (said something about mercurial version). I
think we should not do this. Everyone who has Mercurial can now simply run the
script. If we write our own extension, everyone has to add this to his/her
configuration and might need to fix version problems.

The script can be inspected on:
https://github.com/aibasel/convert-downward/tree/2019-12-Cleanup

The repo size decreases from 103 MB -> 28 MB

The most space is now used up by the history of:
1.4 MB Merge and shrink
1 MB Cegar

And the most space consuming commits are:

rev   24:   2400  Renamed 'downward' directory to 'src'.
rev 5844:    344  weave gtest into codebase
rev 5469:    116  Move small heuristics to subdirectory
rev  785:     96  added first testcases
rev  347:     92  Merged merge-and-shrink implementation by Raz
rev  754:     92  Moved merge-and-shrink stuff to a subdirectory.
rev 4846:     88  Move driver out of src
rev 5386:     88  Move files according to new class names.
rev    0:     80  moved everything to trunk
msg5326 (view) Author: malte Date: 2016-05-10.16:12:09
I think keeping a read-only clone of the old repository is a good idea, and it
only takes a few minutes to set that up. (I suggest we host it on bitbucket.)
Let's do that at the time that we perform the transition.
msg5325 (view) Author: atorralba Date: 2016-05-10.16:10:08
A relevant question is how to keep our repositories synchronized with the main 
FD repository after the history has been removed.

As suggested by Malte, it would be useful to have a script that removes the 
history from a repository to make it compatible with the new version. Also, 
does it make sense to keep a copy of the old repository (just before removing 
the history) so that people can update to that one before removing the history?
msg5301 (view) Author: malte Date: 2016-05-03.14:08:10
Gabi suggested removing the preprocessor before we rewrite the history. (This
might get done this week.) I've added her to the nosy list.
msg5300 (view) Author: jendrik Date: 2016-05-03.13:53:06
Sure, here they are (I'll attach the mentioned scripts to this issue):

===================================================================

Hi everyone,

we've discussed the idea of separating the benchmarks from the rest of
the Fast Downward repository a few times. One advantage of this would be
that we could make operations like cloning the repository faster
(especially over the network) and reduce space usage significantly. This
might also speed up certain operations where repository operations
happen under the hood (e.g. lab experiments, buildbot stuff).

I suggest that we defer the discussion of whether we want to perform
such a separation until a time when we're all back from holidays and
don't have so many urgent deadlines, but since I have a bit of time
right now, I already gathered some data.

Some basic size information:

$ hg clone ~/downward/master/ master-clone -U && du -sk master-clone
71076   master-clone
=> This is the pure repository size in KiB *without* a working
directory: around 70 MiB.

$ hg update -R master-clone && du -sk master-clone
4636 files updated, 0 files merged, 0 files removed, 0 files unresolved
358416  master-clone
=> This is repository size *with* a working directory: around 351 MiB.

$ (cd master-clone/src && ./build_all -j4) && du -sk master-clone
[...]
736948  master-clone
=> This is repository size with a working directory after the code has
been compiled (release mode only): around 720 MiB.

So, in summary:
- repository only, no working directory: 70 MiB
- with a clean working directory: 351 MiB
- after compiling for release mode: 720 MiB
(This is without the USE_LP option.)
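
The MiB figures in this summary come from dividing the `du -sk` outputs by 1024. A minimal sketch of that arithmetic follows; the function and variable names are illustrative and not part of any attached script, and only the three KiB values are taken from the commands above.

```python
# KiB values as printed by `du -sk` in the three commands above.
SIZES_KIB = {
    "repository only": 71076,
    "with working directory": 358416,
    "after release build": 736948,
}


def kib_to_mib(kib):
    """Convert KiB to MiB (1 MiB = 1024 KiB)."""
    return kib / 1024


def growth_steps(sizes_kib):
    """Return the MiB added by each step relative to the previous one."""
    values = list(sizes_kib.values())
    return [kib_to_mib(later - earlier)
            for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    for label, kib in SIZES_KIB.items():
        print(f"{label}: {kib_to_mib(kib):.1f} MiB")
    for added in growth_steps(SIZES_KIB):
        print(f"added by next step: {added:.1f} MiB")
```

The two growth steps correspond to the working directory and to the build artifacts, respectively.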

I think this is already some good news regarding the actual repository
size: the complete repository with several thousand revisions in it
takes only 70 MiB, while a checkout of a single revision adds roughly
281 MiB. This shows that Mercurial compresses the revisions well.

Of the 281 MiB added by the working directory, the main culprits are:
    263M    benchmarks
    15M     src

and drilling a bit more deeply into src, we have:
    1.3M    src/search/ext/boost
    1.8M    src/VAL
    8.1M    src/search/lp
so the bulk of the 15 MiB is due to our external dependencies.

So benchmarks make up more than 90% of the working directory, and
external dependencies make up more than half of what remains.

It's also worth looking at why space usage doubles after compilation. Our
compilation artifacts use up around 370 MiB of disk space. This is split as
follows:

    184M    src/search/.obj
    52M     src/VAL/*.o
    30M     src/search/downward-1
    30M     src/search/downward-2
    30M     src/search/downward-4
    19M     src/VAL/validate
    19M     src/validate
    4.7M    src/preprocess/.obj
    3.7M    src/preprocess/preprocess
    444K    src/search/Makefile.depend
    88K     src/VAL/lex.yy.cc
    4K      src/preprocess/Makefile.depend

(We might add another 228K for the translator's bytecode. While
build_all doesn't byte-compile, these will be present if we run the
planner.)

In cases where we are just interested in the output of compilation, but
not in the intermediate build artifacts, such as in lab's revision
cache, the .obj directories and everything in the VAL subdirectory are
dead weight. By running "make clean" in the preprocess and search
directories and "make distclean" in the VAL directory, we can eliminate
roughly 260 MiB, so more than two thirds, of the space usage. This would
leave only:

    30M     src/search/downward-1
    30M     src/search/downward-2
    30M     src/search/downward-4
    19M     src/validate
    3.7M    src/preprocess/preprocess

If we don't plan to run a debugger, we can again reduce these much
further by stripping the executables.

$ strip src/search/downward-{1,2,4} src/validate src/preprocess/preprocess

After which we only have:
    3.0M    src/validate
    2.7M    src/search/downward-1
    2.7M    src/search/downward-2
    2.7M    src/search/downward-4
    1.2M    src/preprocess/preprocess

So another reduction by almost 90%.
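
As a quick check of the "almost 90%" figure, the two listings above can be summed and compared. This is only a sketch: the dictionaries copy the sizes (in MiB) from the listings, and the helper name is made up for illustration.

```python
# Sizes in MiB, copied from the listings before and after running `strip`.
BEFORE_STRIP = {
    "src/search/downward-1": 30,
    "src/search/downward-2": 30,
    "src/search/downward-4": 30,
    "src/validate": 19,
    "src/preprocess/preprocess": 3.7,
}
AFTER_STRIP = {
    "src/search/downward-1": 2.7,
    "src/search/downward-2": 2.7,
    "src/search/downward-4": 2.7,
    "src/validate": 3.0,
    "src/preprocess/preprocess": 1.2,
}


def reduction_percent(before, after):
    """Percentage by which the total size shrinks from `before` to `after`."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    return 100 * (1 - total_after / total_before)


if __name__ == "__main__":
    print(f"{reduction_percent(BEFORE_STRIP, AFTER_STRIP):.1f}% smaller")
```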

If I understand the various debug facilities correctly, stripping would
have no negative effect unless we start using something like gdb, so I
think this is something else we ought to be doing in lab experiments
(or, alternatively, change the compile flags in this case so that we get
stripped executables from the start).

I also had a look at what happens to executable sizes if we link
dynamically instead of statically. We get the following file sizes
before stripping:
    29M     src/search/downward-1
    29M     src/search/downward-2
    29M     src/search/downward-4
    18M     src/validate
    2.5M    src/preprocess/preprocess

and these file sizes after stripping:
    1.7M    src/validate
    1.6M    src/search/downward-1
    1.6M    src/search/downward-2
    1.6M    src/search/downward-4
    176K    src/preprocess/preprocess

So the difference percentage-wise is small if we don't strip the
executables, but quite substantial (close to another 50% reduction) if
we do.

Cheers,
Malte

===================================================================

On 12.03.2014 16:46, Florian Pommerening wrote:
> Hi Malte,
>
> I just wanted to add that the size for build artifacts and executables
> will be reduced to roughly a third once we merge issue214.
>
> Cheers
> Florian

Yes, I had that in the back of my mind and am looking forward to it. :-)

We currently have something like:

  370 MiB  total build artifacts
  112 MiB  after "make clean" for our code; "make distclean" for VAL
           (keeping the copy of the VAL executable in ~/src)
   11 MiB  after stripping

After issue214, we should have roughly:

  147 MiB  total build artifacts
   53 MiB  after "make clean" for our code; "make distclean" for VAL
    7 MiB  after stripping

Cheers,
Malte


===================================================================

Hi again,

the previous message was more of a detour from the original plan.

What I actually wanted to do is measure which things contributed to the
actual repository size (not the working directory), since this is what
limits the speed of clone operations over the network etc.

So this email is only about the 70 MiB pure repository size. I wrote a
script (attached) which measures which commits to the repository added
how much to the overall size.

The following list shows which changesets contributed 100K or more to
the overall repository size. The middle column shows how much the
repository grew with the given changeset (in KiB); the third column
shows the start of the log message for that changeset.

rev    1:  14480  moved everything to trunk
rev  653:  10168  Renamed 'downward' directory to 'src'.
rev  392:   7192  Added lp solver and change Makefile to do static l...
rev 2774:   5560  add ipc 2011 domains
rev  106:   4148  trunk benchmarks:
rev  652:   3800  Added IPC 2008 domains (apart from the humungous c...
rev  103:   2380  Added openstacks-strips to trunk benchmarks.
rev 1472:   1176  Added Michael's implicit abstractions code.
rev   18:   1020  benchmarks: added propositional IPC5 domains (rove...
rev 2230:   1008  created experiments directory and added first expe...
rev 1566:    816  added boost dependencies
rev 2391:    668  added ga-experiments with cost partitioning
rev 2393:    664  added ga-experiments without relevant vars detection
rev   55:    624  Renamed "val" directory to "VAL". Firstly, because...
rev   51:    620  Added val source distribution to repository.
rev 1447:    616  Benchmarks from the 2009 ASP competition
rev 2327:    596  Removed some old experiemnts, added new ones.
rev 2370:    552  Added results of gapbd experiments.
rev   19:    508  benchmarks: added airport-adl
rev 2368:    396  added pdb experiments v4 and v4x
rev   98:    292  pathways-noneg:
rev  122:    288  Moved scripts for merge-and-shrink, hcea and lmcut...
rev 2361:    284  Added results of two ipdb experiments.
rev   20:    280  benchmarks: added decoded variants of "mystery" an...
rev 2483:    264  added gapdb-experiment for compairison with ipdb-r...
rev  118:    248  Recreated trunk from everything.
rev 2243:    224  experiments
rev 2348:    216  Added pdb experiment version 3, MatchTree is worki...
rev 2283:    200  experiment pdb v2 results added
rev 2718:    180  Hopefully fixed issue295.
rev 2293:    176  added experiment pdb v1 withouth search
rev 2337:    172  replaced old experiment by new one
rev 2359:    172  added pdb experiment version 3, with the introduct...
rev 2462:    164  added gapdb-comparison experiments
rev 2282:    148  Added experiment v5 und v6.
rev 2235:    144  Added experiments.
rev  204:    132  Added selective-max.
rev 2339:    132  Added results of an experiment.
rev 2388:    132  new hhh-experiments (for comparison with old hsp_f...
rev 2474:    132  added more gapdb configuration test experiments
rev 1449:    124  Recommit of ASP's Airport and Sokoban
rev 2087:    124  started
rev 2319:    124  Added new experiment results.
rev 2375:    120  Added results of HHH_ipdb experiments.
rev 2456:    120  Added results of pdbs experiment.
rev 2489:    120  Added results of ipdb experiment.
rev 2445:    116  Added results of pdbs experiment.
rev  743:    112  VAL: Updated to most recent version (4.2.07 plus some
rev 2463:    112  Added results of hhh experiment.
rev 2488:    112  Added results of pdbs experiment.
rev 3613:    108  Removed scripts made redundant by updated buildbot...
rev 2112:    100  added first testcases
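
A list in this format is easy to post-process. The sketch below is not the attached script (whose contents are not reproduced here); it only assumes the "rev N: KiB message" layout shown above, and the function names are hypothetical. It parses the lines and sums the growth, optionally restricted to log messages containing a keyword.

```python
import re

# Matches lines such as "rev  653:  10168  Renamed 'downward' directory to 'src'."
LINE_RE = re.compile(r"^rev\s+(\d+):\s+(\d+)\s+(.*)$")

SAMPLE = """\
rev    1:  14480  moved everything to trunk
rev  653:  10168  Renamed 'downward' directory to 'src'.
rev  392:   7192  Added lp solver and change Makefile to do static l...
rev 2774:   5560  add ipc 2011 domains
"""


def parse_growth_lines(lines):
    """Return (revision, growth in KiB, log message) for each matching line."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            rev, kib, message = match.groups()
            entries.append((int(rev), int(kib), message))
    return entries


def total_kib(entries, keyword=None):
    """Sum the growth, optionally only for messages containing `keyword`."""
    return sum(
        kib for _rev, kib, message in entries
        if keyword is None or keyword.lower() in message.lower()
    )


if __name__ == "__main__":
    parsed = parse_growth_lines(SAMPLE.splitlines())
    print(total_kib(parsed), "KiB in total for the sample lines")
    print(total_kib(parsed, "domains"), "KiB from commits mentioning 'domains'")
```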

The first commit was the largest one and is hard to interpret because it
included a mixture of code, benchmarks, etc. I hope that it would have
been much smaller without the benchmarks.

The second-largest one is surprising to me since I would have thought
that Mercurial would handle such a renaming more gracefully. Maybe it
has something to do with the fact that we originally renamed in
Subversion, and maybe there is something that can be done about this
space usage.

Number 3 is about the LP solver, which is a large chunk of code and
hence not very surprising.

The remaining ones are mostly about adding benchmarks and some things
that probably shouldn't be part of our history. Michael's
implicit-abstractions code was never merged, but apparently uses a large
chunk of space. Presumably we added it at some point and then deleted it
again.

Also, there are many commits about experiments related to the iPDB
implementation. When we merged iPDB, in retrospect we should have been
careful only to include the code, but not all the experiment data. (Note
that it's the *data* for the experiments that uses so much space here.
We now add the experiment *scripts*, but those are orders of magnitude
smaller.)

So long story short, there seems to be scope for cleanup here if we ever
want to go to that trouble.

Cheers,
Malte

===================================================================

On 12.03.2014 17:04, Malte Helmert wrote:

> The second-largest one is surprising to me since I would have thought
> that Mercurial would handle such a renaming more gracefully. Maybe it
> has something to do with the fact that we originally renamed in
> Subversion, and maybe there is something that can be done about this
> space usage.

OK, I investigated this a bit more closely by performing some
experiments and learning a bit about Mercurial's repository layout.

The size increase due to renaming is indeed by design: if a file changes
its name, hg will store it twice in the repository. The reason we added
10 MiB (around 1/7 of the overall repository size!) with this single
renaming from "downward" to "src" is that the directory contains some
large files (such as the LP solver tarball) that end up being stored twice.

One nice thing about Mercurial's repository layout is that it's quite
easy to see how much space is taken up by what by simply running "baobab
.hg" and exploring. This also shows how much garbage the repository has
accumulated over the years. :-) I think if we removed the benchmarks and
external dependencies and cleaned up some cruft (scripts, new-scripts
and code that ended up in the repository by historical accidents, such
as experimental results and the implicit-abstraction code), we should be
able to cut things down from 70 MiB to 7-8 MiB. I guess it's a
discussion for another time whether this would be a good or bad idea.
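
For a scriptable alternative to browsing with baobab, the sketch below sums file sizes under `.hg/store/data` per top-level directory. It assumes the usual Mercurial layout in which per-file history lives below that directory. Mercurial escapes some characters in store paths, so directory names may not match the working directory exactly, and very long paths may be stored elsewhere in the store. The function name is made up for illustration.

```python
import os
from collections import Counter


def store_usage_by_top_level(repo_root):
    """Sum file sizes (bytes) under .hg/store/data, grouped by top-level entry.

    Assumption: `repo_root` is a Mercurial repository using the standard
    store layout. Returns an empty Counter if the directory does not exist.
    """
    data_dir = os.path.join(repo_root, ".hg", "store", "data")
    usage = Counter()
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        rel = os.path.relpath(dirpath, data_dir)
        # Files directly in data/ are grouped under a placeholder key.
        top = "(root)" if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            usage[top] += os.path.getsize(os.path.join(dirpath, name))
    return usage


if __name__ == "__main__":
    for name, size in store_usage_by_top_level(".").most_common(10):
        print(f"{size / 1024:10.0f} KiB  {name}")
```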

Cheers,
Malte

===================================================================
msg5297 (view) Author: malte Date: 2016-05-03.13:43:35
I wrote some emails about this long ago. They included discussion of some
nonobvious space wasters, I think. Do you still have these emails? If yes, can
you paste them here? If not, I can also try to find them.
msg5296 (view) Author: jendrik Date: 2016-05-03.13:42:03
Now that the benchmarks are gone from the repo (issue581) and VAL is soon to be 
removed (issue651), we can think about making the repository smaller by rewriting 
its history. This should make the repository about 10x smaller and thus speed up 
cloning.
History
Date User Action Args
2020-08-06 14:22:50 patfer set status: reviewing -> resolved
2020-08-06 13:45:56 malte set messages: + msg9711
2020-08-06 13:22:08 patfer set messages: + msg9710
2020-08-06 13:16:57 florian set messages: + msg9709
2020-08-06 13:06:06 malte set messages: + msg9708
2020-07-29 09:40:59 malte set messages: + msg9697
2020-07-29 09:27:36 patfer set messages: + msg9696
2020-07-28 19:37:47 malte set messages: + msg9687
2020-07-28 19:11:55 malte set messages: + msg9686
2020-07-28 15:46:34 patfer set messages: + msg9684
2020-07-28 14:20:29 florian set messages: + msg9683
2020-07-27 22:07:05 silvan set messages: + msg9682
2020-07-27 18:57:57 malte set messages: + msg9680
2020-07-27 17:23:18 patfer set messages: + msg9679
2020-07-27 17:22:05 malte set messages: + msg9678
2020-07-23 12:29:10 patfer set messages: + msg9669
2020-07-20 23:11:16 florian set messages: + msg9661
2020-07-20 15:08:32 malte set messages: + msg9660
2020-07-20 13:32:04 malte set messages: + msg9659
2020-07-20 10:37:57 patfer set messages: + msg9657
2020-07-18 21:49:11 florian set messages: + msg9656
2020-07-18 15:57:39 malte set messages: + msg9655
2020-07-18 15:51:55 florian set messages: + msg9654
2020-07-18 15:35:11 malte set messages: + msg9653
2020-07-18 15:30:24 malte set messages: + msg9652
2020-07-17 23:10:52 florian set messages: + msg9651
2020-07-17 13:58:30 patfer set messages: + msg9646
2020-07-16 17:07:25 malte set status: resolved -> reviewing
messages: + msg9640
2020-07-16 13:27:58 patfer set messages: + msg9639
2020-07-16 12:39:05 malte set messages: + msg9635
2020-07-09 14:52:50 malte set messages: + msg9563
2020-07-09 11:21:32 patfer set status: in-progress -> resolved
messages: + msg9547
2020-07-08 12:45:05 malte set messages: + msg9536
2020-07-08 10:52:58 silvan set messages: + msg9534
2020-07-08 10:50:26 malte set messages: + msg9533
2020-07-08 10:42:13 silvan set messages: + msg9532
2020-07-07 22:50:18 malte set messages: + msg9531
2020-07-07 17:18:24 malte set messages: + msg9530
2020-07-07 12:15:21 malte set messages: + msg9519
2020-07-07 05:00:21 florian set messages: + msg9511
2020-07-07 01:51:14 malte set messages: + msg9510
2020-07-07 00:36:09 malte set messages: + msg9509
2020-07-06 13:37:16 patfer set messages: + msg9479
2020-07-06 12:49:16 patfer set messages: + msg9478
2020-07-03 19:39:06 malte set messages: + msg9472
2020-07-03 19:33:07 jendrik set messages: + msg9471
2020-07-03 19:32:50 florian set messages: + msg9470
2020-07-03 19:29:20 florian set messages: + msg9469
2020-07-03 19:20:39 malte set messages: + msg9468
2020-07-03 19:11:29 malte set messages: + msg9467
2020-07-03 19:00:16 florian set messages: + msg9466
2020-07-03 18:29:35 malte set messages: + msg9465
2020-07-03 18:23:11 malte set messages: + msg9464
2020-07-03 18:14:30 florian set messages: + msg9463
2020-07-03 18:09:06 malte set messages: + msg9462
2020-07-03 18:07:42 florian set messages: + msg9461
2020-07-03 18:03:49 malte set messages: + msg9460
2020-07-03 17:57:53 malte set messages: + msg9459
2020-07-03 17:41:02 florian set messages: + msg9458
2020-07-03 17:19:13 malte set messages: + msg9456
2020-07-03 17:11:25 florian set messages: + msg9455
2020-07-03 17:09:26 malte set messages: + msg9454
2020-07-03 17:04:08 malte set messages: + msg9453
2020-07-03 16:59:43 florian set messages: + msg9452
2020-07-03 16:48:21 malte set messages: + msg9451
2020-07-03 16:41:29 florian set messages: + msg9450
2020-07-03 16:40:46 malte set messages: + msg9449
2020-07-03 16:35:01 florian set messages: + msg9448
2020-07-03 16:34:36 malte set messages: + msg9447
2020-07-03 16:21:55 malte set messages: + msg9446
2020-07-03 16:21:13 malte set messages: + msg9445
2020-07-03 16:18:24 florian set messages: + msg9444
2020-07-03 11:41:50 patfer set messages: + msg9430
2020-07-01 20:26:21 malte set messages: + msg9391
2020-07-01 19:51:16 jendrik set messages: + msg9389
2020-07-01 19:44:56 malte set messages: + msg9388
2020-07-01 09:39:48 florian set messages: + msg9366
2020-06-30 19:36:31 malte set messages: + msg9365
2020-06-30 18:13:54 jendrik set messages: + msg9363
2020-06-30 17:03:34 florian set messages: + msg9361
2020-01-10 16:14:52 patfer set messages: + msg9143
2020-01-08 22:53:22 jendrik set messages: + msg9142
2020-01-08 22:49:35 silvan set messages: + msg9141
2020-01-08 19:17:54 malte set messages: + msg9140
2020-01-07 19:30:29 jendrik set messages: + msg9139
2019-12-24 14:06:44 malte set messages: + msg9135
2019-12-20 10:19:31 patfer set files: + repo_stats_20_12_20.txt
messages: + msg9130
2019-12-19 23:57:15 jendrik set messages: + msg9129
2019-12-19 16:41:13 florian set messages: + msg9123
2019-12-19 16:38:07 silvan set messages: + msg9121
2019-12-19 16:32:39 patfer set messages: + msg9120
2019-12-12 15:21:03 patfer set messages: + msg9096
2019-12-12 15:12:02 malte set messages: + msg9095
2019-12-12 14:04:20 patfer set messages: + msg9094
2019-12-12 14:00:37 silvan set messages: + msg9093
2019-12-12 12:54:56 patfer set messages: + msg9092
2019-12-11 11:12:20 patfer set status: chatting -> in-progress
nosy: + patfer
assignedto: patfer
2016-05-10 16:12:09 malte set messages: + msg5326
2016-05-10 16:10:08 atorralba set nosy: + atorralba
messages: + msg5325
2016-05-03 14:08:10 malte set nosy: + gabi
messages: + msg5301
2016-05-03 13:58:59 silvan set nosy: + silvan
2016-05-03 13:54:17 jendrik set files: + README
2016-05-03 13:54:09 jendrik set files: + generate_repo_stats.sh
2016-05-03 13:53:59 jendrik set files: + analyze_repo_stats.py
2016-05-03 13:53:06 jendrik set messages: + msg5300
2016-05-03 13:50:47 florian set nosy: + florian
2016-05-03 13:43:35 malte set status: unread -> chatting
messages: + msg5297
2016-05-03 13:42:03 jendrik create