Routing Working Group
RIPE 83
23 November 2021
1 p.m. (UTC + 1)
JOB SNIJDERS: I think the clock chimes truth at it being noon UTC, aka the start of the Routing Working Group.
Welcome, everyone. This is the RIPE 83 Routing Working Group session. The Routing Working Group concerns itself with all aspects of IP routing technology. In this slot, we have four presentations, all very exciting content. But let's first pull up the slides with our administrivia.
If you have a question for the presenter, please use the Q&A function. Otherwise, if you want to comment on the content as it streams past on your screen, use the chat. Please be polite and respectful to each other.
All questions will be read out loud so that the scribes can also catch them ‑ Paul will be doing that.
Our first presentation is from Pavlos Sermpezis. I fear I did not pronounce that name correctly. Pavlos will be sharing a perspective on estimation of the impact of BGP hijacks. Pavlos, welcome and I look forward to your presentation.
PAVLOS SERMPEZIS: Thank you for welcoming me, and you pronounced my last name quite well. So, let me share my screen, and... hello everybody. I am a researcher at the DataLab at the Aristotle University of Thessaloniki in Greece, and today I'll talk to you about recent work we did on how to estimate the impact of an ongoing hijack. This presentation is based on a research paper, which you can see at the bottom of the slide, about how to estimate this impact with measurements, what problems appear when these measurements come from public Internet measurement infrastructure like RIPE RIS or RouteViews, and what we can do to avoid these problems.
So I suppose that all of you know what BGP is. Let's say that we have a network that we operate, that's at the bottom here, and we announce our IP prefixes and everyone on the Internet, all the ASes here, send their traffic to us. They become blue because that's the colour of our network. Then a malicious network comes along, it does some kind of hijack, and it attracts traffic from some networks. We denote these networks in black here.
So what do people do in order to defend against hijacks? At first, they detect them, and this is done using public monitoring infrastructure like RIPE RIS or RouteViews, where we can see the BGP paths that are propagated. And there has been a lot of research and tools that can do real-time detection, and it's a bit of self-advertising because that was done by my previous group. But when you detect a hijack, what do you do next? Then you have to proceed to mitigation. And in terms of mitigation, there is not only a single action you can take. You can do some filtering, or call some people, or do the deaggregation yourself, or ask someone else to announce your prefixes, or pay someone else to blackhole, etc. And since these different actions have different costs, there is a decision you have to make: Which action should I take? And let's say you chose an action, was this enough? Is the hijack now mitigated? How do you know, and should you do something more?
Now, in order to answer this question, we propose that someone needs to know the impact of the hijack: the impact before the mitigation, so they can tell what is a good cost-efficiency trade-off for the different mitigation measures, and after the mitigation, so they can see what the effects of the mitigation are.
So, this is the problem we investigate: to estimate the impact of a hijack. In terms of impact, you can define it in different ways. A very simple definition, which we start from, is that the impact is the number of ASes that are infected by the hijack. So, for example, here the impact would be 3 because there are 3 ASes that are infected by the hijack.
So, in this research, what we did was we tried to estimate the impact of an ongoing hijack through measurements, and to find out whether the estimations we can do with public Internet monitoring infrastructure, like RIPE route collectors, are accurate, and, second, if they are not accurate, how we can make them more accurate. To the best of our knowledge, this topic has not been studied before, and we are not aware of any tools that try to do an estimation of hijack impact.
So let's see an example.
We have the same example as before, and the actual impact of the hijack is 3 over the entire population of ASes, so in this case it would be 3, because there are 3 black ASes, over 8, because there are 8 ASes in total. However, if we have monitors, i.e. peers that provide feeds to route collectors, in only 3 of the ASes, because we don't have monitors everywhere, the impact that we would estimate would be 1 over 3. And as we can see, the actual value and the estimated value differ, and this is the estimation error. So, our first objective is to quantify how large this error is in general. So what we did was run different experiments, and in this plot the X axis is the number of monitors that we use and the Y axis is the error; the higher, the worse. So as we can see, with public infrastructure, the continuous lines, the error never drops below 9 or 10%, depending on the infrastructure you are using. However, if we could do random sampling and measure any AS we would like, we would see a much lower error. So why is there this gap? The answer is that there is a bias in the locations of the public infrastructure, and this bias can be in geographic location or network size.
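To make the error concrete, here is a minimal sketch (in Python, not from the paper) of the gap between the true impact and the impact seen only from monitor ASes; the AS sets are toy data standing in for the example on the slide.

```python
# A toy illustration (not from the paper): the "true" impact uses every AS,
# the estimate only sees the ASes that host route-collector monitors.
# All AS sets below are hypothetical stand-ins for the slide's example.

def impact(infected: set, visible: set) -> float:
    """Fraction of the visible ASes that route towards the hijacker."""
    return len(infected & visible) / len(visible)

all_ases = {f"AS{i}" for i in range(1, 9)}   # 8 ASes in total
infected = {"AS2", "AS5", "AS7"}             # 3 ASes attracted by the hijacker
monitors = {"AS1", "AS2", "AS4"}             # 3 ASes feeding a route collector

true_impact = impact(infected, all_ases)     # 3/8 = 0.375
estimated   = impact(infected, monitors)     # 1/3 ≈ 0.333
print(f"true={true_impact:.3f} estimated={estimated:.3f} "
      f"error={abs(estimated - true_impact):.3f}")
```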
So the problem here is that we would like to reduce this error, but we can only measure with the public infrastructure. The next question is how we can improve this, and in order to improve it, we tried two different approaches.
In the first approach, we tried to use our own measurement infrastructure and, in particular, we propose to do ping measurements. Why ping? Because we can ping any AS we want, and in this way we can do random sampling, which, as we saw before, gives the best performance. However, in this case, we should be careful about ping failures. So, in this approach, we propose the methodology that I will show next.
The second approach is to still use public infrastructure measurements, but to try to identify the correlations and the bias and remove them.
So in the first approach, the methodology that we tried is quite simple. The first step is to find pingable IP addresses for every AS, which means IP addresses that answer to pings, and we got them from an IP hitlist. Then we ping multiple IP addresses per AS, for redundancy, and if at least one of these pings is replied to, then we infer that the AS is not affected by the hijack, which is an assumption, but it's the first step.
So, what we have seen is that, by doing this, we can approach the best-in-theory performance, because we actually do random sampling. And, in fact, some findings from our results are that we don't have to ping a lot of addresses, two addresses for each AS we measure are enough, and we only have to measure a few hundred ASes; we don't have to measure everything for a good estimation.
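A rough sketch of how such a ping-based estimator could look, assuming a hitlist that maps ASNs to pingable addresses and the simple rule from the talk (an AS counts as unaffected if at least one of its addresses replies); real inference would need more care than this.

```python
# A rough sketch of the ping-based estimator, under assumptions: `hitlist`
# maps an ASN to a few pingable addresses, an AS counts as unaffected if at
# least one address replies, and plain ICMP echo stands in for the real
# measurement setup described in the paper.
import random
import subprocess

def responds(ip: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo; True if the address replied."""
    result = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), ip],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def estimate_unaffected_fraction(hitlist: dict, sample_size: int = 300,
                                 addresses_per_as: int = 2) -> float:
    """Randomly sample ASes and ping up to two addresses in each of them."""
    sample = random.sample(list(hitlist), min(sample_size, len(hitlist)))
    unaffected = sum(
        any(responds(ip) for ip in hitlist[asn][:addresses_per_as])
        for asn in sample)
    return unaffected / len(sample)

# hitlist = {13030: ["192.0.2.1", "192.0.2.2"], ...}   # hypothetical hitlist data
# print(f"unaffected: {estimate_unaffected_fraction(hitlist):.2%}")
```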
Now, in the second approach, we use the public measurement infrastructure plus machine learning in order to remove this bias. So what we do is we collect measurements from past hijacks, we train a model, and when a new hijack comes, we collect measurements and estimate its impact.
So, the results are shown in this graph. With the continuous line you can see the error with public infrastructure, the one I showed you before, and with the line with circles you can see the improved performance when using this machine learning model. The finding here is that this model, or some improved versions of it, could eliminate the bias, and we have shown that we don't need a lot of past data to train these models.
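As an illustration only, not the authors' model, the bias-removal idea could be prototyped with an off-the-shelf regressor trained on past hijacks where the true impact is known; the feature encoding below (one infection flag per monitor) and the synthetic training data are assumptions made for the sketch.

```python
# An illustration only, not the authors' model: learn a correction from past
# hijacks where the true impact is known. The per-monitor infection flags as
# features and the synthetic training data are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# X: one row per past hijack, one column per monitor (1 = monitor was infected).
X_train = rng.integers(0, 2, size=(200, 50))
# y: true impact of each past hijack; synthetic here, measured in a real study.
y_train = X_train.mean(axis=1) * 0.8 + rng.normal(0, 0.02, size=200)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

new_hijack_view = rng.integers(0, 2, size=(1, 50))    # monitor view of a new event
print(f"corrected impact estimate: {model.predict(new_hijack_view)[0]:.2%}")
```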
So, summarising:
What we have seen in our work is that estimations of hijack impact from public monitoring infrastructure are biased; they have around 10% error. And what you can do about this is, first, use pings from your own network, where a few hundred pings could give you a 2% error, or, if you stay with the public infrastructure, do a smart removal of the bias with machine learning.
In fact, this second thread is part of our ongoing work on the AI4NetMon project, which is also funded by the RIPE NCC and whose goal is to measure and reduce the bias of monitoring infrastructure for different use cases, not only for hijack impact estimation. And what we would like from you is feedback: tell us how useful you find the proposal. Of course, I showed you a high-level version here; you can find more information in the paper, or just contact us.
And second, to propose to us other important use cases, apart from hijacking, where you would like us to study the bias, whatever is important for you. For this, we have prepared a survey as part of our project, a ten-minute survey, something like this, where you can provide us feedback.
And, of course, you can contact us, and a relevant event starts on Thursday, the RIS Jam, which is about the RIS infrastructure and what you can and cannot do with it. And with this, I would like to thank you for listening to this talk.
JOB SNIJDERS: Thank you so much for this wonderful presentation. I see somebody standing in the microphone queue. Randy?
RANDY BUSH: So, can you tell ‑‑ first of all, thank you for the presentation. Two good approaches and quite interesting. Do you know ‑‑ have you learned anything about the topology of the "blast radius"; in other words, depending upon where the attack is located, what shape within the topology, and, you know, does it get past tier 1s, you know, questions like that, are you starting to look at that? It would be very interesting to know. Thank you.
PAVLOS SERMPEZIS: Thanks a lot. I hear some echo. So, that's a very interesting comment and we could dive deeper into this. We have some plots in the paper; we have seen that attacks from Europe to Latin America have a different estimation error than Europe to Europe, and things like this, which is more or less expected. And we have some initial results on which monitors give a more unbiased view, whether they are in tier 1 ISPs or not, but in the paper these are really preliminary results, and these are some of the topics we investigate in our current project. But in any case, thanks for the feedback and I think it's interesting to investigate more. I don't have more insights, or useful insights, to tell you now about this.
JOB SNIJDERS: Let's allow another few seconds to see if more people have questions or feedback.
PAUL HOOGSTEDER: No questions.
JOB SNIJDERS: All right. Pavlos, thank you so much for sharing your insights.
PAVLOS SERMPEZIS: Certainly. Thank you for inviting me. Bye‑bye.
JOB SNIJDERS: Next up we have Ties de Kock from an organisation called RIPE NCC, who will be sharing with us an update on RPKI and other related infrastructure. The floor is yours.
TIES DE KOCK: I really want to share my screen and I think this window is the one you guys want. Okay. Hi. My name is Ties de Kock. I'm from the RIPE NCC and I'll give an update on our work for this last period. I'll start with a very short update about the items on the roadmap that we worked on.
First of all, we did a review of the certification authority software and a penetration test of the systems; for this test, the relevant parts of the portal, RIPE NCC Access, the dashboards and other endpoints were in scope. We aim to publish the source code review and the report by the end of December, and we are preparing to open source the CA code. We also did some work that should be completely invisible to you all; namely, we are migrating the offline trust anchor key, the key whose public key is embedded in the trust anchor locator, to a new HSM in early December. For this transfer, three out of five admin cardholders for the old HSM and all five admin cards of the new HSM need to be present in person, so you can imagine this was kind of hard to set up.
Finally, over the next few sheets I'll give you details of our work on the RPKI repositories and an overview of our monitoring and alerting setup, as I think we promised to the Routing Working Group.
So, let me recap how we used to deploy rrdp.ripe.net. Rrdp.ripe.net was deployed when RPKI adoption was still very low, in early 2016. It was first seen in certificates on the 12th of January 2016; I looked it up yesterday, to be honest. And it was a single instance deployed on AWS Elastic Beanstalk behind a CDN. In parallel to publishing to RRDP, the CA software publishes to rsync, which runs in two data centres behind a load balancer. By now, the majority of relying party instances use RRDP. And while the publication server software was updated over time, the deployment was never revisited. Meanwhile, RPKI usage grew quite a bit, with monthly traffic being 26 gigabytes in January 2018, 2.5 terabytes a month in January 2020 and 53 terabytes last month. So that's serious growth, I would say.
Over the last period, we revisited how publication works in the CA software, and one of the things you may have heard about is that we changed how we write our rsync repositories. We also added support for publishing to multiple publication servers, which need to have the same external URL because that URL is in the certificates. And, of course, we also monitor that the same files are present in the CA software, in the various rsync servers and in the various publication servers.
We improved the publication server software and deployed it in the RIPE NCC network. This is a direct result of the concerns about RIPE NCC using Cloud technologies. We deployed the publication server ‑‑ two instances of the publication server in two data centres behind the load balancers, and we made them as independent as possible so they have separate database servers, and different session IDs, and, because the session IDs are different, only one can be active at a time. Finally, the instance in AWS is still available as a warm standby and this is our new setup for the rrdp.ripe.net.
Now, let's move on to monitoring and alerting.
As you may know, RPKI is a critical service and we have an engineer on call for it 24/7. We defined four priorities for the alerts. High priority alerts are sent to the engineer on call. All the alerts go to various chat channels and to e-mail. We also check that alert delivery actually works, and we have additional alerts in place if the alert delivery is failing. So, we are monitoring the monitoring, I guess. We have set up quite a large number of alerts over time and we are continuously tuning them, almost daily, to improve the signal-to-noise ratio. And we even have some rules that have never triggered on historic data; they are based on thresholds that we see in the numbers. That's how we caught, for example, a recent fan failure in one of our HSMs, where the fan RPM was dropping and at some point it triggered the alert.
I'll now show you what the alerts actually look like to us, and I have to stress that both of these alerts are not production incidents. The alert on the left is from our testing environment, when there was an expired manifest. What you see here is that we run rpki-client repeatedly with a wrapper and screen-scrape the output. That creates metrics and, for those metrics, we have defined alerts, and the alert ended up in chat. On the right, you see an alert that was delivered by SMS to a phone, when the number of objects dropped below 95% of the 24-hour maximum; in this case it was a software glitch.
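A minimal sketch of that second kind of alert, the "object count dropped below 95% of the 24-hour maximum" rule, assuming you already collect object counts from a validator periodically; the RIPE NCC's actual rules live in their monitoring stack, not in code like this.

```python
# A minimal sketch of a "dropped below 95% of the 24-hour maximum" rule,
# assuming you already sample an object count from a validator periodically;
# the sample values and window size below are illustrative.
from collections import deque

class ObjectCountAlert:
    def __init__(self, window_samples: int = 24 * 60, threshold: float = 0.95):
        self.history = deque(maxlen=window_samples)   # e.g. one sample per minute
        self.threshold = threshold

    def observe(self, count: int) -> bool:
        """Record a sample; return True if the alert should fire."""
        self.history.append(count)
        return count < self.threshold * max(self.history)

alert = ObjectCountAlert()
for sample in (100_000, 100_200, 100_150, 94_000):    # hypothetical samples
    if alert.observe(sample):
        print(f"ALERT: object count {sample} is below 95% of the 24h maximum")
```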
Where do we get the metrics from? The areas we monitor are the certification authority software, the content that's in the repositories, and the results that the relying party software gets. We monitor our staging and production environments, and monitoring is part of our quality assurance process. We have added metrics in our applications and we have external monitoring tools. One example of this are the metrics from various RPKI validators, such as the one I described, and we also use monitoring-specific tools. Here we built, for example, a project that we call RPKI monitoring, which compares the content of the repositories so that it is the same between the various rsync servers, the various RRDP servers and the CA software itself. We also created metrics from our smoke and end-to-end tests, and with that we make sure that important flows for our users, such as logging in and having a working RPKI dashboard, or that when you create a ROA it actually shows up in a validator, are working.
I also want to do a quick callout here to a tool that may be useful for network operators, which is rtrmon. You can use this to check that the serial of an RTR session is increasing and that the RTR session is in sync with the JSON from a validator, but you can also compare two JSON endpoints, and we use this to check that VRPs end up in validators after some time. And this is one of the alerts you will see on the next sheet, where I will show you the alerts that happen in the hypothetical situation where your CA software stops working.
On the sheet, from left to right, you see the timeline of a part of the alerts that would fire if our CA software just stops working. The earliest alert that's likely to fire is a really simple one which checks the number of error messages that were logged over time, and this alert has a very good signal-to-noise ratio in practice in our test environment; when something is wrong, it fires almost immediately. The second alert, after about ten minutes in this diagram, is a check that our publication process is running, and this publication process is one of the batch jobs that are basically the heartbeat of our CA system. A bit later, the alert that VRPs do not show up in a validator fires, based on the RTR monitoring, and after about an hour, because we run it every hour, the end-to-end test that checks that VRPs created through the API end up in a validator fires.
Finally, and for this you need to remember that manifests are recreated when they have 16 hours of time left before they expire, about 14 hours after publication stops we detect that objects are about to expire and that alert would fire, and I hope to never see that one in practice, to be honest.
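The expiry alert could look roughly like this sketch, assuming you can extract (object name, expiry time) pairs from your validator's output; the threshold and data below are illustrative.

```python
# A sketch of an "objects are about to expire" check, assuming you can obtain
# (object name, expiry time) pairs from your validator's output; the objects
# and the 14-hour threshold below are only illustrative.
from datetime import datetime, timedelta, timezone

def about_to_expire(objects, hours: float = 14.0):
    """Return the objects whose expiry falls within `hours` from now."""
    limit = datetime.now(timezone.utc) + timedelta(hours=hours)
    return [(name, expiry) for name, expiry in objects if expiry < limit]

now = datetime.now(timezone.utc)
objects = [("example.mft", now + timedelta(hours=10)),     # hypothetical data
           ("example.roa", now + timedelta(days=30))]
for name, expiry in about_to_expire(objects):
    print(f"ALERT: {name} expires at {expiry.isoformat()}")
```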
This was only part of the alerts that we have in practice. For example, we also have an alert that's very similar to the one about objects about to expire, which checks that there are recently created objects, so in practice we would notice much earlier that object creation has stopped.
I hope this gave an overview of everything that we have set up, and now I'll open up the floor for questions. There should be a few minutes left.
JOB SNIJDERS: Thank you so much. This was very insightful and impressive. Are there questions from the audience in relation to ‑‑
PAUL HOOGSTEDER: Not at the moment, no.
JOB SNIJDERS: Well, we can do 60 seconds of uncomfortable silence. I do see in the chat some compliments. Tim Bruijnzeels says: "Nice setup." Peter Hessler says: "Very nice. Thanks." Robert Leichssenring applauds.
PAUL HOOGSTEDER: We have got a question:
"How many engineers are notified when an error occurs? Thanks."
TIES DE KOCK: That's a really good one. What we have right now is, during office hours, we all see the alerts and we react to them, but outside office hours only one engineer receives these alerts, and I'm not sure if we have automated escalation set up right now, but basically only one person is paged, so only one gets them at night. But our experience is that most of the stuff happens during the daytime and around deployments and stuff that we do ourselves.
JOB SNIJDERS: That is probably how things are in most organisations.
TIES DE KOCK: One more bit there. We make sure that when we deploy, we do this at the end of the morning, so we have a large part of the day left to see what happens and to see if any of the metrics change, because even if you only change something in the test path validators that we have and one of those alerts starts to fire, you may have a lot of alerts over the weekend if you change something there. And we have quite trigger-happy alerting rules in some areas, so we really need to make sure that we tune those in parallel with changing the software.
JOB SNIJDERS: I see somebody queued up for the microphone.
RANDY BUSH: Thank you, it sounds amazingly like a production operation, and so it's really nice to see this mature in this way.
So, can you feed the alerts to an IRC channel so we can watch? Or some subsection of them or ‑‑ you know, some visibility, and perhaps the particular technology was just a suggestion. I can understand how you might feel that's revealing too much, but there is some space in there because a bunch of us care about it and a bunch of large infrastructure services do have automatic status visibility.
TIES DE KOCK: Thank you for the question, Randy. I think that getting a status page that's more automatically updated is ongoing work right now. There are two components here: I think we can definitely give more insight into the technology that we use. I kept it out of the sheets, but it's a standard Prometheus setup. And this is more for Nathalie, but I think you are basically asking whether we can put publishing part of our internal metrics on the roadmap; would that help you?
RANDY BUSH: Roadmaps are for presentations at RIPE meetings. Realtime metrics and status; you know, something in my network is funny, God, it looks like this router has funny routes. Hey, is the RIPE NCC noticing anything in their publication, or is it me?
JOB SNIJDERS: Nathalie Trenaman also joins the queue, so, if she wants to comment, now is a good time.
NATHALIE TRENAMAN: Yes, Randy, thanks for the suggestion. I saw you make that suggestion, I think a couple of weeks ago already, on the list, and we are definitely looking into what level of transparency we can provide in terms of the monitoring and alerting. We have to see what goes on the public status page, which will be for all our services, but on a separate note, we can definitely look into what we can share.
TIES DE KOCK: And one thing I may want to add here: I think I was quite clear when I showed the example alerts, right now the majority of our alerts are false positives, because we really try to err on the side of having more alerts rather than missing something. So, we would need to figure out which ones are relevant.
TORE ANDERSON: Opening up the metrics sounds really interesting because it also allows others to basically look at the historic data that's there. So it's an interesting suggestion, for me at least.
RANDY BUSH: So if I see a dip in what I'm receiving, I can look and say ah, their publishing graph is straight. It's me, I am just trying to diagnose my network which is depending upon your service.
TIES DE KOCK: Yeah, it's what people are using; I have seen the Wikimedia setup referenced for this recently. You can of course also use the public validators of the RIPE repository for this. We could be more transparent here. It's an interesting suggestion.
JOB SNIJDERS: I see one question from Rudiger Volk.
PAUL HOOGSTEDER: Rudiger wants to know: "Did I miss the information on the how and which events get publicly reported?"
TIES DE KOCK: I think over the last year, at least since last December, we have publicly disclosed all the events that we saw which we thought were externally visible. But if you think you noticed something, please let us know, because there may be something we missed, but we really try to be transparent about what events we see. I don't think there is a written policy, in any case, about what we can publish and what we can't.
JOB SNIJDERS: Thank you. And with that, we will move on to the next topic, which is certificate transparency and discoverability. I reached out to the fine people at Google, who are absolute masters when it comes to certificate transparency, and pleaded that they should inform our community about what it is, how it works, what the benefits are. And to this effect, Martin Hutchinson indicated willingness to teach us more about certificate transparency. So, Martin, the floor is yours. Thank you so much for being here.
MARTIN HUTCHINSON: Thanks, Job. That was quite a brutal introduction, but thank you. Let's see if I can just share this tab.
I am Martin Hutchinson, and with me on the call here is a participant, Alcutta, as well; we're both from Google. I am from the TrustFabric team, which is responsible for operating Google's transparency logs. So, we're talking about certificate transparency here in the web PKI sense. We'll give a little overview of certificate transparency as it works today and then a little framework for how people who want to take these transparency lessons can design their own similar systems.
So let's remind ourselves a little of what the problem was. Before certificate transparency, CAs were outright trusted to issue certificates; browsers that got a certificate from a trusted CA trusted it, and we all kind of followed along with that, which is fine, but obviously certificates can be mis-issued, whether that's by accident, whether it's a bug or human error, it could be some sort of social engineering or insider threat or coercion, or another way it could happen is that the key has been stolen through hacking.
And so, all of that can mean, you know, if a CA has been compromised and a bad certificate has been issued, the owner of this bad certificate in the web PKI can really choose to hide it from being discovered for quite a long time, because of the nature of TLS connections: they show the certificates to one person at a time, and therefore this lack of transparency in where certificates are shown really makes it possible to hide the certificates, and the attacks that people are carrying out with them. This isn't just a theoretical problem. If you do a quick Google around, there is stuff on the Google security blog talking about problems that happened with CAs. So it's not to say that the trust is outright a bad thing. We want the trust verified.
So, how does certificate transparency solve this? It is quite simple at the heart of it: all certificates should be in a log, and these logs are publicly visible, which allows any misuse of certificates to come to light. Now, you could just ask the CAs to put all the certificates in logs, but the real carrot and stick is that browsers won't trust a certificate unless they can prove it's in a trusted log and, because of that, the certificates are then compelled to be in these logs so that they will be useful. Once the certificates are in the public log, then verifiers can come along to inspect all the certificates that are in the log and identify any bad ones.
There are lots of people that can look for strange patterns in certificate issuance, but the real, proper, true verifier for any certificate is the domain owner for that certificate. Because, you know, you should be able to go to a transparency log, look through all the certificates in it, and the only certificates for your domain should be the ones that you requested; there should be no other ones in there. So that's the kind of property we're looking for.
The certificate authority receives a certificate-signing request. The first job they have is to confirm that the person making the request is genuinely the owner of the domain to which the certificate will bind. Then, internally, they create the certificate, which contains the public key and the domains, and then there is a little kind of dance that needs to happen with the transparency logs, where a pre-certificate is sent to the log and then something that proves the certificate has been logged is sent back to the CA. Once the CA has enough of these artefacts, they can create a final certificate and then pass that along to the domain owner.
So, we will go into a little bit more detail on what a verifiable log is. It's a Merkle tree, a binary tree using cryptographic hashing to make sure that any changes to the data would be tamper-evident. And the logs support cryptographic proofs that any certificate is contained within the log, that the view of the log that you are seeing today is consistent, i.e. append-only compared to any view you previously had of the log, and that all parties in the world, you know, verifiers, believers, the browsers, are all seeing the same view of the log.
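For readers who want to see what such a proof looks like in practice, here is a small sketch of RFC 6962/9162-style inclusion-proof verification (leaf hashes prefixed with 0x00, interior nodes with 0x01); in a real deployment the leaf, proof and root come from a CT log, while the three-leaf demo below is built by hand.

```python
# A small sketch of Merkle inclusion-proof verification as used by CT logs.
import hashlib

def leaf_hash(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     proof: list, root: bytes) -> bool:
    """Recompute the root from the leaf and its audit path (RFC 9162, 2.1.3.2)."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in proof:
        if sn == 0:
            return False
        if fn % 2 == 1 or fn == sn:
            r = node_hash(p, r)
            if fn % 2 == 0:
                while fn % 2 == 0 and fn != 0:
                    fn, sn = fn >> 1, sn >> 1
        else:
            r = node_hash(r, p)
        fn, sn = fn >> 1, sn >> 1
    return sn == 0 and r == root

# Tiny demo with a three-leaf tree built by hand:
la, lb, lc = (leaf_hash(x) for x in (b"a", b"b", b"c"))
root = node_hash(node_hash(la, lb), lc)
assert verify_inclusion(b"c", 2, 3, [node_hash(la, lb)], root)
```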
If you didn't have any of these properties, it would be very hard to realise these kinds of transparency benefits.
So, we ask: what is certificate transparency really? Certificate transparency is a mechanism to ensure that all certificates are discoverable. Transparency is a good word, but I prefer the word "discoverability". Discoverability means all the stakeholders get to see the same list of certificates; that's browsers, domain owners, certificate authorities, any security researchers and any other interested parties as well. If we look at this from another perspective, certificates are effectively a claim. And the claim that we're interested in here is that this public key was legitimately requested for the domain or domains that are inside the certificate. Now, browsers believe this claim, domain owners can verify it, if they can find the certificates, CAs are making this claim and, interestingly, CAs can also verify this claim. Right, so if key theft has happened: for any certificate that has really been created by the CA, the CA is able to say, yeah, I created that. But if any have been created that it wasn't aware of, then that would really indicate that some process has been broken there, or the internal record keeping is wrong, or a key hack has happened. This way of looking at things, as claims, believers and verifiers, we have documentation for; it's linked below and it's called the claimant model, which just gives precise terminology for discussing logs and designing ecosystems. What we're really trying to get is more people to use transparency, but not to directly copy certificate transparency blindly. The goal here is to get people to take the lessons from certificate transparency, what works really well about it, and design ecosystems with that.
And this property of discoverability, which basically means anything that's been trusted can eventually be verified, has found homes in other ecosystems. So, Pixel 6 recently launched with binary transparency for its factory images, and there are various firmware transparency projects, for example Armory Drive, and Sigstore is another project happening at the moment. Quite notably, the Go programming language has a module proxy which also uses transparency logs at the heart of it to make sure that everybody sees the same commitments to the modules they are depending on.
So, that's transparency. And there is more information there at transparency.dev, but I'll hand it over now for any questions.
JOB SNIJDERS: Thank you so much, Martin. No questions in the Q&A yet. If I may add a comment from my personal perspective on how certificate transparency is applicable to the RPKI infrastructure: what I personally am interested in is not keeping track of all ROAs or BGP certificates that are issued, those are all end entity certificates, or EE certificates, but to keep a close eye on the CA certificates, aka how entitlements to Internet number resources are distributed by the RIRs, and if, for example, an RIR, by accident, grants an entitlement to the wrong certificate authority. So this is where I think a full log of certificates in a Merkle tree will help us increase our trust in the ecosystem.
PAUL HOOGSTEDER: No more questions.
JOB SNIJDERS: If anyone has any questions at a later moment, please send them to the Routing Working Group mailing list and us Chairs will make sure that experts from outside in a specific field of work will help answer any such questions. Martin, thank you so much for sharing your time and insights, and until next time.
With that, we will switch to the next presentation by Pim van Pelt. He recently took on an adventure to re‑number his autonomous system from one number to a slightly better‑looking number. And Pim will share his experiences in this renumbering project. Pim, the floor is yours.
PIM VAN PELT: Thanks, Job, for the invite and also thanks, Ignas, for helping me figure out how to do Meetecho and get this stuff all squared away. My name is Pim van Pelt, I have been in the RIPE community for a while, since about '99, and when Job asked me this on IRC, I thought this is not strange, certainly, but then I reflected and, in these at least 22-odd years, I have personally never renumbered an ISP into a new AS. So I was kind of figuring out how this goes, and just as a show of hands in the chat, could anyone plus one or wave at me if they have done something like this before; notably not turning down an ISP and then bringing it up under a new AS, but actually renumbering a running ISP.
Okay. So I moved to Switzerland about 15 years ago and had to register the obligatory ipng.ch. The last year or so I got bored and started to run an actual ISP, and incorporated this thing in 2021.
You may have heard of me from SixXS, an IPv6 tunnel broker that was pretty large back in the day. We started in about 1999 and turned it off in 2017, because we didn't really think we were helping that much any more. I founded this with the guy who founded a project called the Ghost Route Hunter; if you were there back in the day, you would remember these weird AS paths which showed up, and we built a looking glass to sort of figure out what types of implementations were buggy. And we borrowed an AS number back in the day, and borrowing was easy back then, it wasn't quite as formal.
A little bit about IPng Networks itself. We like software-based routers. I happen to think that CPUs of today are as powerful or even more so than the ASICs and FPGAs of a decade ago, if you use them correctly. I have worked a little bit on vector packet processing, but that's a completely different presentation; you can check that out in December.
Anyway, DPDK and VPP are ways of doing very, very fast packet forwarding on commodity hardware. So 100 million PPS is easily done.
We acquired this AS 8298 from SixXS and also built a European ring based on VPP and Bird. If you ever see Bird in practice, you shouldn't assume it's a Linux kernel or a FreeBSD or OpenBSD kernel doing the forwarding. Sometimes these things can be pretty fast.
We do peer on the FLAP, although, in my case, the 'L' in FLAP means Lille ‑ Frankfurt, Lille, Amsterdam and Paris ‑ and we have about 1,850 adjacencies, so the renumbering may not be as easy in this case.
So, as a little bit of a timeline, and I'm not yet done: I started this at the end of October, on the 22nd, by publishing a blog post predicting what I would do. And then either in December I'll be happy because it worked out, or I'll be crying because I broke everything and it's all down. The timeline is here, but I'll just move on.
So, preparation. Step 0 is to get this AS number. The original holder was Easynet, now Sky, and in 2004 the RIPE NCC registered this AS number to my co-founder at SixXS. I asked them nicely, would you mind if I used this, and they both agreed. So at this point there is a transfer agreement between my co-founder and myself that signals his willingness to move the AS to me; it's kind of boilerplate, you can download a template for this. Then also an agreement between Sky and my own LIR, that signals the transfer from the sponsoring LIR to ch.ipng. Do the paperwork; it didn't even take that long, maybe 48 hours end-to-end. And in a matter of these few days the AS number was safely in my hands at ch.ipng. Then, of course, to prepare this move I have to ensure that there are objects in the RIPE Database for the IRR folks, and ROAs signed for both 50869, which was the original AS number, as well as the 8298 that I am moving into.
So, this network that I described has a little arrow at C; that is where I am now, that's my home office, and the first step here is to split this AS into two parts. You are not going to see anything in the DFZ at this point, but certainly in C, I will tell my routers to originate these prefixes in my own AS, AS 50869, and the other routers will stop originating them themselves and simply carry forward what they know from the network already. Then, to make sure I have reachability in the entire network, I'll put all of my connected nets and statics into OSPF. Not all operators do this; I happen to do this. It means all of my routers can reach each other and connect.
Nothing has happened just yet. Then I can switch these two routers at C, which is where I live, into AS 8298. One by one, I'll renumber these to connect out to AS 50869 and start announcing these three IPv4 prefixes and three IPv6 prefixes. So, at this point, the first routing table change happens. You will see my own prefixes at IPng pop up with the path 50869 8298. So far, so good. There is a local tiny Internet exchange run by the guy who wins this Kahoot every time, so he has a little /24 and /64 exchange point, which is an excellent place for me to test whether peering even works at all. I did connect to these two AS numbers, 58299 and 58280, one of which actually provided me a full table, which is very nice; in case everything else fails, at least my house has Internet.
Then I also operate a tiny little colo in location A, which is now encircled in yellow. It's a customer of mine; they have a small floor with two fibres out to my network, and I have a colo there. And there is a customer in this colo that takes full transit from me, so they are essentially a downstream. And now that we have moved our own prefixes into 8298, we can put this first customer behind that AS as well.
So, interestingly, even though this colo is split between two ASes, all of it is reachable because it all shares one OSPF backbone, area 0, so if anything in E or D or C receives a packet destined for the colo, it knows how to find it already. It doesn't need BGP for that at all. As such, I can take these two routers and put them into 8298, one after the other, and then they will connect to two different routers in my network.
At this point I have two iBGP islands that are connected via this point in the middle, and that works. We have established that AS 50869 has reachability to all these more specifics already. We know that the only way to get into 5777 is via the colo, and so all this routing will work just fine. But I don't want to keep it there for long, so I'll move on to the next step.
So this is a slightly bigger change. It was not too controversial, but we have these two islands, A and C, and I want to connect them via this middle point over here in D; this is Interxion near Zurich. So we connect these two islands via this middle point. That essentially creates a little U-shape with five routers in the same AS, essentially moving CHGTG0 to 8298, connecting it and taking transit from the other AS I have. You see this European network pop up; the two arrows there are in Zurich and in Geneva. So my AS is now essentially connected, my home network, to the colocation network via the colo at Interxion; all three sites are in AS 8298 and they take transit from 50869. But there are also other transit providers and there is an Internet exchange involved. In this step, at Interxion, there is a Swiss Internet exchange called CHIX, and it's not huge but it has a couple of peers important to me, notably Google. I do like sending traffic to Google. So being able to test drive an Internet exchange in this way is super useful, because if I were to lose it, I still have other Internet exchanges with adjacencies to the networks that I find important.
So then, finally, there is one final step for the Zurich metro to be done. This is in Rümlang, an NTT data centre that used to be called e-shelter, and it hosts the local Swiss Internet exchange. So at this point it's getting real: I am moving CHRMA0 to 8298 as well, and it has to take transit from the next hop up, which would be Frankfurt for me in the ring, and of course I had already moved CHGTG0 to take transit from Geneva.
SwissIX is now renumbered, as well as some smaller Internet exchanges, Community IX and Free IX. There are about 188 broken BGP sessions at this point. There are six routers in my network now, all connected in a U-shape, hanging off of this ring at two points, and they are all redundantly connected. So at this point, I think we're happy; 50869 is the main transit provider, along with Openfactory and IP-Max, thanks again for that.
So these BGP sessions are broken and it can take a while to fix them, and what I learned from this is that you can't do this without automation; if you do it by hand you will make a mistake, and in fact I made several mistakes doing this, except I didn't make them on the live routers. What you see here is a screenshot of me editing some files in the repository, generating a complete config which slurps in all sorts of stuff from PeeringDB and other resources as well to generate these Bird configurations. I can do a diff on them to see whether I made a typo or whether this is not as I intended, and eventually push it to the routers, essentially rsyncing the files onto the machines and reloading.
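This is not Pim's actual tooling, but a rough sketch of the idea: pull the participants on an IX LAN from the public PeeringDB API and render Bird 2 session stanzas from a template; the filter names, description format and the IX id are placeholders.

```python
# A sketch of generating Bird 2 BGP session stanzas from PeeringDB data.
# The filter names, description format and IX id below are placeholders.
import requests

LOCAL_AS = 8298
TEMPLATE = """protocol bgp peer_as{asn}_{idx} {{
    description "AS{asn} at {ix_name}";
    local as {local_as};
    neighbor {ip} as {asn};
    ipv4 {{ import filter ebgp_in; export filter ebgp_out; }};
}}
"""

def bird_sessions_for_ix(ix_id: int) -> str:
    """Emit one BGP protocol block per IPv4 participant on the given IX LAN."""
    response = requests.get("https://www.peeringdb.com/api/netixlan",
                            params={"ix_id": ix_id}, timeout=30)
    response.raise_for_status()
    blocks = []
    for idx, entry in enumerate(response.json()["data"]):
        if not entry.get("ipaddr4") or entry["asn"] == LOCAL_AS:
            continue
        blocks.append(TEMPLATE.format(asn=entry["asn"], idx=idx, ip=entry["ipaddr4"],
                                      ix_name=entry["name"], local_as=LOCAL_AS))
    return "\n".join(blocks)

# print(bird_sessions_for_ix(26))   # hypothetical IX id
```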
And the second one is, as I said, 188 adjacencies went poof because I renumbered my AS and I couldn't tell all my peers ahead of time. So being able to do this Bird show command and then pipe it through some awk gives me the AS numbers that are reporting a bad AS for me. And then writing an e-mail as a template that gets hydrated with stuff we pull off PeeringDB. By the way, PeeringDB is actually fantastic. Doing it this way saves me a bunch of typos, a bunch of mess-ups, and of course it will eventually go wrong where I send all the people the wrong e-mail because of a bug in the script, but I would prefer that over all the manual labour instead.
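Again only a sketch of the idea, not the real script: given the ASNs whose sessions report a bad peer AS (the birdc/awk step is not shown), look up each network's name in PeeringDB and render a templated notification e-mail; recipient addresses are left out because PeeringDB contact visibility varies.

```python
# A sketch of templating the peer-notification e-mails from PeeringDB data.
import requests

MAIL_TEMPLATE = """Subject: AS50869 -> AS8298 renumbering: please update our session

Hi {network},

IPng Networks is renumbering from AS50869 to AS8298. Our BGP session with
AS{asn} currently reports a bad peer AS. Could you update the configured peer
AS to 8298? Peering IPs and prefixes stay the same.

Thanks, Pim
"""

def network_name(asn: int) -> str:
    response = requests.get("https://www.peeringdb.com/api/net",
                            params={"asn": asn}, timeout=30)
    response.raise_for_status()
    data = response.json()["data"]
    return data[0]["name"] if data else f"AS{asn}"

def render_mails(affected_asns):
    return [MAIL_TEMPLATE.format(network=network_name(asn), asn=asn)
            for asn in affected_asns]

# for mail in render_mails([13030, 6939]):   # hypothetical ASNs from the awk output
#     print(mail)
```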
So, the last step is yet to come. This is on the 2nd of December, so in a week or so, I will move all the nodes in the rest of the ring into AS 8298, which means the German Internet exchange, the French Internet exchange, the assortment of Dutch Internet exchanges and the tiny little one in Lille in the north of France as well. So this is where essentially 1,600 adjacencies will be broken and I will rely on this first set of transits to ensure that I have good connectivity. If your AS is in this list, please help me out and remember when I mail you about it or when you see it go down in your monitoring.
This is of course a thinly veiled request for peering. If you peer with IPng Networks, that is great; please help us with the renumbering. If you don't, I'd be happy to peer with you. It helps me a lot, because the VPP project is still in development, at least for the control plane part that I wrote, and it would be really great to have a lot of BGP interaction to make sure that the forwarding plane and the RIB stay in sync for us.
So for me, success means no outages. We're on step 5.2 today, having done all of Zurich including the three Internet exchanges, and I haven't had an outage yet, so knock on wood. And I think it would be a success if I lose fewer than 100 adjacencies at the end of this project, which will last until about January 2022. And with that, I'm done speaking. If there are any questions or remarks, I'd be happy to take them.
PAUL HOOGSTEDER: Nothing in the queue yet.
JOB SNIJDERS: Thank you so much, Pim, for sharing your insights, experiences in this exciting project.
PIM VAN PELT: Thanks for having me.
JOB SNIJDERS: Earlier you asked whether people had experiences renumbering ASNs themselves, but I also know of the opposite where organisations decided to not re‑number because the project effort was considered insurmountable.
PIM VAN PELT: Yeah. It could have gone wrong in all sorts of ways, but, honestly, the automation that I showed is the number one way to make sure you stay sane, like regenerating configs to take the new iBGP sessions and all that stuff.
JOB SNIJDERS: Network automation is king. Randy Bush is lining up for the microphone.
RANDY BUSH: You might take a look at some papers, and I think it was also a thesis, but I'm not sure I remember: Laurent Vanbever, who lives in a little town called Zurich in some country in Europe, I forget, did very good formal work on router migration and configuration.
JOB SNIJDERS: What was the title of the paper?
RANDY BUSH: God knows. Of course he has published a million, but I can dig it up. Beat me up with an e-mail and I'll find a thread for you, but it's Laurent Vanbever.
PIM VAN PELT: That's cool. One thing I noticed is that there is very little information on the Internet about this kind of renumbering. I did a little bit of homework ahead of time, and there is actually a lot of misinformation, you know, things that you might type that don't work very well. So, I hope that if anyone watches this after the RIPE meeting, they will find this is at least one data point from someone who did it, and there is some contact information as well; if you want to exchange notes or ask some questions, I am happy to help.
RANDY BUSH: Yeah, there was also some work in the IETF a working group or a series of, I forget, around renumbering. It's a bitch, and those of us who have done it generally prefer not to remember it.
PIM VAN PELT: I understand.
JOB SNIJDERS: I see one question lined up.
PAUL HOOGSTEDER: Yeah. Gili (?) wants to know: "What are the advantages of renumbering? Well, I guess I missed the beginning. So why did you do it?"
PIM VAN PELT: You didn't miss the beginning. I had this AS number in the corner of the room and it was gathering dust and it was four digits and I always wanted a four‑digit AS number and it doesn't help me at all but yet I did it because I can.
JOB SNIJDERS: That sounds like a very honest answer. Rudiger Volk will go first and then after him will be Randy Bush.
RUDIGER VOLK: For a two-digit number, would you do it again?
PIM VAN PELT: Yes. I'm not going to tell a lie. Yes.
RANDY BUSH: So, back in the day when we built Verio, we merged 66 ISPs, so to answer what the motivation was: you know, you buy an ISP, they were in UUNET space or Sprint space or whoever their major upstream was that they were originally married to, and seeing as we were competitors, we had to move them out of those spaces. So, it was fun. And, you know, I think a more normal motivation for people renumbering today is to get into space that they financially and contractually control.
PIM VAN PELT: Right. With that, let's drink some coffee, shall we now?
JOB SNIJDERS: Yes. Pim, thank you for your time.
PIM VAN PELT: You're welcome.
JOB SNIJDERS: And dear Working Group, the next time we will see each other will be at RIPE 84, it's not entirely clear where exactly that will take place, but certainly inside cyber space. Thank you all for attending and I wish you a pleasant day.
PAUL HOOGSTEDER: Have a great day.
(Coffee break)