Mikrotik: Regex URL Block

Source: https://support.1blocker.com/hc/en-us/articles/360002309778-Using-Regex-to-Block-URLs

In regex, every character is "encoded" and validated symbol by symbol, from left to right. Regex characters can be divided into two groups:

Regular characters, such as letters, digits, and some symbols. When the regex engine meets a regular character, it takes that character literally, without changing its meaning. For example, the regex "blocker" is the most basic kind of pattern: it simply matches the actual word "blocker".
Special characters, which are where the real power of regex lies. These characters make the engine act in a special way and allow much more flexible matching. For example, a dot . matches any single character, so three dots ... can stand for dog, cat, run, one, etc. The special characters are:

. - a dot
[] - square brackets
() - parentheses
? - a question mark
* - an asterisk
+ - a plus
^ - a caret

If a URL contains any of these special characters, each one must be escaped with a backslash \.

For example, suppose we want to block the URL https://domain.com. There is a dot before "com", and the engine reads it as "any single character", not necessarily a dot. The engine would therefore also match https://domain#com, which is certainly not what we want. Adding a backslash before the dot, https://domain\.com, makes the engine look for a literal dot before "com".
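The effect of the missing backslash can be checked with a short Python sketch. It assumes Python's `re` module is an acceptable stand-in for the blocker's regex engine (plain dots and backslashes behave the same in common regex flavors) and uses `re.search`, which looks for the pattern anywhere in the URL. `re.escape` is a Python helper, not part of the rule syntax.

```python
import re

# Python's re module stands in for the blocker's regex engine here (an assumption).
loose = re.compile(r"https://domain.com")    # "." means any single character
strict = re.compile(r"https://domain\.com")  # "\." means a literal dot only

results = {
    url: (bool(loose.search(url)), bool(strict.search(url)))
    for url in ["https://domain.com", "https://domain#com"]
}
for url, (loose_hit, strict_hit) in results.items():
    print(f"{url}: unescaped={loose_hit}, escaped={strict_hit}")

# re.escape() inserts the backslashes when turning a literal URL into a pattern.
escaped = re.escape("https://domain.com")
print(escaped)
```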

Special Characters

If the main idea of regular expressions is a bit clearer now, let’s take a closer look at every special character supported by the 1Blocker engine. Important: in 1Blocker, these features apply only while creating rules in the Expert editor (available on Premium).

   . (dot) can match any single character (letter, digit, whitespace, anything). If you actually need a dot in a URL, don’t forget to escape it with \.

For instance, https://1bl.cker\.com will block not only https://1blocker.com, but also https://1blucker.com, https://1bl1cker.com, etc.
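A quick check of this example in Python (again with `re` assumed as a stand-in for the blocker's engine) also shows the limit of the dot: it needs exactly one character, so a URL with nothing in that position does not match. The fourth URL is an illustrative addition.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
pattern = re.compile(r"https://1bl.cker\.com")  # unescaped "." between "bl" and "cker"

candidates = [
    "https://1blocker.com",
    "https://1blucker.com",
    "https://1bl1cker.com",
    "https://1blcker.com",  # no character at all in the dot's position
]
matches = [url for url in candidates if pattern.search(url)]
print(matches)
```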

   [a-c], [abc], or any other set of characters inside [square brackets] matches exactly one character, taken either from the listed characters [abc] or from a sequential range [a-c], and nothing else. In other words, the rule is triggered only if one of the characters given in the square brackets appears at the corresponding place in the URL.

A couple of examples: the pattern https://1blocker\.[kc]om corresponds either to https://1blocker.com or https://1blocker.kom.

And speaking of ranges of characters, https://[1-3]blocker\.com will match three websites, https://1blocker.com, https://2blocker.com, and https://3blocker.com.
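Both bracket examples can be verified in one sketch, assuming Python's `re` as a stand-in engine. The extra URLs 1blocker.kcom and 4blocker.com are illustrative additions: a set fills exactly one position, and a range stops at its bounds.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
tld = re.compile(r"https://1blocker\.[kc]om")   # one character: "k" or "c"
digit = re.compile(r"https://[1-3]blocker\.com")  # one character: "1" through "3"

tld_matches = [
    u for u in ["https://1blocker.com", "https://1blocker.kom", "https://1blocker.kcom"]
    if tld.search(u)
]
digit_matches = [
    u for u in ["https://1blocker.com", "https://2blocker.com",
                "https://3blocker.com", "https://4blocker.com"]
    if digit.search(u)
]
print(tld_matches)
print(digit_matches)
```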

   + matches one or more occurrences of the character that precedes it (it always follows a character or a group of characters).

For instance, take the pattern https://1blo+cker\.com. It matches https://1blocker.com, https://1blooocker.com, https://1blooooooocker.com, and so on ad infinitum.
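A minimal Python sketch of the + example, assuming `re` behaves like the blocker's engine for this construct. The URL 1blcker.com is an illustrative addition showing that + needs at least one o.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
pattern = re.compile(r"https://1blo+cker\.com")  # "o+" = one or more "o"

candidates = [
    "https://1blcker.com",  # zero "o": rejected by "+"
    "https://1blocker.com",
    "https://1blooocker.com",
    "https://1blooooooocker.com",
]
matches = [url for url in candidates if pattern.search(url)]
print(matches)
```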

   * matches zero or more occurrences of the character that precedes it (it also follows a character or a group of characters). Because it can match zero characters, the corresponding position in the URL may be empty.

The pattern https://1blo*cke*r\.com matches URLs such as https://1blckr.com, https://1blocker.com, https://1blooockeer.com, and so on.
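The sketch below checks the three example URLs with Python's `re` module, assumed here to stand in for the blocker's engine. For https://1blocker.com to match, the e between k and r must be covered too, so the sketch uses 1blo*cke*r; the variant 1blo*ck*r (star after k instead) leaves nothing to absorb that e and only matches 1blckr.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
pattern = re.compile(r"https://1blo*cke*r\.com")  # "o*" and "e*": zero or more
no_e = re.compile(r"https://1blo*ck*r\.com")      # no "e*", so an "e" cannot appear

candidates = ["https://1blckr.com", "https://1blocker.com", "https://1blooockeer.com"]
with_e_star = [u for u in candidates if pattern.search(u)]
without_e_star = [u for u in candidates if no_e.search(u)]
print(with_e_star)
print(without_e_star)
```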

   ? allows you to match either zero or one of the preceding character or group of characters. In other words, it denotes optionality. You can use it in any pattern to match both http and https versions of the same website.

So the pattern https?://domain\.com will block both http://domain.com and https://domain.com.
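The optional s can be checked the same way, with Python's `re` as an assumed stand-in. The URL httpss://domain.com is an illustrative addition: ? allows at most one s.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
pattern = re.compile(r"https?://domain\.com")  # "s?" = zero or one "s"

results = {
    url: bool(pattern.search(url))
    for url in ["http://domain.com", "https://domain.com", "httpss://domain.com"]
}
print(results)
```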

   (abc) can capture any subpattern inside of a pair of parentheses as a group, which means that we can apply other special symbols, e.g., +, * or ? to the whole group.

Let’s take a look at this pattern: https://(.*\.)?domain\.com. It will block not only https://domain.com but its subdomains consisting of any number of characters as well, for instance, https://ads.domain.com. Let's break down the pattern. Here we have a group marked by (). The whole group is optional because of ?, and the group itself consists of any number (*) of any characters (.) and an escaped dot \.
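The grouped pattern can be tested against a few hosts in a Python sketch, again with `re` standing in for the blocker's engine. The last URL, baddomain.com, is a hypothetical look-alike: because the optional group must end in an escaped dot, it does not match.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
# "(.*\.)?" = optional group of any characters that ends in a literal dot.
pattern = re.compile(r"https://(.*\.)?domain\.com")

results = {
    url: bool(pattern.search(url))
    for url in [
        "https://domain.com",
        "https://ads.domain.com",
        "https://cdn.ads.domain.com",
        "https://baddomain.com",  # no dot before "domain"
    ]
}
print(results)
```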

   ^ (caret) at the beginning of a pattern anchors it to the start of the URL: the URL must begin with whatever follows the caret. Inside square brackets, a caret tells the engine to match any character except the ones listed in those brackets.

Here are the examples for both cases:

The pattern ^https://domain\.com makes sure that nothing precedes the letter h: it must be the first character of the string. So if the target URL appears inside another URL, such as https://anotherdomain.com/https://domain.com, the rule won’t trigger.

And an example with a caret inside square brackets [ ]: the pattern https://[^d]omain\.com won’t match https://domain.com, but will match https://lomain.com, https://romain.com, etc.
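Both uses of the caret fit in one Python sketch, with `re` assumed as a stand-in for the blocker's engine. `re.search` scans the whole URL, so without ^ the embedded URL is still found.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
anchored = re.compile(r"^https://domain\.com")  # must start at the first character
unanchored = re.compile(r"https://domain\.com")  # may appear anywhere
embedded = "https://anotherdomain.com/https://domain.com"

found_unanchored = bool(unanchored.search(embedded))
found_anchored = bool(anchored.search(embedded))
print(found_unanchored, found_anchored)

# Inside brackets, ^ negates the set: any single character except "d".
negated = re.compile(r"https://[^d]omain\.com")
negated_matches = [
    u for u in ["https://domain.com", "https://lomain.com", "https://romain.com"]
    if negated.search(u)
]
print(negated_matches)
```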

Useful Templates

We would like to give you two basic patterns you can apply to create effective custom blocking rules.

.*

It matches any URL, regardless of which characters it contains or how many.

^https?://+([^:/]+\.)?domain\.com[:/]

Apple recommends using this pattern, as it blocks subdomains and is balanced enough, so you are less likely to see any side-effects.

Let’s break it down into smaller parts:

   ^https?://+ matches http:// and https:// and makes sure that there is no text before the URL; 
   ([^:/]+\.)? targets all subdomains if there are any;
   domain\.com matches the domain itself;
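   [:/] requires the host name to end right after .com, followed by a port (:) or a path (/). This blocks the domain even when the URL continues after .com, for example domain.com/page or domain.com:8000, while a look-alike host such as domain.com.example.net is not matched.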
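To see how each part of the template contributes, the sketch below runs it against a handful of URLs in Python, assuming `re` is a reasonable stand-in for the engine that evaluates the rule. domain.com remains the placeholder domain, and the look-alike URLs are hypothetical examples.

```python
import re

# Python's re stands in for the blocker's engine (assumption).
template = re.compile(r"^https?://+([^:/]+\.)?domain\.com[:/]")

urls = [
    "https://domain.com/",                          # the domain itself
    "http://ads.domain.com/banner",                 # a subdomain
    "https://domain.com:8000/",                     # explicit port
    "https://domain.com.example.net/",              # look-alike host
    "https://notdomain.com/",                       # host merely ends in "domain.com"
    "https://example.net/?u=https://domain.com/",   # domain only in the query string
]
results = {url: bool(template.search(url)) for url in urls}
for url, blocked in results.items():
    print(url, blocked)
```

All test URLs include a path or port after the host, because [:/] needs one of those characters after .com. URL parsers normally add a trailing / to a bare host, so https://domain.com is usually evaluated as https://domain.com/.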

Related Links

Mikrotik
