1
00:00:00,033 --> 00:00:07,332
So cleaning the input, which is more technically also known as pre-processing--what's the idea behind that?
2
00:00:07,333 --> 00:00:12,066
Here's what you would normally do for an NP-complete problem as we have talked about so far:
3
00:00:12,067 --> 00:00:18,966
So if you're given the input for an NP-complete problem. What you would do using the techniques from the previous units
4
00:00:18,967 --> 00:00:22,966
is you would fire up your search tree to try and find an optimal solution.
5
00:00:22,967 --> 00:00:27,899
And of course, that search tree has exponential size. So the algorithm goes through that tree here
6
00:00:27,900 --> 00:00:34,566
until, at a certain point in time, it says, "Bingo, I found a solution," or, "I found the best possible solution."
7
00:00:34,567 --> 00:00:41,366
The idea of pre-processing now is similar to something that we already saw for vertex cover or independent set,
8
00:00:41,367 --> 00:00:47,032
where, for certain vertices, while we were traversing the search tree, or even in advance,
9
00:00:47,033 --> 00:00:54,499
what we could already say for certain vertices, we know what assignment that vertex is going to have in an optimum solution.
10
00:00:54,500 --> 00:00:59,932
And we could make that statement without actually going through any branching, but in polynomial time.
11
00:00:59,933 --> 00:01:06,832
And that is the idea of pre-processing. The idea of pre-processing is, if you can actually find certain parts of the input,
12
00:01:06,833 --> 00:01:15,666
where in polynomial time, of course, you can already say how they would be handled in an optimum solution.
13
00:01:15,667 --> 00:01:21,866
So we're kind of nibbling away at the input here. And what that, of course, means is if your pre-processing is successful,
14
00:01:21,867 --> 00:01:29,499
or especially if it's very successful, then the search tree that results from that input is not going to be as big.
15
00:01:29,500 --> 00:01:34,432
So there's certain parts of the search tree that you don't have to do, because you already have found out
16
00:01:34,433 --> 00:01:38,466
in the pre-processing what that part of the solution is going to look like.
17
00:01:38,467 --> 00:01:44,532
So the search tree size will decrease. So we can, for example, cut off this branch here, because we've already
18
00:01:44,533 --> 00:01:49,666
pre-processed this, and we can cut off this one here because we also pre-processed that one.
19
00:01:49,667 --> 00:01:52,666
So now let's make this more concrete, and let me give you a concrete example.
20
00:01:52,667 --> 00:01:58,666
And we're going to do this for SAT, because SAT is a problem where pre-processing is usually very successful.
21
00:01:58,667 --> 00:02:05,299
So if you were, for example, to use a commercial SAT solver, then pre-processing will play a very very important role in that.
22
00:02:05,300 --> 00:02:11,866
I once talked to somebody who develops those solvers, and they basically said that his package works 90-95%
23
00:02:11,867 --> 00:02:19,166
through pre-processing. So even for SAT instances with thousands of variables, his package can basically solve it,
24
00:02:19,167 --> 00:02:23,566
but it can only solve it because the pre-processing algorithms are very good.
25
00:02:23,567 --> 00:02:31,266
So you'll remember that SAT was the problem of finding if a given Boolean formula has a satisfying assignment or not.
26
00:02:31,267 --> 00:02:35,766
And I'm now going to write down a Boolean formula for you, and then we're going to do a little quiz
27
00:02:35,767 --> 00:02:38,766
to make pre-processing more concrete.
28
00:02:38,767 --> 00:02:48,666
So the SAT formula is x1 or x3 or x5, and not x1 or x2 or x4, and so on, and so on.
29
00:02:48,667 --> 00:02:53,066
Now, of course this formula here doesn't have very many variables. It's just six variables--
30
00:02:53,067 --> 00:03:01,432
x1, x2, x3, x4, x5 and x6. So with a little playing around, you would probably be able to figure out if this Boolean formula here
31
00:03:01,433 --> 00:03:06,466
has a satisfying assignment or not. But of course, what we want to do now is pre-processing,
32
00:03:06,467 --> 00:03:12,566
And that means that we want to see if, for certain variables, in this Boolean formula, we can figure out
33
00:03:12,567 --> 00:03:18,132
if they should be set to true or false, without actually trying all possible combinations.
34
00:03:18,133 --> 00:03:23,732
And as I said, we're going to do this as a quiz. So what I would like you to do is to look at this Boolean formula here,
35
00:03:23,733 --> 00:03:33,532
and then consider the variables x1, x2, x3 and x4, and for each of those variables, determine if it's easy to see,
36
00:03:33,533 --> 00:03:38,332
if they should be set to true or false. And by easy, I mean without actually trying around different true assignments
37
00:03:38,333 --> 00:03:44,699
for the other variables, but you can basically immediately say, for these variables, if they should be set to true or false.
38
00:03:44,700 --> 00:03:50,832
I'm going to give you one hint for the solution, and that is that, in my opinion--and this is a bit of a subjective question--
39
00:03:50,833 --> 00:03:59,967
I think that for two of these variables here, it's rather easy to see. And I would like you to select those two.