0% found this document useful (0 votes)
70 views74 pages

Algorithm Intro

The document discusses computer architecture and algorithms. It describes how a computer works at a basic level, with a CPU that performs arithmetic operations on data stored in memory. Algorithms are sets of instructions that are executed step-by-step to solve problems. The document outlines common data structures and approaches used in algorithm design like recursion, divide-and-conquer, and dynamic programming.

Uploaded by

faaizhussain
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as ODP, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
70 views74 pages

Algorithm Intro

The document discusses computer architecture and algorithms. It describes how a computer works at a basic level, with a CPU that performs arithmetic operations on data stored in memory. Algorithms are sets of instructions that are executed step-by-step to solve problems. The document outlines common data structures and approaches used in algorithm design like recursion, divide-and-conquer, and dynamic programming.

Uploaded by

faaizhussain
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as ODP, PDF, TXT or read online on Scribd
You are on page 1/ 74

Concept of Algorithm

• Computer Architecture
• The Concept of Algorithm
• Complexity and Order
• NP-complete Problems
Self Introduction
Name: Takeaki Uno
Affiliation: NII, Sokendai
Age, status: 44, professor
Research Area: algorithm, data mining, bioinformatics, operations
research

Recent Studies: efficient practical algorithms for basic operations


and basic tasks on huge data in genome science, and data mining,
etc.
Goal / Evaluation / Reference
• Algorithms are efficient especially for processing big data

• The goal is to learn the skill and sense of developing algorithms, or


viewing the problems and issues from algorithmic view points

Evaluation: a report, on the end of the semester

Reference: any textbook of title “Algorithm” and/or “Data


structure”, that you feel interesting/understandable

• e-mail: uno@nii.ac.jp
Homepage : http://research.nii.ac.jp/~uno/index-j.html
(the slides are uploaded)
Contents of the Lecture
Architecture of computer, and the program
  CPU, memory, OS, memory management, instruction, register
 
Data structure (efficient ways of recording/using the data)
stack, queue, heap, binary tree, hash

Basic Design of Algorithms


recursion, divide-and-conquer, enumeration, dynamic
programming, sorting, string matching

Graph Algorithms
matching, shortest paths,…
Connection to Latest Researches
• Algorithms are connected to the many latest researches
  (algorithms are studied in the areas, in which algorithm are
related)
• Especially, having simple mathematical models and criteria, with
huge data or complicated data

+ Bioinformatics
Human genome is of 3 billion letters. They have only 4 kinds of
letters ATGC, thus computers have advantages. Genes and variations
often have some implicit rules

+ Chemoinformatics
Chemical compounds are networks of atoms, so tractable for
computers. However, the stereographical structures are difficult
Connection to Latest Researches
+ Astronomy
Computer systems of observatories generates huge amount of data,
and those data is accumulated to integrated databases

+ Physics (quantum mechanics)


data from particle accelerators are incredibly huge. Now, we can
only filter and store the data, only

+ Social science
recently, there have been progresses on understanding systems of
societies by simulations with including micro activities
Connection to Latest Researches
+ architecture
other than structural calculations to evaluate the strength of the
buildings, material optimizations and structure optimizations reduce
the costs

+ literature
large amount of literatures allows as to find interesting knowledge
by computer operations

+ other
development of simulation science, that is to understand the real
world phenomenon and mechanisms by simulations
Latest Researches in Japan
• Japanese algorithmic researches are of high levels in the world
+ interior point methods for optimization problems
+ discrete mathematics and combinatorial optimizations
+ compression and succinct indexes
+ database search and database construction
+ data mining
+ computational geometry

• Studies on algorithms are currently on theory, mainly


Engineering theorems would be derived in future
Architecture of Computer
Basic Architecture
• The center of computer is “CPU (central processing unit) that
basically does basic arithmetic operations

• CPU is like an engine of a car


Having CPU would be a definition of computer

• Other than CPU, a computer has memory that records several


(many) values (mostly 0 or 1)

CPU memory
Interfaces
• Monitor, keyboard, mouse etc. are connected to CPU, and
controlled/managed by signals given by CPU

• From CPU, these signal receive/generation seems like memory;


writing some values to a specified memory, signal will be sent

• So, everything is operated by signals, especially 0/1


Program
• CPU can do
  + read value of memory, write to memory
  + arithmetic operations such as addition and division
  + compare the values, and branch the operation

• These are called instructions (values are called operands)


• The order of instructions are written in a part of memory
This is called program

• Execution is to operate instructions according to the program

CPU memory
Execution of Program
• CPU executes instructions written in the program, sequentially

• Instructions are written in numbers, and each function is


assigned to a number

• During the execution, the program can change the memory


place to read the program (jump)
• Branching is done by this; jump or not according to the
comparison (conditional jump)

CPU memory
プログラミング言語
• CPU の命令は数値。しかも、1つ1つの処理は非常に単純(これ
をマシン語、あるいは機械語という)
• これを組合せてプログラムを作るのは、かなり大変

• そこで、通常はもう少し抽象化して、人間に見やすくした「高級
言語」とよばれるものを使う
( C , JAVA , Perl, Basic, シェルスクリプトなど)

• 高級言語で書かれたプログラムを、 CPU に実行させるには、プ


ログラムをいったんマシン語に翻訳する(この作業をコンパイ
ルという)か、プログラムどおりの処理をするプログラム(イン
タープリタという)を使う
概念的な例
1: 場所 15 に 0 を書き込む
2: 場所 16 に 2 を書き込む
3: 場所 15 と 16 の値を足し、それを場所 15 に書き込む
4: 場所 17 の値が 10 以上ならば、8へいく
5: 3へ行く

8: ...

• 2-4のように局所的に繰り返して実行される部分をループ
という

CPU メモリ
メモリのアクセス
• メモリは 1 バイトずつ数値が記憶されている
• メモリには、記憶場所に 0 から始まる番号がついていて、好き
な場所のデータに一手でアクセス(書き込み/読み出し)でき

• このように、好きな場所に一手でアクセスできるメモリをランダ
ムアクセスメモリという

• ランダムアクセスでない記憶装置は、例えば CD とかハード
ディスクとか、テープ

1 0 1 1 0 1 0 1
数値の表現(バイト)
• メモリは数値を覚えられるが、実は 0 と 1 しか覚えられな
い (ビットという)
• しかし、大量に 01 の数値を記憶できる

• 実際は、 01 しか記憶できないのでは用をなさないので、 01
の数値を 8 個セットにして、それを 2 進数とみなす。( 0-
255 が表現できる)
 - これを 1バイトという

• 足す引くかけるの算術演算の結果、 2 進数 9 桁目に繰り
上がったとき、あるいはマイナスになったときは、その部分
は無視して計算する

1 0 1 1 0 1 0 1 128+32+16+4+1 = 181
16 進数
• コンピュータに記憶された数を扱うときは、当然、2進数で話
をしたほうがわかりやすい
(例えば、 1 バイトに入る数の最大は 11111111 とか。直感的
になる)

• 01 だけで表記すると場所をとる、桁が大きくなる

• そこで、 4 桁ずつひとまとめにして、 24 = 16 進数で表記


することが多い

• 10-15 は、 abcdef で表記する。例えば、 14=d 、 255=ff


データの記憶
• メモリには、記録されている数値が整数か、文字か、といった、
データの種類を覚えておく機能はないので(数値として付加的
に記録することはできるが)、自分(プログラム)が何を書いた
か管理する

• つまり、「何のデータはどこに書いた」ということを決めておく。プ
ログラムでは、データの計算をする際には、場所を決めて書く

• 常に 0 か 1 の値が書いてあるので、「書き込まれていない」とい
うことも検出できない

点数
点数
20点数
20 1 0 1 1 0 1 0 1
20
整数小数文字
- 整数: 01 の数値をいくつかセットにして、それを 2 進数とみな
す。通常、 32 ビット = 4 バイト。 (0-40 億くらい )

- 実数: 整数と小数点の位置をセットで記憶。小数点は 2 進数の


位置で記憶。通常、整数が 56 ビット、小数点が 8 ビット( 256 桁
分)

- 文字: 文字と整数値を対応させたコード表があり、それを使っ
て整数値として記憶する

1 0 1 1 0 1 0 1 128+32+16+4+1 = 181

+ 0 1 0 1 4+1 + (16+4+1)/32 = 5.65625


負の数の表現(バイト)
- 負の数: 最上位のビットが 1 になった数は、 256 を引い
て、負の数とみなす。通常の数と思って計算したときと、負
の数と思って計算したときの結果が同じなので、結果を変
換する必要がない

例:  255 は -1 になる。 1 を足すと、


1111111 + 1 = 100000000 。 9 桁目は無視するので、 0
これは、実際の -1 +1 の答えと一致する

例:  -2 は 254 、 -5 は 251 。両者を足すと、 505 。 256 の


剰余を求めると 249 。これは -7 に対応。両者を掛けると
63754 。剰余を求めると 10 。
文字コード表 (ASCII)
- 0-255 の整数と文字の 1 対 1 対応を表したコードを ASCII
コード、そのコードに基づいて書かれた文字を ASCII 文字とい
う (だいたい世界標準)
- コンピュータに文字が記憶されるときは、このテーブルにした
がって数値に変換され、 1 文字 1 バイトで記録される(大小の
比較ができる)

| 20 sp | 21 ! | 22 " | 23 # | 24 $ | 25 % | 26 & | 27 ' |


| 28 ( | 29 ) | 2a * | 2b + | 2c , | 2d - | 2e . | 2f / |
| 30 0 | 31 1 | 32 2 | 33 3 | 34 4 | 35 5 | 36 6 | 37 7 |
| 38 8 | 39 9 | 3a : | 3b ; | 3c < | 3d = | 3e > | 3f ? |
| 40 @ | 41 A | 42 B | 43 C | 44 D | 45 E | 46 F | 47 G |
| 48 H | 49 I | 4a J | 4b K | 4c L | 4d M | 4e N | 4f O |
| 50 P | 51 Q | 52 R | 53 S | 54 T | 55 U | 56 V | 57 W |
| 58 X | 59 Y | 5a Z | 5b [ | 5c \ | 5d ] | 5e ^ | 5f _ |
| 60 ` | 61 a | 62 b | 63 c | 64 d | 65 e | 66 f | 67 g |
日本語の文字コード
• 日本語については、日本語独自のコード表がある(世界規格と
して使われているので、外国でも日本語の文章が読める)

• 歴史的な事情で、いくつかのコードがある

 - JIS コード
 - shift JIS コード
 - EUC コード
 - unicode
JIS コード
• 英語は 0-127 しか定められていないので、 128-255 の部分にか
なと記号が入っている
JIS 漢字コード
• 漢字については、制御コードという、特別の文字 2 つが入った
ら、日本語のモードになる。漢字は、 2 バイト 1 文字。もう一度
制御コードが来たら、以後は英数字に戻る

16 区( S-JIS 第一バイト= 0x88 )

  0 1 2 3 4 5 6 7 8 9 A B C D E F

9                               亜
A 唖 娃 阿 哀 愛 挨 姶 逢 葵 茜 穐 悪 握 渥 旭 葦
B 芦 鯵 梓 圧 斡 扱 宛 姐 虻 飴 絢 綾 鮎 或 粟 袷
C 安 庵 按 暗 案 闇 鞍 杏 以 伊 位 依 偉 囲 夷 委
D 威 尉 惟 意 慰 易 椅 為 畏 異 移 維 緯 胃 萎 衣
E 謂 違 遺 医 井 亥 域 育 郁 磯 一 壱 溢 逸 稲 茨
Shift-JIS コード
• Windows で使われているコード
• 制御文字を使わず、 2 バイトの文字をあらわしている

• 1 文字目が記号だと、次の
1 バイトとあわせて記号だと
みなすので、記号が使えな
くなる

• 2 バイト目は、改行コード
など、通常の文字になること
もあるので、日本語対応して
いないソフトで見ると、
ぐちゃぐちゃになる
EUC コード
• 主に UNIX で使われているコード
• 制御文字を使わず、 2 バイトの文字をあらわしている

• 1 文字目が記号、かななら、
2 文字目とあわせて、
漢字コードとみなす

• 2 文字目も、かな記号なの
で、漢字の一部ならかな記号、
と判断できる

• 記号、かなが出せない
UNICODE

• 世界の全ての言語の文字を1つのコードで表す、というもの
• 2 バイト 1 文字を使わず、 2 バイトの文字をあらわしている

• 漢字など、地域で字体が
違うものも1つとして扱うため
事実上はその国固有の
フォントが必要

• 最近は 4 バイトで表すようだ
キャッシュメモリ
• コンピュータのメモリのアクセス速度は、演算速度よりかなり遅
い( 10 倍以上)

• そのため、メモリを読み出すときは、しばらく演算をとめて待つこ
とになる。次の命令を読むときもそう。

• そこで、メモリを読むときはまとめてたくさん読む
• 読んだメモリは、 CPU に直結した速いメモリにしばらく取っておく
(この操作をキャッシュ、キャッシュに使うメモリをキャッシュメ
モリという)

キャッシュ
CPU メモリ ディスク
メモリ
キャッシュによる高速化
• メモリアクセスが、キャッシュに入っている場所ばかりだと計算は
大幅に速くなる
  キャッシュの効率を高めるような保存法が重要
  引き続いて、キャッシュの中を見ることが多い計算をさせる
と、高速化できる

• ディスクのアクセスにも、同じようにキャッシュを使う
  CPU 直結メモリの代わりに、通常のメモリを使う。メモリのほうが
ディスクよりアクセス速度が速いので、高速化できる

CPU メモリ HDD


変数

• 高級言語では、数値を記憶する際に変数というものを使う
• 変数には、何か値が1つ入る

• コンパイルして実行するときに、各変数にメモリの場所が割当て
られる。以後、変数をアクセスするときには、その場所にアクセ
スするようになる
配列

• 大量のデータを扱う場合、全てのデータに直接変数を割当てる
のは、大変。プログラムを書くのも大変

• そこで、配列を使う

• 配列を使うと、変数+添え字、という形でデータにアクセスでき
る。添え字のところは変数を入れられるので、例えば、 100 個
の変数を 0 にする、といった作業も、ループを使って楽にできる

1 0 1 1 0 1 0 1
OS
• コンピュータは、機種によって、入出力の方法が違う
• ディスプレイに文字を書くためには、文字の形になるよう、デー
タを書き込まなければいけない

• こういった、ハードウェアによる違いを吸収する、低レベルの処
理を行う、実行するプログラムの管理、などをするプログラムが
ある

• これを OS という ( Windows 、 UNIX など)


• 固有の接続機器を、標準的な入出力方法を用いて扱うプログラ
ムをデバイスドライバという

• メモリの管理も OS が行う
メモリの確保
• 普通、コンピュータでは、複数のプログラムが同時に実行されて
いる(含むデバイスドライバ)

• そのため、メモリの適当な場所に適当に数値を書き込むと、他
のプログラムの実行を阻害する(下手をすると動きが止まる)

• そのため、メモリを使いたいときには、 OS にお願いして、必要な
分だけ使える場所をあてがってもらう(これをメモリを確保する
という)
Complexity and Order
Ways to Solve

• ”How to solve” and “The solution” are different


  + Stew and the way to cook stew
  + The way to solve a puzzle ring
  + The way to stack the tower of Hanoi
  + The solution to a quadratic equation
    x2 - 2x + 1 = 0

• ”Ways to solve” is more abstract than solution;


solution is a solution to just an instance

• Consider ways to solution


Computer Algorithm

Algorithm:
  set (order) of instructions to do a specified task
  Usually, algorithm is that for computers
  Algorithm can be considered as a theory of designing the way of
computing, or theory of efficient programs

Summation from 1 to 100:


 • 1+2+,...,+100 ‥‥ 99 additions
 • (1+100) * (100/2) ‥ 3 operations
Cut a carrot into slices of star shapes
 • make slices, then cut the edges of each slice
 • make a star shaped stick, then slice it
Algorithms in Your Life
• how to cook
• calculation with figures written down on paper
• car driving skill
• employment interviews in companies

(all are not complete …)


Evaluation Criteria
• How do we evaluate algorithms?

• How simple
   simplicity of programs
• Efficiency (speed, place, cost,…)
   speed and memory usage (electric power consumption)

• Simplicity is difficult to evaluate, but speed and memory


usage can be measured by values
Evaluation of Time
• It is easy to evaluate the speed of a program; just get the
duration of execution
• Of course, the quality of the writing of the program affects this

• But, this is the evaluation of “programmer”


not that of “algorithm”
  Even though the algorithms are the same, the skill of
programmers makes the difference

• We want to have a good model of measuring the efficiency of


algorithms
Turing Machine
• Making model of algorithms means making model of computers
• We need to have quite basic one

• The base of computer is CPU and memory


The Turing machine is such a model

• A Turing machine is composed of a “tape” that records sequence


of 01 data, and a “head” that reads/writes 01 data, and an
execution unit managing the movement of tape

• A Turing machine can read/write data, move the tape, and change
the internal state of the execution unit
A Model for Computation Time

• The time of Turing machine is evaluated by the number of


read/write operations

• Basically, any operation of a computer can be simulated with


several operations of Turing machine
the number of operations is independent from the time/size of each
operation

• So, both can be evaluated by the same model


 the number of operations executed by an algorithm will be a
good measure
Abstracted Evaluation

• The idea of “the number of basic operations” is good. It is much


abstracted. However, still the skill of programmer affects

• We want to have more abstracted one that is independent from that

• Anyway, what kind of issues do we want to evaluate?

• For example, measure the efficiency on the execution of a specified


task? for a benchmark problem?
  then, basically the skill of programming still affects
Evaluation for Input Size

• Computation needs some inputs


The size of input increases, then computation time will be long

• Observe the increase of time against the increase of size


 we will get a function representing this relation

• For the input size n (#bits, in exact), we represent the number of


the necessary operations such as 10n2+5n+2

• This varies among several inputs, use average?


  to assure the time, we use the maximum
Ignoring the Coefficients

• We want to know the increase of operations


  it depends on the largest degree, mainly; 10n2+5n+2
  we focus on 10n2 , and ignore other coefficents

• Programming skills makes simple operations more simpler


  affects to coefficients such as 10n2+5n+2
  we focus only on n2, and ignore all constant factors

• This way of evaluating the cost of computation is called


“evaluation by order”
数学的な定義

• ”A function f(n) is O(g(n))” is defined by

lim f(n) < +∞


n +∞ g(n)

• We say that the order of an algorithm is O(g(n)) if the function


bounding the maximum number of basic operations for any input
of size n is of O(g(n))

• The order of computation time is called time complexity, or


simply complexity
Summary of Order

Order of computation time: O(f(n))


a (polynomial/exponential) function representing an upper bound
of the computation time in the term of input sizes
+ coeeficients are ignored since it represents the programming
skill
+ we focus only on the maximum magunitude
    (depends only on the maximum degree for larger inputs)
Find a word in a dictionary of n items
 • linear search O(n) Small order algorithms
 • binary search O(log n) are so efficient even
though the computer
Sorting n numbers
and/or programmer is
 • insertion, quick sort O(n2)
not good
 • merge sort, heap sort O(n log n)
Advantage / Disadvantage
• When the problem is big, acceleration by algorithmic theory is
drastic
• When the problem is small, practical performance is bad due to
sophisticated construction of operations
• Can not represent the average and practical performance

size of million
size of 100
2-3 times
10,000
times
Evaluation of Memory Usage

• The efficiency of memory usage can also be evaluated by the order


and complexity
+ the order represents the worst usage of memory amount for the
input size
+ the complexity is called “space complexity”

• We can also ignore the skill of programming

• For the memory usage, evaluation of the worst case is more


acceptable since the algorithm stops if the memory is short
Fundamentals of
Complexity Theory
Good Design

• When we can decrease the order of an algorithm, we could


“improve” the design of the algorithm

• …, then, some question arise; which algorithm is good/bad?


Where is the boundary

• An idea is to compare the most naïve algorithm

• So, observe the most naïve algorithm


Naïve Algorithm

• We here say non-well designed algorithm  naïve algorithm


• Suppose that here naïve means that to explorer all the possibilities

+ find all numbers from a1,…,an that are less than b


   examine all subsets of a1,…,an, and output the one that
partitions the set in the way of the statement

+ sorting
   examine all possible orders, and output the one which is
increasing order

+ find the longest decreasing subsequence of a1,…,an


   examine all combinations of a1,…,an, and output the longest
among all decreasing sequences
Time of Naïve Algorithms
+ find all numbers from a1,…,an that are less than b
+ find the longest decreasing subsequence of a1,…,an
   computation time is O(n2n)

+ sorting
   computation time is O(n n!) ≒ O(n2n logn)

• The time spent by these algorithms, that spends time exponential in


the input sizes, is called exponential time, and these algorithms
are called exponential time algorithms
In “Good” Ways
+ find all numbers from a1,…,an that are less than b
  scan the numbers, and output those less than b
  computation time is O(n)

+ find the longest decreasing subsequence of a1,…,an


  for each ai, find the longest decreasing subsequence ending at
ai 。 Find the longest among those of all ai’s
  computation time is O(n2)

+ sorting
  scan the sequence and swap the pairs of neighboring numbers
with the reverse ordering; repeat this n times
  computation time is O(n2)
“Good” Algorithms

 computation time is O(n)


 computation time is O(n2)
 computation time is O(n2)

• They are not exponential


• These times that are polynomial in the input sizes are called
polynomial time, and the algorithms are called polynomial time
algorithms

• Any polynomial, even with large degrees, is always smaller than


exponential, for sufficiently large n, thus essentially different
  hence, polynomial is good, in some sense
Difficulty of Problems
• Good algorithms solve problems in short time

• So, the existence of good algorithm means that the problem is


easy to solve (no need of testing all possibilities)
   the difficulty indicates the cost of developing algorithms

  polynomial time means that the problem would be easy


  no polynomial time is found means that the problem would
be difficult

• … then, we can compare the difficulties of two problems


Exact Comparison
• We can compare the difficulties of the problems, by comparing
the orders of the algorithms for the problems

• However, in exact, it is not sufficient


  faster algorithms would exist

• So, consider special cases in which we can surely compare,


although it is perfectly general
Problem Reduction
• Suppose that there are two types of problems A and B, and any
instance of B can be transformed to that of A, or any instance of B
can be solved by solving an instance of A plus α

+ find the kth largest value from n numbers


  after sorting the numbers, we can find it in O(n) time
  this problem can be solved by sorting

• Transforming an instance of B to that of A, or solving an instance


of B by solving that of A is said to “reduce problem B to A”

• In this case, A is more difficult, if time is no less than +α


• Input/output always needs O(n) time, thus finding kth largest value
is no harder than sorting
Difficult Problems

• Easy problems are stated by showing fast algorithms


• How to show the difficulty?

• Hard to state that “any algorithm needs exponential time”

• So, find most difficult problems among the problems having


polynomial/exponential time algorithms

  use the way to compare the difficulties; state that any


problems can be reduced to this problem in polynomial time
(hardest in polynomial time/exponential time problems

• Are there any such “useful problems”?


Satisfiability Problem
• There is a such “useful” problem, called satisfiability problem (SAT)

• SAT is to check whether there is an assignment to Boolean variables x1,


…,xn s.t. they satisfy the given formula
 we can assume that the formula is a CNF

• The trick is, any computer circuit is represented by a formula


• A computer circuit (of an exponential size) can simulate a non-
deterministic Turing machine that can examine exponentially many
possibilities at once in parallel
• Difficult combinatorial problems can be solved in polynomial time
• SAT is most difficult among them (problem seem to need exponential
time), since all the others can be reduced to SAT
Certification of Difficulty

• SAT is a kind of a landmark of difficult problems


 a problem is “difficult” if it is no easy than SAT

• Such a problem is called NP-hard (SAT is NP-hard)

• Problems that can be solved in polynomial time by non-


deterministic Turing machine are called NP problems (SAT is NP)
 the certification of “yes” answer is of polynomial size

• An NP problem that is NP-hard is called NP-complete


• The problem class of NP-complete problems is the one of
composed of problems of most difficult among all NP problems
SAT  3SAT

• SAT is NP-complete even if any clause has exactly three literals


 SAT is reducible to this 3SAT problem

Reduction: for each clause of a SAT instance,


+ if size < 3, generate 2 or 4 clauses by appending slack literals
(x)  (x∨y∨z) ∧ (x∨y∨ ¬ z) ∧ (x∨ ¬ y∨z) ∧ (x∨ ¬ y∨ ¬ z)

+ if size > 3, prepare a slack for each literal, and joint them by using
slack literals
(a∨b∨c∨d∨e∨f)  (a∨b∨x)∧ ( ¬ x∨c∨y)∧ ( ¬ y∨d∨z)∧ ( ¬ z∨e∨f)
SAT  k-Clique

• A clique is a subgraph of a graph in which any pair of vertices are


connected by an edge
 SAT is reducible to determine whether a graph includes a clique
of k vertices

• Prepare a vertex for each literal of each clause


• Draw edges between all pairs of the literals, but not if one is the
negation of the other

• There exists clique of size (#clauses)


 we can choose literals of each clauses with no conflict
(satisfiable!)
Independent Set Problem

Independent set: vertex subset of a graph G s.t. no two its


vertices are connected by an edge

• k-independent set problem is to determine whether the given


graph has an independent set of size k

• The problem is an essence of exclusion constraints


(composed only of exclusion)

• How do we prove NP-completeness?


• Reduce SAT
Simulate Selection by Exclusion

• We can simply simulate the constraint in SAT that is “we can


choose exactly one of xi or ¬ xi, for each i
 prepare vertices of xi and ¬ xi, and make edge between them
 an n independent set from these pairs is an assignment

• Next issue is satisfiability for clauses


xi ¬ xi
• We want to simulate, like,
“we can choose a vertex corresponding to a clause, if one of the
literal is in the assignment”
how do we do this by making graph?
Representing Clauses

• We first, simply, connect a clause Ci and its literals


x4
• If there is a literal in the assignment,
we can not choose the clause Ci Ci x1
¬ x3
• If Ci is not satisfied, we can choose Ci,
( not have to choose)

• … This is the opposite of what we want to do


 if everything is reversed, just give size constraint, and done
Reversing the Selection
• Let’s consider the reverse
 we consider the complement of the assignment, so we choose
the negation of the assignment

x4
• When a literal of clause Ci is chosen
 Ci is disabled Ci x1
¬ x3
• When no literal of Ci is chosen
 Ci is disabled

• Now, we can not choose Ci in both cases (failed!!!)


• One more idea is needed
Transforming Clauses
• Since we want choose a clause if one of its literals is chosen,
we prepare vertices for each literal

• We choose one of these vertices, thus we make them a clique

• … then, we can choose one of Ci x4


clause vertices when one of its literals
is chosen Ci x1
Ci ¬ x3
Summary of Reduction
• Reduce an instance of SAT
Input: variable x1,…,xn, clauses C1,…,Cm composed of xi, ¬ xi’s

• Construct the following graph


vertex set: literals xi, ¬ xi, pairs (Ci, xj) of clause Ci and its literal
xj
edge set : {xi, ¬ xi} for any i, {(Ck, xi), xi} for any Ck and xi

• The graph has an independent set of size n+m


 the SAT instance has true assignment Ci x4

Ci x1
Ci ¬ x3
How Much Can We Do Better?
• Exponential time can be reduced to polynomial time, by deriving
good algorithm
 low degree polynomials are better

• We of course try to derive algorithms with small degrees, but


where is the limit?
 there should be some explicit limits (called lower bound)

• For example, O(n) time for inputting the problem is a trivial


lower bound
Choose a Model
• For establish lower bounds we need model of computation

• For example, consider the problem of finding the maximum


among n numbers

• In usual, we need n time to input, thus it would be a lower bound,


but it is not true for parallel computation

• Extremely, humans can double their number in constant time,


thus after increasing to n humans in O(log n) time, then we can
choose the maximum in O(log n) time
Unit of Operation
• Memory read/write and conditional jump are basic operation of
computer, thus they should be units of operations

• Under this, let’s think about a lower bound of minimum number


of operations that are clearly needed to solve the problem

Search: find the one from n numbers that is maximum, minimum


or nearest to the given k

Sort: re-order the sequence of n numbers in increasing order


The Worst Case
• There would be an observation:
“conditional jump branch the execution to two, thus if a problem has
N kinds of possible solutions, it has at least N branches in the
computation tree”
• Even if we balance the computation tree, the worst case has to pass
through at least log2 N conditional jumps

Search: find the one from n numbers that is maximum…


 n possible solutions, thus we need log2 n time

Sort: re-order the sequence of n numbers in increasing order


 n! possible solutions, thus we need log2 n! = nlog n time
Summary
• Fundamentals of compute architecture
CPU, memory, + I/O. Hierarchy of storage

• Fundamentals of algorithm
a sequence of operations with conditional branches

• Definition of order
ignoring constant factors and small degrees, to capture the global
increase by the problem sizes

• Fundamentals of complexity:
polynomiality, NP-complete, lower bounds

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy