Measurement error: Kappa and its derivation

As mentioned in the overview of measurement precision tests, when the outcome is dichotomous or nominal we use the kappa test.

The principle of the kappa test is to assess the precision of a test or an observer in terms of "agreement beyond chance alone", that is, how much more the raters agree than would be expected.

kappa = (Observed % agreement - Expected % agreement) / (100% - Expected % agreement)

As a rule of thumb, a kappa above 0.4 is considered acceptable and above 0.8 is considered good (Altman, 1991).

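Agreement that arises by chance alone is also called "expected agreement". It is derived purely from basic probability (the multiplication rule for independent events), like tossing coins. Under the assumption that the raters are independent of each other, the probability that the two raters give the same result is (P1)(P2) for both positive plus (1-P1)(1-P2) for both negative, where P1 and P2 are the proportions that each rater calls positive.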

Expected agreement between rater A and rater B = (Probability of D+ by rater A)(Probability of D+ by rater B) + (Probability of D- by rater A)(Probability of D- by rater B)
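The formulas above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `cohen_kappa` and the small 2x2 table of counts are made up for the example and are not data from this post.

```python
def cohen_kappa(table):
    """Return (observed, expected, kappa) for a square table of counts.

    table[i][j] = number of subjects rated category i by rater A
    and category j by rater B.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of subjects on the diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater.
    p_a = [sum(table[i]) / n for i in range(k)]
    p_b = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Expected agreement under independence: sum of products of marginals.
    expected = sum(p_a[i] * p_b[i] for i in range(k))
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


# Hypothetical counts (rows: rater A +/-, columns: rater B +/-).
example = [[20, 5],
           [10, 15]]
observed, expected, kappa = cohen_kappa(example)
print(f"observed={observed:.2f} expected={expected:.2f} kappa={kappa:.2f}")
```

The function accepts any square table, so the same sketch also covers nominal outcomes with more than two categories.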

Suppose two radiologists read pre-enrollment screening chest X-rays of students, nearly 100% of whom are normal. The expected agreement must be very high, because even if the two radiologists used completely different criteria, or read the films with their eyes closed, they would still be very likely to agree.
In contrast, for patients presenting with fever and chronic cough, where the chance of a normal versus abnormal film is about 50:50, the agreement expected by chance is lower.

Case: the prevalence of normal and abnormal differ greatly

	Rater B +	Rater B -	Total
Rater A +	3	1	4
Rater A -	2	94	96
Total	5	95	100

Expected agreement between rater A and rater B
= (0.04 x 0.05) + (0.96 x 0.95)
= 0.002 + 0.912
= 0.914

Observed agreement between rater A and rater B
= 0.03 + 0.94
= 0.97

kappa = (0.97 - 0.914) / (1.0 - 0.914) = 0.65, or 65%

------------------------------------------------------------------

Case: the prevalence of normal to abnormal is about 50:50

	Rater B +	Rater B -	Total
Rater A +	50	2	52
Rater A -	1	47	48
Total	51	49	100

Expected agreement between rater A and rater B
= (0.52 x 0.51) + (0.48 x 0.49)
= 0.50

Observed agreement between rater A and rater B
= 0.50 + 0.47
= 0.97

kappa = (0.97 - 0.50) / (1.0 - 0.50) = 0.94, or 94%
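The two worked examples can be checked side by side with a short Python sketch. The helper name `kappa_2x2` is introduced here purely for illustration; the counts are those of the two tables above.

```python
def kappa_2x2(a, b, c, d):
    """Kappa for a 2x2 table: a = both +, b = A+ B-, c = A- B+, d = both -."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the marginal proportions of each rater.
    expected = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return observed, expected, (observed - expected) / (1 - expected)


skewed = kappa_2x2(3, 1, 2, 94)      # prevalence far from 50:50
balanced = kappa_2x2(50, 2, 1, 47)   # prevalence near 50:50

for name, (obs, exp, kap) in [("skewed", skewed), ("balanced", balanced)]:
    print(f"{name}: observed={obs:.2f} expected={exp:.3f} kappa={kap:.2f}")
```

Both tables have the same observed agreement, so any difference in kappa comes entirely from the expected agreement.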

Observations
1. At the same observed agreement, the lower the expected agreement (because the prevalence is closer to 50:50 rather than extreme), the higher the kappa. In other words, the raters deserve more credit for their ability to agree.

2. Both cases have high observed agreement because the raters share a "trend" of diagnosing in the same direction: in the first case both call normal (D-) more often, and in the second case both call abnormal (D+) more often.
If the trends run in opposite directions, the observed agreement can fall below 50%, as in the example below, where rater A mostly calls D- while rater B mostly calls D+.

	Rater B +	Rater B -	Total
Rater A +	9	1	10
Rater A -	61	29	90
Total	70	30	100
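As a rough check of this opposite-trend table, the Python sketch below recomputes the observed agreement, expected agreement, and kappa directly from the counts. The variable names are chosen here for illustration only.

```python
# Counts from the table above (rows: rater A +/-, columns: rater B +/-).
table = [[9, 1],
         [61, 29]]
n = sum(map(sum, table))

observed = (table[0][0] + table[1][1]) / n
# Proportion called D+ by each rater.
p_a_pos = sum(table[0]) / n
p_b_pos = (table[0][0] + table[1][0]) / n
expected = p_a_pos * p_b_pos + (1 - p_a_pos) * (1 - p_b_pos)
kappa = (observed - expected) / (1 - expected)

print(f"observed={observed:.2f} expected={expected:.2f} kappa={kappa:.2f}")
```

With observed agreement barely above the agreement expected by chance, kappa ends up close to zero.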

In summary, kappa tends to be higher when
1. the raters' "trend" in diagnosis runs in the same direction (balanced agreement)
2. the prevalence is neither very high nor very low
3. weighted kappa is used (discussed below)
For these reasons, kappa is not well suited to comparisons between studies.

Unweighted VS weighted kappa:

When the outcome being compared is on an ordinal scale, that is, ordered from low to high as in pathologic grading, unweighted kappa is unfair.
For example, when rater A and rater B disagree as "definite normal vs definite cancer" or as "atypical vs definite cancer", both pairs count as disagreement, yet the second pair of opinions is clearly closer than the first and deserves more credit.

For ordinal scales we therefore use weighted kappa. There are three ways to assign the credit weights: linear, custom, and quadratic.

Linear: partial agreement receives credit that is equally far from complete agreement (1) and complete disagreement (0).
The weight for row i, column j is 1 - |i-j| / (c-1), where c is the number of result categories.
For example, for row 2, column 1 with c = 3 categories: |2-1| / (3-1) = 1/2, and 1 - 1/2 = 1/2.

	Rater B normal	Rater B atypical	Rater B cancer
Rater A normal	1	1/2	0
Rater A atypical	1/2	1	1/2
Rater A cancer	0	1/2	1

Custom: partial agreement receives credit that is not proportional between complete agreement (1) and complete disagreement (0) as in the linear scheme above; the weights are chosen by the analyst. An example of custom weights:

	Rater B normal	Rater B atypical	Rater B cancer
Rater A normal	1	2/3	0
Rater A atypical	2/3	1	2/3
Rater A cancer	0	2/3	1

Quadratic: partial agreement receives credit that is closer to complete agreement (1) than to complete disagreement (0).
The weight for row i, column j is 1 - ( |i-j| / (c-1) )^2.
For example, for row 2, column 1 with c = 3 categories: |2-1| / (3-1) = 1/2; squared this is 1/4, and 1 - 1/4 = 3/4.

	Rater B normal	Rater B atypical	Rater B cancer
Rater A normal	1	3/4	0
Rater A atypical	3/4	1	3/4
Rater A cancer	0	3/4	1
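The linear and quadratic weight tables above can be generated for any number of ordered categories. The Python sketch below is one way to do it; the function name `agreement_weights` and its `power` parameter are assumptions made for this illustration, and exact fractions are used so the output matches the tables.

```python
from fractions import Fraction


def agreement_weights(c, power=1):
    """Weights 1 - (|i-j| / (c-1))**power for c ordered categories.

    power=1 gives linear weights, power=2 gives quadratic weights.
    """
    return [[1 - Fraction(abs(i - j), c - 1) ** power for j in range(c)]
            for i in range(c)]


linear = agreement_weights(3, power=1)
quadratic = agreement_weights(3, power=2)

for name, matrix in [("linear", linear), ("quadratic", quadratic)]:
    print(name)
    for row in matrix:
        print("  ", [str(w) for w in row])
```

Custom weights have no formula, so they would simply be typed in as a matrix.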

Now let's look at an example of computing a linear weighted kappa.

Cells with complete agreement, which receive full credit of 1:

Observed = 35 + 4 + 25

Expected
=   (Probability of normal by rater A)(Probability of normal by rater B) +
     (Probability of atypical by rater A)(Probability of atypical by rater B) +
     (Probability of cancer by rater A)(Probability of cancer by rater B)
=   (45/150*70) + (55/150*40) + (45/150*30)

Cells with partial agreement, which receive credit of only 1/2:

Observed = 2 + 32 + 2 + 0

Expected = (45/150*10) + (55/150*70) + (55/150*30) + (45/150*10)

Cells with complete disagreement, which receive no credit (0):

Observed = 1 + 2
Expected = (45/150*30) + (45/150*70)
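The same bookkeeping can be written generally: weight each cell's observed and expected proportion by its credit, sum them, and apply the usual kappa formula. The Python sketch below illustrates this; the function name `weighted_kappa` and the 3x3 counts are hypothetical and are not the data of this example.

```python
def weighted_kappa(table, weights):
    """Weighted kappa for a square table of counts and a matrix of credits."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Weighted observed agreement: credit times observed cell proportion.
    observed = sum(weights[i][j] * table[i][j] / n
                   for i in range(k) for j in range(k))
    # Weighted expected agreement: credit times product of the marginals.
    expected = sum(weights[i][j] * row_p[i] * col_p[j]
                   for i in range(k) for j in range(k))
    return (observed - expected) / (1 - expected)


# Hypothetical counts (rows: rater A, columns: rater B;
# order: normal, atypical, cancer).
counts = [[20, 5, 0],
          [5, 10, 5],
          [0, 5, 20]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]        # unweighted kappa
linear = [[1, 0.5, 0], [0.5, 1, 0.5], [0, 0.5, 1]]  # linear weights

print(f"unweighted kappa = {weighted_kappa(counts, identity):.3f}")
print(f"linear weighted kappa = {weighted_kappa(counts, linear):.3f}")
```

Passing the identity matrix as weights reduces the calculation to ordinary unweighted kappa, which makes the effect of partial credit easy to compare.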

We can obtain kappa with Stata, but the 3x3 table must first be reshaped into 9 rows, one for each combination of the two raters' ratings, as shown on the left.
Because Stata prefers numbers to strings, a little preparation is needed first: recode N, B, C as the numbers 1, 2, 3.

Unweighted kappa

. kap rada radb [fw= freq]

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
50.00%      33.42%        0.2491     0.0593       4.20      0.0000

Linear weighted kappa

. kap rada radb [fw= freq], w(w)

Ratings weighted by:
   1.0000   0.5000   0.0000
   0.5000   1.0000   0.5000
   0.0000   0.5000   1.0000

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
73.83%      53.29%       0.4397     0.0668       6.59      0.0000

Quadratic weighted kappa

. kap rada radb [fw= freq], w(w2)

Ratings weighted by:
   1.0000   0.7500   0.0000
   0.7500   1.0000   0.7500
   0.0000   0.7500   1.0000

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
85.74%      63.23%     0.6123     0.0816       7.50      0.0000

Custom weighted kappa (the weights must be defined first)

. kapwgt xm 1 \ .67 1 \ 0 .67 1

. kap rada radb [fw=freq], wgt(xm)

Ratings weighted by:
   1.0000   0.6700   0.0000
   0.6700   1.0000   0.6700
   0.0000   0.6700   1.0000

             Expected
Agreement   Agreement     Kappa   Std. Err.         Z      Prob>Z
-----------------------------------------------------------------
81.93%      60.05%     0.5477     0.0754       7.26      0.0000
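As a sanity check, each kappa in the Stata output above can be recomputed from its own Agreement and Expected Agreement columns with the basic kappa formula. The Python sketch below does this; the dictionary layout is just a convenience chosen here, and small differences in the last digit are to be expected because the printed percentages are rounded.

```python
# (observed agreement, expected agreement, kappa) as printed by Stata above.
reported = {
    "unweighted": (0.5000, 0.3342, 0.2491),
    "linear":     (0.7383, 0.5329, 0.4397),
    "quadratic":  (0.8574, 0.6323, 0.6123),
    "custom":     (0.8193, 0.6005, 0.5477),
}

recomputed = {}
for name, (observed, expected, kappa) in reported.items():
    # Same formula as for unweighted kappa, applied to weighted agreements.
    recomputed[name] = (observed - expected) / (1 - expected)
    print(f"{name}: reported={kappa:.4f} recomputed={recomputed[name]:.4f}")
```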
