Data Validation Workshop: Agenda
Workshop 5

Purpose

"Data validation" is defined as checking all collected data for completeness and plausibility, and eliminating erroneous values. This step turns "raw data" into "validated data". The validated data are then processed to generate the reports needed for analysis.

Hospitals now have a department, or a person, designated as responsible for "quality", usually under a name such as "Medical Quality Center". Unfortunately, in smaller hospitals this may be limited to a single "specialist", and most quality-management staff have received no relevant training, whether in university or in post-graduate education. As a result, when they approach clinical staff to discuss how to improve the hospital's indicators, they are brushed off and simply told "there is a problem with the data".

In response, 衛福部 (the Ministry of Health and Welfare) and 醫策會 (the Joint Commission of Taiwan) have introduced "data validation" as an item to be tracked during hospital accreditation and P4P audits.

Objectives

  • Understand the collection-and-validation chain behind hospital indicators.
  • Test for invalid data using examples from a hospital database.
  • Practice writing a routine report on data-validation results.
  • Design a plan for ongoing data validation of one indicator (chosen from your own hospital's indicator platform).

Audience: any of the following

  • People involved in collecting and validating hospital-indicator data
  • People responsible for answering questions on indicator data validation during hospital accreditation
  • People responsible for answering questions on indicator data validation during P4P audits
  • People who have been told "there is a problem with the data" and then been ignored by clinical staff

Prerequisites

  • Know how to import a text file into an Excel worksheet
  • Be able to name a cell range in an Excel worksheet (e.g., the range "A1:A543" named "age" as a column)
  • Be familiar with basic Excel functions, including how to enter array formulas
  • Because the course uses additional functions, you are advised to read up on VALUE, MATCH, EXACT, IFERROR, ISNUMBER, and TEXT beforehand, and to learn how to use the Excel Help system

Venue: basement level (B1), 福懋大樓, 彰基
Registration: opens November 9 (registration website to be announced separately)
Fee: NT$2,000 per day (boxed lunch included)
Important: all materials are online; no printed handouts will be provided (the classroom has Wi-Fi)
Important: to support environmental protection and carbon reduction, drinking water is provided, but no paper cups
Note: the actual agenda is subject to the materials distributed on the day
Time         Mode         Title
08:30~09:00  Check-in     • Everyone signs in; staff of 彰基-system hospitals and mentored group hospitals also swipe their ID cards
                          • Collect your tuition receipt
                          • Collect your boxed-lunch voucher
                          • Collect the password for the online course materials (the classroom has Wi-Fi)
09:00~09:15  Lecture      (0) The quality-indicator platform and data validation: introduction
09:15~09:30  Lecture      (1) Overview of data collection and validation
09:30~10:00  Lecture      (2) Back end: database design (for non-IT staff)
                          How to prevent errors at the moment of storage
10:00~10:30  Lecture      (3) Front end: how to clean data already in the database
                          Finding and handling problems related to manual collection
                          Excel as a good tool for data validation
10:30~11:00  Lecture      (4) Indicator series: finding and handling elements excluded by the indicator definition
11:00~11:15  Break
11:15~12:00  Practice     Validation during design and data collection
                          Read a short article describing an (unfamiliar) indicator and analyze its database requirements.
                          Decide which database fields to collect and the type of each field.
                          Clarify how the programmer should prevent data-entry errors when writing the application.
12:00~13:00  Lunch        (collect your boxed lunch)
13:00~14:00  Practice     Download a sample of previously collected data; validate and clean it with Excel
                          Summarize your findings in the routine data-validation report format
14:00~15:00  Practice     Choose one indicator that you handle yourself
                          Plan how to build validation into its future data collection
15:00~16:00  Forum        Each group shares its experience.
                          Instructor comments and coaching.
16:00~16:30  Group photo
16:30        Dismissal

 

http://archive.oreilly.com/pub/a/network/excerpt/spcookbook_chap03/
Recipe 3.1: Understanding basic data validation techniques
  • Assume all input is guilty until proven otherwise
  • Prefer rejecting data to filtering data
  • Perform data validation both at input points and at the component level
  • Do not accept commands from the user unless you parse them yourself
  • Beware of special commands, characters, and quoting
  • Make policy decisions based on a "default deny" rule
  • You can look for a quoting mechanism, but know how to use it properly
  • When designing your own quoting mechanisms, do not allow escapes
  • The better you understand the data, the better you can filter it
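The "default deny" rule above can be sketched in a few lines of code. This is a minimal illustration only; the field names and patterns are hypothetical, not taken from any real hospital schema. Every field must match an explicit whitelist, and anything not provably valid is rejected.

```python
import re

# Whitelist of allowed fields and their exact formats (hypothetical examples).
ALLOWED = {
    "chart_no": re.compile(r"^\d{8}$"),        # 8-digit chart number
    "dept_code": re.compile(r"^[A-Z]{2,4}$"),  # short uppercase department code
    "age": re.compile(r"^\d{1,3}$"),           # digits only; range-checked below
}

def validate_record(record: dict) -> list:
    """Return a list of error messages; an empty list means the record passes."""
    errors = []
    for field, value in record.items():
        pattern = ALLOWED.get(field)
        if pattern is None:
            # Default deny: a field we did not expect is rejected outright.
            errors.append("unexpected field rejected: %s" % field)
        elif not pattern.fullmatch(str(value)):
            errors.append("bad value for %s: %r" % (field, value))
    # A plausibility (range) check on top of the format check.
    if "age" in record and str(record["age"]).isdigit():
        if not 0 <= int(record["age"]) <= 120:
            errors.append("age out of plausible range: %s" % record["age"])
    return errors
```

Note that the record is rejected both for malformed values and for fields that are merely unknown; that asymmetry is exactly what "default deny" means.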

{pingji batch, cch roc dates as number and unreal date combinations}
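The note above refers to ROC (Minguo) calendar dates stored as plain numbers, where impossible combinations such as February 31 can slip into a batch. A hedged sketch of one such check follows; the 7-digit yyyMMdd layout is an illustrative assumption, not the confirmed file format.

```python
from datetime import date

def check_roc_date(value):
    """Validate a ROC (Minguo) date stored as a 7-digit number, e.g. 1051231.

    Assumes a yyyMMdd layout with ROC year = Western year - 1911 (an
    illustrative guess). Returns None when the value is a real calendar
    date, otherwise an error message.
    """
    s = str(value)
    if not (s.isdigit() and len(s) == 7):
        return "not a 7-digit ROC date: %r" % (value,)
    roc_year, month, day = int(s[:3]), int(s[3:5]), int(s[5:7])
    try:
        date(roc_year + 1911, month, day)  # raises on unreal dates like Feb 31
    except ValueError:
        return "impossible date combination: %r" % (value,)
    return None
```

Letting the standard `datetime.date` constructor reject the value avoids hand-written leap-year and month-length logic.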
https://en.wikipedia.org/wiki/Data_cleansing (wikipedia: good outline)
Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting. After cleansing, a data set will be consistent with other similar data sets in the system.
The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.
Data cleansing differs from data validation in that validation almost invariably means data is rejected from the system at entry and is performed at entry time, rather than on batches of data.
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. Some data cleansing solutions will clean data by cross checking with a validated data set. {my satisfaction for doctor and dept codes}
The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).
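Strict cross-checking against a validated set (for instance, the doctor and department codes in the note above) amounts to partitioning a batch by membership in a reference table. A small sketch; all codes below are made up for illustration.

```python
# Master list of valid codes (illustrative only, not a real hospital's table).
VALID_DEPT_CODES = {"MED", "SUR", "PED", "OBS"}

def split_by_reference(records, key, reference):
    """Partition records into (clean, dirty) by whether records[key]
    appears in the validated reference set."""
    clean, dirty = [], []
    for rec in records:
        (clean if rec.get(key) in reference else dirty).append(rec)
    return clean, dirty
```

A fuzzy variant would instead try to correct near-misses (e.g. mapping a lowercase "med" to "MED") before resorting to rejection.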
Also data enhancement, where data is made more complete by adding related information, is a common data cleansing practice. For example, appending addresses with phone numbers related to that address.
Data cleansing may also involve activities like harmonization of data and STANDARDIZATION {my capitals} of data. For example, harmonization of short codes (st, rd, etc.) to actual words (street, road, etcetera). Standardization of data is a means of changing a reference data set to a new standard, e.g., use of standard codes.
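The harmonization example (st → street, rd → road) reduces to a lookup table applied token by token. A minimal sketch; the abbreviation map is illustrative, not a standard list.

```python
# Expand known short codes to full words, mirroring the st/rd example above.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

def harmonize(text):
    """Lower-case the text and expand known abbreviations token by token."""
    tokens = (w.rstrip(".") for w in text.lower().split())
    return " ".join(ABBREVIATIONS.get(w, w) for w in tokens)
```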
Data quality: validity, accuracy, completeness, consistency, uniformity [integrity]


Excel: removing duplicate rows — https://support.office.com/en-us/article/Top-ten-ways-to-clean-your-data-2844b620-677c-47a7-ac3e-c2e157d1db19
The basic steps for cleaning data are as follows:
Import the data from an external data source.
Create a backup copy of the original data in a separate workbook.
Ensure that the data is in a tabular format of rows and columns with: similar data in each column, all columns and rows visible, and no blank rows within the range. For best results, use an Excel table.
Do tasks that don't require column manipulation first, such as spell-checking or using the Find and Replace dialog box.
Next, do tasks that do require column manipulation. The general steps for manipulating a column are:
Insert a new column (B) next to the original column (A) that needs cleaning.
Add a formula that will transform the data at the top of the new column (B).
Fill down the formula in the new column (B). In an Excel table, a calculated column is automatically created with values filled down.
Select the new column (B), copy it, and then paste as values into the new column (B).
Remove the original column (A), which converts the new column from B to A.
To periodically clean the same data source, consider recording a macro or writing code to automate the entire process. There are also a number of external add-ins written by third-party vendors, listed in the Third-party providers section, that you can consider using if you don't have the time or resources to automate the process on your own.
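When scripted, the helper-column steps above (insert column B, fill a transforming formula down, paste as values, delete column A) collapse into a single transform-and-replace over one column. A sketch of automating the periodic cleanup; the trim-and-lowercase transform is a hypothetical stand-in for whatever formula the helper column would use.

```python
import csv
import io

def clean_column(rows, column, transform):
    """Apply `transform` to one column in place: the scripted equivalent of
    'insert helper column, fill the formula down, paste as values, delete
    the original column'."""
    for row in rows:
        row[column] = transform(row[column])
    return rows

# Example run on a tiny in-memory CSV; a periodic job would read a real file.
raw = io.StringIO("name\n  Alice \nBOB\n")
rows = clean_column(list(csv.DictReader(raw)), "name",
                    lambda v: v.strip().lower())
```

Because the transform is a parameter, each column's cleaning rule can be swapped in without changing the driver loop, which is the point of recording a macro in the Excel version.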