Data Analysis Project Using Excel and R (Health Information Management)

Step 1: Business Understanding

1. There are wide discrepancies in charges and payments between institutions

a. Larger hospitals charge more and receive higher payments

b. Urban hospitals charge more, but do not receive higher payments

2. Are the variations due to excessive charging or lower payments?

a. Excess Charge = Charge - Payment

b. Cost-to-charge ratio = Payment/Charge

Step 2: Data Understanding

1. IPPS Data

a. Medicare Provider Utilization and Payment Data: Inpatient

i. Total Discharges

ii. Average Covered Charges

iii. Average Total Payments

2. Census Data: because of its size, this file has been limited to NY only. The file is text (CSV), so it will have to be opened in Excel first.

a. 2010 ZCTA to Metropolitan and Micropolitan Statistical Areas Relationship File

i. ZIP code

ii. CBSA

Step 3: Data Preparation

1. Filter the IPPS file to include only NY
2. Add the CBSA from the Census Data file to the IPPS Data file
a. Use VLOOKUP
b. Copy the CBSA column and Paste Special as values only
c. Remove #N/A values using Find/Replace
3. Insert a new column for the hospital geography (the Step 4 formulas refer to it as Geo)
a. In the new column, use the IF function to recategorize the hospital geography
i. If the hospital has an identified CBSA, recategorize that hospital as urban
ii. If the hospital does not have a CBSA, recategorize that hospital as rural
b. Copy the Geography column and Paste Special as values only
4. Calculate Excess Charge = Charge - Payment
5. Calculate Cost-to-charge Ratio (CCR) = Payment/Charge
6. Copy the Excess Charge and CCR columns and Paste Special as values only
7. Save the file as a .csv
8. Also, save a version of the file as a .xlsx
a. In the .xlsx version, click in any of the cells and format as a table (HOME – “Format as Table”)
b. In the .xlsx version, name the table (DESIGN – “Table Name” – enter “DRG”)
c. Save
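
For students who want to cross-check the Excel preparation, the same steps can be sketched in base R. This is only a minimal sketch on invented toy data: the data frames ipps and census, the column names Zip, Charge, Payment and ExcessCharge, and every value are hypothetical stand-ins, not the real file layout. Geo and CCR follow the names used in the Step 4 formulas.

```r
# Toy stand-ins for the two files. All names and values are invented.
ipps <- data.frame(
  Zip     = c("00001", "00002", "00003"),
  Charge  = c(40000, 12000, 25000),   # stands in for Average Covered Charges
  Payment = c(10000,  6000,  5000)    # stands in for Average Total Payments
)
census <- data.frame(
  Zip  = c("00001", "00003"),
  CBSA = c("11111", "22222")
)

# Left join on ZIP code, the counterpart of VLOOKUP.
# ZIP codes with no match get NA where Excel would show #N/A.
prepared <- merge(ipps, census, by = "Zip", all.x = TRUE)

# A hospital with a CBSA is urban; one without a CBSA is rural.
prepared$Geo <- ifelse(is.na(prepared$CBSA), "Rural", "Urban")

# The two derived measures from Step 3.
prepared$ExcessCharge <- prepared$Charge - prepared$Payment
prepared$CCR          <- prepared$Payment / prepared$Charge

print(prepared)
```

Because merge() leaves unmatched rows as NA, there is no separate Find/Replace step in this version; is.na() plays the role of the blank-cell test in the IF function.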

Step 4: Modeling

1. Create a PIVOT TABLE of the count of hospitals for each geographic region (INSERT- PivotTable). REMEMBER: click the checkbox “Add this data to the Data Model”

2. Create a PIVOT TABLE to calculate the following for each geographic region:

a. Average Total discharges

b. Average Covered charges

c. Average Total Payments

d. Average Medicare Payments

e. Average Excess charges

f. Average Cost-to-charge ratio (CCR)
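
The two pivot tables above amount to a row count and a set of group means per geographic region. A minimal base R sketch of that idea follows, assuming a toy data frame drg with hypothetical columns Charge and Payment and invented values; it is meant as a cross-check, not a replacement for the pivot tables.

```r
# Invented toy data; Geo and CCR follow the table column names used in Step 4.
drg <- data.frame(
  Geo     = c("Urban", "Urban", "Rural", "Rural"),
  Charge  = c(40000, 20000, 12000, 8000),
  Payment = c(10000,  8000,  6000, 4000)
)
drg$CCR <- drg$Payment / drg$Charge

# Number of rows per region (what a pivot table "Count" reports).
counts <- table(drg$Geo)

# Mean of each numeric column per region (a pivot table "Average").
means <- aggregate(cbind(Charge, Payment, CCR) ~ Geo, data = drg, FUN = mean)

print(counts)
print(means)
```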

3. Use COUNTIF to count the number of rural and urban hospitals (compare these results to what is provided in a PIVOT TABLE)

=COUNTIF(DRG[Geo],"Urban")

=COUNTIF(DRG[Geo],"Rural")
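
The COUNTIF logic can be mirrored in R by summing a logical comparison, since TRUE counts as 1. The sketch below assumes a toy data frame drg with invented values; only the column name Geo comes from the formulas above.

```r
# Invented toy data.
drg <- data.frame(Geo = c("Urban", "Rural", "Urban", "Urban", "Rural"))

# Each comparison yields TRUE/FALSE per row; sum() counts the TRUE values.
urban_n <- sum(drg$Geo == "Urban")
rural_n <- sum(drg$Geo == "Rural")

print(c(Urban = urban_n, Rural = rural_n))
```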

4. Use SUMPRODUCT to count the number of rural and urban hospitals that have a cost-to-charge ratio greater than or equal to 0.5 and those less than 0.5 (How should we normalize these results? Calculate the proportion!).

=SUMPRODUCT((DRG[Geo]="Urban")*(DRG[CCR]<0.5))

=SUMPRODUCT((DRG[Geo]="Urban")*(DRG[CCR]>=0.5))

=SUMPRODUCT((DRG[Geo]="Rural")*(DRG[CCR]<0.5))

=SUMPRODUCT((DRG[Geo]="Rural")*(DRG[CCR]>=0.5))
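
The four SUMPRODUCT counts form a 2 x 2 table, and the normalization asked for above is the share of each region's rows that falls on either side of the 0.5 cutoff. A minimal base R sketch, assuming a toy data frame drg with invented values (HighCCR is a hypothetical label for CCR >= 0.5):

```r
# Invented toy data; Geo and CCR follow the table column names in the formulas.
drg <- data.frame(
  Geo = c("Urban", "Urban", "Urban", "Urban", "Rural", "Rural"),
  CCR = c(0.20, 0.30, 0.55, 0.45, 0.60, 0.40)
)

# Cross-tabulate region against the CCR cutoff (same cells as the SUMPRODUCTs).
counts <- table(Geo = drg$Geo, HighCCR = drg$CCR >= 0.5)

# Divide each row by its total so the regions can be compared as proportions.
props <- prop.table(counts, margin = 1)

print(counts)
print(props)
```

Raw counts depend on how many urban and rural rows exist, so the row proportions are the figures to compare between regions.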

5. Create a PIVOT TABLE of the count of each MS-DRG

6. Create graphs to depict the above information (INSERT – CHARTS)

7. Open R

8. Open R Commander

a. Type the following into R:

library(Rcmdr)

9. Import the data into R Commander using the following script:

dataset <- read.csv(file.choose())

Locate the IPPS csv data file and click “OK”

10. Activate the dataset in R Commander

a. Click <No active dataset> and find “dataset”

b. Confirm the number of rows and columns as compared to the original dataset

11. Obtain a summary of the following numeric data (Statistics – Summaries – Numeric Summaries – Hold down Ctrl and click the variable names shown below – Click OK):

a. Average Covered charges

b. Average Total Payments

c. Average Medicare Payments

d. Excess charges

e. Cost-to-charge ratio (CCR)
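
Comparable figures can be obtained from the R console with base functions, which is a convenient way to confirm the menu output. The sketch below assumes a toy data frame with hypothetical column names Charge, Payment and ExcessCharge and invented values; it does not reproduce the exact layout of the R Commander summary.

```r
# Invented toy data standing in for the imported file.
dataset <- data.frame(
  Charge  = c(40000, 20000, 12000, 8000),
  Payment = c(10000,  8000,  6000, 4000)
)
dataset$ExcessCharge <- dataset$Charge - dataset$Payment
dataset$CCR          <- dataset$Payment / dataset$Charge

# Rows and columns, to compare against the original file.
print(dim(dataset))

# Minimum, quartiles, median, mean and maximum for every column.
print(summary(dataset))

# Standard deviation of every column.
spread <- sapply(dataset, sd)
print(spread)
```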

12. Create two “Plot of means” graphs to compare Average Covered Charges, Average Total Payments, Excess Charge, and CCR by geographic location

13. Use a two-sample t-test to determine if there are significant differences in the following data between rural and urban hospitals:

a. Count of hospitals

b. Total discharges

c. Covered charges

d. Total Payments

e. Medicare Payments

f. Excess charges

g. Cost-to-charge ratio (CCR)
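
The same kind of comparison can be run from the console with t.test() from base R's stats package. This is a minimal sketch on invented values for one variable (CCR), assuming a toy data frame drg. Note that t.test() defaults to the Welch test, which does not assume equal variances, so its output matches the menu result only if the same variance option is chosen there.

```r
# Invented toy data: five rural and five urban CCR values.
drg <- data.frame(
  Geo = rep(c("Rural", "Urban"), each = 5),
  CCR = c(0.50, 0.55, 0.60, 0.45, 0.52,
          0.25, 0.30, 0.22, 0.35, 0.28)
)

# Two-sample t-test of CCR by region (Welch version by default).
result <- t.test(CCR ~ Geo, data = drg)

print(result)            # test statistic, degrees of freedom, p-value, group means
print(result$estimate)   # the two group means on their own
```

Repeating the call with a different left-hand variable covers the other numeric measures in the list.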

Step 5: Evaluation

1. Summarize the findings

a. Are there confounding variables that we should have considered in our analysis?

i. Hint: Frequency of MS-DRG codes for each geographic location
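
One way to look at the hint is to cross-tabulate MS-DRG against geography and compare the case mix of the two regions. The sketch below is a minimal base R illustration, assuming a toy data frame drg with a hypothetical column MSDRG; the codes and counts are invented for illustration only.

```r
# Invented toy data: one row per hospital and MS-DRG combination.
drg <- data.frame(
  Geo   = c("Urban", "Urban", "Urban", "Rural", "Rural", "Rural"),
  MSDRG = c("470", "871", "470", "871", "871", "690")
)

# Frequency of each MS-DRG within each region.
mix <- table(MSDRG = drg$MSDRG, Geo = drg$Geo)

# Column proportions: the share of each region's rows taken by each MS-DRG.
mix_share <- prop.table(mix, margin = 2)

print(mix)
print(mix_share)
```

If the shares differ markedly between regions, differences in charges and payments may partly reflect what is being treated rather than where.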

Step 6: Deployment

1. How would these findings be relevant to your organization and what might your organization do with this sort of information?