By Sarianne Gruber
The Open Government Initiative has spurred me to start exploring the health-related datasets that have become available at www.data.gov/health. Recently, I had the chance to hear Todd Park champion the “Data Liberacion” project. Park is co-founder of athenahealth and Castlight Health, a former HHS Chief Technology Officer and now the second Chief Technology Officer of the United States. More than 250 databases are now freely accessible online, which gives healthcare innovators and entrepreneurs an opportunity to build on this information. To read more about the Open Government Partnership, see Park’s September 20, 2012 blog post at www.hhs.gov/open.
I decided to explore the publicly available Medicare claims data. A downloadable Excel file, Total Counts of Claims Received by Region, State and Fiscal Year, seemed like a good starting point for looking at the distribution of the data through maps and diagrams. The Office of Medicare Hearings and Appeals (OMHA) supplies this data file, which I will use for demonstration purposes. By the time these claims reach OMHA, they have already been appealed through two levels after the initial determination. A claim must have been appealed at Level 1 and found unfavorable (wholly or in part). It must then have been appealed at Level 2 and again found unfavorable (wholly or in part). OMHA adjudicates at the third level of Medicare appeals. Note that a single appeal may cover multiple claims. Amid efforts to contain healthcare costs, there is debate over whether claims are being denied for procedures deemed not medically necessary. According to claims data published in the AMA’s 2008 National Health Insurance Report Card, Medicare had the highest denial rate of the insurers compared (www.ama-assn.org/ama1/pub/upload/mm/368/reportcard.pdf). Other sources of Medicare datasets are www.medicare.gov, www.cms.gov and www.resdac.org.
The dataset is titled Office of Medicare Hearing and Appeals Claims Listed by State as of January 7, 2010. It contains the following variables and their categories:
Region – Mid-Atlantic, Mid-West, Southern, Western and Other (for unspecified states)
State – All 50 States plus District of Columbia, Puerto Rico, Virgin Islands and Guam
Fiscal Year 06 – Claims for 2006
Fiscal Year 07 – Claims for 2007
Fiscal Year 08 – Claims for 2008
Fiscal Year 09 – Claims for 2009
Total – Total Claims summed across the four years
The raw data can be found at http://www.data.gov/communities/node/81/data_tools/333. Because 25% of the 2006 claims could not be attributed to a state, I will limit this discussion to the 2007, 2008 and 2009 data.
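To make the layout of the file concrete, here is a minimal sketch of one row as a Python record. The class and field names are my own (hypothetical), not the column headers in the OMHA spreadsheet, and the counts in the example are made up for illustration. The sketch shows the four-year Total as described above and the FY07–FY09 subset used in the rest of this post.

```python
from dataclasses import dataclass


@dataclass
class StateClaims:
    """One row of the OMHA claims-by-state table (hypothetical field names)."""
    region: str
    state: str
    fy06: int
    fy07: int
    fy08: int
    fy09: int

    @property
    def total(self) -> int:
        # Corresponds to the file's "Total" column: all four fiscal years summed.
        return self.fy06 + self.fy07 + self.fy08 + self.fy09

    @property
    def total_07_09(self) -> int:
        # FY06 is left out because a quarter of its claims have no state.
        return self.fy07 + self.fy08 + self.fy09

    @property
    def mean_07_09(self) -> float:
        # Three-year average, the kind of figure used for regional comparisons.
        return self.total_07_09 / 3


# Made-up counts for illustration only; not taken from the OMHA file.
row = StateClaims("Western", "Example State", 100, 200, 300, 400)
print(row.total, row.total_07_09, row.mean_07_09)
```

Keeping the FY06 column in the record, rather than dropping it on load, makes it easy to revisit the four-year totals if the unassigned 2006 claims are ever resolved.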
A first approach is to look at descriptive statistics that summarize the data. This is called univariate analysis. We explore each variable separately by looking at the range of its values, their central tendency (mean, median and mode) and their spread (such as variance). The data provided are aggregated to the state level, so they have limitations compared with individual or household data at the micro level. We could start by creating text tables to present the information. However, many people find information easier to grasp when it is presented visually in graphs, charts and maps. For the analyst, visuals are also a more creative way to convey the information, with more thought-provoking impact. Several excellent, free data visualization tools are easily accessible on the web. I demonstrate examples with three of them: Tableau Public, Visualize Free and IBM’s Many Eyes.
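As a small illustration of univariate summary statistics, the sketch below uses Python’s standard `statistics` module. The only values used are the five 2007 counts quoted in the next section. A real analysis would include every state and territory, so treat the numbers as a demonstration of the method rather than a summary of the dataset.

```python
import statistics

# 2007 OMHA claim counts for the five leading states, as quoted in this post.
# This is only the top five, not the full set of states and territories.
claims_2007 = {
    "California": 42469,
    "Florida": 17754,
    "Pennsylvania": 8685,
    "Montana": 6434,
    "New York": 5752,
}

values = list(claims_2007.values())
summary = {
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {value:,.1f}")
```

The large gap between the mean and the median is a quick signal that one state (California) pulls the average upward. That kind of skew is easier to see on a map or chart than in a table.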
In Visualizing Data, You Can Do Better than the Basic Bar Chart
A useful place to begin is a frequency distribution showing which states submitted the highest (and lowest) numbers of Level 3 Medicare appeals to OMHA each year. A bar chart with more than 50 bars could show the states in ranked or alphabetical order, but not their regional relationships. A map shaded by color ranges (a choropleth map, a type of heatmap) is usually the favored way to show which states have the highest concentrations. The top five states for 2007 were California (42,469), Florida (17,754), Pennsylvania (8,685), Montana (6,434) and New York (5,752). Figure 1 is a static view of an interactive map from Visualizefree.com. In the web version, you can click on a state to see its number of OMHA claims.
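Behind a color-range map are two simple steps: ranking the states and assigning each value to a color class. The sketch below shows both steps using the five 2007 counts above. I don’t know which binning method Visualize Free actually uses. Equal-interval classes are shown here as one common choice; quantile classes are another. The function names are hypothetical.

```python
from __future__ import annotations


def rank_states(counts: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n states with the most claims, highest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def color_class(value: float, lo: float, hi: float, classes: int = 5) -> int:
    """Equal-interval binning: map a value to a color class 0..classes-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * classes)
    return min(idx, classes - 1)  # the maximum value lands in the top class


# The five 2007 counts quoted above, deliberately listed out of order.
counts = {
    "Florida": 17754,
    "New York": 5752,
    "California": 42469,
    "Montana": 6434,
    "Pennsylvania": 8685,
}
ranking = rank_states(counts)
lo, hi = min(counts.values()), max(counts.values())
classes = {state: color_class(v, lo, hi) for state, v in counts.items()}
print(ranking)
print(classes)
```

With equal intervals, a single very large value (California) pushes most other states into the lowest class. This is one reason quantile or logarithmic classes are often preferred for skewed counts like these.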
[av_one_full first av_uid=’av-10mrp4′]
[av_image src=’https://www.movedbymetrics.com/wp-content/uploads/2013/11/OpenData-Figure-1-Medicare-claims.jpg’ attachment=’507′ align=’center’ animation=’no-animation’ link=” target=” av_uid=’av-uzx88′]
[/av_one_full]
One can also look at the data at the regional level and compare the average number of OMHA claims per state over the three-year period. The Midwestern region (as defined by the Centers for Medicare & Medicaid Services) is made up of eighteen states, shown in Figure 2. A bubble chart is an engaging visual for datasets with many values. It works best for comparing the magnitudes of a single variable that spans a wide range. There is also the bubble scatterplot, which adds a third dimension by comparing three variables simultaneously. Figure 2, created with IBM’s Many Eyes, shows that the three states with the highest average number of contested claims are Pennsylvania (12,018), New York (9,408) and New Jersey (6,403). The bubble diameters appear to be scaled on a square-root or logarithmic scale.
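To see why square-root scaling is a sensible guess, consider what it does. If each bubble’s radius is proportional to the square root of its value, then its area is proportional to the value itself, so visual size tracks magnitude. The sketch below applies this scaling to the three averages quoted for Figure 2. The `bubble_radius` function and the 50-unit maximum radius are hypothetical choices, and Many Eyes may well use a different scheme.

```python
import math


def bubble_radius(value: float, max_value: float, max_radius: float = 50.0) -> float:
    """Square-root scaling: bubble *area* is proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


# Three-year average OMHA claims per state, as quoted for Figure 2.
averages = {"Pennsylvania": 12018, "New York": 9408, "New Jersey": 6403}
largest = max(averages.values())
radii = {state: bubble_radius(v, largest) for state, v in averages.items()}
for state, r in radii.items():
    print(f"{state}: radius {r:.1f}")
```

Scaling the radius linearly with the value instead would exaggerate differences, because area would then grow with the square of the value.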
[av_one_full first av_uid=’av-s8a1c’]
[av_image src=’https://www.movedbymetrics.com/wp-content/uploads/2013/11/OpenData-Figure-2-Av-Number-of-Medicare-Claims-Midwestern.jpg’ attachment=’508′ align=’center’ animation=’no-animation’ link=” target=” av_uid=’av-mr6cw’]
[/av_one_full]
Plotting graphs also lets us see the distribution of values and identify outliers. Say we are curious which states within each region had the largest increases in claims from 2007 to 2008 and from 2008 to 2009. Tableau Public lets the user interact extensively with the data. Figure 3 shows a concentration of states with changes of only plus or minus 2% in claims, as well as states with increases of 4% to 10% over the previous year. Any point can be identified by clicking on it in the graph, as shown below, so there is no need for a separate color and legend entry for each state. One may want to investigate further why Wisconsin, South Carolina, Idaho, New Jersey, North Dakota and Oregon show the largest increases in appealed Medicare claims within their regions.
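The year-over-year changes plotted in Figure 3 come from the standard percent-change formula, (new − old) / old × 100. The sketch below computes both changes for each state and flags any state that rose by at least a chosen threshold in either year. The 4% threshold echoes the band described above. The state names and counts are hypothetical and are not taken from the OMHA file.

```python
def pct_change(old: int, new: int) -> float:
    """Year-over-year percent change; `old` must be non-zero."""
    return (new - old) / old * 100.0


def flag_increases(claims: dict, threshold: float = 4.0) -> dict:
    """Return {state: (FY07->FY08 change, FY08->FY09 change)} for states whose
    claims rose by at least `threshold` percent in either year."""
    flagged = {}
    for state, (fy07, fy08, fy09) in claims.items():
        changes = (pct_change(fy07, fy08), pct_change(fy08, fy09))
        if max(changes) >= threshold:
            flagged[state] = changes
    return flagged


# Hypothetical (FY07, FY08, FY09) counts for illustration only.
sample = {
    "State A": (1000, 1010, 1005),  # roughly flat: within +/- 2%
    "State B": (2000, 2100, 2310),  # +5% then +10%
    "State C": (500, 490, 520),     # -2% then about +6%
}
result = flag_increases(sample)
print(result)
```

Using percent change rather than raw differences puts small and large states on the same footing. The trade-off is that a state with very few claims can show a dramatic percentage swing from a handful of appeals.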
[av_one_full first av_uid=’av-1l714′]
[av_image src=’https://www.movedbymetrics.com/wp-content/uploads/2013/11/OpenData-Figure-3-Percent-Change-in-No-of-Claims.jpg’ attachment=’509′ align=’center’ animation=’no-animation’ link=” target=” av_uid=’av-9iuzc’]
[/av_one_full]
Storytelling with Numbers
Visualization techniques help tell the story of your data. They increase comprehension by revealing trends and patterns. Take advantage of cognitive maps, the mental pictures readers already carry. In our Medicare claims example, an interactive, color-toned map of the United States makes it easy to recognize and recall the states with the most claims. Also consider enlivening your data with a bubble chart: the bubble sizes in Figure 2 made it obvious which states had the highest average number of claims. Creative use of color and shape in graphs and scatterplots captures the reader’s attention. It can make the information easier to understand and make it feel more real. Encourage viewers to take a closer look at the information and to think about the causal dynamics behind what they see. The dispersion of points in Figure 3 invites questions and possibly further research to explain the apparent changes in appealed claim levels for individual states.
If you are working with data and need help getting better initial results, or are not sure which critical questions to ask or which tools to use, you are welcome to contact Georgette or me for a consultation.