The dispensary category was based on self-reporting by dispensary staff during call verification.

If the dispensary had any online activity within the past month, it was considered active. After removing inactive businesses, businesses not selling marijuana, and businesses without storefronts during the verification procedure, the 2,121 unique records were reduced to 826 businesses. These 826 dispensaries constituted the call-verified, combined database of active brick-and-mortar dispensaries in California.

Validity statistics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were computed for each of the four secondary data sources when applicable. Definitions and calculations are described in Technical Note S1. To compute validity statistics, a gold standard must be defined that can identify the “true positive” and the “true negative”. A field census is typically considered the gold standard in retail outlet research. However, a statewide census was infeasible in this study due to budget and time constraints. Two gold standards were therefore adopted, one for each of the two research questions. To answer the first question regarding the validity of online crowdsourcing platforms in enumerating licensed brick-and-mortar marijuana dispensaries, the first gold standard was whether a record was listed in the BCC state licensing directory. To answer the second question regarding the validity of the state licensing directory and online crowdsourcing platforms in enumerating active brick-and-mortar marijuana dispensaries, the second gold standard was whether a record was included in the call-verified, combined database of active dispensaries.

A test must also be defined that can identify the “positive test” and the “negative test” in validity statistics calculations. Two tests were conducted. The first test was whether a record was present in a given data source after online data cleaning.
We used this test to examine the validity of using a single data source with simple online data cleaning for dispensary identification, an approach requiring moderate resources.

The second test was whether a record passed call verification; in other words, whether the record was verified to be an active brick-and-mortar dispensary. We used this test to examine the validity of using a single data source with simple online data cleaning plus call verification for dispensary identification, an approach requiring considerably more resources. To illustrate these validity statistics in the context of this study, we provide an example below. In this example, the data source of interest is Weedmaps, the gold standard is whether a record on Weedmaps was present in the BCC state licensing directory, and the test is whether a record was present on Weedmaps after online data cleaning. Sensitivity measures the probability that a record is present on Weedmaps given that it is included in the BCC directory, calculated as the number of records present on both Weedmaps and the BCC directory divided by the number of records present on the BCC directory. Specificity measures the probability that a record is absent from Weedmaps given that it is excluded from the BCC directory, calculated as the number of records present on neither Weedmaps nor the BCC directory divided by the number of records excluded from the BCC directory. PPV measures the probability that a record is included in the BCC directory given that it is present on Weedmaps, calculated as the number of records present on both Weedmaps and the BCC directory divided by the number of records present on Weedmaps. NPV measures the probability that a record is excluded from the BCC directory given that it is absent from Weedmaps, calculated as the number of records present on neither Weedmaps nor the BCC directory divided by the number of records absent from Weedmaps.
Note that specificity and NPV cannot be calculated in this example, because we were not able to identify a “true negative”, that is, a record that was excluded from Weedmaps and also absent from the BCC directory. In fact, not all validity statistics were applicable to every combination of gold standard and test under the current study design. Following tobacco outlet research, we considered validity statistics of 0-0.2 to be poor, 0.21-0.4 to be fair, 0.41-0.6 to be moderate, 0.61-0.8 to be good, and 0.81-1.0 to be very good. R Version 3.5.3 was used to calculate 95% confidence intervals for all the validity statistics. We computed overall statistics as well as statistics by dispensary category and county population size. Locations of call-verified active brick-and-mortar dispensaries in California were mapped with ArcGIS Version 10.5.
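
The four validity statistics and the qualitative bands described above can be expressed compactly in code. The following is a minimal Python sketch, not the analysis code used in this study (which was run in R). The function names `validity_stats` and `rate_statistic`, the representation of records as sets of identifiers, and the handling of values that fall between two bands (for example, between 0.20 and 0.21) are assumptions made for illustration only.

```python
from typing import Dict, Optional, Set


def validity_stats(
    test_positive: Set[str],
    gold_positive: Set[str],
    universe: Optional[Set[str]] = None,
) -> Dict[str, Optional[float]]:
    """Compute sensitivity, specificity, PPV, and NPV from sets of record IDs.

    `universe` is the full set of candidate records. If it is None, true
    negatives cannot be identified, so specificity and NPV are returned as
    None (mirroring the single-source example in the text).
    """
    tp = len(test_positive & gold_positive)   # test positive, gold positive
    fp = len(test_positive - gold_positive)   # test positive, gold negative
    fn = len(gold_positive - test_positive)   # test negative, gold positive

    def ratio(num: int, den: int) -> Optional[float]:
        return num / den if den else None

    stats: Dict[str, Optional[float]] = {
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
        "specificity": None,
        "npv": None,
    }
    if universe is not None:
        tn = len(universe - test_positive - gold_positive)
        stats["specificity"] = ratio(tn, tn + fp)
        stats["npv"] = ratio(tn, tn + fn)
    return stats


def rate_statistic(value: float) -> str:
    """Map a validity statistic to the qualitative bands used in the text.

    Assumption: each band is closed at its upper bound (<= 0.2 is poor, etc.).
    """
    for upper, label in [(0.2, "poor"), (0.4, "fair"), (0.6, "moderate"), (0.8, "good")]:
        if value <= upper:
            return label
    return "very good"


if __name__ == "__main__":
    # Toy record IDs, purely hypothetical.
    platform = {"a", "b", "c"}
    directory = {"b", "c", "d", "e"}
    print(validity_stats(platform, directory))
    print(validity_stats(platform, directory, universe={"a", "b", "c", "d", "e", "f"}))
```

Passing no `universe` reproduces the situation in the worked example, where only sensitivity and PPV are defined; supplying the combined database as `universe` makes specificity and NPV computable.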

A total of 2,121 business records were combined from BCC and the three online crowdsourcing platforms after online data cleaning. BCC, Weedmaps, Leafly, and Yelp had 630, 811, 535, and 1,468 records included in the combined database, respectively. The overlaps across the data sources are presented in Figure S1. Only 240 records were present in all four data sources. Following call verification, the 2,121 records were reduced to 826, which were confirmed to be active brick-and-mortar dispensaries. Among the 1,295 records removed during call verification, 56.0% were closed, 4.2% were not open yet, 38.0% were not selling marijuana, and 1.8% had no storefronts. BCC, Weedmaps, Leafly, and Yelp had 486, 659, 459, and 471 records included in these 826 verified dispensaries, respectively. The overlaps across the data sources are presented in Figure S2. The 826 records included 77 recreational-only, 65 medical-only, and 684 recreational & medical dispensaries.

Table 1 reports validity statistics using the BCC licensing directory as the gold standard. When the test was presence on each online crowdsourcing platform after online data cleaning, Leafly had good sensitivity and Weedmaps and Yelp had moderate sensitivity. This indicated that 70% of the records in the BCC licensing directory could be found on Leafly. Leafly also had very good PPV, yet Yelp’s PPV was only fair. This indicated that 83% of Leafly records were included in the BCC licensing directory. When the test was passing call verification, Leafly still had the highest sensitivity and PPV, and Yelp had the highest specificity and NPV. This indicated that call-verified Leafly records performed the best for identifying truly licensed dispensaries and call-verified Yelp records performed the best for identifying truly unlicensed dispensaries in this scenario. Table 2 reports validity statistics using the call-verified, combined database as the gold standard.
When the test was presence in each data source after online data cleaning, Weedmaps had the highest sensitivity, and BCC, Leafly, and Yelp all had a moderate level of sensitivity, ranging from 0.56 to 0.59. This indicated that 80% of the call-verified, combined database of active dispensaries could be found on Weedmaps. Leafly and Weedmaps had very good PPV, and Yelp’s PPV was only fair. This indicated that 86% of Leafly records were included in the call-verified, combined database of active dispensaries. When the test was passing call verification, the sensitivity statistics were the same as when the test was presence in each data source. This was because the call-verified businesses in each data source were a subset of the businesses included in that data source before call verification, such that the numerators and denominators of the sensitivity calculation remained the same. Yelp had the highest NPV and Leafly had the lowest NPV. This indicated that call-verified Yelp records performed the best for identifying records that were truly not active brick-and-mortar dispensaries.

Table 3 reports the agreement between BCC, online crowdsourcing platforms, and call verification in terms of the category of the 630 licensed dispensaries.
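
As a rough arithmetic check, the sensitivity and PPV values for the first test against the call-verified gold standard can be approximated from the record counts reported in the text. The Python sketch below is illustrative only: the variable and function names are hypothetical, and the ratios are derived from the reported counts rather than copied from Table 2, so they may differ slightly from the tabulated values.

```python
# Record counts as reported in the text (not copied from Table 2).
GOLD_TOTAL = 826  # call-verified active brick-and-mortar dispensaries

# source -> (records after online data cleaning, records passing call verification)
counts = {
    "BCC": (630, 486),
    "Weedmaps": (811, 659),
    "Leafly": (535, 459),
    "Yelp": (1468, 471),
}


def sensitivity(verified: int, gold_total: int = GOLD_TOTAL) -> float:
    """Share of the gold-standard records that appear in the source."""
    return verified / gold_total


def ppv_after_cleaning(listed: int, verified: int) -> float:
    """Share of the source's cleaned records that are in the gold standard."""
    return verified / listed


results = {
    name: (round(sensitivity(verified), 2), round(ppv_after_cleaning(listed, verified), 2))
    for name, (listed, verified) in counts.items()
}

for name, (sens, ppv) in results.items():
    print(f"{name}: sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Because the numerator of sensitivity is the number of verified records from a source under either test, and the denominator is always the 826 gold-standard records, the sensitivity does not change when the test is switched to passing call verification.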

Approximately 25% of the licensed dispensaries on Weedmaps and 29% of the licensed dispensaries on Leafly posted a category that disagreed with the one approved in the BCC license. Approximately 12% of the call-verified, licensed dispensaries stated a category in call verification that disagreed with the one approved in the BCC license. Most of the businesses that stated an unapproved category on online crowdsourcing platforms and/or in call verification claimed to be recreational & medical when they were licensed only as recreational-only or medical-only. Table S3 reports category-specific validity statistics when the gold standard was presence in the BCC licensing directory. Leafly had the highest sensitivity in the recreational-only and recreational & medical categories and Weedmaps had the highest sensitivity in the medical-only category, regardless of the definition of the test. Table S4 reports category-specific validity statistics when the gold standard was presence in the call-verified, combined database. When the test was presence in each data source after online data cleaning, Weedmaps had the highest sensitivity in identifying recreational-only and medical-only dispensaries, yet BCC had the highest sensitivity in identifying recreational & medical dispensaries. When the test was passing call verification, Weedmaps had the highest sensitivity in all three categories. In 2019, California had 16 counties with a population size above one million and 42 counties with a population size below one million. Table S5 reports validity statistics by county population size when the gold standard was presence in the BCC licensing directory. Leafly had the highest sensitivity regardless of test definition and county population size.
Table S6 reports validity statistics by county population size when the gold standard was presence in the call-verified, combined database. Regardless of test definition, Weedmaps had the highest sensitivity in more populated counties and BCC had the highest sensitivity in less populated counties.

This study is the first to assess the validity of secondary data sources in identifying brick-and-mortar marijuana dispensaries across a large state. We reported the validity of online crowdsourcing platforms in enumerating licensed dispensaries and the validity of the state licensing directory and online crowdsourcing platforms in enumerating active dispensaries. Regarding the validity of using online crowdsourcing platforms to identify dispensaries in the BCC licensing directory, all three online crowdsourcing platforms included over 50% of the records in the BCC directory, with Leafly containing the largest number of licensed dispensaries. These findings suggest that the online crowdsourcing platforms could serve as a reasonable proxy for the licensing directory. They support the use of online crowdsourcing platforms for dispensary identification in existing and future studies, especially if a licensing system is not open to the public or is updated infrequently.

It should be noted, however, that the dispensary category registered in the BCC directory may be mismatched with the “de facto” category in which dispensaries operated. Over 25% of licensed dispensaries on online crowdsourcing platforms posted a category that disagreed with the BCC license, and over 10% of call-verified, licensed dispensaries stated a category in call verification that disagreed with the BCC license. In particular, most of these dispensaries claimed to be recreational & medical while they were licensed only as recreational-only or medical-only. Such disagreement might be used intentionally as a means of attracting customers or might reflect how dispensaries operate in practice.

Regarding the validity of using the state licensing directory in identifying active brick-and-mortar dispensaries, over 20% of licensed dispensaries did not pass call verification. This indicated that business licenses may not accurately represent businesses’ actual operating status. For instance, a business may have closed before its license expired, and a business may not be open yet even though its license has been approved. Of the final 826 call-verified dispensaries, 58.8% were included in the BCC licensing directory. This indicated that the BCC directory failed to capture unlicensed dispensaries, which accounted for over 40% of the total active dispensaries in California. Relying solely on a state licensing directory would overestimate active, licensed dispensaries while overlooking active, unlicensed dispensaries.

Regarding the validity of using online crowdsourcing platforms in identifying active brick-and-mortar dispensaries, Weedmaps had nearly very good sensitivity; it contributed 80% of the records in the final call-verified, combined database. It had the highest sensitivity in identifying recreational-only and medical-only dispensaries.
It was also the most sensitive database in identifying dispensaries in more populated counties, which were mostly urban areas. The high concentration of dispensaries and intense competition in urban areas may motivate more businesses to promote themselves on this highly visible and popular platform. Leafly had the lowest sensitivity in identifying active dispensaries. It also had the lowest sensitivity in identifying all three dispensary categories. This is likely because the costs of advertising on Leafly were substantially higher than on other online crowdsourcing platforms specializing in marijuana.