LibGuides: LA 6940.003: Special Topics in Landscape Architecture (Spring 2023): Organizing your data

Organizing your data

There is no single "right" way to organize your data. The goal is to organize it so that you can easily find the data on your computer, use it in your maps without running into problems, and cite where the data came from.

One way to organize a collection of data is by theme, something like this:

- my_project
    - data
        - boundaries
            - ny_counties.shp
            - ny_towns.shp
        - energy
            - gas
                - pipelines.shp
                - gas_wells.shp
            - solar
                - solar_farms.gpkg
                - solar_potential.tif
        - landcover
            - nlcd_2019.tif
            - cropland_2022.tif
        - transportation
            - ny_streets.shp
            - historical_railroads.shp
    - my_arcmap_file.mxd
    - my_arcgispro_file.aprx
    - README.txt

Notice that the main "my_project" folder contains everything: all the data files as well as a software-specific project file like an .mxd (ArcMap) or .aprx (ArcGIS Pro) file. This helps to make sure that your project is not dependent on some other files that might exist elsewhere on your computer. Keep in mind that your GIS project file doesn't include the data, but rather points to where each data file is located. If you rename or move a data file in relation to the project file, that link will be broken and you'll have to spend time fixing it. So it's worth doing a little organization up front as you download datasets, before you even start to look at the data.
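If you set up projects like this often, a short script can create the empty folders for you. The following is only a sketch in Python, using the folder names from the example layout above; the function name `create_project_skeleton` and the README heading text are made up for illustration, so adjust them to suit your own project.

```python
from pathlib import Path

# Thematic subfolders taken from the example layout above.
THEME_FOLDERS = [
    "boundaries",
    "energy/gas",
    "energy/solar",
    "landcover",
    "transportation",
]


def create_project_skeleton(root):
    """Create the project folder, its data subfolders, and a starter README.txt."""
    root = Path(root)
    for theme in THEME_FOLDERS:
        # parents=True creates the in-between folders; exist_ok=True makes reruns safe.
        (root / "data" / theme).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    if not readme.exists():
        # Never overwrite notes that are already there.
        readme.write_text("Data sources and notes\n", encoding="utf-8")
    return root
```

Calling `create_project_skeleton("my_project")` would create the folders inside whatever folder the script is run from. The GIS project file (.mxd or .aprx) still has to be saved into "my_project" from within ArcMap or ArcGIS Pro.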

Also notice that there are no spaces or punctuation in the folder or file names (except for underscores and the period before the file extension). ArcMap especially has trouble with any characters other than letters, numbers, and underscores. Sometimes things will work for a while, but then suddenly give you an error when you try to run a processing tool (and the error won't actually say that the problem is the filename!).
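One way to catch problem names before they cause an error is to scan the project folder for them. This is a minimal sketch in Python that assumes the rule described above (letters, numbers, and underscores only, plus the period before the extension); the function name `find_risky_names` is hypothetical.

```python
import re
from pathlib import Path

# Letters, numbers, and underscores only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def find_risky_names(root):
    """Return every path under root whose folder or file name breaks the rule."""
    risky = []
    for path in sorted(Path(root).rglob("*")):
        # For files, check the name without its final extension (e.g. "ny_counties").
        # Files with two extensions, such as "ny_counties.shp.xml", will be
        # reported too, because the remaining name still contains a period.
        name = path.stem if path.is_file() else path.name
        if not SAFE_NAME.fullmatch(name):
            risky.append(path)
    return risky
```

For example, `find_risky_names("my_project")` would report a file named "gas wells.shp" or a folder named "old-data", but not "ny_counties.shp". The script only lists the names; renaming is left to you, ideally before the files are added to a map.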

Sometimes data comes in larger datasets than you need. For example, you might download a dataset of all gas wells in the United States, but just need those in New York. When saving a subset to a new file, it's a good idea to add some information to the filename, such as "counties_ny.shp". Similarly, if you need to reproject a file to a UTM coordinate system, you might add "_UTM" to the filename to distinguish it from the original. You may want to keep the original versions around just in case you need to create new or different derivatives.
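If you create many derivative files, it can help to build their names the same way every time. The sketch below, in Python, adds a label before the file extension; the function name `derivative_name` and the example paths are assumptions for illustration only.

```python
from pathlib import Path


def derivative_name(original, label):
    """Return the original path with "_label" added before the file extension."""
    original = Path(original)
    return original.with_name(f"{original.stem}_{label}{original.suffix}")


# A New York subset of a national gas wells file.
subset = derivative_name("data/energy/gas/gas_wells.shp", "ny")

# A reprojected copy of a raster.
reprojected = derivative_name("data/landcover/nlcd_2019.tif", "UTM")
```

Here `subset` ends in "gas_wells_ny.shp" and `reprojected` ends in "nlcd_2019_UTM.tif". The function only builds a name; it does not subset, reproject, or copy anything, so the original file stays untouched and you would pass the new name to your GIS software as the output file.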

Finally, notice the "README" file. This is a great place to keep notes about the datasets and where they came from. If possible, include the URL of the specific webpage where you downloaded the data. If you make a derivative file, make a note about the original dataset. If you examine a dataset and then decide not to use it, make a note about your reason ("only includes active wells", "missing data for my region", or whatever). It's even helpful to keep notes about places where you searched for data but didn't find what you were looking for, so that you don't waste time searching the same sites over and over. When you are finishing your project, it will be important to cite the data sources you used (see the "Citing data sources" tab), so recording this information up front will save a lot of time later. Many citation styles include the date accessed, so be sure to keep a note of that as well.
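The README can be written by hand, but if you prefer a consistent layout, a small script can add one entry per dataset. This is only a sketch in Python: the function name `log_dataset` and the four fields (File, Source, Accessed, Notes) are one possible layout, not a required format.

```python
from datetime import date
from pathlib import Path


def log_dataset(readme, filename, source_url, note="", accessed=None):
    """Append one dataset entry to the README and return the text that was added."""
    # Default to today's date if no access date is given.
    accessed = accessed or date.today()
    entry = (
        f"File: {filename}\n"
        f"Source: {source_url}\n"
        f"Accessed: {accessed.isoformat()}\n"
        f"Notes: {note}\n\n"
    )
    # Mode "a" appends to the end, so earlier notes are kept.
    with Path(readme).open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry
```

A call might look like `log_dataset("my_project/README.txt", "gas_wells_ny.shp", "https://example.com/gas-wells", note="subset of the national gas wells file")`, where the URL is a placeholder for the page you actually downloaded from. The same function works for datasets you decided not to use: put the reason in the note.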