Hello all,
I am interested in finding out how people are dealing with large data files in a network environment. Our IT people aren't crazy about me dumping terabytes of data on the servers. I'm curious what others are doing for storage and backup: what works and what doesn't.
Thanks!
Ed
Data Handling - Storage and Backup
- I have made 20-30 posts
- Posts: 28
- Joined: Tue Feb 24, 2015 10:22 pm
- 9
- Full Name: Edward M Reading
- Company Details: MNS Engineers
- Company Position Title: Supervising Project Surveyor
- Country: United States
- Linkedin Profile: No
- Location: San Luis Obispo, CA
- jcoco3
- Global Moderator
- Posts: 1724
- Joined: Sun Mar 04, 2012 5:43 pm
- 12
- Full Name: Jonathan Coco
- Company Details: Consultant
- Company Position Title: Owner
- Country: USA
- Linkedin Profile: No
- Has thanked: 70 times
- Been thanked: 157 times
Re: Data Handling - Storage and Backup
Hi Ed,
I think you will find these posts relevant:
http://www.laserscanningforum.com/forum ... =43&t=4703
http://www.laserscanningforum.com/forum ... =57&t=7395
http://www.laserscanningforum.com/forum ... =49&t=7870
From what I have seen and heard, people are doing many different things for storage: everything from a handful of portable flash drives to incredibly expensive high-reliability systems with site-to-site mirroring. My suggestion is to use a solution that has redundancy but fits your current project size and workload. Also give yourself room to grow by budgeting for larger and more robust storage solutions in the future. The data can really stack up over time, and in my experience you want to keep most of it indefinitely.
Wait, here is one more link: http://www.laserscanningforum.com/forum ... t=qnap+nas Sorry for all the reading, but it is all good stuff that you will want to know.
- I have made 20-30 posts
- Posts: 28
- Joined: Tue Feb 24, 2015 10:22 pm
- 9
- Full Name: Edward M Reading
- Company Details: MNS Engineers
- Company Position Title: Supervising Project Surveyor
- Country: United States
- Linkedin Profile: No
- Location: San Luis Obispo, CA
Re: Data Handling - Storage and Backup
Thanks a lot Jonathan!
Oddly, I did a search for "Data" and "Storage" and couldn't find them.
- V.I.P Member
- Posts: 201
- Joined: Sun Oct 27, 2013 6:50 pm
- 10
- Full Name: Arash Yaghoubi
- Company Details: Hypsometric
- Company Position Title: Director of Cartography
- Country: USA
- Linkedin Profile: No
- Been thanked: 3 times
Re: Data Handling - Storage and Backup
Another solution is to fire everyone and use a gigabit Ethernet crossover cable straight to the server.
- Dedken
- V.I.P Member
- Posts: 370
- Joined: Fri Mar 15, 2013 10:28 am
- 11
- Full Name: Kenneth Bazley
- Company Details: Sir Robert McAlpine
- Company Position Title: Senior Geospatial Engineer for HDS
- Country: UK
- Linkedin Profile: Yes
- Location: London
Re: Data Handling - Storage and Backup
Dropbox. That way your IT department doesn't even have to touch the data.
All views are my own and are not representative of my employer, The King, God or anyone else for that matter.
"we need an instrument, to take a measurement" - I.MacKaye 1992
"we need an instrument, to take a measurement" - I.MacKaye 1992
- andrew.grigg
- V.I.P Member
- Posts: 170
- Joined: Tue Aug 07, 2012 1:29 pm
- 11
- Full Name: Andrew Grigg
- Company Details: 40SEVEN
- Company Position Title: Senior Land Surveyor
- Country: England
- Linkedin Profile: Yes
Re: Data Handling - Storage and Backup
Dedken wrote: Dropbox. That way your IT department doesn't even have to touch the data.
Wow, that must take hours to upload projects. You must have a decent upload speed. I suppose you could leave a PC on over the weekend whilst it uploads...
- Dedken
- V.I.P Member
- Posts: 370
- Joined: Fri Mar 15, 2013 10:28 am
- 11
- Full Name: Kenneth Bazley
- Company Details: Sir Robert McAlpine
- Company Position Title: Senior Geospatial Engineer for HDS
- Country: UK
- Linkedin Profile: Yes
- Location: London
Re: Data Handling - Storage and Backup
We haven't done it yet, but it's in the pipeline (probably)... It would be done from the main hub. When you buy professional Dropbox accounts you get preferential upload speeds; they can afford to throttle the free accounts. That's what I was told by IT, anyway!
All views are my own and are not representative of my employer, The King, God or anyone else for that matter.
"we need an instrument, to take a measurement" - I.MacKaye 1992
"we need an instrument, to take a measurement" - I.MacKaye 1992
- I have made 10-20 posts
- Posts: 10
- Joined: Sun Feb 15, 2015 12:32 am
- 9
- Full Name: Mark Dwyer
- Company Details: Shearspace Pty Ltd
- Company Position Title: Founder
- Country: Australia
- Linkedin Profile: Yes
Re: Data Handling - Storage and Backup
Hello,
I'm going to say this as a former architect for massive-scale data systems. When I say that, I'm talking about storing tens of petabytes per year. Some of this was as a supercomputing expert (medical, engineering, GIS and those damn physicists), some of it as a big-data specialist for LiDAR (8 TB raw + 2 TB cooked per day, 7 days per week, 365 days per year). The LiDAR was compressed, but it didn't really matter... the raw was an order of magnitude larger in size.
The only way to store the type of data you guys generate is with magnetic tape. Now, I'm not saying you use tape for everyday IO thrashing, but for the long-term storage. Tape costs about $20 per TB for dual backups ($10 per TB for a single copy... but a copy is not strictly a backup). The second tape you store offsite. Tape lasts for 25 years; it is the longest-lasting storage medium that we currently know of, and can prove. Once written, it does not require electricity to stay intact... it can sit on a shelf.
You need a tiered system. The top tier ideally has a couple of terabytes of fast disk (maybe SSD; I've found 10K Raptors are also sufficient, and reliable. People have complained about slowness, but when you look at the log files you see the network switch was the bottleneck, not the disks; you'll need a 10 Gbps switch or above before you notice disk performance degradation). This tier is your everyday thrashing level; work it as hard as you need. The second tier has about 10x the storage on slower, standard commodity HDDs, in an array that reflects your data backup strategy (RAID of some description). The final tier is a tape system of some description. Data written to the first tier is mirrored to the second tier; data that ages on the second tier is automatically pushed to the tape tier.
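To make the aging step concrete, here is a minimal sketch of the tier-2-to-tape demotion. The 90-day threshold and the idea of a staging directory that a tape writer later drains are illustrative assumptions, not a tested policy:

```python
import shutil
import time
from pathlib import Path

# Demote after 90 days untouched: an illustrative threshold.
AGE_LIMIT = 90 * 24 * 3600

def demote_stale_files(tier2: Path, tape_queue: Path,
                       age_limit: int = AGE_LIMIT, now=None) -> list:
    """Move files untouched for `age_limit` seconds from the HDD tier
    into a staging directory that the tape writer later drains."""
    now = time.time() if now is None else now
    moved = []
    for f in sorted(tier2.rglob("*")):
        if f.is_file() and now - f.stat().st_mtime > age_limit:
            # Preserve the relative directory layout in the queue.
            dest = tape_queue / f.relative_to(tier2)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), dest)
            moved.append(dest)
    return moved
```

In practice you would run something like this nightly from cron, and use access time rather than modification time if reads should also keep a file "hot".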
If you are a smaller organisation, you can probably remove the top tier.
Tape robots are surprisingly cheap; the expensive bit is the tape drives. When I was maintaining a system recording 10 TB per day, two tape drives were needed to guarantee that the data was written to two tapes daily (LTO6 will write at 160 MB/s maximum). A tape drive retails for roughly $1K and will last five years, minimum. After five years, the tape density will have increased 4x, justifying a purchase review.
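The two-drive figure falls straight out of the arithmetic; a quick sketch, taking LTO6's quoted 160 MB/s as the best case:

```python
# Back-of-the-envelope check of the drive count above, assuming
# LTO6's quoted 160 MB/s maximum and 10 TB of new data per day.
TB = 10**12
MB = 10**6

daily_data = 10 * TB        # raw recording volume per day
drive_rate = 160 * MB       # LTO6 native write speed, best case

hours_per_copy = daily_data / drive_rate / 3600
print(f"{hours_per_copy:.1f} h to write one copy")
```

One copy takes about 17.4 hours, so a single drive writing two copies would need roughly 35 hours: more than a day. Hence two drives running in parallel.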
A problem I noticed was the price of tape backup software: stupidly, and needlessly, expensive. A day's programming will do this. Laser data is very easy to push to tape: the files are large, so writing to tape is actually perfect (long, continuous streams, as opposed to standard business backup, which involves millions of files of a couple of KB each, with lots of starting and stopping... 'shoe-shining'). If anybody wants to start a GitHub project, I'd be happy to contribute.
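That day's programming is roughly this shape: stream the large scan files to the drive as one long tar archive. `/dev/nst0` is the usual Linux non-rewinding tape device, but that is an assumption here; the same function works against an ordinary file, which is how you would test it without hardware:

```python
import tarfile
from pathlib import Path

def write_to_tape(files, device: str = "/dev/nst0") -> int:
    """Append the given files to the tape as a single tar stream.
    Returns the number of files written. Long sequential files keep
    the drive streaming and avoid shoe-shining."""
    written = 0
    # "w|" = uncompressed stream mode: no seeking, which suits tape.
    with tarfile.open(device, mode="w|") as tar:
        for f in files:
            f = Path(f)
            tar.add(f, arcname=f.name)
            written += 1
    return written
```

A real version would add a catalogue (which tape holds which file) and a verify pass, but the core loop really is this small.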
All the above scales wide and deep.