#!/usr/bin/env python3
"""
bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.
"""
help_epilogue = """
EXAMPLES:

Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
    bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of the (inclusive) page number range 3-7 from INPUT.pdf:
    bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, all of B.pdf's pages except its first two, and all pages of C.pdf:
    bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
    bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5 and 7 by 90°; rotate page 7 once more by 90° (i.e. 180° in total):
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Declare a 5cm crop from the left and a 1cm crop from the right, but alternate these horizontal crops between odd and even pages (so even pages get 1cm cropped from the left and 5cm from the right):
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, and draw stencils into the inner margins for cuts to carry binding strings:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in the OUTPUT.pdf page quarters:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, and spine margins:
    bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze

NOTES:

For arguments like -p, -c and -r, page numbers start at 1 (0 is treated as an invalid page number).

For now, the target page shape is assumed to be A4 in portrait orientation. bookmaker.py normalizes all pages to this format before applying crops, and removes any /Rotate entries from the source PDFs (which some PDFs use to produce landscape orientation).

For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page. The cropped area is scaled to fill as much as possible of the available per-page area, which lies between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content around it remains visible.

The --nup4 quartering puts pages into a specific order optimized for no-tumble (long-edge) duplex printouts that can easily be folded and cut into the pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped onto two subsequent output pages (i.e. the front and back of one printed A4 sheet) like this:

     front          back
  +---+---+      +---+---+
  | 4 | 1 |      | 2 | 3 |
  +---+---+      +---+---+
  | 8 | 5 |      | 6 | 7 |
  +---+---+      +---+---+

To facilitate this layout, --nup4 also pads the input to a total number of pages that is a multiple of 8, appending blank pages if necessary.

To turn the double-sided example sheet above into a tiny 8-page book: Cut the paper in two along its horizontal middle line. Fold each half along its vertical middle line, with pages 3-2 and 7-6 on the insides of the folds. This creates two 4-page booklets, one of pages 1-4 and one of pages 5-8. Fold both closed and (counter-intuitively) put the booklet of pages 5-8 on top of the other one, creating a temporary page order of 5,6,7,8,1,2,3,4. A binding cut stencil should be visible at the top left of this stack; cut it out (with all pages folded together) to add the same inner-margin upper cut to each page. Turn the 8-page stack over to find the mirror image of that stencil at the bottom of the stack's back, and cut that out too. Each page now has binding cuts at the top and bottom of its inner margin. Swap the order of the two booklets (back to the final page order of 1,2,3,4,5,6,7,8), and you now have an 8-page book that can be "bound" in its binding cuts with a rubber band or the like. Repeat with the next 8-page sheet, et cetera. (Actually, with just 8 pages the paper may curl under the pressure of a rubber band, but with 32 pages or so the result becomes quite stable.)
"""

import argparse
import io
import os
import sys


def handled_error_exit(msg):
    print(f"ERROR: {msg}")
    sys.exit(1)


try:
    import pypdf
except ImportError:
    handled_error_exit("Can't run at all without pypdf installed.")

# some general paper geometry constants
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)
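
# Illustrative sketch, not called by bookmaker.py: the constants above work in
# PDF points (1/72 inch), so 1 cm = 72 / 2.54, about 28.35 points, and an A4
# page measures roughly 595.3 x 841.9 points. The helper name
# cm_to_points_sketch is hypothetical and only spells out that conversion.
def cm_to_points_sketch(cm):
    """Convert centimeters to PDF points (72 points per inch, 2.54 cm per inch)."""
    return cm * 72 / 2.54

# Usage: cm_to_points_sketch(21) and cm_to_points_sketch(29.7) give the A4
# width and height in points; a 5 cm crop corresponds to about 141.7 points.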

# constants specifically for --nup4
A4_HALF_WIDTH = A4_WIDTH / 2
A4_HALF_HEIGHT = A4_HEIGHT / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
SPINE_LIMIT = 1 * POINTS_PER_CM
QUARTER_SCALE_FACTOR = 0.5
# 0-based page indices within each unit of 8 pages, in the order they fill the
# output quarters (front sheet first, then back sheet)
PAGE_ORDER_FOR_NUP4 = (3,0,7,4,1,2,5,6)
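
# Illustrative sketch, not called by bookmaker.py: how PAGE_ORDER_FOR_NUP4 maps
# one 8-page unit onto the quarters of a front and a back output page. It
# assumes, as the --nup4 code does, that each output page's quarters are filled
# in the order top-left, top-right, bottom-left, bottom-right. The helper name
# nup4_sheet_layout_sketch is hypothetical.
def nup4_sheet_layout_sketch(order=(3, 0, 7, 4, 1, 2, 5, 6)):
    """Return (front rows, back rows) of 1-based page numbers for pages 1-8."""
    numbers = [n + 1 for n in order]  # 0-based indices -> 1-based page numbers
    front = (tuple(numbers[0:2]), tuple(numbers[2:4]))
    back = (tuple(numbers[4:6]), tuple(numbers[6:8]))
    return front, back

# Usage: nup4_sheet_layout_sketch() returns (((4, 1), (8, 5)), ((2, 3), (6, 7))),
# i.e. front rows 4|1 and 8|5, back rows 2|3 and 6|7.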


class HandledException(Exception):
    pass


def parse_args():
    parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
    parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
    parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
    parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
    parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of given number by 90° (usable multiple times on same page!)")
    parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
    parser.add_argument("-n", "--nup4", action='store_true', help="puts 4 input pages onto 1 output page, adds binding cut stencil")
    parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, draw lines identifying spine, page borders")
    parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
    return parser.parse_args()
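
# Illustrative sketch, not called by bookmaker.py: with action="append" and no
# explicit default, argparse leaves an option that was never given at None
# rather than [], so any loop over such an option has to handle None first,
# e.g. by checking "if args.rotate_page:". The trimmed parser below is a
# hypothetical stand-in for parse_args().
def append_default_sketch(argv):
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--rotate_page", type=int, action="append")
    return parser.parse_args(argv).rotate_page

# Usage: append_default_sketch([]) returns None, while
# append_default_sketch(["-r", "3", "-r", "7", "-r", "7"]) returns [3, 7, 7].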


def validate_inputs_first_pass(args):
    for filename in args.input_file:
        if not os.path.isfile(filename):
            raise HandledException(f"-i: {filename} is not a file")
        try:
            with open(filename, 'rb') as file:
                pypdf.PdfReader(file)
        except pypdf.errors.PdfReadError:  # base class, also covers PdfStreamError
            raise HandledException(f"-i: cannot interpret {filename} as PDF file")
    if args.page_range:
        for p_string in args.page_range:
            validate_page_range(p_string, "-p")
        if len(args.page_range) > len(args.input_file):
            raise HandledException("-p: more --page_range arguments than --input_file arguments")
    if args.crops:
        for c_string in args.crops:
            initial_split = c_string.split(':')
            if len(initial_split) > 2:
                raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
            page_range, crops = split_crops_string(c_string)
            crops = crops.split(",")
            validate_page_range(page_range, "-c")
            if len(crops) != 4:
                raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
            for crop in crops:
                try:
                    float(crop)
                except ValueError:
                    raise HandledException(f"-c: non-number crop in: {c_string}")
    if args.rotate_page:
        for r in args.rotate_page:
            try:
                int(r)
            except ValueError:
                raise HandledException(f"-r: non-integer value: {r}")
            if r < 1:
                raise HandledException(f"-r: value must not be <1: {r}")
    try:
        float(args.print_margin)
    except ValueError:
        raise HandledException(f"-m: non-float value: {args.print_margin}")


def validate_page_range(p_string, err_msg_prefix):
    prefix = f"{err_msg_prefix}: page range string"
    if '-' not in p_string:
        raise HandledException(f"{prefix} lacks '-': {p_string}")
    tokens = p_string.split("-")
    if len(tokens) > 2:
        raise HandledException(f"{prefix} has too many '-': {p_string}")
    for i, token in enumerate(tokens):
        if token == "":
            continue
        if i == 0 and token == "start":
            continue
        if i == 1 and token == "end":
            continue
        try:
            int(token)
        except ValueError:
            raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
        if int(token) < 1:
            raise HandledException(f"{prefix} carries page number <1: {p_string}")
    start = -1
    end = -1
    if tokens[0] not in ("", "start"):
        start = int(tokens[0])
    if tokens[1] not in ("", "end"):
        end = int(tokens[1])
    if start > 0 and end > 0 and start > end:
        raise HandledException(f"{prefix} has higher start than end value: {p_string}")


def split_crops_string(c_string):
    initial_split = c_string.split(':')
    if len(initial_split) > 1:
        page_range = initial_split[0]
        crops = initial_split[1]
    else:
        page_range = "start-end"  # no page range given: apply to all pages
        crops = initial_split[0]
    return page_range, crops
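
# Illustrative sketch, not called by bookmaker.py: parsing a -c value such as
# "10-20:0,5,0,5" into a page range and four crops (left, bottom, right, top)
# converted from cm to points. The helper name parse_crops_sketch and the
# "start-end" fallback for a missing page range are assumptions for illustration.
def parse_crops_sketch(c_string, points_per_cm=72 / 2.54):
    if ":" in c_string:
        page_range, crops = c_string.split(":", 1)
    else:
        page_range, crops = "start-end", c_string
    values = [float(x) for x in crops.split(",")]
    if len(values) != 4:
        raise ValueError(f"expected 4 comma-separated crops, got: {c_string}")
    return page_range, tuple(v * points_per_cm for v in values)

# Usage: parse_crops_sketch("10-20:0,5,0,5") returns "10-20" plus the crops in
# points; parse_crops_sketch("5,10,2,0") applies to the whole "start-end" range.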


def parse_page_range(range_string, pages):
    # convert a 1-based, inclusive range string into 0-based, end-exclusive indices
    start_page = 0
    end_page = len(pages)
    start, end = range_string.split('-')
    if not (len(start) == 0 or start == "start"):
        start_page = int(start) - 1
    if not (len(end) == 0 or end == "end"):
        end_page = int(end)
    return start_page, end_page
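
# Illustrative sketch, not called by bookmaker.py: the 1-based, inclusive page
# ranges of -p and -c ("3-7", "start-7", "3-end") become a 0-based,
# end-exclusive Python range, as parse_page_range() computes. The helper name
# page_range_to_indices_sketch is hypothetical.
def page_range_to_indices_sketch(range_string, page_count):
    start, end = range_string.split("-")
    start_index = 0 if start in ("", "start") else int(start) - 1
    end_index = page_count if end in ("", "end") else int(end)
    return list(range(start_index, end_index))

# Usage: page_range_to_indices_sketch("3-7", 10) returns [2, 3, 4, 5, 6], the
# 0-based indices of pages 3 to 7; "3-end" on a 5-page PDF gives [2, 3, 4].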


def read_inputs_to_pagelist(args_input_file, args_page_range):
    pages_to_add = []
    opened_files = []
    new_page_num = 0
    for i, input_file in enumerate(args_input_file):
        file = open(input_file, 'rb')
        opened_files += [file]
        reader = pypdf.PdfReader(file)
        start_page, end_page = 0, len(reader.pages)
        if args_page_range and len(args_page_range) > i:
            range_string = args_page_range[i]
            start_page, end_page = parse_page_range(range_string, reader.pages)
            # validate_page_range() only rejects start > end when both are numbers,
            # so a range like "12-end" on a 10-page file must be caught here
            if end_page > len(reader.pages) or start_page >= end_page:
                raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
        for old_page_num in range(start_page, end_page):
            new_page_num += 1
            page = reader.pages[old_page_num]
            pages_to_add += [page]
            print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")
    return pages_to_add, opened_files


def validate_inputs_second_pass(args, pages_to_add):
    if args.crops:
        for c_string in args.crops:
            page_range, _ = split_crops_string(c_string)
            start, end = parse_page_range(page_range, pages_to_add)
            if end > len(pages_to_add) or start >= end:
                raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
    if args.rotate_page:
        for r in args.rotate_page:
            if r > len(pages_to_add):
                raise HandledException(f"-r: page number beyond number of pages we're building: {r}")


def rotate_pages(args_rotate_page, pages_to_add):
    if args_rotate_page:
        for rotate_page in args_rotate_page:
            page = pages_to_add[rotate_page - 1]
            # rotate clockwise by 90° around the center of the A4 page
            page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
            page.add_transformation(pypdf.Transformation().rotate(-90))
            page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
            print(f"-r: rotating (by 90°) page {rotate_page}")
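
# Illustrative sketch, not called by bookmaker.py: rotate_pages() rotates around
# the page center by translating the center to the origin, rotating by -90°
# (clockwise in PDF coordinates, whose y axis points up) and translating back.
# This applies the same composition to a single point; the helper name
# rotate_point_about_center_sketch is hypothetical.
def rotate_point_about_center_sketch(x, y, cx, cy, degrees=-90):
    import math
    rad = math.radians(degrees)
    dx, dy = x - cx, y - cy  # translate the center to the origin
    rx = dx * math.cos(rad) - dy * math.sin(rad)  # standard rotation, counter-clockwise for positive angles
    ry = dx * math.sin(rad) + dy * math.cos(rad)
    return rx + cx, ry + cy  # translate back

# Usage: rotating (1, 0) around (0, 0) by -90° yields (0, -1) up to floating
# point error; the center itself stays in place.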


def pad_pages_to_multiple_of_8(pages_to_add):
    mod_to_8 = len(pages_to_add) % 8
    if mod_to_8 > 0:
        old_len = len(pages_to_add)
        for _ in range(8 - mod_to_8):
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
            pages_to_add += [new_page]
        print(f"-n: number of input pages {old_len} is not a multiple of 8, padded to {len(pages_to_add)}")
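
# Illustrative sketch, not called by bookmaker.py: the number of blank pages
# pad_pages_to_multiple_of_8() appends is 8 - n % 8 when n is not already a
# multiple of 8, which equals (-n) % 8 in Python. Hypothetical helper name.
def blank_pages_needed_sketch(page_count, unit=8):
    return (-page_count) % unit

# Usage: 13 pages need 3 blank pages (padded to 16); 16 pages need none.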


def normalize_pages_to_A4(pages_to_add):
    for page in pages_to_add:
        if "/Rotate" in page:  # TODO: preserve rotation, but in canvas?
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox


def collect_per_page_crops_and_zooms(args_crops, args_symmetry, pages_to_add):
    crops_at_page = [(0,0,0,0)]*len(pages_to_add)
    zoom_at_page = [1]*len(pages_to_add)
    if args_crops:
        for c_string in args_crops:
            page_range, crops = split_crops_string(c_string)
            start_page, end_page = parse_page_range(page_range, pages_to_add)
            crop_left_cm, crop_bottom_cm, crop_right_cm, crop_top_cm = [float(x) for x in crops.split(',')]
            crop_left = crop_left_cm * POINTS_PER_CM
            crop_bottom = crop_bottom_cm * POINTS_PER_CM
            crop_right = crop_right_cm * POINTS_PER_CM
            crop_top = crop_top_cm * POINTS_PER_CM
            prefix = "-c, -s" if args_symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args_symmetry else ""
            print(f"{prefix}: applying to pages {start_page + 1} to {end_page} crops: left {crop_left_cm}cm, bottom {crop_bottom_cm}cm, right {crop_right_cm}cm, top {crop_top_cm}cm{suffix}")
            cropped_width = A4_WIDTH - crop_left - crop_right
            cropped_height = A4_HEIGHT - crop_bottom - crop_top
            if cropped_width <= 0 or cropped_height <= 0:
                raise HandledException(f"-c: crops leave no page area: {c_string}")
            zoom_horizontal = A4_WIDTH / cropped_width
            zoom_vertical = A4_HEIGHT / cropped_height
            if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
                raise HandledException("-c: crops would create opposing zoom directions")
            elif zoom_horizontal + zoom_vertical > 2:
                # zooming in: the smaller zoom keeps the cropped area within the page in both dimensions
                zoom = min(zoom_horizontal, zoom_vertical)
            else:
                zoom = max(zoom_horizontal, zoom_vertical)
            for page_num in range(start_page, end_page):
                if args_symmetry and page_num % 2:  # odd 0-based index, i.e. even page number
                    crops_at_page[page_num] = (crop_right, crop_bottom, crop_left, crop_top)
                else:
                    crops_at_page[page_num] = (crop_left, crop_bottom, crop_right, crop_top)
                zoom_at_page[page_num] = zoom
    return crops_at_page, zoom_at_page
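
# Illustrative sketch, not called by bookmaker.py: how one zoom factor is chosen
# from the horizontal and vertical zooms implied by a crop. When the crop zooms
# in, the smaller factor is taken so that the cropped area still fits the page
# in both dimensions. When it zooms out (negative crops), the larger factor,
# i.e. the one closer to 1, is taken. Mixed directions are rejected. Crops are
# given in cm here, and the helper name choose_zoom_sketch is hypothetical.
def choose_zoom_sketch(crops_cm, page_cm=(21.0, 29.7)):
    left, bottom, right, top = crops_cm
    width, height = page_cm
    zoom_h = width / (width - left - right)
    zoom_v = height / (height - bottom - top)
    if (zoom_h > 1 and zoom_v < 1) or (zoom_h < 1 and zoom_v > 1):
        raise ValueError("crops would create opposing zoom directions")
    if zoom_h + zoom_v > 2:
        return min(zoom_h, zoom_v)
    return max(zoom_h, zoom_v)

# Usage: choose_zoom_sketch((1, 1, 1, 1)) returns 29.7 / 27.7 (about 1.072).
# A crop on one axis only, such as (5, 0, 1, 0), returns 1.0, because zooming
# in would push the other axis off the page.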


def build_single_pages_output(writer, pages_to_add, crops_at_page, zoom_at_page):
    print("building 1-input-page-per-output-page book")
    for i, page in enumerate(pages_to_add):
        crop_left, crop_bottom, crop_right, crop_top = crops_at_page[i]
        zoom = zoom_at_page[i]
        # move the cropped area to the origin, then zoom it
        page.add_transformation(pypdf.Transformation().translate(tx=-crop_left, ty=-crop_bottom))
        page.add_transformation(pypdf.Transformation().scale(zoom, zoom))
        cropped_width = A4_WIDTH - crop_left - crop_right
        cropped_height = A4_HEIGHT - crop_bottom - crop_top
        page.mediabox.right = cropped_width * zoom
        page.mediabox.top = cropped_height * zoom
        writer.add_page(page)
        print(f"built page number {i+1} (of {len(pages_to_add)})")
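

def resort_pages_for_nup4(pages_to_add):
    new_page_order = []
    new_i_order = []  # original index of each resorted page, for crop/zoom lookup
    eight_pack = []
    i = 0
    n_eights = 0
    for page in pages_to_add:
        if i == 0:
            eight_pack = []
        eight_pack += [page]
        i += 1
        if i == 8:
            i = 0
            for n in PAGE_ORDER_FOR_NUP4:
                new_i_order += [8 * n_eights + n]
                new_page_order += [eight_pack[n]]
            n_eights += 1
    return new_page_order, new_i_order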
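
# Illustrative sketch, not called by bookmaker.py: resort_pages_for_nup4()
# reorders pages in units of 8 according to PAGE_ORDER_FOR_NUP4, and the index
# list it also returns maps each resorted position back to the original page
# index. This shows the same computation on plain indices, assuming the page
# count is already a multiple of 8 (hypothetical helper name).
def nup4_index_order_sketch(page_count, order=(3, 0, 7, 4, 1, 2, 5, 6)):
    if page_count % 8:
        raise ValueError("page count must be a multiple of 8")
    return [8 * unit + n for unit in range(page_count // 8) for n in order]

# Usage: nup4_index_order_sketch(16) returns
# [3, 0, 7, 4, 1, 2, 5, 6, 11, 8, 15, 12, 9, 10, 13, 14].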


def nup4_inner_page_transform(page, crops, zoom, bonus_shrink_factor, printable_margin, printable_scale, nup4_position):
    crop_left, crop_bottom, crop_right, crop_top = crops
    # move the top crop edge so that it ends up at the page top after zooming
    page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / zoom - (A4_HEIGHT - crop_top))))
    if nup4_position == 0 or nup4_position == 2:
        # left quarters: move the left crop edge to the left page border
        page.add_transformation(pypdf.Transformation().translate(tx=-crop_left))
    elif nup4_position == 1 or nup4_position == 3:
        # right quarters: move the right crop edge so that it ends up at the right page border after zooming
        page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / zoom - (A4_WIDTH - crop_right))))
    page.add_transformation(pypdf.Transformation().scale(zoom * bonus_shrink_factor, zoom * bonus_shrink_factor))
    if nup4_position == 2 or nup4_position == 3:
        page.add_transformation(pypdf.Transformation().translate(ty=-2*printable_margin/printable_scale))


def nup4_outer_page_transform(page, bonus_shrink_factor, nup4_position):
    page.add_transformation(pypdf.Transformation().translate(ty=(1-bonus_shrink_factor)*A4_HEIGHT))
    if nup4_position == 0 or nup4_position == 1:
        y_section = A4_HEIGHT
        page.mediabox.bottom = A4_HALF_HEIGHT
        page.mediabox.top = A4_HEIGHT
    if nup4_position == 2 or nup4_position == 3:
        y_section = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HALF_HEIGHT
    if nup4_position == 0 or nup4_position == 2:
        x_section = 0
        page.mediabox.left = 0
        page.mediabox.right = A4_HALF_WIDTH
    if nup4_position == 1 or nup4_position == 3:
        page.add_transformation(pypdf.Transformation().translate(tx=(1-bonus_shrink_factor)*A4_WIDTH))
        x_section = A4_WIDTH
        page.mediabox.left = A4_HALF_WIDTH
        page.mediabox.right = A4_WIDTH
    # shift by a full page size, then halve: the page lands in its quarter
    page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
    page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))
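
# Illustrative sketch, not called by bookmaker.py: the --nup4 code derives two
# scale factors from the print margin. printable_scale shrinks the whole sheet
# so that print_margin remains free on the left and right. The spine share is
# SPINE_LIMIT (1 cm) relative to half the page width, divided by
# printable_scale, and bonus_shrink_factor is what remains of each quarter page
# after reserving that share. All lengths are in cm here, and the helper name
# nup4_scale_factors_sketch is hypothetical.
def nup4_scale_factors_sketch(print_margin_cm=0.43, spine_limit_cm=1.0, page_width_cm=21.0):
    printable_scale = (page_width_cm - 2 * print_margin_cm) / page_width_cm
    spine_part_of_page = (spine_limit_cm / (page_width_cm / 2)) / printable_scale
    bonus_shrink_factor = 1 - spine_part_of_page
    return printable_scale, bonus_shrink_factor

# Usage: with the default 0.43 cm margin, printable_scale is about 0.959 and
# bonus_shrink_factor about 0.901.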


def ornate_nup4(writer, args_analyze, is_front_page, new_page, printable_margin, printable_scale, bonus_shrink_factor, canvas_class):
    if args_analyze:
        # borders of the page quarters, drawn before shrinking into the printable area
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
        c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
        c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
        c.line(0, 0, A4_WIDTH, 0)
        c.line(0, A4_HEIGHT, 0, 0)
        c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
        c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])
    # shrink everything merged so far into the printable area, centered
    printable_offset_x = printable_margin
    printable_offset_y = printable_margin * A4_HEIGHT / A4_WIDTH
    new_page.add_transformation(pypdf.Transformation().scale(printable_scale, printable_scale))
    new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
    x_left_spine_limit = A4_HALF_WIDTH * bonus_shrink_factor
    x_right_spine_limit = A4_WIDTH - x_left_spine_limit
    if args_analyze or is_front_page:
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
    if args_analyze:
        # spine limits
        c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
        c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
    if is_front_page:
        # binding cut stencils are only needed on one side of each sheet
        draw_cut(c, x_left_spine_limit, 1)
        draw_cut(c, x_right_spine_limit, -1)
    if args_analyze or is_front_page:
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])


def draw_cut(canvas, x_spine_limit, direction):
    # draw one binding cut stencil starting at the horizontal middle line;
    # direction 1 points upward, -1 downward
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
    canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
    canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)
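
# Illustrative sketch, not called by bookmaker.py: the three segments drawn by
# draw_cut() form an open polyline. It runs from the inner start point on the
# middle line to the deepest point on the spine limit, back along the spine
# limit to a middle point, and out to the outer start point. This returns those
# four vertices in points. The default sizes mirror CUT_WIDTH, CUT_DEPTH and
# MIDDLE_POINT_DEPTH in cm, and the helper name cut_polyline_sketch is
# hypothetical.
def cut_polyline_sketch(x_spine_limit, y_base, direction,
                        cut_width_cm=1.05, cut_depth_cm=1.95, middle_depth_cm=0.4):
    points_per_cm = 72 / 2.54
    half_width = 0.5 * cut_width_cm * points_per_cm * direction
    return [
        (x_spine_limit + half_width, y_base),  # inner start point
        (x_spine_limit, y_base + cut_depth_cm * points_per_cm * direction),  # deepest point
        (x_spine_limit, y_base + middle_depth_cm * points_per_cm * direction),  # middle point
        (x_spine_limit - half_width, y_base),  # outer start point
    ]

# Usage: cut_polyline_sketch(100.0, 0.0, 1) describes a cut pointing upward
# from y = 0; with direction -1 the same shape points downward.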


def main():
    args = parse_args()
    validate_inputs_first_pass(args)
    if args.nup4:
        try:
            from reportlab.pdfgen.canvas import Canvas
        except ImportError:
            raise HandledException("-n: need reportlab.pdfgen.canvas installed for --nup4")
    pages_to_add, opened_files = read_inputs_to_pagelist(args.input_file, args.page_range)
    validate_inputs_second_pass(args, pages_to_add)
    rotate_pages(args.rotate_page, pages_to_add)
    if args.nup4:
        pad_pages_to_multiple_of_8(pages_to_add)
    normalize_pages_to_A4(pages_to_add)
    crops_at_page, zoom_at_page = collect_per_page_crops_and_zooms(args.crops, args.symmetry, pages_to_add)
    writer = pypdf.PdfWriter()
    if not args.nup4:
        build_single_pages_output(writer, pages_to_add, crops_at_page, zoom_at_page)
    else:
        print("-n: building 4-input-pages-per-output-page book")
        print(f"-m: applying printable-area margin of {args.print_margin}cm")
        if args.analyze:
            print("-a: drawing page borders, spine limits")
        printable_margin = args.print_margin * POINTS_PER_CM
        printable_scale = (A4_WIDTH - 2 * printable_margin)/A4_WIDTH
        spine_part_of_page = (SPINE_LIMIT / A4_HALF_WIDTH) / printable_scale
        bonus_shrink_factor = 1 - spine_part_of_page
        pages_to_add, new_i_order = resort_pages_for_nup4(pages_to_add)
        page_count = 0
        is_front_page = True
        for j, page in enumerate(pages_to_add):
            i = j % 4  # quarter position: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right
            if i == 0:
                new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
            new_i = new_i_order[j]  # index of this page before resorting
            nup4_inner_page_transform(page, crops_at_page[new_i], zoom_at_page[new_i], bonus_shrink_factor, printable_margin, printable_scale, i)
            nup4_outer_page_transform(page, bonus_shrink_factor, i)
            new_page.merge_page(page)
            page_count += 1
            print(f"merged page number {page_count} (of {len(pages_to_add)})")
            if i == 3:
                ornate_nup4(writer, args.analyze, is_front_page, new_page, printable_margin, printable_scale, bonus_shrink_factor, Canvas)
                writer.add_page(new_page)
                is_front_page = not is_front_page
    # write before closing the input files, as page data may still be read from them
    with open(args.output_file, 'wb') as output_file:
        writer.write(output_file)
    for file in opened_files:
        file.close()


if __name__ == "__main__":
    try:
        main()
    except HandledException as e:
        handled_error_exit(e)